The Art and Science of Analyzing Software Data

1st Edition - August 27, 2015

  • Editors: Christian Bird, Tim Menzies, Thomas Zimmermann
  • eBook ISBN: 9780124115439
  • Paperback ISBN: 9780124115194

Description

The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field from leading data scientists, drawn from their experience training software engineering students and practitioners to master data science. The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques, such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and the analysis of source code comments. It includes stories from the trenches in which expert data scientists illustrate how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.

Key Features

  • Presents best practices, hints, and tips to analyze data and apply tools in data science projects
  • Presents research methods and case studies that have emerged over the past few years to further understanding of software data
  • Shares stories from the trenches of successful data science initiatives in industry

Readership

Practicing software engineers, researchers and graduate software engineering students with an interest in data science.

Table of Contents

    • List of Contributors
    • Chapter 1: Past, Present, and Future of Analyzing Software Data
      • Abstract
      • Acknowledgments
      • 1.1 Definitions
      • 1.2 The Past: Origins
      • 1.3 Present Day
      • 1.4 Conclusion
    • Part 1: Tutorial-Techniques
      • Chapter 2: Mining Patterns and Violations Using Concept Analysis
        • Abstract
        • Acknowledgments
        • 2.1 Introduction
        • 2.2 Patterns and Blocks
        • 2.3 Computing All Blocks
        • 2.4 Mining Shopping Carts with Colibri
        • 2.5 Violations
        • 2.6 Finding Violations
        • 2.7 Two Patterns or One Violation?
        • 2.8 Performance
        • 2.9 Encoding Order
        • 2.10 Inlining
        • 2.11 Related Work
        • 2.12 Conclusions
      • Chapter 3: Analyzing Text in Software Projects
        • Abstract
        • 3.1 Introduction
        • 3.2 Textual Software Project Data and Retrieval
        • 3.3 Manual Coding
        • 3.4 Automated Analysis
        • 3.5 Two Industrial Studies
        • 3.6 Summary
      • Chapter 4: Synthesizing Knowledge from Software Development Artifacts
        • Abstract
        • 4.1 Problem Statement
        • 4.2 Artifact Lifecycle Models
        • 4.3 Code Review
        • 4.4 Lifecycle Analysis
        • 4.5 Other Applications
        • 4.6 Conclusion
      • Chapter 5: A Practical Guide to Analyzing IDE Usage Data
        • Abstract
        • Acknowledgments
        • 5.1 Introduction
        • 5.2 Usage Data Research Concepts
        • 5.3 How to Collect Data
        • 5.4 How to Analyze Usage Data
        • 5.5 Limits of What You Can Learn from Usage Data
        • 5.6 Conclusion
        • 5.7 Code Listings
      • Chapter 6: Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
        • Abstract
        • 6.1 Introduction
        • 6.2 Applications of LDA in Software Analysis
        • 6.3 How LDA Works
        • 6.4 LDA Tutorial
        • 6.5 Pitfalls and Threats to Validity
        • 6.6 Conclusions
      • Chapter 7: Tools and Techniques for Analyzing Product and Process Data
        • Abstract
        • 7.1 Introduction
        • 7.2 A Rational Analysis Pipeline
        • 7.3 Source Code Analysis
        • 7.4 Compiled Code Analysis
        • 7.5 Analysis of Configuration Management Data
        • 7.6 Data Visualization
        • 7.7 Concluding Remarks
    • Part 2: Data/Problem Focussed
      • Chapter 8: Analyzing Security Data
        • Abstract
        • 8.1 Vulnerability
        • 8.2 Security Data “Gotchas”
        • 8.3 Measuring Vulnerability Severity
        • 8.4 Method of Collecting and Analyzing Vulnerability Data
        • 8.5 What Security Data has Told Us Thus Far
        • 8.6 Summary
      • Chapter 9: A Mixed Methods Approach to Mining Code Review Data: Examples and a Study of Multicommit Reviews and Pull Requests
        • Abstract
        • 9.1 Introduction
        • 9.2 Motivation for a Mixed Methods Approach
        • 9.3 Review Process and Data
        • 9.4 Quantitative Replication Study: Code Review on Branches
        • 9.5 Qualitative Approaches
        • 9.6 Triangulation
        • 9.7 Conclusion
      • Chapter 10: Mining Android Apps for Anomalies
        • Abstract
        • Acknowledgments
        • 10.1 Introduction
        • 10.2 Clustering Apps by Description
        • 10.3 Identifying Anomalies by APIs
        • 10.4 Evaluation
        • 10.5 Related Work
        • 10.6 Conclusion and Future Work
      • Chapter 11: Change Coupling Between Software Artifacts: Learning from Past Changes
        • Abstract
        • 11.1 Introduction
        • 11.2 Change Coupling
        • 11.3 Change Coupling Identification Approaches
        • 11.4 Challenges in Change Coupling Identification
        • 11.5 Change Coupling Applications
        • 11.6 Conclusion
    • Part 3: Stories from the Trenches
      • Chapter 12: Applying Software Data Analysis in Industry Contexts: When Research Meets Reality
        • Abstract
        • 12.1 Introduction
        • 12.2 Background
        • 12.3 Six Key Issues when Implementing a Measurement Program in Industry
        • 12.4 Conclusions
      • Chapter 13: Using Data to Make Decisions in Software Engineering: Providing a Method to our Madness
        • Abstract
        • 13.1 Introduction
        • 13.2 Short History of Software Engineering Metrics
        • 13.3 Establishing Clear Goals
        • 13.4 Review of Metrics
        • 13.5 Challenges with Data Analysis on Software Projects
        • 13.6 Example of Changing Product Development Through the Use of Data
        • 13.7 Driving Software Engineering Processes with Data
      • Chapter 14: Community Data for OSS Adoption Risk Management
        • Abstract
        • Acknowledgments
        • 14.1 Introduction
        • 14.2 Background
        • 14.3 An Approach to OSS Risk Adoption Management
        • 14.4 OSS Communities Structure and Behavior Analysis: The XWiki Case
        • 14.5 A Risk Assessment Example: The Moodbile Case
        • 14.6 Related Work
        • 14.7 Conclusions
      • Chapter 15: Assessing the State of Software in a Large Enterprise: A 12-Year Retrospective
        • Abstract
        • Acknowledgments
        • 15.1 Introduction
        • 15.2 Evolution of the Process and the Assessment
        • 15.3 Impact Summary of the State of Avaya Software Report
        • 15.4 Assessment Approach and Mechanisms
        • 15.5 Data Sources
        • 15.6 Examples of Analyses
        • 15.7 Software Practices
        • 15.8 Assessment Follow-up: Recommendations and Impact
        • 15.9 Impact of the Assessments
        • 15.10 Conclusions
        • 15.11 Appendix
        • Author Biographies
      • Chapter 16: Lessons Learned from Software Analytics in Practice
        • Abstract
        • 16.1 Introduction
        • 16.2 Problem Selection
        • 16.3 Data Collection
        • 16.4 Descriptive Analytics
        • 16.5 Predictive Analytics
        • 16.6 Road Ahead
    • Part 4: Advanced Topics
      • Chapter 17: Code Comment Analysis for Improving Software Quality
        • Abstract
        • 17.1 Introduction
        • 17.2 Text Analytics: Techniques, Tools, and Measures
        • 17.3 Studies of Code Comments
        • 17.4 Automated Code Comment Analysis for Specification Mining and Bug Detection
        • 17.5 Studies and Analysis of API Documentation
        • 17.6 Future Directions and Challenges
      • Chapter 18: Mining Software Logs for Goal-Driven Root Cause Analysis
        • Abstract
        • 18.1 Introduction
        • 18.2 Approaches to Root Cause Analysis
        • 18.3 Root Cause Analysis Framework Overview
        • 18.4 Modeling Diagnostics for Root Cause Analysis
        • 18.5 Log Reduction
        • 18.6 Reasoning Techniques
        • 18.7 Root Cause Analysis for Failures Induced by Internal Faults
        • 18.8 Root Cause Analysis for Failures due to External Threats
        • 18.9 Experimental Evaluations
        • 18.10 Conclusions
      • Chapter 19: Analytical Product Release Planning
        • Abstract
        • Acknowledgments
        • 19.1 Introduction and Motivation
        • 19.2 Taxonomy of Data-intensive Release Planning Problems
        • 19.3 Information Needs for Software Release Planning
        • 19.4 The Paradigm of Analytical Open Innovation
          • Analysis phase
          • Synthesize phase
        • 19.5 Analytical Release Planning—A Case Study
        • 19.6 Summary and Future Research
        • 19.7 Appendix: Feature Dependency Constraints
    • Part 5: Data Analysis at Scale (Big Data)
      • Chapter 20: Boa: An Enabling Language and Infrastructure for Ultra-Large-Scale MSR Studies
        • Abstract
        • 20.1 Objectives
        • 20.2 Getting Started with Boa
        • 20.3 Boa’s Syntax and Semantics
        • 20.4 Mining Project and Repository Metadata
        • 20.5 Mining Source Code with Visitors
        • 20.6 Guidelines for Replicable Research
        • 20.7 Conclusions
        • 20.8 Practice Problems
          • Project and Repository Metadata Problems
          • Source Code Problems
      • Chapter 21: Scalable Parallelization of Specification Mining Using Distributed Computing
        • Abstract
        • 21.1 Introduction
        • 21.2 Background
        • 21.3 Distributed Specification Mining
        • 21.4 Implementation and Empirical Evaluation
        • 21.5 Related Work
        • 21.6 Conclusion and Future Work

Product details

  • No. of pages: 672
  • Language: English
  • Copyright: © Morgan Kaufmann 2015
  • Published: August 27, 2015
  • Imprint: Morgan Kaufmann
  • eBook ISBN: 9780124115439
  • Paperback ISBN: 9780124115194

About the Editors

Christian Bird

is a researcher in the empirical software engineering group at Microsoft Research. He is primarily interested in the relationship between software design, social dynamics, and processes in large development projects. He has studied software development teams at Microsoft, at IBM, and in open source projects, examining the effects of distributed development, ownership policies, and the ways in which teams complete software tasks. He has published in the top software engineering venues and has received an ACM SIGSOFT Distinguished Paper Award.

Affiliations and Expertise

Researcher, Microsoft Research, Redmond, WA

Tim Menzies

is a Full Professor of Computer Science at NC State University and a former software research chair at NASA. He has more than 200 publications, many in the area of software analytics. He serves on the editorial boards of (1) IEEE Transactions on Software Engineering; (2) the Automated Software Engineering journal; and (3) the Empirical Software Engineering journal. His research includes artificial intelligence, data mining, and search-based software engineering. He is best known for his work on the PROMISE open source repository of data for reusable software engineering experiments.

Affiliations and Expertise

Professor, Computer Science, North Carolina State University, Raleigh, NC, USA

Thomas Zimmermann

is a researcher in the Research in Software Engineering (RiSE) group at Microsoft Research, an adjunct assistant professor at the University of Calgary, and an affiliate faculty member at the University of Washington. He is best known for his work on systematic mining of version archives and bug databases to conduct empirical studies and to build tools that support developers and managers. He received two ACM SIGSOFT Distinguished Paper Awards for his work published at the ICSE '07 and FSE '08 conferences.

Affiliations and Expertise

Researcher, Microsoft Research, Redmond, WA
