
Sharing Data and Models in Software Engineering

1st Edition

Authors: Tim Menzies, Ekrem Kocaguneli, Burak Turhan, Leandro Minku, Fayola Peters
eBook ISBN: 9780124173071
Paperback ISBN: 9780124172951
Imprint: Morgan Kaufmann
Published Date: 15th December 2014
Page Count: 406
DRM-Free

Easy - Download and start reading immediately. There’s no activation process to access eBooks; all eBooks are fully searchable and enabled for copying, pasting, and printing.

Flexible - Read on multiple operating systems and devices. Easily read eBooks on smartphones, computers, or any eBook reader, including Kindle.

Open - Buy once, receive and download all available eBook formats, including PDF, EPUB, and Mobi (for Kindle).

Description

Data Science for Software Engineering: Sharing Data and Models presents guidance and procedures for reusing data and models between projects to produce results that are useful and relevant. Starting with a background section of practical lessons and warnings for beginning data scientists in software engineering, the book proceeds to identify critical questions of contemporary software engineering related to data and models. Learn how to adapt data from other organizations to local problems, mine privatized data, prune spurious information, simplify complex results, update models for new platforms, and more. Chapters share broadly applicable experimental results, discussed with a blend of practitioner-focused domain expertise. Each chapter is written by a prominent expert and offers a state-of-the-art solution to an identified problem facing data scientists in software engineering. Throughout, the authors share best practices collected from their experience training software engineering students and practitioners in data science, and highlight the methods that are most useful and applicable to the widest range of projects.

Key Features

  • Shares the specific experience of leading researchers and the techniques they developed to handle data problems in software engineering
  • Explains how to start a data science project for software engineering, and how to identify and avoid likely pitfalls
  • Provides a wide range of useful qualitative and quantitative principles, ranging from the very simple to cutting-edge research
  • Addresses current challenges with software engineering data, such as a lack of local data, access restrictions due to data privacy, and improving data quality by cleaning spurious data

Readership

Researchers, graduate software engineering students, and practitioners with an interest in data science.

Table of Contents

  • Why this book?
  • Foreword
  • List of Figures
  • Chapter 1: Introduction
    • 1.1 Why read this book?
    • 1.2 What do we mean by “sharing”?
    • 1.3 What? (our executive summary)
    • 1.4 How to read this book
    • 1.5 But what about …? (what is not in this book)
    • 1.6 Who? (about the authors)
    • 1.7 Who else? (acknowledgments)
  • Part I: Data Mining for Managers
    • Chapter 2: Rules for Managers
      • Abstract
      • 2.1 The inductive engineering manifesto
      • 2.2 More rules
    • Chapter 3: Rule #1: Talk to the Users
      • Abstract
      • 3.1 Users’ biases
      • 3.2 Data mining biases
      • 3.3 Can we avoid bias?
      • 3.4 Managing biases
      • 3.5 Summary
    • Chapter 4: Rule #2: Know The Domain
      • Abstract
      • 4.1 Cautionary tale #1: “discovering” random noise
      • 4.2 Cautionary tale #2: jumping at shadows
      • 4.3 Cautionary tale #3: it pays to ask
      • 4.4 Summary
    • Chapter 5: Rule #3: Suspect Your Data
      • Abstract
      • 5.1 Controlling Data Collection
      • 5.2 Problems With Controlled Data Collection
      • 5.3 Rinse (and Prune) Before Use
      • 5.4 On the Value of Pruning
      • 5.5 Summary
    • Chapter 6: Rule #4: Data Science is Cyclic
      • Abstract
      • 6.1 The Knowledge Discovery Cycle
      • 6.2 Evolving Cyclic Development
      • 6.3 Summary
  • Part II: Data Mining: A Technical Tutorial
    • Chapter 7: Data Mining and SE
      • Abstract
      • 7.1 Some Definitions
      • 7.2 Some Application Areas
    • Chapter 8: Defect Prediction
      • Abstract
      • 8.1 Defect Detection Economics
      • 8.2 Static Code Defect Prediction
    • Chapter 9: Effort Estimation
      • Abstract
      • 9.1 The Estimation Problem
      • 9.2 How To Make Estimates
    • Chapter 10: Data Mining (Under The Hood)
      • Abstract
      • 10.1 Data carving
      • 10.2 About the data
      • 10.3 Cohen pruning
      • 10.4 Discretization
      • 10.5 Column pruning
      • 10.6 Row pruning
      • 10.7 Cluster pruning
      • 10.8 Contrast pruning
      • 10.9 Goal pruning
      • 10.10 Extensions for continuous classes
  • Part III: Sharing Data
    • Chapter 11: Sharing Data: Challenges and Methods
      • Abstract
      • 11.1 Houston, We Have A Problem
      • 11.2 Good News, Everyone
    • Chapter 12: Learning Contexts
      • Abstract
      • 12.1 Background
      • 12.2 Manual Methods for Contextualization
      • 12.3 Automatic Methods
      • 12.4 Other Motivation To Find Contexts
      • 12.5 How To Find Local Regions
      • 12.6 Inside Chunk
      • 12.7 Putting It All Together
      • 12.8 Using Chunk
      • 12.9 Closing Remarks
    • Chapter 13: Cross-Company Learning: Handling The Data Drought
      • Abstract
      • 13.1 Motivation
      • 13.2 Setting the ground for analyses
      • 13.3 Analysis #1: can CC data be useful for an organization?
      • 13.4 Analysis #2: how to cleanup CC data for local tuning?
      • 13.5 Analysis #3: how much local data does an organization need for a local model?
      • 13.6 How trustworthy are these results?
      • 13.7 Are these useful in practice or just number crunching?
      • 13.8 What's new on cross-learning?
      • 13.9 What's the takeaway?
    • Chapter 14: Building Smarter Transfer Learners
      • Abstract
      • 14.1 What is actually the problem?
      • 14.2 What do we know so far?
      • 14.3 An example technology: TEAK
      • 14.4 The details of the experiments
      • 14.5 Results
      • 14.6 Discussion
      • 14.7 What are the takeaways?
    • Chapter 15: Sharing Less Data (Is a Good Thing)
      • Abstract
      • 15.1 Can We Share Less Data?
      • 15.2 Using Less Data
      • 15.3 Why Share Less Data?
      • 15.4 How To Find Less Data
      • 15.5 What's Next?
    • Chapter 16: How To Keep Your Data Private
      • Abstract
      • 16.1 Motivation
      • 16.2 What is PPDP and why is it important?
      • 16.3 What is considered a breach of privacy?
      • 16.4 How to avoid privacy breaches?
      • 16.5 How are privacy-preserving algorithms evaluated?
      • 16.6 Case study: privacy and cross-company defect prediction
    • Chapter 17: Compensating for Missing Data
      • Abstract
      • 17.1 Background notes on SEE and instance selection
      • 17.2 Data sets and performance measures
      • 17.3 Experimental conditions
      • 17.4 Results
      • 17.5 Summary
    • Chapter 18: Active Learning: Learning More With Less
      • Abstract
      • 18.1 How does the QUICK algorithm work?
      • 18.2 Notes on active learning
      • 18.3 The application and implementation details of QUICK
      • 18.4 How the experiments are designed
      • 18.5 Results
      • 18.6 Summary
  • Part IV: Sharing Models
    • Chapter 19: Sharing Models: Challenges and Methods
      • Abstract
    • Chapter 20: Ensembles of Learning Machines
      • Abstract
      • 20.1 When and why ensembles work
      • 20.2 Bootstrap aggregating (bagging)
      • 20.3 Regression trees (RTs) for bagging
      • 20.4 Evaluation framework
      • 20.5 Evaluation of bagging + RTs in SEE
      • 20.6 Further understanding of bagging + RTs in SEE
      • 20.7 Summary
    • Chapter 21: How to Adapt Models in a Dynamic World
      • Abstract
      • 21.1 Cross-company data and questions tackled
      • 21.2 Related work
      • 21.3 Formulation of the problem
      • 21.4 Databases
      • 21.5 Potential benefit of CC data
      • 21.6 Making better use of CC data
      • 21.7 Experimental analysis
      • 21.8 Discussion and implications
      • 21.9 Summary
    • Chapter 22: Complexity: Using Assemblies of Multiple Models
      • Abstract
      • 22.1 Ensemble of methods
      • 22.2 Solo methods and multimethods
        • 22.2.3 Experimental conditions
      • 22.3 Methodology
      • 22.4 Results
      • 22.5 Summary
    • Chapter 23: The Importance of Goals in Model-Based Reasoning
      • Abstract
      • 23.1 Introduction
      • 23.2 Value-based modeling
      • 23.3 Setting up
      • 23.4 Details
      • 23.5 An experiment
      • 23.6 Inside the models
      • 23.7 Results
      • 23.8 Discussion
    • Chapter 24: Using Goals in Model-Based Reasoning
      • Abstract
      • 24.1 Multilayer Perceptrons
      • 24.2 Multiobjective evolutionary algorithms
      • 24.3 HaD-MOEA
      • 24.4 Using MOEAs for creating SEE models
      • 24.5 Experimental setup
      • 24.6 The relationship among different performance measures
      • 24.7 Ensembles based on concurrent optimization of performance measures
      • 24.8 Emphasizing particular performance measures
      • 24.9 Further analysis of the model choice
      • 24.10 Comparison against other types of models
      • 24.11 Summary
    • Chapter 25: A Final Word
      • Abstract
  • Bibliography

Details

No. of pages:
406
Language:
English
Copyright:
© Morgan Kaufmann 2015
Published:
15th December 2014
Imprint:
Morgan Kaufmann
eBook ISBN:
9780124173071
Paperback ISBN:
9780124172951

About the Author

Tim Menzies

Tim Menzies is a Full Professor of Computer Science at NC State University and a former software research chair at NASA. He has published 200+ papers, many in the area of software analytics. He serves on the editorial boards of (1) IEEE Transactions on Software Engineering; (2) the Automated Software Engineering journal; and (3) the Empirical Software Engineering journal. His research includes artificial intelligence, data mining, and search-based software engineering. He is best known for his work on the PROMISE open source repository of data for reusable software engineering experiments.

Affiliations and Expertise

Professor, Computer Science, North Carolina State University, Raleigh, NC, USA

Ekrem Kocaguneli

Ekrem Kocaguneli received his Ph.D. from the Lane Department of Computer Science and Electrical Engineering, West Virginia University. His research focuses on empirical software engineering, the data and model problems associated with software estimation, and tackling those problems with smarter machine learning algorithms.

Affiliations and Expertise

Software Development Engineer at Microsoft

Burak Turhan

Burak Turhan is a Professor of Software Engineering at the University of Oulu, Finland. His research interests include empirical studies of software quality, defect prediction, and cost estimation, as well as data mining for software engineering.

Affiliations and Expertise

Professor of Software Engineering, University of Oulu, Finland

Leandro Minku

Leandro L. Minku is a Research Fellow II at the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, UK. His research focuses on software prediction models; he is co-author of the first approach shown to improve the performance of software predictors trained on cross-company data over those trained on single-company data, by taking into account how the environments of software prediction tasks change.

Affiliations and Expertise

Research Fellow II, Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), University of Birmingham, UK

Fayola Peters

Fayola Peters is a Postdoctoral Researcher at Lero, the Irish Software Engineering Research Centre, University of Limerick, Ireland. Along with Mark Grechanik, she is the author of one of the two known algorithms (presented at ICSE ’12) that can privatize data while still preserving its data mining properties.

Affiliations and Expertise

Postdoctoral Researcher, Lero, the Irish Software Engineering Research Centre, University of Limerick, Ireland