Guerrilla Analytics - 1st Edition - ISBN: 9780128002186, 9780128005033

Guerrilla Analytics

1st Edition

A Practical Approach to Working with Data

Authors: Enda Ridge
eBook ISBN: 9780128005033
Paperback ISBN: 9780128002186
Imprint: Morgan Kaufmann
Published Date: 15th September 2014
Page Count: 276

Description

Doing data science is difficult. Projects are typically very dynamic, with requirements that change as understanding of the data grows. The data itself arrives piecemeal, is added to and replaced, contains undiscovered flaws, and comes from a variety of sources. Teams also have mixed skill sets, and tooling is often limited. Despite these disruptions, a data science team must get off the ground fast and begin demonstrating value with traceable, tested work products. This is when you need Guerrilla Analytics.

In this book, you will learn about:

  • The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting.
  • Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny.
  • Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research.
  • Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions.
  • Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects.

Readership

Data analytics consultants and contractors; industry data analysts in internal business intelligence roles; data analytics managers; and (business) students and academics studying data analytics

Table of Contents

  • Preface
  • Part 1: Principles
    • Chapter 1: Introducing Guerrilla Analytics
      • Summary
      • 1.1. What is data analytics?
      • 1.2. Types of data analytics projects
      • 1.3. Introducing Guerrilla Analytics projects
      • 1.4. Guerrilla Analytics definition
      • 1.5. Example Guerrilla Analytics projects
      • 1.6. Some terminology
      • 1.7. Wrap up
    • Chapter 2: Guerrilla Analytics: Challenges and Risks
      • Summary
      • 2.1. The Guerrilla Analytics workflow
      • 2.2. Challenges of managing analytics projects
      • 2.3. Risks
      • 2.4. Impact of failure to address analytics risks
      • 2.5. Wrap up
    • Chapter 3: Guerrilla Analytics Principles
      • Summary
      • 3.1. Maintain data provenance despite disruptions
      • 3.2. The principles
      • 3.3. Applying the principles
      • 3.4. Wrap up
  • Part 2: Practice
    • Chapter 4: Stage 1: Data Extraction
      • Summary
      • 4.1. Guerrilla Analytics workflow
      • 4.2. Pitfalls and risks
      • 4.3. Practice tip 1: freeze the source system during data extraction
      • 4.4. Practice tip 2: extract data into an agreed file format
      • 4.5. Practice tip 3: calculate checksums before data extraction
      • 4.6. Practice tip 4: capture front-end reports
      • 4.7. Practice tip 5: save raw copies of web pages
      • 4.8. Practice tip 6: consistency check OCR data
      • 4.9. Wrap up
    • Chapter 5: Stage 2: Data Receipt
      • Summary
      • 5.1. Guerrilla Analytics workflow
      • 5.2. Pitfalls and risks
      • 5.3. Practice tip 7: have a single location for all data received
      • 5.4. Practice tip 8: create unique identifiers for received data
      • 5.5. Practice tip 9: store data tracking information in a data log
      • 5.6. Practice tip 10: never modify raw data files
      • 5.7. Practice tip 11: keep supporting material near the data
      • 5.8. Practice tip 12: version-control data received
      • 5.9. Bringing it all together
      • 5.10. Wrap up
    • Chapter 6: Stage 3: Data Load
      • Summary
      • 6.1. Guerrilla Analytics Workflow
      • 6.2. Pitfalls and risks
      • 6.3. Practice tip 13: minimize modifications to data before load
      • 6.4. Practice tip 14: do data load preparations on a copy of raw data files
      • 6.5. Practice tip 15: add identifiers to raw data before loading
      • 6.6. Practice tip 16: prefer one-to-one Data Loads
      • 6.7. Practice tip 17: preserve the raw file name and data UID
      • 6.8. Practice tip 18: load data as plain text
      • 6.9. Common challenges
      • 6.10. Wrap up
    • Chapter 7: Stage 4: Analytics Coding for Ease of Review
      • Summary
      • 7.1. Guerrilla Analytics workflow
      • 7.2. Pitfalls and risks
      • 7.3. Practice tip 19: use one code file per data output
      • 7.4. Practice tip 20: produce clearly identifiable data outputs
      • 7.5. Practice tip 21: write code that runs from start to finish
      • 7.6. Practice tip 22: favor code that is not embedded in proprietary file formats
      • 7.7. Practice tip 23: clearly label the running order of code files
      • 7.8. Practice tip 24: drop all datasets at the start of code execution
      • 7.9. Practice tip 25: break up data flows into “data steps”
      • 7.10. Practice tip 26: don’t jump in and out of a code file
      • 7.11. Practice tip 27: log code execution
      • 7.12. Common Challenges
      • 7.13. Wrap up
    • Chapter 8: Stage 4: Analytics Coding to Maintain Data Provenance
      • Summary
      • 8.1. Guerrilla Analytics workflow
      • 8.2. Examples
      • 8.3. Pitfalls and risks
      • 8.4. Practice tip 28: clean data at a minimum of locations in a data flow
      • 8.5. Practice tip 29: when cleaning a data field, keep the original raw field
      • 8.6. Practice tip 30: filter data with flags, not deletions
      • 8.7. Practice tip 31: identify fields with metadata
      • 8.8. Practice tip 32: create a unique identifier for data records
      • 8.9. Practice tip 33: rename data fields with a field mapping
      • 8.10. Wrap up
    • Chapter 9: Stage 6: Creating Work Products
      • Summary
      • 9.1. Guerrilla Analytics workflow
      • 9.2. Examples
      • 9.3. The essence of a work product
      • 9.4. Pitfalls and risks
      • 9.5. Practice tip 34: track work products with a Unique Identifier (UID)
      • 9.6. Practice tip 35: keep work product generators and outputs close together
      • 9.7. Practice tip 36: avoid clutter in the file system
      • 9.8. Practice tip 37: avoid clutter in the DME
      • 9.9. Practice tip 38: give output data records a UID
      • 9.10. Practice tip 39: version control work products
      • 9.11. Practice tip 40: use a convention to name complex outputs
      • 9.12. Practice tip 41: log all Work Products
      • 9.13. Wrap up
    • Chapter 10: Stage 7: Reporting
      • Summary
      • 10.1. Guerrilla Analytics workflow
      • 10.2. What is a report?
      • 10.3. Why reports are complicated
      • 10.4. Report components
      • 10.5. Pitfalls and risks
      • 10.6. Practice tip 42: liaise with report writers
      • 10.7. Practice tip 43: create one work product per report component
      • 10.8. Practice tip 44: make presentation quality work products
      • 10.9. Extreme reporting
      • 10.10. Wrap up
    • Chapter 11: Stage 5: Consolidating Knowledge in Builds
      • Summary
      • 11.1. Introduction
      • 11.2. Pitfalls and risks
      • 11.3. Example: the customer address problem
      • 11.4. Sources of variation
      • 11.5. Definition of a build
      • 11.6. The customer address example using a Build
      • 11.7. Data Builds
      • 11.8. Service Builds
      • 11.9. When to start a build
      • 11.10. Wrap up
  • Part 3: Testing
    • Chapter 12: Introduction to Testing
      • Summary
      • 12.1. Guerrilla Analytics workflow
      • 12.2. What is testing?
      • 12.3. Why do testing?
      • 12.4. Areas of testing
      • 12.5. Comparing expected and actual
      • 12.6. The challenge of testing Guerrilla Analytics
      • 12.7. Practice tip 61: establish a testing culture
      • 12.8. Practice tip 62: test early
      • 12.9. Practice tip 63: test often
      • 12.10. Practice tip 64: give tests unique identifiers
      • 12.11. Practice tip 65: organize test data by test UID
      • 12.12. Next chapters on testing
      • 12.13. Wrap up
    • Chapter 13: Testing Data
      • Summary
      • 13.1. Guerrilla Analytics workflow
      • 13.2. The five C’s of testing data
      • 13.3. Testing data completeness
      • 13.4. Testing data correctness
      • 13.5. Testing consistency
      • 13.6. Testing data coherence
      • 13.7. Testing accountability
      • 13.8. Implementing data testing
      • 13.9. Wrap up
    • Chapter 14: Testing Builds
      • Summary
      • 14.1. Structure of a data build
      • 14.2. An illustrative example
      • 14.3. Types of build tests
      • 14.4. Test code development
      • 14.5. Organizing build test code
      • 14.6. Organizing test data
      • 14.7. Wrap up
    • Chapter 15: Testing Work Products
      • Summary
      • 15.1. Types of testable work products
      • 15.2. Ordinary work products
      • 15.3. General tips on testing ordinary work products
      • 15.4. Testing statistical models
      • 15.5. General tips on testing models
      • 15.6. Wrap up
  • Part 4: Building Guerrilla Analytics Capability
    • Introduction
    • Chapter 16: People
      • Summary
      • 16.1. That question again – what is data analytics?
      • 16.2. Guerrilla Analytics skills
      • 16.3. Programming
      • 16.4. Substantive expertise
      • 16.5. Communication
      • 16.6. “Maths and stats”
      • 16.7. Visualization
      • 16.8. Software engineering
      • 16.9. Mindset
      • 16.10. Wrap up
    • Chapter 17: Process
      • Summary
      • 17.1. What is workflow management?
      • 17.2. Workflows in Analytics
      • 17.3. Levels of review
      • 17.4. Linking work products
      • 17.5. Classifying work products
      • 17.6. Granularity
      • 17.7. When to use workflow management
      • 17.8. Wrap up
    • Chapter 18: Technology
      • Summary
      • 18.1. Analytics capabilities
      • 18.2. Data manipulation environment
      • 18.3. Source code control
      • 18.4. Access to the command line
      • 18.5. High-level scripting language
      • 18.6. Visualization
      • 18.7. Build tool
      • 18.8. Access to the internet
      • 18.9. Encryption
      • 18.10. Code libraries for data wrangling
      • 18.11. Machine learning and statistics libraries
      • 18.12. Centralized and controlled file system
      • 18.13. Additional technology capabilities
      • 18.14. Wrap up
    • Chapter 19: Closing Remarks
      • 19.1. What was this book about?
      • 19.2. Next steps for Guerrilla Analytics
      • 19.3. Keep in touch
      • Acknowledgments
    • Appendix: Data Gymnastics
    • References
    • Index

    Details

    No. of pages: 276
    Language: English
    Copyright: © Morgan Kaufmann 2015
    Published: 15th September 2014
    Imprint: Morgan Kaufmann
    eBook ISBN: 9780128005033
    Paperback ISBN: 9780128002186

    About the Author

    Enda Ridge

    Enda Ridge is an accomplished data scientist whose experience spans consulting, pre-sales of analytics software and research in academia.

    He has consulted for clients in the public and private sectors, including financial services, insurance, audit and IT security. Enda is an expert in agile analytics for real-world projects where data and requirements change often, resources and tooling are sometimes very limited, and results must be traceable and auditable for high-profile stakeholders. His experience includes analytics to support the forensic investigation of a major US bankruptcy and the remediation of a UK bank’s mis-selling of financial products. He has also applied machine learning and NoSQL approaches to problems in document classification, surveillance and IT access controls. His PhD used Design of Experiments techniques to methodically evaluate algorithm performance.

    Enda has authored or co-authored 12 academic research papers, is an invited contributor to edited books and has spoken at several analytics practitioner conferences.

    Enda holds a Bachelor’s degree in Mechanical Engineering and Master’s in Applied Computing from the National University of Ireland at Galway and was awarded the National University of Ireland’s Travelling Studentship in Engineering. His PhD was awarded by the University of York, UK.

    Affiliations and Expertise

    Data Scientist, London, United Kingdom

    Reviews

    "... a very pleasant read…very useful to practitioners and managers who are newly responsible for data analytics or who have had difficulty in previous projects." --Computing Reviews
