Big Data
1st Edition
Principles and Paradigms
Big Data: Principles and Paradigms captures state-of-the-art research on the architectural aspects, technologies, and applications of Big Data. The book identifies potential future directions and technologies that facilitate insight into numerous scientific, business, and consumer applications.
To help realize Big Data’s full potential, the book addresses numerous challenges and offers conceptual and technological solutions for tackling them. These challenges include life-cycle data management, large-scale storage, flexible processing infrastructure, data modeling, scalable machine learning, data analysis algorithms, sampling techniques, and privacy and ethical issues.
- Covers computational platforms supporting Big Data applications
- Addresses key principles underlying Big Data computing
- Examines key developments supporting next generation Big Data platforms
- Explores the challenges in Big Data computing and ways to overcome them
- Features contributions from experts in both academia and industry
Data scientists, data architects, DevOps engineers, cloud developers, and more, as well as graduate data science students and other academic researchers
Table of Contents
- List of contributors
- About the Editors
- Preface
- Organization of the Book
- Part I: Big Data Science
- Part II: Big Data Infrastructures and Platforms
- Part III: Big Data Security and Privacy
- Part IV: Big Data Applications
- Acknowledgments
- Part I: Big Data Science
- Chapter 1: BDA = ML + CC
- Abstract
- 1.1 Introduction
- 1.2 A Historical Review of Big Data
- 1.3 Historical Interpretation of Big Data
- 1.4 Defining Big Data From 3Vs to 32Vs
- 1.5 Big Data Analytics and Machine Learning
- 1.6 Big Data Analytics and Cloud Computing
- 1.7 Hadoop, HDFS, MapReduce, Spark, and Flink
- 1.8 ML + CC → BDA and Guidelines
- 1.9 Conclusion
- Chapter 2: Real-Time Analytics
- Abstract
- 2.1 Introduction
- 2.2 Computing Abstractions for Real-Time Analytics
- 2.3 Characteristics of Real-Time Systems
- 2.4 Real-Time Processing for Big Data — Concepts and Platforms
- 2.5 Data Stream Processing Platforms
- 2.6 Data Stream Analytics Platforms
- 2.7 Data Analysis and Analytic Techniques
- 2.8 Finance Domain Requirements and a Case Study
- 2.9 Future Research Challenges
- Chapter 3: Big Data Analytics for Social Media
- Abstract
- Acknowledgments
- 3.1 Introduction
- 3.2 NLP and Its Applications
- 3.3 Text Mining
- 3.4 Anomaly Detection
- Chapter 4: Deep Learning and Its Parallelization
- Abstract
- 4.1 Introduction
- 4.2 Concepts and Categories of Deep Learning
- 4.3 Parallel Optimization for Deep Learning
- 4.4 Discussions
- Chapter 5: Characterization and Traversal of Large Real-World Networks
- Abstract
- Acknowledgments
- 5.1 Introduction
- 5.2 Background
- 5.3 Characterization and Measurement
- 5.4 Efficient Complex Network Traversal
- 5.5 k-Core-Based Partitioning for Heterogeneous Graph Processing
- 5.6 Future Directions
- 5.7 Conclusions
- Part II: Big Data Infrastructures and Platforms
- Chapter 6: Database Techniques for Big Data
- Abstract
- 6.1 Introduction
- 6.2 Background
- 6.3 NoSQL Movement
- 6.4 NoSQL Solutions for Big Data Management
- 6.5 NoSQL Data Models
- 6.6 Future Directions
- 6.7 Conclusions
- Chapter 7: Resource Management in Big Data Processing Systems
- Abstract
- 7.1 Introduction
- 7.2 Types of Resource Management
- 7.3 Big Data Processing Systems and Platforms
- 7.4 Single-Resource Management in the Cloud
- 7.5 Multiresource Management in the Cloud
- 7.6 Related Work on Resource Management
- 7.7 Open Problems
- 7.8 Summary
- Chapter 8: Local Resource Consumption Shaping: A Case for MapReduce
- Abstract
- 8.1 Introduction
- 8.2 Motivation
- 8.3 Local Resource Shaper
- 8.4 Evaluation
- 8.5 Related Work
- 8.6 Conclusions
- Appendix CPU Utilization With Different Slot Configurations and LRS
- Chapter 9: System Optimization for Big Data Processing
- Abstract
- 9.1 Introduction
- 9.2 Basic Framework of the Hadoop Ecosystem
- 9.3 Parallel Computation Framework: MapReduce
- 9.4 Job Scheduling of Hadoop
- 9.5 Performance Optimization of HDFS
- 9.6 Performance Optimization of HBase
- 9.7 Performance Enhancement of Hadoop System
- 9.8 Conclusions and Future Directions
- Chapter 10: Packing Algorithms for Big Data Replay on Multicore
- Abstract
- 10.1 Introduction
- 10.2 Performance Bottlenecks
- 10.3 The Big Data Replay Method
- 10.4 Packing Algorithms
- 10.5 Performance Analysis
- 10.6 Summary and Future Directions
- Part III: Big Data Security and Privacy
- Chapter 11: Spatial Privacy Challenges in Social Networks
- Abstract
- Acknowledgments
- 11.1 Introduction
- 11.2 Background
- 11.3 Spatial Aspects of Social Networks
- 11.4 Cloud-Based Big Data Infrastructure
- 11.5 Spatial Privacy Case Studies
- 11.6 Conclusions
- Chapter 12: Security and Privacy in Big Data
- Abstract
- 12.1 Introduction
- 12.2 Secure Queries Over Encrypted Big Data
- 12.3 Other Big Data Security
- 12.4 Privacy on Correlated Big Data
- 12.5 Future Directions
- 12.6 Conclusions
- Chapter 13: Location Inferring in Internet of Things and Big Data
- Abstract
- Acknowledgements
- 13.1 Introduction
- 13.2 Device-based Sensing Using Big Data
- 13.3 Device-free Sensing Using Big Data
- 13.4 Conclusion
- Part IV: Big Data Applications
- Chapter 14: A Framework for Mining Thai Public Opinions
- Abstract
- Acknowledgments
- 14.1 Introduction
- 14.2 XDOM
- 14.3 Implementation
- 14.4 Validation
- 14.5 Case Studies
- 14.6 Summary and Conclusions
- Chapter 15: A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather
- Abstract
- Acknowledgments
- 15.1 Background
- 15.2 Big Data System Components
- 15.3 Machine-Learning Methodology
- 15.4 System Implementation
- 15.5 Key Findings
- 15.6 Summary and Conclusions
- Chapter 16: Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks
- Abstract
- 16.1 Introduction
- 16.2 Background
- 16.3 Related Work
- 16.4 VoD Architecture
- 16.5 Overview
- 16.6 Data Generation
- 16.7 Edge and Core Components
- 16.8 INCA Caching Algorithm
- 16.9 QoE Estimation
- 16.10 Theoretical Framework
- 16.11 Experiments and Results
- 16.12 Synthetic Dataset
- 16.13 Conclusions and Future Directions
- Chapter 17: Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection
- Abstract
- Acknowledgments
- 17.1 Introduction
- 17.2 Smart Grid With PMUs and PDCs
- 17.3 Improving Traditional Workflow
- 17.4 Characterizing Normal Operation
- 17.5 Identifying Unusual Phenomena
- 17.6 Identifying Known Events
- 17.7 Related Efforts
- 17.8 Conclusion and Future Directions
- Chapter 18: eScience and Big Data Workflows in Clouds: A Taxonomy and Survey
- Abstract
- 18.1 Introduction
- 18.2 Background
- 18.3 Taxonomy and Review of eScience Services in the Cloud
- 18.4 Resource Provisioning for eScience Workflows in Clouds
- 18.5 Open Problems
- 18.6 Summary
- Index
- No. of pages: 494
- Language: English
- Copyright: © Morgan Kaufmann 2016
- Published: 3rd June 2016
- Imprint: Morgan Kaufmann
- eBook ISBN: 9780128093467
- Paperback ISBN: 9780128053942
Rajkumar Buyya
Dr. Rajkumar Buyya is a Fellow of IEEE, Professor of Computer Science and Software Engineering, Future Fellow of the Australian Research Council, and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He also serves as the founding CEO of Manjrasoft, a spin-off company of the University that commercializes its innovations in Cloud Computing. He has authored over 500 publications and four textbooks, including "Mastering Cloud Computing", published by McGraw Hill, China Machine Press, and Elsevier/Morgan Kaufmann for the Indian, Chinese, and international markets respectively. He is one of the most highly cited authors in computer science and software engineering worldwide. "A Scientometric Analysis of Cloud Computing Literature" by German scientists ranked Dr. Buyya as the World's Top-Cited (#1) Author and the World's Most-Productive (#1) Author in Cloud Computing. Software technologies for Grid and Cloud computing developed under Dr. Buyya's leadership have gained rapid acceptance and are in use at several academic institutions and commercial enterprises in 40 countries around the world. Dr. Buyya has led the establishment and development of key community activities, including serving as foundation Chair of the IEEE Technical Committee on Scalable Computing and of five IEEE/ACM conferences. His contributions and international research leadership were recognized with the "2009 IEEE TCSC Medal for Excellence in Scalable Computing" from the IEEE Computer Society TCSC. Manjrasoft's Aneka Cloud technology, developed under his leadership, received the "2010 Frost & Sullivan New Product Innovation Award", and Manjrasoft was recently recognised as one of the Top 20 Cloud Computing companies by the Silicon Review Magazine. He served as the foundation Editor-in-Chief of IEEE Transactions on Cloud Computing.
He is currently serving as Co-Editor-in-Chief of Journal of Software: Practice and Experience, which was established 40+ years ago. For further information on Dr. Buyya, please visit his cyberhome: www.buyya.com
Professor of Computer Science, University of Melbourne, Australia, and founding CEO, Manjrasoft Pty Ltd.
Rodrigo Calheiros
Dr. Rodrigo Calheiros is a Research Fellow in the Department of Computing and Information Systems, The University of Melbourne, Australia. He has contributed to the fields of big data and cloud computing since 2009. He designed and developed CloudSim, an open-source tool for the simulation of cloud platforms that is used by research centers, universities, and companies worldwide.
Department of Computing and Information Systems, The University of Melbourne, Australia
Amir Vahid Dastjerdi
Dr. Amir Vahid Dastjerdi is a research fellow with the Cloud Computing and Distributed Systems (CLOUDS) laboratory at the University of Melbourne. He received his PhD in computer science from the University of Melbourne, and his areas of interest include the Internet of Things, big data, and cloud computing. He is a technology enthusiast with over a decade of experience in distributed systems.
Cloud Computing and Distributed Systems (CLOUDS) laboratory, University of Melbourne, Australia