Big Data Analytics - 1st Edition - ISBN: 9780444634924, 9780444634979

Big Data Analytics, Volume 33

1st Edition

Series Volume Editors: Venu Govindaraju, Vijay Raghavan, C.R. Rao
eBook ISBN: 9780444634979
Hardcover ISBN: 9780444634924
Imprint: North Holland
Published Date: 7th July 2015
Page Count: 390

Table of Contents

  • Preface
  • A: Modeling and Analytics
    • Chapter 1: Document Informatics for Scientific Learning and Accelerated Discovery
      • Abstract
      • 1 Introduction
      • 2 How Document Informatics Will Aid Materials Discovery
      • 3 The General Research Framework
      • 4 Pilot Implementation
    • Chapter 2: An Introduction to Rare Event Simulation and Importance Sampling
      • Abstract
      • 1 Introduction: Monte Carlo Methods, Rare Event Simulation, and Variance Reduction Techniques
      • 2 MC Methods and the Problem of Rare Events
      • 3 Importance Sampling
      • 4 Multiple IS
      • 5 The Cross-Entropy Method
      • 6 MCMC: Rejection Sampling, the Metropolis Method, and Gibbs Sampling
      • 7 Applications of VRTs to Error Estimation in Optical Fiber Communication Systems
      • 8 Large Deviations Theory, Asymptotic Efficiency, and Final Remarks
    • Chapter 3: A Large-Scale Study of Language Usage as a Cognitive Biometric Trait
      • Abstract
      • 1 Introduction
      • 2 Cognitive Fingerprints: Problem Description
      • 3 Data Description
      • 4 Methodology
      • 5 Results
      • 6 Discussions
      • 7 Related Work
      • 8 Conclusions and Future Work
      • Acknowledgment
    • Chapter 4: Customer Selection Utilizing Big Data Analytics
      • Abstract
      • 1 Introduction
      • 2 Methodology
      • 3 Experiments
      • 4 Conclusion
    • Chapter 5: Continuous Model Selection for Large-Scale Recommender Systems
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Preference Prediction
      • 4 Proposed Continuous Modeling
      • 5 Experimental Evaluations
      • 6 Conclusion and Future Work
    • Chapter 6: Zero-Knowledge Mechanisms for Private Release of Social Graph Summarization
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Graph Summarization
      • 4 Background on ε-Zero-Knowledge Privacy
      • 5 ZKP Mechanism for Graph Summarization
      • 6 Evaluation
      • 7 From Privacy Level to Noise Scale
      • 8 Private Probabilistic A-GS
      • 9 Conclusions
    • Chapter 7: Distributed Confidence-Weighted Classification on Big Data Platforms
      • Abstract
      • 1 Introduction
      • 2 Classification with Linear SVM Models
      • 3 MapReduce Framework for Distributed Computations
      • 4 CW Classification Using MapReduce
      • 5 Experiments
      • 6 Conclusion
      • Acknowledgments
  • B: Applications and Infrastructure
    • Chapter 8: Big Data Applications in Health Sciences and Epidemiology
      • Abstract
      • 1 Introduction
      • 2 Mathematical Framework for Epidemiology
      • 3 Dynamics and Analysis Problems
      • 4 Inference Problems
      • 5 Disease Surveillance, Molecular Epidemiology, and Pathogen Phylodynamics
      • 6 High-Performance Synthetic Information Environments and Tools
      • 7 Summary
      • Acknowledgments
    • Chapter 9: Big Data Driven Natural Language Processing Research and Applications
      • Abstract
      • 1 Introduction
      • 2 NLP Core Tasks
      • 3 NLP Applications
      • 4 Data Sources for NLP Research
      • 5 Big Data Driven NLP Research and Applications
      • 6 Trends and Future Research Directions
      • 7 Conclusions
    • Chapter 10: Analyzing Big Spatial and Big Spatiotemporal Data: A Case Study of Methods and Applications
      • Abstract
      • 1 Introduction
      • 2 Algorithms
      • 3 Applications
      • 4 Conclusions
    • Chapter 11: Experimental Computational Simulation Environments for Big Data Analytic in Social Sciences
      • Abstract
      • 1 Introduction
      • 2 Big Data Analytics
      • 3 Sociofinancial-Economic Simulations
      • 4 Software Infrastructure for Social Sciences
      • 5 Market Simulators for Financial Economics Modeling
      • 6 Statistical Simulations of AT Models
      • 7 DRACUS
      • 8 Summary
    • Chapter 12: Terabyte-Scale Image Similarity Search
      • Abstract
      • 1 Introduction
      • 2 Big-Data Processing
      • 3 Application Workload (Distributed Indexing + Searching)
      • 4 Hadoop in Practice
      • 5 Large-Scale Hadoop
      • 6 Conclusion
      • Acknowledgments
    • Chapter 13: Measuring Inter-site Engagement in a Network of Sites
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Data, Networks, and Metrics
      • 4 Evaluating Inter-site Metrics
      • 5 Studying Inter-site Engagement
      • 6 The Network Effect
      • 7 Hyperlink Performance
      • 8 Conclusions
      • 9 Future Work
      • Acknowledgments
    • Chapter 14: Scaling RDF Triple Stores in Size and Performance: Modeling SPARQL Queries as Graph Homomorphism Routines
      • Abstract
      • 1 Introduction
      • 2 SPARQL Queries as Graph Homomorphism Routines
      • 3 GEMS: Graph Database Engine for Multithreaded Systems
      • 4 Related Work
      • 5 Experimental Results
      • 6 Conclusions
  • Index


While the term Big Data is open to varying interpretation, it is quite clear that the Volume, Velocity, and Variety (3Vs) of data have impacted every aspect of computational science and its applications. The volume of data is increasing at a phenomenal rate, and a majority of it is unstructured. With big data, the volume is so large that processing it using traditional database and software techniques is difficult, if not impossible. The drivers are ubiquitous sensors, devices, social networks, and the all-pervasive web. Scientists are increasingly looking to derive insights from this massive quantity of data to create new knowledge. In common usage, Big Data has come to refer simply to the use of predictive analytics or certain other advanced methods to extract value from data, regardless of the size of the data set. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. Alongside these challenges, huge opportunities are emerging in the fields of Machine Learning, Data Mining, Statistics, Human-Computer Interfaces, and Distributed Systems to address ways to analyze and reason with this data. This edited volume focuses on the challenges and opportunities posed by "Big Data" in a variety of domains and on how statistical techniques and innovative algorithms can help glean insights and accelerate discovery. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.

Key Features

  • Review of big data research challenges from diverse areas of scientific endeavor
  • Rich perspective on a range of data science issues from leading researchers
  • Insight into the mathematical and statistical theory underlying the computational methods used to address big data analytics problems in a variety of domains


Readership

Computer scientists, statisticians, data scientists, and Artificial Intelligence researchers


© North Holland 2015


"...a handbook meant for researchers and practitioners that are familiar with the basic concepts and techniques of data mining and statistics...a consistent and smooth flowing writing style." --KDnuggets

"A book that balances the numeric, text, and categorical data mining with a true big data perspective." --KDnuggets

"Professors Venu Govindaraju, Vijay Raghavan & C. R. Rao, have done a brilliant job by tackling these complex issues in their edited book: Big Data Analytics (Handbook of Statistics, Volume 33), under Elsevier." --The Indian Economist

About the Series Volume Editors

Venu Govindaraju Series Volume Editor

Dr. Venu Govindaraju, SUNY Distinguished Professor of Computer Science and Engineering, is the Vice President for Research and Economic Development of the University at Buffalo and founding director of the Center for Unified Biometrics and Sensors. He received his Bachelor’s degree with honors from the Indian Institute of Technology (IIT) in 1986, and his Ph.D. from UB in 1992. His research focuses on machine learning and pattern recognition in the domains of Document Image Analysis and Biometrics. Dr. Govindaraju has co-authored about 400 refereed scientific papers. His seminal work in handwriting recognition was at the core of the first handwritten address interpretation system used by the US Postal Service. He was also the prime technical lead responsible for technology transfer to the postal services in the US, Australia, and the UK. He has been a Principal or Co-Investigator of sponsored projects with about 65 million dollars in funding. Dr. Govindaraju has supervised the dissertations of 30 doctoral students. He has served on the editorial boards of premier journals such as the IEEE Transactions on Pattern Analysis and Machine Intelligence and is currently the Editor-in-Chief of the IEEE Biometrics Council Compendium. Dr. Govindaraju is a Fellow of the ACM (Association for Computing Machinery), the IEEE (Institute of Electrical and Electronics Engineers), the AAAS (American Association for the Advancement of Science), the IAPR (International Association for Pattern Recognition), and the SPIE (International Society for Optics and Photonics). He is a recipient of the 2004 MIT Global Indus Technovator award and the 2010 IEEE Technical Achievement award.

Affiliations and Expertise

The State University of New York, Buffalo, NY, USA

Vijay Raghavan Series Volume Editor

Dr. Vijay Raghavan is the Alfred and Helen Lamson/BoRSF Endowed Professor in Computer Science at the Center for Advanced Computer Studies and the Director of the NSF-sponsored Industry/University Cooperative Research Center for Visual and Decision Informatics. As the director, he coordinates several multi-institutional, industry-driven research projects and manages a budget of over $500K/year. From 1997 to 2003, he led a $2.3M research and development project in close collaboration with the USGS National Wetlands Research Center and with the Department of Energy's Office of Scientific and Technical Information on creating a digital library with data mining capabilities incorporated. His research interests are in data mining, information retrieval, machine learning, and Internet computing. He has published over 250 peer-reviewed research papers, appearing in top-level journals and proceedings, that cumulatively accord him an h-index of 31, based on citations. He has served as major advisor for 24 doctoral students. Besides substantial technical expertise, Dr. Raghavan has vast experience managing interdisciplinary and multi-institutional collaborative projects. He has also directed industry-sponsored research on projects pertaining to neuroimaging-based dementia detection and literature-based biomedical hypothesis generation. He received the IEEE International Conference on Data Mining (ICDM) 2005 Outstanding Service Award. Dr. Raghavan serves as a member of the Executive Committee of the IEEE Technical Committee on Intelligent Informatics (IEEE-TCII), the Web Intelligence Consortium (WIC) Technical Committee, and the Web Intelligence and Intelligent Agent Technology Conferences’ Steering Committee. He was one of the Conference Co-Chairs of the IEEE 2013 Big Data Conference. For many years of service to the community, he received the WIC 2013 Outstanding Service Award. He was a member of the Steering Committee of the IEEE BigData 2014 conference, held on Oct. 27–30, 2014 in Washington, D.C. He is one of the Editors-in-Chief of the Web Intelligence journal, an Associate Editor of the ACM Transactions on Internet Technology and the International Journal of Computer Science & Applications, and a member of the International Rough Set Society Advisory Board. He is an ACM Distinguished Scientist and served as an ACM Distinguished Lecturer from 1993 to 2006. In addition, he served as a member of the Advisory Committee of the NSF Computer and Information Science and Engineering directorate (CISE-AC) from 2008 to 2010.

Affiliations and Expertise

University of Louisiana, Lafayette, LA, USA

C.R. Rao Series Volume Editor

C. R. Rao, born in India, is one of this century's foremost statisticians, and received his MA degree in statistics from Calcutta University. He is Emeritus Holder of the Eberly Family Chair in Statistics at Penn State and Director of the Center for Multivariate Analysis. He has long been recognized as one of the world's top statisticians, and has been awarded 37 honorary doctorates from universities in 19 countries spanning 6 continents. His research has influenced not only statistics, but also the physical, social, and natural sciences and engineering. In 2011 he received the Royal Statistical Society's Guy Medal in Gold, which is awarded triennially to those "who are judged to have merited a signal mark of distinction by reason of their innovative contributions to the theory or application of statistics". It can be awarded both to fellows (members) of the Society and to non-fellows. Since its inception 120 years ago, the Gold Medal has been awarded to 34 distinguished statisticians. The first medal was awarded to Charles Booth in 1892. Only two statisticians from outside Great Britain, H. Cramér (Swedish) and J. Neyman (Polish), had previously been awarded the Gold Medal, and C. R. Rao is the first non-European and non-American to receive the award. Other awards he has received are the Gold Medal of Calcutta University, the Wilks Medal of the American Statistical Association, the Wilks Army Medal, the Guy Medal in Silver of the Royal Statistical Society (UK), the Meghnad Saha Medal and Srinivasa Ramanujan Medal of the Indian National Science Academy, the J.C. Bose Gold Medal of the Bose Institute, the Mahalanobis Centenary Gold Medal of the Indian Science Congress, the Bhatnagar award of the Council of Scientific and Industrial Research, India, the National Medal of Science, awarded by the President of the USA, and the National Medal of Science of the Government of India. The Government of India also honored him with the second highest civilian award, Padma Vibhushan, for “outstanding contributions to Science and Engineering / Statistics”, and also instituted a cash award in honor of C. R. Rao, “to be given once in two years to a young statistician for work done during the preceding 3 years in any field of statistics”. For his outstanding achievements, Rao has been honored with the establishment of an institute named after him, the C.R.Rao Advanced Institute for Mathematics, Statistics and Computer Science, on the campus of the University of Hyderabad, India. C. R. Rao is a Fellow of the Royal Society, UK, and a member of the National Academy of Sciences, USA.

Affiliations and Expertise

State University of New York, Buffalo, NY, USA and C.R.Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India