Big Data Analytics

1st Edition, Volume 33 - July 7, 2015
Editors: Venu Govindaraju, Vijay Raghavan, C.R. Rao
Language: English
Hardback ISBN: 978-0-444-63492-4
eBook ISBN: 978-0-444-63497-9

While the term Big Data is open to varying interpretation, it is quite clear that the Volume, Velocity, and Variety (3Vs) of data have impacted every aspect of computational science and its applications. The volume of data is increasing at a phenomenal rate and a majority of it is unstructured. With big data, the volume is so large that processing it using traditional database and software techniques is difficult, if not impossible. The drivers are the ubiquitous sensors, devices, social networks and the all-pervasive web. Scientists are increasingly looking to derive insights from the massive quantity of data to create new knowledge. In common usage, Big Data has come to refer simply to the use of predictive analytics or certain other advanced methods to extract value from data, regardless of the size of the data. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. Alongside these challenges, huge opportunities are emerging in the fields of Machine Learning, Data Mining, Statistics, Human-Computer Interfaces and Distributed Systems to address ways to analyze and reason with this data. The edited volume focuses on the challenges and opportunities posed by "Big Data" in a variety of domains and how statistical techniques and innovative algorithms can help glean insights and accelerate discovery. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.

A: Modeling and Analytics

Chapter 1: Document Informatics for Scientific Learning and Accelerated Discovery

Abstract
1 Introduction
2 How Document Informatics Will Aid Materials Discovery
3 The General Research Framework
4 Pilot Implementation

Chapter 2: An Introduction to Rare Event Simulation and Importance Sampling

Abstract
1 Introduction: Monte Carlo Methods, Rare Event Simulation, and Variance Reduction Techniques
2 MC Methods and the Problem of Rare Events
3 Importance Sampling
4 Multiple IS
5 The Cross-Entropy Method
6 MCMC: Rejection Sampling, the Metropolis Method, and Gibbs Sampling
7 Applications of VRTs to Error Estimation in Optical Fiber Communication Systems
8 Large Deviations Theory, Asymptotic Efficiency, and Final Remarks

Chapter 3: A Large-Scale Study of Language Usage as a Cognitive Biometric Trait

Abstract
1 Introduction
2 Cognitive Fingerprints: Problem Description
3 Data Description
4 Methodology
5 Results
6 Discussions
7 Related Work
8 Conclusions and Future Work
Acknowledgment

Chapter 4: Customer Selection Utilizing Big Data Analytics

Abstract
1 Introduction
2 Methodology
3 Experiments
4 Conclusion

Chapter 5: Continuous Model Selection for Large-Scale Recommender Systems

Abstract
1 Introduction
2 Related Work
3 Preference Prediction
4 Proposed Continuous Modeling
5 Experimental Evaluations
6 Conclusion and Future Work

Chapter 6: Zero-Knowledge Mechanisms for Private Release of Social Graph Summarization

Abstract
1 Introduction
2 Related Work
3 Graph Summarization
4 Background on ε-Zero-Knowledge Privacy
5 ZKP Mechanism for Graph Summarization
6 Evaluation
7 From Privacy Level to Noise Scale
8 Private Probabilistic A-GS
9 Conclusions

Chapter 7: Distributed Confidence-Weighted Classification on Big Data Platforms

Abstract
1 Introduction
2 Classification with Linear SVM Models
3 MapReduce Framework for Distributed Computations
4 CW Classification Using MapReduce
5 Experiments
6 Conclusion
Acknowledgments

B: Applications and Infrastructure

Chapter 8: Big Data Applications in Health Sciences and Epidemiology

Abstract
1 Introduction
2 Mathematical Framework for Epidemiology
3 Dynamics and Analysis Problems
4 Inference Problems
5 Disease Surveillance, Molecular Epidemiology, and Pathogen Phylodynamics
6 High-Performance Synthetic Information Environments and Tools
7 Summary
Acknowledgments

Chapter 9: Big Data Driven Natural Language Processing Research and Applications

Abstract
1 Introduction
2 NLP Core Tasks
3 NLP Applications
4 Data Sources for NLP Research
5 Big Data Driven NLP Research and Applications
6 Trends and Future Research Directions
7 Conclusions

Chapter 10: Analyzing Big Spatial and Big Spatiotemporal Data: A Case Study of Methods and Applications

Abstract
1 Introduction
2 Algorithms
3 Applications
4 Conclusions

Chapter 11: Experimental Computational Simulation Environments for Big Data Analytic in Social Sciences

Abstract
1 Introduction
2 Big Data Analytics
3 Sociofinancial-Economic Simulations
4 Software Infrastructure for Social Sciences
5 Market Simulators for Financial Economics Modeling
6 Statistical Simulations of AT Models
7 DRACUS
8 Summary

Chapter 12: Terabyte-Scale Image Similarity Search

Abstract
1 Introduction
2 Big-Data Processing
3 Application Workload (Distributed Indexing + Searching)
4 Hadoop in Practice
5 Large-Scale Hadoop
6 Conclusion
Acknowledgments

Chapter 13: Measuring Inter-site Engagement in a Network of Sites

Abstract
1 Introduction
2 Related Work
3 Data, Networks, and Metrics
4 Evaluating Inter-site Metrics
5 Studying Inter-site Engagement
6 The Network Effect
7 Hyperlink Performance
8 Conclusions
9 Future Work
Acknowledgments

Chapter 14: Scaling RDF Triple Stores in Size and Performance: Modeling SPARQL Queries as Graph Homomorphism Routines

Abstract
1 Introduction
2 SPARQL Queries as Graph Homomorphism Routines
3 GEMS: Graph Database Engine for Multithreaded Systems
4 Related Work
5 Experimental Results
6 Conclusions

Venu Govindaraju

Dr. Venu Govindaraju, SUNY Distinguished Professor of Computer Science and Engineering, is the Vice President of Research and Economic Development of the University at Buffalo and founding director of the Center for Unified Biometrics and Sensors. He received his Bachelor’s degree with honors from the Indian Institute of Technology (IIT) in 1986, and his Ph.D. from UB in 1992. His research focus is on machine learning and pattern recognition in the domains of Document Image Analysis and Biometrics. Dr. Govindaraju has co-authored about 400 refereed scientific papers. His seminal work in handwriting recognition was at the core of the first handwritten address interpretation system used by the US Postal Service. He was also the prime technical lead responsible for technology transfer to the postal services in the US, Australia, and the UK. He has been a Principal or Co-Investigator of sponsored projects with about 65 million dollars in funding. Dr. Govindaraju has supervised the dissertations of 30 doctoral students. He has served on the editorial boards of premier journals such as the IEEE Transactions on Pattern Analysis and Machine Intelligence and is currently the Editor-in-Chief of the IEEE Biometrics Council Compendium. Dr. Govindaraju is a Fellow of the ACM (Association for Computing Machinery), IEEE (Institute of Electrical and Electronics Engineers), AAAS (American Association for the Advancement of Science), the IAPR (International Association for Pattern Recognition), and the SPIE (International Society for Optics and Photonics). He is a recipient of the 2004 MIT Global Indus Technovator award and the 2010 IEEE Technical Achievement award.

Affiliations and expertise

The State University of New York, Buffalo, NY, USA

Vijay Raghavan

Dr. Vijay Raghavan is the Alfred and Helen Lamson/BoRSF Endowed Professor in Computer Science at the Center for Advanced Computer Studies and the Director of the NSF-sponsored Industry/University Cooperative Research Center for Visual and Decision Informatics. As the director, he coordinates several multi-institutional, industry-driven research projects and manages a budget of over $500K/year. From 1997 to 2003, he led a $2.3M research and development project in close collaboration with the USGS National Wetlands Research Center and with the Department of Energy's Office of Scientific and Technical Information on creating a digital library with data mining capabilities incorporated. His research interests are in data mining, information retrieval, machine learning and Internet computing. He has published over 250 peer-reviewed research papers, appearing in top-level journals and proceedings, that cumulatively accord him an h-index of 31, based on citations. He has served as major advisor for 24 doctoral students. Besides substantial technical expertise, Dr. Raghavan has vast experience managing interdisciplinary and multi-institutional collaborative projects. He has also directed industry-sponsored research on projects pertaining to neuroimaging-based dementia detection and literature-based biomedical hypothesis generation. He received the IEEE International Conference on Data Mining (ICDM) 2005 Outstanding Service Award. Dr. Raghavan serves as a member of the Executive Committee of the IEEE Technical Committee on Intelligent Informatics (IEEE-TCII), the Web Intelligence Consortium (WIC) Technical Committee and the Web Intelligence and Intelligent Agent Technology Conferences’ Steering Committee. He was one of the Conference Co-Chairs of the IEEE 2013 Big Data Conference. For his many years of service to the community, he received the WIC 2013 Outstanding Service Award. He was a member of the Steering Committee of the IEEE BigData 2014 conference held on Oct. 27-30, 2014 in Washington, D.C. He is one of the Editors-in-Chief of the Web Intelligence journal, an Associate Editor of the ACM Transactions on Internet Technology and the International J. of Computer Science & Applications, and a member of the International Rough Set Society Advisory Board. He is an ACM Distinguished Scientist and served as an ACM Distinguished Lecturer from 1993 to 2006. In addition, he served as a member of the Advisory Committee of the NSF Computer and Information Science and Engineering directorate (CISE-AC) from 2008 to 2010.

Affiliations and expertise

University of Louisiana, Lafayette, LA, USA

C.R. Rao

book “Ancient Inhabitants of Jebel Moya” published by the Cambridge Press under the joint authorship of Rao and two anthropologists. On the basis of work done at CU during the two-year period 1946-1948, Rao earned a Ph.D. degree and, a few years later, an Sc.D. degree from CU, as well as the rare honor of a life fellowship of King’s College, Cambridge.

He retired from ISI in 1980 at the mandatory age of 60 after working there for 40 years, during which he developed ISI into an international center for statistical education and research. He also took an active part in establishing state statistical bureaus to collect local statistics and transmit them to the Central Statistical Organization in New Delhi. Rao played a pivotal role in launching undergraduate and postgraduate courses at ISI. He is the author of 475 research publications, including several breakthrough papers contributing to statistical theory and methodology for applications to problems in all areas of human endeavor. A number of classical statistical terms are named after him, the most popular of which are the Cramér-Rao inequality, Rao-Blackwellization, Rao’s orthogonal arrays used in quality control, Rao’s score test, Rao’s quadratic entropy used in ecological work, and Rao’s metric and distance, which are incorporated in most statistical books.

He is the author of 10 books, of which two important ones are Linear Statistical Inference, which has been translated into German, Russian, Czech, Polish and Japanese, and Statistics and Truth, which has been translated into French, German, Japanese, Mainland Chinese, Taiwan Chinese, Turkish and Korean.

He directed the Ph.D. research of 50 students, who in turn produced 500 Ph.D.s. Rao received 38 honorary doctorate degrees from universities in 19 countries spanning 6 continents. He received the highest awards in statistics in the USA, UK and India: the National Medal of Science awarded by the President of the USA, the Indian National Medal of Science awarded by the Prime Minister of India, and the Guy Medal in Gold awarded by the Royal Statistical Society, UK. Rao was a recipient of the first batch of Bhatnagar awards in 1959 for mathematical sciences and of numerous medals in India and abroad from Science Academies. He is a Fellow of the Royal Society (FRS), UK, and a member of the National Academy of Sciences, USA, Lithuania and Europe. In his honor, a research institute named the C.R. Rao Advanced Institute of Mathematics, Statistics and Computer Science was established on the campus of Hyderabad University.

Affiliations and expertise

University of Hyderabad Campus, India