Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis.
Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection through tutorials built on real-world examples from fields as varied as corporate finance, business intelligence, genomics research, and counterterrorism.
The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically.
- Extensive case studies, most in a tutorial format, allow the reader to 'click through' each example in a software program and so learn to conduct text mining analyses as quickly as possible
- Numerous examples, tutorials, PowerPoint slides, and datasets available via the companion website on Elsevierdirect.com
- Glossary of text mining terms provided in the appendix
In one comprehensive resource, this book provides complete coverage of statistical analysis and text mining applications. It is intended for professionals, practitioners, researchers, and upper-level undergraduate and graduate students who need to learn text mining quickly and apply it to information distillation and, in turn, good decision making.
Endorsements for Practical Text Mining & Statistical Analysis for Non-structured Text Data Applications
About the Authors
Building the Workshop Manual
The Structure of this Book
Part I: Basic Text Mining Principles
Part II: Tutorials
Part III: Advanced Topics
Why Did We Write This Book?
What Are the Benefits of Text Mining?
List of Tutorials by Guest Authors
Part I: Basic Text Mining Principles
Chapter 1. The History of Text Mining
The Roots of Text Mining: Information Retrieval, Extraction, and Summarization
Information Extraction and Modern Text Mining
Major Innovations in Text Mining since 2000
The Development of Enabling Technology in Text Mining
Emerging Applications in Text Mining
Sentiment Analysis and Opinion Mining
IBM’s Watson: An “Intelligent” Text Mining Machine?
Chapter 2. The Seven Practice Areas of Text Analytics
What is Text Mining?
The Seven Practice Areas of Text Analytics
Five Questions for Finding the Right Practice Area
The Seven Practice Areas in Depth
Interactions between the Practice Areas
Scope of This Book
Chapter 3. Conceptual Foundations of Text Mining and Preprocessing Steps
Syntax versus Semantics
The Generalized Vector-Space Model
Creating Vectors from Processed Text
Chapter 4. Applications
- © Academic Press 2012
- 11th January 2012
- Academic Press
Dr. Gary Miner received a B.S. from Hamline University, St. Paul, MN, with biology, chemistry, and education majors; an M.S. in zoology and population genetics from the University of Wyoming; and a Ph.D. in biochemical genetics from the University of Kansas as the recipient of a NASA pre-doctoral fellowship. He pursued additional National Institutes of Health postdoctoral studies at the University of Minnesota and the University of Iowa, eventually becoming immersed in the study of affective disorders and Alzheimer's disease. In 1985, he and his wife, Dr. Linda Winters-Miner, founded the Familial Alzheimer's Disease Research Foundation, which became a leading force in organizing both local and international scientific meetings, bringing together all the leaders in the field of genetics of Alzheimer's from several countries, resulting in the first major book on the genetics of Alzheimer’s disease. In the mid-1990s, Dr. Miner turned his data analysis interests to the business world, joining the team at StatSoft and deciding to specialize in data mining. He started developing what eventually became the Handbook of Statistical Analysis and Data Mining Applications (co-authored with Drs. Robert A. Nisbet and John Elder), which received the 2009 American Publishers Award for Professional and Scholarly Excellence (PROSE). Their follow-up collaboration, Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications, also received a PROSE award in February of 2013. Overall, Dr. Miner’s career has focused on medicine and health issues, so serving as the ‘project director’ for this current book on ‘Predictive Analytics of Medicine – Healthcare Issues’ fit his knowledge and skills perfectly. Gary also serves as VP & Scientific Director of Healthcare Predictive Analytics Corp; as Merit Reviewer for PCORI (Patient Centered Outcomes Research Institute) that awards grants for predictive analytics research into the comparative effectiveness and heterogeneous treatment
StatSoft, Inc., Tulsa, OK, USA
Dr. John Elder heads the United States’ leading data mining consulting team, with offices in Charlottesville, Virginia; Washington, D.C.; and Baltimore, Maryland (www.datamininglab.com). Founded in 1995, Elder Research, Inc. focuses on investment, commercial, and security applications of advanced analytics, including text mining, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market sector timing, and fraud detection. John obtained a B.S. and an M.E.E. in electrical engineering from Rice University and a Ph.D. in systems engineering from the University of Virginia, where he’s an adjunct professor teaching Optimization or Data Mining. Prior to 16 years at ERI, he spent five years in aerospace defense consulting, four years heading research at an investment management firm, and two years in Rice's Computational & Applied Mathematics Department.
Elder Research, Inc. and the University of Virginia, Charlottesville, USA
Dr. Andrew Fast leads research in text mining and social network analysis at Elder Research. Dr. Fast graduated magna cum laude from Bethel University and earned an M.S. and a Ph.D. in computer science from the University of Massachusetts Amherst. There, his research focused on causal data mining and mining complex relational data such as social networks. At ERI, Andrew leads the development of new tools and algorithms for data and text mining for applications of capabilities assessment, fraud detection, and national security. Dr. Fast has published on an array of applications, including detecting securities fraud using the social network among brokers and understanding the structure of criminal and violent groups. Other publications cover modeling peer-to-peer music file sharing networks, understanding how collective classification works, and predicting playoff success of NFL head coaches (work featured on ESPN.com).
Thomas Hill received his Vordiplom in psychology from Kiel University in Germany and earned an M.S. in industrial psychology and a Ph.D. in psychology and quantitative methods from the University of Kansas. He was associate professor (and then research professor) at the University of Tulsa from 1984 to 2009, where he taught data analysis and data mining courses. He also has been vice president for Research and Development and then Analytic Solutions at StatSoft Inc., where he has been involved for over 20 years in the development of data analysis, data and text mining algorithms, and the delivery of analytic solutions. Dr. Hill joined Dell through Dell’s acquisition of StatSoft in April 2014, and he is currently the Executive Director for Analytics at Dell’s Information Management Group. Dr. Hill has received numerous academic grants and awards from the National Science Foundation, the National Institutes of Health, the Center for Innovation Management, the Electric Power Research Institute, and other institutions. He has completed diverse consulting projects with companies from practically all industries and has worked with the leading financial services, insurance, manufacturing, pharmaceutical, retailing, and other companies in the United States and internationally on identifying and refining effective data mining and predictive modeling solutions for diverse applications. Dr. Hill has published widely on innovative applications for data mining and predictive analytics. He is the author (with Paul Lewicki, 2005) of Statistics: Methods and Applications and of the Electronic Statistics Textbook (a popular on-line resource on statistics and data mining), a co-author of Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (2012), and a contributing author to the popular Handbook of Statistical Analysis and Data Mining Applications (2009).
StatSoft, Inc., Tulsa, OK, USA
Dr. Robert Nisbet was trained initially in Ecology and Ecosystems Analysis. He has over 30 years’ experience in complex systems analysis and modeling, most recently as a Researcher (University of California, Santa Barbara). In business, he pioneered the design and development of configurable data mining applications for retail sales forecasting, and for Churn, Propensity-to-buy, and Customer Acquisition in the Telecommunications, Insurance, Banking, and Credit industries. In addition to data mining, he has expertise in data warehousing technology for Extract, Transform, and Load (ETL) operations, Business Intelligence reporting, and data quality analyses. He is lead author of the “Handbook of Statistical Analysis & Data Mining Applications” (Academic Press, 2009), and a co-author of “Practical Text Mining” (Academic Press, 2012). Currently, he serves as an Instructor in the University of California, Irvine Predictive Analytics Certification Program, teaching online courses in Effective Data Preparation, and co-teaching Introduction to Predictive Analytics.
Pacific Capital Bank Corporation, Santa Barbara, CA, USA
Dr. Dursun Delen is the William S. Spears Chair in Business Administration and Associate Professor of Management Science and Information Systems in the Spears School of Business at Oklahoma State University (OSU). He received his Ph.D. in industrial engineering and management from OSU in 1997. Prior to his appointment as an assistant professor at OSU in 2001, he worked for a privately owned research and consultancy company, Knowledge Based Systems Inc., in College Station, Texas, as a research scientist for five years, during which he led a number of decision support and other information systems-related research projects funded by federal agencies, including DoD, NASA, NIST and DOE.
PROSE Award 2012, Book: Best Physical Sciences and Mathematics - Computing and Information Sciences, Association of American Publishers
"They’ve done it again. From the same industry leaders who brought you the 'bible' of data mining comes the definitive, go-to text mining resource. This book empowers you to dig in and seize value, with over two dozen hands-on tutorials that drive an incredible range of applications such as predicting marketing success and detecting customer sentiment, criminal lies, writing authorship, and patient schizophrenia. These step-by-step tutorials immediately place you in the practitioner’s driver’s seat, executing on text analytics. Beyond this, 17 more chapters cover the latest methods and the leading tools, making this the most comprehensive resource, and earning it a well-deserved place on your desk aside the authors’ data mining handbook." --Eric Siegel, Ph.D., Founder, Predictive Analytics World, Text Analytics World and Prediction Impact, Inc.
"Of the number of statistics books that are published each year, only a few stand out as really being important, meaning that they positively influence how future research is done in the subject area of the text. I believe that Practical Text Mining is just such a book." --Joseph M. Hilbe, JD, PhD, Arizona State University and Jet Propulsion Laboratory
"When you want real help extracting insight from the mountains of text that you’re facing, this is the book to turn to for immediate practical advice." --Karl Rexer, PhD, President, Rexer Analytics, Boston, MA
"The underlying premise is that almost all data in databases takes the form of unstructured text, or summaries of unstructured text, and that historians, marketers, crime investigators, and others need to know how to search that text for meaningful patterns — a very different process than reading. Contributors in a range of fields share their insights and experience with the process. After setting out the principles, they present tutorials and case studies, then move on to advance