This book brings all of the elements of data mining together in a single volume, saving the reader the time and expense of making multiple purchases. It consolidates both introductory and advanced topics, thereby covering the gamut of data mining and machine learning tactics, from data integration and pre-processing, to fundamental algorithms, to optimization techniques and web mining methodology.
The proposed book expertly combines the finest data mining material from the Morgan Kaufmann portfolio. Individual chapters are derived from a select group of MK books authored by the best and brightest in the field. These chapters are combined into one comprehensive volume in a way that allows it to be used as a reference work for those interested in new and developing aspects of data mining.
This book represents a quick and efficient way to unite valuable content from leading data mining experts, thereby creating a definitive, one-stop-shopping opportunity for customers to receive the information they would otherwise need to round up from separate sources.
- Chapters contributed by various recognized experts in the field let the reader remain up to date and fully informed from multiple viewpoints.
- Presents multiple methods of analysis and algorithmic problem-solving techniques, enhancing the reader’s technical expertise and ability to implement practical solutions.
- Coverage of both theory and practice brings all of the elements of data mining together in a single volume, saving the reader the time and expense of making multiple purchases.
Data analysts, data modelers, database R&D professionals, data warehouse engineers, data mining professionals, and undergraduate and graduate students who want to incorporate data mining into their data management knowledge base and expertise.
Chapter 1: Data Mining Overview
Chapter 2: Data Acquisition and Integration
Chapter 3: Data Pre-processing
Chapter 4: Physical Design for Decision Support, Warehousing, and OLAP
Chapter 5: Algorithms - The Basic Methods
Chapter 6: Further Techniques in Decision Analysis
Chapter 7: Fundamental Concepts of Genetic Algorithms
Chapter 8: Spatio-Temporal Data Structures and Algorithms for Moving Objects Types
Chapter 9: Improving the Mined Model
Chapter 10: Web Mining - Social Network Analysis
- © Morgan Kaufmann 2009
- 12th November 2008
- Morgan Kaufmann
Soumen Chakrabarti is Assistant Professor in Computer Science and Engineering at the Indian Institute of Technology, Bombay. Prior to joining IIT, he worked on hypertext databases and data mining at IBM Almaden Research Center. He has developed three systems and holds five patents in this area. Chakrabarti has served as a vice-chair and program committee member for many conferences, including WWW, SIGIR, ICDE, and KDD, and as a guest editor of the IEEE TKDE special issue on mining and searching the Web. His work on focused crawling received the Best Paper award at the 8th International World Wide Web Conference (1999). He holds a Ph.D. from the University of California, Berkeley.
Asst. Prof. of Computer Science, Indian Institute of Technology, Bombay
Earl founded, and serves as President of, Scianta Intelligence, a next-generation machine intelligence and knowledge exploration company. He is a futurist, author, management consultant, and educator involved in discovering the epistemology of advanced intelligent systems, the redefinition of the machine mind, and, as a pioneer of Internet-based technologies, the way in which evolving interconnected virtual worlds will affect the sociology of business and culture in the near and far future.
Earl has over thirty years of experience in managing and participating in the software development process at the system level as well as the tightly integrated application level. In the area of advanced machine intelligence technologies, Earl is a recognized expert in fuzzy logic and adaptive fuzzy systems as they are applied to information and decision theory. He has pioneered the integration of fuzzy neural systems with genetic algorithms and case-based reasoning. As an industry observer and futurist, Earl has written and talked extensively on the philosophy of the Response to Change, the nature of Emergent Intelligence, and the Meaning of Information Entropy in Mind and Machine.
Scianta Intelligence, LLC, Chapel Hill, NC
Eibe Frank lives in New Zealand with his Samoan spouse and two lovely boys, but originally hails from Germany, where he received his first degree in computer science from the University of Karlsruhe. He moved to New Zealand to pursue his Ph.D. in machine learning under the supervision of Ian H. Witten, and joined the Department of Computer Science at the University of Waikato as a lecturer on completion of his studies. He is now an associate professor at the same institution. As an early adopter of the Java programming language, he laid the groundwork for the Weka software described in this book. He has contributed a number of publications on machine learning and data mining to the literature and has refereed for many conferences and journals in these areas.
Associate Professor, Department of Computer Science, University of Waikato, Hamilton, New Zealand
Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.
University of Illinois, Urbana Champaign
University of Pittsburgh, PA, USA
Micheline Kamber is a researcher with a passion for writing in easy-to-understand terms. She has a master's degree in computer science (specializing in artificial intelligence) from Concordia University, Canada.
Simon Fraser University, Burnaby, Canada
Sam Lightstone is a Senior Technical Staff Member and Development Manager with IBM’s DB2 product development team. His work includes numerous topics in autonomic computing and relational database management systems. He is cofounder and leader of DB2’s autonomic computing R&D effort. He is Chair of the IEEE Data Engineering Workgroup on Self Managing Database Systems and a member of the IEEE Computer Society Task Force on Autonomous and Autonomic Computing. In 2003 he was elected to the Canadian Technical Excellence Council, the Canadian affiliate of the IBM Academy of Technology. He is an IBM Master Inventor with over 25 patents and patents pending; he has published widely on autonomic computing for relational database systems. He has been with IBM since 1991.
IBM, Toronto, Canada
Richard E. Neapolitan is Professor and Chair of Computer Science at Northeastern Illinois University. He has previously written four books, including the seminal 1990 Bayesian network text Probabilistic Reasoning in Expert Systems. More recently, he wrote the 2004 text Learning Bayesian Networks; the textbook Foundations of Algorithms, which has been translated into three languages and is one of the most widely used algorithms texts worldwide; and the 2007 text Probabilistic Methods for Financial and Marketing Informatics (Morgan Kaufmann Publishers).
Northeastern Illinois University, Chicago, USA
Dorian Pyle is Chief Scientist and Founder of PTI (www.pti.com), which develops and markets Powerhouse™ predictive and explanatory analytics software. Dorian has over 20 years of experience in the artificial intelligence and machine learning techniques used in what is known today as “data mining” or “predictive analytics.” He has applied this knowledge as a consultant with Knowledge Stream Partners, Xchange, Naviant, Thinking Machines, and Data Miners, with various companies directly involved in credit card marketing for banks, and with manufacturing companies using industrial automation. In 1976 he was involved in building artificially intelligent machine learning systems utilizing the pioneering technologies that are currently known as neural computing and associative memories. He is current in and familiar with the most advanced technologies in data mining, including entropic analysis (information theory), chaotic and fractal decomposition, neural technologies, evolution and genetic optimization, algebra evolvers, case-based reasoning, concept induction, and other advanced statistical techniques.
Mamdouh Refaat is a data mining and business analytics consultant advising major organizations in North America and Europe. He has held several positions in consulting organizations and software vendors, including the director of consulting services at ANGOSS Software Corporation, a global data mining software and service provider. During his career, Mamdouh has managed numerous data mining consulting projects in marketing, CRM, and credit risk for Fortune 500 organizations in North America and Europe. In addition, he has delivered over 50 professional training courses in data mining and business analytics. Mamdouh holds a Ph.D. in Engineering from the University of Toronto, and an MBA from the University of Leeds.
Markus Schneider is an Assistant Professor in the Computer Science Department of the University of Florida and holds a doctoral degree in Computer Science from the University of Hagen, Germany. He is author of a monograph in the area of spatial databases and of a German textbook on implementation concepts for database systems, and has published about 40 articles on database systems. He is on the editorial board of GeoInformatica.
University of Florida at Gainesville
Toby J. Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. He received his B.S. and M.S. degrees in electrical engineering from the University of Arizona, Tucson, and a Ph.D. in computer sciences from the University of Wisconsin, Madison. He was general chair of the 1981 ACM SIGMOD Conference and program chair for the 1991 Entity-Relationship Conference. Professor Teorey’s current research focuses on database design and data warehousing, OLAP, advanced database systems, and performance of computer networks. He is a member of the ACM and the IEEE Computer Society.
University of Michigan, Ann Arbor, USA
Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.
Professor, Computer Science Department, University of Waikato, New Zealand.