Data Deduplication Approaches

Concepts, Strategies, and Challenges

1st Edition - November 25, 2020

  • Editors: Tin Thein Thwel, G. R. Sinha
  • Paperback ISBN: 9780128233955
  • eBook ISBN: 9780128236338

In the age of data science, the rapidly growing volume of data is a major concern for many computing and data storage applications. Duplicate, or redundant, data is one of the main challenges in data science research. Data Deduplication Approaches: Concepts, Strategies, and Challenges shows readers the various methods that can be used to eliminate multiple copies of the same file, as well as duplicated segments or chunks of data within those files. Because the amount of duplicated data keeps increasing, deduplication has become an especially useful field of research for storage environments, in particular persistent data storage. Data Deduplication Approaches gives readers an overview of the concepts and background of data deduplication, then describes in technical detail the strategies and challenges of real-time implementations for big data, data science, data backup, and recovery. The book also includes future research directions, case studies, and real-world applications of data deduplication, focusing on reduced storage, backup, recovery, and reliability.
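As a rough illustration of chunk-level deduplication (a generic sketch, not a method taken from the book), data can be split into chunks, each chunk fingerprinted with a cryptographic hash, each unique chunk stored once, and each file kept as an ordered "recipe" of fingerprints. This sketch assumes fixed-size chunks and SHA-256 fingerprints; the `ChunkStore` class and its methods are hypothetical names chosen for illustration.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # fingerprint -> chunk bytes

    def put(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks and return its recipe (ordered fingerprints)."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if this fingerprint has not been seen before.
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Total bytes actually kept after deduplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    r1 = store.put(b"AAAABBBBAAAA")
    r2 = store.put(b"BBBBAAAACCCC")
    # 6 chunk references, but only 3 distinct chunks (12 bytes) are stored.
    print(len(r1) + len(r2), "chunks referenced,", len(store.chunks), "stored")
    assert store.get(r1) == b"AAAABBBBAAAA"
```

Fixed-size chunking is the simplest scheme, but inserting a single byte near the start of a file shifts every later chunk boundary and defeats matching. Content-defined chunking (for example, with Rabin fingerprints) is a common alternative that avoids this boundary-shift problem.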

Key Features

  • Includes data deduplication methods for a wide variety of applications
  • Includes concepts and implementation strategies that help readers apply the suggested methods
  • Provides a robust set of methods to help readers judiciously choose the ones best suited to their applications
  • Focuses on reduced storage, backup, recovery, and reliability, which are the most important aspects of implementing data deduplication approaches
  • Includes case studies

Readership

Biomedical engineers and researchers in biomedical engineering, applied informatics, and data science

Students and researchers in artificial intelligence, data analytics, and data science

Table of Contents

  1. Introduction to data deduplication approaches
  2. Data deduplication concepts
  3. Concepts, strategies, and challenges of data deduplication
  4. Existing mechanisms for data deduplication
  5. Classification criteria for data deduplication methods
  6. File chunking approaches
  7. Study of data deduplication for file chunking approaches
  8. Essentials of data deduplication using open-source toolkit
  9. Efficient data deduplication scheme for scale-out distributed storage
  10. Identification of duplicate bug reports in software bug repositories: a systematic review, challenges and future scope
  11. A survey and critical analysis on energy generation from datacenter
  12. Review of MODIS EVI and NDVI data for data mining applications
  13. Performance modeling for secure migration processes of legacy systems to the cloud computing
  14. DedupCloud: an optimized efficient virtual machine deduplication algorithm in cloud computing environment
  15. Data deduplication for cloud storage
  16. Data duplication using Amazon Web Services cloud storage
  17. Game-theoretic analysis of encrypted cloud data deduplication
  18. Data deduplication applications in cognitive science and computer vision research

Product details

  • No. of pages: 404
  • Language: English
  • Copyright: © Academic Press 2020
  • Published: November 25, 2020
  • Imprint: Academic Press
  • Paperback ISBN: 9780128233955
  • eBook ISBN: 9780128236338

About the Editors

Tin Thein Thwel

Tin Thein Thwel, PhD, is a Professor at Myanmar Institute of Information Technology (MIIT), Mandalay, Myanmar. She received her PhD in Information Technology from the University of Computer Studies, Yangon (UCSY), Myanmar. She is a reviewer and technical committee member of the International Conference on Computer and Applications (ICCA) in the areas of data deduplication, cyber security, data mining, and information retrieval. She has 16 years of university teaching experience, and her research interests include data deduplication, cyber security, data mining and data science, information retrieval, and distributed computing.

Affiliations and Expertise

Professor, Myanmar Institute of Information Technology (MIIT), Mandalay, Myanmar

G. R. Sinha

Dr. G. R. Sinha is a Professor at Myanmar Institute of Information Technology (MIIT), Mandalay, Myanmar. He has published 255 research papers, book chapters, and books, including Analysis of Medical Modalities for Improved Diagnosis in Modern Healthcare, Biomedical Signal Processing for Healthcare Applications, Brain and Behavior Computing, and Data Science and Its Applications from Chapman and Hall/CRC Press; Advances in Biometrics from Springer; and Cognitive Informatics, Volumes 1 and 2, AI-Based Brain Computer Interfaces, and Data Deduplication Approaches from Elsevier Academic Press. He was Dean of Faculty and an Executive Council Member of CSVTU, and has served as a Distinguished Speaker in the field of digital image processing for the Computer Society of India. His research interests include applications of machine learning and artificial intelligence in medical image analysis, biomedical signal analysis, computer-aided diagnosis, computer vision, and cognitive science.

Affiliations and Expertise

Adjunct Professor, International Institute of Information Technology Bengaluru (IIITB), Bangalore, Karnataka, India.