A Guide to Design, Analysis and Discovery.


  • Ronald Forthofer, Boulder County, Colorado, U.S.A.
  • Eun Lee, Oregon Health & Science University, Portland, U.S.A.
  • Mike Hernandez, Anderson Cancer Center, Houston, Texas, U.S.A.

Today, mathematics, biology, medicine, and statistics are closing the interdisciplinary gap in an unprecedented way, and many of the important unanswered questions now emerge at the interface of these disciplines. Now in its Second Edition, this guide to biostatistics focuses on the proper use and interpretation of statistical methods. The textbook does not require an extensive background in mathematics, making it accessible to all students in the public health sciences. Instead of highlighting derivations of formulas, the authors provide rationales for them, giving students a better understanding of the link between biology and statistics. The material on life tables and survival analysis helps students understand the recent literature in the health field, particularly in the study of chronic disease treatment. Biostatistics now includes a companion website that demonstrates how computer packages can be used to perform the various analyses presented in the text.


Students in the health sciences, public health professionals, practitioners.


Book information

  • Published: December 2006
  • ISBN: 978-0-12-369492-8


"This book succeeds in all aspects: It is clearly written, it has excellent coverage of the fundamentals of statistical methodology...and it is informative through its novel statistical presentation...." - Journal of the American Statistical Association

Table of Contents

1. INTRODUCTION
  1.1 What is Biostatistics?
  1.2 Data – The Key Component of a Study
  1.3 Design – The Road to Relevant Data
  1.4 Replication – Part of the Scientific Method
  1.5 Applying Statistical Methods
  Concluding Remarks
  Exercises
  References

2. DATA AND NUMBERS
  2.1 Data: Numerical Representation
  2.2 Observations and Variables
  2.3 Scales Used with Variables
  2.4 Reliability and Validity
  2.5 Randomized Response Technique
  2.6 Common Data Problems
  Concluding Remarks
  Exercises
  References

3. DESCRIPTIVE METHODS
  3.1 Introduction to Descriptive Methods
  3.2 Tabular and Graphic Presentation of Data
    3.2.1 Frequency Tables
    3.2.2 Line Graphs
    3.2.3 Bar Charts
    3.2.4 Histograms
    3.2.5 Stem-and-Leaf Plots
    3.2.6 Dot Plots
    3.2.7 Scatter Plots
  3.3 Measures of Central Tendency
    3.3.1 Mean, Median, and Mode
    3.3.2 Use of the Measures of Central Tendency
    3.3.3 The Geometric Mean
  3.4 Measures of Variability
    3.4.1 Ranges and Percentiles
    3.4.2 Box Plots
    3.4.3 Variance and Standard Deviation
  3.5 Rates and Ratios
    3.5.1 Crude and Specific Rates
    3.5.2 Adjusted Rates
  3.6 Measures of Change Over Time
    3.6.1 Linear Growth
    3.6.2 Geometric Growth
    3.6.3 Exponential Growth
  3.7 Correlation Coefficients
    3.7.1 Pearson Correlation Coefficient
    3.7.2 Spearman Rank Correlation Coefficient
  Concluding Remarks
  Exercises
  References

4. PROBABILITY AND LIFE TABLES
  4.1 A Definition of Probability
  4.2 Rules for Calculating Probabilities
    4.2.1 Addition Rule for Probabilities
    4.2.2 Conditional Probabilities
    4.2.3 Independent Events
  4.3 Definitions from Epidemiology
    4.3.1 Rates and Probabilities
    4.3.2 Sensitivity, Specificity, and Predicted Value Positive and Negative
    4.3.3 Receiver Operating Characteristic Plot
  4.4 Bayes’ Theorem
  4.5 Probability in Sampling
    4.5.1 Sampling With Replacement
    4.5.2 Sampling Without Replacement
  4.6 Estimating Probabilities by Simulation
  4.7 Probability and the Life Table
    4.7.1 The First Four Columns of the Life Table
    4.7.2 Some Uses of the Life Table
    4.7.3 Expected Values in the Life Table
    4.7.4 Other Expected Values in the Life Table
  Concluding Remarks
  Exercises
  References

5. PROBABILITY DISTRIBUTIONS
  5.1 The Binomial Distribution
    5.1.1 Binomial Probabilities
    5.1.2 Mean and Variance of the Binomial Distribution
    5.1.3 Shapes of the Binomial Distribution
  5.2 The Poisson Distribution
    5.2.1 Poisson Probabilities
    5.2.2 Mean and Variance of the Poisson Distribution
    5.2.3 Finding Poisson Probabilities
  5.3 The Normal Distribution
    5.3.1 Normal Probabilities
    5.3.2 Transforming to the Standard Normal Distribution
    5.3.3 Calculation of Normal Probabilities
    5.3.4 Normal Probability Plot
  5.4 The Central Limit Theorem
  5.5 Approximations to the Binomial and Poisson Distributions
    5.5.1 Normal Approximation to the Binomial Distribution
    5.5.2 Normal Approximation to the Poisson Distribution
  Concluding Remarks
  Exercises
  References

6. STUDY DESIGNS
  6.1 Design: Putting Chance to Work
  6.2 Sample Surveys and Experiments
  6.3 Sampling and Sample Designs
    6.3.1 Sampling Frame
    6.3.2 Importance of Probability Sampling
    6.3.3 Simple Random Sampling
    6.3.4 Systematic Sampling
    6.3.5 Stratified Random Sampling
    6.3.6 Cluster Sampling
    6.3.7 Problems Due to Unintended Sampling
  6.4 Designed Experiments
    6.4.1 Comparison Groups and Randomization
    6.4.2 Random Assignment
    6.4.3 Sample Size
    6.4.4 Single and Double Blind Experiments
    6.4.5 Blocking and Extraneous Variables
    6.4.6 Limitations of Experiments
  6.5 Variations in Study Designs
    6.5.1 The Cross-Over Design
    6.5.2 The Case Control Design
    6.5.3 The Cohort Study Design
  Concluding Remarks
  Exercises
  References

7. INTERVAL ESTIMATION
  7.1 Prediction, Confidence, and Tolerance Intervals
  7.2 Distribution-Free Intervals
    7.2.1 Prediction Interval
    7.2.2 Confidence Interval
    7.2.3 Tolerance Interval
  7.3 Confidence Intervals Based on the Normal Distribution
    7.3.1 Confidence Interval for the Mean
    7.3.2 Confidence Interval for a Proportion
    7.3.3 Confidence Interval for Crude and Adjusted Rates
  7.4 Confidence Interval for the Difference of Two Means and Proportions
    Difference of Two Independent Means
    7.4.1 Difference of Two Dependent Means
    7.4.2 Difference of Two Independent Proportions
    7.4.3 Difference of Two Dependent Proportions
  7.5 Confidence Interval and Sample Size
  7.6 Confidence Interval for Other Measures
    7.6.1 Confidence Interval for the Variance
    7.6.2 Confidence Interval for Pearson Correlation Coefficient
  7.7 Prediction and Tolerance Intervals Based on the Normal Distribution
    7.7.1 Prediction Interval
    7.7.2 Tolerance Interval
  Concluding Remarks
  Exercises
  References

8. TESTS OF HYPOTHESES
  8.1 Preliminaries in Tests of Hypotheses
    8.1.1 Definitions of Terms Used in Hypothesis Testing
    8.1.2 Determination of Decision Rule
    8.1.3 Relationship of the Decision Rule, α and β
    8.1.4 Conducting the Test
  8.2 Testing Hypotheses about the Mean
    8.2.1 Known Variance
    8.2.2 Unknown Variance
  8.3 Testing Hypotheses about the Proportion and Rates
  8.4 Testing Hypotheses about the Variance
  8.5 Testing Hypotheses about the Pearson Correlation Coefficient
  8.6 Testing Hypotheses about the Difference of Two Means
    8.6.1 Difference of Two Independent Means
    8.6.2 Difference of Two Dependent Means
  8.7 Testing Hypotheses about the Difference of Two Proportions
    8.7.1 Difference of Two Independent Proportions
    8.7.2 Difference of Two Dependent Proportions
  8.8 Tests of Hypotheses and Sample Size
  8.9 Statistical and Practical Significance
  Concluding Remarks
  Exercises
  References

9. NONPARAMETRIC TESTS
  9.1 Why Nonparametric Tests?
  9.2 The Sign Test
  9.3 The Wilcoxon Signed Rank Test
  9.4 The Wilcoxon Rank Sum Test
  9.5 The Kruskal-Wallis Test
  9.6 The Friedman Test
  Concluding Remarks
  Exercises
  References

10. ANALYSIS OF CATEGORICAL DATA
  10.1 Goodness-of-Fit Test
  10.2 The 2 by 2 Contingency Table
    10.2.1 Comparing Two Independent Binomial Proportions
    10.2.2 Expected Cell Counts Assuming No Association
    10.2.3 The Odds Ratio – a Measure of Association
    10.2.4 The Fisher’s Exact Test
    10.2.5 Analysis of Paired Data: The McNemar Test
  10.3 The r by c Contingency Table
    10.3.1 Testing Hypothesis of Non Association: The Chi-Square Test
    10.3.2 Testing Hypothesis of No Trend
  10.4 Multiple 2 by 2 Tables
    10.4.1 Analyzing the Tables Separately
    10.4.2 The Cochran-Mantel-Haenszel Test
    10.4.3 The Mantel-Haenszel Common Odds Ratio
  Concluding Remarks
  Exercises
  References

11. ANALYSIS OF SURVIVAL DATA
  11.1 Data Collection in Follow-Up Studies
  11.2 The Life Table Method
  11.3 The Product-Limit Method
  11.4 Comparison of Two Survival Distributions
    11.4.1 The Cochran-Mantel-Haenszel Test
    11.4.2 The Log-Rank Test
  Concluding Remarks
  Exercises
  References

12. ANALYSIS OF VARIANCE
  12.1 Assumptions for the Use of the ANOVA
  12.2 One-Way ANOVA
    12.2.1 Sums of Squares and Mean Squares
    12.2.2 The F Statistics
    12.2.3 The ANOVA Table
  12.3 Multiple Comparisons
    12.3.1 Error Rates: Individual and Family
    12.3.2 Tukey-Kramer Method
    12.3.3 Fisher’s Least Significant Difference Method
    12.3.4 Dunnett’s Method
  12.4 Two-Way ANOVA for the Randomized Block Design with m Replicates
  12.5 Two-Way ANOVA with Interaction
  12.6 Linear Model Representation of the ANOVA
    12.6.1 The Completely Randomized Design
    12.6.2 The Randomized Block Design with m Replicates
    12.6.3 Two-Way ANOVA with Interaction
  12.7 ANOVA with Unequal Numbers of Observations in Subgroups
  Concluding Remarks
  Exercises
  References

13. LINEAR REGRESSION
  13.1 Simple Linear Regression
    13.1.1 Estimation of Coefficients
    13.1.2 The Variance of Y|X
    13.1.3 The Coefficient of Determination (R²)
  13.2 Inference about the Coefficients
    13.2.1 Assumptions for Inference in Linear Regression
    13.2.2 Regression Diagnostics
    13.2.3 The Slope Coefficient
    13.2.4 The Y-Intercept Coefficient
    13.2.5 The ANOVA Summary Table
  13.3 Interval Estimation for and
    13.3.1 Confidence Interval for
    13.3.2 Prediction Interval for
  13.4 Multiple Linear Regression
    13.4.1 The Multiple Linear Regression Model
    13.4.2 Specification of a Multiple Linear Regression Model
    13.4.3 The Parameter Estimates, ANOVA, and Diagnostics
    13.4.4 Multicollinearity Problems
    13.4.5 Extending the Regression Model: Dummy Variables
  Concluding Remarks
  Exercises
  References

14. LOGISTIC AND PROPORTIONAL HAZARD REGRESSION
  14.1 Introduction to Logistic Regression
  14.2 Simple Logistic Regression
  14.3 Multiple Logistic Regression
  14.4 Ordered Logistic Regression
  14.5 Introduction to Proportional Hazard Regression
  Concluding Remarks
  Exercises
  References

15. ANALYSIS OF SURVEY DATA
  15.1 Introduction to Design-Based Inference
  15.2 Complex Design and Unequal Selection Probability
    15.2.1 Sample Weight
    15.2.2 Poststratification
    15.2.3 The Design Effect
  15.3 Strategies for Variance Estimation
    15.3.1 Replicated Sampling: A General Approach
    15.3.2 Balanced Repeated Replication
    15.3.3 Jackknife Repeated Replication
    15.3.4 Linearization Method
  15.4 Strategies for Analysis
    15.4.1 Preliminary Analysis
    15.4.2 Subpopulation Analysis
    15.4.3 Descriptive Analysis
    15.4.4 Contingency Table Analysis
    15.4.5 Linear and Logistic Regression Analysis
  Concluding Remarks
  Exercises
  References

Appendices
  A. BASIC MATHEMATICAL CONCEPTS
  B. STATISTICAL TABLES
    B1. Random Digits
    B2. Binomial Probabilities
    B3. Poisson Probabilities
    B4. Critical Values for the t Distribution
    B6. Charts for Confidence Intervals for the Proportion
    B7. Critical Values for the Chi-Square Distribution
    B8. Factors, k, for Two-Sided Tolerance Limits for Normal Distribution
    B9. Critical Values for the Wilcoxon Signed Rank Test
    B10. Critical Values for the Wilcoxon Rank Sum Test
    B11. Critical Values for the F Distribution
    B12. The Studentized Range for the Kramer-Tukey Procedure
    B13. The Studentized Range for the Dunnett Procedure
  C. SELECTED GOVERNMENTAL BIOSTATISTICAL DATA
    C1. Population Census Data
    C2. Vital Statistics
    C3. Sample Surveys
    C4. Life Tables
  D. SOLUTIONS TO SELECTED EXERCISES