# Biostatistics

## 2nd Edition

### A Guide to Design, Analysis and Discovery

**Authors:** Ronald Forthofer, Eun Lee, Mike Hernandez

**eBook ISBN:** 9780080467726

**Hardcover ISBN:** 9780123694928

**Imprint:** Academic Press

**Published Date:** 14th December 2006

**Page Count:** 528

## Description

*Biostatistics, Second Edition,* is a user-friendly guide to biostatistics that focuses on the proper use and interpretation of statistical methods.

This textbook does not require an extensive background in mathematics, making it accessible to all students in the public health sciences. Instead of emphasizing derivations of formulas, the authors provide rationales for them, helping students better understand the link between biology and statistics.

The material on life tables and survival analysis allows students to better understand the recent literature in the health field, particularly in the study of chronic disease treatment. This updated edition contains over 40% new material with modern real-life examples, exercises, and references, including new chapters on Logistic Regression, Analysis of Survey Data, and Study Designs.

The book is recommended for students in the health sciences, public health professionals, and practitioners.

## Key Features

- Over 40% new material with modern real-life examples, exercises and references
- New chapters on Logistic Regression; Analysis of Survey Data; and Study Designs
- Introduces strategies for analyzing complex sample survey data
- Written in a conversational style, with real data, to be more accessible to students

## Readership

Students in the health sciences, public health professionals, practitioners.

## Table of Contents

INTRODUCTION

1.1 What is Biostatistics? 1.2 Data – The Key Component of a Study 1.3 Design – The Road to Relevant Data 1.4 Replication – Part of the Scientific Method 1.5 Applying Statistical Methods Concluding Remarks Exercises References

DATA AND NUMBERS

2.1 Data: Numerical Representation 2.2 Observations and Variables 2.3 Scales Used with Variables 2.4 Reliability and Validity 2.5 Randomized Response Technique 2.6 Common Data Problems Concluding Remarks Exercises References

DESCRIPTIVE METHODS

3.1 Introduction to Descriptive Methods 3.2 Tabular and Graphic Presentation of Data 3.2.1 Frequency Tables 3.2.2 Line Graphs 3.2.3 Bar Charts 3.2.4 Histograms 3.2.5 Stem-and-Leaf Plots 3.2.6 Dot Plots 3.2.7 Scatter Plots 3.3 Measures of Central Tendency 3.3.1 Mean, Median, and Mode 3.3.2 Use of the Measures of Central Tendency 3.3.3 The Geometric Mean 3.4 Measures of Variability 3.4.1 Ranges and Percentiles 3.4.2 Box Plots 3.4.3 Variance and Standard Deviation 3.5 Rates and Ratios 3.5.1 Crude and Specific Rates 3.5.2 Adjusted Rates 3.6 Measures of Change Over Time 3.6.1 Linear Growth 3.6.2 Geometric Growth 3.6.3 Exponential Growth 3.7 Correlation Coefficients 3.7.1 Pearson Correlation Coefficient 3.7.2 Spearman Rank Correlation Coefficient Concluding Remarks Exercises References

PROBABILITY AND LIFE TABLES

4.1 A Definition of Probability 4.2 Rules for Calculating Probabilities 4.2.1 Addition Rule for Probabilities 4.2.2 Conditional Probabilities 4.2.3 Independent Events 4.3 Definitions from Epidemiology 4.3.1 Rates and Probabilities 4.3.2 Sensitivity, Specificity, and Predicted Value Positive and Negative 4.3.3 Receiver Operating Characteristic Plot 4.4 Bayes’ Theorem 4.5 Probability in Sampling 4.5.1 Sampling With Replacement 4.5.2 Sampling Without Replacement 4.6 Estimating Probabilities by Simulation 4.7 Probability and the Life Table 4.7.1 The First Four Columns of the Life Table 4.7.2 Some Uses of the Life Table 4.7.3 Expected Values in the Life Table 4.7.4 Other Expected Values in the Life Table Concluding Remarks Exercises References

PROBABILITY DISTRIBUTIONS

5.1 The Binomial Distribution
5.1.1 Binomial Probabilities
5.1.2 Mean and Variance of the Binomial Distribution
5.1.3 Shapes of the Binomial Distribution
5.2 The Poisson Distribution
5.2.1 Poisson Probabilities
5.2.2 Mean and Variance of the Poisson Distribution
5.2.3 Finding Poisson Probabilities
5.3 The Normal Distribution
5.3.1 Normal Probabilities
5.3.2 Transforming to the Standard Normal Distribution
5.3.3 Calculation of Normal Probabilities
5.3.4 Normal Probability Plot
5.4 The Central Limit Theorem
5.5 Approximations to the Binomial and Poisson Distributions
5.5.1 Normal Approximation to the Binomial Distribution
5.5.2 Normal Approximation to the Poisson Distribution
Concluding Remarks
Exercises
References

STUDY DESIGNS

6.1 Design: Putting Chance to Work 6.2 Sample Surveys and Experiments 6.3 Sampling and Sample Designs 6.3.1 Sampling Frame 6.3.2 Importance of Probability Sampling 6.3.3 Simple Random Sampling 6.3.4 Systematic Sampling 6.3.5 Stratified Random Sampling 6.3.6 Cluster Sampling 6.3.7 Problems Due to Unintended Sampling 6.4 Designed Experiments 6.4.1 Comparison Groups and Randomization 6.4.2 Random Assignment 6.4.3 Sample Size 6.4.4 Single and Double Blind Experiments 6.4.5 Blocking and Extraneous Variables 6.4.6 Limitations of Experiments 6.5 Variations in Study Designs 6.5.1 The Cross-Over Design 6.5.2 The Case Control Design 6.5.3 The Cohort Study Design Concluding Remarks Exercises References

INTERVAL ESTIMATION

7.1 Prediction, Confidence, and Tolerance Intervals 7.2 Distribution-Free Intervals 7.2.1 Prediction Interval 7.2.2 Confidence Interval 7.2.3 Tolerance Interval 7.3 Confidence Intervals Based on the Normal Distribution 7.3.1 Confidence Interval for the Mean 7.3.2 Confidence Interval for a Proportion 7.3.3 Confidence Interval for Crude and Adjusted Rates 7.4 Confidence Interval for the Difference of Two Means and Proportions 7.4.1 Difference of Two Independent Means 7.4.2 Difference of Two Dependent Means 7.4.3 Difference of Two Independent Proportions 7.4.4 Difference of Two Dependent Proportions 7.5 Confidence Interval and Sample Size 7.6 Confidence Interval for Other Measures 7.6.1 Confidence Interval for the Variance 7.6.2 Confidence Interval for Pearson Correlation Coefficient 7.7 Prediction and Tolerance Intervals Based on the Normal Distribution 7.7.1 Prediction Interval 7.7.2 Tolerance Interval Concluding Remarks Exercises References

TESTS OF HYPOTHESES

8.1 Preliminaries in Tests of Hypotheses 8.1.1 Definitions of Terms Used in Hypothesis Testing 8.1.2 Determination of Decision Rule 8.1.3 Relationship of the Decision Rule, α and β 8.1.4 Conducting the Test 8.2 Testing Hypotheses about the Mean 8.2.1 Known Variance 8.2.2 Unknown Variance 8.3 Testing Hypotheses about the Proportion and Rates 8.4 Testing Hypotheses about the Variance 8.5 Testing Hypotheses about the Pearson Correlation Coefficient 8.6 Testing Hypotheses about the Difference of Two Means 8.6.1 Difference of Two Independent Means 8.6.2 Difference of Two Dependent Means 8.7 Testing Hypotheses about the Difference of Two Proportions 8.7.1 Difference of Two Independent Proportions 8.7.2 Difference of Two Dependent Proportions 8.8 Tests of Hypotheses and Sample Size 8.9 Statistical and Practical Significance Concluding Remarks Exercises References

NONPARAMETRIC TESTS

9.1 Why Nonparametric Tests? 9.2 The Sign Test 9.3 The Wilcoxon Signed Rank Test 9.4 The Wilcoxon Rank Sum Test 9.5 The Kruskal-Wallis Test 9.6 The Friedman Test Concluding Remarks Exercises References

ANALYSIS OF CATEGORICAL DATA

10.1 Goodness-of-Fit Test 10.2 The 2 by 2 Contingency Table 10.2.1 Comparing Two Independent Binomial Proportions 10.2.2 Expected Cell Counts Assuming No Association 10.2.3 The Odds Ratio – a Measure of Association 10.2.4 The Fisher’s Exact Test 10.2.5 Analysis of Paired Data: The McNemar Test 10.3 The r by c Contingency Table 10.3.1 Testing Hypothesis of No Association: The Chi-Square Test 10.3.2 Testing Hypothesis of No Trend 10.4 Multiple 2 by 2 Tables 10.4.1 Analyzing the Tables Separately 10.4.2 The Cochran-Mantel-Haenszel Test 10.4.3 The Mantel-Haenszel Common Odds Ratio Concluding Remarks Exercises References

ANALYSIS OF SURVIVAL DATA

11.1 Data Collection in Follow-Up Studies 11.2 The Life Table Method 11.3 The Product-Limit Method 11.4 Comparison of Two Survival Distributions 11.4.1 The Cochran-Mantel-Haenszel Test 11.4.2 The Log-Rank Test Concluding Remarks Exercises References

ANALYSIS OF VARIANCE

12.1 Assumptions for the Use of the ANOVA 12.2 One-Way ANOVA 12.2.1 Sums of Squares and Mean Squares 12.2.2 The F Statistics 12.2.3 The ANOVA Table 12.3 Multiple Comparisons 12.3.1 Error Rates: Individual and Family 12.3.2 Tukey-Kramer Method 12.3.3 Fisher’s Least Significant Difference Method 12.3.4 Dunnett’s Method 12.4 Two-Way ANOVA for the Randomized Block Design with m Replicates 12.5 Two-Way ANOVA with Interaction 12.6 Linear Model Representation of the ANOVA 12.6.1 The Completely Randomized Design 12.6.2 The Randomized Block Design with m Replicates 12.6.3 Two-Way ANOVA with Interaction 12.7 ANOVA with Unequal Numbers of Observations in Subgroups Concluding Remarks Exercises References

LINEAR REGRESSION

13.1 Simple Linear Regression 13.1.1 Estimation of Coefficients 13.1.2 The Variance of Y|X 13.1.3 The Coefficient of Determination (R²) 13.2 Inference about the Coefficients 13.2.1 Assumptions for Inference in Linear Regression 13.2.2 Regression Diagnostics 13.2.3 The Slope Coefficient 13.2.4 The Y-Intercept Coefficient 13.2.5 The ANOVA Summary Table 13.3 Interval Estimation for μY|X and Y|X 13.3.1 Confidence Interval for μY|X 13.3.2 Prediction Interval for Y|X 13.4 Multiple Linear Regression 13.4.1 The Multiple Linear Regression Model 13.4.2 Specification of a Multiple Linear Regression Model 13.4.3 The Parameter Estimates, ANOVA, and Diagnostics 13.4.4 Multicollinearity Problems 13.4.5 Extending the Regression Model: Dummy Variables Concluding Remarks Exercises References

LOGISTIC AND PROPORTIONAL HAZARD REGRESSION

14.1 Introduction to Logistic Regression 14.2 Simple Logistic Regression 14.3 Multiple Logistic Regression 14.4 Ordered Logistic Regression 14.5 Introduction to Proportional Hazard Regression Concluding Remarks Exercises References

ANALYSIS OF SURVEY DATA

15.1 Introduction to Design-Based Inference 15.2 Complex Design and Unequal Selection Probability 15.2.1 Sample Weight 15.2.2 Poststratification 15.2.3 The Design Effect 15.3 Strategies for Variance Estimation 15.3.1 Replicated Sampling: A General Approach 15.3.2 Balanced Repeated Replication 15.3.3 Jackknife Repeated Replication 15.3.4 Linearization Method 15.4 Strategies for Analysis 15.4.1 Preliminary Analysis 15.4.2 Subpopulation Analysis 15.4.3 Descriptive Analysis 15.4.4 Contingency Table Analysis 15.4.5 Linear and Logistic Regression Analysis Concluding Remarks Exercises References

Appendices

A. BASIC MATHEMATICAL CONCEPTS

B. STATISTICAL TABLES

B1. Random Digits B2. Binomial Probabilities B3. Poisson Probabilities B4. Critical Values for the t Distribution B6. Charts for Confidence Intervals for the Proportion B7. Critical Values for the Chi-Square Distribution B8. Factors, k, for Two-Sided Tolerance Limits for Normal Distribution B9. Critical Values for the Wilcoxon Signed Rank Test B10. Critical Values for the Wilcoxon Rank Sum Test B11. Critical Values for the F Distribution B12. The Studentized Range for the Kramer-Tukey Procedure B13. The Studentized Range for the Dunnett Procedure

C. SELECTED GOVERNMENTAL BIOSTATISTICAL DATA

C1. Population Census Data C2. Vital Statistics C3. Sample Surveys C4. Life Tables

D. SOLUTIONS TO SELECTED EXERCISES

## Details

- No. of pages: 528
- Language: English
- Copyright: © Academic Press 2007
- Published: 14th December 2006
- Imprint: Academic Press
- eBook ISBN: 9780080467726
- Hardcover ISBN: 9780123694928

## About the Authors

### Ronald Forthofer

### Affiliations and Expertise

Boulder County, Colorado, U.S.A.

### Eun Lee

### Affiliations and Expertise

Oregon Health & Science University, Portland, U.S.A.

### Mike Hernandez

Mike Hernandez has worked as a statistical analyst in the Department of Biostatistics at the MD Anderson Cancer Center for over 10 years. Working in a large medical center, he has developed expertise in collaborative research spanning several disciplines, from health disparities to clinical trials. He has coauthored over 40 peer-reviewed manuscripts and is a coauthor of *Biostatistics: A Guide to Design, Analysis, and Discovery*, 2nd ed.

### Affiliations and Expertise

MD Anderson Cancer Center, Houston, TX, USA

## Reviews

"This book succeeds in all aspects: It is clearly written, it has excellent coverage of the fundamentals of statistical methodology...and it is informative through its novel statistical presentation...." - *Journal of the American Statistical Association*