Potential projects/topics: Missing data are often unavoidable in data science and statistics. Participant non-response in surveys, attrition in longitudinal studies, linking multiple data sources, and errors in the data collection process all contribute to missingness. However, the simplest and most common approach to missingness, complete case analysis (using only the observations/rows in which every variable/column is observed), can often produce misleading or erroneous conclusions. Using new tools for data imputation (i.e., data-driven methods to fill in the missing values), this project will study the implications of correct vs. incorrect handling of missing values across several widely used datasets, and potentially uncover hidden mistakes in existing analyses.
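To make the contrast concrete, here is a minimal sketch in Python of how complete case analysis can mislead while even a simple imputation does better. Everything in it is an assumption made for illustration: the data are synthetic and deterministic, the missingness rule (y is unobserved whenever x exceeds 60) is hypothetical, and single regression imputation stands in for the more sophisticated tools the project would actually use.

```python
# Synthetic, deterministic data (an assumption for illustration only).
n = 100
x = list(range(n))
noise = [(2 * i) % 5 - 2 for i in range(n)]  # small perturbation, zero mean overall
y_full = [2 * xi + e for xi, e in zip(x, noise)]

# Hypothetical missingness rule: y is unobserved whenever x > 60,
# so missingness depends only on the always-observed variable x.
y_obs = [yi if xi <= 60 else None for xi, yi in zip(x, y_full)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Complete case analysis: keep only the rows where y is observed.
complete = [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]
cc_mean = mean([yi for _, yi in complete])

# Single regression imputation: fit least squares of y on x using the
# complete cases, then fill each missing y with its fitted value.
x_bar = mean([xi for xi, _ in complete])
s_xy = sum((xi - x_bar) * (yi - cc_mean) for xi, yi in complete)
s_xx = sum((xi - x_bar) ** 2 for xi, _ in complete)
slope = s_xy / s_xx
intercept = cc_mean - slope * x_bar
y_imputed = [
    yi if yi is not None else intercept + slope * xi
    for xi, yi in zip(x, y_obs)
]

true_mean = mean(y_full)
imputed_mean = mean(y_imputed)

print(f"true mean of y:        {true_mean:.2f}")
print(f"complete case mean:    {cc_mean:.2f}")
print(f"imputed-data mean:     {imputed_mean:.2f}")
```

In this toy setting the complete case mean falls far below the true mean because the rows with large x (and hence large y) are dropped, whereas the imputed-data mean lands close to it. Note that single imputation of this kind understates uncertainty; it is shown here only to illustrate the bias, not as a recommended analysis.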
Potential skills gained: Data analysis; statistical computing; data visualization; statistical modeling; Bayesian analysis; missing data
Required qualifications: Preferably STAT 410 and familiarity with R
Direct mentor: Faculty/P.I., Graduate Student