Spring 2024 Projects
Alex Bank: Deep Learning on Sports Data
Student: Weixuan Liu
Slides | Writeup
Student: Minh Tran
Slides | Writeup
Prerequisites: Probability theory (Stat 394/395), Linear Algebra (Math 208 or Math 340), advanced Python skills (PyTorch experience a big plus).
Deep learning techniques, and artificial intelligence in particular, are having a cultural moment. In this DRP, we will dive into the math behind deep learning models and leverage these modeling techniques to analyze sports data. The chosen topic will be driven by the student’s interest. Some possible topics include (but are certainly not limited to) capturing and analyzing biometric data, predicting whether or not a team will cover its betting line, or visualizing optimal field positioning.
We will break up the quarter into roughly the following schedule: 2-ish weeks identifying a problem and collecting data, 2-3 weeks reading materials on deep learning techniques and implementing toy models, 2-3 weeks creating a model, and 1-2 weeks producing the write-up and visualizations. Preference will be given to students who participate in athletics and want to analyze data from their sport.
Antonio Olivas: Estimation for cancer screening models using deconvolution
Student: Huayue Zou
Slides | Writeup
Prerequisites: Calculus (MATH 126), exposure to probability theory (STAT 340), and experience with R.
Cancer screening programs are an important component of secondary cancer prevention. To understand the conditions under which a cancer screening program provides the greatest benefit, mathematical models are used to estimate relevant quantities using information from cancer screening trials. In the natural history of a cancer, the time to cancer onset (subclinical) and the sojourn/latent time (time between onset and clinical appearance) are two quantities of interest, but they are impossible to observe separately. However, by using a screening tool we obtain some information that allows us to differentiate between these two components. In this project we will study a mathematical model that uses aggregate-level information from a cancer screening trial to estimate the mean time to onset, the mean sojourn time, and the sensitivity of the screening test, via the deconvolution formula and maximum likelihood estimation.
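To make the convolution idea concrete, here is a minimal sketch, not the model studied in the project. It assumes, purely for illustration, that onset time and sojourn time are independent and exponentially distributed; the rates and function names are hypothetical. Under that assumption the time to clinical appearance is the sum of the two, so its density is the convolution of their densities. The code checks a numerical convolution integral against the closed form.

```python
import math

# Illustrative assumptions (not from the project): exponential onset and
# sojourn times, independent of each other, with these rates per year.
ONSET_RATE = 0.02
SOJOURN_RATE = 0.5

def exp_pdf(t, rate):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0

def clinical_pdf_numeric(t, onset_rate, sojourn_rate, steps=2000):
    """Convolution integral of the two densities over [0, t], midpoint rule."""
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # candidate onset time
        total += exp_pdf(u, onset_rate) * exp_pdf(t - u, sojourn_rate)
    return total * h

def clinical_pdf_closed(t, onset_rate, sojourn_rate):
    """Closed form of the same convolution for two distinct exponential rates."""
    a, b = onset_rate, sojourn_rate
    return a * b / (b - a) * (math.exp(-a * t) - math.exp(-b * t))

t = 5.0
numeric = clinical_pdf_numeric(t, ONSET_RATE, SOJOURN_RATE)
closed = clinical_pdf_closed(t, ONSET_RATE, SOJOURN_RATE)
print(f"numeric={numeric:.6f} closed={closed:.6f}")
```

Only the sum is observed in the absence of screening, which is why the two components cannot be separated without extra information such as screening results.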
Apara Venkat: Rashomon Effect
Student: Wenxin Xia
Slides | Writeup
Prerequisites: Basic probability and statistics, familiarity with linear regression and decision trees, proficiency in R.
Statistics and machine learning have predominantly focused on finding the best model, as defined by some loss function. However, in the real world, there are often several statistically indistinguishable models that offer wildly different explanations. This is called the Rashomon Effect. By looking only at the optimal model, we expose ourselves to the Rashomon Effect and miss out on what the near-optimal models can teach us. There are a few directions we can take this project in, depending on your experience and interests: learning theory and recent developments in this area, building a Shiny app to illustrate the effect with real or synthetic data, etc.
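As a toy illustration of the effect, here is a minimal sketch on synthetic data. Everything in it is an assumption made for the example: the data-generating process, the 5% tolerance `epsilon`, and the variable names. Two one-feature regressions are fit on two nearly identical features; their errors are almost equal, yet each one attributes the outcome to a different feature.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
n = 500

# Synthetic data (assumption): x2 is almost a copy of x1, and y depends on both.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.02) for a in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def fit_slope(x, y):
    """Least-squares slope for a one-feature regression through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def mse(x, y, slope):
    """Mean squared error of the prediction slope * x."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(y)

slope_1 = fit_slope(x1, y)  # model A: explains y using x1 only
slope_2 = fit_slope(x2, y)  # model B: explains y using x2 only
mse_1 = mse(x1, y, slope_1)
mse_2 = mse(x2, y, slope_2)

# Models whose loss is within (1 + epsilon) of the best loss are treated as
# near-optimal here; epsilon is an arbitrary tolerance chosen for illustration.
epsilon = 0.05
best = min(mse_1, mse_2)
in_rashomon_set = [m <= (1 + epsilon) * best for m in (mse_1, mse_2)]

print(f"model A: slope={slope_1:.3f}, mse={mse_1:.4f}")
print(f"model B: slope={slope_2:.3f}, mse={mse_2:.4f}")
print("both near-optimal:", all(in_rashomon_set))
```

Reporting only the single best model would hide the fact that the competing explanation fits essentially as well.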
Ethan Ancell: Introduction to Statistics Research
Student: Hansen Zhang
Slides | Writeup
Prerequisites: N/A (closed to new applicants).
This DRP will be an introduction to basic research techniques in statistics. Over the spring quarter, the mentee will prepare an undergraduate research paper for submission to the Undergraduate Statistics Research Project (USRESP) competition hosted by the American Statistical Association. At the beginning of the quarter, the mentor and mentee will identify an appropriate research area, and the mentee will then get experience (1) conducting a literature review, (2) identifying a potential extension to the existing literature, (3) conducting appropriate simulations, and (4) framing the work in a compelling way.
Yuhan Qian: Great Ideas in Statistics
Student: Gefei Shen
Slides | Writeup
Prerequisites: Familiarity with the basics of probability and statistics.
In this project, we will explore several great ideas that have had or will likely have a significant impact in statistics. The tentative topics include Kernel Prediction and Density Estimation, Tree-based Models, Double Machine Learning, Selective Inference, Conformal Prediction, and Variational Inference. While the emphasis is on breadth, we can tailor the depth and focus to the mentee’s interests and knowledge.