Winter 2023 Projects


Alex Bank: Cutting-Edge Sports Statistics

Student: Luke VanHouten
Slides | Writeup
Prerequisites: Experience with Python or R

Have you ever wondered how an NBA player creates space for their shot? Or how an MLB slugger knows when to crush a fastball? Or how a soccer player decides where to run when they are away from the ball? This project will be driven by the student and will explore cutting-edge models used in sports statistics. We will select a research paper from the Sloan Sports Conference and study the data and math behind the models used in the paper. We will then apply the techniques we studied to implement our own model that answers a question of interest. Potential directions include (but are not limited to) spatial models for player positioning, optimizing shot selection, projecting top draft picks, and identifying inefficiencies in Vegas lines. Students who are interested in this project should look through the conference papers from various years at the link below.

https://www.sloansportsconference.com/conference/2022-conference#research-papers



Andrea Boskovic: Proportional Hazards Models

Student: Dante Ramirez
Slides | Writeup
Prerequisites: Some experience in survival analysis

Researchers in biomedical fields are often interested in the time it takes for a particular outcome of interest to occur, e.g., time to death. Survival models, which relate the time that passes before an event occurs to some covariates, can be used to answer these questions. In this project, we will investigate a specific type of survival model: the proportional hazards model, in which a unit increase in a given covariate has a multiplicative effect on the hazard rate.
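
As a rough illustration of the multiplicative property, the sketch below assumes a hazard of the form baseline(t) * exp(beta * x) with a single covariate. The Weibull-shaped baseline, the coefficient value, and the function names are all hypothetical choices made for this example, not part of the project materials.

```python
import math

def weibull_baseline(t, shape=1.5, scale=10.0):
    # Hypothetical baseline hazard; any positive function of t would do.
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard(t, x, beta, baseline_hazard):
    # Proportional hazards form: the covariate scales the baseline hazard.
    return baseline_hazard(t) * math.exp(beta * x)

beta = 0.3  # assumed coefficient, chosen only for illustration

# Ratio of hazards for x = 2 versus x = 1 at several time points.
ratios = [
    hazard(t, 2.0, beta, weibull_baseline) / hazard(t, 1.0, beta, weibull_baseline)
    for t in (1.0, 5.0, 20.0)
]
print(ratios, math.exp(beta))
```

Under this assumed form, the ratio does not depend on t: a one-unit increase in x multiplies the hazard by exp(beta) at every time point.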



Antonio Olivas: Statistical evaluation of medical tests for classification and prediction

Student: Sephora-Clotilde Zoro
Slides | Writeup
Prerequisites: None

In medicine, there exist many medical tests for diagnosing a disease or for learning about an individual’s prognosis once a diagnosis has been established. However, how do we know how accurately those tests diagnose the diseases they are supposed to diagnose? Also, when there is more than one diagnostic test for the same disease, how do we know which one is better? Moreover, when the diagnostic test corresponds to a continuous variable, how do we know the threshold to differentiate between having or not having the disease?

In this project, we will learn how to evaluate the performance of continuous medical tests using the receiver operating characteristic (ROC) curve. The ROC curve is very popular in medicine because it conveys the performance of a test graphically. Using properties of the ROC curve, we will learn different ways of comparing two or more medical tests, and different ways of choosing the optimal threshold based on the condition of interest.
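
The following minimal sketch shows one way these ideas could be computed by hand. The scores and disease labels are made up, the test is assumed to be positive when the score is at or above the threshold, and the Youden index (sensitivity + specificity - 1) is used as just one of several possible criteria for picking a threshold.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) triples; a test is positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(float("inf"), 0.0, 0.0)]  # nobody is classified positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def youden_threshold(points):
    """Threshold maximizing TPR - FPR (one common choice, assumed here)."""
    return max(points[1:], key=lambda p: p[2] - p[1])[0]

# Hypothetical test results: 1 = diseased, 0 = not diseased.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

points = roc_points(scores, labels)
print(auc(points), youden_threshold(points))
```

Two tests measured on the same patients could be compared by computing `auc` for each, although a formal comparison would also need a measure of uncertainty.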

If time permits and depending on the student’s interests, we can also study how to evaluate the performance of a continuous medical test when the performance and optimal threshold depend on other individual characteristics such as age and sex.



Charlie Wolock: Introduction to prediction

Student: Liuyixin Shao
Slides | Writeup
Prerequisites: Basic familiarity with R, introductory statistics. Some knowledge of regression would be useful.

Many classical statistical methods are focused on learning associations between variables. However, we may also be interested in prediction – making a guess about an unknown or future outcome on the basis of whatever information we have access to. In this project, we’ll learn about the unique challenges of prediction. We’ll discuss how to use traditional statistical methods to make predictions and start to explore more modern machine learning techniques. This project will have a strong focus on thoughtful construction and evaluation of prediction models. We will identify an interesting dataset and implement some of our own prediction procedures using R.



Ethan Ancell: Statistics in Neuroscience

Student: David Ye
Slides | Writeup
Prerequisites: Students should have an understanding of hypothesis testing, as well as familiarity with R and RStudio.

Neuroscience is a fascinating and rapidly moving field that enables us to better understand how the brain works. In the quest to understand the brain, neuroscientists use special technology in experimental trials to track neuron behavior across time, and pair this data with events occurring during the experiment. Because so much data is generated from these trials, there are many fascinating statistical questions to answer when analyzing it. In this directed reading project, students will analyze an example dataset from a real neuroscience experimental trial conducted here at UW to try to answer whether the neurons in a mouse are actually responding to external stimuli in the experiment. Broadly speaking, this project will be an excellent opportunity for undergraduate students to try their hands at a real application of statistics in neuroscience, as well as to learn about some of the difficulties of conducting hypothesis tests in settings where certain assumptions of classical hypothesis tests are not fully met.



Nina Galanter: Introduction to Survival Analysis

Student: Hannah Chiu
Slides | Writeup
Prerequisites: Some knowledge of R or another programming language, understanding of expected value and conditional probability, some familiarity with linear regression

In medicine and public health, we are often interested in answering questions about the time until an event occurs. For example, what is the median recovery time from some surgery? Or does a treatment prolong the time until death for patients with a particular cancer? Because of this, survival analysis, which works with these time-to-event outcomes, is an important area of biostatistics. Most time-to-event data is censored: we cannot observe the event for everyone because we lose track of some subjects or something else happens to them. In this project, we will learn about survival analysis methods for censored data, including Kaplan-Meier curves, the log-rank test, and Cox regression. We may cover other topics based on time and student interest. This project will culminate in either a real data analysis using a dataset of the student’s choice or a simulation study.
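
To give a flavor of how censoring enters the calculation, here is a minimal sketch of the Kaplan-Meier estimator written from scratch. The follow-up times and event indicators are invented for illustration, and the function name is hypothetical; in practice an established survival analysis package would normally be used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events seen at time t
        removed = 0   # subjects leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Multiply by the fraction of at-risk subjects surviving past t.
            surv *= 1 - observed / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (in months); 0 marks a censored subject.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
print(km_curve)
```

Censored subjects never trigger a drop in the curve, but they still count in the risk set up to the time they are lost, which is how the estimator uses their partial information.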



Vydhourie R.T. Thiyageswaran: Random walks on graphs

Student: Noah McMahon
Slides | Writeup
Prerequisites: Some basic exposure to probability. We will still review basic probability properly.

We will study what a random walk is, followed by a little bit of graph theory. Finally, we will go over some examples where thinking about random walks on graphs has been an interesting approach to solving more general problems.
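
As a small taste of the topic, the sketch below runs a simple random walk on a made-up four-node graph, where each step moves to a uniformly chosen neighbor. It tracks the probability of being at each node rather than simulating a single path. The graph and the function names are hypothetical choices for this example.

```python
# Hypothetical undirected graph: a triangle (0, 1, 2) with node 3 attached to node 2.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}

def step(dist, graph):
    """Advance the distribution of the walker's position by one step."""
    new_dist = {node: 0.0 for node in graph}
    for node, prob in dist.items():
        for neighbor in graph[node]:
            # The walker moves to each neighbor with equal probability.
            new_dist[neighbor] += prob / len(graph[node])
    return new_dist

def distribution_after(start, n_steps, graph):
    dist = {node: 0.0 for node in graph}
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = step(dist, graph)
    return dist

long_run = distribution_after(start=3, n_steps=200, graph=graph)

# For a connected, non-bipartite undirected graph, the long-run distribution
# is proportional to each node's degree.
total_degree = sum(len(nbrs) for nbrs in graph.values())
by_degree = {node: len(nbrs) / total_degree for node, nbrs in graph.items()}
print(long_run, by_degree)
```

After many steps the starting node no longer matters, and the walker is most likely to be found at the best-connected node.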