Mentors and Project Descriptions

Winter 2025 Projects


Andrew Zhang: Large Deviations

Prerequisites: Familiarity with probability and the topology of R^d

Large deviations is the study of rare events. For example, let X_j be i.i.d. and represent the claims an insurance company has to pay out in month j. If each month the insurance company earns C in premiums, with C larger than the expected monthly claim E[X_1], then we want to estimate the probability P(X_1+…+X_n > nC) that total claims exceed total premiums. Note that the Central Limit Theorem does not answer this question: it describes deviations of order √n from the mean, such as P(X_1+…+X_n > nE[X_1] + √n C), whereas the gap nC − nE[X_1] grows linearly in n.

We will discuss Sanov’s Theorem, Cramér’s Theorem, the Gärtner-Ellis Theorem, and applications. We will follow the book by Dembo and Zeitouni.
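Cramér's Theorem makes the insurance example precise: assuming i.i.d. claims with a finite moment generating function and C > E[X_1], (1/n) log P(X_1+…+X_n ≥ nC) → −I(C), where I(C) = sup_λ [λC − log E e^{λX_1}]. The sketch below checks this numerically under the purely illustrative assumption that claims are Exp(1), for which I(c) = c − 1 − log c and the tail probability can be computed exactly. The function names are hypothetical.

```python
import math

def cramer_rate_exp1(c: float) -> float:
    """Cramér rate function I(c) = sup_lam [lam*c - log E exp(lam*X)] for X ~ Exp(1).

    Here log E exp(lam*X) = -log(1 - lam) for lam < 1, and the supremum
    works out to I(c) = c - 1 - log(c) for c > 0.
    """
    return c - 1.0 - math.log(c)

def log_tail_prob(n: int, c: float) -> float:
    """Exact log P(X_1 + ... + X_n >= n*c) for i.i.d. Exp(1) claims.

    The sum is Gamma(n, 1), and P(Gamma(n, 1) >= t) = P(Poisson(t) <= n - 1),
    so we add up Poisson(t) probabilities in log space to avoid underflow.
    """
    t = n * c
    logs = [-t + k * math.log(t) - math.lgamma(k + 1) for k in range(n)]
    m = max(logs)  # log-sum-exp trick
    return m + math.log(sum(math.exp(v - m) for v in logs))
```

Printing -log_tail_prob(n, 1.5) / n for n = 10, 100, 1000 should show the values approaching I(1.5) = 0.5 − log 1.5 ≈ 0.0945, with a gap of order (log n)/n that Cramér's Theorem does not capture.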



Cindy Elder: How to use numbers to make the world a better place

Prerequisites: No firm prerequisites. Some coding experience is helpful, but I am happy to help beginning coders.

We have a statistics project group (currently two faculty advisors and two grad students) supporting the UW School of Law Innocence Project. We are using statistical methods to estimate the number of innocent people in prison from current data on known exonerations. We have also been modeling how official misconduct by police officers or prosecutors is associated with the race, gender, and age of the innocent defendant.

If you are an undergrad who wants to work on police accountability, law and statistics, and social justice, I would be happy to work with you!



Ethan Ancell: Survival Analysis

Prerequisites: Stat 311 or Stat 390 required, Stat 394 preferred but not required

Survival analysis is a branch of statistics that focuses on modeling time-to-event data, where the event of interest could be death, recovery, failure, or any event with a defined endpoint. It is particularly useful in biological, medical, and engineering applications. In this DRP we will loosely follow this online source.
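A central tool in survival analysis is the Kaplan-Meier estimator of the survival function S(t) = P(T > t), which accounts for right-censored subjects (those whose event had not yet occurred when follow-up ended). Below is a minimal pure-Python sketch, assuming right censoring and the usual convention that a subject censored at time t is still at risk at t; it is an illustration, not code from the course source.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the survival function S(t) = P(T > t).

    times[i]  : observed time for subject i (event time or censoring time)
    events[i] : 1 if the event was observed at times[i], 0 if right-censored
    Returns (t, S_hat(t)) at each distinct time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # subjects with observed time >= current t
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose observed time equals t (ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the estimated conditional survival at t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings at t leave the risk set after t
    return curve
```

For example, with times [1, 2, 2, 3, 4, 5] and event indicators [1, 1, 0, 1, 0, 1], the estimate steps down to 5/6, 2/3, 4/9, and 0 at times 1, 2, 3, and 5; the censored subjects at times 2 and 4 shrink the risk set without producing a step.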

Note: This project will have two undergraduate mentees and is currently open to one more applicant.

Note 2: The linked source sadly seems to be down at the moment… If you’re hoping to get a sense for what survival analysis entails for the sake of applying to a project, a little bit of Google searching should get you on the right path. (:



Kayla Irish: Logistic Regression and GLMs

Prerequisites: Having taken MATH/STAT 395 or 342 is recommended, and MATH/STAT 394 or STAT 340 is required. STAT 311 or STAT 390 is also required.

Logistic regression is a statistical tool for modeling the probability of binary outcomes, making it helpful for applications ranging from predicting customer behavior to diagnosing diseases. This project will provide a comprehensive exploration of logistic regression, focusing on its underlying assumptions, how the model is fitted, and what its output means. We will examine what it means for the model to be correctly specified, delve into its mathematical foundations, including the score equation, and explore settings in which covariate adjustment in logistic regression can reduce efficiency. Expanding beyond logistic regression, we will investigate how it relates to and differs from other generalized linear models, using an e-textbook as our primary resource.
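The score equation is the condition that the gradient of the log-likelihood vanishes: with fitted probabilities p_i and covariate vectors x_i, U(β) = Σ_i (y_i − p_i) x_i = 0. It has no closed-form solution, so it is solved iteratively. Below is a minimal pure-Python sketch of Newton-Raphson for an intercept and one covariate (for the canonical logit link this coincides with Fisher scoring, i.e., iteratively reweighted least squares). The function names are illustrative; in practice one would use a library routine such as R's glm.

```python
import math
from typing import List, Sequence, Tuple

def probs(b: Tuple[float, float], x: Sequence[float]) -> List[float]:
    """Fitted probabilities p_i = 1 / (1 + exp(-(b0 + b1 * x_i)))."""
    b0, b1 = b
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def score(b, x, y) -> Tuple[float, float]:
    """Score U(b) = sum_i (y_i - p_i) * (1, x_i), the gradient of the log-likelihood."""
    p = probs(b, x)
    return (sum(yi - pi for yi, pi in zip(y, p)),
            sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)))

def fit_logistic(x, y, iters: int = 25) -> Tuple[float, float]:
    """Solve the score equation U(b) = 0 by Newton-Raphson, starting from b = (0, 0).

    Assumes the classes are not perfectly separated by x (otherwise no finite MLE exists).
    """
    b = (0.0, 0.0)
    for _ in range(iters):
        p = probs(b, x)
        w = [pi * (1.0 - pi) for pi in p]  # Var(Y_i) under the current fit
        # Fisher information I(b) = sum_i w_i (1, x_i)^T (1, x_i), a symmetric 2x2 matrix
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        u0, u1 = score(b, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step b <- b + I(b)^{-1} U(b), using the explicit 2x2 inverse
        b = (b[0] + (i11 * u0 - i01 * u1) / det,
             b[1] + (-i01 * u0 + i00 * u1) / det)
    return b
```

At the fitted coefficients, score(fit_logistic(x, y), x, y) should be numerically zero. Its first component being zero is the familiar fact that, with an intercept in the model, the fitted probabilities sum to the observed number of successes.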



Kenny Zhang: Sensitivity Analysis in Causal Inference

Prerequisites: Stat 311 or Stat 390 is required. More advanced statistics or math classes are a plus.

Sensitivity analysis in causal inference evaluates how robust causal conclusions are to violations of the assumptions underlying the causal model. It is particularly focused on understanding the potential impact of unmeasured confounders, model misspecification, or measurement errors on the estimated causal effect. This DRP will serve as an introduction to these concepts.
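One widely used sensitivity-analysis summary for unmeasured confounding is the E-value of VanderWeele and Ding (2017), which is unrelated to the hypothesis-testing e-values in the project further below. For an observed risk ratio RR > 1 it equals RR + √(RR(RR − 1)): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both the treatment and the outcome to fully explain away the observed effect. This sketch is only an illustration and may or may not be part of the DRP's reading; the function name is hypothetical.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate only).

    Larger values mean that only a strong unmeasured confounder could
    explain away the observed association.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: invert so that rr >= 1
    return rr + math.sqrt(rr * (rr - 1.0))
```

For example, an observed risk ratio of 2 gives an E-value of 2 + √2 ≈ 3.41, and a protective risk ratio of 0.5 gives the same value.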



Leon Tran and Nila Cibu: Theory of Gambling

Prerequisites: Real analysis at the level of Math 424. Math 425 and 426 would be great to have too, but can be learned during the project.

In a game of chance, how do I gamble well? That is, how do I come up with a strategy where I’ll end up with a lot of money? For what types of games is this impossible?

These questions are central to finance and machine learning, for example. They can be given a satisfying answer when phrased in the language of measure theory. Our goal is to teach you this foundation: we will work through the book “Probability with Martingales” by David Williams as far as possible. The book is intended for an undergraduate audience!
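One classical answer to the last question: in a fair game (a martingale), no betting strategy that stops after a bounded number of rounds changes your expected fortune, an instance of the optional stopping theorem. The sketch below checks this exactly, using fractions, for the well-known "double your bet after every loss" system, assuming even-money bets, unlimited credit, and a cap on the number of rounds; the function name is illustrative.

```python
from fractions import Fraction

def expected_final_wealth(start: int, rounds: int, p_win: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact expected wealth under the doubling system.

    Bet 1, double the stake after each loss, and stop after the first win
    or after `rounds` bets, whichever comes first. Wealth may go negative
    (unlimited credit is assumed).
    """
    expected = Fraction(0)
    stake = 1
    lost = 0                      # total lost before the current bet
    prob_all_lost = Fraction(1)   # probability that every bet so far was lost
    for _ in range(rounds):
        # First win on this bet: final wealth is start - lost + stake (= start + 1).
        expected += prob_all_lost * p_win * (start - lost + stake)
        lost += stake
        stake *= 2
        prob_all_lost *= 1 - p_win
    # Every bet lost: final wealth is start - (2**rounds - 1).
    expected += prob_all_lost * (start - lost)
    return expected
```

For a fair coin the result equals the starting wealth for every cap on the number of rounds: the small, likely gain of 1 is exactly offset by a rare, large loss. With a roulette-style win probability such as Fraction(18, 37), the expected final wealth is strictly below the starting wealth.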

This project can be extended to the spring quarter based on student interest; there’s a lot to cover. Beyond the fundamentals of measure theory and martingales, topics can also be chosen based on student interest.

Note: We will be accepting two undergraduate applicants to work on this project.



Nina Galanter: The Target Trial Framework for Causal Inference

Prerequisites: An introductory statistics course

Research in medicine and public health often involves answering causal questions. Randomized trials are the gold standard for answering these questions but are sometimes impractical or unethical. Instead, we conduct observational (nonrandomized) studies, in which it is crucial to avoid bias due to confounding and other factors. The target trial framework is valuable for avoiding bias in observational analyses for causal effects. In this project, we will learn causal inference basics, use the target trial framework to explore types of biases in observational data, and introduce estimation methods. We can focus more on study design and bias or more on estimation depending on student interest and background.
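As a toy illustration of confounding bias and one simple estimation method, standardization (the g-formula with a single discrete confounder), the sketch below uses made-up counts in which sicker patients (L = 1) are both more likely to be treated and more likely to have the outcome. Standardization estimates a causal effect only under assumptions such as no unmeasured confounding given L, positivity, and consistency; the counts and function names are hypothetical.

```python
# Hypothetical counts: (stratum L, treatment A) -> (number of people, number with outcome Y = 1).
COUNTS = {
    (0, 1): (20, 2),
    (0, 0): (80, 16),
    (1, 1): (80, 32),
    (1, 0): (20, 10),
}

def crude_risk(counts, a):
    """Unadjusted risk P(Y = 1 | A = a), ignoring the confounder L."""
    n = sum(cnt for (_, aa), (cnt, _) in counts.items() if aa == a)
    y = sum(ev for (_, aa), (_, ev) in counts.items() if aa == a)
    return y / n

def standardized_risk(counts, a):
    """Standardized risk: sum over strata l of P(Y = 1 | A = a, L = l) * P(L = l)."""
    total = sum(cnt for cnt, _ in counts.values())
    risk = 0.0
    for stratum in sorted({s for s, _ in counts}):
        n_stratum = counts[(stratum, 0)][0] + counts[(stratum, 1)][0]
        n_sa, y_sa = counts[(stratum, a)]
        risk += (y_sa / n_sa) * (n_stratum / total)
    return risk
```

With these counts the crude risks (0.34 treated vs 0.26 untreated) make treatment look harmful, while the standardized risks (0.25 vs 0.35) point the other way, because the crude comparison mixes the treatment effect with the effect of being sicker.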



Ronan Perry: Tackling the reproducibility crisis and abuse of p-values: e-values and universal inference

Prerequisites: The student should have a strong foundation in statistical inference, namely hypothesis testing and p-values. Stat 394 or a stronger probability foundation is necessary. No computing experience is needed. Course content can be adapted to the background of the student, but will be more mathematical than applied.

We will conduct a guided reading and discussion of the online textbook: Hypothesis Testing with E-values by Ramdas and Wang, 2024. As the authors put it, “the recent crisis of scientific reproducibility — largely related to the use and misuse of p-values (especially peeking at p-values and optional stopping and continuation of experiments) — calls for methodologies that are statistically justifiable under various new and complicated environments. E-values are one tool (though certainly not the only one) to address this challenge, because they benefit from their simple definition, natural connections to game-theoretic probability and statistics, flexibility and robustness in multiple testing under dependence, and their central role in anytime-valid statistical inference. Equipping applied statisticians with the knowledge of e-values has visible benefits to the sciences and for information technology companies.”
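A basic example of an e-value is a likelihood ratio, whose expectation under the null hypothesis is 1. Multiplying one-observation likelihood ratios along a data stream gives a test martingale, and by Ville's inequality, rejecting the null the first time the running product reaches 1/α keeps the type I error at most α no matter when the experimenter decides to stop, which addresses the peeking and optional-stopping problem described above. A minimal sketch for a coin, assuming H0: P(heads) = 1/2 and a fixed alternative q = 0.7 chosen only for illustration; the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

def sequential_lr_test(flips: Iterable[int], q: float = 0.7,
                       alpha: float = 0.05) -> Tuple[Optional[int], float]:
    """Anytime-valid test of H0: P(heads) = 1/2 against the alternative P(heads) = q.

    Each factor is the likelihood ratio of one flip (1 = heads, 0 = tails); it has
    expectation exactly 1 under H0, so the running product is a test martingale.
    By Ville's inequality, P_H0(product ever reaches 1/alpha) <= alpha, so we may
    stop and reject the first time that happens.
    Returns (time of rejection or None, final value of the running e-value).
    """
    e = 1.0
    for t, heads in enumerate(flips, start=1):
        e *= (q / 0.5) if heads else ((1.0 - q) / 0.5)
        if e >= 1.0 / alpha:
            return t, e
    return None, e
```

With 20 heads in a row and the default settings, the test rejects at flip 9, since 1.4^8 ≈ 14.8 < 20 ≤ 1.4^9 ≈ 20.7. An alternating sequence of heads and tails never reaches the threshold.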



Rui Wang: Introduction to Conformal Inference

Prerequisites: Stat 394 (required), Stat 395/Stat 396 (highly recommended), Stat 341/Stat 342 (highly recommended), familiarity with hypothesis testing, proficiency in R.

Conformal inference is a statistical framework for constructing prediction intervals around the outputs of any machine-learning algorithm, with the guarantee that the intervals meet a predefined coverage level whenever the data are exchangeable. This project will focus on Chapters 1–3 of the recent textbook Theoretical Foundations of Conformal Prediction. Students will have the opportunity to explore the theoretical properties of conformal inference and may apply the methodology to real-world datasets.
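The simplest variant is split conformal prediction. The sketch below assumes a model already fitted on separate training data, a held-out calibration set, and absolute residuals |y − ŷ| as the conformity score; the function name is illustrative, and the book treats more general versions. If the calibration and test points are exchangeable, the returned interval covers the test response with probability at least 1 − α, whatever model produced the predictions.

```python
import math
from typing import Sequence, Tuple

def split_conformal_interval(cal_pred: Sequence[float], cal_y: Sequence[float],
                             test_pred: float, alpha: float = 0.1) -> Tuple[float, float]:
    """Split conformal prediction interval for one test point.

    cal_pred / cal_y: model predictions and true responses on the calibration set.
    test_pred: the model's prediction at the new point.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_pred, cal_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)
```

The (n + 1) in the quantile rank is the finite-sample correction that makes the coverage guarantee exact rather than asymptotic; it is also why very small calibration sets can force an infinite interval.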



Simon Nguyen: Active Learning

Prerequisites: (Stat 311 or Stat 390) and Stat 341 and Stat 342

Collecting labeled data to train data-hungry modern artificial intelligence (AI) and machine learning (ML) models can be expensive or time-consuming. This challenge arises in a wide range of applications, such as sentence classification, image labeling, and verbal autopsy. In such settings, strategically deciding which observations to label can greatly reduce data redundancy and improve the learning of covariate-label relationships.

To work within time and budget constraints, active learning lets researchers strategically choose which observations to label. The key task in active learning is choosing the observations that, once labeled, will most improve the predictive quality of the model. By iteratively training the model and adaptively querying for labels, active learning makes more efficient use of resources while maintaining high model accuracy.
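One common way to pick "informative" observations is uncertainty sampling: query the points the current model is least sure about. A minimal sketch, assuming a binary classifier that outputs a predicted probability P(y = 1) for each unlabeled point; the function name is hypothetical, and other query criteria are also used in practice.

```python
from typing import List, Sequence

def uncertainty_sample(probs: Sequence[float], batch_size: int = 1) -> List[int]:
    """Pick the unlabeled points whose predicted P(y = 1) is closest to 0.5.

    probs[i] is the current model's predicted probability for unlabeled point i.
    Returns the indices to send to the labeler in this round.
    """
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return order[:batch_size]
```

In each round of the loop described above, one would refit the model on the labeled set, compute probs on the unlabeled pool, label the returned points, move them to the labeled set, and repeat until the labeling budget runs out.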

Note: This project is closed to new applicants.



Weitao Wang: Introduction to Structural Causal Models

Prerequisites: Basic probability theory and statistics. Familiarity with R/Python.

Originating from a synthesis of ideas in statistics, economics, and computer science, structural causal models (SCMs) provide a rigorous framework for representing and inferring the causal structure underlying complex systems. We will study SCMs and methods for estimating causal relations from finite observational data. Before the midterm, we will select a research paper of interest that explores a real-world application. If time permits, the student will implement an algorithm to identify causal relations from a real scientific dataset.
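To see why the causal structure matters for estimation, here is a minimal simulation, assuming a hypothetical linear-Gaussian SCM in which Z confounds X and Y (the coefficients are made up). Regressing Y on X alone is biased by the backdoor path X ← Z → Y, while adjusting for Z, which satisfies the backdoor criterion in this model, recovers the causal effect of 2. The function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

def simulate_scm(n: int, seed: int = 0) -> List[Tuple[float, float, float]]:
    """Draw n samples (z, x, y) from the hypothetical SCM
        Z := N_Z,  X := Z + N_X,  Y := 2 X + 3 Z + N_Y,  with all noises N(0, 1).
    The causal effect of X on Y is 2 by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = z + rng.gauss(0.0, 1.0)
        y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, x, y))
    return rows

def ols_slope(u: Sequence[float], v: Sequence[float]) -> float:
    """Least-squares slope of v on u (with an intercept)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)

def estimate_effects(n: int = 50_000, seed: int = 0) -> Tuple[float, float]:
    """Return (naive slope of Y on X, slope of Y on X adjusted for Z)."""
    z, x, y = zip(*simulate_scm(n, seed))
    naive = ols_slope(x, y)  # converges to 2 + 3 * Cov(X, Z) / Var(X) = 3.5
    # Backdoor adjustment for Z via Frisch-Waugh-Lovell: the coefficient on X in
    # the regression Y ~ X + Z equals the slope of Y on the residual of X ~ Z.
    b = ols_slope(z, x)
    mz, mx = sum(z) / n, sum(x) / n
    x_resid = [xi - mx - b * (zi - mz) for xi, zi in zip(x, z)]
    adjusted = ols_slope(x_resid, y)
    return naive, adjusted
```

With the default sample size, the naive estimate should be close to 3.5 and the adjusted estimate close to 2; the difference is exactly the bias contributed by the open backdoor path.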

Our readings will be drawn from Causal Inference in Statistics: A Primer and Elements of Causal Inference: Foundations and Learning Algorithms.



Yuhan Qian: Introduction to Kernel Methods for Causal Inference

Prerequisites: N/A

Building on my Fall 2024 project, we will introduce the key concepts in causal inference and extend the discussion to the application of Gaussian processes. Then, the project will focus on kernel methods for causal inference, particularly on causal functions, and explore their application in complex clinical trial designs, such as platform trials.

Note: This project is closed to new applicants.