Mentors and Project Descriptions

Spring 2025 Projects

Antonio Olivas: Function Estimation using Reproducing Kernel Hilbert Spaces

Prerequisites: Real analysis at the level of Math 424

Project targeted for: Junior/Senior

Number of students: 1

Reproducing kernel Hilbert spaces (RKHSs) are Hilbert spaces defined by reproducing kernels. They enjoy a geometric structure similar to ordinary Euclidean space and, depending on the kernel, may include a reasonably broad class of functions. RKHSs are widely used in statistical problems that require optimizing over function spaces, such as interpolation, regression, and density estimation. They are attractive because many optimization problems over these spaces reduce to relatively simple calculations involving the kernel matrix. In this project, we will read most of Chapter 12 of the textbook “High-Dimensional Statistics: A Non-Asymptotic Viewpoint” by Martin J. Wainwright, which covers RKHSs. The goal of this DRP is to understand the properties of RKHSs and to apply this tool to a real-life problem.
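As a rough illustration of how an optimization over an RKHS can reduce to a kernel-matrix calculation, here is a minimal sketch of kernel ridge regression in Python. The Gaussian kernel, the bandwidth, the regularization level `lam`, the simulated data, and all function names are assumptions made for this example; they are not taken from the textbook chapter.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix with entries k(a_i, b_j) for 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2 * bandwidth**2))

def fit_kernel_ridge(x, y, lam=1e-3, bandwidth=0.2):
    """Return alpha solving (K + n * lam * I) alpha = y.

    The fitted function is f(t) = sum_i alpha_i * k(t, x_i).
    """
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(x_train, alpha, x_new, bandwidth=0.2):
    """Evaluate the fitted function at new points."""
    return gaussian_kernel(x_new, x_train, bandwidth) @ alpha

# Simulated data (an assumption for illustration): noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

alpha = fit_kernel_ridge(x, y)
fitted = predict(x, alpha, x)
train_mse = float(np.mean((fitted - y) ** 2))
print(f"training MSE: {train_mse:.4f}")
```

The search over an infinite-dimensional function space never appears explicitly: the whole fit is a single linear solve with the n-by-n kernel matrix.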

Dasha Petrov: Intro to Deep Learning

Prerequisites: Basic calculus, linear algebra, and probability/statistics, and some experience with R/Python

Project targeted for: Junior/Senior

Number of students: 1

Deep learning is becoming an increasingly popular tool in biostatistics research, with applications in various -omics fields (such as transcriptomics) and personalized medicine. In this DRP, we will explore the foundations of deep learning, focusing on the mathematical and computational principles behind neural networks. We will follow the interactive textbook Dive into Deep Learning, combining weekly readings with Python coding exercises to develop both theoretical understanding and practical experience. The goal is to cover Chapters 1–5, but we will adjust the pace as needed.

Ethan Ancell and Clinton Alden: Statistics for Avalanche and Weather Forecasting

Prerequisites: Strong coding background (either formally through a course, or through prior coding projects)

Project targeted for: Sophomore/Junior/Senior

Number of students: 1-2

In this DRP, we will write code to automate downloads of data from a variety of weather sources such as NOAA (National Oceanic and Atmospheric Administration), NWS (National Weather Service), NWAC (Northwest Avalanche Center), and the ECMWF (European Centre for Medium-Range Weather Forecasts), and build tools to visualize this data and explore relationships between weather and avalanche data.

Strong preference will be given to applicants who are interested in working in climate or weather, or who have a particular interest in avalanche forecasts.

Kayla Irish: Data Visualization

Prerequisites: Stat 311 or Stat 390

Project targeted for: Junior/Senior

Number of students: 1

This project is closed to new applicants.

Communicating statistical insights effectively is as important as performing the analysis itself. Well-designed visualizations reveal patterns in data and make results more interpretable. Poor visualizations, by contrast, can mislead audiences and introduce bias.

This project will explore principles of effective data visualization through reading and practice. Depending on student interest, we may begin by reading through Fundamentals of Data Visualization by Claus Wilke or The Visual Display of Quantitative Information by Edward Tufte. We will then apply these principles by creating visualizations for real-world datasets using R. Depending on interest, we may also explore interactive visualizations.

By the end of the project, we will have a deeper understanding of how to design clear, informative, and aesthetically pleasing statistical visualizations, and we will have created a small portfolio of our own visualizations.

Leon Tran and Nila Cibu: Theory of Gambling

Prerequisites: Stat 311/390 (required); Stat 394 (recommended); Math 208/Linear algebra (recommended); Familiarity with R/Python (recommended)

Project targeted for: Sophomore/Junior

Number of students: 2

This project is a continuation of a DRP from a previous quarter and is closed to new applicants.

Background: real analysis at the level of Math 424. Math 425 and 426 would also be great to have, but can be learned during the project.

In a game of chance, how do I gamble well? That is, how do I come up with a strategy where I’ll end up with a lot of money? For what types of games is this impossible?

These questions are essential in finance and machine learning, for example. They can be given a satisfying answer when phrased in the language of measure theory. Our goal is to teach you this foundation: we will work through the book “Probability with Martingales” by David Williams as far as possible. The book is intended for an undergraduate audience!

Ronan Perry: Introduction to Bandits

Prerequisites: Interest in and familiarity with probability at the level of STAT 394 will be necessary to understand bandit algorithms. Familiarity with statistical inference (e.g., confidence intervals and p-values) is highly recommended.

Project targeted for: Junior/Senior

Number of students: 1

What are bandits, you ask? Imagine you are running a medical trial and have K possible treatments. How do you assign treatments to your patients, and adjust your assignments as you observe the treatment efficacies, so as to maximize the number of good outcomes? If you can pick stocks and observe their returns each day, how do you choose where to invest your money? These are bandit problems, a fundamental part of reinforcement learning. We will read textbook chapters and tutorials to learn how best to solve these problems. Depending on student interest, we may pursue extensions including (but not limited to): (i) coding implementations to compare methods or (ii) studying how to perform valid statistical inference in these settings.
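To make the treatment-assignment question concrete, here is a minimal sketch of UCB1, one standard bandit algorithm, run on simulated treatments with yes/no outcomes. The success probabilities, the horizon, and the function name `ucb1` are assumptions made for this example, and this is not necessarily the method the project will focus on.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return pull counts and total rewards."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of times each arm was pulled
    totals = [0.0] * k  # sum of rewards observed for each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize the estimates
        else:
            # Pick the arm with the largest optimistic estimate:
            # empirical mean plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals

# Hypothetical success probabilities for K = 3 treatments.
counts, totals = ucb1([0.2, 0.5, 0.8], horizon=5000)
print("pulls per arm:", counts)
```

The exploration bonus is what balances trying under-sampled treatments against repeating the treatment that currently looks best.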

Simon Nguyen: Active Learning

Prerequisites: (Stat 311 or Stat 390) and Stat 341 and Stat 342

Project targeted for: Junior/Senior

Number of students: 1

This project is a continuation from last quarter and is closed to new applicants.

Collecting labeled data to train data-hungry modern artificial intelligence (AI) and machine learning (ML) models can be expensive or time-consuming. This challenge arises in a wide range of applications, such as sentence classification, image labeling, and verbal autopsy. In such scenarios, strategically determining which observations merit labeling can greatly reduce data redundancy and improve the learning of covariate-label relationships.

To address time and budget constraints, active learning gives researchers the freedom to strategically choose which observations to label. The key task in active learning is choosing the most informative observations, those that will most enhance the predictive quality of the model once labeled. By iteratively training the model and adaptively querying for labels, active learning allows for more efficient use of resources while maintaining high model accuracy.
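The train-then-query loop can be sketched on a toy problem. The example below uses uncertainty sampling for a one-dimensional threshold classifier: at each step it asks for the label of the unlabeled point closest to the current decision boundary. The setting, the simulated `oracle`, the true threshold of 0.37, and the labeling budget are all assumptions made for illustration.

```python
import random

def oracle(x, threshold=0.37):
    """Hypothetical labeling oracle; stands in for a costly human annotator."""
    return int(x >= threshold)

def active_learn_threshold(pool, label_fn, budget):
    """Uncertainty sampling for a 1-D threshold classifier.

    Assumes the true boundary lies in [0, 1]. Points at or below `lo` are
    known to be class 0 and points at or above `hi` are known to be class 1,
    so the current boundary estimate is the midpoint. The unlabeled point
    closest to that midpoint is the one the model is least sure about.
    """
    lo, hi = 0.0, 1.0
    queries = 0
    while queries < budget:
        midpoint = (lo + hi) / 2
        candidates = [x for x in pool if lo < x < hi]
        if not candidates:
            break  # every remaining point is already classified with certainty
        x = min(candidates, key=lambda c: abs(c - midpoint))
        queries += 1
        if label_fn(x) == 1:
            hi = x  # boundary is at or below x
        else:
            lo = x  # boundary is above x
    return (lo + hi) / 2, queries

rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
estimate, queries = active_learn_threshold(pool, oracle, budget=20)
print(f"estimated boundary {estimate:.3f} using {queries} labels from {len(pool)} points")
```

In this toy setting a handful of well-chosen labels locates the boundary, whereas labeling points at random would typically need far more labels for the same precision.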

Stefan Inzer: An Introduction to Time (Series)

Prerequisites: Familiarity with linear algebra and analysis is very useful. Also helpful will be familiarity with linear regression, the normal distribution, and some experience in R or in Python (like STAT 311).

Project targeted for: Junior/Senior

Number of students: 1

Forget i.i.d. samples. In time series, observations are usually correlated because they appear in sequence over time. The goal of this DRP will be to learn about time series analysis and see how valuable it is in real-world applications.

In the first part of the project, we will read through the basics of time series analysis, from fitting a linear model to forecasting and STL (seasonal-trend) decomposition. In the second part, we will explore modern time series methods, with an emphasis on application. We will find different real-life examples and see how topics as varied as disease spread, natural language processing, and musical source separation involve time series analysis in one way or another. Time permitting, we will do a bit of coding alongside the reading.

Most of the introductory reading will come from Ryan Tibshirani’s notes for Stat 153, a class offered at the University of California, Berkeley. Our reading on novel methods and applications will be more open-ended, but we will try to identify one or two articles.

Yuhan Qian: Introduction to Generative Models

Prerequisites: Stat 311/390 (required); Stat 394 (recommended); Math 208/Linear algebra (recommended); Familiarity with R/Python (recommended)

Project targeted for: Sophomore/Junior

Number of students: 1

In this DRP, we will explore key generative modeling approaches, including generative adversarial networks (GANs), variational autoencoders (VAEs), flow-based models, and diffusion models. We will read foundational papers, work through mathematical derivations, and discuss practical applications. The specific focus will be shaped by the mentee’s interests, whether theoretical insights, algorithmic implementation, or real-world applications.