Winter 2024 Projects


Antonio Olivas: Estimating Functions Using Reproducing Kernel Hilbert Spaces

Student: David Sharkansky
Slides | Writeup
Prerequisites: Real analysis

Reproducing kernel Hilbert spaces (RKHSs) are a particular class of Hilbert spaces defined by reproducing kernels. They enjoy a geometric structure similar to that of ordinary Euclidean space and, depending on the kernel, may contain a reasonably broad class of functions. RKHSs are widely used in function estimation problems that require optimizing over a function space, such as interpolation, regression, and density estimation, and they are attractive because many such optimization problems reduce to relatively simple calculations involving the kernel matrix.

In this project we will read most of Chapter 12 of the textbook “High-Dimensional Statistics: A Non-Asymptotic Viewpoint” by Martin J. Wainwright, which covers RKHSs. The goal of this DRP is to understand the properties of RKHSs and to apply this tool to a real-life problem.
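For a taste of how such optimization problems reduce to kernel-matrix calculations, here is a minimal kernel ridge regression sketch in base R (the Gaussian kernel, bandwidth, and penalty level are illustrative choices, not part of the reading):

```r
# Kernel ridge regression: minimize sum((y - f(x))^2) + lambda * ||f||_H^2.
# By the representer theorem, f(x) = sum_i alpha_i k(x, x_i),
# and alpha solves the linear system (K + lambda * I) alpha = y.
set.seed(1)
n <- 100
x <- sort(runif(n, 0, 1))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Gaussian (RBF) kernel matrix
rbf_kernel <- function(a, b, bandwidth = 0.1) {
  exp(-outer(a, b, function(s, t) (s - t)^2) / (2 * bandwidth^2))
}

K <- rbf_kernel(x, x)
lambda <- 0.1
alpha <- solve(K + lambda * diag(n), y)   # closed-form solution

# Fitted function on a grid of new points
x_new <- seq(0, 1, length.out = 200)
f_hat <- rbf_kernel(x_new, x) %*% alpha

plot(x, y, pch = 16, col = "grey")
lines(x_new, f_hat, lwd = 2)
```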



Ethan Ancell: Random Matrix Theory

Student: Abigail Cummings
Slides | Writeup
Student: Hansen Zhang
Slides | Writeup
Prerequisites: (Required): A good foundation in probability theory (STAT 394/395) and linear algebra (MATH 208 or MATH 340). (Optional and awesome): mathematical analysis at the level of MATH 327.

This DRP will be a broad survey of some fundamental ideas in random matrix theory. Over the quarter we will follow some notes from the legendary UCLA mathematician Terence Tao and study a set of tools used to prove the semicircular law, a fundamental result in random matrix theory concerning the asymptotic distribution of the eigenvalues of certain classes of matrices. If time permits, we will explore the connections between this theorem and free probability, a “non-commutative generalization” of probability theory.
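For a preview of the semicircular law itself, here is a minimal base-R simulation (the matrix size and Gaussian entries are illustrative choices; the notes treat much more general settings):

```r
# Empirical eigenvalue distribution of a Wigner matrix vs. the semicircle density
set.seed(1)
n <- 1000

# Symmetric matrix with Gaussian entries (off-diagonal variance 1)
A <- matrix(rnorm(n * n), n, n)
W <- (A + t(A)) / sqrt(2)

# Scale by sqrt(n) so the spectrum converges to the semicircle on [-2, 2]
eigenvalues <- eigen(W / sqrt(n), symmetric = TRUE, only.values = TRUE)$values

hist(eigenvalues, breaks = 50, freq = FALSE,
     main = "Wigner semicircular law", xlab = "eigenvalue")
curve(sqrt(pmax(4 - x^2, 0)) / (2 * pi), add = TRUE, lwd = 2)
```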



Facheng Yu: Sparse Linear Model in High Dimensions

Student: Puyuan Yao
Slides | Writeup
Prerequisites: Linear Algebra

High-dimensional statistics considers settings where we have many more covariates than samples. To use the least squares estimator in this setting, the vector of linear coefficients is assumed to be sparse, which leads to a regularized optimization problem, the Lasso. In this project we focus mainly on the theory of the Lasso under some reasonable assumptions. The reading mainly follows Chapter 7 of High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Wainwright, 2019).
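As a preview, here is a minimal Lasso fit on simulated sparse data using the glmnet package (the dimensions, sparsity level, and use of cross-validation are illustrative choices, not part of the reading):

```r
# Lasso on a simulated sparse high-dimensional problem: p >> n
library(glmnet)

set.seed(1)
n <- 100; p <- 500; s <- 5           # s = number of nonzero coefficients
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, s), rep(0, p - s))  # sparse true coefficient vector
y <- X %*% beta + rnorm(n)

# alpha = 1 gives the Lasso penalty; lambda chosen by cross-validation
fit <- cv.glmnet(X, y, alpha = 1)
beta_hat <- coef(fit, s = "lambda.min")

# How many coefficients does the Lasso keep?
sum(beta_hat != 0)
```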



Kenny Zhang: Hypothesis Testing and Causal Inference

Student: Yuyang Sun
Slides | Writeup
Prerequisites: Knowledge at the level of STAT 394/341 is required.

We will start with basic t tests and paired t tests, then study what happens under model misspecification and, potentially, interference. If time permits, we will investigate rank-based tests and multiple testing. Some of these problems are open in the literature.
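As a concrete starting point, base R already provides the tests we will begin with; here is a minimal sketch on simulated data (sample sizes and effect sizes are made up for illustration):

```r
# Two-sample and paired t tests on simulated data
set.seed(1)
treatment <- rnorm(30, mean = 1.0)
control   <- rnorm(30, mean = 0.5)

# Welch two-sample t test (does not assume equal variances)
t.test(treatment, control)

# Paired t test: before/after measurements on the same 30 subjects
before <- rnorm(30, mean = 0)
after  <- before + rnorm(30, mean = 0.3, sd = 0.5)
t.test(after, before, paired = TRUE)

# A rank-based alternative of the kind we may study later
wilcox.test(treatment, control)
```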



Pawel Morzywolek: Introduction to Causal Inference

Student: Casey Logan
Slides | Writeup
Prerequisites: STAT 311 (or equivalent), a bit of familiarity with R is a plus.

Causal inference is an emerging field of study that aims to identify cause-and-effect relationships from data, which is crucial for determining the effects of interventions. Causal questions are ubiquitous across all scientific disciplines, e.g. “What is the effect of a new medication in the population of interest?”, “What is the optimal time to initiate a treatment?”, “How does fertilizer affect crop yields?”, “How does education affect income?”, etc.

The project aims to provide an introduction to the field of causal inference. It will be based on the book “Causal Inference: What If” (Hernán and Robins, 2020) and some introductory papers.
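As a small preview of one idea treated in the book, here is a minimal sketch of estimating an average treatment effect by outcome-regression standardization on simulated data (all variable names and parameter values are illustrative):

```r
# Average treatment effect by outcome-regression standardization (g-formula)
set.seed(1)
n <- 1000
L <- rnorm(n)                        # confounder
A <- rbinom(n, 1, plogis(0.8 * L))   # treatment assignment depends on L
Y <- 1 + 2 * A + 1.5 * L + rnorm(n)  # outcome; true effect of A is 2

fit <- lm(Y ~ A + L)

# Predict each subject's outcome under A = 1 and under A = 0, then average
Y1 <- predict(fit, newdata = data.frame(A = 1, L = L))
Y0 <- predict(fit, newdata = data.frame(A = 0, L = L))
mean(Y1 - Y0)   # should be close to 2
```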



Ronan Perry: Causal Inference: Regression and Discontinuity Designs

Student: Pranav Madhukar
Slides | Writeup
Prerequisites: Familiarity with basic probability and statistics. Familiarity with regression is highly recommended.

Estimating causal effects is hard, but it can be possible using “natural randomization” such as differences in policy across time or geographic boundaries. This will be a guided reading of Chapter 5 of Mostly Harmless Econometrics, where we will discuss and learn about regression discontinuity designs. We will begin with an overview of regression in general and its role in causal inference before moving on to discontinuity designs. Depending on student interest and time, we will adjust the balance between theory and coding examples in R.
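As a preview, here is a minimal sketch of a sharp regression discontinuity fit on simulated data (the cutoff, bandwidth, and effect size are made up for illustration):

```r
# Sharp regression discontinuity: treatment switches on at a known cutoff
set.seed(1)
n <- 500
running <- runif(n, -1, 1)               # running variable
treated <- as.numeric(running >= 0)      # cutoff at 0
outcome <- 1 + 0.5 * running + 2 * treated + rnorm(n, sd = 0.5)  # true jump = 2

# Local linear fit on each side of the cutoff, within a bandwidth h
h <- 0.5
nearby <- abs(running) <= h
fit <- lm(outcome ~ treated + running + treated:running, subset = nearby)

# Estimated jump at the cutoff = coefficient on the treatment indicator
coef(fit)["treated"]
```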



Yuhan Qian: Introduction to Tree-based Models

Student: Zikun Zheng
Slides | Writeup
Prerequisites: Familiarity with Python or R. Familiarity with the basics of probability and statistics.

Explore the dynamic world of tree-based models! From the basics of decision trees to the power of random forests, this project will first cover classification and regression trees and the ideas of boosting, bagging, and ensembling. Next, we will mainly explore random forests and XGBoost, the preferred tools when an RTX 4090 is out of reach. If time permits, we will talk about other intriguing topics such as isolation forests and deep forests.
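As a preview, here is a minimal sketch comparing a single classification tree to a random forest on the built-in iris data (the rpart and randomForest packages and all tuning values are illustrative choices):

```r
# A single classification tree vs. a random forest on the iris data
library(rpart)
library(randomForest)

set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Single decision tree
tree_fit <- rpart(Species ~ ., data = train)
tree_pred <- predict(tree_fit, test, type = "class")
mean(tree_pred == test$Species)            # accuracy of one tree

# Random forest: bagging + random feature subsets at each split
rf_fit <- randomForest(Species ~ ., data = train, ntree = 500)
rf_pred <- predict(rf_fit, test)
mean(rf_pred == test$Species)              # accuracy of the ensemble
```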



Zhaoxing Wu: Classifying High-Dimensional Data

Student: Bowen Dong
Slides | Writeup
Prerequisites: Some programming experience with R

In machine learning, classification is the task of assigning a class label to examples from the problem domain. However, high dimensionality poses significant statistical challenges and renders many traditional classification algorithms impractical. In this project, we will first learn or review (depending on the student’s background) some classical supervised classification techniques and discuss the curse of dimensionality. Next, we will mainly explore Penalized Discriminant Analysis (PDA), an extension of classical Linear Discriminant Analysis designed to classify high-dimensional data. It classifies data by finding optimal lower-dimensional projections that reveal “interesting structures” in the original dataset. If time permits, the student will implement PDA to analyze a real-life dataset of their choice or some simple toy examples.
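As a rough preview of the idea behind PDA, here is a hand-rolled two-class sketch in which the within-class covariance is regularized with a ridge penalty before forming the discriminant direction (this is a simplification for illustration, not the full method covered in the project):

```r
# Ridge-regularized linear discriminant: a simplified preview of the PDA idea
set.seed(1)
n <- 50; p <- 200                       # more features than samples
mu <- c(rep(1, 10), rep(0, p - 10))     # classes differ in the first 10 features
X <- rbind(matrix(rnorm(n * p), n, p),
           matrix(rnorm(n * p), n, p) + matrix(mu, n, p, byrow = TRUE))
y <- rep(c(0, 1), each = n)

# Class means and (singular) pooled within-class covariance
m0 <- colMeans(X[y == 0, ]); m1 <- colMeans(X[y == 1, ])
Sw <- (cov(X[y == 0, ]) + cov(X[y == 1, ])) / 2

# Classical LDA would need solve(Sw, m1 - m0), but Sw is not invertible here;
# penalizing it (Sw + lambda * I) makes the direction well-defined.
lambda <- 1
w <- solve(Sw + lambda * diag(p), m1 - m0)   # penalized discriminant direction

# Classify by projecting onto w and thresholding at the midpoint of the class scores
scores <- X %*% w
threshold <- (mean(scores[y == 0]) + mean(scores[y == 1])) / 2
mean((scores > threshold) == (y == 1))       # training accuracy
```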