Spring 2022

We may add 1-2 more projects before the quarter begins. All students who apply will be automatically considered for the unlisted projects.

Andrea Boskovic and Harshil Desai: NBA Analytics and Machine Learning

  • Prerequisites: Some experience in R or Python; some knowledge about basketball
  • Have you ever wondered how to predict which NBA rookie will become an All-Star, or how teams choose which players to draft? In this project, we will explore NBA data to build a model that predicts a basketball-related outcome. We will start with an introduction to basic machine learning models, learn how to implement them in R or Python, and evaluate the models we've created. Potential directions could include (but are definitely not limited to) ranking players based on box scores and advanced stats, predicting who will be the MVP, or predicting a team's odds of making the playoffs in a given year. We are willing to mentor two students!

Nina Galanter: Optimal Treatment Rules: Causal Inference and Statistical Learning

  • Prerequisites: Some familiarity with conditional probability, linear regression, and R
  • In many biomedical and public health applications of statistics, we are interested in determining the best treatment. However, people and their specific situations vary, and in some cases one treatment does not fit all! Instead, we can create a treatment rule that takes a subject's variables as input and predicts the best treatment for them. Optimal treatment rules involve both causal inference and statistical learning, as we create rules based on estimated treatment effects. This project will first go over causal inference foundations and then explore Q-learning methods for treatment rules, which might include regression, penalized regression, or generalized additive models depending on time and the student's background. We will use R to evaluate the methods with simulated data.

Anna Neufeld and Alan Min: Introduction to Computational Biology

  • Prerequisites: Programming experience (preferably in R). Knowledge of probability distributions at the level of Math/Stat 394 or Stat 340 is preferred but not required.
  • Given the massive amounts of data available from next-generation genome sequencing, sequence alignment methods are necessary to align genomic reads to reference genomes. Alignment tools make it possible to identify genetic variation and mutation, leading to biological discovery. We plan to work with the textbook "Computational Genome Analysis," by Deonier, Waterman, and Tavaré (available for free online). We will start with some background reading on the necessary biological context, and then we will read about statistical concepts related to sequence alignment problems that are common in modern computational biology. After gaining this background, we will learn about modern algorithms for sequence alignment. We are hoping to mentor two students!
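To give a flavor of what a sequence alignment algorithm computes, here is a minimal sketch of the classic Needleman-Wunsch dynamic program for global alignment scoring. It is an illustration only, not necessarily one of the algorithms the project will cover; the function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions, not taken from the
    project description.
    """
    n, m = len(a), len(b)
    # prev[j] = best score for aligning the empty prefix of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # curr[0] = score for aligning a[:i] against an empty string
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    return prev[m]


identical = global_alignment_score("GATTACA", "GATTACA")
one_deletion = global_alignment_score("ACGT", "AGT")
print(identical, one_deletion)
```

With these scores, two identical 7-letter sequences score 7, and "ACGT" against "AGT" scores 2 (three matches and one gap). Read aligners used in practice rely on indexing and heuristics to scale to whole genomes; this sketch shows only the underlying scoring idea.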
Reading and Research Opportunity on Voting

  • Mentors: Prof. Elena Erosheva, Michael Pearce, Prof. Conor Mayo-Wilson

  • Prerequisites: Computational skills (R required; other knowledge and experience, e.g., with Python, is desirable). Preference given to Statistics and CSE majors and to candidates with the interest and ability to continue with the project in Summer and Fall 2022.
  • In peer review settings, groups or panels of experts are tasked with evaluating submissions such as grant proposals or job candidate materials. For each submission, individual input is often given as a numeric score or a letter grade. The average or median of such scores is often used to summarize the collective opinion of a panel of experts. In this project, we will consider other ways to aggregate expert opinions by drawing a parallel between panel decisions and elections or voting. All voting procedures have two key features: the types of input that are used and how these inputs are aggregated. Examples of voting procedures include majority rule, Borda rule, single transferable vote, and majority judgement. Voting procedures matter in that the choice of voting procedure can change panel outcomes, that is, which candidate(s) or proposal(s) are preferred. Social choice theory demonstrates that (a) no voting procedure for selecting one out of three or more choices can simultaneously satisfy a small number of natural desiderata (this result is known as Arrow's Impossibility Theorem), that (b) every voting procedure satisfies some desiderata but not others, and that (c) election outcomes can differ depending on what voting system is used. Points (a)-(c) constitute compelling reasons for better understanding the influence of aggregation methods on panel-level outcomes: we will critically assess properties of voting procedures and whether these properties should be required or desired in panel opinion aggregation methods used in peer review. The project will involve applying social choice algorithms (e.g., Borda rule and Majority Judgement) to de-identified data on panel grant peer review.

Antonio Olivas: Estimation for cancer screening models using deconvolution
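Point (c) can be illustrated with a small sketch. The five ballots below are made up for illustration (they are not project data), and the function names are hypothetical. Each ballot ranks three proposals from most to least preferred. A first-choice count, used here as a simple stand-in for majority rule, and the Borda rule pick different winners.

```python
from collections import Counter

# Hypothetical panel: five reviewers each rank proposals A, B, C (best first).
ballots = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["B", "C", "A"],
]


def first_choice_winner(ballots):
    """Return the candidate ranked first on the most ballots."""
    firsts = Counter(ballot[0] for ballot in ballots)
    return max(firsts, key=firsts.get)


def borda_scores(ballots):
    """Borda rule: with m candidates, rank position i (0-based) earns m - 1 - i points."""
    scores = Counter()
    for ballot in ballots:
        m = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - position
    return dict(scores)


scores = borda_scores(ballots)
borda_winner = max(scores, key=scores.get)
print(first_choice_winner(ballots), scores, borda_winner)
```

Here A is ranked first by three of the five reviewers, but B has the higher Borda total (7 points to A's 6) because no reviewer ranks B last. Neither sketch handles ties, which a real analysis would need to address.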

  • Prerequisites: Calculus (MATH 126) and exposure to probability theory (STAT 340).
  • Cancer screening programs are an important component of secondary cancer prevention. To understand the conditions under which a cancer screening program provides the most benefit, mathematical models are used to estimate relevant quantities using information from cancer screening trials. In the natural history of a cancer, the time to (subclinical) cancer onset and the sojourn/latent time (the time between onset and clinical appearance) are two quantities of interest, but they are impossible to observe separately. However, by using a screening tool we obtain some information that allows us to differentiate between these two components. In this project we will study a mathematical model that uses aggregate-level information from a cancer screening trial to estimate mean time to onset, mean sojourn time, and sensitivity of the screening test, via the deconvolution formula and maximum likelihood estimation.

Rrita Zejnullahi: Introduction to Human Rights Statistics

  • Prerequisites: Some exposure to survey sampling and regression analysis.
  • In this DRP project, we consider the application of statistical methodology to human rights. Topics include missing females, criminal justice, violence against women, hunger, and poverty. By the end of the project, we will be able to describe ways that statistical methods can be applied to human rights problems and identify areas that need new methods to be developed. In the first half, we will read and discuss research papers. In the second half, we will pick a paper to replicate, with the exact choice of topic at the student's discretion. This project will be mostly remote (meetings via Zoom).