To Join Via Zoom: To join this seminar, please request Zoom connection details from headsec [at] stat.ubc.ca.
Time: 11:00am - 11:30am
Speaker: Kevin Chern, UBC Statistics MSc student
Title: On estimating marginal likelihoods via Monte Carlo methods based on distribution continua
Abstract: Computing the marginal likelihood (ML) of a model is an integral part of the Bayesian paradigm for model comparison. In practice, computing the ML equates to evaluating a high-dimensional integral, rendering standard numerical methods impractical. Consequently, more sophisticated alternatives such as Monte Carlo (MC) methods are often employed instead. State-of-the-art MC methods include Parallel Tempering (PT) and Sequential Monte Carlo (SMC), both of which leverage a sequence of tempered distributions to improve the efficiency of estimators. PT and SMC have been studied extensively but independently, leaving a gap in the literature for comparing the two. Furthermore, it is unclear how one should allocate a fixed computational budget between two parameters in SMC: the length of the sequence of distributions and the number of particles. While adaptive SMC algorithms exist for automatically determining sequences of distributions, their runtimes are typically random and unknown prior to inference, further complicating the budget allocation problem.
In an attempt to address these challenges, we benchmark the performance of PT and SMC on 14 statistical models, including common models from the physics and biology literature. Our first contribution provides practical recommendations for selecting estimators given characteristics of a model, e.g., multimodal, discrete, or continuous targets. Second, we propose a strategy for approximating the length of the sequence of distributions for an adaptive SMC algorithm. By employing this adaptive SMC algorithm in tandem with an accurate approximation of the random runtime, we provide practical recommendations for allocating budgets in SMC.
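As a rough illustration of the tempering idea mentioned in this abstract (not the speaker's algorithm or benchmark), the sketch below estimates a log marginal likelihood with a basic SMC sampler over a fixed, evenly spaced sequence of tempered distributions. Everything specific here is an assumption made for the example: the toy model (a standard normal prior on theta and one observation y ~ N(theta, 1), chosen because its marginal likelihood is known in closed form), the function names, the particle count, the number of tempering steps, and the random-walk Metropolis move.

```python
import math
import random


def log_prior(theta):
    # Assumed toy prior: theta ~ N(0, 1), normalised.
    return -0.5 * theta * theta - 0.5 * math.log(2 * math.pi)


def log_lik(theta, y):
    # Assumed toy likelihood: y | theta ~ N(theta, 1).
    return -0.5 * (y - theta) ** 2 - 0.5 * math.log(2 * math.pi)


def smc_log_marginal(y, n_particles=1000, n_steps=20, seed=0):
    """Tempered SMC estimate of log p(y) for the toy model above."""
    rng = random.Random(seed)
    # Fixed linear schedule 0 = beta_0 < ... < beta_T = 1 (not adaptive).
    betas = [t / n_steps for t in range(n_steps + 1)]
    # Start from exact draws of the prior (the beta = 0 distribution).
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weights: likelihood raised to the change in beta.
        log_w = [(b - b_prev) * log_lik(th, y) for th in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # The mean incremental weight estimates Z_b / Z_{b_prev}.
        log_z += m + math.log(sum(w) / n_particles)
        # Multinomial resampling at every step.
        particles = rng.choices(particles, weights=w, k=n_particles)
        # One random-walk Metropolis move per particle, targeting
        # prior(theta) * likelihood(theta)^b.
        moved = []
        for th in particles:
            prop = th + rng.gauss(0.0, 0.5)
            log_acc = (log_prior(prop) + b * log_lik(prop, y)) - (
                log_prior(th) + b * log_lik(th, y)
            )
            if rng.random() < math.exp(min(0.0, log_acc)):
                th = prop
            moved.append(th)
        particles = moved
    return log_z


def exact_log_marginal(y):
    # For this toy model, y ~ N(0, 2) marginally.
    return -0.25 * y * y - 0.5 * math.log(4 * math.pi)


estimate = smc_log_marginal(1.5)
truth = exact_log_marginal(1.5)
```

In this sketch the two tuning parameters discussed in the abstract appear as `n_steps` (length of the sequence of distributions) and `n_particles`; the cost of one run grows roughly with their product, which is why a fixed budget forces a trade-off between them.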
Time: 11:30am - 12:00pm
Speaker: Matteo Lepur, UBC Statistics MSc student
Title: A Bayesian Nonparametric Model for Pan-Cancer Analysis
Abstract: Cancers from different tissue types can share a latent structure reflecting commonly altered gene pathways. It is difficult to cluster cancer patients based on this latent structure because the tissue of origin often dominates the latent structure effect. We propose a Bayesian nonparametric model that accounts for the tissue effect and clusters based on a latent structure using a Dirichlet Process prior. Our approach learns the tissue effect by using tissue parameters in a supervised learning setting, while simultaneously learning the latent structure based on the resulting residuals in an unsupervised setting. We demonstrate our model by showing results on synthetic data, semi-synthetic data, and a publicly available dataset from the International Cancer Genome Consortium (ICGC).
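For readers unfamiliar with Dirichlet Process priors, the sketch below draws a random partition from the Chinese restaurant process, which is the distribution over clusterings that a Dirichlet Process induces. It only illustrates how such a prior lets the number of clusters be unbounded; it is not the model described in the abstract (there is no tissue effect, no data, and no inference). The function name `sample_crp`, the concentration value, and the number of items are all assumptions made for the example.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favour more clusters.
    """
    rng = random.Random(seed)
    labels = []  # cluster label assigned to each item, in order
    counts = []  # current size of each cluster
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels, counts


labels, counts = sample_crp(n_items=50, alpha=1.0)
```

The number of clusters is not fixed in advance: it is simply `len(counts)` after the draw, and it tends to grow slowly as more items are added.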