Speaker: Qiaoyue Tang, UBC Statistics Co-op M.Sc. Student
Title: Classification of rare cancer subtypes
Abstract: In many biomedical datasets, the goal is to predict outcomes such as disease subtypes from a set of features such as patient characteristics, RNA expression, or images. These data are usually imbalanced, and the interest lies in correctly classifying the minority outcomes, which represent rare subtypes. High dimensionality often makes the problem more difficult. In this talk, I will discuss common sampling strategies and algorithm-level approaches for handling classification tasks with imbalanced data.
Speaker: Jonathan Steif, UBC Statistics Co-op M.Sc. Student
Title: Histone Modifications in Diverse Human Cell-Types: An Integrative Analysis of 70 ChIP-seq Datasets
Abstract: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a relatively recent advance in the study of protein-DNA binding. High costs associated with deep sequencing are still a limiting factor for most researchers, leading to published datasets with low numbers of biological replicates and often no technical replicates. These inadequate sample sizes have left several fundamental questions about the consistency and variability of protein-DNA binding unanswered. Namely, are certain genomic regions consistently marked by proteins in samples of the same disease or cell-type? Are these marked genomic regions observed across samples of different cell-types? To what extent does the choice of signal detection algorithm (peak-caller) bias results?
This integrative analysis of 70 datasets available via the International Human Epigenome Consortium employs machine learning techniques to examine the variability in the enrichment of six histone modifications in both cancerous and normal human tissues. Traditional peak-calling algorithms are limited in that they cannot take multiple ChIP-seq samples as inputs. The statistical assumptions behind a range of popular algorithms are critiqued, motivating the need for new algorithms better suited to large-scale analyses.