Master's candidate in Biostatistics, Katrina Harper, will present:
“Clustering Methods for Single-Cell RNA-Sequencing Data”
Plan B Adviser: Erika Helgeson
Abstract: As next-generation sequencing technology has advanced, access to single-cell RNA-sequencing (scRNA-seq) data has grown, along with interest in analyzing it via clustering. scRNA-seq data have two qualities that require attention in analysis: high dimensionality and zero inflation (i.e., dropout events). There is not yet a scientific consensus on the best clustering approach for this type of data. This paper explores data pre-processing steps and various clustering methods to optimize performance in the analysis of scRNA-seq data. For data pre-processing, we consider imputation and data transformation. For clustering, we consider various distance metrics and clustering algorithms. We implement combinations of these four parameters on two real scRNA-seq datasets. Overall, the best combination of parameters, as assessed by the adjusted Rand index and the full clustering table, used SAVER imputation, the centered log-ratio transformation, the φs proportionality distance, and K-medoids clustering.
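For readers unfamiliar with the pieces named in the abstract, the following is a minimal sketch of how a centered log-ratio transformation, a φs distance, K-medoids clustering, and the adjusted Rand index could fit together. It is not the presenter's implementation. Several choices here are assumptions: the pseudocount of 1 used to handle zeros, the form φs = var(a − b) / var(a + b) computed between cells across genes, the simple alternating K-medoids update (rather than a specific PAM variant), and every function name. SAVER imputation is omitted, and the toy counts are simulated purely for illustration.

```python
import numpy as np

def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio per cell (row): log(x) minus that row's mean log(x).
    The pseudocount is an assumption used only to avoid log(0)."""
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def phi_s_distance(clr):
    """Pairwise phi_s between rows, taken here as var(a - b) / var(a + b)."""
    n = clr.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.var(clr[i] - clr[j], ddof=1)
            den = np.var(clr[i] + clr[j], ddof=1)
            dist[i, j] = dist[j, i] = num / den
    return dist

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating K-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the truth-by-cluster contingency table."""
    _, t = np.unique(np.asarray(truth), return_inverse=True)
    _, p = np.unique(np.asarray(pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def pairs(x):
        return x * (x - 1) / 2.0  # number of unordered pairs

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(t.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)

# Toy usage on simulated counts (illustrative only, not real scRNA-seq data).
rng = np.random.default_rng(1)
group_means = np.array([[20, 20, 1, 1, 5, 5], [1, 1, 20, 20, 5, 5]])
true_labels = np.repeat([0, 1], 10)
toy_counts = rng.poisson(group_means[true_labels])
toy_dist = phi_s_distance(clr_transform(toy_counts))
toy_labels, toy_medoids = k_medoids(toy_dist, k=2)
toy_ari = adjusted_rand_index(true_labels, toy_labels)
```

An adjusted Rand index of 1 indicates that the clustering matches the reference labels exactly (up to relabeling), while values near 0 indicate agreement no better than chance.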