Doctoral candidate in Biostatistics, Yi Yang, will present:
“Bayesian Hierarchical Models for Multi-Variant and Multi-Trait Genome-Wide Association Studies”
PhD Advisers: Lin Zhang and Saonli Basu
Abstract: While genome-wide association studies (GWASs) have been widely used to identify associations between complex diseases and genetic variants, standard single-variant and single-trait analyses often have limited power when variants are in linkage disequilibrium, occur at low frequency, or are associated with multiple correlated traits. In this dissertation, we propose three Bayesian hierarchical models for multi-variant and multi-trait GWASs based on the hierarchically structured variable selection (HSVS) framework: the generalized fused HSVS (HSVS-GF), the adaptive HSVS (HSVS-A), and the multivariate HSVS (HSVS-M). HSVS is a discrete mixture prior composed of a point mass at zero and a multivariate scale-mixing normal distribution for modeling the effects of variants. As extensions of the HSVS framework, the proposed methods have the flexibility to account for various correlation structures, which allows them to borrow strength extensively from multiple correlated variants and traits. As Bayesian methods, they can also integrate complex genetic information into the priors and thus boost power by leveraging information from various sources. In addition to testing associations, the proposed methods simultaneously produce posterior effect estimates for individual variants, a useful feature that most competing methods lack. Specifically, HSVS-GF is a pathway-based method that uses summary statistics and pathway structural information to identify the association of a disease with variants in a pathway. HSVS-A is a set-based method that tests the association of a continuous or dichotomous trait with rare variants in a set and estimates the effects of individual rare variants. HSVS-M is a multi-variant and multi-trait method that uses summary statistics both to test the association of variants in a gene with multiple correlated traits and to estimate the strength of association of the gene with each trait. Through analysis of simulated data in various scenarios and GWAS data from the Wellcome Trust Case Control Consortium and the Global Lipids Genetics Consortium, we show that the proposed methods can substantially outperform the competing methods and identify novel causal variants.