Title: Predicting noncoding disease causal mutations in central nervous system through deep learning
Abstract:
The majority of GWAS-identified mutations lie in noncoding regions, making it challenging to pinpoint the true disease-causal mutations within a complex linkage disequilibrium (LD) block and to quantify their causality. To address this problem, we developed a deep learning (DL) algorithm that accurately predicts cell type-specific enhancers and transcription factor binding sites (TFBSs) from raw DNA sequences, and used it to guide the subsequent identification of phenotype-causal enhancer mutations. In a pilot study of 10 GWAS SNPs linked to brain diseases, whose causality had previously been validated experimentally, we correctly identified 8 of the 10 SNPs as causative within their corresponding LD blocks. Among 27,488 GWAS LD SNPs associated with schizophrenia, autism, and several other central nervous system (CNS) diseases, 3% reside in predicted TFBSs active in the fetal brain, and of these, 11 SNPs are predicted to be candidate causal variants. These results suggest that our algorithm can be applied to resolve LD blocks of other disease-associated SNPs for which no experimental profiling has yet been done, and to accurately identify disease-causative SNPs as well as the putatively affected regulatory mechanisms and pathways.
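The general idea of resolving an LD block with a sequence-based predictor can be illustrated with a minimal sketch. This is not the algorithm described above: the scoring function `toy_enhancer_score`, the motif it counts, the example sequence, and the SNP names are all hypothetical stand-ins for a trained model and real data. The sketch only shows the ranking step, in which each SNP in a block is scored by how much its alternate allele changes the predicted regulatory activity of the surrounding sequence.

```python
from typing import Callable, List, Tuple

# (name, 0-based position, reference allele, alternate allele)
Snp = Tuple[str, int, str, str]


def toy_enhancer_score(seq: str) -> float:
    """Hypothetical stand-in for a trained sequence model.

    Counts occurrences of one illustrative motif; a real predictor would
    return a learned enhancer or TFBS activity score instead.
    """
    motif = "TGACTCA"  # assumption: arbitrary motif chosen for illustration
    width = len(motif)
    return float(sum(seq[i:i + width] == motif
                     for i in range(len(seq) - width + 1)))


def apply_snp(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the sequence carrying the alternate allele at `pos`."""
    if seq[pos] != ref:
        raise ValueError(f"reference allele mismatch at position {pos}")
    return seq[:pos] + alt + seq[pos + 1:]


def rank_ld_block(seq: str, snps: List[Snp],
                  score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Rank SNPs by the absolute change in predicted score (alt minus ref)."""
    ref_score = score(seq)
    deltas = [(name, score(apply_snp(seq, pos, ref, alt)) - ref_score)
              for name, pos, ref, alt in snps]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical LD block: three SNPs on one short reference sequence.
reference = "AAAATGACTCAAAAA"
block = [
    ("snp_a", 1, "A", "C"),   # outside the motif
    ("snp_b", 6, "A", "G"),   # inside the motif
    ("snp_c", 13, "A", "T"),  # outside the motif
]
ranked = rank_ld_block(reference, block, toy_enhancer_score)
print(ranked)
```

In this toy block only `snp_b` changes the score, so it ranks first as the candidate causal SNP; the other two leave the predicted activity unchanged.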