Genome-wide association studies (GWAS) have become a routine approach for
identifying genes involved in human complex diseases and other traits [1].
Because a GWAS is the starting point for follow-up studies of genetics and
mechanism, a key challenge in interpreting GWAS data is to identify causal SNPs
and to provide solid evidence of how they affect the trait [2]. ICSNPathway is
a web server developed to discover candidate causal SNPs and their
corresponding candidate causal pathways from GWAS data.
The input of ICSNPathway is the full list of GWAS SNP P-values. The output is a list of
candidate causal SNPs and candidate causal pathways, expressed as hypotheses of
the form SNP -> gene -> pathway(s): the candidate causal SNP alters the role of
its corresponding gene/protein in the context of the pathway(s). Other methods
and web tools focus either on inferring candidate causal SNPs by annotating
SNPs to genes (e.g. [3-4]) or on identifying disease-related pathways by
pathway-based analysis (e.g. [5-6]). ICSNPathway instead implements a combined
analytical pipeline that integrates linkage disequilibrium (LD) analysis,
functional SNP annotation and pathway-based analysis (PBA). It thereby helps to
bridge the gap between the existing methods/web tools, and the gap between GWAS
SNP P-values and biological mechanism. To our knowledge, no other web server
currently provides functionality similar to that of ICSNPathway.
ICSNPathway rests on two key concepts. The first is LD analysis,
which searches for SNPs in LD with the most significant SNPs, based on the
extended data set from HapMap [7], so that more potential candidate causal SNPs
are captured. The second is the functional SNP. ICSNPathway pre-selects
candidate causal SNPs from among functional SNPs, which are important for
understanding the underlying genetics of human health. A functional SNP is
defined as a SNP that may alter a protein, gene expression, or the role of a
protein in the context of a pathway. Functional SNPs include deleterious and
non-deleterious non-synonymous SNPs, SNPs leading to the gain or loss of a stop
codon, SNPs resulting in a frame shift, SNPs in essential splice sites
(the first two bp and last two bp of an intron), and SNPs in regulatory regions
(i.e. DNase I hypersensitive sites, which mark open chromatin; histone
modification sites; CCCTC-binding factor (CTCF) sites, which characterize
insulator/enhancer elements; and transcription factor binding sites (TFBSs))
[8]. ICSNPathway applies a PBA algorithm named i-GSEA (improved gene set enrichment
analysis), developed by our research group [6], to the full list of GWAS SNP
P-values to detect pathways associated with the trait. The key idea of i-GSEA
is to pick out pathways that contain a high proportion of significant genes, in
order to study the combined effects of SNPs/genes whose individual effects in
complex disease may be modest.
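The idea of favouring pathways with a high proportion of significant genes can be illustrated with a small sketch. This is a simplified illustration only, not the i-GSEA implementation: it assumes each gene is represented by the smallest P-value among its mapped SNPs, uses a hypothetical significance cutoff, and omits the enrichment score and permutation steps of the real method. All function names, identifiers and data values below are made up for the example.

```python
from typing import Dict, List, Set


def gene_pvalues(snp_pvalues: Dict[str, float],
                 snp_to_genes: Dict[str, List[str]]) -> Dict[str, float]:
    """Represent each gene by the smallest P-value among its mapped SNPs."""
    best: Dict[str, float] = {}
    for snp, p in snp_pvalues.items():
        for gene in snp_to_genes.get(snp, []):
            if gene not in best or p < best[gene]:
                best[gene] = p
    return best


def significant_gene_ratio(pathway: Set[str],
                           gene_p: Dict[str, float],
                           cutoff: float) -> float:
    """Proportion of significant genes in the pathway divided by the
    proportion of significant genes among all genes.
    Values above 1 suggest the pathway is enriched for significant genes."""
    in_pathway = [g for g in pathway if g in gene_p]
    if not in_pathway or not gene_p:
        return 0.0
    k = sum(gene_p[g] <= cutoff for g in in_pathway) / len(in_pathway)
    K = sum(p <= cutoff for p in gene_p.values()) / len(gene_p)
    return k / K if K > 0 else 0.0


# Toy data (hypothetical SNP IDs, genes and P-values).
snp_pvalues = {"rs1": 1e-6, "rs2": 0.2, "rs3": 0.003, "rs4": 0.5, "rs5": 0.8}
snp_to_genes = {"rs1": ["G1"], "rs2": ["G1", "G2"], "rs3": ["G3"],
                "rs4": ["G4"], "rs5": ["G2"]}

gene_p = gene_pvalues(snp_pvalues, snp_to_genes)
enriched = significant_gene_ratio({"G1", "G3"}, gene_p, cutoff=0.01)
depleted = significant_gene_ratio({"G2", "G4"}, gene_p, cutoff=0.01)
print(enriched, depleted)
```

In this toy example the pathway containing G1 and G3 has every gene significant while only half of all genes are, so its ratio is above 1; the pathway containing G2 and G4 has none.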
ICSNPathway runs in four main steps: (1) search for SNPs in LD with the most
significant SNPs, based on the LD information of a specific HapMap population
[7], then functionally annotate these SNPs and select the functional SNPs;
(2) extract the genes and pathways corresponding to the selected functional
SNPs; (3) perform pathway-based analysis (PBA) [6, 9] on the GWAS SNP P-values,
using the pre-selected pathways as the search space; (4) identify candidate
causal SNPs and pathways to generate hypotheses about the disease mechanism.
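The four steps can be sketched as a small data flow. This is a minimal illustration under stated assumptions, not the ICSNPathway implementation: the function name, the dictionary-based inputs and the P-value cutoff are hypothetical, each SNP is mapped to a single gene for simplicity, and the PBA of step (3) is replaced by a precomputed set of significant pathways supplied by the caller.

```python
from typing import Dict, List, Set, Tuple


def candidate_hypotheses(
    snp_pvalues: Dict[str, float],
    ld_partners: Dict[str, Set[str]],
    functional_snps: Set[str],
    snp_to_gene: Dict[str, str],
    gene_to_pathways: Dict[str, Set[str]],
    significant_pathways: Set[str],
    p_cutoff: float,
) -> List[Tuple[str, str, str]]:
    """Return (SNP, gene, pathway) hypotheses from toy lookup tables."""
    # Step 1: take the most significant SNPs, add the SNPs in LD with them,
    # and keep only those annotated as functional.
    top = {s for s, p in snp_pvalues.items() if p <= p_cutoff}
    extended = set(top)
    for s in top:
        extended |= ld_partners.get(s, set())
    selected = extended & functional_snps

    # Step 2: the pathways of the selected SNPs' genes form the search space.
    search_space: Set[str] = set()
    for s in selected:
        gene = snp_to_gene.get(s)
        if gene is not None:
            search_space |= gene_to_pathways.get(gene, set())

    # Step 3 (stand-in): a real run would perform PBA on the search space;
    # here the significant pathways are assumed to be given.
    hits = search_space & significant_pathways

    # Step 4: report SNP -> gene -> pathway hypotheses.
    hypotheses: List[Tuple[str, str, str]] = []
    for s in sorted(selected):
        gene = snp_to_gene.get(s)
        if gene is None:
            continue
        for pathway in sorted(gene_to_pathways.get(gene, set()) & hits):
            hypotheses.append((s, gene, pathway))
    return hypotheses


# Toy data (hypothetical identifiers).
result = candidate_hypotheses(
    snp_pvalues={"rs1": 1e-7, "rs2": 0.4, "rs3": 0.6},
    ld_partners={"rs1": {"rs2"}},
    functional_snps={"rs2", "rs3"},
    snp_to_gene={"rs2": "G1", "rs3": "G2"},
    gene_to_pathways={"G1": {"P1", "P2"}, "G2": {"P1"}},
    significant_pathways={"P1"},
    p_cutoff=1e-5,
)
print(result)
```

Here rs2 is not significant on its own, but it is functional and in LD with the significant SNP rs1, so it is the one reported as a candidate causal SNP.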
References
[1] McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein,
D.B., Little, J., Ioannidis, J.P. and Hirschhorn, J.N. (2008) Genome-wide
association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet, 9, 356-369.
[2] McCarthy, M.I. and Hirschhorn, J.N. (2008) Genome-wide
association studies: potential next steps on a genetic journey. Hum Mol Genet, 17, R156-165.
[3] Schmitt, A.O., Assmus, J., Bortfeldt, R.H. and Brockmann,
G.A. (2010) CandiSNPer: a web tool for the identification of candidate SNPs for
causal variants. Bioinformatics, 26, 969-970.
[4] Saccone, S.F., Bolze, R., Thomas, P., Quan, J., Mehta, G.,
Deelman, E., Tischfield, J.A. and Rice, J.P. (2010) SPOT: a web-based tool for
using biological databases to prioritize SNPs after a genome-wide association
study. Nucleic Acids Res, 38, W201-209.
[5]
[6] Zhang, K., Cui, S., Chang, S., Zhang, L. and Wang, J. (2010)
i-GSEA4GWAS: a web server for identification of pathways/gene sets associated
with traits by applying an improved gene set enrichment analysis to genome-wide
association study. Nucleic Acids Res,
38 Suppl, W90-95.
[7] Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E.,
Schaffner, S.F., Yu, F., Bonnen, P.E., de Bakker, P.I., Deloukas, P., Gabriel,
S.B. et al. (2010) Integrating common
and rare genetic variation in diverse human populations. Nature, 467, 52-58.
[8] Flicek, P., Aken, B.L., Ballester, B., Beal, K., Bragin, E.,
Brent, S., Chen, Y., Clapham, P., Coates, G., Fairley, S. et al. (2010) Ensembl's 10th year. Nucleic Acids Res, 38,
D557-562.
[9] Wang, K., Li, M. and Hakonarson, H. (2010) Analysing
biological pathways in genome-wide association studies. Nat Rev Genet, 11,
843-854.