Genome-wide association studies (GWAS) have become a routine approach for identifying genes involved in human complex diseases and other traits [1]. Because GWAS serve as the starting point for further genetic and mechanistic studies, a key challenge in interpreting GWAS data is to identify causal SNPs and to provide strong evidence of how they affect the trait [2]. ICSNPathway is a web server developed to identify candidate causal SNPs and their corresponding candidate causal pathways from GWAS data.

The input to ICSNPathway is the full list of GWAS SNP P-values. The output is a list of candidate causal SNPs and candidate causal pathways, each expressed as a hypothesis of the form SNP -> gene -> pathway(s), meaning that the candidate causal SNP alters the role of its corresponding gene/protein in the context of the pathway(s). Unlike other methods and web tools, which focus either on inferring candidate causal SNPs by annotating SNPs to genes (e.g. [3-4]) or on identifying disease-related pathways through pathway-based analysis (e.g. [5-6]), ICSNPathway implements a combined analytical pipeline that integrates linkage disequilibrium (LD) analysis, functional SNP annotation and pathway-based analysis (PBA). ICSNPathway thus helps bridge the gap between these separate methods/web tools, as well as the gap between GWAS SNP P-values and biological mechanism. To our knowledge, no other web server currently provides functionality similar to that of ICSNPathway.

Two concepts are central to ICSNPathway. The first is LD analysis, which searches for SNPs in LD with the most significant SNPs, based on the extended data set from HapMap [7], so that more candidate causal SNPs can be captured. The second is the functional SNP. ICSNPathway pre-selects candidate causal SNPs from among functional SNPs, which are important for understanding the underlying genetics of human health. A functional SNP is defined as a SNP that may alter a protein, gene expression, or the role of a protein in the context of a pathway. Functional SNPs include deleterious and non-deleterious non-synonymous SNPs, SNPs leading to the gain or loss of a stop codon, SNPs resulting in a frame shift, SNPs in essential splice sites (the first two bp and last two bp of an intron), and SNPs in regulatory regions (i.e. DNase I hypersensitive sites, which mark open chromatin; histone modification sites; CCCTC-binding factor (CTCF) sites, which characterize insulator/enhancer elements; and transcription factor binding sites (TFBSs)) [8].

In addition to these two concepts, ICSNPathway applies a PBA algorithm named i-GSEA (improved gene set enrichment analysis), developed by our research group [6], to the full list of GWAS SNP P-values to detect pathways associated with traits. The key idea of i-GSEA is to select pathways that contain a high proportion of significant genes, in order to study the combined effects of SNPs/genes with possibly modest individual effects in complex disease.
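
As a rough illustration of the "high proportion of significant genes" idea, the Python sketch below compares the fraction of significant genes inside a pathway with the fraction among all genes. It is not the published i-GSEA implementation [6]. The gene and pathway names, the P-value cutoff, the use of one P-value per gene (for example the smallest P-value among its SNPs) and the function names are all assumptions made for this example.

```python
def significant_proportion(genes, gene_p, cutoff):
    """Fraction of genes with a known P-value that is <= cutoff."""
    scored = [g for g in genes if g in gene_p]
    if not scored:
        return 0.0
    return sum(gene_p[g] <= cutoff for g in scored) / len(scored)


def enrichment_ratio(pathway_genes, gene_p, cutoff):
    """Proportion of significant genes in the pathway divided by the
    proportion of significant genes among all genes (illustrative only)."""
    background = significant_proportion(gene_p.keys(), gene_p, cutoff)
    if background == 0.0:
        return 0.0
    return significant_proportion(pathway_genes, gene_p, cutoff) / background


# Hypothetical per-gene P-values (assumed: smallest SNP P-value per gene).
gene_p = {"GENE_A": 1e-6, "GENE_B": 0.3, "GENE_C": 2e-5, "GENE_D": 0.8}

# Hypothetical pathway definitions.
pathways = {
    "pathway_1": {"GENE_A", "GENE_C"},
    "pathway_2": {"GENE_B", "GENE_D"},
}

CUTOFF = 1e-4  # assumed significance cutoff for a gene
ratios = {
    name: enrichment_ratio(genes, gene_p, CUTOFF)
    for name, genes in pathways.items()
}
print(ratios)
```

In this toy data half of all genes are significant, so a pathway in which every gene is significant receives a ratio of 2.0, while a pathway with no significant genes receives 0.0. A ratio above 1 suggests that the pathway holds more significant genes than expected from the background.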

ICSNPathway runs in four main steps: (1) search for SNPs in LD with the most significant SNPs, using the LD information of a specified HapMap population [7], perform functional annotation of these SNPs and select the functional SNPs; (2) extract the genes and pathways corresponding to the selected functional SNPs; (3) perform pathway-based analysis (PBA) [6, 9] on the GWAS SNP P-values, using the pre-selected pathways as the search space; (4) identify candidate causal SNPs and pathways to generate hypotheses about the disease mechanism.
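
The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's actual code: the SNP, gene and pathway identifiers, the lookup tables, both cutoffs and the toy_pba stand-in for the PBA step are all hypothetical, and real LD information and annotations would come from HapMap [7] and a functional annotation source such as [8].

```python
def run_pipeline(snp_p, ld_partners, functional, snp_gene, gene_pathways,
                 pba, snp_cutoff, pathway_cutoff):
    """Return (SNP, gene, pathway) hypotheses from toy lookup tables."""
    # Step 1: take the most significant SNPs, add the SNPs in LD with
    # them, and keep only those annotated as functional.
    top = {s for s, p in snp_p.items() if p <= snp_cutoff}
    extended = set(top)
    for s in top:
        extended |= ld_partners.get(s, set())
    selected = {s for s in extended if s in functional}

    # Step 2: map each selected functional SNP to its gene and pathways.
    links = []
    for s in sorted(selected):
        gene = snp_gene.get(s)
        if gene is None:
            continue
        for pathway in sorted(gene_pathways.get(gene, ())):
            links.append((s, gene, pathway))
    search_space = {pathway for _, _, pathway in links}

    # Step 3: run pathway-based analysis on the pre-selected pathways only.
    pathway_p = pba(snp_p, search_space)

    # Step 4: keep SNP -> gene -> pathway links whose pathway is significant.
    return [
        (s, g, pw) for s, g, pw in links
        if pathway_p.get(pw, 1.0) <= pathway_cutoff
    ]


def toy_pba(snp_p, search_space):
    """Placeholder for a real PBA method; returns made-up pathway P-values."""
    return {pw: 0.01 if pw == "pathway_1" else 0.5 for pw in search_space}


# Hypothetical inputs.
snp_p = {"rs1": 1e-8, "rs2": 0.4, "rs3": 0.02}
ld_partners = {"rs1": {"rs4"}}  # rs4 is in LD with rs1
functional = {"rs4": "non-synonymous", "rs3": "essential splice site"}
snp_gene = {"rs4": "GENE_A", "rs3": "GENE_B"}
gene_pathways = {"GENE_A": {"pathway_1", "pathway_2"}, "GENE_B": {"pathway_3"}}

hypotheses = run_pipeline(snp_p, ld_partners, functional, snp_gene,
                          gene_pathways, toy_pba,
                          snp_cutoff=1e-5, pathway_cutoff=0.05)
for snp, gene, pathway in hypotheses:
    print(f"{snp} -> {gene} -> {pathway}")
```

Here rs4 does not appear in the GWAS input itself but is reached through LD with the significant SNP rs1, which is the purpose of step 1. Only the pathway that passes the assumed pathway cutoff remains in the final hypothesis list.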

References

[1] McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P. and Hirschhorn, J.N. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet, 9, 356-369.

[2] McCarthy, M.I. and Hirschhorn, J.N. (2008) Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet, 17, R156-165.

[3] Schmitt, A.O., Assmus, J., Bortfeldt, R.H. and Brockmann, G.A. (2010) CandiSNPer: a web tool for the identification of candidate SNPs for causal variants. Bioinformatics, 26, 969-970.

[4] Saccone, S.F., Bolze, R., Thomas, P., Quan, J., Mehta, G., Deelman, E., Tischfield, J.A. and Rice, J.P. (2010) SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study. Nucleic Acids Res, 38, W201-209.

[5] Medina, I., Montaner, D., Bonifaci, N., Pujana, M.A., Carbonell, J., Tarraga, J., Al-Shahrour, F. and Dopazo, J. (2009) Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies. Nucleic Acids Res, 37, W340-344.

[6] Zhang, K., Cui, S., Chang, S., Zhang, L. and Wang, J. (2010) i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Res, 38 Suppl, W90-95.

[7] Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, F., Bonnen, P.E., de Bakker, P.I., Deloukas, P., Gabriel, S.B. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52-58.

[8] Flicek, P., Aken, B.L., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Coates, G., Fairley, S. et al. (2010) Ensembl's 10th year. Nucleic Acids Res, 38, D557-562.

[9] Wang, K., Li, M. and Hakonarson, H. (2010) Analysing biological pathways in genome-wide association studies. Nat Rev Genet, 11, 843-854.