mSignatureDB: a database for deciphering mutational signatures in human cancers

Po-Jung Huang1,2, Ling-Ya Chiu3, Chi-Ching Lee2,4, Yuan-Ming Yeh2,3, Kuo-Yang Huang5, Cheng-Hsun Chiu2,6 and Petrus Tang3,6,*

Cancer is a genetic disease caused by somatic mutations; however, understanding of the biological processes that generate these mutations is limited. A cancer genome bears the cumulative effects of the mutational processes active during tumor development. Deciphering mutational signatures in cancer is a new topic in cancer research. The Wellcome Trust Sanger Institute (WTSI) has categorized 30 reference signatures in the COSMIC database based on analyses of ∼10 000 sequencing datasets from TCGA and ICGC. Large cohorts and bioinformatics skills are required to perform the same analysis as WTSI. Quantifying known signatures in custom cohorts is not possible under the current framework of the COSMIC database, which motivated us to construct a database of mutational signatures in cancers and to make such analyses more accessible to general researchers. mSignatureDB integrates R packages and in-house scripts to determine the contributions of the published signatures in 15 780 individual tumors from 73 TCGA/ICGC cancer projects, making it possible to compare signature patterns within and between projects. mSignatureDB also allows users to perform signature analysis on their own datasets, quantifying the contributions of signatures at sample resolution, a feature unique to mSignatureDB among related databases.
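
To illustrate what quantifying the contributions of known signatures in a single sample involves, the following minimal sketch fits a mutation profile to a fixed set of reference signatures under a non-negativity constraint. It is not the procedure used by mSignatureDB, deconstructSigs or the WTSI framework: it swaps in Lee-Seung multiplicative updates with the signatures held fixed, and the function name `fit_signatures`, the three four-channel toy signatures and the toy counts are all assumptions made for illustration (real reference signatures are defined over many more mutation channels).

```python
def fit_signatures(profile, signatures, n_iter=1000, eps=1e-12):
    """Estimate the relative contribution of each fixed signature to one sample.

    profile    : list of mutation counts (or fractions), one per channel
    signatures : list of signatures, each a list of channel probabilities
    Returns a list of non-negative contributions that sum to 1.
    """
    total = float(sum(profile))
    if total <= 0:
        raise ValueError("profile must contain at least one mutation")
    observed = [x / total for x in profile]  # work with fractions, not raw counts

    n_sigs = len(signatures)
    n_channels = len(observed)
    exposures = [1.0 / n_sigs] * n_sigs  # uniform, strictly positive start

    for _ in range(n_iter):
        # Profile reconstructed from the current exposure estimates.
        recon = [
            sum(exposures[k] * signatures[k][c] for k in range(n_sigs))
            for c in range(n_channels)
        ]
        # Multiplicative update: exposures stay non-negative by construction.
        for k in range(n_sigs):
            num = sum(signatures[k][c] * observed[c] for c in range(n_channels))
            den = sum(signatures[k][c] * recon[c] for c in range(n_channels)) + eps
            exposures[k] *= num / den

    scale = sum(exposures)
    return [e / scale for e in exposures]


# Toy data (hypothetical): three signatures over four channels.
TOY_SIGNATURES = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.4, 0.4, 0.1],
]
# Counts built as 100 * (0.6 * sig0 + 0.3 * sig1 + 0.1 * sig2).
TOY_COUNTS = [46, 13, 13, 28]

contributions = fit_signatures(TOY_COUNTS, TOY_SIGNATURES)
```

Because the toy profile is an exact mixture of the three signatures, the estimated contributions should approach the mixing proportions used to build it; real tumor profiles are noisy, so residual error and spurious small contributions would need separate handling.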

Figure 1. Overview of mSignatureDB. Somatic mutation profiles were gathered from TCGA/ICGC large-scale genomics studies. mSignatureDB comprises four components: (i) browse; (ii) search; (iii) analysis and (iv) download. On the ‘Browse’ page, the landscapes of mutational signatures can be inspected by cancer project, primary site or country. Users can search the database by cancer project name. A hierarchically clustered heatmap reveals the dominant signatures in a cancer project according to the contribution of each signature. By displaying mutations by substitution type and along a reference genome, the database lets users readily identify dominant mutation types and localized mutation hotspots. The signature profiles and clinical associations can be downloaded from the ‘Download’ page. Web interfaces for two popular mutational signature analysis tools, deconstructSigs and the WTSI Mutational Signature Framework, are provided to facilitate custom data analyses.
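
As a rough illustration of tallying mutations by substitution type, the sketch below assumes the common convention of reporting single-base substitutions relative to the pyrimidine base, which yields six classes (C>A, C>G, C>T, T>A, T>C, T>G). The helper names `substitution_class` and `count_substitutions` and the toy mutation list are hypothetical and are not taken from mSignatureDB.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Map a single-base substitution to its pyrimidine-referenced class."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_substitutions(mutations):
    """Count (ref, alt) pairs per substitution class."""
    return Counter(substitution_class(ref, alt) for ref, alt in mutations)


# Toy input (hypothetical): (reference base, alternate base) pairs.
toy_mutations = [("C", "T"), ("G", "A"), ("T", "G"), ("A", "C"), ("C", "A")]
toy_counts = count_substitutions(toy_mutations)
```

Here G>A is folded into C>T and A>C into T>G, so the five toy mutations fall into three classes. Extending the key with the flanking bases would give the finer trinucleotide-context channels typically used for signature analysis.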