WDSPdb 2.0 is based on UniProtKB V201707, and its data coverage is about 10 times that of WDSPdb 1.0.
Statistics of all WD40 proteins in different categories
Category | WD40 Proteins | WD40 Repeats | Hydrogen Bond Network | Potential Hotspots on the Top Face | Species |
---|---|---|---|---|---|
Total | 594,319 | 4,033,034 | 852,295 | 4,963,216 | 4,426 |
High | 194,182 | 1,575,604 | 709,212 | 2,093,562 | 1,984 |
Middle | 63,401 | 411,377 | 96,753 | 490,226 | 2,086 |
Low | 336,736 | 2,046,053 | 46,330 | 2,379,428 | 4,021 |
Eukaryota | 349,985 | 2,526,886 | 759,192 | 3,165,106 | 1,711 |
Bacteria | 232,439 | 1,433,214 | 90,131 | 1,711,023 | 2,472 |
Archaea | 7,272 | 48,997 | 1,779 | 57,419 | 175 |
Viruses | 680 | 4,043 | 176 | 5,584 | 67 |
Homo sapiens | 1,941 | 10,698 | 3,388 | 13,087 | |
Mus musculus | 1,398 | 8,955 | 2,674 | 10,943 | |
Danio rerio | 950 | 6,900 | 2,025 | 8,519 | |
Oryza sativa | 791 | 5,290 | 1,544 | 6,565 | |
Arabidopsis thaliana | 1,232 | 8,718 | 2,397 | 10,813 | |
Drosophila melanogaster | 688 | 4,682 | 1,282 | 5,718 | |
Saccharomyces cerevisiae | 176 | 1,197 | 305 | 1,542 | |
Schizosaccharomyces pombe | 176 | 1,319 | 377 | 1,685 | |
Statistics of WD40 proteins in the Swiss-Prot section, in different categories
Category | WD40 Proteins | WD40 Repeats | Hydrogen Bond Network | Potential Hotspots on the Top Face | Species |
---|---|---|---|---|---|
Total | 5,601 | 33,410 | 8,950 | 43,146 | 1,040 |
High | 2,173 | 16,788 | 8,250 | 23,647 | 191 |
Middle | 333 | 2,716 | 439 | 3,035 | 110 |
Low | 3,095 | 13,906 | 261 | 16,464 | 962 |
Eukaryota | 4,002 | 27,449 | 8,746 | 35,753 | 286 |
Bacteria | 1,461 | 5,441 | 191 | 6,698 | 679 |
Archaea | 46 | 151 | 1 | 202 | 26 |
Viruses | 92 | 369 | 12 | 493 | 49 |
Homo sapiens | 473 | 3,413 | 990 | 4,205 | |
Mus musculus | 436 | 3,160 | 899 | 3,845 | |
Danio rerio | 99 | 673 | 216 | 893 | |
Oryza sativa | 40 | 335 | 99 | 461 | |
Arabidopsis thaliana | 307 | 1,960 | 463 | 2,548 | |
Drosophila melanogaster | 101 | 699 | 178 | 912 | |
Saccharomyces cerevisiae | 176 | 1,197 | 305 | 1,542 | |
Schizosaccharomyces pombe | 176 | 1,319 | 377 | 1,685 | |
Statistics of WD40 proteins in the TrEMBL section, in different categories
Category | WD40 Proteins | WD40 Repeats | Hydrogen Bond Network | Potential Hotspots on the Top Face | Species |
---|---|---|---|---|---|
Total | 588,718 | 3,999,624 | 843,345 | 4,920,070 | 4,205 |
High | 192,009 | 1,558,816 | 700,962 | 2,069,915 | 1,961 |
Middle | 63,068 | 408,661 | 96,314 | 487,191 | 2,049 |
Low | 333,641 | 2,032,147 | 46,069 | 2,362,964 | 3,801 |
Eukaryota | 345,983 | 2,499,437 | 750,446 | 3,129,353 | 1,648 |
Bacteria | 230,978 | 1,427,773 | 89,940 | 1,704,325 | 2,364 |
Archaea | 7,226 | 48,846 | 1,778 | 57,217 | 173 |
Viruses | 588 | 3,674 | 164 | 5,091 | 19 |
Homo sapiens | 1,468 | 7,285 | 2,398 | 8,882 | |
Mus musculus | 962 | 5,795 | 1,775 | 7,098 | |
Danio rerio | 851 | 6,227 | 1,809 | 7,626 | |
Oryza sativa | 751 | 4,955 | 1,445 | 6,104 | |
Arabidopsis thaliana | 925 | 6,758 | 1,934 | 8,265 | |
Drosophila melanogaster | 587 | 3,983 | 1,104 | 4,806 | |
Saccharomyces cerevisiae | 0 | 0 | 0 | 0 | |
Schizosaccharomyces pombe | 0 | 0 | 0 | 0 | |
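As a quick sanity check on the three tables above, the Total rows of the Swiss-Prot and TrEMBL sections should sum to the Total row for all WD40 proteins. The sketch below only re-adds the numbers copied from the tables; the variable names are illustrative. The Species column is deliberately left out, since a species can appear in both sections and its counts are therefore not expected to be additive.

```python
# Total-row counts copied from the tables above, in the order:
# (WD40 proteins, WD40 repeats, hydrogen bond networks, potential hotspots)
all_entries = (594_319, 4_033_034, 852_295, 4_963_216)
swiss_prot = (5_601, 33_410, 8_950, 43_146)
trembl = (588_718, 3_999_624, 843_345, 4_920_070)

# Add the two UniProt sections column by column.
combined = tuple(s + t for s, t in zip(swiss_prot, trembl))

# True when every column of the two sections adds up to the overall total.
sections_add_up = combined == all_entries
print(combined, sections_add_up)
```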
Display by default | Columns | Explanation |
---|---|---|
On | Accession Number | UniProt accession number (can be sorted) |
On | Section | UniProt section, Swiss-Prot or TrEMBL (can be sorted) |
On | Entry Name | The UniProt entry name (can be sorted) |
On | Protein Name | Full protein name (can be sorted) |
On | Organism | The Latin and English names of the organism (can be sorted) |
On | WDSP Category | The assigned confidence category for that protein |
On | Repeat Number | The predicted number of WD40 repeats (can be sorted) |
Off | WDSP Score | The average predicted score of the repeats |
Items | Explanation |
---|---|
Accession Number | UniProt accession number |
Gene Name | The gene symbol |
Gene ID | Entrez gene ID |
Protein Name | Full protein name |
Organism | The Latin and English names of the organism |
Organism Domain | The highest taxonomic rank of organisms in the biological taxonomy |
WDSP Category | The assigned confidence category for that protein |
HGNC ID | The HGNC ID of the protein |
Data Source | Whether the protein belongs to Swiss-Prot or TrEMBL |
Experimental Structures of WD40 Domain | The experimental structures containing WD40 domain |
Reference Sequence | protein ID or transcript ID in NCBI |
Description | Alternative protein names |
Functions | The annotated function in UniProt Knowledge Base |
Ensembl ID | The Ensembl gene ID, transcript ID and protein ID |
Column | Explanation |
---|---|
Repeat ID | The WD40 repeat order number |
Score | The WDSP predicted score of the WD40 repeat (manually reviewed if empty) |
Start | The start site of the WD40 repeat, i.e. the first site of strand_d |
End | The first site of loop_cd. For the last repeat, the end site is the site following strand_c |
Strand_d | The first strand of the WD40 repeat at the side face of the structure |
Loop_da | The loop connecting the strand_d and strand_a at the top-side face of the structure |
Strand_a | The second strand of the WD40 repeat at the inner face of the structure |
Loop_ab | The loop connecting the strand_a and strand_b at the bottom face of the structure |
Strand_b | The third strand of the WD40 repeat |
Loop_bc | The loop connecting the strand_b and strand_c at the top face of the structure |
Strand_c | The fourth strand of the WD40 repeat |
Loop_cd | The loop connecting the strand_c and strand_d of next WD40 repeat at the side-bottom face of the structure |
H_bonds | The residues that participate in forming the hydrogen bond networks of the WD40 repeat (blue residues) |
Hotspots | The potential hotspot residues on the top face (red residues) |
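The Start and End definitions above imply a simple relation: End is Start plus the combined length of the seven segments from strand_d through strand_c. The following is a minimal sketch of that arithmetic, assuming the segments are contiguous and listed in the order given in the table; the function name and the segment lengths are hypothetical and are not taken from WDSPdb output.

```python
# Segment order within one WD40 repeat, as listed in the table above
# (loop_cd is excluded because End is defined as its first site).
SEGMENT_ORDER = (
    "strand_d", "loop_da", "strand_a", "loop_ab",
    "strand_b", "loop_bc", "strand_c",
)

def repeat_end(start: int, segment_lengths: dict) -> int:
    """Return the End site: the first site after strand_c.

    Assumes contiguous segments, so End = Start + total segment length.
    """
    return start + sum(segment_lengths[name] for name in SEGMENT_ORDER)

# Hypothetical segment lengths, for illustration only.
example_lengths = {
    "strand_d": 5, "loop_da": 4, "strand_a": 6, "loop_ab": 4,
    "strand_b": 5, "loop_bc": 3, "strand_c": 6,
}
example_end = repeat_end(10, example_lengths)
print(example_end)
```

Under these assumptions the same formula also covers the last repeat, whose End is described as the site following strand_c.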
Button | Explanation |
---|---|
Template | Jump to the template page |
(icon) | View the featured sites, including potential hotspots on the top face and hydrogen bond networks |
(icon) | Download the 3D structure models or associated sequences |
(icon) | Change the structural display style: style, color schemes or background |
(icon) | Reset all actions |
(icon) | Expand to full screen view |
(icon) | Let the structure spin |
(icon) | Jump to the help of the 3D structure viewer |
Display by default | Column | Explanation |
---|---|---|
On | Site | Variant site corresponding to the UniProt sequence |
On | Substitution | The single amino acid substitution of the variant |
On | 2D Location | The secondary structure location of the variant in that WD40 repeat |
On | Repeat ID | The ordinal number of the WD40 repeat where the variant is located |
On | Featured site | Whether the wild-type residue of the variant is a member of the hydrogen bond network or the hotspots on the top face |
On | Resource | The data source or database of the variant |
On | Clinical Info | The clinical information of the variant from ClinVar or COSMIC. |
On | Cancer Driver | The associated cancer types driven by that variant, as annotated in IntOGen. |
On | Highly Recurrent | Whether the variant is highly recurrent in cancer, as annotated in cBioPortal. Please see the answer to "Which resources are integrated for variant annotation?" on the FAQ page of cBioPortal. |
On | PPI Effect | The effect of the variant on the interaction and the specific partner, as annotated in IntAct. |
Off | Reference CDS Changes | The transcript codon changes of the variant |
Off | Reference | The PMID of the reference which reported the variant |
Off | Allele Frequency | The allele frequency of the variant from the dataset |
Off | Chromosome Coordinate | The codon change site in the chromosome of the variant |
Off | SIFT_score | Sorting Intolerant From Tolerant score, range from 0 to 1 |
Off | SIFT_pred | Prediction from the SIFT score: "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.05. |
Off | MutationAssessor_score | MutationAssessor functional impact combined score. The score ranges from -5.135 to 6.49 in dbNSFP. |
Off | MutationAssessor_pred | MutationAssessor's functional impact of a variant: predicted functional, i.e. high ("H") or medium ("M"), or predicted non-functional, i.e. low ("L") or neutral ("N"). The MAori score cutoffs between "H" and "M", "M" and "L", and "L" and "N" are 3.5, 1.935 and 0.8, respectively. |
Off | FATHMM_score | FATHMM default score (weighted for human inherited-disease mutations with Disease Ontology) (FATHMMori). Scores range from -16.13 to 10.64. The smaller the score the more likely the SNP has damaging effect. |
Off | FATHMM_pred | If a FATHMM score is <=-1.5 (or rankscore >=0.81332) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". |
Off | MetaSVM_score | Support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP. |
Off | MetaSVM_pred | Prediction of the SVM based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. |
Off | MetaLR_score | Logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1. |
Off | MetaLR_pred | Prediction of the MetaLR based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. |
Off | CADD_phred | CADD score. The larger the score, the more likely the SNP has a damaging effect. If you would like to apply a cutoff on deleteriousness, e.g. to identify potentially pathogenic variants, we would suggest putting a cutoff somewhere between 10 and 20, perhaps at 15, as this also happens to be the median value for all possible canonical splice site changes and non-synonymous variants. However, there is no natural choice here; it is always arbitrary. We therefore recommend integrating C-scores with other evidence and ranking your candidates for follow-up rather than hard filtering. |
Off | ExAC_AC | Allele count in total ExAC samples (60,706 samples) |
Off | ExAC_AF | Allele frequency in total ExAC samples |
Off | ExAC_pLI | "The probability of being loss-of-function intolerant (intolerant of both heterozygous and homozygous lof variants)" based on ExAC r0.3 data |
Off | ExAC_pRec | "The probability of being intolerant of homozygous, but not heterozygous lof variants" based on ExAC r0.3 data |
Off | GDI | Gene damage index score, "a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population" from doi: 10.1073/pnas.1518646112. The higher the score the less likely the gene is to be responsible for monogenic diseases. |
Off | Essential_gene | Essential ("E") or Non-essential phenotype-changing ("N") based on Mouse Genome Informatics database. from doi:10.1371/journal.pgen.1003484 |
Off | Polyphen2_HDIV_score | Polyphen2 score based on HumDiv. The score ranges from 0 to 1 |
Off | Polyphen2_HDIV_pred | Polyphen2 prediction based on HumDiv, "D" ("probably damaging", HDIV score in [0.957,1]), "P" ("possibly damaging", HDIV score in [0.453,0.956]) and "B" ("benign", HDIV score in [0,0.452]). Score cutoff for binary classification is 0.5 for HDIV score |
Off | Polyphen2_HVAR_score | Polyphen2 score based on HumVar. The score ranges from 0 to 1. |
Off | Polyphen2_HVAR_pred | Polyphen2 prediction based on HumVar, "D" ("probably damaging", HVAR score in [0.909,1]), "P" ("possibly damaging", HVAR in [0.447,0.908]) and "B" ("benign", HVAR score in [0,0.446]). Score cutoff for binary classification is 0.5 for HVAR score |
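The cutoffs quoted in the table above can be applied directly to the score columns. The sketch below covers FATHMM, Polyphen2 and MutationAssessor using only the thresholds stated in the table. The function names are illustrative, and the handling of scores that fall exactly on a MutationAssessor cutoff, or between two rounded Polyphen2 intervals, is an assumption, since the table does not specify it.

```python
def fathmm_pred(score: float) -> str:
    """'D' if the FATHMM score is <= -1.5, otherwise 'T' (as stated in the table)."""
    return "D" if score <= -1.5 else "T"

# Lower bounds of the "D" and "P" intervals quoted in the table.
POLYPHEN2_LOWER_BOUNDS = {"HDIV": (0.957, 0.453), "HVAR": (0.909, 0.447)}

def polyphen2_pred(score: float, dataset: str = "HDIV") -> str:
    """Map a Polyphen2 score to 'D', 'P' or 'B'.

    Assumption: each interval's lower bound is used as the threshold.
    """
    damaging, possibly = POLYPHEN2_LOWER_BOUNDS[dataset]
    if score >= damaging:
        return "D"
    if score >= possibly:
        return "P"
    return "B"

def mutation_assessor_pred(score: float) -> str:
    """Map a MutationAssessor score to 'H', 'M', 'L' or 'N'.

    Assumption: a score exactly on a cutoff falls into the lower class.
    """
    if score > 3.5:
        return "H"
    if score > 1.935:
        return "M"
    if score > 0.8:
        return "L"
    return "N"

print(fathmm_pred(-2.0), polyphen2_pred(0.5, "HVAR"), mutation_assessor_pred(2.0))
```

These helpers are only a convenience for reading the table; the precomputed `*_pred` columns remain the reference values.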
Items | Explanation |
---|---|
Primary Tissue | The primary tissue from which the sample originated |
Tissue Subtype 1 | Further subclassification of the sample's tissue of origin |
Tissue Subtype 2 | Further subclassification of the sample's tissue of origin |
Tissue Subtype 3 | Further subclassification of the sample's tissue of origin |
Histology | The histological classification of the sample |
Histology Subtype 1 | Further histological classification of the sample |
Histology Subtype 2 | Further histological classification of the sample |
Histology Subtype 3 | Further histological classification of the sample |
NS | Not specified |
Abbreviation | Full Name |
---|---|
ALL | acute lymphoid leukemia |
AML | acute myeloid leukemia |
BLCA | bladder carcinoma |
BRCA | breast carcinoma |
CLL | chronic lymphocytic leukemia |
CM | cutaneous melanoma |
COREAD | colorectal adenocarcinoma |
DLBC | diffuse large B cell lymphoma |
ESCA | esophageal carcinoma |
GBM | glioblastoma multiforme |
HC | hepatic carcinoma |
HNSC | head and neck squamous cell carcinoma |
LGG | lower grade glioma |
LUAD | lung adenocarcinoma |
LUSC | lung squamous cell carcinoma |
MB | medulloblastoma |
MM | multiple myeloma |
NB | neuroblastoma |
NSCLC | non small cell lung carcinoma |
OV | serous ovarian adenocarcinoma |
PA | pilocytic astrocytoma |
PAAD | pancreas adenocarcinoma |
PRAD | prostate adenocarcinoma |
RCCC | renal clear cell carcinoma |
SCLC | small cell lung carcinoma |
STAD | stomach adenocarcinoma |
THCA | thyroid carcinoma |
UCEC | uterine corpus endometrioid carcinoma |
If you use the data of WDSPdb or the WDSP predictor in your research, please cite the following papers:
www.wdspdb.com/wdsp/detail/{UniProt AC}.txt
Replace {UniProt AC} with the accession number of a specific protein in the UniProt database, and WDSPdb 2.0 will return the WDSP predicted result for that protein. For example, to get the predicted result of WDR5_HUMAN, whose accession number is "P61964", use the URL www.wdspdb.com/wdsp/detail/P61964.txt. Don’t forget the “.txt” at the end.
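The URL pattern above is easy to generate in a script. The following is a minimal sketch that only builds the URL string and performs no network request; the function name is hypothetical, and the `http` scheme is an assumption, since the pattern above is given without one.

```python
def wdsp_result_url(accession: str, scheme: str = "http") -> str:
    """Build the WDSPdb 2.0 result URL for a UniProt accession number.

    The scheme default is an assumption; the documented pattern has no scheme.
    """
    # The ".txt" suffix is required, as noted above.
    return f"{scheme}://www.wdspdb.com/wdsp/detail/{accession.strip()}.txt"

# WDR5_HUMAN, the example used above.
example_url = wdsp_result_url("P61964")
print(example_url)
```

The resulting string can then be passed to any HTTP client to download the plain-text prediction.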