WDSPdb 3.0 is based on UniProtKB 202202, and its data coverage is about 1.6 times that of WDSPdb 2.0.
Statistics of all WD40 proteins in different categories
Category | WD40 Proteins | WD40 Repeats | Hydrogen Bond Network | Hotspots on the Top Face |
---|---|---|---|---|
Total | 1,670,938 | 12,617,145 | 2,611,632 | 15,660,903 |
High | 576,687 | 4,776,558 | 2,154,765 | 6,427,038 |
Middle | 203,996 | 1,377,451 | 297,679 | 1,649,854 |
Low | 890,255 | 6,463,136 | 159,188 | 7,584,011 |
Eukaryota | 1,067,648 | 7,887,352 | 2,275,636 | 9,961,239 |
Bacteria | 572,434 | 4,499,652 | 325,898 | 5,432,329 |
Archaea | 21,672 | 173,014 | 7,645 | 198,585 |
Viruses | 942 | 6,550 | 332 | 8,622 |
Homo sapiens (9606) | 2,120 | 13,569 | 3,932 | 17,099 |
Mus musculus (10090) | 1,362 | 9,620 | 2,758 | 12,118 |
Rat (10116) | 1,062 | 8,500 | 2,616 | 10,745 |
Frog (8355) | 1,682 | 13,446 | 4,008 | 17,241 |
Danio rerio (7955) | 1,122 | 8,264 | 2,304 | 10,389 |
Drosophila melanogaster (7227) | 621 | 4,634 | 1,305 | 5,944 |
C. elegans (6239) | 311 | 2,296 | 614 | 2,889 |
Oryza sativa (39947) | 947 | 6,425 | 2,077 | 8,097 |
Arabidopsis thaliana (3702) | 1,649 | 12,600 | 3,864 | 15,786 |
Saccharomyces cerevisiae (559292) | 142 | 1,154 | 310 | 1,513 |
Schizosaccharomyces pombe (284812) | 160 | 1,294 | 380 | 1,657 |
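As a quick sanity check on the table above, the three confidence categories can be summed and compared with the "Total" row. The sketch below copies the four rows from the table; the variable and function names are illustrative only and are not part of WDSPdb.

```python
# Counts copied from the "Total", "High", "Middle" and "Low" rows above.
# Column order: WD40 proteins, WD40 repeats, hydrogen bond networks, hotspots.
TOTAL = (1_670_938, 12_617_145, 2_611_632, 15_660_903)
BY_CONFIDENCE = {
    "High": (576_687, 4_776_558, 2_154_765, 6_427_038),
    "Middle": (203_996, 1_377_451, 297_679, 1_649_854),
    "Low": (890_255, 6_463_136, 159_188, 7_584_011),
}


def column_sums(rows):
    """Sum each column over all rows of a {category: counts} mapping."""
    return tuple(sum(column) for column in zip(*rows.values()))


# True when the confidence categories add up to the "Total" row.
categories_match_total = column_sums(BY_CONFIDENCE) == TOTAL
```

With the figures listed here, the High, Middle and Low rows add up exactly to the Total row in every column.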
Statistics of TrEMBL section WD40 proteins in different categories
Category | WD40 Proteins | WD40 Repeats | Hydrogen Bond Network | Hotspots on the Top Face |
---|---|---|---|---|
Total | 1,667,054 | 12,587,025 | 2,602,545 | 15,621,282 |
High | 574,455 | 4,759,298 | 2,146,262 | 6,402,584 |
Middle | 203,715 | 1,375,068 | 297,304 | 1,647,156 |
Low | 888,884 | 6,452,659 | 158,979 | 7,571,542 |
Eukaryota | 1,064,331 | 7,860,906 | 2,266,699 | 9,926,306 |
Bacteria | 571,902 | 4,496,222 | 325,755 | 5,427,986 |
Archaea | 21,663 | 172,949 | 7,644 | 198,493 |
Viruses | 916 | 6,371 | 326 | 8,369 |
Homo sapiens (9606) | 1,726 | 10,266 | 2,937 | 12,961 |
Mus musculus (10090) | 1,000 | 6,493 | 1,854 | 8,157 |
Rat (10116) | 930 | 7,393 | 2,239 | 9,307 |
Frog (8355) | 1,587 | 12,684 | 3,747 | 16,233 |
Danio rerio (7955) | 1,029 | 7,596 | 2,062 | 9,508 |
Drosophila melanogaster (7227) | 528 | 3,848 | 1,116 | 4,906 |
C. elegans (6239) | 218 | 1,566 | 410 | 1,936 |
Oryza sativa (39947) | 915 | 6,135 | 1,970 | 7,683 |
Arabidopsis thaliana (3702) | 1,461 | 11,134 | 3,353 | 13,892 |
Saccharomyces cerevisiae (559292) | 0 | 0 | 0 | 0 |
Schizosaccharomyces pombe (284812) | 0 | 0 | 0 | 0 |
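The TrEMBL figures can be related to the overall statistics by subtraction. The sketch below assumes that every entry belongs to exactly one of the two UniProt sections (Swiss-Prot or TrEMBL); this is an assumption, not something stated on this page. Under that assumption, the difference between the overall count and the TrEMBL count gives the implied number of Swiss-Prot WD40 proteins. Only a few rows are copied here, and the names are illustrative.

```python
# "WD40 Proteins" column, copied from the overall table and the TrEMBL table.
ALL_PROTEINS = {
    "Total": 1_670_938,
    "High": 576_687,
    "Middle": 203_996,
    "Low": 890_255,
    "Homo sapiens (9606)": 2_120,
    "Saccharomyces cerevisiae (559292)": 142,
}
TREMBL_PROTEINS = {
    "Total": 1_667_054,
    "High": 574_455,
    "Middle": 203_715,
    "Low": 888_884,
    "Homo sapiens (9606)": 1_726,
    "Saccharomyces cerevisiae (559292)": 0,
}


def implied_swissprot(all_counts, trembl_counts):
    """Overall minus TrEMBL, assuming the two sections do not overlap."""
    return {key: all_counts[key] - trembl_counts[key] for key in all_counts}


swissprot_estimate = implied_swissprot(ALL_PROTEINS, TREMBL_PROTEINS)
for category, count in swissprot_estimate.items():
    print(f"{category}: {count:,}")
```

Under this assumption, the zero TrEMBL counts for Saccharomyces cerevisiae and Schizosaccharomyces pombe would mean that all WD40 proteins listed for these two organisms come from Swiss-Prot.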
You can view the slides online below if you can access Google Docs; otherwise, download the slides to view them locally.
Display by default | Columns | Explanation |
---|---|---|
On | Accession Number | UniProt accession number (can be sorted) |
On | Section | UniProt section, Swiss-Prot or TrEMBL (cannot be sorted) |
On | Entry Name | The UniProt entry name (can be sorted) |
On | Gene Name | The name of the gene that encodes this protein (can be sorted) |
On | Organism | The Latin and English name of organism (can be sorted) |
On | Confidence Category | The assigned confidence category for that protein as a WD40 protein (can be sorted) |
On | Repeat Number | The number of predicted WD40 repeats (can be sorted) |
Off | WDSP Score | The total WDSP score, i.e., the sum of repeat scores, tetrad scores, and other scoring terms (can be sorted) |
Items | Explanation |
---|---|
Accession Number | UniProt accession number |
Gene Name | The gene symbol |
Gene ID | Entrez gene ID |
Protein Name | Full protein name |
Organism | The Latin and English name of organism |
Organism Domain | The highest taxonomic group of the organism in the biological taxonomy, e.g., Eukaryota, Bacteria, Archaea, etc. |
Confidence Category | The assigned confidence category for that protein as a WD40 protein |
HGNC ID | The HGNC ID of the protein |
Data Source | The protein belongs to Swiss-Prot or TrEMBL |
dbVar | The link to dbVar for the gene encoding this protein, if it exists |
Experimental Structures of WD40 Domain | The experimental structures covering one of the WD40 repeats |
Reference Sequence | Protein ID or transcript ID in the NCBI RefSeq database |
Description | Alternative protein names |
Functions | The annotated function from UniProt Knowledge Base |
Ensembl ID | The Ensembl gene ID, transcript ID and protein ID of this protein |
Column | Explanation |
---|---|
Repeat ID | The WD40 repeat order number |
Score | The score of the predicted WD40 repeat given by WDSP |
Start | The start site of the WD40 repeat, i.e., the first site of strand_d |
End | The end site of the WD40 repeat, i.e., the first site of the loop_cd of this repeat. For the last repeat, which has no loop_cd, the end site is the last site of its strand_c |
Strand_d | The first strand of the WD40 repeat at the side face of the structure |
Loop_da | The loop connecting the strand_d and strand_a at the top-side face of the structure |
Strand_a | The second strand of the WD40 repeat at the inner face of the structure |
Loop_ab | The loop connecting the strand_a and strand_b at the bottom face of the structure |
Strand_b | The third strand of the WD40 repeat |
Loop_bc | The loop connecting the strand_b and strand_c at the top face of the structure |
Strand_c | The fourth strand of the WD40 repeat |
Loop_cd | The loop connecting the strand_c and strand_d of next WD40 repeat at the side-bottom face of the structure |
H_bonds | The residues that participate in forming the hydrogen bond networks of the WD40 repeat (also colored blue) |
Hotspots | The potential hotspot residues on the top face (also colored red) |
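The Start and End definitions above can be illustrated with a small calculation. The sketch below assumes that the eight segments of a repeat are contiguous and listed in the order given in the table, and that positions are 1-based; the function name and the toy sequences are made up for illustration and do not come from WDSPdb.

```python
# Segment order within one WD40 repeat, as listed in the table above.
SEGMENT_ORDER = (
    "strand_d", "loop_da", "strand_a", "loop_ab",
    "strand_b", "loop_bc", "strand_c", "loop_cd",
)


def end_site(start, segments):
    """Return the 'End' position of a repeat, following the table's definition.

    start    -- 1-based position of the first site of strand_d
    segments -- mapping from segment name to its sequence; loop_cd is
                missing or empty for the last repeat
    """
    # Combined length of everything from strand_d up to and including strand_c.
    core_length = sum(len(segments[name]) for name in SEGMENT_ORDER[:-1])
    first_site_after_strand_c = start + core_length
    if segments.get("loop_cd"):
        # Regular repeat: End is the first site of loop_cd.
        return first_site_after_strand_c
    # Last repeat: no loop_cd, so End is the last site of strand_c.
    return first_site_after_strand_c - 1


# Toy segments (not real sequence data), 19 residues from strand_d to strand_c.
toy = {
    "strand_d": "AAAA", "loop_da": "GG", "strand_a": "VVV", "loop_ab": "DD",
    "strand_b": "LLL", "loop_bc": "SS", "strand_c": "TTT", "loop_cd": "NN",
}
print(end_site(10, toy))                       # first site of loop_cd
print(end_site(10, {**toy, "loop_cd": ""}))    # last site of strand_c
```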
Button | Explanation |
---|---|
Template | Jump to the template page |
(icon) | View the featured sites as sticks, including potential hotspots on the top face and hydrogen bond networks |
(icon) | Download the 3D structure models or associated sequences |
(icon) | Change the structural display style: style, color schemes or background |
(icon) | Reset all actions |
(icon) | Expand to full screen to view |
(icon) | Let the structure spin |
(icon) | Jump to the help page of 3D structure viewer |
Display by default | Column | Explanation |
---|---|---|
On | Site | Variant site of the UniProt sequence |
On | Substitution | The single amino acid substitution of the variant |
On | 2D Location | The location of variant on the secondary structure |
On | Repeat ID | The ordinal number of the WD40 repeat where the variant is located |
On | Featured site | Whether the variant is on the hydrogen bond network or on a hotspot residue on the top face |
On | Resource | The data source or database of the variant |
On | Clinical Info | The clinical information of the variant from ClinVar, Humsavar, or COSMIC. |
On | Cancer Driver | The associated cancer types driven by this variant, as annotated in IntOGen. |
On | Highly Recurrent | Whether the variant is highly recurrent in cancer, as annotated in CancerHotspots or recurrent in COSMIC. |
On | PPI Effect | The effect of the variant on the interaction and the specific partner, as annotated in IntAct. |
On | Humsavar Category | For variants from Humsavar of Swiss-Prot, a category (LP/P, LB/B, US) is provided. |
On | ClinVAR Significance | For variants from ClinVAR, the significance is provided, as annotated in ClinVAR. |
Off | Humsavar FTid | For variants from Humsavar, the Humsavar FTid is provided. |
Off | ClinVAR ID | For variants from ClinVAR, the ClinVAR ID is provided. |
Off | ClinVAR Review Status | For variants from ClinVAR, the ClinVAR Review Status is provided. |
Off | Genome Assembly | If available, the genome assembly to which the variant was mapped is provided. |
Off | Chromosome Coordinate | If available, the coordinate of the variant on the specific genome assembly is provided. |
Off | Reference | If available, the PMID of the reference that reported the variant is provided. |
Off | Allele Frequency | The allele frequency of the variant from the dataset |
Off | Reference CDS Changes | The transcript codon changes of the variant |
Off | SIFT Class | Prediction of SIFT: "T(olerated)" or "D(eleterious)". The score cutoff between "D" and "T" is 0.05. |
Off | SIFT Score | Sorting Intolerant From Tolerant score, ranging from 0 to 1 |
Off | MutationAssessor Class | MutationAssessor's functional impact of a variant : predicted functional, i.e. high ("H") or medium ("M"), or predicted non-functional, i.e. low ("L") or neutral ("N"). The MAori score cutoffs between "H" and "M", "M" and "L", and "L" and "N", are 3.5, 1.935 and 0.8, respectively. |
Off | MutationAssessor Score | MutationAssessor functional impact combined score. The score ranges from -5.135 to 6.49 in dbNSFP. |
Off | FATHMM Class | If a FATHMM score is <=-1.5 (or rankscore >=0.81332) the corresponding nsSNV is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". |
Off | FATHMM Score | FATHMM default score (weighted for human inherited-disease mutations with Disease Ontology) (FATHMMori). Scores range from -16.13 to 10.64. The smaller the score the more likely the SNP has damaging effect. |
Off | MetaSVM Class | Prediction of the SVM-based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. |
Off | MetaSVM Score | Support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP. |
Off | MetaLR Class | Prediction of the MetaLR-based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. |
Off | MetaLR Score | Logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1. |
Off | CADD Phred | CADD score. The larger the score, the more likely the SNP has a damaging effect. To apply a cutoff on deleteriousness, e.g. to identify potentially pathogenic variants, the CADD authors suggest a cutoff somewhere between 10 and 20, perhaps 15, as this is also the median value for all possible canonical splice site changes and non-synonymous variants. However, there is no natural choice here; any cutoff is arbitrary. They therefore recommend integrating C-scores with other evidence and ranking candidates for follow-up rather than hard filtering. |
Off | Polyphen2 HDIV Class | Polyphen2 prediction based on HumDiv, "D" ("probably damaging", HDIV score in [0.957,1]), "P" ("possibly damaging", HDIV score in [0.453,0.956]) and "B" ("benign", HDIV score in [0,0.452] ). Score cutoff for binary classification is 0.5 for HDIV score |
Off | Polyphen2 HDIV Score | Polyphen2 score based on HumDiv. The score ranges from 0 to 1 |
Off | Polyphen2 HVAR Class | Polyphen2 prediction based on HumVar, "D" ("probably damaging", HVAR score in [0.909,1]), "P" ("possibly damaging", HVAR in [0.447,0.908]) and "B" ("benign", HVAR score in [0,0.446]). Score cutoff for binary classification is 0.5 for HVAR score |
Off | Polyphen2 HVAR Score | Polyphen2 score based on HumVar. The score ranges from 0 to 1. |
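The class columns above are derived from the corresponding score columns by fixed cutoffs. The following minimal sketch applies the cutoffs quoted in the table for FATHMM, Polyphen2 HDIV and MutationAssessor. The function names are illustrative and are not part of WDSPdb or dbNSFP, and the handling of scores that fall exactly on a MutationAssessor cutoff is an assumption, since the table does not specify it.

```python
def fathmm_class(score):
    """'D' (damaging) if the FATHMM score is <= -1.5, otherwise 'T' (tolerated)."""
    return "D" if score <= -1.5 else "T"


def polyphen2_hdiv_class(score):
    """Map a Polyphen2 HumDiv score in [0, 1] to 'D', 'P' or 'B'."""
    if score >= 0.957:
        return "D"  # probably damaging, [0.957, 1]
    if score >= 0.453:
        return "P"  # possibly damaging, [0.453, 0.956]
    return "B"      # benign, [0, 0.452]


def mutationassessor_class(score):
    """Map a MutationAssessor score to 'H', 'M', 'L' or 'N'.

    Cutoffs 3.5, 1.935 and 0.8 come from the table; treating a score equal
    to a cutoff as the lower class is an assumption.
    """
    if score > 3.5:
        return "H"  # high, predicted functional
    if score > 1.935:
        return "M"  # medium, predicted functional
    if score > 0.8:
        return "L"  # low, predicted non-functional
    return "N"      # neutral, predicted non-functional


print(fathmm_class(-2.3), polyphen2_hdiv_class(0.6), mutationassessor_class(2.0))
```

As the CADD Phred row notes for its own score, hard thresholds like these are best combined with other evidence rather than used as the only filter.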
Abbreviation | Full Name |
---|---|
ALL | acute lymphoid leukemia |
AML | acute myeloid leukemia |
BLCA | bladder carcinoma |
BRCA | breast carcinoma |
CLL | chronic lymphocytic leukemia |
CM | cutaneous melanoma |
COREAD | colorectal adenocarcinoma |
DLBC | diffuse large B cell lymphoma |
ESCA | esophageal carcinoma |
GBM | glioblastoma multiforme |
HC | hepatic carcinoma |
HNSC | head and neck squamous cell carcinoma |
LGG | lower grade glioma |
LUAD | lung adenocarcinoma |
LUSC | lung squamous cell carcinoma |
MB | medulloblastoma |
MM | multiple myeloma |
NB | neuroblastoma |
NSCLC | non small cell lung carcinoma |
OV | serous ovarian adenocarcinoma |
PA | pilocytic astrocytoma |
PAAD | pancreas adenocarcinoma |
PRAD | prostate adenocarcinoma |
RCCC | renal clear cell carcinoma |
SCLC | small cell lung carcinoma |
STAD | stomach adenocarcinoma |
THCA | thyroid carcinoma |
UCEC | uterine corpus endometrioid carcinoma |
If you use the data of WDSPdb or the WDSP predictor in your research, please cite the following references:
www.wdspdb.com/wdspdb3/detail/{UniProt AC}.txt
Replace {UniProt AC} with the accession number of a protein in the UniProt database, and WDSPdb 3.0 will return the WDSP prediction for that protein. For example, to get the prediction for WDR5_HUMAN, whose accession number is "P61964", use the URL www.wdspdb.com/wdspdb3/detail/P61964.txt. Don’t forget “.txt” at the end.
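The URL pattern above can also be used from a script. The sketch below builds the URL for an accession number and downloads the text result with the Python standard library. The `http://` scheme, the function names, the timeout and the UTF-8 decoding are assumptions made for illustration; this page only documents the host, the path and the ".txt" suffix.

```python
from urllib.request import urlopen

# Host and path as given above; the "http://" scheme is an assumption.
BASE_URL = "http://www.wdspdb.com/wdspdb3/detail/"


def wdsp_result_url(accession):
    """Build the result URL for a UniProt accession number, e.g. 'P61964'."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession number must not be empty")
    # The ".txt" suffix is required.
    return f"{BASE_URL}{accession}.txt"


def fetch_wdsp_result(accession, timeout=30.0):
    """Download the WDSP prediction for one protein as text (needs network access)."""
    with urlopen(wdsp_result_url(accession), timeout=timeout) as response:
        # The response encoding is not documented here; UTF-8 is assumed.
        return response.read().decode("utf-8", errors="replace")
```

For example, `fetch_wdsp_result("P61964")` would request the WDR5_HUMAN result from the URL shown above. The layout of the returned text is not described on this page, so it is returned unparsed.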