WD40 proteins database

Help

Introduction

WDSPdb 3.0

WDSPdb, originally created in 2014, is a database of WD40-repeat proteins. It is a free and comprehensive resource that provides high-quality structural information on WD40 proteins and their featured sites, including the residues of the characteristic hydrogen-bond network and the interaction hotspot residues. WDSPdb 3.0 is based on UniProtKB 2022_02 and stores more than 1,600,000 predicted WD40-repeat proteins from about 36,000 species. Compared with the previous version (WDSPdb 2.0, released in 2019), the number of proteins in WDSPdb 3.0 has increased by about 1,000,000.
WDSPdb provides accurate structure predictions and featured-site annotations specifically for WD40-repeat domains, based on our in-house prediction tool, WDSP. For this release, WDSP was configured to use PSIPRED 4.0.1 and NCBI BLAST+ 2.13.0 against UniRef90 (Release_202202), with 3 iterations using random number seeds of 0, 1 and 2.
It provides 3D structure models for WD40-repeat proteins that have 6, 7 or 8 repeats and a confidence category of 'High'. These 3D structures were modeled with MODELLER 10.4 using our customized target-template alignments.
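The selection rule for 3D models can be written as a simple filter. The following is a minimal sketch only: the record fields (`accession`, `repeat_count`, `confidence`) and the sample entries are hypothetical and do not reflect the actual WDSPdb schema.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real WDSPdb schema may differ.
@dataclass
class WD40Entry:
    accession: str
    repeat_count: int
    confidence: str  # 'High', 'Middle' or 'Low'


# Repeat counts for which 3D models are provided, as stated in the text.
MODELED_REPEAT_COUNTS = {6, 7, 8}


def has_3d_model(entry: WD40Entry) -> bool:
    """Return True if the entry meets the stated criteria for 3D modeling."""
    return entry.repeat_count in MODELED_REPEAT_COUNTS and entry.confidence == "High"


# Made-up entries, for illustration only.
entries = [
    WD40Entry("EXAMPLE_1", 7, "High"),
    WD40Entry("EXAMPLE_2", 7, "Low"),
    WD40Entry("EXAMPLE_3", 5, "High"),
]
modeled = [e.accession for e in entries if has_3d_model(e)]
print(modeled)
```

Only the first made-up entry passes both conditions; the second fails on confidence and the third on repeat count.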
It maps about 28,000 missense variants/mutations/substitutions to 231 human WD40 proteins from the Swiss-Prot section with a confidence category of 'High'. These data come from 9 sources: CancerHotspots, COSMIC, ClinVar, Humsavar from UniProtKB, 1000 Genomes phase 3, IntOGen, IntAct, ExAC, and gnomAD exomes. In addition to the general information on each variant, we provide WD40-specific annotations, including the exact secondary structure location and featured-site identification (hydrogen bond network, potential hotspots on the top face). We also present information obtained from other databases, such as pathogenicity predictions from dbNSFP, cancer driver mutations from IntOGen, highly recurrent cancer mutations from CancerHotspots, and whether a variant has been shown experimentally to affect PPIs according to IntAct annotations.
The updated WDSP prediction tool used to build WDSPdb 3.0 is also deployed as a web server. If a sequence of interest is not included in the database, users can submit it to this online tool for prediction.

WD40 Repeat Protein

WD40-repeat domains are among the most abundant interactors in protein-protein interaction (PPI) networks. By acting as scaffolds, they assemble various molecular machineries and play versatile roles in fundamental biological processes, including signal transduction, expression regulation, histone modification, ubiquitination and cell cycle control.
Structurally, a WD40 domain is a β-propeller usually formed by 6-8 blades; each blade contains 40-60 residues with conserved GH and WD dipeptides.
Each WD40 blade is a four-stranded antiparallel β-sheet (strands a, b, c and d, connected by loops) and is often stabilized by a strong side-chain hydrogen bond network that is widespread in, and unique to, WD40 proteins. These hydrogen-bond network residues serve as a group of featured sites of WD40 domains.
The structural blades correspond to the repeated units in sequence, shifted by one strand. That is, a blade contains strands a, b, c and d consecutively, while a repeat unit contains strand d of the previous blade followed by strands a, b and c of the next blade.
A WD40 domain has three faces (top, side and bottom) that mediate interactions with other molecules. The top face is better studied than the others; the potential hotspot residues on this face are exposed to the solvent by the β-bulge between strand a and strand b, where they can participate in interactions.
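The one-strand shift between blades and repeat units can be illustrated with a short sketch. It simply enumerates strands as (blade number, strand letter) pairs; the function names are illustrative, and only repeats that span two consecutive blades in this linear listing are built, so the terminal strands are left ungrouped.

```python
STRANDS = ("a", "b", "c", "d")


def blades(n_blades):
    """Structural view: each blade holds its own strands a, b, c, d."""
    return [[(blade, s) for s in STRANDS] for blade in range(1, n_blades + 1)]


def repeat_units(n_blades):
    """Sequence view: strand d of one blade, then a, b, c of the next blade."""
    return [
        [(blade, "d"), (blade + 1, "a"), (blade + 1, "b"), (blade + 1, "c")]
        for blade in range(1, n_blades)
    ]


for unit in repeat_units(3):
    # Print each repeat as strand letter plus blade number, e.g. "d1 a2 b2 c2".
    print(" ".join(f"{strand}{blade}" for blade, strand in unit))
```

For three blades this prints `d1 a2 b2 c2` and `d2 a3 b3 c3`, showing that every repeat unit straddles two neighboring blades.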

WDSP

The WD40-repeat protein Structure Predictor (WDSP) was developed to accurately predict the secondary structures of WD40 domains and their featured sites, including hydrogen bond network residues and interaction hotspot residues.
The WDSP tool uses a WD40-specific position weight matrix (PWM) and PSIPRED as backends.
To date, WDSP is the only tool that can identify both the exact boundaries of WD40 repeats and their structural and functional features.
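As a general illustration of how a position weight matrix scores a sequence, the sketch below slides a toy two-column matrix along a short string and reports the best-scoring window. It is not the WDSP matrix or scoring scheme: the probabilities, the uniform background, the floor value for unlisted residues and the function names are all assumptions made for the example.

```python
import math

BACKGROUND = 0.05  # assumed uniform background over 20 amino acids
FLOOR = 0.01       # assumed probability for residues not listed in a column

# Toy PWM with invented probabilities; each dict is one motif position.
TOY_PWM = [
    {"G": 0.6},
    {"H": 0.7},
]


def score_window(window, pwm):
    """Sum of per-position log-odds scores (log2 of probability over background)."""
    return sum(
        math.log2(column.get(residue, FLOOR) / BACKGROUND)
        for residue, column in zip(window, pwm)
    )


def best_hit(sequence, pwm):
    """Return (start index, score) of the highest-scoring window."""
    width = len(pwm)
    scored = [
        (start, score_window(sequence[start:start + width], pwm))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored, key=lambda item: item[1])


start, score = best_hit("AAGHAA", TOY_PWM)
print(start, round(score, 2))
```

In this toy case the window starting at index 2 ("GH") is the only one with a positive log-odds score, so it is returned as the best hit.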

Statistics

WDSPdb 3.0 is based on UniProtKB 2022_02, and its data coverage has increased to about 1.6 times that of WDSPdb 2.0.

Statistics of all WD40 proteins in different categories

Category	WD40 Proteins	WD40 Repeats	Hydrogen Bond Network	Hotspots on the Top Face
Total	1,670,938	12,617,145	2,611,632	15,660,903
High	576,687	4,776,558	2,154,765	6,427,038
Middle	203,996	1,377,451	297,679	1,649,854
Low	890,255	6,463,136	159,188	7,584,011
Eukaryota	1,067,648	7,887,352	2,275,636	9,961,239
Bacteria	572,434	4,499,652	325,898	5,432,329
Archaea	21,672	173,014	7,645	198,585
Viruses	942	6,550	332	8,622
Homo sapiens (9606)	2,120	13,569	3,932	17,099
Mus musculus (10090)	1,362	9,620	2,758	12,118
Rattus norvegicus (10116)	1,062	8,500	2,616	10,745
Xenopus laevis (8355)	1,682	13,446	4,008	17,241
Danio rerio (7955)	1,122	8,264	2,304	10,389
Drosophila melanogaster (7227)	621	4,634	1,305	5,944
Caenorhabditis elegans (6239)	311	2,296	614	2,889
Oryza sativa (39947)	947	6,425	2,077	8,097
Arabidopsis thaliana (3702)	1,649	12,600	3,864	15,786
Saccharomyces cerevisiae (559292)	142	1,154	310	1,513
Schizosaccharomyces pombe (284812)	160	1,294	380	1,657
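In the table above, the three confidence categories partition the full set: the High, Middle and Low rows add up to the Total row in every column. The short check below uses values copied from the table and also derives the average number of repeats per protein; the variable names are illustrative.

```python
# Columns: WD40 proteins, WD40 repeats, hydrogen bond network, top-face hotspots.
rows = {
    "Total": (1_670_938, 12_617_145, 2_611_632, 15_660_903),
    "High": (576_687, 4_776_558, 2_154_765, 6_427_038),
    "Middle": (203_996, 1_377_451, 297_679, 1_649_854),
    "Low": (890_255, 6_463_136, 159_188, 7_584_011),
}

# Column-wise sum of the three confidence categories.
category_sum = tuple(
    sum(column) for column in zip(rows["High"], rows["Middle"], rows["Low"])
)
print(category_sum == rows["Total"])

# Average number of predicted repeats per protein over the whole database.
proteins, repeats = rows["Total"][0], rows["Total"][1]
repeats_per_protein = repeats / proteins
print(round(repeats_per_protein, 2))
```

The same check does not apply to the four kingdom rows, whose protein counts sum to slightly less than the Total row.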

Swiss-Prot

Statistics of WD40 proteins from the Swiss-Prot section in different categories

Category	WD40 Proteins	WD40 Repeats	Hydrogen Bond Network	Hotspots on the Top Face
Total	1,667,054	12,587,025	2,602,545	15,621,282
High	574,455	4,759,298	2,146,262	6,402,584
Middle	203,715	1,375,068	297,304	1,647,156
Low	888,884	6,452,659	158,979	7,571,542
Eukaryota	1,064,331	7,860,906	2,266,699	9,926,306
Bacteria	571,902	4,496,222	325,755	5,427,986
Archaea	21,663	172,949	7,644	198,493
Viruses	916	6,371	326	8,369
Homo sapiens (9606)	1,726	10,266	2,937	12,961
Mus musculus (10090)	1,000	6,493	1,854	8,157
Rattus norvegicus (10116)	930	7,393	2,239	9,307
Xenopus laevis (8355)	1,587	12,684	3,747	16,233
Danio rerio (7955)	1,029	7,596	2,062	9,508
Drosophila melanogaster (7227)	528	3,848	1,116	4,906
Caenorhabditis elegans (6239)	218	1,566	410	1,936
Oryza sativa (39947)	915	6,135	1,970	7,683
Arabidopsis thaliana (3702)	1,461	11,134	3,353	13,892
Saccharomyces cerevisiae (559292)	0	0	0	0
Schizosaccharomyces pombe (284812)	0	0	0	0

TrEMBL

Statistics of WD40 proteins from the TrEMBL section in different categories

Category	WD40 Proteins	WD40 Repeats	Hydrogen Bond Network	Hotspots on the Top Face
Total	1,667,054	12,587,025	2,602,545	15,621,282
High	574,455	4,759,298	2,146,262	6,402,584
Middle	203,715	1,375,068	297,304	1,647,156
Low	888,884	6,452,659	158,979	7,571,542
Eukaryota	1,064,331	7,860,906	2,266,699	9,926,306
Bacteria	571,902	4,496,222	325,755	5,427,986
Archaea	21,663	172,949	7,644	198,493
Viruses	916	6,371	326	8,369
Homo sapiens (9606)	1,726	10,266	2,937	12,961
Mus musculus (10090)	1,000	6,493	1,854	8,157
Rattus norvegicus (10116)	930	7,393	2,239	9,307
Xenopus laevis (8355)	1,587	12,684	3,747	16,233
Danio rerio (7955)	1,029	7,596	2,062	9,508
Drosophila melanogaster (7227)	528	3,848	1,116	4,906
Caenorhabditis elegans (6239)	218	1,566	410	1,936
Oryza sativa (39947)	915	6,135	1,970	7,683
Arabidopsis thaliana (3702)	1,461	11,134	3,353	13,892
Saccharomyces cerevisiae (559292)	0	0	0	0
Schizosaccharomyces pombe (284812)	0	0	0	0