Hypothetical proteins analysis of Porphyromonas gingivalis W83 and Streptococcus mutans UA159

Ching-Hung Tseng and Chuan-Hsiung Chang

Introduction

After annotating a genome, we can generally group its genes into three types: known genes, which have been functionally characterized; conserved hypothetical genes, which are conserved across many organisms but remain uncharacterized; and hypothetical genes, which are not found in other organisms. Typically, about thirty percent of the genes in a newly sequenced genome are annotated poorly or even wrongly [1]. Because they make up such a large share of each genome, these poorly characterized hypothetical genes may be important to our understanding of life and biology [2]. With so little known about them, selecting a research target worth years of study is a great challenge. Here we designed a scoring method that assigns each candidate protein a score based on its rankings under several criteria, and applied it to the hypothetical proteins of two major oral pathogens, Porphyromonas gingivalis W83 and Streptococcus mutans UA159.
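
The criteria presumably correspond to the numeric columns of the ranking tables (size, gene cluster size, domain count, solubility, disordered region content, taxonomic distribution and TFBS count), but the exact formula for combining the per-criterion rankings is not given here. A minimal sketch, assuming a simple rank-sum (Borda-style) aggregation, is shown below. Each protein is ranked separately under each criterion, and its ranks are added, so a lower total means a better overall position. The functions `rank_positions` and `rank_sum_scores`, the criterion names, the preferred direction of each criterion, and the example values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical criterion values per protein; the names, directions and numbers
# used here are illustrative assumptions, not the authors' actual scoring inputs.
CriterionValues = Dict[str, float]


def rank_positions(values: List[float], higher_is_better: bool) -> List[int]:
    """Return the 1-based rank of each value; ties keep their input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_sum_scores(
    proteins: Dict[str, CriterionValues], criteria: Dict[str, bool]
) -> List[Tuple[str, int]]:
    """Rank proteins under each criterion, sum the ranks, and sort ascending.

    `criteria` maps a criterion name to True if larger values are preferred.
    A lower rank sum means a better overall position.
    """
    names = list(proteins)
    totals = {name: 0 for name in names}
    for criterion, higher_is_better in criteria.items():
        column = [proteins[name][criterion] for name in names]
        for name, rank in zip(names, rank_positions(column, higher_is_better)):
            totals[name] += rank
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical proteins and preferences, for illustration only.
    example_proteins = {
        "protein_a": {"solubility": 60.0, "disorder_pct": 10.0, "species": 30},
        "protein_b": {"solubility": 40.0, "disorder_pct": 5.0, "species": 50},
        "protein_c": {"solubility": 20.0, "disorder_pct": 25.0, "species": 10},
    }
    example_criteria = {"solubility": True, "disorder_pct": False, "species": True}
    print(rank_sum_scores(example_proteins, example_criteria))
    # → [('protein_b', 4), ('protein_a', 5), ('protein_c', 9)]
```

Rank-sum aggregation is used here only because it is easy to follow. A weighted score or another rank-aggregation scheme would fit the description above equally well.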

Materials and methods

Results

Porphyromonas gingivalis W83 top-20, [Full ranking table].
RankGILANL IDTIGR IDSize (aa)Gene Cluster SizeDomain (#)Solubility (%)Disordered Region Length (aa)Disordered Region Length Percentage (%)Bacteria Strain (#)Bacteria Species (#)Bacteria Family (#)TFBS (#)Definition
134541330 PG1479PG_16943645448.77821.440341945conserved hypothetical protein
234541597 PG1750PG_20043307113.8144.240331643conserved membrane protein
334540102 PG0232PG_02574817210.77114.81381155136ABC transporter subunit
434539940 PG0058PG_00695041126.8509.9101873633conserved hypothetical protein (possible sugar kinase)
534540161 PG0301PG_03252098241.93114.850432832conserved hypothetical protein (possible serine cycle enzyme,formiminotransferase cyclodeaminase)
634541437 PG1588PG_181727113341155.525231524cytochrome c-type synthesis protein (cytochrome c biogenesis protein)
734539921 PG0042PG_004836742286818.531304437conserved GTP-binding protein
834541242 PG1390PG_159136512415615.327272147conserved hypothetical protein
934540104 PG0235PG_02594477122.86514.547442429conserved hypothetical protein (possible ABC transporter related membrane protein)
1034540684 PG0826PG_092713813258.93424.6104934131conserved hypothetical protein
1134540774 PG0920PG_1033248725.393.694904436conserved hypothetical protein/possible ABC element with MSD domain
1234541294 PG1443PG_1653407779.44811.833241034conserved hypothetical protein
1334540453 PG0590PG_065210011146.50019171430conserved hypothetical protein
1434540561 PG0699PG_0778239813.83313.887753543possible glycoprotein endopeptidase
1534540098 PG0228PG_02531527146.44529.631312143conserved hypothetical protein
1634541477 PG1627PG_18681735150.70021161032conserved hypothetical protein
1734540740 PG0890PG_09963028712.111036.426234147conserved hypothetical protein
1834540228 PG0368PG_0401513731515630.4106933732conserved hypothetical protein(possible CTP synthase)
1934541630 PG1784PG_20433643217.9328.864502531conserved hypothetical protein
2034540629 PG0769PG_08593127320.16119.616141135conserved hypothetical protein

Streptococcus mutans UA159 top-20, [Full ranking table].
RankGILANL IDTIGR IDSize (aa)Gene Cluster SizeDomain (#)Solubility (%)Disordered Region Length (aa)Disordered Region Length Percentage (%)Bacteria Strain (#)Bacteria Species (#)Bacteria Family (#)TFBS (#)Definition
124379036SMu0505NTL02SM05048617151.40079712658conserved hypothetical protein
224380160SMu1634NTL02SM162824720264.63413.856441257conserved hypothetical protein; possible methyltransferase
324379999SMu1473NTL02SM146716411295.3412592843358conserved hypothetical protein
424378760SMu0228NTL02SM022847110120.88217.41341266657ABC transporter permease
524378552SMu0020NTL02SM002039118121.4123.184722456aspartate or aromatic amino acid aminotransferase
624379271SMu0740NTL02SM07393788325.9246.336301759aminotransferase
724380037SMu1511NTL02SM150528814116.3238116933757conserved hypothetical protein, tetrapyrrole methylase family
824378835SMu0303NTL02SM030327110255.1269.699782855inner membrane protein
924378920SMu0388NTL02SM03882754438.511461461458conserved hypothetical protein, Cof family
1024379860SMu1332NTL02SM13282629220.83814.582642858conserved hypothetical protein, NIF3-related
1124380470SMu1942NTL02SM19386579227.512719.34436860conserved hypothetical protein (DHH family protein)
1224380093SMu1567NTL02SM15618210180.80040391258conserved hypothetical protein
1324379851SMu1323NTL02SM131932771091.9257.62315459conserved hypothetical protein, tetratricopeptide repeat family
1424380155SMu1629NTL02SM162323820149.59037.8100873057conserved hypothetical protein
1524378740SMu0208NTL02SM020855571321001875602157conserved hypothetical protein (possible kinase)
1624378693SMu0161NTL02SM016120012129.9157.521193059conserved hypothetical protein (possible oxidoreductase)
1724379602SMu1073NTL02SM10702466158.20035312056conserved hypothetical protein
1824379206SMu0675NTL02SM06742734540.85419.865491458hydrolase, haloacid dehalogenase-like family
1924379498SMu0969NTL02SM096611010778.23229.163541958DNA-binding protein
2024379348SMu0817NTL02SM08169216160.21314.131272059conserved hypothetical protein
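
Reading the column headers, the Disordered Region Length Percentage appears to be the disordered region length expressed as a percentage of the protein size. This relationship is our interpretation rather than a formula stated in the text. A minimal Python sketch of that calculation follows; the function name `disordered_percentage` is chosen for illustration only.

```python
def disordered_percentage(protein_length_aa: int, disordered_length_aa: int) -> float:
    """Share of residues in predicted disordered regions, in percent (one decimal).

    Assumes the percentage is simply disordered length / protein size * 100.
    """
    if protein_length_aa <= 0:
        raise ValueError("protein length must be positive")
    if not 0 <= disordered_length_aa <= protein_length_aa:
        raise ValueError("disordered length must lie between 0 and the protein length")
    return round(100.0 * disordered_length_aa / protein_length_aa, 1)


if __name__ == "__main__":
    # 78 disordered residues in a 364-residue protein.
    print(disordered_percentage(364, 78))  # → 21.4
```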

Remarks

In this analysis, we provide ranked lists of the hypothetical proteins in Porphyromonas gingivalis W83 and Streptococcus mutans UA159, both of which are oral pathogens of great interest. The hyperlinks in the tables provide some evidence about the possible functions of these hypothetical proteins. This analysis is not an end point but a starting point for further experiments, which may one day complete the annotation of these hypothetical proteins.

References
  1. Bork P. Powers and pitfalls in sequence analysis: the 70% hurdle. (2000) Genome Res. Vol. 10, 398-400.
  2. Galperin MY. Conserved 'hypothetical' proteins: new hints and new puzzles. (2001) Comp. Funct. Genomics Vol. 2, 14-18.
  3. Tan T, et al. Length, protein-protein interaction, and complexity. (2005) Physica A Vol. 350, 52-62.
  4. Overbeek R, et al. The use of gene clusters to infer functional coupling. (1999) Proc. Natl. Acad. Sci. USA Vol. 96, 2896-2901.
  5. Eddy S. HMMER. http://hmmer.wustl.edu/
  6. Wilkinson DL, Harrison RG. Predicting the solubility of recombinant proteins in Escherichia coli. (1991) Bio/Technology Vol. 9, 443-448.
  7. Linding R, et al. Protein disorder prediction: implications for structural proteomics. (2003) Structure Vol. 11, 1453-1459.
  8. Salgado H, et al. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. (2006) Nucleic Acids Res. Vol. 34 (Database issue), D394-D397.
  9. van Helden J. Regulatory sequence analysis tools. (2003) Nucleic Acids Res. Vol. 31, 3593-3596.