Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions

Te Lun Mai, Geng Ming Hu, Chi-Ming Chen

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Research over the past decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and to investigate possible causes of inconsistency among protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second-level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize relationships among related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
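
The abstract reports Jaccard similarity coefficients between pairs of classifications but does not say how the coefficient was computed. As a minimal sketch, assuming the common pair-counting definition (the number of protein pairs grouped together in both classifications, divided by the number of pairs grouped together in at least one), the comparison could look like the following. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of index pairs (i, j), i < j, that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def jaccard_between_clusterings(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items.

    Assumption: this is one standard definition; the paper may use another.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        # Neither clustering groups any two items together: treat as identical.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical toy labels for six proteins (not data from the paper).
by_sequence = ["s1", "s1", "s1", "s2", "s2", "s3"]
by_function = ["f1", "f1", "f2", "f2", "f2", "f3"]
score = jaccard_between_clusterings(by_sequence, by_function)
print(f"Jaccard (sequence vs. function): {score:.2f}")
```

Under this definition a value of 1 means the two classifications group exactly the same pairs of proteins, and a value near 0 means they share almost no groupings. Cluster label names do not need to match across classifications, since only co-membership is compared.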

Original language: English
Pages (from-to): 2123-2131
Number of pages: 9
Journal: Journal of Proteome Research
Volume: 15
Issue number: 7
DOIs: 10.1021/acs.jproteome.5b01031
Publication status: Published - 2016 Jul 1

Fingerprint

Cluster Analysis
Proteins
Enzymes
Protein Databases
Molecular Evolution
Peptide Hydrolases
Research Personnel
Demonstrations
Visualization
Research

Keywords

  • protein similarity networks
  • sequence similarity
  • sequence-structure-function relationship
  • structure similarity

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry (all)

Cite this

Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions. / Mai, Te Lun; Hu, Geng Ming; Chen, Chi-Ming.

In: Journal of Proteome Research, Vol. 15, No. 7, 01.07.2016, p. 2123-2131.

@article{41429a57193e4db785dad37cbf43fdf1,
title = "Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions",
keywords = "protein similarity networks, sequence similarity, sequence-structure-function relationship, structure similarity",
author = "Mai, {Te Lun} and Hu, {Geng Ming} and Chen, Chi-Ming",
year = "2016",
month = "7",
day = "1",
doi = "10.1021/acs.jproteome.5b01031",
language = "English",
volume = "15",
pages = "2123--2131",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "7",

}

TY - JOUR

T1 - Visualizing and Clustering Protein Similarity Networks

T2 - Sequences, Structures, and Functions

AU - Mai, Te Lun

AU - Hu, Geng Ming

AU - Chen, Chi-Ming

PY - 2016/7/1

Y1 - 2016/7/1

KW - protein similarity networks

KW - sequence similarity

KW - sequence-structure-function relationship

KW - structure similarity

UR - http://www.scopus.com/inward/record.url?scp=84977070642&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84977070642&partnerID=8YFLogxK

U2 - 10.1021/acs.jproteome.5b01031

DO - 10.1021/acs.jproteome.5b01031

M3 - Article

C2 - 27267620

AN - SCOPUS:84977070642

VL - 15

SP - 2123

EP - 2131

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 7

ER -