STRING Database - Protein–Protein Interaction Networks and GO/KEGG Enrichment Analysis Tool

Skill Overview

The STRING Database skill provides full access to the STRING protein–protein interaction database, covering 59 million proteins and over 20 billion interaction records, supporting network analysis, GO/KEGG functional enrichment, interaction partner discovery, and systems biology research for more than 5,000 species.

Applicable Scenarios

1. Interpretation of Proteomics Data

Differentially expressed protein lists obtained from mass spectrometry experiments, transcriptome sequencing, or other omics studies can be interpreted in a systems biology context using STRING. By constructing interaction networks and performing functional enrichment analysis, you can quickly identify the biological pathways, molecular functions, and cellular components these proteins are involved in, providing biological explanations for experimental results.

2. Protein Interaction Network Research

Study the interaction network of a single protein (e.g., TP53) or analyze the connectivity among multiple proteins. Supports discovery of new interaction partners, identification of core proteins (hub proteins) in the network, exploration of protein complexes, and generation of visual network maps suitable for publication or academic presentations.

3. Cross-Species Comparative Analysis

Compare the conservation of protein interactions across different species (human, mouse, yeast, etc.), study differences in interaction patterns of homologous proteins, and trace network changes during evolution. Suitable for evolutionary biology, comparative genomics, and translational research.

Core Features

1. Interaction Network Querying and Visualization

Retrieve protein–protein interaction network data using the string_network() function, with support for custom confidence thresholds (any value from 0 to 1000; common cutoffs are 150, 400, 700, and 900) and network types (functional association or physical binding). Generate PNG-format network visualizations with string_network_image(), with options to color by evidence type, confidence level, or activation/inhibition relationships.
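
As a rough illustration of how the confidence threshold and network type end up in a request, the sketch below builds a URL for the `network` method of the public STRING REST API. It is not the skill's own string_network() implementation, which this page does not show; the helper name build_network_url and its defaults are assumptions made for the example.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"  # public STRING REST endpoint


def build_network_url(identifiers, species, required_score=400,
                      network_type="functional", output_format="tsv"):
    """Return a request URL for STRING's `network` method (hypothetical helper)."""
    if not 0 <= required_score <= 1000:
        raise ValueError("required_score must be between 0 and 1000")
    if network_type not in ("functional", "physical"):
        raise ValueError("network_type must be 'functional' or 'physical'")
    params = {
        # STRING expects multiple identifiers separated by carriage returns;
        # urlencode escapes "\r" as %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
        "network_type": network_type,
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606, required_score=700)
print(url)
```

Validating the score range and network type locally keeps a malformed request from ever reaching the server.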

2. Functional Enrichment Analysis

Perform multidimensional functional enrichment analysis on protein lists with the string_enrichment() function, covering Gene Ontology (Biological Process, Molecular Function, Cellular Component), KEGG pathways, Pfam domains, InterPro protein families, and other databases. Statistical significance is assessed using Fisher’s exact test with Benjamini–Hochberg FDR correction.
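
To make the statistics concrete, here is a minimal pure-Python sketch of a one-sided Fisher's exact test (the hypergeometric tail) followed by Benjamini–Hochberg adjustment. STRING computes enrichment on its servers, so this only illustrates the named methods; the function names and the toy numbers in the usage lines are assumptions, not STRING output.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated proteins in a list of n,
    when K of the N background proteins carry the annotation.
    This equals the one-sided Fisher's exact test p-value for over-representation.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 3 of 5 query proteins carry a term found in 5 of 20 background proteins.
p = hypergeom_tail(k=3, n=5, K=5, N=20)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```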

3. Interaction Partner Discovery and Network Expansion

Use string_interaction_partners() to find all known and predicted interaction partners of target proteins, with support for confidence-based filtering and limits on the number of partners. Combine with string_ppi_enrichment() to test whether a protein network forms functional modules and whether the interaction density is significantly higher than random expectation.
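
The confidence filter and partner limit can be pictured as a simple post-processing step. The sketch below assumes each partner is a dict with hypothetical keys "partner" and "score" (score on the 0–1000 scale); the actual return format of string_interaction_partners() is not shown on this page, and the records are made-up placeholders.

```python
def top_partners(records, min_score=400, limit=10):
    """Keep partners at or above min_score, strongest first, at most `limit`."""
    kept = [r for r in records if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:limit]


# Placeholder records for illustration only (not real STRING scores).
toy_records = [
    {"partner": "PARTNER_A", "score": 950},
    {"partner": "PARTNER_B", "score": 380},
    {"partner": "PARTNER_C", "score": 720},
    {"partner": "PARTNER_D", "score": 810},
]
best = top_partners(toy_records, min_score=700, limit=2)
```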

Frequently Asked Questions

What is the STRING database? How is it used for protein–protein interaction analysis?

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a database that integrates known and predicted protein–protein interactions. It aggregates evidence from experiments, computational predictions, literature mining, and public database annotations, providing over 20 billion interaction records for 59 million proteins across more than 5,000 species. With STRING you can query protein interaction networks, perform functional enrichment analysis, and visualize protein associations, which makes it a core tool for systems biology and proteomics research.

How do I use the STRING API to query protein interaction networks?

The Python functions provided by this skill allow easy access to the STRING REST API. First use string_map_ids() to map gene names or protein names to STRING identifiers (e.g., 9606.ENSP00000269305), then use string_network() to retrieve interaction data by specifying the species (e.g., 9606 for human) and a confidence threshold (0–1000). For network visualization, use string_network_image() to generate PNG images. Code example:

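from scripts.string_api import string_map_ids, string_network

# Map the gene name to its STRING identifier (9606 = human)
mapping = string_map_ids('TP53', species=9606)
# Retrieve the high-confidence (score >= 700) interaction network
network = string_network('TP53', species=9606, required_score=700)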
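
A STRING identifier such as 9606.ENSP00000269305 joins the NCBI taxonomy ID and the protein ID with a dot. The small helper below, whose name is an assumption for illustration, splits an identifier so you can check that a mapped ID belongs to the species you queried.

```python
def split_string_id(string_id):
    """Split 'taxon.protein' into (taxon_id: int, protein_id: str)."""
    taxon, sep, protein = string_id.partition(".")
    if not sep or not taxon.isdigit() or not protein:
        raise ValueError(f"not a STRING identifier: {string_id!r}")
    return int(taxon), protein


taxon_id, protein_id = split_string_id("9606.ENSP00000269305")
```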

How should I choose the STRING confidence score? What’s the difference between 400, 700, and 900?

STRING confidence scores range from 0 to 1000 and integrate seven evidence types: neighborhood on the genome, gene fusion, phylogenetic co-occurrence, co-expression, experimental evidence, database annotations, and text mining. Common threshold choices: 150 (low confidence) suits exploratory analyses and hypothesis generation, since it includes more potential interactions at the cost of a higher false positive rate; 400 (medium confidence) is the default, balancing sensitivity and specificity for standard analyses; 700 (high confidence) is for conservative analyses and retains primarily interactions with strong supporting evidence; 900 (highest confidence) is the strictest and keeps only interactions with the strongest combined evidence, typically from experiments or curated databases. Choose a threshold based on the trade-off between recall and precision for your study goals.
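
The recall versus precision trade-off is easy to see by counting how many edges survive each cutoff. The sketch below uses a made-up list of scores purely for illustration; the helper name is hypothetical.

```python
def edges_retained(scores, cutoffs=(150, 400, 700, 900)):
    """Return {cutoff: number of edges with score >= cutoff}."""
    return {cutoff: sum(1 for s in scores if s >= cutoff) for cutoff in cutoffs}


# Made-up combined scores on the 0–1000 scale (not real STRING data).
toy_scores = [160, 420, 450, 710, 930, 990]
counts = edges_retained(toy_scores)
```

Raising the cutoff shrinks the network, so a stricter threshold trades coverage for confidence.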
