ENA Database - European Nucleotide Archive Data Retrieval Tool

Skill Overview
ENA Database is a tool for accessing the European Nucleotide Archive (ENA). It supports downloading DNA/RNA sequences, FASTQ raw sequencing data, and genome assembly data via REST API and FTP, suitable for genomics and bioinformatics research workflows.

Applicable Scenarios

Obtaining public sequencing data for bioinformatics analysis

When your research requires reanalysis of published sequencing data, you can quickly retrieve and download FASTQ raw reads, BAM/CRAM alignment files, or genome assemblies by accession number without sequencing the samples yourself.

Building automated bioinformatics pipelines

ENA provides a comprehensive REST API that supports programmatic queries of samples, studies, and genome data. It can be easily integrated into analysis pipelines to automate data retrieval and processing.

Cross-database queries and metadata retrieval

When you need to query metadata for specific species, projects, or samples, ENA supports multi-dimensional searches by taxonomic classification, project ID, date ranges, etc., and can obtain cross-referenced records from related databases.
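
A multi-dimensional search of this kind can be sketched as a small URL builder. This is a minimal illustration rather than an official client: the helper name `build_search_url` is hypothetical, and the filter expressions `tax_tree(...)` and `first_public>=...`, along with the `fields` and `limit` parameters, are assumptions about the Portal API query syntax that should be checked against the ENA documentation.

```python
from urllib.parse import urlencode

# Portal API search endpoint.
PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(result, clauses, fields=None, fmt="json", limit=10):
    """Combine filter clauses with AND and return a Portal API search URL."""
    params = {
        "result": result,
        "query": " AND ".join(clauses),
        "format": fmt,
        "limit": limit,
    }
    if fields:
        params["fields"] = ",".join(fields)
    # urlencode percent-escapes quotes, parentheses and comparison operators.
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


# Assumed filters: everything under taxon 9606 (human), public since 2023.
url = build_search_url(
    "read_run",
    ["tax_tree(9606)", "first_public>=2023-01-01"],
    fields=["run_accession", "study_accession"],
)
print(url)
```

Keeping each filter as a separate clause makes it easy to add or drop a dimension (taxonomy, project, date) without rewriting the whole query string.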

Core Features

Retrieval and download of multiple data types

Supports searching and downloading Studies, Samples, Raw Reads, Assemblies, Sequences, and other data types. Provides a Browser API for direct record retrieval and a Portal API for advanced search and batch queries.

Support for multiple data formats

Sequence data are available in FASTQ (raw reads), FASTA (assembly sequences), and BAM/CRAM (alignments); metadata are available in XML, JSON, TSV/CSV formats. Data can be obtained via API, FTP, or the Aspera high-speed transfer tool.

Advanced queries and taxonomic search

Supports free-text search, sequence similarity search (BLAST), and Rulespace advanced query syntax. You can find genome assemblies by their position in the taxonomic tree, retrieve taxonomic information and lineage, and follow cross-references to records in other databases.

Frequently Asked Questions

What is ENA Database and what is it mainly used for?
ENA Database is a data access tool for the European Nucleotide Archive (ENA), which is maintained by EMBL-EBI. ENA is one of the three major public nucleotide sequence databases worldwide (alongside NCBI GenBank and DDBJ) and stores DNA/RNA sequences, raw sequencing data, genome assemblies, and functional annotations. The tool's main purpose is to let researchers search for and download publicly available genomics data for analysis and study.

How do I download FASTQ raw sequencing data using an accession number?
To get the record information for an accession, use the ENA Browser API, e.g., https://www.ebi.ac.uk/ena/browser/api/xml/{accession}. To obtain the FASTQ download links, query the Portal API search endpoint: https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=run_accession={ERR_ID}&format=json. For large-scale downloads, enaBrowserTools or FTP/Aspera is recommended.
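
As a rough sketch of that lookup, the code below builds the search URL for one run and turns a returned `fastq_ftp` value into download links. It assumes that the `fields` parameter accepts `fastq_ftp`, that `fastq_ftp` is a semicolon-separated list of paths without a scheme, and that the accession should be quoted in the query. The function names and the sample record are hypothetical.

```python
from urllib.parse import urlencode

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def run_search_url(run_accession):
    """Build a Portal API URL asking for the FASTQ locations of one run."""
    params = {
        "result": "read_run",
        "query": f'run_accession="{run_accession}"',
        "fields": "run_accession,fastq_ftp",  # assumed field names
        "format": "json",
    }
    return f"{PORTAL_SEARCH}?{urlencode(params)}"


def fastq_links(record, scheme="ftp"):
    """Split the fastq_ftp value of one JSON record into full URLs."""
    paths = record.get("fastq_ftp", "")
    return [f"{scheme}://{p}" for p in paths.split(";") if p]


# Hypothetical record shaped like one element of the JSON response;
# the accession and host are placeholders, not real ENA data.
sample = {
    "run_accession": "ERR0000000",
    "fastq_ftp": "ftp.example.org/ERR0000000_1.fastq.gz;"
                 "ftp.example.org/ERR0000000_2.fastq.gz",
}
print(run_search_url(sample["run_accession"]))
print(fastq_links(sample))
```

Paired-end runs would yield two links and single-end runs one, so downstream code should not hard-code the number of files.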

Are there rate limits for ENA APIs?
Yes. All ENA APIs are rate-limited to 50 requests per second, and requests over the limit receive HTTP 429 (Too Many Requests). Implement retries with exponential backoff in your code, or use batch query endpoints to reduce the number of requests. For large data downloads, prefer FTP or Aspera to individual API calls.
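
The retry logic mentioned above might look like the following minimal sketch, which uses only the Python standard library. The names `http_get` and `get_with_backoff`, the retry count, and the base delay are illustrative choices, not values prescribed by ENA.

```python
import time
import urllib.error
import urllib.request


def http_get(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def get_with_backoff(url, fetch=http_get, max_retries=5,
                     base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 429 with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate-limit response,
            # and give up once the retry budget is spent.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a call waits 1, 2, 4, 8, and 16 seconds between attempts before re-raising the error. The `fetch` and `sleep` parameters exist so the function can be exercised without network access; adding random jitter to the delay is a common refinement that this sketch leaves out.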

What is the difference between the ENA Portal API and the Browser API?
The Portal API is for advanced searches and bulk queries, supporting complex query syntax and multi-field filtering, and can return metadata summaries in JSON/TSV/XML formats. The Browser API is for directly retrieving a single record’s full information and data files, primarily returning XML. In short: the Portal is suitable for "searching a batch," and the Browser is suitable for "retrieving one."
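
For the "retrieving one" case, a Browser API URL can be assembled directly from the accession. The sketch below assumes that the `fasta` and `embl` endpoints follow the same pattern as the `xml` endpoint quoted earlier; `record_url` is a hypothetical helper name and the accession is a placeholder.

```python
BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# "xml" is the endpoint quoted in this FAQ; "fasta" and "embl" are assumed
# to follow the same /{format}/{accession} pattern.
RECORD_FORMATS = {"xml", "fasta", "embl"}


def record_url(accession, fmt="xml"):
    """Return the Browser API URL for a single record."""
    if fmt not in RECORD_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BROWSER_API}/{fmt}/{accession}"


print(record_url("ERR0000000"))
```

No query string is involved here, which is the practical difference from a Portal API search: the accession alone identifies what comes back.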

Which data formats does ENA Database support?
Metadata formats: XML (ENA native format), JSON (Portal API), TSV/CSV (tabular summaries). Sequence data: FASTQ (raw reads), BAM/CRAM (aligned sequences), FASTA (assembly sequences), EMBL flat file (annotated sequences). Choose the appropriate format according to your analysis needs.
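
When a tabular summary is all an analysis needs, TSV output can be read with the standard `csv` module. The sketch below is illustrative only: `parse_tsv` is a hypothetical helper, and the sample text and its column names are made up to show the shape of a report with a header row.

```python
import csv
import io


def parse_tsv(text):
    """Parse a tab-separated report (header row first) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical two-row report; real column names depend on the requested fields.
sample = "run_accession\tbase_count\nERR0000001\t100\nERR0000002\t250\n"
rows = parse_tsv(sample)
total = sum(int(row["base_count"]) for row in rows)
print(rows[0]["run_accession"], total)
```

All values arrive as strings, so numeric columns need an explicit conversion as shown for `base_count`.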
