FTP Download

You can download via a browser from our FTP site, use a script, or even use rsync from the command line.
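
As a rough illustration of the scripted route, the sketch below streams a single file from a URL to disk using only the Python standard library. The function name `fetch` is an assumption made for this example, and the URL in the closing comment is hypothetical; substitute the path of the file you actually need.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, destination):
    """Stream the resource at `url` into the local file `destination`."""
    destination = Path(destination)
    # Copy in chunks so that a large dump is never held in memory at once.
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Hypothetical call (the exact README path is an assumption):
# fetch("http://ftp.ebi.ac.uk/ensemblgenomes/README", "README")
```

For mirroring whole directories, rsync or Globus is likely a better fit than fetching files one at a time.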

Globus

For rapid bulk download of files, the Ensembl FTP site is available as an endpoint in the Globus Online system. To access the data, you need to sign up for a Globus account, install the Globus Connect Personal software, and set up a personal endpoint to receive the data. The Ensembl data is hosted at the EMBL-EBI endpoint called "Shared EMBL-EBI public endpoint". Data from the Ensembl FTP site can then be found under the "/gridftp/ensemblorg/pub" directory within that endpoint.

Each directory on http://ftp.ebi.ac.uk/ensemblgenomes contains a README file, explaining the directory structure.

Species	DNA (FASTA)	cDNA (FASTA)	CDS (FASTA)	Protein sequence (FASTA)	Annotated sequence (EMBL)	Annotated sequence (GenBank)	Gene sets	Other annotations	Variation (VEP)
SARS-CoV-2	FASTA	FASTA	FASTA	FASTA	EMBL	GenBank	GTF GFF3	TSV JSON	VEP

To facilitate storage and download, all databases are compressed with GNU Zip (gzip, *.gz).
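
The compressed files do not need to be unpacked on disk before use. As a minimal sketch, assuming a text-based dump, Python's standard `gzip` module can decompress a file on the fly while it is read line by line. The function name `iter_lines` is an assumption made for this example.

```python
import gzip


def iter_lines(path):
    """Yield the text lines of a gzip-compressed file, without newlines."""
    # Mode "rt" opens the compressed stream as text, decompressing as it reads.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because the function is a generator, only one line is held in memory at a time, which matters for genome-scale files.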

About the data

The following types of data dumps are available on the FTP site.

FASTA

FASTA sequence databases of Ensembl gene, transcript and protein model predictions. Since the FASTA format does not permit sequence annotation, these database files are mainly intended for use with local sequence similarity search algorithms. Each directory has a README file with a detailed description of the header line format and the file naming conventions.

DNA: Masked and unmasked genome sequences associated with the assembly (contigs, chromosomes, etc.). The header line in a FASTA dump file containing DNA sequence consists of the following attributes: coord_system:version:name:start:end:strand. This coordinate-system string is used in the Ensembl API to retrieve slices with the SliceAdaptor.
CDS: Coding sequences for Ensembl or ab initio predicted genes.
cDNA: cDNA sequences for Ensembl or ab initio predicted genes.
Peptides: Protein sequences for Ensembl or ab initio predicted genes.
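
The coordinate-system string described for DNA headers can be split into its six attributes with a few lines of Python. This is a minimal sketch: the class and function names are assumptions made for this example, and the README in each directory remains the authoritative description of the full header line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceLocation:
    coord_system: str
    version: str
    name: str
    start: int
    end: int
    strand: int


def parse_location(text):
    """Parse a 'coord_system:version:name:start:end:strand' string."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    coord_system, version, name, start, end, strand = parts
    # start, end and strand are numeric; the other fields stay as text.
    return SliceLocation(coord_system, version, name,
                         int(start), int(end), int(strand))
```

For example, `parse_location("chromosome:GRCh38:1:1:248956422:1")` yields a location on chromosome 1 with `start` 1, `end` 248956422 and `strand` 1 (the input string here is illustrative).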

Annotated sequence

Flat files allow more extensive sequence annotation by means of feature tables, and thus contain the genome sequence as annotated by the automated Ensembl genome annotation pipeline. Each nucleotide sequence record in a flat file represents a 1 Mb slice of the genome sequence. Flat files are broken into chunks of 1000 sequence records for easier downloading.
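
The slicing and chunking rules give a quick way to estimate how many records and files to expect. The sketch below rests on several assumptions that should be checked against the actual dumps: that "1Mb" means 1,000,000 bases, that slices never span two sequences, and that records are packed into files in order. The function name is likewise an assumption.

```python
import math

SLICE_LENGTH = 1_000_000   # assumption: 1 Mb = 1,000,000 bases
RECORDS_PER_FILE = 1000    # chunk size stated in the text


def estimate_dump_size(sequence_lengths):
    """Return (records, files) for sequences of the given lengths in bases."""
    # Each sequence is cut into slices; a final partial slice still needs a record.
    records = sum(math.ceil(length / SLICE_LENGTH) for length in sequence_lengths)
    files = math.ceil(records / RECORDS_PER_FILE)
    return records, files
```

Under these assumptions, a genome shorter than one slice, such as a viral genome of a few tens of kilobases, fits in a single record in a single file.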

EMBL: Ensembl database dumps in EMBL nucleotide sequence database format
GenBank: Ensembl database dumps in GenBank nucleotide sequence database format

GTF

Gene sets for each species. These files include annotations of both coding and non-coding genes. This file format is described here.
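
To show how such a file might be consumed, here is a minimal sketch of a GTF line parser. It assumes the conventional layout of nine tab-separated columns, with the last column holding `key "value";` pairs. The function and constant names are assumptions for this example, and comment lines starting with `#` must be skipped by the caller.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)";')

COLUMNS = ("seqname", "source", "feature", "start", "end",
           "score", "strand", "frame")


def parse_gtf_line(line):
    """Parse one GTF data line into a dict with an 'attributes' sub-dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # If a key is repeated on one line, only its last value is kept.
    record["attributes"] = dict(ATTRIBUTE.findall(fields[8]))
    return record
```

Keeping only the last value of a repeated key is a simplification; a fuller parser would collect repeated keys into lists.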

GFF3

GFF3 provides access to all annotated transcripts which make up an Ensembl gene set. This file format is described here.
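
GFF3 differs from GTF mainly in its ninth column, which holds `key=value` pairs separated by semicolons, with reserved characters percent-encoded and multiple values separated by commas. The sketch below parses that column under those assumptions; the function name is made up for this example, and the identifiers in the usage note are invented.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column):
    """Parse a GFF3 attribute column into a dict mapping keys to value lists."""
    attributes = {}
    for pair in column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Split on commas first, then decode, so encoded commas stay in values.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes
```

For example, `parse_gff3_attributes("ID=transcript:T1;Parent=gene:G1")` returns `{"ID": ["transcript:T1"], "Parent": ["gene:G1"]}`. The `Parent` key is what links a transcript back to its gene.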

VEP (variation data)

Compressed text files (called "cache files") used by the Variant Effect Predictor tool. More information about these files is available here.