Schema Documentation

assembly

The assembly table states which parts of seq_regions are exactly equal, which makes it possible to transform coordinates between seq_regions. Typically it records how chromosomes are made of contigs, clones of contigs, and chromosomes of supercontigs. It also allows you to artificially chunk chromosome sequence into smaller parts. The data in this table defines the "static golden path", i.e. the best-effort draft full genome sequence as determined by UCSC or NCBI (depending on which assembly you are using). Each row represents a component, e.g. a contig (cmp_seq_region_id, FK from the seq_region table), at least part of which is present in the golden path. The part of the component that is in the path is delimited by the fields cmp_start and cmp_end (start < end), and its absolute position within the golden path chromosome (or other appropriate assembled structure, asm_seq_region_id) is given by asm_start and asm_end.

Column	Type	Default value	Description	Index
asm_seq_region_id	INT(10)	-	Assembly sequence region id. Primary key, internal identifier. Foreign key references to the seq_region table.	key: asm_seq_region_idx unique key: all_idx
cmp_seq_region_id	INT(10)	-	Component sequence region id. Foreign key references to the seq_region table.	key: cmp_seq_region_idx unique key: all_idx
asm_start	INT(10)	-	Start absolute position within the golden path chromosome.	key: asm_seq_region_idx unique key: all_idx
asm_end	INT(10)	-	End absolute position within the golden path chromosome.	unique key: all_idx
cmp_start	INT(10)	-	Start position, within the component, of the part that is in the golden path.	unique key: all_idx
cmp_end	INT(10)	-	End position, within the component, of the part that is in the golden path.	unique key: all_idx
ori	TINYINT	-	Orientation: 1 - sense; -1 - antisense.	unique key: all_idx
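
The coordinate transformation that the assembly rows enable can be sketched as follows. This is a minimal illustration, not Ensembl's own mapper code: it assumes inclusive coordinates and assumes that ori = -1 means the component runs backwards along the assembled region. The AssemblyRow class, both function names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyRow:
    """One assembly row; field names follow the columns above."""
    asm_seq_region_id: int
    cmp_seq_region_id: int
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = sense, -1 = antisense


def component_to_assembly(row: AssemblyRow, cmp_pos: int) -> int:
    """Map a position on the component onto the assembled region."""
    if not row.cmp_start <= cmp_pos <= row.cmp_end:
        raise ValueError("position is outside the part present in the golden path")
    offset = cmp_pos - row.cmp_start
    # Assumption: with ori = -1 the component runs backwards along the
    # assembled region, so offsets are counted down from asm_end.
    return row.asm_start + offset if row.ori == 1 else row.asm_end - offset


def assembly_to_component(row: AssemblyRow, asm_pos: int) -> int:
    """Inverse mapping: a position on the assembled region back onto the component."""
    if not row.asm_start <= asm_pos <= row.asm_end:
        raise ValueError("position is outside this row's assembled span")
    offset = asm_pos - row.asm_start if row.ori == 1 else row.asm_end - asm_pos
    return row.cmp_start + offset


# Hypothetical rows: positions 101-600 of component 2 placed at 1001-1500 of region 1.
forward = AssemblyRow(asm_seq_region_id=1, cmp_seq_region_id=2,
                      asm_start=1001, asm_end=1500,
                      cmp_start=101, cmp_end=600, ori=1)
reverse = AssemblyRow(asm_seq_region_id=1, cmp_seq_region_id=2,
                      asm_start=1001, asm_end=1500,
                      cmp_start=101, cmp_end=600, ori=-1)

print(component_to_assembly(forward, 101))  # 1001
print(component_to_assembly(reverse, 101))  # 1500
```

Both spans in a row cover the same number of bases, so a single offset is enough; only the direction of counting changes with ori.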

See also:

seq_region
supercontigs

List of species with populated data:

Aotus nancymaae
Caenorhabditis elegans
Canis lupus familiaris
Capra hircus
Carlito syrichta
Cavia aperea
Cavia porcellus
Cebus capucinus
Cercocebus atys
Chinchilla lanigera
Chlorocebus sabaeus
Choloepus hoffmanni
Ciona intestinalis
Ciona savignyi
Colobus angolensis palliatus
Cricetulus griseus chok1gshd
Cricetulus griseus crigri
Danio rerio
Dasypus novemcinctus
Dipodomys ordii
Echinops telfairi
Erinaceus europaeus
Felis catus
Fukomys damarensis
Gallus gallus
Gasterosteus aculeatus
Gorilla gorilla
Heterocephalus glaber female
Heterocephalus glaber male
Homo sapiens
Ictidomys tridecemlineatus
Jaculus jaculus
Latimeria chalumnae
Lepisosteus oculatus
Loxodonta africana
Macaca nemestrina
Mandrillus leucophaeus
Mesocricetus auratus
Microcebus murinus
Microtus ochrogaster
Mus caroli
Mus musculus 129s1svimj
Mus musculus aj
Mus musculus akrj
Mus musculus balbcj
Mus musculus c3hhej
Mus musculus c57bl6nj
Mus musculus casteij
Mus musculus cbaj
Mus musculus
Mus musculus dba2j
Mus musculus fvbnj
Mus musculus lpj
Mus musculus nodshiltj
Mus musculus nzohlltj
Mus musculus pwkphj
Mus musculus wsbeij
Mus pahari
Mus spretus
Mustela putorius furo
Myotis lucifugus
Nannospalax galili
Nomascus leucogenys
Notamacropus eugenii
Ochotona princeps
Octodon degus
Oryctolagus cuniculus
Otolemur garnettii
Ovis aries
Pan paniscus
Pan troglodytes
Panthera pardus
Panthera tigris altaica
Papio anubis
Pelodiscus sinensis
Petromyzon marinus
Poecilia formosa
Pongo abelii
Procavia capensis
Propithecus coquereli
Pteropus vampyrus
Rattus norvegicus
Rhinopithecus bieti
Rhinopithecus roxellana
Saccharomyces cerevisiae
Saimiri boliviensis boliviensis
Sorex araneus
Sus scrofa
Tetraodon nigroviridis
Tupaia belangeri
Tursiops truncatus
Vicugna pacos

assembly_exception

Column	Type	Default value	Description	Index
assembly_exception_id	INT(10)	-	Assembly exception sequence region id. Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Sequence region id. Foreign key references to the seq_region table.	key: sr_idx
seq_region_start	INT(10)	-	Sequence start position.	key: sr_idx
seq_region_end	INT(10)	-	Sequence end position.
exc_type	ENUM: HAP PAR PATCH_FIX PATCH_NOVEL	NULL	Exception type, e.g. PAR, HAP - haplotype.
exc_seq_region_id	INT(10)	-	Exception sequence region id. Foreign key references to the seq_region table.	key: ex_idx
exc_seq_region_start	INT(10)	-	Exception sequence start position.	key: ex_idx
exc_seq_region_end	INT(10)	-	Exception sequence end position.
ori	INT	-	Orientation: 1 - sense; -1 - antisense.

coord_system

Column	Type	Default value	Description	Index
coord_system_id	INT(10)	-	Primary key, internal identifier.	primary key
species_id	INT(10)	1	Identifies the species for multi-species databases.	unique key: rank_idx unique key: name_idx key: species_idx
name	VARCHAR(40)	-	Coordinate system name, e.g. 'chromosome', 'contig', 'scaffold' etc.	unique key: name_idx
version	VARCHAR(255)	NULL	Assembly version.	unique key: name_idx
rank	INT	-	Coordinate system rank.	unique key: rank_idx
attrib	SET: default_version sequence_level	NULL	Coordinate system attrib (e.g. "default_version", "sequence_level").
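
Because attrib is a SET column, one row can carry several of the listed values at once. The sketch below shows one way to test for a value on the client side. It assumes the database driver returns the SET as a comma-separated string, which is how MySQL presents SET values as text; parse_set and is_sequence_level are hypothetical helper names.

```python
from typing import FrozenSet, Optional


def parse_set(value: Optional[str]) -> FrozenSet[str]:
    """Split a comma-separated SET value into its members; NULL gives an empty set."""
    if not value:
        return frozenset()
    return frozenset(value.split(","))


def is_sequence_level(attrib: Optional[str]) -> bool:
    """True if the coordinate system is flagged as sequence_level."""
    return "sequence_level" in parse_set(attrib)


print(sorted(parse_set("default_version,sequence_level")))
# ['default_version', 'sequence_level']
print(is_sequence_level("default_version"))  # False
print(is_sequence_level(None))               # False
```

The same helper would apply to any other SET column in the schema.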

data_file

Column	Type	Default value	Description	Index
data_file_id	INT(10)	-	Auto-increment surrogate primary key	primary key
coord_system_id	INT(10)	-	Coordinate system this file is linked to. Used to decipher the assembly version it was mapped to	unique key: df_unq_idx
analysis_id	SMALLINT	-	Analysis this file is linked to	unique key: df_unq_idx key: df_analysis_idx
name	VARCHAR(100)	-	Name of the file	unique key: df_unq_idx key: df_name_idx
version_lock	TINYINT(1)	0	Indicates that this file is only compatible with the current Ensembl release version
absolute	TINYINT(1)	0	Flags that the URL given is fully resolved and should be used without question
url	TEXT	NULL	Optional path to the file (can be absolute or relative)
file_type	ENUM: BAM BAMCOV BIGBED BIGWIG VCF	NULL	Type of file e.g. BAM, BIGBED, BIGWIG and VCF	unique key: df_unq_idx

dna

Column	Type	Default value	Description	Index
seq_region_id	INT(10)	-	Primary key, internal identifier. Foreign key references to the seq_region table.	primary key
sequence	LONGTEXT	-	DNA sequence.

genome_statistics

Column	Type	Default value	Description	Index
genome_statistics_id	INT(10)	-	Primary key, internal identifier.	primary key
statistic	VARCHAR(128)	-	Name of the statistic	unique key: stats_uniq
value	BIGINT(11)	'0'	Corresponding value of the statistic (count/length)
species_id	INT	1	Identifies the species for multi-species databases.	unique key: stats_uniq
attrib_type_id	INT(10)	NULL	To distinguish similar statistics for different cases	unique key: stats_uniq
timestamp	DATETIME	NULL	Date the statistic was generated

karyotype

Column	Type	Default value	Description	Index
karyotype_id	INT(10)	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	key: region_band_idx
seq_region_start	INT(10)	-	Sequence start position.
seq_region_end	INT(10)	-	Sequence end position.
band	VARCHAR(40)	NULL	Band.	key: region_band_idx
stain	VARCHAR(40)	NULL	Stain.
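
A typical use of these columns is to find the band that covers a given position. The sketch below builds a throwaway in-memory SQLite table with the same column names and runs a point lookup against it. The rows are invented for illustration, the band_at function is hypothetical, and the SQLite types only approximate the MySQL ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE karyotype (
        karyotype_id     INTEGER PRIMARY KEY,
        seq_region_id    INTEGER NOT NULL,
        seq_region_start INTEGER NOT NULL,
        seq_region_end   INTEGER NOT NULL,
        band             VARCHAR(40),
        stain            VARCHAR(40)
    )
    """
)
# Invented example rows, not real karyotype data.
conn.executemany(
    "INSERT INTO karyotype "
    "(seq_region_id, seq_region_start, seq_region_end, band, stain) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1000, "p1", "gneg"),
        (1, 1001, 2500, "q1", "gpos50"),
        (2, 1, 800, "p1", "gneg"),
    ],
)


def band_at(conn, seq_region_id, position):
    """Return (band, stain) for the band covering position, or None."""
    return conn.execute(
        "SELECT band, stain FROM karyotype "
        "WHERE seq_region_id = ? "
        "AND seq_region_start <= ? AND seq_region_end >= ?",
        (seq_region_id, position, position),
    ).fetchone()


print(band_at(conn, 1, 1500))  # ('q1', 'gpos50')
print(band_at(conn, 1, 3000))  # None
```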

meta

Column	Type	Default value	Description	Index
meta_id	INT	-	Primary key, internal identifier.	primary key
species_id	INT	1	Identifies the species for multi-species databases.	unique key: species_key_value_idx key: species_value_idx
meta_key	VARCHAR(40)	-	Name of the meta entry, e.g. "schema_version".	unique key: species_key_value_idx
meta_value	VARCHAR(255)	-	Corresponding value of the key, e.g. "61".	unique key: species_key_value_idx key: species_value_idx
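
The unique key species_key_value_idx spans species_id, meta_key and meta_value together, so one key may hold several values for the same species. The sketch below reads such key/value entries from an in-memory SQLite stand-in for this table. The get_meta_values function is a hypothetical helper, the SQLite types only approximate the MySQL ones, and the single row reuses the "schema_version" / "61" example from the column descriptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE meta (
        meta_id    INTEGER PRIMARY KEY,
        species_id INTEGER DEFAULT 1,
        meta_key   VARCHAR(40) NOT NULL,
        meta_value VARCHAR(255) NOT NULL,
        UNIQUE (species_id, meta_key, meta_value)
    )
    """
)
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) VALUES (?, ?, ?)",
    (1, "schema_version", "61"),
)


def get_meta_values(conn, meta_key, species_id=1):
    """Return every value stored under meta_key for one species, in insertion order."""
    rows = conn.execute(
        "SELECT meta_value FROM meta "
        "WHERE meta_key = ? AND species_id = ? "
        "ORDER BY meta_id",
        (meta_key, species_id),
    ).fetchall()
    return [value for (value,) in rows]


print(get_meta_values(conn, "schema_version"))  # ['61']
print(get_meta_values(conn, "no_such_key"))     # []
```

Returning a list rather than a single value keeps the helper correct for keys that are stored more than once.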

meta_coord

Column	Type	Default value	Description	Index
table_name	VARCHAR(40)	-	Ensembl database table name.	unique key: cs_table_name_idx
coord_system_id	INT(10)	-	Foreign key references to the coord_system table.	unique key: cs_table_name_idx
max_length	INT	NULL	Longest sequence length.

seq_region_synonym

Column	Type	Default value	Description	Index
seq_region_synonym_id	INT	-	Primary key, internal identifier.	primary key
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	unique key: syn_idx key: seq_region_idx
synonym	VARCHAR(250)	-	Alternative name for sequence region.	unique key: syn_idx
external_db_id	INT	NULL	Foreign key references to the external_db table.

associated_group

Column	Type	Default value	Description	Index
associated_group_id	INT(10)	-	Associated group id. Primary key, internal identifier	primary key
description	VARCHAR(128)	NULL	Optional description for this group

biotype

Column	Type	Default value	Description	Index
biotype_id	INT	-	Primary key, internal identifier.	primary key
name	VARCHAR(64)	-	Ensembl biotype name.	unique key: name_type_idx
object_type	ENUM: gene transcript	'gene'	Ensembl object type: 'gene' or 'transcript'.	unique key: name_type_idx
db_type	SET: cdna core coreexpressionatlas coreexpressionest coreexpressiongnf funcgen otherfeatures rnaseq variation vega presite sangervega	'core'	Type, e.g. 'cdna', 'core', 'coreexpressionatlas', 'coreexpressionest', 'coreexpressiongnf', 'funcgen', 'otherfeatures', 'rnaseq', 'variation', 'vega', 'presite', 'sangervega'
attrib_type_id	INT	NULL	Foreign key references to the attrib_type table.
description	TEXT	NULL	Description.
biotype_group	ENUM: coding pseudogene snoncoding lnoncoding mnoncoding LRG undefined no_group	NULL	Group, e.g. 'coding', 'pseudogene', 'snoncoding', 'lnoncoding', 'mnoncoding', 'LRG', 'undefined', 'no_group'
so_acc	VARCHAR(64)	NULL	Sequence Ontology accession of the biotype.
so_term	VARCHAR(1023)	NULL	Sequence Ontology term of the biotype.

external_db

Column	Type	Default value	Description	Index
external_db_id	INT	-	Primary key, internal identifier.	primary key
db_name	VARCHAR(100)	-	Database name.	unique key: db_name_db_release_idx
db_release	VARCHAR(255)	NULL	Database release.	unique key: db_name_db_release_idx
status	ENUM: KNOWNXREF KNOWN XREF PRED ORTH PSEUDO	-	Status, e.g. 'KNOWNXREF','KNOWN','XREF','PRED','ORTH','PSEUDO'.
priority	INT	-	Determines which one of the xrefs will be used as the gene name.
db_display_name	VARCHAR(255)	NULL	Database display name.
type	ENUM: ARRAY ALT_TRANS ALT_GENE MISC LIT PRIMARY_DB_SYNONYM ENSEMBL	-	Type, e.g. 'ARRAY', 'ALT_TRANS', 'ALT_GENE', 'MISC', 'LIT', 'PRIMARY_DB_SYNONYM', 'ENSEMBL'.
secondary_db_name	VARCHAR(255)	NULL	Secondary database name.
secondary_db_table	VARCHAR(255)	NULL	Secondary database table.
description	TEXT	NULL	Description.

associated_xref

Column	Type	Default value	Description	Index
associated_xref_id	INT(10)	-	Associated xref id. Primary key, internal identifier	primary key
object_xref_id	INT(10)	'0'	Object xref id this associated xref is linked to. Foreign key linked to the object_xref table	key: associated_object_idx unique key: object_associated_source_type_idx
xref_id	INT(10)	'0'	Xref which is the associated term. Foreign key linked to the xref table	key: associated_idx unique key: object_associated_source_type_idx
source_xref_id	INT(10)	NULL	Xref which is source of this association. Foreign key linked to the xref table	key: associated_source_idx unique key: object_associated_source_type_idx
condition_type	VARCHAR(128)	NULL	The type of condition this link occurs in e.g. evidence, from, residue or assigned_by	unique key: object_associated_source_type_idx
associated_group_id	INT(10)	NULL	Foreign key to allow for associated_group	key: associated_group_idx unique key: object_associated_source_type_idx
rank	INT(10)	'0'	The rank in which the association occurs within an associated_group

interpro

Column	Type	Default value	Description	Index
interpro_ac	VARCHAR(40)	-	InterPro protein accession number.	unique key: accession_idx
id	VARCHAR(40)	-	InterPro protein id.	unique key: accession_idx key: id_idx

ontology_xref

Column	Type	Default value	Description	Index
object_xref_id	INT(10)	'0'	Composite key. Foreign key references to the object_xref table.	key: object_idx unique key: object_source_type_idx
source_xref_id	INT(10)	NULL	Composite key. Foreign key references to the xref table.	key: source_idx unique key: object_source_type_idx
linkage_type	VARCHAR(3)	NULL	Composite key. Evidence tags	unique key: object_source_type_idx

unmapped_object

Column	Type	Default value	Description	Index
unmapped_object_id	INT(10)	-	Primary key, internal identifier.	primary key
type	ENUM: xref cDNA Marker	-	Object type: 'xref', 'cDNA', 'Marker'.
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	key: anal_exdb_idx
external_db_id	INT	NULL	Foreign key references to the external_db table.	unique key: unique_unmapped_obj_idx key: anal_exdb_idx key: ext_db_identifier_idx
identifier	VARCHAR(255)	-	External database identifier.	unique key: unique_unmapped_obj_idx key: id_idx key: ext_db_identifier_idx
unmapped_reason_id	INT(10)	-	Foreign key references to the unmapped_reason table.	unique key: unique_unmapped_obj_idx
query_score	DOUBLE	NULL	Actual mapping query score.
target_score	DOUBLE	NULL	Target mapping query score.
ensembl_id	INT(10)	'0'	Foreign key references to the seq_region, transcript, gene, translation tables depending on ensembl_object_type.	unique key: unique_unmapped_obj_idx
ensembl_object_type	ENUM: RawContig Transcript Gene Translation	'RawContig'	Ensembl object type: 'RawContig', 'Transcript', 'Gene','Translation'.	unique key: unique_unmapped_obj_idx
parent	VARCHAR(255)	NULL	Foreign key references to the dependent_xref table, in case the unmapped object is dependent on a primary external reference which wasn't mapped to an ensembl one.	unique key: unique_unmapped_obj_idx

density_feature

Column	Type	Default value	Description	Index
density_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
density_type_id	INT(10)	-	Density type. Foreign key references to the density_type table.	key: seq_region_idx
seq_region_id	INT(10)	-	Sequence region. Foreign key references to the seq_region table.	key: seq_region_idx key: seq_region_id_idx
seq_region_start	INT(10)	-	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.
density_value	FLOAT	-	Density value.

ditag

Column	Type	Default value	Description	Index
ditag_id	INT(10)	-	Primary key, internal identifier.	primary key
name	VARCHAR(30)	-	Ditag name.
type	VARCHAR(30)	-	Ditag type.
tag_count	SMALLINT(6)	1	Tag count.
sequence	TINYTEXT	-	Sequence.

ditag_feature

Column	Type	Default value	Description	Index
ditag_feature_id	INT(10)	-	Primary key, internal identifier.	primary key
ditag_id	INT(10)	'0'	Foreign key references to the ditag table.	key: ditag_idx
ditag_pair_id	INT(10)	'0'	Ditag pair id.	key: ditag_pair_idx
seq_region_id	INT(10)	'0'	Foreign key references to the seq_region table.	key: seq_region_idx
seq_region_start	INT(10)	'0'	Sequence start position.	key: seq_region_idx
seq_region_end	INT(10)	'0'	Sequence end position.	key: seq_region_idx
seq_region_strand	TINYINT(1)	'0'	Sequence region strand: 1 - forward; -1 - reverse.
analysis_id	SMALLINT	'0'	Foreign key references to the analysis table.
hit_start	INT(10)	'0'	Alignment hit start position.
hit_end	INT(10)	'0'	Alignment hit end position.
hit_strand	TINYINT(1)	'0'	Alignment hit strand: 1 - forward; -1 - reverse.
cigar_line	TINYTEXT	-	Used to encode gapped alignments.
ditag_side	ENUM: F L R	-	Ditag side: L - start, R - end, F - 5' tag only

intron_supporting_evidence

Column	Type	Default value	Description	Index
intron_supporting_evidence_id	INT(10)	-	Surrogate primary key	primary key
analysis_id	SMALLINT	-	Foreign key references to the analysis table.	unique key (unnamed)
seq_region_id	INT(10)	-	Foreign key references to the seq_region table.	unique key (unnamed) key: seq_region_idx
seq_region_start	INT(10)	-	Sequence start position.	unique key (unnamed) key: seq_region_idx
seq_region_end	INT(10)	-	Sequence end position.	unique key (unnamed)
seq_region_strand	TINYINT(2)	-	Sequence region strand: 1 - forward; -1 - reverse.	unique key (unnamed)
hit_name	VARCHAR(100)	-	External entity name/identifier.	unique key (unnamed)
score	DECIMAL(10,3)	NULL	Score supporting the intron
score_type	ENUM: NONE DEPTH	'NONE'	The type of score e.g. NONE
is_splice_canonical	TINYINT(1)	0	Indicates if the splice junction can be considered canonical i.e. behaves according to accepted rules

map

Column	Type	Default value	Description	Index
map_id	INT(10)	-	Primary key, internal identifier.	primary key
map_name	VARCHAR(30)	-	Map name.

marker

Column	Type	Default value	Description	Index
marker_id	INT(10)	-	Primary key, internal identifier.	primary key key: marker_idx
display_marker_synonym_id	INT(10)	NULL	Marker synonym.	key: display_idx
left_primer	VARCHAR(100)	-	Left primer sequence.
right_primer	VARCHAR(100)	-	Right primer sequence.
min_primer_dist	INT(10)	-	Minimum primer distance.
max_primer_dist	INT(10)	-	Maximum primer distance.
priority	INT	NULL	Priority.	key: marker_idx
type	ENUM: est microsatellite	NULL	Type, e.g. 'est', 'microsatellite'.

Ensembl Core - Schema documentation
