Ensembl Regulation (Funcgen) Schema Documentation

Introduction

This document describes the tables that make up the Ensembl Regulation schema. Tables are grouped logically by their function, and the purpose of each table is explained. This document refers to version 91 of the Ensembl Regulation schema.


List of the tables:

Main feature tables

These define the various genomic features and their associated tables.

regulatory_feature

The table contains the features resulting from the regulatory build process.

regulatory_evidence

Links a regulatory feature and the epigenome (via regulatory activity) to the underlying structure of epigenetic marks that the regulatory feature has in this epigenome.

regulatory_activity

For every regulatory feature and epigenome that was a part of the regulatory build, this table links the regulatory feature to the predicted regulatory activity in this epigenome.
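The relationship described above can be sketched with an in-memory SQLite database driven from Python. This is only an illustration: the column names (regulatory_feature_id, epigenome_id, activity, stable_id, name), the activity values and the example identifiers are simplified assumptions, not the actual MySQL table definitions.

```python
import sqlite3

# Simplified stand-in tables; column names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regulatory_feature (
    regulatory_feature_id INTEGER PRIMARY KEY,
    stable_id TEXT
);
CREATE TABLE epigenome (
    epigenome_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE regulatory_activity (
    regulatory_activity_id INTEGER PRIMARY KEY,
    regulatory_feature_id INTEGER,
    epigenome_id INTEGER,
    activity TEXT
);
INSERT INTO regulatory_feature VALUES (1, 'EXAMPLE_RF_1');
INSERT INTO epigenome VALUES (1, 'epigenome_A'), (2, 'epigenome_B');
INSERT INTO regulatory_activity VALUES
    (1, 1, 1, 'ACTIVE'),
    (2, 1, 2, 'INACTIVE');
""")


def activities_for(conn, stable_id):
    """Return {epigenome name: predicted activity} for one regulatory feature."""
    rows = conn.execute(
        """
        SELECT e.name, ra.activity
        FROM regulatory_feature rf
        JOIN regulatory_activity ra
          ON ra.regulatory_feature_id = rf.regulatory_feature_id
        JOIN epigenome e
          ON e.epigenome_id = ra.epigenome_id
        WHERE rf.stable_id = ?
        ORDER BY e.name
        """,
        (stable_id,),
    ).fetchall()
    return dict(rows)


print(activities_for(conn, "EXAMPLE_RF_1"))
```

One row per (regulatory feature, epigenome) pair is what lets the same feature carry a different predicted activity in each epigenome.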

regulatory_build

Metadata for the regulatory build.

regulatory_build_epigenome

Table that links a regulatory build to the epigenomes that were used in it.

segmentation_feature

Represents a genomic segment feature resulting from a segmentation analysis, e.g. Segway or ChromHMM.

segmentation_file

Table to store metadata about a segmentation file.

Represents a genomic feature resulting from an analysis, e.g. a ChIP or DNase1 peak call.

Represents a peak calling analysis.

The table contains genomic alignments of binding_matrix PWMs.

mirna_target_feature

The table contains imports from externally curated resources e.g. cisRED, miRanda, VISTA, redFLY etc.

associated_motif_feature

The table provides links between motif_features and annotated_features representing peaks of the relevant transcription factor.

Contains information defining a specific binding matrix (PWM) as defined by the linked analysis, e.g. JASPAR.

external_feature

The table contains imports from externally curated resources e.g. cisRED, miRanda, VISTA, redFLY etc.

external_feature_file

Table to store metadata about a file with features.

The table contains genomic alignments of probe entries.

probe_feature_transcript

The table maps probe_features to transcripts.

Contains information about different types/classes of feature e.g. Brno nomenclature, Transcription Factor names etc.

associated_feature_type

Link table providing a many-to-many mapping for feature_type entries.

Set tables

Sets are containers for distinct sets of raw and/or processed data.

Container for genomic features defined by the result of an analysis, e.g. peak calls or regulatory features.

peak_calling_qc_prop_reads_in_peaks

Alignment of reads from a ChIP-seq or similar experiment.

alignment_read_file

Linking table to connect alignments to the reads that were aligned.

alignment_qc_chance

alignment_qc_flagstats

alignment_qc_phantom_peak

read_file_experimental_configuration

Array design tables

Contains information defining an array or array set.

Represents the individual array chip design as part of an array or array set.

The table contains information about probe sets.

probe_set_transcript

This table maps probe sets to transcripts.

Defines individual probe designs across one or more array_chips.

Probe sequences.

probe_transcript

This table maps probes to transcripts.

Experiment tables

These define the experimental meta and raw data.

Represents a sequencing experiment. Sequencing runs (input_subsets) link to this.

experimental_group

The consortium or laboratory that produced the sequencing experiments (see experiment).

Ancillary tables

These contain data types that are used across many of the above tables. They are often denormalised to store generic associations to several tables, which avoids the need for multiple sets of similar tables. Some of these tables have been omitted from the schema diagram.

The epigenomes known in Ensembl Regulation.

Core tables

These are exact clones of the corresponding core schema tables and have therefore been omitted from the schema diagram. See the core schema docs (../core/core_schema.html) for more details.

Usually describes a program and some database that together are used to create a feature on a piece of sequence. Each feature is marked with an analysis_id. The most important column is logic_name, which is used by the webteam to render a feature correctly on contigview (or even retrieve the right feature). Logic_name is also used in the pipeline to identify the analysis which has to run in a given status of the pipeline. The module column tells the pipeline which Perl module does the whole analysis, typically a RunnableDB module.
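As a rough illustration of how analysis_id and logic_name work together, the sketch below uses an in-memory SQLite database from Python. The analysis_id, logic_name and module columns come from the description above; the example_feature table, its columns, and all logic_name and module values are hypothetical stand-ins rather than real schema content.

```python
import sqlite3

# Simplified stand-in tables; "example_feature" and all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    logic_name TEXT UNIQUE,
    module TEXT
);
CREATE TABLE example_feature (
    feature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER,
    label TEXT
);
INSERT INTO analysis VALUES
    (1, 'example_peak_caller', 'Example::RunnableDB::PeakCaller'),
    (2, 'example_segmentation', NULL);
INSERT INTO example_feature VALUES
    (1, 1, 'feature_a'),
    (2, 2, 'feature_b'),
    (3, 1, 'feature_c');
""")


def features_by_logic_name(conn, logic_name):
    """Return the labels of all features produced by the named analysis."""
    rows = conn.execute(
        """
        SELECT f.label
        FROM example_feature f
        JOIN analysis a ON a.analysis_id = f.analysis_id
        WHERE a.logic_name = ?
        ORDER BY f.feature_id
        """,
        (logic_name,),
    ).fetchall()
    return [label for (label,) in rows]


print(features_by_logic_name(conn, "example_peak_caller"))
```

Features store only the numeric analysis_id, so the human-readable logic_name is resolved through a join like the one above.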

analysis_description

Allows the storage of a textual description of the analysis, as well as a "display label", primarily for the Ensembl web site.

Stores data about the data in the current schema. Unlike other tables, data in the meta table is stored as key-value pairs. These data include details about the database, RegulatoryBuild and patches. The species_id field of the meta table is used in multi-species databases and makes it possible to have species-specific meta key-value pairs. Species-specific meta key-value pairs need to be repeated for each species_id. Entries in the meta table that are not specific to any one species, such as the schema.version key and any other schema-related information, must have their species_id field set to NULL. The default species_id, and the only species_id value allowed in single-species databases, is 1.
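The species_id rules can be illustrated with a small in-memory SQLite sketch in Python. The species_id field and the schema.version key are taken from the text above; the other column names (meta_id, meta_key, meta_value), the species.production_name key and all stored values are assumptions made for the example.

```python
import sqlite3

# Simplified stand-in for the meta table; keys and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meta (
    meta_id INTEGER PRIMARY KEY,
    species_id INTEGER,      -- NULL for entries not tied to one species
    meta_key TEXT NOT NULL,
    meta_value TEXT NOT NULL
);
INSERT INTO meta VALUES
    (1, NULL, 'schema.version', '91'),
    (2, 1, 'species.production_name', 'homo_sapiens'),
    (3, 2, 'species.production_name', 'mus_musculus');
""")


def meta_values(conn, meta_key, species_id=1):
    """Return values for a key that apply to one species.

    Rows with a NULL species_id are schema-wide, so they apply to every species.
    """
    rows = conn.execute(
        """
        SELECT meta_value
        FROM meta
        WHERE meta_key = ?
          AND (species_id = ? OR species_id IS NULL)
        ORDER BY meta_id
        """,
        (meta_key, species_id),
    ).fetchall()
    return [value for (value,) in rows]


print(meta_values(conn, "schema.version", species_id=2))
print(meta_values(conn, "species.production_name", species_id=2))
```

The explicit `IS NULL` test matters because a plain equality comparison never matches NULL in SQL.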

Describes which co-ordinate systems the different feature tables use.

This table associates extra annotations with a given ontology xref evidence and source under a specific condition. For GO, this allows qualifiers (with/from) or annotation extensions to be added to a given ontology annotation.

associated_group

Groups together xref associations under a single description. Used when more than one associated xref term must be used to describe a condition.

Describes how well a particular xref object matches the Ensembl object.

external_synonym

Some xref objects can be referred to by more than one name. This table relates names to xref IDs.

Stores data about the external databases in which the objects described in the xref table are stored.

This table associates ontology terms/accessions with Ensembl objects (primarily EFO/SO). NOTE: Currently not in use.

Describes the reason why a mapping failed.

Core-like tables

These are almost exact clones of the corresponding core schema tables. Some contain extra fields or different enum values to support the funcgen schema. These have been omitted from the schema diagram.

Holds data about objects which are external to Ensembl, but need to be associated with Ensembl objects. Information about the database that the external object is stored in is held in the external_db table entry referred to by the external_db column.

Describes links between Ensembl objects and objects held in external databases. The Ensembl object can be one of several types; the type is held in the ensembl_object_type column. The ID of the particular Ensembl gene, translation or whatever is given in the ensembl_id column. The xref_id points to the entry in the xref table that holds data about the external object. Each Ensembl object can be associated with zero or more xrefs. An xref object can be associated with one or more Ensembl objects.
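The linking described above can be sketched with an in-memory SQLite database in Python. The ensembl_object_type, ensembl_id and xref_id columns are named in the text; the accession column, the object type strings and all stored values are hypothetical and chosen only to show the many-to-many relationship.

```python
import sqlite3

# Simplified stand-in tables; "accession" and all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xref (
    xref_id INTEGER PRIMARY KEY,
    accession TEXT
);
CREATE TABLE object_xref (
    object_xref_id INTEGER PRIMARY KEY,
    ensembl_id INTEGER,
    ensembl_object_type TEXT,
    xref_id INTEGER
);
INSERT INTO xref VALUES (1, 'ACC_A'), (2, 'ACC_B');
INSERT INTO object_xref VALUES
    (1, 10, 'ProbeSet', 1),
    (2, 10, 'ProbeSet', 2),
    (3, 10, 'RegulatoryFeature', 1);
""")


def xrefs_for_object(conn, object_type, ensembl_id):
    """Return accessions linked to one Ensembl object (zero or more)."""
    rows = conn.execute(
        """
        SELECT x.accession
        FROM object_xref ox
        JOIN xref x ON x.xref_id = ox.xref_id
        WHERE ox.ensembl_object_type = ? AND ox.ensembl_id = ?
        ORDER BY x.accession
        """,
        (object_type, ensembl_id),
    ).fetchall()
    return [acc for (acc,) in rows]


def objects_for_xref(conn, accession):
    """Return (object type, ensembl_id) pairs linked to one xref."""
    return conn.execute(
        """
        SELECT ox.ensembl_object_type, ox.ensembl_id
        FROM object_xref ox
        JOIN xref x ON x.xref_id = ox.xref_id
        WHERE x.accession = ?
        ORDER BY ox.ensembl_object_type, ox.ensembl_id
        """,
        (accession,),
    ).fetchall()


print(xrefs_for_object(conn, "ProbeSet", 10))
print(objects_for_xref(conn, "ACC_A"))
```

Because ensembl_id values are only unique within one object type, both ensembl_object_type and ensembl_id are needed to identify the linked object.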

Describes why a particular external entity was not mapped to an Ensembl one.
