Ensembl Regulation (funcgen) API Tutorial
Introduction
The Ensembl Regulation team deals with functional genomics data. The API and databases for Ensembl Regulation are called Funcgen.
This tutorial is an introduction to the Funcgen API. Knowledge of the Ensembl Core API and of the coding conventions used in the Ensembl APIs is assumed.
The Regulation database schema is documented separately. An understanding of the database tables is not necessary for this tutorial, but it may help, as many of the adaptor modules are table-specific.
Regulatory Features
RegulatoryFeatures are genomic regions predicted to have a regulatory role, such as:
- Predicted promoters,
- Predicted promoter flanking regions,
- Predicted enhancer regions,
- CTCF binding sites,
- Transcription factor binding sites,
- Open chromatin regions.
They are generated by the Ensembl Regulatory Build.
To fetch RegulatoryFeatures from the funcgen database, you need to use the corresponding adaptor. To obtain all the regulatory features present in a given region of the genome, use the adaptor method fetch_all_by_Slice:
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the SliceAdaptor and Slice
    my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');
    my $slice = $slice_adaptor->fetch_by_region('chromosome', 1, 54_960_000, 54_980_000);

    # Get the RegulatoryFeatureAdaptor and fetch all RegulatoryFeatures by Slice
    my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
    my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all_by_Slice($slice)};

    # Move through the regulatory features and print information about them
    foreach my $current_regulatory_feature (@regulatory_features) {
        print $current_regulatory_feature->stable_id, "\t",
            $current_regulatory_feature->feature_type->name, "\n";
    }
- SliceAdaptor (Core API)
- RegulatoryFeatureAdaptor
- Slice (Core API)
- RegulatoryFeature
- Registry
You can also narrow the results down by FeatureType. To do this, fetch the FeatureType of interest with the FeatureTypeAdaptor and pass it to the RegulatoryFeatureAdaptor together with the Slice.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the SliceAdaptor and Slice
    my $slice_adaptor = $registry->get_adaptor('Human', 'Core', 'Slice');
    my $slice = $slice_adaptor->fetch_by_region('chromosome', 17, 64000000, 64050000);

    # Get the FeatureTypeAdaptor and specify the FeatureType
    my $feature_type_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'FeatureType');
    my $feature_type = $feature_type_adaptor->fetch_by_name("Promoter");

    # Get the RegulatoryFeatureAdaptor and fetch all RegulatoryFeatures by Slice
    my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
    my @regulatory_features = @{$regulatory_feature_adaptor->fetch_all_by_Slice_FeatureType($slice, $feature_type)};

    # Move through the regulatory features and print information about them
    foreach my $current_regulatory_feature (@regulatory_features) {
        print $current_regulatory_feature->stable_id, "\t",
            $current_regulatory_feature->seq_region_start, "-",
            $current_regulatory_feature->seq_region_end, "\n";
    }
Regulatory Activities
For every regulatory feature, the Ensembl Regulatory Build predicts its activity in each of the epigenomes included in the build. In every epigenome, a regulatory feature has one of five possible activities:
- Active,
- Poised (has both active and repressive marks, "ready to go"),
- Inactive,
- Repressed,
- NA (no data available for this epigenome).
The regulatory activities have their own object and can be queried like this:
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the regulatory feature by ID
    my $regulatory_feature_id = 'ENSR00000358244';
    my $regulatory_feature_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'RegulatoryFeature');
    my $regulatory_feature = $regulatory_feature_adaptor->fetch_by_stable_id($regulatory_feature_id);

    # print information about the feature
    print "The ", $regulatory_feature->get_FeatureType->name,
        " with stable id: " . $regulatory_feature->stable_id . " has the following activities: \n";

    # Get the activity
    my $regulatory_activity_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'RegulatoryActivity');
    my $regulatory_activity_list = $regulatory_activity_adaptor->fetch_all_by_RegulatoryFeature($regulatory_feature);

    # print the activity
    foreach my $current_regulatory_activity (@$regulatory_activity_list) {
        print "\tIn the epigenome ", $current_regulatory_activity->get_Epigenome->short_name,
            " it is ", $current_regulatory_activity->activity, "\n";
    }
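Once the activities have been fetched, it can be useful to summarise them across epigenomes. The following is a minimal sketch and not part of the Funcgen API: it assumes the epigenome names and activity values have already been copied into plain hashes. The records, the epigenome names and the sub name count_activities are hypothetical, and the exact strings returned by the activity method may differ from the ones used here.

```perl
use strict;
use warnings;

# Hypothetical records standing in for values read from RegulatoryActivity
# objects (epigenome short name and activity). These are not real results.
my @activity_records = (
    { epigenome => 'epigenome_A', activity => 'Active'   },
    { epigenome => 'epigenome_B', activity => 'Inactive' },
    { epigenome => 'epigenome_C', activity => 'Active'   },
    { epigenome => 'epigenome_D', activity => 'NA'       },
);

# Count how many epigenomes fall into each activity class
sub count_activities {
    my @records = @_;
    my %count;
    $count{ $_->{activity} }++ for @records;
    return %count;
}

my %activity_count = count_activities(@activity_records);

# Print one line per activity class, in alphabetical order
foreach my $activity (sort keys %activity_count) {
    print $activity, "\t", $activity_count{$activity}, "\n";
}
```

In a real script, the records would be built inside the loop over the RegulatoryActivity objects shown above.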
Peaks: Enriched regions from ChIP-seq and other high throughput experiments
Regulatory Features are built from the results of experiments such as DNase I sensitivity assays (DNase-seq), which detect regions of open chromatin, and transcription factor binding assays, such as chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq). ChIP-seq studies are also used to detect histone modifications (e.g. H3K36 trimethylation) and polymerase binding sites. Results from these experiments are stored as Peaks.
Peaks have these properties:
- Score. An analysis-dependent value (e.g. the peak-caller score).
- Summit. The 1 bp position within the peak that has the highest read density in a ChIP experiment. It depends on the analysis and may not always be present.
Peaks also link to an object called PeakCalling, which contains information about the Epigenome, FeatureType and Experiment.
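To make the two properties concrete, here is a minimal sketch that works on plain hashes rather than Peak objects. It keeps peaks at or above a score threshold and reports the summit as an offset from the peak start, allowing for a missing summit. The coordinates, scores, hash keys, threshold and sub names are all hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical peaks as plain hashes. With the API, these values would be
# read from Peak objects instead. A missing summit is represented by undef.
my @mock_peaks = (
    { start => 1000, end => 1400, score => 52.1, summit => 1180  },
    { start => 2000, end => 2250, score => 8.3,  summit => undef },
    { start => 3000, end => 3600, score => 97.0, summit => 3105  },
);

# Keep only the peaks scoring at or above a chosen threshold
sub filter_by_score {
    my ($min_score, @peaks) = @_;
    return grep { $_->{score} >= $min_score } @peaks;
}

# Return the summit position relative to the peak start,
# or undef when no summit is available
sub summit_offset {
    my ($peak) = @_;
    return unless defined $peak->{summit};
    return $peak->{summit} - $peak->{start};
}

my @strong_peaks = filter_by_score(50, @mock_peaks);

foreach my $peak (@strong_peaks) {
    my $offset = summit_offset($peak);
    print $peak->{start}, "-", $peak->{end}, "\t", $peak->{score}, "\t",
        (defined $offset ? "summit at offset $offset" : "no summit"), "\n";
}
```

Because the score is analysis-dependent, a threshold chosen for one peak caller is not necessarily meaningful for another.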
Fetch Peaks on a Slice
Here is an example of how Peaks can be fetched from a Slice and filtered by their Epigenome:
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the adaptors
    my $peak_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'Peak');
    my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'Slice');

    # Fetch a Slice
    my $slice = $slice_adaptor->fetch_by_region('chromosome', '17', 63_992_802, 64_038_237);

    # Fetch all Peaks on the Slice
    my @peaks = @{ $peak_adaptor->fetch_all_by_Slice($slice) };

    # move through the Peaks and get the PeakCalling for them
    while (my $peak = shift @peaks) {
        my $peakcalling = $peak->get_PeakCalling;

        # get the Epigenome for the PeakCalling and filter by those found in placenta
        my $epigenome = $peakcalling->fetch_Epigenome->short_name;
        if ($epigenome eq 'placenta') {
            # Print the FeatureType and location of each Peak
            print $peakcalling->fetch_FeatureType->name, "\t",
                $peak->seq_region_name, ":", $peak->seq_region_start, "-", $peak->seq_region_end, "\n";
        }
    }
- SliceAdaptor (Core API)
- PeakAdaptor
- Slice (Core API)
- Peak
- PeakCalling
- FeatureType
- Epigenome
Motif Features: Transcription factor binding sites
Motif Features represent short genomic regions where a Transcription Factor is thought to be directly interacting with the DNA. These regions are called Transcription Factor binding sites. More information on how these sites are found in Ensembl is on the RegulatoryBuild page.
MotifFeatures can be fetched from a RegulatoryFeature or Peak using the method get_all_MotifFeatures. They also have their own Adaptor which you can use to fetch them by Slice, BindingMatrix and Epigenome.
Information about the transcription factors bound by a MotifFeature can be found using the BindingMatrix.
The following script fetches MotifFeatures from a Slice and prints their properties:
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the adaptors
    my $motif_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'MotifFeature');
    my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'Slice');

    # Fetch a slice
    my $slice = $slice_adaptor->fetch_by_region('chromosome', '13', 32315400, 32315500);

    # Fetch all motifs on the slice
    my @motifs = @{ $motif_adaptor->fetch_all_by_Slice($slice) };

    # move through the motifs
    while (my $motif = shift @motifs) {
        # print the motif ID and location
        print $motif->stable_id, "\t",
            $motif->seq_region_name, ":", $motif->seq_region_start, "-", $motif->seq_region_end, "\n";

        # get the transcription factors associated with the MotifFeature, going via the BindingMatrix
        my @transcription_factors = @{ $motif->get_BindingMatrix->get_all_TranscriptionFactors };

        # create and add to an array of transcription factor names, then print it
        my @tfs;
        foreach my $tf (@transcription_factors) {
            push @tfs, $tf->name;
        }
        print join(", ", @tfs), "\n";
    }
- SliceAdaptor (Core API)
- MotifFeatureAdaptor
- Slice (Core API)
- MotifFeature
- BindingMatrix
- TranscriptionFactor
If a Peak for the same transcription factor overlaps a MotifFeature, the MotifFeature is classed as "experimentally validated" in that Epigenome. From the MotifFeature object, you can fetch the Epigenomes where it is validated and the Peaks that correspond to it. The MotifFeatureAdaptor will also allow you to fetch by Epigenome where the MotifFeature is validated.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the adaptors
    my $motif_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'MotifFeature');
    my $epigenome_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'Epigenome');
    my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'Slice');

    # Fetch a slice and epigenome of interest
    my $slice = $slice_adaptor->fetch_by_region('chromosome', '13', 32315400, 32315500);
    my $epigenome = $epigenome_adaptor->fetch_by_short_name('K562');

    # Fetch all motifs on the slice
    my @motifs = @{ $motif_adaptor->fetch_all_by_Slice($slice) };

    # move through the motifs
    while (my $motif = shift @motifs) {
        # filter motifs to find only those that have been verified in our epigenome of interest
        if ($motif->is_experimentally_verified_in_Epigenome($epigenome)) {
            # print the motif ID and location
            print $motif->stable_id, "\t",
                $motif->seq_region_name, ":", $motif->seq_region_start, "-", $motif->seq_region_end, "\n";

            # get the peaks associated with the motif and epigenome, then print information about them
            my @peaks = @{ $motif->get_all_overlapping_Peaks_by_Epigenome($epigenome) };
            foreach my $peak (@peaks) {
                print $peak->get_PeakCalling->fetch_FeatureType->name, "\t",
                    $peak->seq_region_name, ":", $peak->seq_region_start, "-", $peak->seq_region_end, "\n";
            }
        }
    }
- SliceAdaptor (Core API)
- MotifFeatureAdaptor
- EpigenomeAdaptor
- Slice (Core API)
- MotifFeature
- Epigenome
- Peak
- PeakCalling
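The notion of a Peak overlapping a MotifFeature can be illustrated with a simple interval test. This is a sketch only: it treats both features as closed intervals with 1-based inclusive coordinates on the same sequence region, and it is an assumption that any shared base counts as an overlap; the criterion actually applied when the database is built may be stricter. The coordinates, peak names and the sub name intervals_overlap are hypothetical.

```perl
use strict;
use warnings;

# Two closed intervals [start, end] on the same sequence region overlap
# when each one starts no later than the other one ends.
sub intervals_overlap {
    my ($start_a, $end_a, $start_b, $end_b) = @_;
    return ($start_a <= $end_b && $start_b <= $end_a) ? 1 : 0;
}

# Hypothetical motif and peak coordinates, for illustration only
my %motif = ( start => 32_315_420, end => 32_315_434 );
my @peaks = (
    { name => 'peak_1', start => 32_315_300, end => 32_315_450 },
    { name => 'peak_2', start => 32_315_435, end => 32_315_600 },
);

# Keep the peaks that share at least one base with the motif
my @overlapping = grep {
    intervals_overlap($motif{start}, $motif{end}, $_->{start}, $_->{end})
} @peaks;

print "Motif overlaps: ", join(", ", map { $_->{name} } @overlapping), "\n";
```

Here peak_2 starts one base after the motif ends, so only peak_1 is reported.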
External Features: Externally curated data
There are some Feature Sets that are either entirely or partially curated by external groups. These are stored as ExternalFeatures and can be accessed using the ExternalFeatureAdaptor.
If you know the name of a feature set, you can use the name to fetch the data, using the FeatureSetAdaptor. For example, we store data from the Vista Enhancer Browser.
The following script fetches the Vista Enhancers for a Slice.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # Get the Slice, FeatureSet and ExternalFeatureAdaptors
    my $slice_adaptor = $registry->get_adaptor('homo_sapiens', 'core', 'Slice');
    my $feature_set_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'FeatureSet');
    my $ex_feat_adaptor = $registry->get_adaptor('homo_sapiens', 'funcgen', 'ExternalFeature');

    # Fetch a Slice and FeatureSet of interest
    my $slice = $slice_adaptor->fetch_by_region('chromosome', '13');
    my $vista_feature_set = $feature_set_adaptor->fetch_by_name('VISTA enhancer set');

    # Use the ExternalFeatureAdaptor to fetch the Vista enhancers in the Slice
    my @vistas = @{ $ex_feat_adaptor->fetch_all_by_Slice_FeatureSets($slice, [$vista_feature_set]) };

    # Move through the Vista enhancers and print their locations
    while (my $vista = shift @vistas) {
        print $vista->seq_region_name, ":", $vista->seq_region_start, "-", $vista->seq_region_end, "\n";
    }
- SliceAdaptor (Core API)
- ExternalFeatureAdaptor
- FeatureSetAdaptor
- Slice (Core API)
- FeatureSet
- ExternalFeature
Feature Types
FeatureTypes provide a biological annotation for features. They are divided into classes that form biologically coherent groups (e.g. Transcription Factors). This differs from the FeatureSet class, which only states the origin of the data. FeatureTypes can be accessed using the FeatureTypeAdaptor.
External FeatureTypes
FeatureTypes for ExternalFeatures have a meaning that is specific to the FeatureSet. For example, for features of the Vista FeatureSet, the FeatureType indicates whether the feature was active or inactive in an experiment.
Microarrays and associated information
Some popular commercial microarrays are stored in the Ensembl database, with mappings to genomic regions and genes. The arrays themselves are stored as Array objects, which can be fetched with the ArrayAdaptor.
The following script fetches all the arrays for a species and prints information about them:
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get the array adaptor
    my $array_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'Array');

    # fetch all arrays and move through
    my @arrays = @{ $array_adaptor->fetch_all };
    foreach my $array (@arrays) {
        # Print some array info
        print "Array:\t", $array->name, "\nType:\t", $array->type, "\nVendor:\t", $array->vendor, "\n";

        # Get some information about the array chips and print
        my @array_chips = @{ $array->get_ArrayChips };
        foreach my $array_chip (@array_chips) {
            print "ArrayChip:\t", $array_chip->name, " DesignID:\t", $array_chip->design_id, "\n";
        }
        print "\n";
    }
Fetch all Probe Features from a specific Array and Probe
Probes are stored as Probe objects, which represent the probe on the array, and ProbeFeature objects, which represent the mapping of the Probe to the genome.
In this example, a Probe from the WholeGenome_4x44k_v1 array is obtained.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    # get ProbeAdaptor and use to fetch a probe from the WholeGenome_4x44k_v1 array
    my $probe_adaptor = $registry->get_adaptor('Human', 'Funcgen', 'Probe');
    my $probe = $probe_adaptor->fetch_by_array_probe_probeset_name('WholeGenome_4x44k_v1', 'A_23_P18656');

    # Fetch the features associated with this probe
    my @probe_features = @{ $probe->get_all_ProbeFeatures };

    # Print some info about the features
    foreach my $probe_feature (@probe_features) {
        print "ProbeFeature found at:\t", $probe_feature->feature_Slice->name, "\n";
    }
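The script above prints each location as a Slice name, a colon-separated string of the form coord_system:version:seq_region_name:start:end:strand. If the individual parts are needed, the name can be split into fields. The following is a minimal sketch in plain Perl; the sub name parse_slice_name and the example coordinates are hypothetical, and in practice the same values can be read directly from the Slice object.

```perl
use strict;
use warnings;

# Split a Slice name into its six colon-separated fields.
# Returns a hash reference, or nothing if the name does not have six fields.
sub parse_slice_name {
    my ($name) = @_;

    # A limit of -1 keeps empty trailing fields
    my @fields = split /:/, $name, -1;
    return unless @fields == 6;

    my %location;
    @location{qw(coord_system version seq_region start end strand)} = @fields;
    return \%location;
}

# Hypothetical Slice name; the coordinates are illustrative only
my $location = parse_slice_name('chromosome:GRCh38:7:114414997:114415056:1');

print "Region ", $location->{seq_region},
    " from ", $location->{start},
    " to ", $location->{end},
    " on strand ", $location->{strand}, "\n";
```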
Probe mappings to transcripts
ProbeSets represent groups of Probes, and are mapped to transcripts.
In this example, the stable ID of a FOXP2 transcript is used to fetch all ProbeSets that have been mapped to that transcript. The ProbeSets are then printed together with the names of the Arrays they are found on.
    use strict;
    use warnings;
    use Bio::EnsEMBL::Registry;

    my $registry = 'Bio::EnsEMBL::Registry';

    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );

    my $trans_id = "ENST00000393489";

    # get the ProbeSetAdaptor
    my $probe_set_adaptor = $registry->get_adaptor("human", "Funcgen", "ProbeSet");

    # Fetch ProbeSets associated with a transcript and move through
    my @probesets = @{ $probe_set_adaptor->fetch_all_by_transcript_stable_id($trans_id) };
    foreach my $probeset (@probesets) {
        # get all the Arrays the ProbeSets are found on, then make an array of their names
        my @arrays = @{ $probeset->get_all_Arrays };
        my @arraynames;
        foreach my $array (@arrays) {
            push @arraynames, ($array->name);
        }

        # print information about the mapping
        print "Probeset ", $probeset->name, " on array(s) ", join(", ", @arraynames),
            " maps to ", $trans_id, ".\n";
    }
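The output of the script above is organised by ProbeSet. If a view organised by Array is more convenient, the mapping can be inverted once the names have been collected. This is a minimal sketch on plain data structures and is not part of the API; the ProbeSet names, Array names and the sub name probesets_by_array are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical mapping from ProbeSet name to the names of the Arrays it is on.
# In a real script this would be filled inside the loop over ProbeSets.
my %arrays_for_probeset = (
    probeset_1 => [ 'ARRAY_A', 'ARRAY_B' ],
    probeset_2 => [ 'ARRAY_B' ],
);

# Invert the mapping: for each Array name, list the ProbeSets found on it
sub probesets_by_array {
    my ($arrays_for) = @_;
    my %probesets_for;
    foreach my $probeset (sort keys %{$arrays_for}) {
        push @{ $probesets_for{$_} }, $probeset for @{ $arrays_for->{$probeset} };
    }
    return \%probesets_for;
}

my $by_array = probesets_by_array(\%arrays_for_probeset);

# Print one line per Array, in alphabetical order
foreach my $array_name (sort keys %{$by_array}) {
    print $array_name, ": ", join(", ", @{ $by_array->{$array_name} }), "\n";
}
```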
Further help
For additional information or help, email the ensembl-dev mailing list. You will need to subscribe to the list before you can post to it. More information on subscribing to any Ensembl mailing list is available from the Ensembl Contacts page.