Calculating lengths of features

These methods calculate the lengths of features (transcripts, genes, CDS, 3' or 5' UTRs) defined in an EnsDb object or database.

# S4 method for EnsDb
lengthOf(x, of = "gene", filter = AnnotationFilterList())

Arguments

(In alphabetical order)

filter: A filter describing which results to retrieve from the database. Can be a single object extending AnnotationFilter, an AnnotationFilterList object combining several such objects or a formula representing a filter expression (see examples below or AnnotationFilter for more details).
of: For lengthOf: whether the length of genes (of = "gene", the default) or transcripts (of = "tx") should be retrieved from the database.
x: For lengthOf: either an EnsDb or a GRangesList object. For all other methods an EnsDb instance.

Methods and Functions

lengthOf

Retrieve the length of genes or transcripts from the database. The length of a transcript is the sum of the lengths of its exons. For a gene, the exons of all its transcripts are first reduced (overlapping exons are merged), so that the length corresponds to the part of the genomic sequence covered by the gene's exons and no base is counted twice.
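To illustrate why transcript and gene lengths differ, the following base R sketch computes both from plain exon coordinates. It is not the ensembldb implementation (which works on GRanges objects); the helpers tx_length and gene_length are hypothetical, and exons are assumed to be 1-based, closed [start, end] intervals on a single chromosome and strand.

```r
## Hypothetical helpers illustrating the two length definitions.
## Assumption: exons are 1-based, closed [start, end] intervals on one
## chromosome and strand.

## Transcript length: plain sum of the exon widths.
tx_length <- function(start, end) {
    sum(end - start + 1)
}

## Gene length: merge overlapping (or adjacent) exons of all transcripts
## first, then sum the widths of the merged blocks.
gene_length <- function(start, end) {
    if (length(start) == 0)
        return(0)
    ord <- order(start)
    start <- start[ord]
    end <- end[ord]
    total <- 0
    cur_start <- start[1]
    cur_end <- end[1]
    for (i in seq_along(start)[-1]) {
        if (start[i] <= cur_end + 1) {
            ## Exon overlaps or touches the current block: extend it.
            cur_end <- max(cur_end, end[i])
        } else {
            ## Gap: close the current block and start a new one.
            total <- total + (cur_end - cur_start + 1)
            cur_start <- start[i]
            cur_end <- end[i]
        }
    }
    total + (cur_end - cur_start + 1)
}

## Two transcripts of the same gene that share the exon [201, 300].
tx1_start <- c(1, 201)
tx1_end <- c(100, 300)
tx2_start <- c(51, 201)
tx2_end <- c(150, 300)

tx_length(tx1_start, tx1_end)   # 200
tx_length(tx2_start, tx2_end)   # 200
gene_length(c(tx1_start, tx2_start), c(tx1_end, tx2_end))   # 250, not 400
```

Summing all four exons naively would give 400; reducing them first yields the 250 bases actually covered by the gene. For exons stored in a GRanges object, the same idea can be written as sum(width(reduce(exons))).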

Note: the transcriptLengths function from the GenomicFeatures package can also be used; it can additionally report the lengths of the CDS and of the 5' and 3' UTRs (see the examples).

Value

For lengthOf: a vector of gene or transcript lengths, named by the gene or transcript IDs (see the method description above).

Author

Johannes Rainer

Examples


library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86

#####    lengthOf
##
## length of a specific gene.
lengthOf(edb, filter = GeneIdFilter("ENSG00000000003"))
#> ENSG00000000003 
#>            4535 

## length of a transcript
lengthOf(edb, of = "tx", filter = TxIdFilter("ENST00000494424"))
#> ENST00000494424 
#>             820 

## Average length of all protein coding genes encoded on chromosome X
mean(lengthOf(edb, of = "gene",
              filter = ~ gene_biotype == "protein_coding" &
                  seq_name == "X"))
#> [1] 3934.111

## Average length of all snoRNA genes encoded on chromosome X
mean(lengthOf(edb, of = "gene",
              filter = ~ gene_biotype == "snoRNA" &
                  seq_name == "X"))
#> [1] 125.8478

##### transcriptLengths
##
## Calculate the length of transcripts encoded on chromosome Y, including
## the lengths of the CDS, 5' UTR and 3' UTR.
len <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
                         with.utr3_len = TRUE, filter = SeqNameFilter("Y"))
head(len)
#>                           tx_id         tx_name         gene_id nexon tx_len
#> ENST00000516032 ENST00000516032 ENST00000516032 ENSG00000251841     1    105
#> ENST00000383070 ENST00000383070 ENST00000383070 ENSG00000184895     1    845
#> ENST00000454281 ENST00000454281 ENST00000454281 ENSG00000237659     1    502
#> ENST00000430735 ENST00000430735 ENST00000430735 ENSG00000232195     1    237
#> ENST00000250784 ENST00000250784 ENST00000250784 ENSG00000129824     7   1305
#> ENST00000430575 ENST00000430575 ENST00000430575 ENSG00000129824     7    811
#>                 cds_len utr5_len utr3_len
#> ENST00000516032       0        0        0
#> ENST00000383070     615       96      134
#> ENST00000454281       0        0        0
#> ENST00000430735       0        0        0
#> ENST00000250784     792      139      374
#> ENST00000430575     787       24        0
