EnsDb-lengths.Rd
These methods calculate the lengths of features (transcripts, genes,
CDS, 3' or 5' UTRs) defined in an EnsDb object or database.
# S4 method for EnsDb
lengthOf(x, of="gene", filter = AnnotationFilterList())
(In alphabetic order)
filter: A filter describing which results to retrieve from the database. Can be a single object extending AnnotationFilter, an AnnotationFilterList object combining several such objects, or a formula representing a filter expression (see examples below or AnnotationFilter for more details).
of: For lengthOf: whether the length of genes or transcripts should be retrieved from the database.
x: For lengthOf: either an EnsDb or a GRangesList object. For all other methods an EnsDb instance.
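The three accepted forms of the filter argument can be sketched as follows. This is a minimal illustration that assumes the AnnotationFilter package is installed; the variable names are hypothetical and chosen only for this example.

```r
library(AnnotationFilter)

# 1) A single filter object (covers one condition only)
flt_single <- GeneBiotypeFilter("protein_coding")

# 2) An AnnotationFilterList combining two filter objects with a logical AND
flt_list <- AnnotationFilterList(GeneBiotypeFilter("protein_coding"),
                                 SeqNameFilter("X"),
                                 logicOp = "&")

# 3) A formula expressing the same two conditions
flt_formula <- ~ gene_biotype == "protein_coding" & seq_name == "X"

# AnnotationFilter() translates a formula into filter objects
flt_from_formula <- AnnotationFilter(flt_formula)
```

Any of `flt_single`, `flt_list` or `flt_formula` could then be passed as the `filter` argument of `lengthOf`.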
Retrieve the length of genes or transcripts from the database. The length is the sum of the lengths of all exons of a transcript or a gene. In the latter case the exons are first reduced so that the length corresponds to the part of the genomic sequence covered by the exons.
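The difference between the two length definitions can be illustrated with a small base R sketch. It is not the package's implementation: `reduced_length` is a hypothetical helper, and the exon coordinates are made-up toy values for a gene with two transcripts.

```r
# Hypothetical helper (not part of ensembldb): number of positions covered by
# a set of closed intervals [start, end] after merging overlapping intervals.
reduced_length <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  if (length(start) == 0L) return(0L)
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  # Largest end coordinate seen up to and including each interval
  run_end <- cummax(end)
  # An interval opens a new merged block if it starts after all previous ends
  new_block <- c(TRUE, start[-1] > run_end[-length(run_end)])
  block <- cumsum(new_block)
  block_start <- tapply(start, block, min)
  block_end <- tapply(end, block, max)
  sum(block_end - block_start + 1)
}

# Toy exons of one gene with two transcripts (made-up coordinates)
exons <- data.frame(
  tx_id = c("tx1", "tx1", "tx2", "tx2"),
  start = c(100, 300, 150, 300),
  end   = c(199, 399, 249, 349)
)

# Transcript length: plain sum of the exon widths of each transcript
tx_len <- tapply(exons$end - exons$start + 1, exons$tx_id, sum)

# Gene length: exons of all transcripts are merged (reduced) before summing
gene_len <- reduced_length(exons$start, exons$end)
```

Here `tx_len` is 200 for tx1 and 150 for tx2, while `gene_len` is 250: the merged exons cover positions 100 to 249 and 300 to 399, so overlapping exon sequence is counted only once.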
Note: in addition to this method, the transcriptLengths function in the GenomicFeatures package can also be used.
For lengthOf: see method description above.
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
##### lengthOf
##
## length of a specific gene.
lengthOf(edb, filter = GeneIdFilter("ENSG00000000003"))
#> ENSG00000000003
#> 4535
## length of a transcript
lengthOf(edb, of = "tx", filter = TxIdFilter("ENST00000494424"))
#> ENST00000494424
#> 820
## Average length of all protein coding genes encoded on chromosome X
mean(lengthOf(edb, of = "gene",
              filter = ~ gene_biotype == "protein_coding" &
                  seq_name == "X"))
#> [1] 3934.111
## Average length of all snoRNAs encoded on chromosome X
mean(lengthOf(edb, of = "gene",
              filter = ~ gene_biotype == "snoRNA" &
                  seq_name == "X"))
#> [1] 125.8478
##### transcriptLengths
##
## Calculate the length of transcripts encoded on chromosome Y, including
## length of the CDS, 5' and 3' UTR.
len <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
                         with.utr3_len = TRUE, filter = SeqNameFilter("Y"))
head(len)
#>                           tx_id         tx_name         gene_id nexon tx_len
#> ENST00000516032 ENST00000516032 ENST00000516032 ENSG00000251841     1    105
#> ENST00000383070 ENST00000383070 ENST00000383070 ENSG00000184895     1    845
#> ENST00000454281 ENST00000454281 ENST00000454281 ENSG00000237659     1    502
#> ENST00000430735 ENST00000430735 ENST00000430735 ENSG00000232195     1    237
#> ENST00000250784 ENST00000250784 ENST00000250784 ENSG00000129824     7   1305
#> ENST00000430575 ENST00000430575 ENST00000430575 ENSG00000129824     7    811
#>                 cds_len utr5_len utr3_len
#> ENST00000516032       0        0        0
#> ENST00000383070     615       96      134
#> ENST00000454281       0        0        0
#> ENST00000430735       0        0        0
#> ENST00000250784     792      139      374
#> ENST00000430575     787       24        0
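As a quick sanity check on the output above, the following base R sketch re-enters the six printed rows by hand and tests whether, for transcripts with a CDS, the transcript length equals the sum of CDS and UTR lengths. This is only an observation about the rows shown, not a documented guarantee for every transcript; the data frame is typed in manually so the snippet runs without the annotation package.

```r
# The six rows printed above, entered by hand (relevant columns only)
len_shown <- data.frame(
  tx_id    = c("ENST00000516032", "ENST00000383070", "ENST00000454281",
               "ENST00000430735", "ENST00000250784", "ENST00000430575"),
  tx_len   = c(105, 845, 502, 237, 1305, 811),
  cds_len  = c(0, 615, 0, 0, 792, 787),
  utr5_len = c(0, 96, 0, 0, 139, 24),
  utr3_len = c(0, 134, 0, 0, 374, 0)
)

# Keep transcripts with a coding sequence; the others report 0 for CDS and UTRs
coding <- len_shown[len_shown$cds_len > 0, ]

# Sum of the three parts for each coding transcript
parts_sum <- coding$cds_len + coding$utr5_len + coding$utr3_len

# TRUE if every coding transcript's length is fully explained by its parts
parts_match <- all(parts_sum == coding$tx_len)
```

For the three coding transcripts shown, the parts add up exactly (for example 615 + 96 + 134 = 845), so `parts_match` is `TRUE`.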