genomeToTranscript
maps genomic coordinates to positions within the
transcript (if at the provided genomic position a transcript is encoded).
The function does only support mapping of genomic coordinates that are
completely within the genomic region at which an exon is encoded. If the
genomic region crosses the exon boundary an empty IRanges
is returned.
See examples for details.
genomeToTranscript(x, db)
GRanges
object with the genomic coordinates that should be
mapped.
EnsDb
object.
An IRangesList
with length equal to length(x)
. Each element providing
the mapping(s) to position within any encoded transcripts at the respective
genomic location as an IRanges
object. An IRanges
with negative start
coordinates is returned, if the provided genomic coordinates are not
completely within the genomic coordinates of an exon.
The ID of the exon and its rank (index of the exon in the transcript) are
provided in the result's IRanges
metadata columns as well as the genomic
position of x
.
The function first retrieves all exons overlapping the provided genomic
coordinates and identifies then exons that are fully containing the
coordinates in x
. The transcript-relative coordinates are calculated based
on the relative position of the provided genomic coordinates in this exon.
The function throws a warning and returns an empty IRanges
object if the
genomic coordinates can not be mapped to a transcript.
Other coordinate mapping functions:
cdsToTranscript()
,
genomeToProtein()
,
proteinToGenome()
,
proteinToTranscript()
,
transcriptToCds()
,
transcriptToGenome()
,
transcriptToProtein()
library(EnsDb.Hsapiens.v86)
## Subsetting the EnsDb object to chromosome X only to speed up execution
## time of examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")
## Define a genomic region and calculate within-transcript coordinates
gnm <- GRanges("X:107716399-107716401")
res <- genomeToTranscript(gnm, edbx)
## Result is an IRanges object with the start and end coordinates within
## each transcript that has an exon at the genomic range.
res
#> IRangesList object of length 1:
#> [[1]]
#> IRanges object with 2 ranges and 7 metadata columns:
#> start end width | tx_id
#> <integer> <integer> <integer> | <character>
#> ENST00000372390 145 147 3 | ENST00000372390
#> ENST00000486554 1 3 3 | ENST00000486554
#> exon_id exon_rank seq_start seq_end seq_name
#> <character> <integer> <integer> <integer> <character>
#> ENST00000372390 ENSE00001457675 1 107716399 107716401 X
#> ENST00000486554 ENSE00001927337 1 107716399 107716401 X
#> seq_strand
#> <character>
#> ENST00000372390 *
#> ENST00000486554 *
#>
## An IRanges with negative coordinates is returned if at the provided
## position no exon is present. Below we use the same coordinates but
## specify that the coordinates are on the forward (+) strand
gnm <- GRanges("X:107716399-107716401:+")
genomeToTranscript(gnm, edbx)
#> Warning: 1 genomic region(s) could not be mapped to a transcript; hint: see ?seqlevelsStyle if you used UCSC chromosome names
#> IRangesList object of length 1:
#> [[1]]
#> IRanges object with 1 range and 7 metadata columns:
#> start end width | tx_id exon_id exon_rank
#> <integer> <integer> <integer> | <character> <character> <integer>
#> [1] -1 -1 1 | <NA> <NA> <NA>
#> seq_start seq_end seq_name seq_strand
#> <integer> <integer> <character> <character>
#> [1] 107716399 107716401 X +
#>
## Next we provide multiple genomic positions.
gnm <- GRanges("X", IRanges(start = c(644635, 107716399, 107716399),
end = c(644639, 107716401, 107716401)), strand = c("*", "*", "+"))
## The result of the mapping is an IRangesList each element providing the
## within-transcript coordinates for each input region
genomeToTranscript(gnm, edbx)
#> Warning: 1 genomic region(s) could not be mapped to a transcript; hint: see ?seqlevelsStyle if you used UCSC chromosome names
#> IRangesList object of length 3:
#> [[1]]
#> IRanges object with 2 ranges and 7 metadata columns:
#> start end width | tx_id
#> <integer> <integer> <integer> | <character>
#> ENST00000381578 1569 1573 5 | ENST00000381578
#> ENST00000554971 969 973 5 | ENST00000554971
#> exon_id exon_rank seq_start seq_end seq_name
#> <character> <integer> <integer> <integer> <character>
#> ENST00000381578 ENSE00001489174 6 644635 644639 X
#> ENST00000554971 ENSE00002438186 5 644635 644639 X
#> seq_strand
#> <character>
#> ENST00000381578 *
#> ENST00000554971 *
#>
#> [[2]]
#> IRanges object with 2 ranges and 7 metadata columns:
#> start end width | tx_id
#> <integer> <integer> <integer> | <character>
#> ENST00000372390 145 147 3 | ENST00000372390
#> ENST00000486554 1 3 3 | ENST00000486554
#> exon_id exon_rank seq_start seq_end seq_name
#> <character> <integer> <integer> <integer> <character>
#> ENST00000372390 ENSE00001457675 1 107716399 107716401 X
#> ENST00000486554 ENSE00001927337 1 107716399 107716401 X
#> seq_strand
#> <character>
#> ENST00000372390 *
#> ENST00000486554 *
#>
#> [[3]]
#> IRanges object with 1 range and 7 metadata columns:
#> start end width | tx_id exon_id exon_rank seq_start
#> <integer> <integer> <integer> | <character> <character> <integer> <integer>
#> -1 -1 1 | <NA> <NA> <NA> 107716399
#> seq_end seq_name seq_strand
#> <integer> <character> <character>
#> 107716401 X +
#>