Map genomic coordinates to transcript coordinates

genomeToTranscript maps genomic coordinates to positions within the transcript (if at the provided genomic position a transcript is encoded). The function does only support mapping of genomic coordinates that are completely within the genomic region at which an exon is encoded. If the genomic region crosses the exon boundary an empty IRanges is returned. See examples for details.

genomeToTranscript(x, db)

Arguments

x: GRanges object with the genomic coordinates that should be mapped.
db: EnsDb object.

Value

An IRangesList with length equal to length(x). Each element providing the mapping(s) to position within any encoded transcripts at the respective genomic location as an IRanges object. An IRanges with negative start coordinates is returned, if the provided genomic coordinates are not completely within the genomic coordinates of an exon.

The ID of the exon and its rank (index of the exon in the transcript) are provided in the result's IRanges metadata columns as well as the genomic position of x.

Details

The function first retrieves all exons overlapping the provided genomic coordinates and identifies then exons that are fully containing the coordinates in x. The transcript-relative coordinates are calculated based on the relative position of the provided genomic coordinates in this exon.

Note

The function throws a warning and returns an empty IRanges object if the genomic coordinates can not be mapped to a transcript.

Author

Johannes Rainer

Examples


library(EnsDb.Hsapiens.v86)

## Subsetting the EnsDb object to chromosome X only to speed up execution
## time of examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## Define a genomic region and calculate within-transcript coordinates
gnm <- GRanges("X:107716399-107716401")

res <- genomeToTranscript(gnm, edbx)
## Result is an IRanges object with the start and end coordinates within
## each transcript that has an exon at the genomic range.
res
#> IRangesList object of length 1:
#> [[1]]
#> IRanges object with 2 ranges and 7 metadata columns:
#>                       start       end     width |           tx_id
#>                   <integer> <integer> <integer> |     <character>
#>   ENST00000372390       145       147         3 | ENST00000372390
#>   ENST00000486554         1         3         3 | ENST00000486554
#>                           exon_id exon_rank seq_start   seq_end    seq_name
#>                       <character> <integer> <integer> <integer> <character>
#>   ENST00000372390 ENSE00001457675         1 107716399 107716401           X
#>   ENST00000486554 ENSE00001927337         1 107716399 107716401           X
#>                    seq_strand
#>                   <character>
#>   ENST00000372390           *
#>   ENST00000486554           *
#> 

## An IRanges with negative coordinates is returned if at the provided
## position no exon is present. Below we use the same coordinates but
## specify that the coordinates are on the forward (+) strand
gnm <- GRanges("X:107716399-107716401:+")
genomeToTranscript(gnm, edbx)
#> Warning: 1 genomic region(s) could not be mapped to a transcript; hint: see ?seqlevelsStyle if you used UCSC chromosome names
#> IRangesList object of length 1:
#> [[1]]
#> IRanges object with 1 range and 7 metadata columns:
#>           start       end     width |       tx_id     exon_id exon_rank
#>       <integer> <integer> <integer> | <character> <character> <integer>
#>   [1]        -1        -1         1 |        <NA>        <NA>      <NA>
#>       seq_start   seq_end    seq_name  seq_strand
#>       <integer> <integer> <character> <character>
#>   [1] 107716399 107716401           X           +
#> 

## Next we provide multiple genomic positions.
gnm <- GRanges("X", IRanges(start = c(644635, 107716399, 107716399),
    end = c(644639, 107716401, 107716401)), strand = c("*", "*", "+"))

## The result of the mapping is an IRangesList each element providing the
## within-transcript coordinates for each input region
genomeToTranscript(gnm, edbx)
#> Warning: 1 genomic region(s) could not be mapped to a transcript; hint: see ?seqlevelsStyle if you used UCSC chromosome names
#> IRangesList object of length 3:
#> [[1]]
#> IRanges object with 2 ranges and 7 metadata columns:
#>                       start       end     width |           tx_id
#>                   <integer> <integer> <integer> |     <character>
#>   ENST00000381578      1569      1573         5 | ENST00000381578
#>   ENST00000554971       969       973         5 | ENST00000554971
#>                           exon_id exon_rank seq_start   seq_end    seq_name
#>                       <character> <integer> <integer> <integer> <character>
#>   ENST00000381578 ENSE00001489174         6    644635    644639           X
#>   ENST00000554971 ENSE00002438186         5    644635    644639           X
#>                    seq_strand
#>                   <character>
#>   ENST00000381578           *
#>   ENST00000554971           *
#> 
#> [[2]]
#> IRanges object with 2 ranges and 7 metadata columns:
#>                       start       end     width |           tx_id
#>                   <integer> <integer> <integer> |     <character>
#>   ENST00000372390       145       147         3 | ENST00000372390
#>   ENST00000486554         1         3         3 | ENST00000486554
#>                           exon_id exon_rank seq_start   seq_end    seq_name
#>                       <character> <integer> <integer> <integer> <character>
#>   ENST00000372390 ENSE00001457675         1 107716399 107716401           X
#>   ENST00000486554 ENSE00001927337         1 107716399 107716401           X
#>                    seq_strand
#>                   <character>
#>   ENST00000372390           *
#>   ENST00000486554           *
#> 
#> [[3]]
#> IRanges object with 1 range and 7 metadata columns:
#>        start       end     width |       tx_id     exon_id exon_rank seq_start
#>    <integer> <integer> <integer> | <character> <character> <integer> <integer>
#>           -1        -1         1 |        <NA>        <NA>      <NA> 107716399
#>      seq_end    seq_name  seq_strand
#>    <integer> <character> <character>
#>    107716401           X           +
#>