Map transcript-relative coordinates to genomic coordinates

transcriptToGenome maps transcript-relative coordinates to genomic coordinates. Provided coordinates are expected to be relative to the first nucleotide of the transcript, not the CDS. CDS-relative coordinates have to be converted to transcript-relative positions first with the cdsToTranscript() function.
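As a minimal sketch of the two-step conversion described above, the snippet below maps the first three nucleotides of a coding sequence to the genome. It assumes that ENST00000381578 is a protein-coding transcript and that the IRanges returned by cdsToTranscript() keeps the transcript IDs as names, so that it can be passed on directly. The variable names are illustrative only.

```r
library(EnsDb.Hsapiens.v86)
## Restrict queries to chromosome X, as in the examples of this page
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## CDS-relative coordinates: nucleotides 1 to 3 of the coding sequence.
## Assumption: ENST00000381578 encodes a protein.
cdspos <- IRanges(start = 1, end = 3, names = "ENST00000381578")

## Step 1: convert CDS-relative to transcript-relative coordinates
txpos <- cdsToTranscript(cdspos, edbx)

## Step 2: map the transcript-relative coordinates to the genome
gnm <- transcriptToGenome(txpos, edbx)
gnm
```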

transcriptToGenome(x, db, id = "name")

Arguments

x: IRanges with the coordinates within the transcript. Coordinates are counted from the start of the transcript (including the 5' UTR). The Ensembl IDs of the corresponding transcripts have to be provided either as names of the IRanges, or in one of its metadata columns.
db: EnsDb object.
id: character(1) specifying where the transcript identifier can be found. Has to be either "name" or one of colnames(mcols(x)).
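The id argument can be illustrated with a short sketch in which the transcript ID is stored in a metadata column instead of the names. The column name "transcript_id" is an arbitrary choice made for this example, not something the function requires.

```r
library(EnsDb.Hsapiens.v86)
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## Unnamed IRanges; the transcript ID goes into a metadata column.
## "transcript_id" is a hypothetical column name chosen for illustration.
txpos <- IRanges(start = 1, end = 5)
mcols(txpos)$transcript_id <- "ENST00000381578"

## Tell transcriptToGenome which metadata column holds the IDs
res <- transcriptToGenome(txpos, edbx, id = "transcript_id")
res
```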

Value

GRangesList with the same length (and order) as the input IRanges x. Each GRanges in the GRangesList provides the genomic coordinates corresponding to the provided within-transcript coordinates. The original transcript ID, the transcript-relative coordinates and the ID of the individual exon(s) are provided as metadata columns. An empty GRanges is returned for transcripts that cannot be found in the database.
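One possible way to work with the returned value is sketched below: empty list elements are flagged with lengths(), and the successfully mapped elements are flattened into a single GRanges with unlist(). The transcript name "unknown_tx" is a made-up ID used to trigger the empty result, and suppressWarnings() is used only to keep the sketch quiet.

```r
library(EnsDb.Hsapiens.v86)
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## One real transcript and one hypothetical ID that is not in the database
txpos <- IRanges(start = c(501, 1), end = c(505, 10),
    names = c("ENST00000486554", "unknown_tx"))
res <- suppressWarnings(transcriptToGenome(txpos, edbx))

## Elements of length 0 correspond to input ranges that could not be mapped
unmapped <- lengths(res) == 0
unmapped

## Flatten the mapped elements into one GRanges and inspect the metadata
flat <- unlist(res[!unmapped])
mcols(flat)[, c("exon_id", "exon_rank", "tx_start", "tx_end")]
```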

Author

Johannes Rainer

Examples


library(EnsDb.Hsapiens.v86)
## Restrict all further queries to chromosome X to speed up the examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## Below we map positions 1 to 5 within the transcript ENST00000381578 to
## the genome. The ID of the transcript has to be provided either as names
## or in one of the IRanges' metadata columns
txpos <- IRanges(start = 1, end = 5, names = "ENST00000381578")

transcriptToGenome(txpos, edbx)
#> GRangesList object of length 1:
#> $ENST00000381578
#> GRanges object with 1 range and 5 metadata columns:
#>       seqnames        ranges strand |         exon_id           tx_id exon_rank
#>          <Rle>     <IRanges>  <Rle> |     <character>     <character> <integer>
#>   [1]        X 624344-624348      + | ENSE00001489178 ENST00000381578         1
#>        tx_start    tx_end
#>       <integer> <integer>
#>   [1]         1         5
#>   -------
#>   seqinfo: 1 sequence from GRCh38 genome
#> 
## The function returns a GRangesList with the genomic coordinates. In this
## example the coordinates are within the same exon and map to a single
## genomic region.

## Next we map nucleotides 501 to 505 of ENST00000486554 to the genome
txpos <- IRanges(start = 501, end = 505, names = "ENST00000486554")

transcriptToGenome(txpos, edbx)
#> GRangesList object of length 1:
#> $ENST00000486554
#> GRanges object with 2 ranges and 5 metadata columns:
#>       seqnames              ranges strand |         exon_id           tx_id
#>          <Rle>           <IRanges>  <Rle> |     <character>     <character>
#>   [1]        X 107715899-107715901      - | ENSE00001927337 ENST00000486554
#>   [2]        X 107714748-107714749      - | ENSE00001837666 ENST00000486554
#>       exon_rank  tx_start    tx_end
#>       <integer> <integer> <integer>
#>   [1]         1       501       505
#>   [2]         2       501       505
#>   -------
#>   seqinfo: 1 sequence from GRCh38 genome
#> 
## The positions within the transcript are located within two of the
## transcript's exons and thus a `GRanges` of length 2 is returned.

## Next we map multiple regions, two within the same transcript and one
## in a transcript that does not exist.
txpos <- IRanges(start = c(501, 1, 5), end = c(505, 10, 6),
    names = c("ENST00000486554", "ENST00000486554", "some"))

res <- transcriptToGenome(txpos, edbx)
#> Warning: 1 transcript(s) could either not be found in the database or the specified range is outside the transcript's sequence

## The result GRangesList has the same length as the input IRanges
length(res)
#> [1] 3

## The result for the last region is an empty GRanges, because the
## transcript could not be found in the database
res[[3]]
#> GRanges object with 0 ranges and 0 metadata columns:
#>    seqnames    ranges strand
#>       <Rle> <IRanges>  <Rle>
#>   -------
#>   seqinfo: no sequences

res
#> GRangesList object of length 3:
#> $ENST00000486554
#> GRanges object with 2 ranges and 5 metadata columns:
#>       seqnames              ranges strand |         exon_id           tx_id
#>          <Rle>           <IRanges>  <Rle> |     <character>     <character>
#>   [1]        X 107715899-107715901      - | ENSE00001927337 ENST00000486554
#>   [2]        X 107714748-107714749      - | ENSE00001837666 ENST00000486554
#>       exon_rank  tx_start    tx_end
#>       <integer> <integer> <integer>
#>   [1]         1       501       505
#>   [2]         2       501       505
#>   -------
#>   seqinfo: 1 sequence from GRCh38 genome
#> 
#> $ENST00000486554
#> GRanges object with 1 range and 5 metadata columns:
#>       seqnames              ranges strand |         exon_id           tx_id
#>          <Rle>           <IRanges>  <Rle> |     <character>     <character>
#>   [1]        X 107716392-107716401      - | ENSE00001927337 ENST00000486554
#>       exon_rank  tx_start    tx_end
#>       <integer> <integer> <integer>
#>   [1]         1         1        10
#>   -------
#>   seqinfo: 1 sequence from GRCh38 genome
#> 
#> $some
#> GRanges object with 0 ranges and 0 metadata columns:
#>    seqnames    ranges strand
#>       <Rle> <IRanges>  <Rle>
#>   -------
#>   seqinfo: no sequences
#>
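The split of positions 501 to 505 across two exons on the minus strand can be reproduced with a small base R sketch. This is a toy illustration of the coordinate arithmetic, not the implementation used by transcriptToGenome. The coordinates of exon 1 and the end of exon 2 are inferred from the output above; the start of exon 2 is not shown there, so a placeholder value is used. The helper tx_to_genome_minus is hypothetical.

```r
## Toy exon model for a minus-strand transcript, ordered by exon rank.
## Exon 1 and the end of exon 2 are inferred from the output above;
## the start of exon 2 (107714700) is a made-up placeholder.
exons <- data.frame(
    rank  = c(1L, 2L),
    start = c(107715899, 107714700),
    end   = c(107716401, 107714749)
)

## Hypothetical helper: map a transcript-relative range to genomic ranges
## for a transcript on the minus strand.
tx_to_genome_minus <- function(tx_start, tx_end, exons) {
    widths <- exons$end - exons$start + 1
    ## Transcript-relative start and end of each exon
    ex_tx_end <- cumsum(widths)
    ex_tx_start <- ex_tx_end - widths + 1
    ## Exons overlapping the requested transcript range
    hit <- which(ex_tx_start <= tx_end & ex_tx_end >= tx_start)
    ## Clip the requested range to each overlapping exon
    from <- pmax(tx_start, ex_tx_start[hit])
    to <- pmin(tx_end, ex_tx_end[hit])
    ## On the minus strand, transcript positions count down from the
    ## genomic end of the exon
    data.frame(
        rank  = exons$rank[hit],
        start = exons$end[hit] - (to - ex_tx_start[hit]),
        end   = exons$end[hit] - (from - ex_tx_start[hit])
    )
}

## Positions 501 to 505 span the boundary between exon 1 and exon 2
tx_to_genome_minus(501, 505, exons)

## Positions 1 to 10 lie entirely within exon 1
tx_to_genome_minus(1, 10, exons)
```

Because exon 1 is 503 nucleotides long in this model, positions 501 to 503 fall into exon 1 and positions 504 to 505 into exon 2, which matches the two ranges reported above.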
