R/transcriptToX.R
transcriptToGenome.Rd
transcriptToGenome maps transcript-relative coordinates to genomic
coordinates. Provided coordinates are expected to be relative to the first
nucleotide of the transcript, not the CDS. CDS-relative coordinates
have to be converted to transcript-relative positions first with the
cdsToTranscript() function.
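As a rough illustration of the difference between the two coordinate systems, the base R sketch below shifts CDS-relative positions by the width of the 5' UTR. The helper cds_to_tx and the UTR width of 200 nt are hypothetical and only serve this example; for real data, cdsToTranscript() determines the offset from the annotation in the EnsDb.

```r
## Hypothetical helper: shift CDS-relative positions by the 5' UTR width.
## This only illustrates the offset; it does not query any database.
cds_to_tx <- function(cds_pos, utr5_width) {
    stopifnot(all(cds_pos >= 1), utr5_width >= 0)
    cds_pos + utr5_width
}

## Assumed 5' UTR of 200 nt: the first three CDS nucleotides are
## nucleotides 201 to 203 of the transcript.
tx_pos <- cds_to_tx(1:3, utr5_width = 200)
```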
transcriptToGenome(x, db, id = "name")
x: IRanges with the coordinates within the transcript. Coordinates
are counted from the start of the transcript (including the 5' UTR). The
Ensembl IDs of the corresponding transcripts have to be provided either
as names of the IRanges, or in one of its metadata columns.
db: EnsDb object.
id: character(1) specifying where the transcript identifier can be
found. Has to be either "name" or one of colnames(mcols(x)).
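The id argument can be sketched as follows for the case where the transcript ID is stored in a metadata column rather than in the names. The column name "transcript_id" is an arbitrary choice for this sketch, and the chromosome X restriction is only there to keep the query fast.

```r
library(EnsDb.Hsapiens.v86)

## Restrict queries to chromosome X to speed up the lookup
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")

## Unnamed IRanges; the transcript ID goes into a metadata column.
## The column name "transcript_id" is an arbitrary choice for this sketch.
txpos <- IRanges(start = 1, end = 5)
mcols(txpos) <- DataFrame(transcript_id = "ENST00000381578")

## Tell transcriptToGenome() which metadata column holds the transcript ID
res <- transcriptToGenome(txpos, edbx, id = "transcript_id")
```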
GRangesList with the same length (and order) as the input IRanges x.
Each GRanges in the GRangesList provides the genomic coordinates
corresponding to the provided within-transcript coordinates. The
original transcript ID and the transcript-relative coordinates are provided
as metadata columns, as is the ID of the individual exon(s). An empty
GRanges is returned for transcripts that cannot be found in the database.
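To make clear why a single transcript-relative range can yield several genomic ranges, here is a minimal base R sketch of the mapping. The helper map_tx_to_genome and the exon coordinates are made up for illustration; this is not how ensembldb implements the function, only a toy version of the same idea. Exons are laid end to end in rank order, the requested range is intersected with each exon, and the offsets are counted from the exon's genomic start on the + strand or from its genomic end on the - strand.

```r
## Toy illustration (hypothetical helper, not part of ensembldb).
## exons: data.frame with genomic start/end (1-based, inclusive), ordered
## by exon rank, i.e. 5' to 3' in transcript orientation.
map_tx_to_genome <- function(tx_start, tx_end, exons, strand = "+") {
    widths <- exons$end - exons$start + 1
    ## Transcript-relative start and end of each exon
    ex_tx_end <- cumsum(widths)
    ex_tx_start <- ex_tx_end - widths + 1
    ## Ranges outside the transcript yield an empty result
    if (tx_start < 1 || tx_end > sum(widths) || tx_start > tx_end)
        return(data.frame(start = numeric(), end = numeric(),
                          exon_rank = integer()))
    ## Exons overlapping the requested range
    hit <- which(ex_tx_start <= tx_end & ex_tx_end >= tx_start)
    ## Part of the range inside each exon, as offsets from its 5' end
    from <- pmax(tx_start, ex_tx_start[hit]) - ex_tx_start[hit]
    to <- pmin(tx_end, ex_tx_end[hit]) - ex_tx_start[hit]
    if (strand == "+") {
        g_start <- exons$start[hit] + from
        g_end <- exons$start[hit] + to
    } else {
        ## On the - strand the 5' end of an exon is its genomic end
        g_start <- exons$end[hit] - to
        g_end <- exons$end[hit] - from
    }
    data.frame(start = g_start, end = g_end, exon_rank = hit)
}

## Made-up exons on the - strand: exon 1 is 503 nt long, exon 2 is 100 nt
exons <- data.frame(start = c(1001, 801), end = c(1503, 900))

## Positions 501 to 505 span the exon 1/exon 2 junction: two genomic ranges
res <- map_tx_to_genome(501, 505, exons, strand = "-")
```

In this toy setup res has two rows, one for the last three nucleotides of exon 1 and one for the first two nucleotides of exon 2. A range that lies beyond the end of the transcript returns a data.frame with zero rows.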
cdsToTranscript() and transcriptToCds() for the mapping between
CDS- and transcript-relative coordinates.
Other coordinate mapping functions: cdsToTranscript(), genomeToProtein(),
genomeToTranscript(), proteinToGenome(), proteinToTranscript(),
transcriptToCds(), transcriptToProtein()
library(EnsDb.Hsapiens.v86)
## Restrict all further queries to chromosome X to speed up the examples
edbx <- filter(EnsDb.Hsapiens.v86, filter = ~ seq_name == "X")
## Below we map positions 1 to 5 within the transcript ENST00000381578 to
## the genome. The ID of the transcript has to be provided either as names
## or in one of the IRanges' metadata columns
txpos <- IRanges(start = 1, end = 5, names = "ENST00000381578")
transcriptToGenome(txpos, edbx)
#> GRangesList object of length 1:
#> $ENST00000381578
#> GRanges object with 1 range and 5 metadata columns:
#> seqnames ranges strand | exon_id tx_id exon_rank
#> <Rle> <IRanges> <Rle> | <character> <character> <integer>
#> [1] X 624344-624348 + | ENSE00001489178 ENST00000381578 1
#> tx_start tx_end
#> <integer> <integer>
#> [1] 1 5
#> -------
#> seqinfo: 1 sequence from GRCh38 genome
#>
## The function returns a GRangesList with the genomic coordinates. In this
## example the coordinates are within the same exon and map to a single
## genomic region.
## Next we map nucleotides 501 to 505 of ENST00000486554 to the genome
txpos <- IRanges(start = 501, end = 505, names = "ENST00000486554")
transcriptToGenome(txpos, edbx)
#> GRangesList object of length 1:
#> $ENST00000486554
#> GRanges object with 2 ranges and 5 metadata columns:
#> seqnames ranges strand | exon_id tx_id
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] X 107715899-107715901 - | ENSE00001927337 ENST00000486554
#> [2] X 107714748-107714749 - | ENSE00001837666 ENST00000486554
#> exon_rank tx_start tx_end
#> <integer> <integer> <integer>
#> [1] 1 501 505
#> [2] 2 501 505
#> -------
#> seqinfo: 1 sequence from GRCh38 genome
#>
## The positions within the transcript span two of the transcript's
## exons and thus a `GRanges` of length 2 is returned.
## Next we map multiple regions, two within the same transcript and one
## in a transcript that does not exist.
txpos <- IRanges(start = c(501, 1, 5), end = c(505, 10, 6),
names = c("ENST00000486554", "ENST00000486554", "some"))
res <- transcriptToGenome(txpos, edbx)
#> Warning: 1 transcript(s) could either not be found in the database or the specified range is outside the transcript's sequence
## The result GRangesList has the same length as the input IRanges
length(res)
#> [1] 3
## The result for the last region is an empty GRanges, because the
## transcript could not be found in the database
res[[3]]
#> GRanges object with 0 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> -------
#> seqinfo: no sequences
res
#> GRangesList object of length 3:
#> $ENST00000486554
#> GRanges object with 2 ranges and 5 metadata columns:
#> seqnames ranges strand | exon_id tx_id
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] X 107715899-107715901 - | ENSE00001927337 ENST00000486554
#> [2] X 107714748-107714749 - | ENSE00001837666 ENST00000486554
#> exon_rank tx_start tx_end
#> <integer> <integer> <integer>
#> [1] 1 501 505
#> [2] 2 501 505
#> -------
#> seqinfo: 1 sequence from GRCh38 genome
#>
#> $ENST00000486554
#> GRanges object with 1 range and 5 metadata columns:
#> seqnames ranges strand | exon_id tx_id
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] X 107716392-107716401 - | ENSE00001927337 ENST00000486554
#> exon_rank tx_start tx_end
#> <integer> <integer> <integer>
#> [1] 1 1 10
#> -------
#> seqinfo: 1 sequence from GRCh38 genome
#>
#> $some
#> GRanges object with 0 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> -------
#> seqinfo: no sequences
#>