MetaboAnnotation: simplifying metabolite annotation

¹Institute for Biomedicine, Eurac Research, Italy
²Helmholtz Center Munich, Germany
https://github.com/jorainer/MetaboAnnotationIntro

Simplify annotation process and handling of matched results.
matchMz and matchSpectra functions.
Matching is configured with a dedicated Param object.
Tutorial.

`matchMz`

Annotation using mass or m/z and/or retention time.
matchMz(query, target, param)
query: features to annotate. Can be numeric, data.frame or SummarizedExperiment.
target: reference annotations. Can be numeric or data.frame (CompDb not yet supported).
param:
- MzParam: match query and target m/z values.
- MzRtParam: same as MzParam, but additionally matches retention times.
- Mass2MzParam: target provides exact masses. m/z values for the specified adducts are calculated and matched against the query m/z.
- Mass2MzRtParam: same as Mass2MzParam, but additionally matches retention times.
- … suggest your own …
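
The idea behind Mass2MzParam can be sketched in base R. This is a simplified illustration, not the package's implementation: the helper `match_mass_to_mz` is hypothetical, the adduct mass shifts are the commonly used monoisotopic values, and it is an assumption here that the accepted difference is the absolute `tolerance` plus the ppm-dependent part, and that the reported score is the m/z difference.

```r
# Monoisotopic mass shifts (in Da) for two singly charged positive adducts.
adduct_shift <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989218)

# Hypothetical helper: match query m/z values against target exact masses.
match_mass_to_mz <- function(query_mz, target_mass,
                             adducts = names(adduct_shift),
                             tolerance = 0.005, ppm = 0) {
    hits <- list()
    for (adduct in adducts) {
        # Expected m/z of every target compound for this adduct.
        target_mz <- target_mass + adduct_shift[[adduct]]
        for (i in seq_along(query_mz)) {
            # Assumption: absolute tolerance plus m/z-relative (ppm) tolerance.
            max_diff <- tolerance + ppm * query_mz[i] / 1e6
            diff <- query_mz[i] - target_mz
            j <- which(abs(diff) <= max_diff)
            if (length(j))
                hits[[length(hits) + 1L]] <- data.frame(
                    query_idx = i, target_idx = j,
                    adduct = adduct, score = diff[j])
        }
    }
    if (!length(hits))
        return(data.frame(query_idx = integer(), target_idx = integer(),
                          adduct = character(), score = numeric()))
    do.call(rbind, hits)
}

# Toy data: two exact masses and three made-up query m/z values.
target_mass <- c(347.2824, 323.2824)
query_mz <- c(348.2897, 346.2716, 500.0000)
res <- match_mass_to_mz(query_mz, target_mass)
res
```

The first query matches the [M+H]+ ion of the first target, the second matches the [M+Na]+ ion of the second target, and the third has no match. Adding a retention time check (as in MzRtParam and Mass2MzRtParam) would be a second condition on the same candidate pairs.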

The result

Matched object: contains query, target and parameter (reproducibility).

`matchSpectra`

Match query MS2 spectra against reference.
matchSpectra(query, target, param)
query: Spectra.
target: Spectra (e.g. representing MassBank data).
param:
- CompareSpectraParam: match spectra with score above threshold. Pre-filter by precursor m/z or presence of certain peak.
- MatchForwardReverseParam: same as above, but also calculates the reverse score.
- … suggest your own …
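
What "score above threshold" means can be illustrated with a small base R sketch. It is a simplified stand-in, not the similarity function used by the package: the helper `spectra_cosine`, the peak-matching `tolerance`, the toy peak lists and the `threshold` value are all assumptions made for this example, and each query peak is simply paired with its closest reference peak.

```r
# Hypothetical helper: cosine similarity of two peak lists
# (data.frames with columns mz and intensity).
spectra_cosine <- function(x, y, tolerance = 0.01) {
    # For every peak in x find the closest peak in y within `tolerance`.
    idx <- vapply(x$mz, function(z) {
        d <- abs(y$mz - z)
        if (min(d) <= tolerance) which.min(d) else NA_integer_
    }, integer(1))
    keep <- !is.na(idx)
    # Dot product over matched peaks, normalised by both full spectra.
    sum(x$intensity[keep] * y$intensity[idx[keep]]) /
        sqrt(sum(x$intensity^2) * sum(y$intensity^2))
}

# Toy query spectrum and a toy reference "library" of two spectra.
query_sp <- data.frame(mz = c(56.05, 84.04, 130.05),
                       intensity = c(10, 100, 40))
reference <- list(
    same = data.frame(mz = c(56.05, 84.04, 130.05),
                      intensity = c(10, 100, 40)),
    other = data.frame(mz = c(60.10, 91.05),
                       intensity = c(100, 50)))

scores <- vapply(reference, function(ref) spectra_cosine(query_sp, ref),
                 numeric(1))
threshold <- 0.7
# Reference spectra considered a match for the query.
names(scores)[scores >= threshold]
```

Only the identical reference spectrum passes the threshold; the spectrum without shared peaks scores 0. A precursor m/z pre-filter would discard candidates before any similarity is calculated.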

Outlook/TODOs

Integration of CompDb (and IonDb, i.e. CompDb plus retention times) databases for matchMz.
Additional spectra similarity calculation methods? GNPS?
Improve handling of Matched and MatchedSpectra objects?

Example

The query.

library(MetaboAnnotation)
ms1_features <- read.table(system.file("extdata", "MS1_example.txt",
                                       package = "MetaboAnnotation"),
                           header = TRUE, sep = "\t")
head(ms1_features)

##     feature_id       mz    rtime
## 1 Cluster_0001 102.1281 1.560147
## 2 Cluster_0002 102.1279 2.153590
## 3 Cluster_0003 102.1281 2.925570
## 4 Cluster_0004 102.1281 3.419617
## 5 Cluster_0005 102.1270 5.801039
## 6 Cluster_0006 102.1230 8.137535

Example

The target data.

target_df <- read.table(system.file("extdata", "LipidMaps_CompDB.txt",
                                    package = "MetaboAnnotation"),
                        header = TRUE, sep = "\t")
head(target_df)

##   headgroup        name exactmass    formula chain_type
## 1       NAE  NAE 20:4;O  363.2773  C22H37NO3       even
## 2       NAT  NAT 20:4;O  427.2392 C22H37NO5S       even
## 3       NAE NAE 20:3;O2  381.2879  C22H39NO4       even
## 4       NAE    NAE 20:4  347.2824  C22H37NO2       even
## 5       NAE    NAE 18:2  323.2824  C20H37NO2       even
## 6       NAE    NAE 18:3  321.2668  C20H35NO2       even

Example

parm <- Mass2MzParam(adducts = c("[M+H]+", "[M+Na]+"),
                     tolerance = 0.005, ppm = 0)

matched_features <- matchMz(ms1_features, target_df, parm)
matched_features

## Object of class Matched 
## Total number of matches: 9173 
## Number of query objects: 2842 (1969 matched)
## Number of target objects: 57599 (3296 matched)

Example

whichQuery, whichTarget to get the indices of matched elements.
colnames to return the available column names.

colnames(matched_features)

##  [1] "feature_id"        "mz"                "rtime"            
##  [4] "target_headgroup"  "target_name"       "target_exactmass" 
##  [7] "target_formula"    "target_chain_type" "adduct"           
## [10] "score"

Prefix "target_" is used for column names of the target.

Example

Extract matched elements.

matchedData(matched_features, c("feature_id", "adduct", "target_name"))

## DataFrame with 10046 rows and 3 columns
##        feature_id      adduct     target_name
##       <character> <character>     <character>
## 1    Cluster_0001          NA              NA
## 2    Cluster_0002          NA              NA
## ...           ...         ...             ...
## 2841 Cluster_2841     [M+Na]+    ACer 60:1;O4
## 2842 Cluster_2842      [M+H]+ Hex2Cer 42:2;O2
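
The row count follows from the match table: one row per match plus one NA row for every query element without a match. A base R sketch of that layout, using made-up toy data and a hypothetical `matches` table of index pairs (an illustration of the principle, not the package's code):

```r
query <- data.frame(feature_id = c("F1", "F2", "F3"),
                    mz = c(100.1, 200.2, 300.3))
target <- data.frame(name = c("A", "B", "C"))
# Hypothetical match table: F1 hits A and B, F3 hits C, F2 has no hit.
matches <- data.frame(query_idx = c(1L, 1L, 3L),
                      target_idx = c(1L, 2L, 3L))

# One row per match, plus one NA row for every unmatched query element.
unmatched <- setdiff(seq_len(nrow(query)), matches$query_idx)
rows <- rbind(matches,
              data.frame(query_idx = unmatched, target_idx = NA_integer_))
rows <- rows[order(rows$query_idx), ]
res <- cbind(query[rows$query_idx, ],
             target_name = target$name[rows$target_idx])
nrow(res)

# Same arithmetic for the object above: matches + unmatched query elements.
9173 + (2842 - 1969)
```

In the toy data this gives 4 rows for 3 query features. For the example object, 9173 matches plus 873 unmatched features give the 10046 rows reported.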

Example

Reduce the target to only matching elements.

matched_features

## Object of class Matched 
## Total number of matches: 9173 
## Number of query objects: 2842 (1969 matched)
## Number of target objects: 57599 (3296 matched)

matched_features <- pruneTarget(matched_features)
matched_features

## Object of class Matched 
## Total number of matches: 9173 
## Number of query objects: 2842 (1969 matched)
## Number of target objects: 3296 (3296 matched)

Example

Reduce the query to contain only matching elements.

matched_features

## Object of class Matched 
## Total number of matches: 9173 
## Number of query objects: 2842 (1969 matched)
## Number of target objects: 3296 (3296 matched)

matched_features <- matched_features[whichQuery(matched_features)]
matched_features

## Object of class Matched 
## Total number of matches: 9173 
## Number of query objects: 1969 (1969 matched)
## Number of target objects: 3296 (3296 matched)