MS/MS Spectra Matching with the `MetaboAnnotation` Package

Last modified: 2024-08-07 13:59:30.280951
Compiled: Wed Aug 7 14:05:26 2024

Overview

Introduction

The Spectra package provides all the functionality required for annotation and identification workflows for untargeted LC-MS/MS data, but, while being very flexible and customizable, it might be too cumbersome for beginners or analysts not accustomed with R. To fill this gap we developed the MetaboAnnotation package that builds upon Spectra and provides functions for annotation of LC-MS and LC-MS/MS data sets tailored towards the less experienced R user (Rainer et al. 2022).

Convenient spectra matching using `MetaboAnnotation`

In this example use case we match experimental MS2 spectra from a DDA experiment on a pesticide mix against reference spectra from MassBank. Below we load the experimental data file which is distributed via the msdata R package.

library(Spectra)
library(pander)

#' Load the pesticide mix data
fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
pest <- Spectra(fl)

We next restrict the data set to MS2 spectra only and in addition clean these spectra by removing all peaks from a spectrum that have an intensity lower than 5% of the largest peak intensity of that spectrum. Finally, single-peak spectra are removed.

#' restrict to MS2 data and remove intensities with intensity lower 5%
pest <- filterMsLevel(pest, msLevel = 2L)

#' Remove peaks with an intensity below 5% or the spectra's BPC
low_int <- function(x, ...) {
    x > max(x, na.rm = TRUE) * 0.05
}
pest <- filterIntensity(pest, intensity = low_int)

#' Remove peaks with a single peak
pest <- pest[lengths(pest) > 1]

This leads to a data set consisting of 2451 spectra. We next connect to a MassBank database (release 2023.11, running within this docker image) and create a Spectra object representing that data.

library(RMariaDB)
library(MsBackendMassbank)

#' Connect to the MassBank MySQL database
con <- dbConnect(MariaDB(), user = "massbank", dbname = "MassBank",
                 host = "localhost", pass = "massbank")
mbank <- Spectra(con, source = MsBackendMassbankSql())

Alternatively, MassBank spectral libraries are also provided through Bioconductor’s AnnotationHub and can be conveniently loaded from there. The code below first loads the AnnotationHub resource and then loads the data for one MassBank release. The database will be cached locally, avoiding thus to re-download the data for future use.

#' Load the AnnotationHub resource
library(AnnotationHub)
ah <- AnnotationHub()

#' List available MassBank release
query(ah, "MassBank")

## AnnotationHub with 6 records
## # snapshotDate(): 2024-04-30
## # $dataprovider: MassBank
## # $species: NA
## # $rdataclass: CompDb
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH107048"]]' 
## 
##              title                                
##   AH107048 | MassBank CompDb for release 2021.03  
##   AH107049 | MassBank CompDb for release 2022.06  
##   AH111334 | MassBank CompDb for release 2022.12.1
##   AH116164 | MassBank CompDb for release 2023.06  
##   AH116165 | MassBank CompDb for release 2023.09  
##   AH116166 | MassBank CompDb for release 2023.11

#' Load the MassBank release 2023.11 and get a Spectra object
mbank <- ah[["AH116166"]] |>
    Spectra()

It is suggested to apply the same processing applied to the query spectra also to the target (reference) spectra. Thus we below filter also the reference spectra from MassBank with the same intensity filter.

#' Remove low intensity peaks and subsequently remove
#' spectra with less than 2 peaks
mbank <- filterIntensity(mbank, intensity = low_int)
mbank <- mbank[lengths(mbank) > 1]

We could now directly calculate similarities between the 2451 experimental (query) MS2 spectra and the 103525 MassBank reference (target) spectra using the compareSpectra() method, but this would be computationally very intense because a similarity score would be calculated between each query and each target spectrum. As alternative we use here the matchSpectra() function from the MetaboAnnotation package that allows to restrict similarity calculations between query and target spectra with similar m/z of their precursor ion (or considering also a similar retention time if available for reference spectra).

Below we create a CompareSpectraParam object setting parameter requirePrecursor = TRUE (to restrict similarity calculations only to query and target spectra with a similar precursor m/z) and ppm = 10 (m/z difference between the query and target precursor has to be within 10 ppm). Parameter THRESHFUN enables to define a threshold function that defines which spectra are considered matching. With the function used below only MS2 spectra with a similarity (calculated with the default dotproduct function) larger or equal to 0.8 are considered matching.

library(MetaboAnnotation)
prm <- CompareSpectraParam(ppm = 10, requirePrecursor = TRUE,
                           THRESHFUN = function(x) which(x >= 0.8))

We next call matchSpectra() with this parameter object and pass pest and mbank as query and target Spectra, respectively.

mtch <- matchSpectra(pest, mbank, param = prm)
mtch

## Object of class MatchedSpectra 
## Total number of matches: 113 
## Number of query objects: 2451 (39 matched)
## Number of target objects: 103525 (69 matched)

As a result we get a MatchedSpectra object that contains the query and target spectra as well as the matching result (i.e. the information which query spectrum matches with which target spectrum based on what similarity score). We can use the query() and target() functions to access the query and target Spectra objects and matches() to extract the matching information. Below we display the first 6 rows of that matrix.

head(matches(mtch))

##   query_idx target_idx     score
## 1       433      97386 0.9471924
## 2       433      97387 0.9258791
## 3       433      97388 0.8188538
## 4       435      97386 0.9745761
## 5       435      97387 0.8573438
## 6       493      68124 0.9089015

Functions whichQuery() and whichTarget() return the (unique) indices of the query and target spectra that could be matched.

whichQuery(mtch)

##  [1]  433  435  493  496  497  571  682  685  686  805  807  809  810  819  829
## [16]  983 1095 1127 1266 1281 1283 1454 1457 1480 1706 1828 1830 1834 1839 1874
## [31] 1882 1901 1906 2038 2040 2043 2050 2124 2309

As we can see only few of the query spectra (39 of the 2451 spectra) could be matched. This is in part because for a large proportion spectra in MassBank no precursor m/z is available and with requirePrecursor = TRUE these are not considered in the similarity calculation. Setting requirePrecursor = FALSE would calculate a similarity between all spectra (even those with missing precursor information) but calculations would take much longer.

sum(is.na(precursorMz(mbank)))

## [1] 23519

The MatchedSpectra object inherits much of the functionality of a Spectra object. The spectraVariables() function returns for example all the available spectra variables, from both the query as well as the target Spectra. The variable names of the latter are prefixed with target_ to discriminate them from the variable names of the query.

spectraVariables(mtch)

##  [1] "msLevel"                        "rtime"                         
##  [3] "acquisitionNum"                 "scanIndex"                     
##  [5] "dataStorage"                    "dataOrigin"                    
##  [7] "centroided"                     "smoothed"                      
##  [9] "polarity"                       "precScanNum"                   
## [11] "precursorMz"                    "precursorIntensity"            
## [13] "precursorCharge"                "collisionEnergy"               
## [15] "isolationWindowLowerMz"         "isolationWindowTargetMz"       
## [17] "isolationWindowUpperMz"         "peaksCount"                    
## [19] "totIonCurrent"                  "basePeakMZ"                    
## [21] "basePeakIntensity"              "ionisationEnergy"              
## [23] "lowMZ"                          "highMZ"                        
## [25] "mergedScan"                     "mergedResultScanNum"           
## [27] "mergedResultStartScanNum"       "mergedResultEndScanNum"        
## [29] "injectionTime"                  "filterString"                  
## [31] "spectrumId"                     "ionMobilityDriftTime"          
## [33] "scanWindowLowerLimit"           "scanWindowUpperLimit"          
## [35] ".original_query_index"          "target_msLevel"                
## [37] "target_rtime"                   "target_acquisitionNum"         
## [39] "target_scanIndex"               "target_dataStorage"            
## [41] "target_dataOrigin"              "target_centroided"             
## [43] "target_smoothed"                "target_polarity"               
## [45] "target_precScanNum"             "target_precursorMz"            
## [47] "target_precursorIntensity"      "target_precursorCharge"        
## [49] "target_collisionEnergy"         "target_isolationWindowLowerMz" 
## [51] "target_isolationWindowTargetMz" "target_isolationWindowUpperMz" 
## [53] "target_compound_id"             "target_formula"                
## [55] "target_exactmass"               "target_smiles"                 
## [57] "target_inchi"                   "target_inchikey"               
## [59] "target_cas"                     "target_pubchem"                
## [61] "target_name"                    "target_spectrum_id"            
## [63] "target_spectrum_name"           "target_date"                   
## [65] "target_authors"                 "target_license"                
## [67] "target_copyright"               "target_publication"            
## [69] "target_splash"                  "target_precursorMz_text"       
## [71] "target_adduct"                  "target_ionization"             
## [73] "target_ionization_voltage"      "target_fragmentation_mode"     
## [75] "target_collisionEnergy_text"    "target_instrument"             
## [77] "target_instrument_type"         "target_original_spectrum_id"   
## [79] "target_predicted"               "target_msms_mz_range_min"      
## [81] "target_msms_mz_range_max"       "target_synonym"                
## [83] "score"

We can access individual spectra variables using $ and the variable name, or multiple variables with the spectraData() function. Below we extract the retention time, the precursor m/z of the query spectrum, the precursor m/z of the target spectrum as well as the similarity score from the object using the spectraData() function.

spectraData(mtch, c("rtime", "precursorMz", "target_precursorMz", "score"))

## DataFrame with 2525 rows and 4 columns
##          rtime precursorMz target_precursorMz     score
##      <numeric>   <numeric>          <numeric> <numeric>
## 1        7.216    137.9639                 NA        NA
## 2       13.146     56.9419                 NA        NA
## 3       13.556     89.9449                 NA        NA
## 4       23.085    207.0294                 NA        NA
## 5       27.385    121.0990                 NA        NA
## ...        ...         ...                ...       ...
## 2521   895.182    137.9850                 NA        NA
## 2522   895.472     56.0495                 NA        NA
## 2523   896.252    142.9611                 NA        NA
## 2524   896.662     53.0129                 NA        NA
## 2525   898.602     91.5022                 NA        NA

The returned DataFrame contains the matching information for the full data set, i.e. of each query spectrum and hence, returns NA values for query spectra that could not be matched with a target spectrum. Note also that query spectra matching multiple target spectra are represented by multiple rows (one for each matching target spectrum).

Here we’re only interested in query spectra matching at least one target spectrum and hence subset the matching result to these using the whichQuery() function.

mtch <- mtch[whichQuery(mtch)]
mtch

## Object of class MatchedSpectra 
## Total number of matches: 113 
## Number of query objects: 39 (39 matched)
## Number of target objects: 103525 (69 matched)

Subsetting of MatchedSpectra is always relative to the query, i.e. mtch[4] would subset the object to the matching results for the 4th query spectrum.

We now extract the matching information on the data subset:

spectraData(mtch, c("rtime", "precursorMz", "target_precursorMz", "score"))

## DataFrame with 113 rows and 4 columns
##         rtime precursorMz target_precursorMz     score
##     <numeric>   <numeric>          <numeric> <numeric>
## 1     326.703     235.143            235.144  0.947192
## 2     326.703     235.143            235.144  0.925879
## 3     326.703     235.143            235.144  0.818854
## 4     327.113     235.144            235.144  0.974576
## 5     327.113     235.144            235.144  0.857344
## ...       ...         ...                ...       ...
## 109   565.646     425.214            425.215  0.947080
## 110   565.646     425.214            425.215  0.853670
## 111   570.825     373.040            373.041  0.879824
## 112   596.584     313.039            313.039  0.818335
## 113   873.144     136.112            136.112  0.804731

We can also return the compound names for the matching spectra. Depending on whether the annotations are retrieved from the MassBank MySQL database or the version from AnnotationHub we need to use a different spectra variable name for the compound name.

#' Get the name of the column containing the compound name
name_col <- intersect(spectraVariables(mtch),
                      c("target_name", "target_compound_name"))

pandoc.table(style = "rmarkdown",
    as.data.frame(spectraData(mtch, c("rtime", name_col, "score"))))

rtime	target_name	score
326.7	Lenacil	0.9472
326.7	Lenacil	0.9259
326.7	Lenacil	0.8189
327.1	Lenacil	0.9746
327.1	Lenacil	0.8573
338.5	Azaconazole	0.9089
338.5	Azaconazole	0.8028
338.9	Azaconazole	0.9226
339.3	Azaconazole	0.9168
353.5	Fosthiazate	0.9396
361.7	Azaconazole	0.9157
362.3	Azaconazole	0.9083
362.6	Azaconazole	0.906
377.7	Triphenylphosphine oxide	0.9148
377.7	Triphenylphosphine oxide	0.8734
377.7	triphenylphosphineoxide	0.8809
377.7	triphenylphosphineoxide	0.9269
377.7	Triphenylphosphine oxide	0.8352
378.5	Dimethachlor	0.8459
378.5	Dimethachlor	0.8423
378.9	Dimethachlor	0.8784
378.9	Dimethachlor	0.8729
378.9	Dimethachlor	0.9108
378.9	Dimethachlor	0.8111
379	Triphenylphosphine oxide	0.8393
379	Triphenylphosphine oxide	0.8555
379	triphenylphosphineoxide	0.8498
379	triphenylphosphineoxide	0.8962
379	Triphenylphosphine oxide	0.8577
382	Dimethachlor	0.8823
382	Dimethachlor	0.8785
382	Dimethachlor	0.9024
382	Dimethachlor	0.8017
384.5	Dimethachlor	0.8924
384.5	Dimethachlor	0.8038
384.5	Dimethachlor	0.8879
384.5	Dimethachlor	0.9096
405.1	Cyproconazole	0.8112
405.1	Cyproconazole	0.8098
405.1	Cyproconazole	0.8109
414.6	Tris(1-chloro-2-propyl) phosphate	0.8293
414.6	Tris(1-chloro-2-propyl) phosphate	0.8132
414.6	Tris(1-chloro-2-propyl)phosphate	0.8901
414.6	Tris(1-chloro-2-propyl)phosphate	0.8457
414.6	Tris(1-chloro-2-propyl)phosphate	0.9992
419.8	Fenamiphos	0.8033
431.5	Tris(1-chloro-2-propyl) phosphate	0.8146
431.5	Tris(1-chloro-2-propyl)phosphate	0.9011
435	Fluopicolide	0.8542
435.4	Fluopicolide	0.8085
452.9	Flufenacet	0.8079
452.9	Flufenacet	0.8949
452.9	Flufenacet	0.8898
452.9	Flufenacet	0.8909
452.9	Flufenacet	0.9051
453.3	Flufenacet	0.8634
453.3	Flufenacet	0.9009
453.3	Flufenacet	0.9508
453.3	Flufenacet	0.9487
453.3	Flufenacet	0.8492
455.3	Dimoxystrobin	0.8179
490.7	Triisobutyl phosphate	0.8834
490.7	Triisobutyl phosphate	0.8854
490.7	Triisobutyl phosphate	0.8253
490.7	Triisobutyl phosphate	0.8514
490.7	Tributyl phosphate	0.9988
490.7	Tributyl phosphate	0.9891
490.7	Tributyl phosphate	0.944
490.7	Tributyl phosphate	0.8639
490.7	Triisobutyl phosphate	0.8746
490.7	Triisobutyl phosphate	0.9747
490.7	Triisobutyl phosphate	0.9926
490.7	Triisobutyl phosphate	0.961
490.7	Triisobutyl phosphate	0.8165
506.7	Didecyl-dimethylammonium	0.9766
506.7	Didecyl-dimethylammonium	0.9109
506.7	Didecyl-dimethylammonium	0.9941
506.8	Diflufenican	0.8661
506.8	Diflufenican	0.877
506.8	Diflufenican	0.9272
506.8	Diflufenican	0.814
506.8	Diflufenican	0.826
507.3	Diflufenican	0.8385
507.3	Diflufenican	0.843
507.3	Diflufenican	0.9183
507.3	Diflufenican	0.9113
507.7	Diflufenican	0.8424
507.7	Diflufenican	0.8732
507.7	Diflufenican	0.9041
507.7	Diflufenican	0.8026
511	Diflufenican	0.835
512.1	Diflufenican	0.8813
512.1	Diflufenican	0.8439
512.1	Diflufenican	0.8797
512.1	Diflufenican	0.8105
512.1	Diflufenican	0.8389
526.4	Dibutyl phthalate	0.87
526.4	Dibutyl phthalate	0.9468
526.4	Diisobutyl phthalate	0.8584
526.4	Diisobutyl phthalate	0.9351
526.4	Diisobutyl phthalate	0.8683
527.4	Dibutyl phthalate	0.8506
527.4	Diisobutyl phthalate	0.8409
527.4	Dibutyl phthalate	0.818
563.6	Tributyl acetylcitrate	0.8066
564.6	Tributyl acetylcitrate	0.9893
564.6	Tributyl acetylcitrate	0.9963
565.6	Tributyl acetylcitrate	0.9506
565.6	Tributyl acetylcitrate	0.9471
565.6	Tributyl acetylcitrate	0.8537
570.8	Proquinazid	0.8798
596.6	Spirodiclofen-enol	0.8183
873.1	Trimethylphenylammonium	0.8047

We can also visually inspect the matches using mirror plots. Below we create mirror plots between the first query spectrum with all matching reference spectra. For better visualization we scale the peak intensities of all spectra to a total intensity sum of one by setting scalePeaks = TRUE.

#' plot the results for the first query spectrum
plotSpectraMirror(mtch[1], ppm = 10, scalePeaks = TRUE)

Spectra matching results could also be manually, and interactively, evaluated and validated with the validateMatchedSpectra() function.

Summarizing, the matchSpectra() function enables thus a convenient spectra matching between MS data represented as Spectra objects. As a result, a MatchedSpectra object is returned that, in addition to the matching results, contains also the query and target spectra. Pre-filtering the spectra prior to the actual spectra similarity calculation can reduce the running time of a matchSpectra() call but might also miss some potential matches. Note that in addition to the precursor m/z-based pre-filter also retention time pre-filtering would be available (see ?matchSpectra for more information). Also, a more advanced matching approach would be available with the MatchForwardReverseParam that calculates in addition to the forward score also a reverse similarity for each match.

MS2 spectra matching in an `xcms` workflow

In LC-MS/MS-based untargeted metabolomics (or small compound mass spectrometry experiments in general) quantification of the compounds is performed in MS1 while the MS2 data is used for identification of the features. Quantification of MS1 data requires a chromatographic peak detection step which can be performed using the functionality from the xcms package. Below we load thus the xcms package and import the full MS data using the readMsExperiment function.

library(xcms)
library(MsExperiment)
pest_all <- readMsExperiment(fl)

We next perform the chromatographic peak detection using the centWave algorithm (see the LC-MS/MS data analysis with xcms vignette from the xcms package for details on the chromatographic peak detection settings).

cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
pest_all <- findChromPeaks(pest_all, param = cwp)

In total 99 chromatographic peaks have been identified. Below we display the first 6 of them.

head(chromPeaks(pest_all))

##            mz    mzmin    mzmax      rt   rtmin   rtmax       into      intb
## CP01 142.9926 142.9921 142.9931 130.615 125.856 134.241  1113.8028  1106.229
## CP02 221.0918 221.0906 221.0925 240.897 236.657 246.984   756.6935   744.779
## CP03 220.0985 220.0978 220.0988 240.897 237.187 246.327  2060.5921  2052.549
## CP04 219.0957 219.0950 219.0962 241.018 236.253 246.327 15172.6662 15133.811
## CP05 153.0659 153.0655 153.0663 330.591 325.373 334.400  2148.7134  2141.943
## CP06 235.1447 235.1441 235.1454 330.591 326.431 334.400  2836.2675  2829.627
##           maxo  sn sample
## CP01  346.7006 102      1
## CP02  212.5239  21      1
## CP03  585.3036 151      1
## CP04 4877.1162 367      1
## CP05  784.9196 114      1
## CP06 1006.9720 110      1

We can now extract all MS2 spectra for each chromatographic peak with the chromPeakSpectra() function. This function identifies all MS2 spectra recorded by the instrument with a retention time within the retention time and with a precursor m/z within the m/z boundaries of the chromatographic peak. By setting return.type = "Spectra" we ensure that the data is being returned in the newer Spectra format hence enabling the simplified spectra matching with the functionality presented here.

pest_ms2 <- chromPeakSpectra(pest_all, return.type = "Spectra")
pest_ms2

## MSn data (Spectra) with 158 spectra in a MsBackendMzR backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           2   128.237      1000
## 2           2   128.737      1008
## 3           2   129.857      1023
## 4           2   237.869      1812
## 5           2   241.299      1846
## ...       ...       ...       ...
## 154         2   575.255      5115
## 155         2   596.584      5272
## 156         2   592.424      5236
## 157         2   596.054      5266
## 158         2   873.714      7344
##  ... 34 more variables/columns.
## 
## file(s):
## PestMix1_DDA.mzML
## Processing:
##  Filter: select MS level(s) 2 [Wed Aug  7 14:06:40 2024]
##  Merge 1 Spectra into one [Wed Aug  7 14:06:40 2024]

Spectra variable peak_id contains the identified of the chromatographic peak (i.e. its row name in chromPeaks).

pest_ms2$peak_id

##   [1] "CP01" "CP01" "CP01" "CP04" "CP04" "CP05" "CP05" "CP06" "CP06" "CP08"
##  [11] "CP08" "CP11" "CP11" "CP12" "CP12" "CP13" "CP13" "CP13" "CP13" "CP14"
##  [21] "CP14" "CP14" "CP14" "CP18" "CP22" "CP22" "CP22" "CP22" "CP22" "CP25"
##  [31] "CP25" "CP25" "CP25" "CP25" "CP26" "CP26" "CP26" "CP26" "CP26" "CP26"
##  [41] "CP33" "CP33" "CP34" "CP34" "CP34" "CP34" "CP34" "CP35" "CP35" "CP35"
##  [51] "CP35" "CP35" "CP36" "CP41" "CP41" "CP41" "CP42" "CP42" "CP42" "CP42"
##  [61] "CP42" "CP44" "CP44" "CP46" "CP46" "CP46" "CP46" "CP47" "CP47" "CP47"
##  [71] "CP48" "CP48" "CP48" "CP50" "CP51" "CP51" "CP51" "CP52" "CP52" "CP52"
##  [81] "CP53" "CP53" "CP53" "CP53" "CP53" "CP57" "CP57" "CP57" "CP57" "CP57"
##  [91] "CP59" "CP59" "CP60" "CP60" "CP61" "CP61" "CP63" "CP63" "CP63" "CP63"
## [101] "CP64" "CP64" "CP64" "CP64" "CP64" "CP65" "CP66" "CP66" "CP66" "CP66"
## [111] "CP67" "CP67" "CP67" "CP69" "CP69" "CP69" "CP71" "CP71" "CP71" "CP71"
## [121] "CP72" "CP72" "CP72" "CP73" "CP81" "CP81" "CP81" "CP81" "CP82" "CP82"
## [131] "CP82" "CP85" "CP85" "CP88" "CP88" "CP89" "CP89" "CP89" "CP90" "CP90"
## [141] "CP90" "CP91" "CP91" "CP91" "CP93" "CP93" "CP93" "CP93" "CP93" "CP93"
## [151] "CP94" "CP94" "CP94" "CP94" "CP95" "CP98" "CP98" "CP99"

We next, like in the previous section, clean up the spectra removing peaks with an intensity below 5% of the largest peak intensity per spectrum and removing spectra with a single peak.

#' Remove peaks with an intensity below 5%
pest_ms2 <- filterIntensity(pest_ms2, intensity = low_int)

#' Remove peaks with a single peak
pest_ms2 <- pest_ms2[lengths(pest_ms2) > 1]

We in addition also scale the peak intensities per spectrum using the scalePeaks() function. While most spectra similarity scoring algorithms are independent of absolute peak intensities, peak scaling will improve the graphical visualization of results. We perform the peak scaling to both the experimental as well as the reference spectra in MassBank.

pest_ms2 <- scalePeaks(pest_ms2)
mbank <- scalePeaks(mbank)

Next we perform the spectra matching with the same parameters as in the previous section.

pest_match <- matchSpectra(pest_ms2, mbank, param = prm)
pest_match

## Object of class MatchedSpectra 
## Total number of matches: 33 
## Number of query objects: 155 (12 matched)
## Number of target objects: 103525 (26 matched)

Again, we restrict the MatchedSpectra to query spectra which could be matched.

pest_match <- pest_match[whichQuery(pest_match)]

The table below lists the compound names of matching spectra for the chromatographic peaks.

#' Get the name of the column containing the compound name
name_col <- intersect(spectraVariables(pest_match),
                      c("target_name", "target_compound_name"))
pandoc.table(
    style = "rmarkdown",
    as.data.frame(spectraData(pest_match, c("peak_id", "rtime", name_col))))

peak_id	rtime	target_name
CP06	327.1	Lenacil
CP06	327.1	Lenacil
CP18	362.6	Azaconazole
CP25	382	Dimethachlor
CP25	382	Dimethachlor
CP25	382	Dimethachlor
CP25	382	Dimethachlor
CP25	384.5	Dimethachlor
CP25	384.5	Dimethachlor
CP25	384.5	Dimethachlor
CP25	384.5	Dimethachlor
CP42	405.1	Cyproconazole
CP42	405.1	Cyproconazole
CP42	405.1	Cyproconazole
CP57	435.4	Fluopicolide
CP67	452.9	Flufenacet
CP67	452.9	Flufenacet
CP67	452.9	Flufenacet
CP67	452.9	Flufenacet
CP67	452.9	Flufenacet
CP67	453.3	Flufenacet
CP67	453.3	Flufenacet
CP67	453.3	Flufenacet
CP67	453.3	Flufenacet
CP67	453.3	Flufenacet
CP72	455.3	Dimoxystrobin
CP88	511	Diflufenican
CP88	512.1	Diflufenican
CP88	512.1	Diflufenican
CP88	512.1	Diflufenican
CP88	512.1	Diflufenican
CP88	512.1	Diflufenican
CP95	596.6	Spirodiclofen-enol

We can also directly plot matching (query and target) spectra against each other using the plotSpectraMirror() function subsetting the MatchedSpectra object to the query spectrum of interest. Below we plot the third query spectrum against all of its matching target spectra.

plotSpectraMirror(pest_match[3])

Summarizing, with the chromPeakSpectra() and the featureSpectra() functions, xcms allows to return MS data as Spectra objects which enables, as shown in this simple example, to perform MS2 spectra matching using the Spectra as well as the MetaboAnnotation packages hence simplifying MS/MS-based annotation of LC-MS features from xcms.

Session information

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MsExperiment_1.6.0       xcms_4.2.2               MetaboAnnotation_1.8.1  
##  [4] CompoundDb_1.8.0         AnnotationFilter_1.28.0  AnnotationHub_3.12.0    
##  [7] BiocFileCache_2.12.0     dbplyr_2.5.0             MsBackendMassbank_1.12.0
## [10] RMariaDB_1.3.2           pander_0.6.5             Spectra_1.14.1          
## [13] ProtGenerics_1.36.0      S4Vectors_0.42.1         BiocGenerics_0.50.0     
## [16] BiocParallel_1.38.0      BiocStyle_2.32.1        
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3          jsonlite_1.8.8             
##   [3] MultiAssayExperiment_1.30.3 magrittr_2.0.3             
##   [5] MALDIquant_1.22.2           rmarkdown_2.27             
##   [7] fs_1.6.4                    zlibbioc_1.50.0            
##   [9] ragg_1.3.2                  vctrs_0.6.5                
##  [11] memoise_2.0.1               RCurl_1.98-1.16            
##  [13] base64enc_0.1-3             progress_1.2.3             
##  [15] htmltools_0.5.8.1           S4Arrays_1.4.1             
##  [17] curl_5.2.1                  SparseArray_1.4.8          
##  [19] mzID_1.42.0                 sass_0.4.9                 
##  [21] bslib_0.8.0                 htmlwidgets_1.6.4          
##  [23] desc_1.4.3                  plyr_1.8.9                 
##  [25] impute_1.78.0               lubridate_1.9.3            
##  [27] cachem_1.1.0                igraph_2.0.3               
##  [29] iterators_1.0.14            mime_0.12                  
##  [31] lifecycle_1.0.4             pkgconfig_2.0.3            
##  [33] Matrix_1.7-0                R6_2.5.1                   
##  [35] fastmap_1.2.0               GenomeInfoDbData_1.2.12    
##  [37] MatrixGenerics_1.16.0       clue_0.3-65                
##  [39] digest_0.6.36               pcaMethods_1.96.0          
##  [41] rsvg_2.6.0                  colorspace_2.1-1           
##  [43] AnnotationDbi_1.66.0        textshaping_0.4.0          
##  [45] GenomicRanges_1.56.1        RSQLite_2.3.7              
##  [47] filelock_1.0.3              fansi_1.0.6                
##  [49] timechange_0.3.0            httr_1.4.7                 
##  [51] abind_1.4-5                 compiler_4.4.1             
##  [53] doParallel_1.0.17           bit64_4.0.5                
##  [55] withr_3.0.1                 DBI_1.2.3                  
##  [57] highr_0.11                  MASS_7.3-61                
##  [59] ChemmineR_3.56.0            rappdirs_0.3.3             
##  [61] DelayedArray_0.30.1         rjson_0.2.21               
##  [63] mzR_2.38.0                  tools_4.4.1                
##  [65] PSMatch_1.8.0               glue_1.7.0                 
##  [67] QFeatures_1.14.2            grid_4.4.1                 
##  [69] cluster_2.1.6               reshape2_1.4.4             
##  [71] generics_0.1.3              gtable_0.3.5               
##  [73] preprocessCore_1.66.0       tidyr_1.3.1                
##  [75] hms_1.1.3                   MetaboCoreUtils_1.12.0     
##  [77] xml2_1.3.6                  utf8_1.2.4                 
##  [79] XVector_0.44.0              foreach_1.5.2              
##  [81] BiocVersion_3.19.1          pillar_1.9.0               
##  [83] stringr_1.5.1               limma_3.60.4               
##  [85] dplyr_1.1.4                 lattice_0.22-6             
##  [87] bit_4.0.5                   tidyselect_1.2.1           
##  [89] Biostrings_2.72.1           knitr_1.48                 
##  [91] gridExtra_2.3               IRanges_2.38.1             
##  [93] SummarizedExperiment_1.34.0 xfun_0.46                  
##  [95] Biobase_2.64.0              statmod_1.5.0              
##  [97] MSnbase_2.30.1              matrixStats_1.3.0          
##  [99] DT_0.33                     stringi_1.8.4              
## [101] UCSC.utils_1.0.0            lazyeval_0.2.2             
## [103] yaml_2.3.10                 evaluate_0.24.0            
## [105] codetools_0.2-20            MsCoreUtils_1.16.1         
## [107] tibble_3.2.1                BiocManager_1.30.23        
## [109] affyio_1.74.0               cli_3.6.3                  
## [111] systemfonts_1.1.0           munsell_0.5.1              
## [113] jquerylib_0.1.4             MassSpecWavelet_1.70.0     
## [115] Rcpp_1.0.13                 GenomeInfoDb_1.40.1        
## [117] png_0.1-8                   XML_3.99-0.17              
## [119] parallel_4.4.1              pkgdown_2.1.0              
## [121] ggplot2_3.5.1               blob_1.2.4                 
## [123] prettyunits_1.2.0           bitops_1.0-8               
## [125] MsFeatures_1.12.0           affy_1.82.0                
## [127] scales_1.3.0                ncdf4_1.22                 
## [129] purrr_1.0.2                 crayon_1.5.3               
## [131] rlang_1.1.4                 vsn_3.72.0                 
## [133] KEGGREST_1.44.1

References

Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12 (2): 173. https://doi.org/10.3390/metabo12020173.

Johannes Rainer1, Michael Witting2

Overview

Introduction

Convenient spectra matching using MetaboAnnotation

MS2 spectra matching in an xcms workflow

Session information

References

Johannes Rainer¹, Michael Witting²

Convenient spectra matching using `MetaboAnnotation`

MS2 spectra matching in an `xcms` workflow