Spectra
Packagevignettes/analyzing-MS-data-from-different-sources-with-Spectra.Rmd
analyzing-MS-data-from-different-sources-with-Spectra.Rmd
Last modified: 2021-02-26 08:25:17
Compiled: Fri Feb 26 08:30:04 2021
Mass spectrometry (MS) data is a key technology in modern proteomics and metabolomics experiments. Due to continuous improvements in MS instrumentation, the generated data can easily become very large. Also, additional resources of MS data exist, such as spectra libraries and databases, all with their own specific file formats and database systems that sometimes do not support manipulations of the original data.
Learning from experiences with the MSnbase Bioconductor package, the Spectra package was developed to provide an even more flexible and expandable infrastructure for MS data in R. This package implements a clear separation of user functionality from code to provide, store and import MS data. Different backends can hence be used that enable access to data from various resources or that are designed specifically for very large MS data sets. Data manipulations are by default not directly applied to the data but cached in a lazy evaluation queue which allows analyses also of read-only data representations.
This (instructor-led live demo) workshop shows the expandability of the new infrastructure to enable a seamless integration and analysis of MS data from a variety of input formats illustrated by a simple matching of experimental MS2 spectra against public spectral databases and export of the data in a format commonly used for exchange of MS2 data.
Get the docker image of this tutorial with docker pull jorainer/spectra_tutorials:latest
.
Start docker using
docker run \
-e PASSWORD=bioc \
-p 8787:8787 \
jorainer/spectra_tutorials:latest
Enter http://localhost:8787
in a web browser and log in with username rstudio
and password bioc
.
Open this R-markdown file (vignettes/analyzing-MS-data-from-different-sources-with-Spectra.Rmd) in the RStudio server version in the web browser and evaluate the R code blocks.
To get the source code: clone this github repository, e.g. with git clone https://github.com/jorainer/SpectraTutorials
.
Optionally, to run also the code to import the MS2 spectra from HMDB the All Spectra Files (XML) archive from the hmdb downloads page has to be downloaded. The contents of the hmdb_all_spectra.zip archive should then be unzipped into the folder data/hmdb_all_spectra.
LC-MS setup
CID-based fragmentation
Spectra
packageThe Spectra
package implements a clear separation of user functionality from code to provide, store and read mass spectrometry data. Thus, different data or file format-specific backends can be implemented and directly plugged-in without affecting the way the user would access or analyze the data. This represents an extension to the in-memory and on-disk data modes already available in the MSnbase package that enabled either a fast data processing or an analysis of very large data sets by keeping only a limited amount of data in the computer’s memory (Gatto, Gibb, and Rainer 2020).
Spectra: separation into user functionality and data representation
In this workshop we will import MS data from mzML files, match the MS2 fragment spectra for one ion against MS2 spectra from public databases (i.e. MassBank and the Human Metabolom Database HMDB) and export the data as a MGF file. A different backend is used for each data import and export operation.
Below we import the MS data from the mzML files provided within this package. These files contain MSn data of a mix of 8 standard compounds (solved either in water or a pool of human serum samples) measured with a HILIC-based LC-MS/MS setup. MS2 data was generated by data dependent acquisition using two different collision energies. For data import and representation of these experimantal data we use the MsBackendMzR
backend which supports import (and export) of data from the most common raw mass spectrometry file formats (i.e. mzML, mzXML and CDF).
library(Spectra)
#' Define the input files
fls <- dir(system.file("mzML", package = "SpectraTutorials"),
full.names = TRUE)
#' Import the data
sps_all <- Spectra(fls, backend = MsBackendMzR())
The MS data is now represented by a Spectra
object, which can be thought of as a data.frame
with columns being the spectra variables (such as "rtime"
, i.e. the retention time) and rows the individual spectra. Each spectra variable can be accessed either via $
and its name or by using its dedicated access function (which is the preferred way). The spectraVariables
function can be used to list all available variables within such a Spectra
object.
#' List all available spectra variables (attributes)
spectraVariables(sps_all)
## [1] "msLevel" "rtime"
## [3] "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
## [7] "centroided" "smoothed"
## [9] "polarity" "precScanNum"
## [11] "precursorMz" "precursorIntensity"
## [13] "precursorCharge" "collisionEnergy"
## [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz" "peaksCount"
## [19] "totIonCurrent" "basePeakMZ"
## [21] "basePeakIntensity" "ionisationEnergy"
## [23] "lowMZ" "highMZ"
## [25] "mergedScan" "mergedResultScanNum"
## [27] "mergedResultStartScanNum" "mergedResultEndScanNum"
## [29] "injectionTime" "filterString"
## [31] "spectrumId" "ionMobilityDriftTime"
## [33] "scanWindowLowerLimit" "scanWindowUpperLimit"
Below we access the retention times of the first spectra using either $rtime
or the function rtime
.
#' Access the spectras' retention time
head(sps_all$rtime)
## [1] 0.273 0.570 0.873 1.183 1.491 1.798
head(rtime(sps_all))
## [1] 0.273 0.570 0.873 1.183 1.491 1.798
Our Spectra
object contains information from in total 3203 spectra from length(unique(dataOrigin(sps_all)))
mzML files. By using the MsBackendMzR
backend only general information about each spectrum is kept in memory resulting in a low memory footprint.
print(object.size(sps_all), units = "MB")
## 0.8 Mb
With a MsBackendMzR
backend, intensity or m/z values are retrieved on demand from the original data files.
mz(sps_all)
## NumericList of length 3203
## [[1]] 50.2264681118888 52.9758613330164 ... 998.412321909748 999.815483842029
## [[2]] 50.0188439310345 51.0188982589394 ... 994.093212059943 998.65669267328
## [[3]] 50.0188439310345 51.0260087879784 ... 999.549654921888 999.904649454908
## [[4]] 50.0347997547605 51.0271913631291 ... 984.467433193755 998.839380503321
## [[5]] 50.1615393614339 51.0280734441081 ... 996.599267645346 999.81102566575
## [[6]] 50.0158524973623 50.2809808965161 ... 992.93057113729 999.650537939432
## [[7]] 50.0178467765376 51.02704020796 ... 999.217711368357 999.757528325686
## [[8]] 50.0158524973623 51.0257255950236 ... 993.780890433629 999.583671855333
## [[9]] 50.0158524973623 50.9727348052523 ... 999.508971233846 999.663911424618
## [[10]] 50.0158524973623 50.0264589058649 ... 997.312177169353 999.672827131107
## ...
## <3193 more elements>
We can also load the full data into memory by changing the backend from MsBackendMzR
to MsBackendDataFrame
. This does not affect the way we use the Spectra
object itself: the same operations and functions are available, independently of the way the data is stored (i.e. which backend is used).
#' Change backend to a MsBackendDataFrame: load data into memory
sps_all <- setBackend(sps_all, MsBackendDataFrame())
The size of our Spectra
object is now larger, since the full data has been loaded into memory.
print(object.size(sps_all), units = "MB")
## 35.9 Mb
At last we subset our data to MS2 spectra with a precursor ion that matches the m/z of the [M+H]+ ion of the metabolite Cystine (accepting a difference in m/z of 10 parts-per-million (ppm)).
#' Define the m/z ratio for the [M+H]+ ion of Cystine
mz <- 241.0311
#' Subset the dataset to MS2 spectra matching the m/z
sps <- filterPrecursorMz(sps_all, mz = mz + ppm(c(-mz, mz), 10))
sps
## MSn data (Spectra) with 11 spectra in a MsBackendDataFrame backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 209.936 673
## 2 2 220.072 714
## 3 2 231.604 734
## 4 2 211.481 687
## 5 2 224.634 718
## 6 2 236.326 731
## 7 2 215.089 761
## 8 2 225.739 781
## 9 2 240.020 804
## 10 2 212.161 771
## 11 2 224.662 801
## ... 33 more variables/columns.
## Processing:
## Switch backend from MsBackendMzR to MsBackendDataFrame [Fri Feb 26 08:30:11 2021]
## Filter: select spectra with a precursor m/z within [241.028689689, 241.033510311] [Fri Feb 26 08:30:11 2021]
In total 11 spectra matched our target precursor m/z.
The plotSpectra
function can be used to visualize spectra. Below we plot the first spectrum from our data subset.
#' Plot the first spectrum
plotSpectra(sps[1])
This raw MS2 spectrum contains many very low abundance peaks, most likely representing noise. Thus we next filter the spectra removing all peaks with an intensity smaller than 5% of the maximum intensity of each spectrum (i.e. the base peak intensity). To this end we define a function that takes intensity values from each spectrum and returns a logical value whether the peak should be kept (TRUE
) or not (FALSE
). This function is then passed to the filterIntensity
function.
#' Define a filtering function
low_int <- function(x, ...) {
x > max(x, na.rm = TRUE) * 0.05
}
#' Apply the function to filter the spectra
sps <- filterIntensity(sps, intensity = low_int)
After filtering, the spectra are cleaner:
#' Plot the first spectrum after filtering
plotSpectra(sps[1])
In addition we normalize each spectrum replacing the absolute intensity values with values relative to the spectrum’s maximum intensity (which is set to 100). For this operation we also define a function which takes a peak matrix as input and returns a matrix with the same dimensions. The peak matrix is the two-column matrix with m/z (first column) and intensity values (second column) of each peak of a spectrum. This function is then passed with parameter FUN
to the addProcessing
function which allows to apply any user-defined function to the peak matrix of each spectrum in a Spectra
object.
#' Define a function to *normalize* the intensities
norm_int <- function(x, ...) {
maxint <- max(x[, "intensity"], na.rm = TRUE)
x[, "intensity"] <- 100 * x[, "intensity"] / maxint
x
}
#' *Apply* the function to the data
sps <- addProcessing(sps, norm_int)
To show the effect of the normalization we extract the intensities of the first spectrum:
#' Get the intensities after normalization
intensity(sps)[[1]]
## [1] 13.439402 93.631648 49.287979 100.000000 9.422415 6.581111 12.969807
## [8] 54.633224
The intensity values are now all between 0 and 100. Note that all these data manipulations (intensity filtering and normalization) did not change the original m/z and intensity values. Data manipulation operations are cached by default in the lazy evaluation queue of the Spectra
object and applied to the data on-the-fly each time m/z or intensity values are accessed. This ensures that the same data manipulations can be used for any type of backend, even if the data resource is read-only (e.g. if the data is retrieved on the fly from mzML files).
This mechanism enables us also to undo cached data manipulations with the reset
function:
## [1] 10 163 41 20 31 210
We next perform a pairwise comparison of the spectra using the dot product as similarity measure. Prior to the actual similarity calculation, the peaks of the individual spectra have to be matched against each other (i.e. it has to be determined which peak from one spectrum correspond to which from the other spectrum based on their mass-to-charge ratios). We specify ppm = 20
so that peaks with a difference in m/z smaller than 20ppm will be considered matching.
#' Pairwise comparison of all spectra
cormat <- compareSpectra(sps, ppm = 20)
The pairwise spectra similarities are represented with the heatmap below (note that RStudio in the docker might crash by the pheatmap
call - to avoid this add filename = "hm.pdf"
to the heatmap
call).
The 11 spectra appear to be grouping into 3 clusters, which are in fact related to the collision energy used for the fragmentation (see below; the collision energy is encoded in the file name as CE20 and CE30). We thus use the cutree
function to cut the tree (dendrogram) into 3 clusters. Subsequently we reduce our dataset to the cluster with the spectra generated with a collision energy of 20eV.
#' Get the cluster-assignment of the spectra
cl <- cutree(hm$tree_row, 3)
cl
## [1] 1 1 1 2 2 3 1 1 1 2 3
## $`1`
## [1] "HighIS_Mix06_CE20_POS.mzML" "HighIS_Mix06_CE20_POS.mzML"
## [3] "HighIS_Mix06_CE20_POS.mzML" "QC_HighIS_Mix06_CE20_POS.mzML"
## [5] "QC_HighIS_Mix06_CE20_POS.mzML" "QC_HighIS_Mix06_CE20_POS.mzML"
##
## $`2`
## [1] "HighIS_Mix06_CE30_POS.mzML" "HighIS_Mix06_CE30_POS.mzML"
## [3] "QC_HighIS_Mix06_CE30_POS.mzML"
##
## $`3`
## [1] "HighIS_Mix06_CE30_POS.mzML" "QC_HighIS_Mix06_CE30_POS.mzML"
Although the precursor m/z of our spectra matches the m/z of Cystine, we can still not exclude that they might represent fragmentations of ions from a different compound (i.e. that would have the same precursor m/z).
Matching experimental spectra against a public spectral library can be used as a first step in the identification process. Several (public) spectral libraries for small molecules are available, such as:
For some of these databases MsBackend
interfaces are already implemented allowing inclusion of their data directly into R-based analysis workflows. Access to MassBank data is for example possible with the MsBackendMassbank package. This package provides the MsBackendMassbank
for import/export of MassBank files as well as the MsBackendMassbankSql
backend that directly interfaces the MassBank MySQL database.
Below we load the MsBackendMassbank
package and connect to a local installation of the MassBank MySQL database (release 2021.02 which is provided within the docker image of this tutorial).
library(RMariaDB)
library(MsBackendMassbank)
#' Connect to the MassBank MySQL database
con <- dbConnect(MariaDB(), user = "massbank", dbname = "MassBank",
host = "localhost", pass = "massbank")
We can now initialize a Spectra
object with a MsBackendMassbankSql
backend to access all the data in MassBank.
#' Access the spectra data in MassBank
mbank <- Spectra(con, source = MsBackendMassbankSql())
mbank
## MSn data (Spectra) with 88168 spectra in a MsBackendMassbankSql backend:
## msLevel precursorMz polarity
## <integer> <numeric> <integer>
## 1 2 506.000 0
## 2 NA NA 1
## 3 NA NA 0
## 4 NA NA 1
## 5 NA NA 0
## 6 2 449.380 1
## 7 2 426.022 0
## 8 2 131.060 0
## 9 2 183.170 1
## 10 2 358.270 0
## ... 42 more variables/columns.
## Use 'spectraVariables' to list all of them.
The Spectra
object mbank
represents now the MS data from the MassBank database with in total length(mbank)
spectra. In fact, the mbank
object does only contain the primary keys of the spectra but no MS data. Hence, it’s size in memory is only relatively small:
print(object.size(mbank), units = "MB")
## 6.1 Mb
Any operation on this Spectra
object will load the requested data from the database on-the-fly.
We can now compare our experimental spectra against the reference spectra from MassBank. Because loading data from the database takes some time, we first screen for spectra that have a peak matching the precursor m/z (even this operation takes ~ 30 seconds to finish).
#' Identify spectra that contain a peak matching the m/z of Cystine
has_mz <- containsMz(mbank, mz = mz, ppm = 20)
Note that, to improve performance, we could also load all the spectra data into memory by simply changing the backend with setBackend
to a MsBackendDataFrame
(as we did with the experimental spectra data above).
In total 217 spectra contain a peak with the required m/z and we can proceed to calculate spectral similarities between our experimental spectra and this subset from MassBank. compareSpectra
will calculate the similarity between each spectrum from sps_ce20
and each spectrum from mbank_with_mz
and returns this as a numeric matrix
with the similarity scores (number of rows equal to the number of spectra in sps_ce20
and number of columns equal to the number of spectra in mbank_with_mz
- element res[3, 15]
would thus represent the similarity between experimental spectrum 3 to MassBank spectrum 15).
#' Subset the MassBank Spectra
mbank_with_mz <- mbank[has_mz]
#' Compare experimental spectra against MassBank
res <- compareSpectra(sps_ce20, mbank_with_mz, ppm = 20)
The highest similarity between our spectra and the spectra from MassBank is 0.9236822. Below we indentify the best matching spectrum and access its intensity values. With parameter arr.ind = TRUE
which
returns an integer of length 2, the first element representing the row in res
, the second the column.
## row col
## [1,] 4 5
intensity(mbank_with_mz[idx[2]])
## NumericList of length 1
## [["PB000446"]] 50.369 88.328 45.922 37.428 ... 950.368 69.414 130.733 2338.193
Absolute intensities are reported for the MassBank spectrum, but for better visualizations we would like to normalize them the same way we did with our experimental spectra. As a side note it should also be mentioned that the used dot product function for spectra similarity is independent of absolute intensity values, thus, it did not matter if we performed the spectra comparisons on absolute or relative intensities.
The above described lazy evaluation queue of Spectra
objects allows us to perform data manipulations on the MassBank data without actually modifying the original data. We thus normalize the MassBank spectra with the same function we used for the experimental spectra.
#' *Normalize* the MassBank data
mbank_with_mz <- addProcessing(mbank_with_mz, norm_int)
Below we can then compare the two best matching spectra with a mirror plot, in the upper panel showing our experimental spectrum and in the lower panel the best matching MS2 spectrum from MassBank. Plotting functions in Spectra
are highly customizable and in the example below we add the m/z for each individual peak as an annotation to it if the intensity of the peak is higher than 5.
#' Specifying a function to draw peak labels
label_fun <- function(x) {
ints <- unlist(intensity(x))
mzs <- format(unlist(mz(x)), digits = 4)
mzs[ints < 5] <- ""
mzs
}
plotSpectraMirror(sps_ce20[idx[1]], mbank_with_mz[idx[2]], tolerance = 0.2,
labels = label_fun, labelPos = 2, labelOffset = 0.2,
labelSrt = -30)
grid()
Our experimental spectrum nicely matches the reference MS2 in MassBank. We next want to know which compound this spectrum actually represents. Spectra
objects can have arbitrarily many additional annotation fields (so called spectra variables) and we can use the spectraVariables
function to list all of them which are available in a specific Spectra
object.
#' What variables are available in MassBank
spectraVariables(mbank_with_mz)
## [1] "msLevel" "rtime"
## [3] "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
## [7] "centroided" "smoothed"
## [9] "polarity" "precScanNum"
## [11] "precursorMz" "precursorIntensity"
## [13] "precursorCharge" "collisionEnergy"
## [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz" "spectrum_id"
## [19] "spectrum_name" "date"
## [21] "authors" "license"
## [23] "copyright" "publication"
## [25] "splash" "compound_id"
## [27] "adduct" "ionization"
## [29] "ionization_voltage" "fragmentation_mode"
## [31] "collision_energy_text" "instrument"
## [33] "instrument_type" "formula"
## [35] "exactmass" "smiles"
## [37] "inchi" "inchikey"
## [39] "cas" "pubchem"
## [41] "synonym" "precursor_mz_text"
## [43] "compound_name"
In fact we get, in addition to spectra specific information like the instrument on which it was measured or the ionization voltage used, we get also information on the compound such as its name ("compound_name"
), its chemical formula ("formula"
) or its INChI key ("inchikey"
). Thus, we have all relevant information already in the Spectra
object. Below we select the best matching spectrum from MassBank and display its associated compound name.
mbank_best_match <- mbank_with_mz[idx[2]]
mbank_best_match$compound_name
## [1] "Cystine"
Thus, the best matching spectrum is in fact a MS2 spectrum of Cystine. Below we add the name and the chemical formula for the best matching spectrum to our experimental spectra. We also set the collision energy for these spectra to 20eV and assign the ion/adduct of Cystine from which the reference spectrum was created.
#' Add annotations to the experimental spectra
sps_ce20$name <- mbank_best_match$compound_name
sps_ce20$formula <- mbank_best_match$formula
sps_ce20$adduct <- mbank_best_match$adduct
sps_ce20$collisionEnergy <- 20
At last we want to export our spectra to a file in MGF format. For this we use the MsBackendMgf R package which provides the MsBackendMgf
backend that adds support for MGF file import/export to Spectra
objects.
Data from Spectra
objects can generally be exported with the export
function. The format in which the data is exported depends on the specified MsBackend
class. By using an instance of MsBackendMgf
we can write below the data to a file in MGF format.
library(MsBackendMgf)
#' Export the spectra to a MGF file
export(sps_ce20, backend = MsBackendMgf(), file = "Cystine_ce20.mgf")
In addition to the MsBackendMassbank
, which provides access to MassBank data, there is also the MsBackendHmdb
package supporting spectral data from the public Human Metabolome Database (HMDB). This package does however not yet provide direct access to the HMDB database but, through the MsBackendHmdbXml
backend, allows to import MS2 spectra files in HMDB format. These are provided by HMDB as individual xml files in a custom file format which are bundled (and can hence be downloaded) in a single archive.
To reproduce the following code it is expected (as detailed in the Installation section) that all xml files from HMDB are available in a folder data/hmdb_all_spectra. Below we identify all xml files containing the key word "ms_ms"
in their file name and load them into a Spectra
object using the MsBackendHmdbXml
backend. Note that this import operation from the ~ 500,000 individual xml files takes up to ~ 2 hours to finish.
library(MsBackendHmdb)
#' Get all MS2 spectra xml files and import data
fls <- dir("data/hmdb_all_spectra/", full.names = TRUE, pattern = "ms_ms")
hmdb <- Spectra(fls, source = MsBackendHmdbXml(), nonStop = TRUE)
With this we have now a Spectra
object containing all MS2 spectra from HMDB. Note that with the MsBackendHmdbXml
all spectra data is kept in memory.
hmdb
## MSn data (Spectra) with 458963 spectra in a MsBackendHmdbXml backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 NA NA
## 2 2 NA NA
## 3 2 NA NA
## 4 2 NA NA
## 5 2 NA NA
## ... ... ... ...
## 458959 2 NA NA
## 458960 2 NA NA
## 458961 2 NA NA
## 458962 2 NA NA
## 458963 2 NA NA
## ... 21 more variables/columns.
Also here, to avoid comparing our experimental spectra against all these ~500,000 spectra, we first determine with the containsMz
function which of the HMDB spectra contain a peak matching the m/z of our ion of interest. We have to use a rather large tolerance
value (which defines the maximal acceptable absolute difference in m/z values) since some of the experimental spectra in HMDB seem to be recorded by not well calibrated instruments.
#' Identify spectra containing a peak matching Cystine m/z
has_mz <- containsMz(hmdb, mz = mz, tolerance = 0.2)
In total 46772 spectra contain a peak with the required m/z (+/- 0.2 Dalton) and we can proceed to calculate spectral similarities between our experimental spectra and this subset from HMDB.
#' Subset HMDB
hmdb_with_mz <- hmdb[has_mz]
#' Compare experimental spectra against HMDB
res <- compareSpectra(sps_ce20, hmdb_with_mz, tolerance = 0.2)
The highest similarity between our spectra and the spectra from HMDB is r max(res)
. Below we compare the two best matching spectra with a mirror plot, in the upper panel showing our experimental spectrum and in the lower panel the best matching MS2 spectrum from HMDB.
idx <- which(res == max(res), arr.ind = TRUE)
## Specifying a function to draw peak labels
label_fun <- function(x) {
format(unlist(mz(x)), digits = 4)
}
plotSpectraMirror(sps_ce20[idx[1]], hmdb_with_mz[idx[2]], tolerance = 0.2,
labels = label_fun, labelPos = 2, labelOffset = 0.2,
labelSrt = -30)
grid()
Our experimental spectrum seems to nicely match the reference MS2 in HMDB. Below we extract the compound identifier from the best matching HMDB spectrum (stored in a spectra variable called "compound_id"
)
hmdb_with_mz[idx[2]]$compound_id
## [1] "HMDB0000192"
In fact, the matching spectrum from HMDB is an experimental spectrum for L-Cystine.
With the simple use case of matching experimental MS2 spectra against a public database we illustrated in this short tutorial the flexibility and expandability of the Spectra
package that enables the seamless integration of mass spectrometry data from different sources. This was only possible with a clear separation of the user functionality (Spectra
object) from the representation of the data (MsBackend
object). Backends such as the MsBackendMgf
, the MsBackendMassbank
or the MsBackendHmdbXml
can provide support for additional data formats or data sources, while others, due to their much lower memory footprint (MsBackendMzR
, MsBackendHdf5Peaks
), enable the analysis of also very large data sets. Most importantly however, these backends are interchangeable and do not affect the way users can handle and analyze MS data with the Spectra
package.
Institute for Biomedicine, Eurac Research, Bolzano, Italy; johannes.rainer@eurac.edu↩︎
Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München and Chair of Analytical Food Chemistry, TUM School or Life Sciences, Technical University of Munich, Germany↩︎
Department of Anaesthesiology and Intensive Care, University Medicine Greifswald, Germany↩︎
Computational Biology Unit, de Duve Institute, UCLouvain, Brussels, Belgium↩︎