Correcting raw data, calculating peaks and indices in EEMS and absorbance (slope) parameters

staRdom: spectroscopic analysis of dissolved organic matter in R

Matthias Pucher matthias.pucher@wcl.ac.at

April 23 2020

1 Introduction

staRdom is a package for R version 3 (R Development Core Team 2019) to analyse fluorescence and absorbance data of dissolved organic matter (DOM). The important features are:

staRdom has been developed and is maintained at WasserCluster Lunz (http://www.wcl.ac.at/) and the University of Natural Resources and Life Sciences, Vienna (http://www.boku.ac.at/).

staRdom comes with an Rmd (R Markdown) template in which you can start your analysis with example data and add your own data and parameters whenever you feel ready. We recommend working through the template interactively while reading this vignette to get an overview of what is possible. Each version of staRdom provides a new template, and templates from different versions usually work fine. If you have difficulties running the calculations in the template, please use the template from the same version as the installed staRdom package. As an advanced user, you can include the functions in whatever calculations you want to do (please see details in the advanced vignettes). This vignette describes the template. If you are interested in specific functions, please refer to the help in R, which can be accessed with help(function) or, in RStudio, by pressing F1 while the cursor is on the function name in the code.

Later in the vignette there is also a chapter about troubleshooting. If you experience problems you may find a useful solution there.

1.1 Aim of this document

This document is aimed at beginners in R and describes an easy way of calculating EEM peaks and absorbance (slope) parameters by just setting variables (without calling functions directly) in an Rmd file. The options are limited and a PARAFAC analysis cannot be done this way. This more or less automatic analysis bears the risk of missing information in the data and of overlooking problems such as outliers and noise. For more possibilities, options and a PARAFAC analysis, please refer to the vignette for the PARAFAC analysis.

1.2 Hint for beginners in R

If you are a beginner in R, you may find some help in the RStudio online learning resources (https://docs.rstudio.com/) or in Modern R with the tidyverse (https://b-rodrigues.github.io/modern_R/) by Bruno Rodrigues.

The package is available on CRAN and can be installed via install.packages("staRdom").

2 Starting the analysis using the template

The template can be opened with the command file.edit(system.file("EEM_simple_analysis.Rmd", package = "staRdom")). You can and should save this file if you want to preserve it. The example data is stored within the package structure, and you can find the containing folder with the command system.file(package = "staRdom"). Raw data is in the sub-folder “extdata”. The original settings in the template refer to the example data, so you can do full-featured test runs with them.
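As a minimal sketch of the commands named above (assuming staRdom is installed; the target file name my_analysis.Rmd is only an example and not part of the package), the template can be copied and the example data located like this:

```r
# Locate the template shipped with staRdom; system.file() returns ""
# if the package or the file cannot be found.
template <- system.file("EEM_simple_analysis.Rmd", package = "staRdom")

if (nzchar(template)) {
  # Keep a private copy of the template to edit
  # ("my_analysis.Rmd" is an arbitrary example name).
  file.copy(template, "my_analysis.Rmd", overwrite = FALSE)
}

# Folder with the raw example data (sub-folder "extdata" of the package).
example_data <- system.file("extdata", package = "staRdom")

# List the example files, including those in sub-folders.
list.files(example_data, recursive = TRUE)
```

Working on a copy leaves the original template inside the package untouched, so a fresh version is always available.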

2.1 Output parameters

At the top of the template are the header parameters necessary to create a report with knitr (https://yihui.name/knitr/). These parameters can be changed; they either just show up in the final report (e.g. author, title) or alter the appearance of the document. It is possible to create reports in other file formats. Please find details at https://rmarkdown.rstudio.com/lesson-9.html.

The directory your generated files are saved in is set by output_dir at line 7. It is important that you keep the “;” at the end of the line. Folders are delimited by “/”. In RStudio, pressing the tab key while the cursor is in the path can reveal possible folders on your drive.

In case you want to run the code chunk-wise (https://rmarkdown.rstudio.com/lesson-3.html), you need to specify the output folder on line 61.

2.2 Input parameters

Please be sure to use the same file names for your fluorescence data, absorbance data and meta data, as differing file names are often the reason for a non-working analysis.

2.2.1 Fluorometer data (EEM)

The parameter sample_dir specifies the directory containing your data files from the fluorometer. They have to be in a text format (Cary Eclipse .csv files, Aqualog .dat files, Shimadzu .TXT files, Fluoromax-4 .dat files, Hitachi .TXT files, generic .CSV files). Samples can be stored in subfolders as well. Please make sure that your file names are unique. File names must not contain " " (space) or “-” (minus) or start with a number. The command system.file() in the template EEM_simple_analysis.Rmd is only used to access the example data and is not needed if you want to use your own data.

2.2.2 Photometer data (absorbance)

Absorbance data are needed for inner-filter effect correction and the calculation of the slope parameters. They are taken from the folder specified by absorbance_dir. The file names or column designations must be identical to the EEM file names to link fluorescence and absorbance data unambiguously. Please make sure that your file names are unique. File names must not contain " " (space) or “-” (minus) or start with a number. For the calculations, the light path length (in cm) used in the photometric measurement has to be set.
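The naming rules above can be checked before running the template. The following base R sketch is not part of staRdom; the helper valid_sample_name() and the file names are made up for illustration:

```r
# Hypothetical helper: TRUE for names that follow the rules stated above
# (no space, no minus, no leading digit).
valid_sample_name <- function(files) {
  # Strip the directory and the file extension, keep the bare sample name.
  name <- tools::file_path_sans_ext(basename(files))
  no_space_or_minus <- !grepl("[ -]", name)
  no_leading_digit <- !grepl("^[0-9]", name)
  no_space_or_minus & no_leading_digit
}

# Made-up example files.
eem_files <- c("eem/lake_01.csv", "eem/2019_lake.csv", "eem/lake-02.csv")
abs_files <- c("abs/lake_01.csv", "abs/lake_03.csv")

# Which EEM file names are acceptable?
name_ok <- valid_sample_name(eem_files)

# EEM samples that have no absorbance file with the same name.
eem_names <- tools::file_path_sans_ext(basename(eem_files))
abs_names <- tools::file_path_sans_ext(basename(abs_files))
missing_abs <- setdiff(eem_names, abs_names)

# 0 means that all EEM sample names are unique.
first_duplicate <- anyDuplicated(eem_names)
```

Here only lake_01 passes the name check and has a matching absorbance file.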

2.2.3 Meta data

If your samples’ measurements differ and you need to set parameters sample-wise, you can set distinct dilution factors, photometer cuvette lengths and Raman areas in a table. You can skip this if dilution factors and cuvette lengths are the same for all samples and you used blank samples for calculating the Raman area. Distinct numbers can then be set as described below. A dilution factor of, e.g., 10 means that 1 part sample was mixed with 9 parts ultrapure water.

2.3 Write out results and plots

2.3.1 Results table

If you want to export your picked peaks and slope parameters as a table, set the parameter output_xls = TRUE. Exporting XLS files needs a properly configured Java environment. If any problems are encountered, a CSV file is written to your output directory instead and can be opened with spreadsheet software. Furthermore, you can specify the cell separator and decimal point of your data files in case a CSV file is written.
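To illustrate what the cell separator and decimal point settings mean for a CSV file, here is a base R sketch. The table contents and column names are invented, and the template's own parameter names for these settings are not shown here:

```r
# Made-up results table for illustration.
results <- data.frame(sample = c("lake_01", "lake_02"),
                      peak_value = c(0.153, 1.27))

csv_file <- file.path(tempdir(), "picked_peaks.csv")

# Semicolon as cell separator and comma as decimal point,
# a combination common in many European spreadsheet setups.
write.table(results, csv_file, sep = ";", dec = ",", row.names = FALSE)

# Raw lines of the written file.
csv_lines <- readLines(csv_file)

# Reading the file back requires the same separator and decimal settings.
read_back <- read.table(csv_file, header = TRUE, sep = ";", dec = ",")
```

If a CSV file opens with all values in one column or with numbers shown as text, a mismatch in these two settings is the usual cause.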

2.3.2 Plots

The script offers several options for exporting plots. output_overview_png states whether overview plots, each containing a number of samples (overview_number), are saved in the output directory, and output_single_png states whether a single PNG is exported for each sample. The parameter scale_col defines whether the colour range of all plots is synchronised. If you want to compare different samples, it is easier if the colour code has the same range. Weak peaks in samples with lower fluorophore presence than other samples can be found more easily if the colours are not synchronised.

With the parameters overview and single_plots these plots can be included in the report using the same parameters scale_col and overview_number.

2.4 Data correction

Raw data from fluorometers and photometers have several shortcomings. Murphy et al. (2013) describe several ways of correcting EEM data that are used in the staRdom template. Some correction methods need specific data (e.g. absorbance) to be applied. Corrections can be necessary and can help you focus on certain aspects and on information that would otherwise be covered by noise, but depending on your aim they might not be needed. Bro and Smilde (2003) offer additional information on correction of EEM data.

2.4.2 Dilution

If samples were diluted before the spectroscopic measurements, the dilution factor can be set here and the sample will be corrected accordingly. As an example a dilution factor of 10 means a 1:10 dilution (1 part sample and 9 parts ultrapure water). By setting dilution = "meta", data from the meta table is used and each sample can be corrected by an individual dilution factor.
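As a worked example of the arithmetic, the sketch below assumes that the correction scales the measured intensities back to the undiluted sample by multiplying with the dilution factor. The small matrix is invented data:

```r
# Invented EEM of a diluted sample (rows: emission, columns: excitation).
eem_diluted <- matrix(c(0.10, 0.20, 0.15, 0.30), nrow = 2,
                      dimnames = list(em = c("400", "410"),
                                      ex = c("250", "260")))

# 1 part sample + 9 parts ultrapure water.
dilution <- 10

# Intensities scaled back to the undiluted sample.
eem_undiluted <- eem_diluted * dilution
```

With a factor of 10, the intensity of 0.10 measured in the diluted sample corresponds to 1.0 in the undiluted sample.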

The reason for diluting a sample can be inner-filter effects that cannot be corrected adequately if they are too strong (Kothawala et al. 2013). In contrast, absorbance data might be better analysed undiluted. To combine the results of diluted EEM measurements and undiluted absorbance measurements, the following parameter can be set to do this automatically. Please check the results, because depending on your sample names, automatically guessed combinations might not be recognised correctly!

2.4.4 EEM range reduction

EEM data can be cut in both dimensions. Peaks are calculated before the reduction. Cut ranges are set with vectors containing the lower and upper limits: c(lower,upper). If you want to avoid any cutting, set the vectors to c(0,Inf). Inf means infinity, so the script keeps data from 0 to infinity. The script also allows cutting all samples to the size of the sample with the shortest range, which is necessary if you want to perform a PARAFAC analysis. Cutting can be necessary to remove noisy data in advance of a PARAFAC analysis or to increase the visibility of important peaks in plots.
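How a c(lower,upper) vector selects data can be sketched in base R. The function cut_eem() and the wavelength grids are hypothetical and only mimic the behaviour described above:

```r
# Invented wavelength grids (nm) and an EEM filled with dummy values.
em <- seq(250, 600, by = 5)   # emission, 71 values
ex <- seq(230, 450, by = 10)  # excitation, 23 values
eem <- matrix(seq_len(length(em) * length(ex)), nrow = length(em))

# Hypothetical helper: keep only wavelengths inside the given ranges.
cut_eem <- function(eem, em, ex, em_range = c(0, Inf), ex_range = c(0, Inf)) {
  keep_em <- em >= em_range[1] & em <= em_range[2]
  keep_ex <- ex >= ex_range[1] & ex <= ex_range[2]
  eem[keep_em, keep_ex, drop = FALSE]
}

# c(0, Inf) in both dimensions keeps everything.
uncut <- cut_eem(eem, em, ex)

# Emission limited to 300-550 nm, excitation from 250 nm upwards.
reduced <- cut_eem(eem, em, ex, em_range = c(300, 550), ex_range = c(250, Inf))
dim(reduced)
```

Using Inf as the upper limit is a convenient way of cutting only at the lower end of a dimension.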

2.4.5 Blank correction

Blank samples are measurements of ultrapure water. Systematic biases can be removed by subtracting the blank from each sample. The file names of blank samples have to contain “nano”, “miliq”, “milliq”, “mq” or “blank” (case is ignored). Regular sample names must not contain any of these words. Blanks have to be in the same (sub)folder as the samples that are corrected with them. Multiple blanks in one (sub)folder are averaged. A blank needs to be measured with each sample set (e.g. once a day) (Murphy et al. 2013) and kept in the same folder as that set.
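The rules above (case-insensitive name matching, averaging of several blanks, subtraction from the samples) can be sketched in base R. The file names and the tiny matrices are invented, and the pattern is simply built from the words listed above:

```r
# Invented file list with two sample sets in separate sub-folders.
files <- c("day1/lake_01.csv", "day1/MilliQ_a.csv", "day1/blank_b.csv",
           "day2/river_03.csv", "day2/nano.csv")

# Words that mark a blank; matching ignores upper and lower case.
blank_pattern <- "nano|miliq|milliq|mq|blank"
is_blank <- grepl(blank_pattern, basename(files), ignore.case = TRUE)

# Blanks grouped by the folder whose samples they correct.
blanks_by_folder <- split(basename(files)[is_blank], dirname(files)[is_blank])

# Two invented blanks from one folder are averaged element-wise ...
blank_a <- matrix(c(1, 2, 3, 4), nrow = 2)
blank_b <- matrix(c(3, 2, 1, 0), nrow = 2)
mean_blank <- Reduce(`+`, list(blank_a, blank_b)) / 2

# ... and the average is subtracted from a sample of the same folder.
sample_eem <- matrix(10, nrow = 2, ncol = 2)
corrected <- sample_eem - mean_blank
```

Because the match is a plain substring search, a regular sample whose name happens to contain one of these words would be treated as a blank, which is why such names have to be avoided.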

2.4.6 Correct inner filter effects

The inner-filter effect is caused by absorbance that blocks light in the pathway from the source to the sensor during fluorescence measurements. To apply the inner-filter effects correction described in Kothawala et al. (2013), absorbance data have to be measured for each sample. By knowing the exact absorbance, this effect can be mathematically corrected. In case of a total absorbance greater than 1.5, the sample has to be diluted because otherwise, the linear relationship is not appropriate anymore.
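A commonly used absorbance-based form of this correction multiplies the observed fluorescence by 10^((A_ex + A_em) / 2), where A_ex and A_em are the absorbances at the excitation and emission wavelengths. The sketch below assumes this form and a 1 cm cuvette; the function ife_correct() is hypothetical, and whether the template applies exactly this formula is an assumption:

```r
# Hypothetical helper implementing the absorbance-based correction
# F_corr = F_obs * 10^((A_ex + A_em) / 2).
ife_correct <- function(f_obs, a_ex, a_em) {
  total_absorbance <- a_ex + a_em
  if (any(total_absorbance > 1.5)) {
    warning("Total absorbance above 1.5: the sample should be diluted.")
  }
  f_obs * 10^(total_absorbance / 2)
}

# Invented values: observed intensity 100, absorbance 0.2 at the
# excitation and 0.1 at the emission wavelength.
f_corrected <- ife_correct(f_obs = 100, a_ex = 0.2, a_em = 0.1)
```

The corrected value is always at least as large as the observed one, because absorbance can only reduce the light that reaches the sensor.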

2.4.7 Remove and interpolate scattering

Diagonal scatter peaks hinder the analysis of EEM data as they are usually much greater than peaks from DOM. They can be partly removed by subtracting the blank sample as described above. Diagonal peaks are called Rayleigh and Raman peaks of first and second order. They can also make a PARAFAC analysis impossible. Senesi (1990), Lakowicz (2006) and Coble et al. (1990) offer additional information.

The width of the removed scatter slot can be set. Make sure not to lose too much data while still removing the whole peak. If you use the interpolation below, a remaining diagonal peak hints at insufficient width. Elcoroaristizabal et al. (2015), Bahram et al. (2006) and Zepp et al. (2004) suggest an interpolation of the removed scattering prior to a PARAFAC analysis and offer a description.
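The idea of removing a scatter slot of a given width and filling it again can be sketched in base R. This simplified example only handles Rayleigh scatter (first order at emission = excitation, second order at emission = 2 x excitation) and fills the gap by linear interpolation along the emission axis; both helper functions are hypothetical, and the cited papers describe the interpolation methods actually recommended:

```r
# Invented EEM: flat signal of 1 with artificial Rayleigh peaks.
em <- seq(240, 620, by = 10)
ex <- c(250, 300)
eem <- matrix(1, nrow = length(em), ncol = length(ex))
eem[em == 250, 1] <- 50; eem[em == 500, 1] <- 20  # ex = 250 nm
eem[em == 300, 2] <- 50; eem[em == 600, 2] <- 20  # ex = 300 nm

# Hypothetical helper: set a slot of +/- width nm around the first and
# second order Rayleigh lines to NA.
remove_rayleigh <- function(eem, em, ex, width = 15) {
  for (j in seq_along(ex)) {
    first <- abs(em - ex[j]) <= width
    second <- abs(em - 2 * ex[j]) <= width
    eem[first | second, j] <- NA
  }
  eem
}

# Hypothetical helper: fill NAs column-wise by linear interpolation;
# rule = 2 repeats the nearest value at the edges.
interpolate_em <- function(eem, em) {
  apply(eem, 2, function(column) {
    ok <- !is.na(column)
    stats::approx(em[ok], column[ok], xout = em, rule = 2)$y
  })
}

masked <- remove_rayleigh(eem, em, ex, width = 15)
filled <- interpolate_em(masked, em)
```

If the width were too small for a broad peak, the flanks of the peak would survive the masking and distort the interpolated values, which is the remaining diagonal peak mentioned above.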

2.4.8 Raman normalisation

Fluorescence intensities can differ between measurements on different fluorometers, with different settings or on different days on the same fluorometer. The so-called Raman normalisation makes samples comparable by normalising fluorescence intensities to Raman units. In staRdom, it can be applied in two ways: either you use a blank sample (see the section on blank correction above) to calculate the value for the normalisation (Lawaetz and Stedmon 2009), or you provide a fixed value. Fixed values for each sample can be set in the meta table as well.
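The normalisation itself is a division by the Raman area of the blank. The sketch below assumes the integration range commonly attributed to Lawaetz and Stedmon (2009), emission 371 to 428 nm at an excitation of 350 nm, and uses a simple trapezoidal rule; raman_area() is a hypothetical helper and the data are invented:

```r
# Hypothetical helper: area under the blank's emission scan between
# 'from' and 'to' (nm), calculated with the trapezoidal rule.
raman_area <- function(em, intensity, from = 371, to = 428) {
  inside <- em >= from & em <= to
  x <- em[inside]
  y <- intensity[inside]
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# Invented blank scan at excitation 350 nm with a constant intensity of 2.
em <- seq(370, 430, by = 1)
blank_350 <- rep(2, length(em))
area <- raman_area(em, blank_350)

# Invented raw sample intensities converted to Raman units.
sample_raw <- c(57, 114, 228)
sample_ru <- sample_raw / area
```

A fixed value, as mentioned above, would simply replace area in the last step.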

2.4.9 Smoothing

For calculating the peaks, the EEMs can be smoothed. If so, peaks and indices are calculated from smoothed EEMs but these are not saved. The smoothing parameter specifies the size of a moving average window in nm.
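A moving average over a window given in nm can be sketched with stats::filter(). How the template converts the window size into a number of data points is not stated here, so the conversion below (window divided by the wavelength step, forced to an odd number) is an assumption, and moving_average() is a hypothetical helper:

```r
# Hypothetical helper: centred moving average of one emission scan.
# 'step' is the wavelength increment of the data in nm.
moving_average <- function(intensity, step, window_nm) {
  n <- max(1, round(window_nm / step))
  if (n %% 2 == 0) n <- n + 1  # an odd window is centred on each point
  as.numeric(stats::filter(intensity, rep(1 / n, n), sides = 2))
}

# Invented scan with one noisy spike, measured every 2 nm.
scan <- c(1, 1, 4, 1, 1)
smoothed <- moving_average(scan, step = 2, window_nm = 6)
```

In this sketch the first and last points stay NA because the window does not fit there; the spike of 4 is spread over its neighbours.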

2.5 Running the analysis

If you reach the box below in the template, all parameters are set and you can finally run the analysis.

You can run the script by clicking the “Knit” button in the toolbar of RStudio. At the first run of the script you may be asked if you want to install several packages. Please confirm. This can take some time. Your generated files are placed in your specified output folder. If you experience problems, consider starting over with a “fresh” template.

3 Installation

The script runs in the R environment (R Development Core Team 2019). Using a graphical user interface like RStudio (https://www.rstudio.com/) can help beginners get started.

You can install staRdom via RStudio by clicking Tools -> Install Packages… or by entering the command install.packages("staRdom") in the command line.

If any of the programs are already installed on your computer, you can skip the respective step. In case of problems while running the script, consider re-installing/updating the respective programs.

3.1 R

Download:

https://cran.r-project.org/mirrors.html

Installation manual:

https://cran.r-project.org/doc/manuals/r-release/R-admin.html#R-Installation-and-Administration

3.2 RStudio

Download:

https://www.rstudio.com/products/rstudio/download/#download

Please choose the installer for your operating system, not the zip/tarballs.

Install RStudio by running the setup.

3.3 Optional software

Optionally (!), you can install a Java runtime environment to import data from XLS files and a TeX environment (e.g. MiKTeX for Windows) to export PDF files. You can use the script to its full extent without those.

4 Troubleshooting

4.1 Peaks table shows NAs

NA stands for ‘Not available’ and means that the wavelength range of that peak is missing from your data. Make sure that you measured the range of the peak on your instrument.

4.2 Only some sample plots show peaks

If samples differ considerably in the amount of DOM, scaling might be a problem. You can scale each sample plot separately by setting scale_col = FALSE.

4.3 I cannot read csv files in MS Excel

If you encounter problems with reading csv files please visit:

https://support.office.com/en-us/article/Import-data-using-the-Text-Import-Wizard-40c6d5e6-41b0-4575-a54e-967bbe63a048

4.4 I get error messages concerning my output directory

Be sure the specified drive and folder exist on your system (e.g. C:/) and that you have write access.
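Both conditions can be checked from the R console before knitting. The sketch below uses a folder inside the temporary directory as a stand-in for your own output folder:

```r
# Stand-in for your own output folder; folders are delimited by "/".
output_dir <- file.path(tempdir(), "staRdom_output")

# Create the folder (and missing parent folders) if it does not exist yet.
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

# file.access() returns 0 if the requested access (mode 2 = write) is granted.
writable <- unname(file.access(output_dir, mode = 2)) == 0
```

If writable is FALSE, choose a different folder or adjust the permissions of the existing one.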

References

Bahram, Morteza, Rasmus Bro, Colin Stedmon, and Abbas Afkhami. 2006. “Handling of Rayleigh and Raman Scatter for PARAFAC Modeling of Fluorescence Data Using Interpolation.” Journal of Chemometrics 20 (3-4): 99–105. https://doi.org/10.1002/cem.978.

Bro, Rasmus. 1997. “PARAFAC. Tutorial and Applications.” Chemometrics and Intelligent Laboratory Systems 38 (2): 149–71. https://doi.org/10.1016/S0169-7439(97)00032-4.

Bro, Rasmus, and Age K. Smilde. 2003. “Centering and Scaling in Component Analysis.” Journal of Chemometrics 17 (1): 16–33. https://doi.org/10.1002/cem.773.

Coble, Paula G. 1996. “Characterization of Marine and Terrestrial DOM in Seawater Using Excitation-Emission Matrix Spectroscopy.” Marine Chemistry 51 (4): 325–46. https://doi.org/10.1016/0304-4203(95)00062-3.

Coble, Paula G., Sarah A. Green, Neil V. Blough, and Robert B. Gagosian. 1990. “Characterization of Dissolved Organic Matter in the Black Sea by Fluorescence Spectroscopy.” Nature 348 (6300): 432–35. https://doi.org/10.1038/348432a0.

De Haan, H., and T. De Boer. 1987. “Applicability of Light Absorbance and Fluorescence as Measures of Concentration and Molecular Size of Dissolved Organic Carbon in Humic Lake Tjeukemeer.” Water Research 21 (6): 731–34. https://doi.org/10.1016/0043-1354(87)90086-8.

DeRose, Paul C., and Ute Resch-Genger. 2010. “Recommendations for Fluorescence Instrument Qualification: The New ASTM Standard Guide.” Analytical Chemistry 82 (5): 2129–33. https://doi.org/10.1021/ac902507p.

Dobbs, Richard A., Robert H. Wise, and Robert B. Dean. 1972. “The Use of Ultra-Violet Absorbance for Monitoring the Total Organic Carbon Content of Water and Wastewater.” Water Research 6 (10): 1173–80. https://doi.org/10.1016/0043-1354(72)90017-6.

Elcoroaristizabal, Saioa, Rasmus Bro, Jose Antonio García, and Lucio Alonso. 2015. “PARAFAC Models of Fluorescence Data with Scattering: A Comparative Study.” Chemometrics and Intelligent Laboratory Systems 142 (March): 124–30. https://doi.org/10.1016/j.chemolab.2015.01.017.

Fellman, Jason B., Eran Hood, and Robert G. M. Spencer. 2010. “Fluorescence Spectroscopy Opens New Windows into Dissolved Organic Matter Dynamics in Freshwater Ecosystems: A Review.” Limnology and Oceanography 55 (6): 2452–62. https://doi.org/10.4319/lo.2010.55.6.2452.

Helms, John R., Aron Stubbins, Jason D. Ritchie, Elizabeth C. Minor, David J. Kieber, and Kenneth Mopper. 2008. “Absorption Spectral Slopes and Slope Ratios as Indicators of Molecular Weight, Source, and Photobleaching of Chromophoric Dissolved Organic Matter.” Limnology and Oceanography 53 (3): 955–69. https://doi.org/10.4319/lo.2008.53.3.0955.

Helwig, Nathaniel E. 2019. “Multiway: Component Models for Multi-Way Data.” https://CRAN.R-project.org/package=multiway.

Huguet, A., L. Vacher, S. Relexans, S. Saubusse, J. M. Froidefond, and E. Parlanti. 2009. “Properties of Fluorescent Dissolved Organic Matter in the Gironde Estuary.” Organic Geochemistry 40 (6): 706–19. https://doi.org/10.1016/j.orggeochem.2009.03.002.

Kothawala, Dolly N., Kathleen R. Murphy, Colin A. Stedmon, Gesa A. Weyhenmeyer, and Lars J. Tranvik. 2013. “Inner Filter Correction of Dissolved Organic Matter Fluorescence.” Limnology and Oceanography: Methods 11 (12): 616–30. https://doi.org/10.4319/lom.2013.11.616.

Lakowicz, Joseph R. 2006. Principles of Fluorescence Spectroscopy. 3rd ed. NY, USA: Springer Science+Business Media, LLC. https://www.springer.com/us/book/9780387312781.

Lawaetz, A. J., and C. A. Stedmon. 2009. “Fluorescence Intensity Calibration Using the Raman Scatter Peak of Water.” Applied Spectroscopy 63 (8): 936–40. https://doi.org/10.1366/000370209788964548.

Li, Penghui, and Jin Hur. 2017. “Utilization of UV-Vis Spectroscopy and Related Data Analyses for Dissolved Organic Matter (DOM) Studies: A Review.” Critical Reviews in Environmental Science and Technology 47 (3): 131–54. https://doi.org/10.1080/10643389.2017.1309186.

Loiselle, Steven A., Luca Bracchini, Arduino M. Dattilo, Maso Ricci, Antonio Tognazzi, Andres Cózar, and Claudio Rossi. 2009. “The Optical Characterization of Chromophoric Dissolved Organic Matter Using Wavelength Distribution of Absorption Spectral Slopes.” Limnology and Oceanography 54 (2): 590–97. https://doi.org/10.4319/lo.2009.54.2.0590.

Massicotte, Philippe. 2019. “eemR: Tools for Pre-Processing Emission-Excitation-Matrix (EEM) Fluorescence Data.” https://CRAN.R-project.org/package=eemR.

McKnight, Diane M., Elizabeth W. Boyer, Paul K. Westerhoff, Peter T. Doran, Thomas Kulbe, and Dale T. Andersen. 2001. “Spectrofluorometric Characterization of Dissolved Organic Matter for Indication of Precursor Organic Material and Aromaticity.” Limnology and Oceanography 46 (1): 38–48. https://doi.org/10.4319/lo.2001.46.1.0038.

Molot, Lewis A., Jeff J. Hudson, Peter J. Dillon, and Sean A. Miller. 2005. “Effect of pH on Photo-Oxidation of Dissolved Organic Carbon by Hydroxyl Radicals in a Coloured, Softwater Stream.” Aquatic Sciences 67 (2): 189–95. https://doi.org/10.1007/s00027-005-0754-9.

Murphy, Kathleen R., Colin A. Stedmon, Daniel Graeber, and Rasmus Bro. 2013. “Fluorescence Spectroscopy and Multi-Way Techniques. PARAFAC.” Analytical Methods 5 (23): 6557–66. https://doi.org/10.1039/C3AY41160E.

Ohno, Tsutomu. 2002. “Fluorescence Inner-Filtering Correction for Determining the Humification Index of Dissolved Organic Matter.” Environmental Science & Technology 36 (4): 742–46. https://doi.org/10.1021/es0155276.

R Development Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Senesi, Nicola. 1990. “Molecular and Quantitative Aspects of the Chemistry of Fulvic Acid and Its Interactions with Metal Ions and Organic Chemicals: Part II. The Fluorescence Spectroscopy Approach.” Analytica Chimica Acta 232 (January): 77–106. https://doi.org/10.1016/S0003-2670(00)81226-X.

Summers, R. S., P. K. Cornel, and P. V. Roberts. 1987. “Molecular Size Distribution and Spectroscopic Characterization of Humic Substances.” Science of the Total Environment 62 (January): 27–37. https://doi.org/10.1016/0048-9697(87)90478-5.

Zepp, Richard G, Wade M Sheldon, and Mary Ann Moran. 2004. “Dissolved Organic Fluorophores in Southeastern US Coastal Waters: Correction Method for Eliminating Rayleigh and Raman Scattering Peaks in Excitation–Emission Matrices.” Marine Chemistry, CDOM in the Ocean: Characterization, Distribution and Transformation, 89 (1): 15–36. https://doi.org/10.1016/j.marchem.2004.02.006.