Determination of Soil Properties with Visible to Near- and Mid-Infrared Spectroscopy: Effects of Spectral Variable Selection

Abstract

Spectral variable selection is an important step in spectroscopic data analysis, as it leads to a more parsimonious data representation and can result in multivariate models with greater predictive ability. In this study, we used VIS-NIR (visible to near-infrared) diffuse reflectance and DRIFT (diffuse reflectance infrared Fourier transform in the mid-infrared range, MIR) spectroscopy to determine a series of chemical and biological soil properties. Multivariate calibrations were performed with partial least squares regression (PLSR) using the full absorbance spectra (VIS-NIR: 400-2500 nm at 5-nm intervals; MIR: 4000-800 cm⁻¹ at 4-cm⁻¹ intervals) and with a combination of PLSR and CARS (competitive adaptive reweighted sampling) to integrate only the most informative key variables. The CARS procedure has not yet been applied in the field of soil spectroscopy. As set heterogeneity is crucial for an optimal calibration, we tested these approaches on a set of 60 agricultural soil samples covering a broad range of parent materials, soil textures, organic matter contents and soil pH values. Soil samples were taken from the Ap horizon (0-10 cm depth), air-dried and pulverised before the laboratory spectroscopic measurements were performed. In a cross-validation approach, the CARS-PLSR method was markedly more accurate than full-spectrum PLSR for all investigated soil variables and both spectral regions. With MIR data and CARS-PLSR, excellent results (indicated by a residual prediction deviation (RPD) greater than 3.0) were obtained for organic carbon (OC), nitrogen (N), microbial biomass-C (C-mic) and pH values; for hot water extractable C (C-hwe), RPD was 2.60. The accuracies obtained with VIS-NIR data were considerably lower than those with the MIR spectra; the best results were obtained for pH and C-mic (approximately quantitative, as indicated by RPD values between 2.0 and 2.5).
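The RPD thresholds quoted above can be made concrete with a short calculation. The following is a minimal sketch, assuming the common definition of RPD as the standard deviation of the reference values divided by the root mean squared error of the (cross-validated) predictions; the abstract does not state the exact formula used, and the function name `rpd` and the example values are hypothetical.

```python
import numpy as np


def rpd(y_ref, y_pred):
    """Residual prediction deviation: SD of reference values / RMSE.

    Assumption: sample standard deviation (ddof=1) and plain RMSE;
    other variants (e.g. bias-corrected SEP) exist.
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_ref - y_pred) ** 2))
    sd = np.std(y_ref, ddof=1)
    return sd / rmse


# Hypothetical reference values and cross-validated predictions
y_ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

value = rpd(y_ref, y_pred)
print(f"RPD = {value:.2f}")
```

Under this definition, a larger RPD means the prediction error is small relative to the spread of the property in the sample set, which is why values above 3.0 are read as excellent and values between 2.0 and 2.5 as approximately quantitative.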
The information content of the MIR data was substantially different from the VIS-NIR information, as indicated by 2D correlation analysis. We found an overall blurred 2D correlation pattern between both spectral regions with moderate to low correlation coefficients, which suggested that the heterogeneity of the studied soil sample population had led to a very complex blurring of overtones and combination bands in the NIR region. The statistical CARS selections were physically reasonable. MIR key wavenumbers for the studied C fractions were identified, inter alia, at the bands at 2920 cm⁻¹ and 2850 cm⁻¹ (both aliphatic CH groups) and in the region between 1740 and 1600 cm⁻¹ (CO groups); these represent hydrophobic and hydrophilic compounds of soil organic matter. Important VIS-NIR wavelengths for assessing C fractions and N were located near the prominent water absorption band at 1915 nm and the hydroxyl band at 2200 nm. The simplicity of the approach, the parsimony of the multivariate models, the accuracy levels in the cross-validation and the physically reasonable selections indicated a successful operation of the CARS procedure. It should be further examined with a larger number of samples using separate calibration and validation sets. © 2014 Elsevier B.V. All rights reserved.
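One simple way to picture a correlation pattern between the two spectral regions is a matrix of Pearson correlation coefficients, computed across samples for every pair of one VIS-NIR wavelength and one MIR wavenumber. The sketch below illustrates that reading only; the abstract does not specify the 2D correlation procedure, so this may differ from the authors' method. The function name `cross_region_correlation` is hypothetical, and the spectra are random placeholders with the dimensions given in the abstract (60 samples, 400-2500 nm at 5 nm, 4000-800 cm⁻¹ at 4 cm⁻¹).

```python
import numpy as np


def cross_region_correlation(X_a, X_b):
    """Pearson correlation between every column of X_a and every column of X_b.

    Rows are samples, columns are spectral variables.
    Assumes no column is constant (zero variance would divide by zero).
    """
    A = np.asarray(X_a, dtype=float)
    B = np.asarray(X_b, dtype=float)
    A = A - A.mean(axis=0)          # centre each spectral variable
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A, axis=0)  # scale columns to unit length
    B = B / np.linalg.norm(B, axis=0)
    return A.T @ B                  # shape: (n_vars_a, n_vars_b)


# Placeholder data only: random numbers, not real soil spectra
rng = np.random.default_rng(0)
wavelengths_nm = np.arange(400, 2501, 5)      # VIS-NIR grid from the abstract
wavenumbers_cm = np.arange(4000, 799, -4)     # MIR grid from the abstract
X_vnir = rng.normal(size=(60, wavelengths_nm.size))
X_mir = rng.normal(size=(60, wavenumbers_cm.size))

corr = cross_region_correlation(X_vnir, X_mir)
print(corr.shape)
```

With real spectra, plotting `corr` as an image over the wavelength and wavenumber axes would show whether distinct bands in one region track bands in the other, or whether the pattern is diffuse with moderate to low coefficients as described above.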

Publication
Geoderma
Michael Vohland
Professor for Geoinformatics and Remote Sensing
