Diffuse Reflectance Infrared Spectroscopy Estimates for Soil Properties Using Multiple Partitions: Effects of the Range of Contents, Sample Size, and Algorithms

Abstract

The RMSE of validation (RMSE_V) and the ratio of the interquartile range to RMSE_V (RPIQ_V) are key quality parameters in diffuse reflectance infrared (IR) spectroscopy studies, but the effects of different factors on these parameters are often not sufficiently considered. Our objectives were to reveal the effects of the range of contents, sample size, data pretreatment, wavenumber region selection, and algorithms on the evaluation of IR spectroscopy estimates in the wavenumber range from 1,000 to 7,000 cm⁻¹ (mid-IR and long-wave near-IR). Contents of soil organic C (SOC), N, clay, and sand and pH values were determined for surface soils of an arable field in India, and IR spectra were recorded for four samples consisting of 71–263 soils. For each of the four samples, five random partitions into calibration and validation datasets were carried out, and partial least squares regression (PLSR) or support vector machine regression (SVMR) was performed. A plot of the RMSE_V values against the interquartile ranges of measured values for the validation samples (IQR_V) indicated that IQR_V was a key parameter for all soil properties: a sufficiently high IQR_V (which is affected by sample size and random partitioning) resulted in generally good estimation accuracies (RPIQ_V ≥ 2.70). Optimized data pretreatment and wavenumber region selection improved estimation accuracy for SOC and pH. SVMR was superior to PLSR for the estimation of SOC, clay, and sand, but inferior for pH. Overall, this study indicates that multiple partitioning of the data is essential in IR studies and suggests that RPIQ_V and RMSE_V need to be interpreted in the context of the respective IQR_V values.

Core Ideas
Multiple partitioning of the data is essential in infrared studies.
The interquartile range (IQR) of measured data was a key parameter affecting the evaluation.
A sufficiently high IQR resulted in generally good estimation accuracies in this field study.
SVMR was slightly superior to PLSR for the estimation of SOC, clay, and sand.
Optimum pretreatments and wavenumber region selection were useful for SOC and pH estimations.
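The validation metrics and the repeated random partitioning described in the abstract can be illustrated with a short sketch. It assumes the usual definitions RMSE_V = sqrt(mean((y - ŷ)²)) over the validation samples, IQR_V = Q3 - Q1 of the measured validation values, and RPIQ_V = IQR_V / RMSE_V. The quartile interpolation method, the 75% calibration share, the seeds, the helper names (rmse, iqr, rpiq, random_partitions), and the example SOC values are hypothetical choices for illustration, not the study's settings or data; the model fitting step (PLSR or SVMR) is deliberately left out.

```python
"""Sketch: validation metrics and repeated random partitioning (stdlib only).

Assumptions (not taken from the study): quartiles use linear interpolation
between order statistics (statistics.quantiles, method="inclusive"); the
calibration share, seeds, and example data are hypothetical.
"""
import math
import random
import statistics


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def iqr(values):
    """Interquartile range Q3 - Q1 of the measured values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rpiq(measured, predicted):
    """Ratio of performance to interquartile distance: IQR_V / RMSE_V."""
    return iqr(measured) / rmse(measured, predicted)


def random_partitions(n_samples, n_partitions=5, calib_fraction=0.75, seed=0):
    """Yield (calibration_idx, validation_idx) for repeated random splits.

    calib_fraction is a hypothetical default, not the study's split ratio.
    """
    rng = random.Random(seed)
    n_calib = round(n_samples * calib_fraction)
    indices = list(range(n_samples))
    for _ in range(n_partitions):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # new random order for each partition
        yield sorted(shuffled[:n_calib]), sorted(shuffled[n_calib:])


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical SOC contents (g/kg) for 71 soils; NOT data from the study.
    soc = [rng.uniform(2.0, 12.0) for _ in range(71)]
    for k, (cal, val) in enumerate(random_partitions(len(soc)), start=1):
        iqr_v = iqr([soc[i] for i in val])
        print(f"partition {k}: n_cal={len(cal)}, n_val={len(val)}, IQR_V={iqr_v:.2f}")
```

Running the script prints a different IQR_V for each partition of the same hypothetical dataset, which is the effect the abstract points to: an RPIQ_V or RMSE_V from a single split says little without the IQR_V of that split.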

Publication
Soil Science Society of America Journal
Michael Vohland
Professor for Geoinformatics and Remote Sensing