Extraction of Plant Physiological Status from Hyperspectral Signatures Using Machine Learning Methods


The machine learning method random forest (RF) is applied to derive biophysical and structural vegetation parameters from hyperspectral signatures. Hyperspectral data are characterized, among other things, by their high dimensionality and autocorrelation. Common multivariate regression approaches, which usually include only a limited number of spectral indices as predictors, do not make full use of the available information. In contrast, machine learning methods such as RF are expected to be better suited to extracting information on vegetation status. First, vegetation parameters are extracted from hyperspectral signatures simulated with the radiative transfer model PROSAIL. Second, the transferability of these results to laboratory and field measurements is investigated. In situ observations of plant physiological parameters and corresponding spectra are gathered in the laboratory for summer barley (Hordeum vulgare). Field in situ measurements focus on winter crops over several growing seasons. Chlorophyll content (C_ab), leaf area index (LAI) and phenological growth stages (BBCH) are derived from simulated and measured spectra. RF performs very robustly and with very high accuracy on PROSAIL-simulated data. Furthermore, it is almost unaffected by noise and bias introduced into the data. When applied to laboratory data, the prediction accuracy is still good (C_ab: R² = 0.94; LAI: R² = 0.80; BBCH (growth stages of mono- and dicotyledonous plants): R² = 0.91), but not as high as for simulated spectra. The results transfer to field measurements, with prediction accuracies comparable to those for laboratory data (C_ab: R² = 0.89; LAI: R² = 0.89; BBCH: R² ≈ 0.8). For both field and laboratory data, the wavelengths selected for deriving plant physiological status from simulated and measured hyperspectral signatures mostly fall within the appropriate spectral regions: 700-800 nm when regressing on C_ab and 800-1300 nm when regressing on LAI.
The results suggest that the prediction accuracy of vegetation parameters using RF is not hampered by the high dimensionality of hyperspectral signatures (given preceding feature reduction). The wavelengths selected as important for prediction might, however, vary between the underlying datasets. Introducing changing environmental factors (soil, illumination conditions) has some detrimental effect, but measurement uncertainties and plant geometries appear to be the more important sources of error.
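
The workflow summarized above (regressing a vegetation parameter on full spectra with RF, scoring the prediction with R², and inspecting which wavelengths drive it) can be illustrated with a minimal sketch. It assumes NumPy and scikit-learn are available and is not the authors' implementation. The spectra are a toy stand-in, not PROSAIL output: the red-edge shift with chlorophyll, the LAI-dependent near-infrared plateau, the noise level, and all variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-in for simulated spectra (NOT PROSAIL): 400-2500 nm in 10 nm steps.
wavelengths = np.arange(400, 2501, 10)
n_samples = 600
cab = rng.uniform(10.0, 80.0, n_samples)  # chlorophyll content, arbitrary units
lai = rng.uniform(0.5, 7.0, n_samples)    # leaf area index

# Assumed toy response: chlorophyll shifts the red-edge position,
# LAI raises the near-infrared plateau.
red_edge = 700.0 + 0.5 * cab
plateau = 0.2 + 0.3 * (1.0 - np.exp(-0.5 * lai))
spectra = plateau[:, None] / (
    1.0 + np.exp(-(wavelengths[None, :] - red_edge[:, None]) / 15.0)
)
spectra += rng.normal(0.0, 0.005, spectra.shape)  # additive sensor noise

# Hold out a quarter of the samples to score the regression on unseen spectra.
X_train, X_test, y_train, y_test = train_test_split(
    spectra, cab, test_size=0.25, random_state=0
)

# Every band is a predictor; no spectral indices are computed beforehand.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importance per band; the largest marks the most-used wavelength.
top_wavelength = int(wavelengths[np.argmax(model.feature_importances_)])

print(f"R^2 = {r2:.2f}, most important band = {top_wavelength} nm")
```

In this toy setup the chlorophyll signal sits in the red edge by construction, so the most important band landing there only shows how the importance ranking is read; it does not reproduce the reported results. Swapping `cab` for `lai` as the target gives the corresponding LAI regression.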