Comparison of in situ mid-infrared spectroscopy (MIRS) with laboratory MIRS is required to demonstrate the accuracy of field-scale prediction of soil properties. Application of MIRS to investigate soil management questions must also be tested. Our objectives were therefore to determine i) the accuracy of lab vs. in situ calibrations using various numbers of local and/or regional soils for prediction of organic carbon (OC), total nitrogen (TN), clay and pH; ii) the effects of soil moisture content and variability on model performance for coarser and finer soils; and iii) whether the method of OC determination (dry combustion vs. MIRS estimation) affects the evaluation of tillage effects. Surface field MIRS measurements were made at three loess sites in Germany, each featuring three tillage treatments. Material (0–2 cm) was collected for lab MIRS measurements on dried and ground (< 0.2 mm) soil and for determination of OC, TN, clay and pH. Spectral Principal Component Analysis (PCA) was conducted and partial least squares regression models were created for several calibration strategies: 1) local calibrations trained with n = 40 or 20 soils and tested with n = 110 soils from the same site; 2) regional calibrations trained with n = 150 or 38 soils from two sites and validated with n = 110 soils from the third site; 3) regional calibrations trained with n = 150 or 38 soils from two sites plus n = 20 double-weighted or n = 10 quadruple-weighted "spiked" soils selected from the spectral PCA to be representative of the third site, and validated with n = 110 soils also from the third site. Spiking regional calibrations with local soils generally improved accuracy and decreased performance variability, though increasing the number of local soils typically gave diminishing marginal returns in accuracy. The first two principal components of the lab-MIRS PCA correlated with OC, TN, clay and pH, while the field-MIRS PCA was dominated by soil moisture effects.
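As an illustration of the spiked calibration strategy (3), the following minimal sketch assembles a calibration set from regional soils plus weighted local soils. It assumes that weighting is implemented by replicating each local sample; the text does not state how the weights were applied, and the function name and sample identifiers are hypothetical.

```python
def build_spiked_calibration(regional, local, weight):
    """Return the regional samples plus each local sample repeated `weight` times.

    Assumption: weighting by replication. The source does not specify
    the weighting mechanism.
    """
    if weight < 1:
        raise ValueError("weight must be a positive integer")
    return list(regional) + [sample for sample in local for _ in range(weight)]


# Hypothetical sample identifiers, for illustration only.
regional_ids = [f"regional_{i}" for i in range(38)]
local_double = [f"local_{i}" for i in range(20)]
local_quadruple = [f"local_{i}" for i in range(10)]

spiked_double = build_spiked_calibration(regional_ids, local_double, weight=2)
spiked_quadruple = build_spiked_calibration(regional_ids, local_quadruple, weight=4)
```

Under this assumption, 20 double-weighted and 10 quadruple-weighted local soils contribute the same number of local entries (40) to the calibration set, so the two variants differ only in how many distinct local soils must be analysed.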
Lab MIRS outperformed field MIRS for all models and properties. Lab MIRS n = 38 regional models were highly accurate for OC (ratio of prediction to interquartile distance (RPIQ) = 4.3) and TN (RPIQ = 6.7), and their estimates detected the same significant differences between tillage treatments as the analysis conducted with measured values; thus, small regional models can be considered optimal (balancing accuracy and workload). For field MIRS prediction of OC and TN, calibrations with 150 regional soils or with 38 regional plus 10 quadruple-weighted local soils achieved satisfactory accuracy (RPIQ ≥ 1.89). Although predicted changes in OC in response to tillage were more biased for field MIRS, agreement with measured effects was achieved with n = 40 local models or spiked regional models. Thus, the higher efficiency of field measurement is counterbalanced by a more arduous calibration process to achieve satisfactory accuracy. Accuracies for clay (RPIQ = 0.89–2.8) and pH (RPIQ = 0.60–3.2) were lower and more variable than those for OC and TN for both devices; thus, spiking calibrations and using more soils than for OC/TN calibrations are recommended. Soil moisture affected OC prediction more negatively than clay prediction. No simple trend was established for the performance of soil subsets with low, high or variable moisture content, but accuracy was most negatively affected by moisture at the site with the highest sand content.
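The RPIQ values reported above can be illustrated with a short sketch. It assumes the common definition of RPIQ as the interquartile range (Q3 − Q1) of the observed validation values divided by the root mean square error of prediction; the quartile method and the function name are choices made here, not details taken from the text.

```python
import math
import statistics


def rpiq(observed, predicted):
    """Interquartile range of observed values divided by prediction RMSE."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Quartiles of the observed values (inclusive method is an assumption).
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return (q3 - q1) / rmse


# Made-up numbers for illustration, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
example_rpiq = rpiq(observed, predicted)
```

Higher RPIQ means the prediction error is small relative to the spread of the observed property, which is why it allows accuracy to be compared across properties with different ranges and distributions.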