Characteristic band screening
Correlation analysis (r), competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), correlation analysis combined with CARS (r-CARS), and correlation analysis combined with SPA (r-SPA) were used to screen characteristic bands from the raw spectra of the magnetite grades (R) and from four transformed versions: envelope removal (CR), the first-order derivative transform (\(R'\)), the second-order derivative transform (\(R''\)), and the inverse logarithmic transformation (\(\log\left(\frac{1}{R}\right)\))21,22. The screening results are shown in Fig. 3. With the SPA algorithm, R, CR, \(R'\), \(R''\), and \(\log\left(\frac{1}{R}\right)\) yielded 23, 3, 10, 10, 10, and 10 characteristic wavelengths, respectively; with the CARS algorithm, the numbers of characteristic wavelengths were 10, 10, 10, 10, 5, and 10, respectively; and with the r-CARS algorithm, they were 23, 12, 6, 23, and 15, respectively. The screened wavelengths were used as independent variables to establish the magnetite grade inversion models.
Feature wavelengths selected by different algorithms, (a) raw spectral data, (b) envelope removal, (c) first-order derivative transform, (d) second-order derivative transform, (e) inverse logarithmic transform
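The transformations and the correlation-based screening (r) described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the base-10 logarithm, the forward-difference derivative, and the 0.8 correlation threshold are all assumptions made for illustration, and envelope removal, CARS, and SPA are not covered.

```python
import math

def first_derivative(wavelengths, reflectance):
    """Forward-difference approximation of dR/d(wavelength) (assumed scheme)."""
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(reflectance) - 1)
    ]

def inverse_log(reflectance):
    """log(1/R) per band; base 10 is an assumption, the text only says 'Log'."""
    return [math.log10(1.0 / r) for r in reflectance]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant band carries no linear information
    return sxy / math.sqrt(sxx * syy)

def screen_bands(spectra, grades, threshold=0.8):
    """Return indices of bands whose |r| with grade reaches the threshold.

    spectra: one list of band values per sample; threshold is hypothetical.
    """
    n_bands = len(spectra[0])
    return [
        j for j in range(n_bands)
        if abs(pearson_r([s[j] for s in spectra], grades)) >= threshold
    ]
```

Applying `first_derivative` twice gives a second-order derivative under the same assumptions; the bands returned by `screen_bands` would then play the role of the independent variables in the regression models.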
Modeling results
Because the experimental samples were of high purity with well-defined content, the measured hyperspectral reflectance was strongly linearly correlated with grade, and the sample set was small, partial least squares regression (PLSR) and multiple linear stepwise regression (MLSR) were selected. Model performance was evaluated with the coefficient of determination (\(R^{2}\)) and the root mean square error (RMSE)23,24.
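As a point of reference for the two evaluation indexes, the sketch below assumes the usual definitions, \(R^{2} = 1 - SS_{res}/SS_{tot}\) and RMSE as the square root of the mean squared residual. The text does not spell out its formulas, so these definitions and the function names are assumptions. Under this definition \(R^{2}\) becomes negative whenever a model predicts worse than simply using the mean of the measured values.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted grades."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Made-up values: predictions that reverse the trend give a negative R^2.
print(r_squared([20, 40, 60], [60, 40, 20]))  # -3.0
```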
Magnetite iron grade inversion based on PLSR modeling
The experimental samples were randomly divided into a modeling set and a validation set: 2/3 of the samples (12 samples) were used for modeling and 1/3 (6 samples) for validation. Model performance was evaluated by the coefficient of determination and the root mean square error. The characteristic wavelength data selected in Section 3.1 were used as the independent variables of PLSR and the magnetite grade data as the dependent variable to establish the PLSR model. The results are shown in Table 2.
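The random 2/3 to 1/3 division and a single-response PLSR fit can be sketched as follows. This is a minimal illustration using the NIPALS form of PLS1, not the software used in the study; the function names, the random seed, and the number of latent components are assumptions, since the text does not report them.

```python
import math
import random

def split_indices(n_samples, train_fraction=2 / 3, seed=0):
    """Shuffle sample indices and split them into modeling and validation sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train_fraction)
    return idx[:n_train], idx[n_train:]

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS. X: rows of band values, y: grades."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
    return x_mean, y_mean, W, P, Q

def pls1_predict(model, X):
    """Predict grades by replaying the deflation steps on new spectra."""
    x_mean, y_mean, W, P, Q = model
    preds = []
    for row in X:
        e = [v - m for v, m in zip(row, x_mean)]
        y_hat = y_mean
        for w, p_load, q in zip(W, P, Q):
            t = sum(a * b for a, b in zip(e, w))
            y_hat += q * t
            e = [a - t * b for a, b in zip(e, p_load)]
        preds.append(y_hat)
    return preds
```

With 18 samples, `split_indices(18)` returns 12 modeling indices and 6 validation indices. In practice the number of components would be chosen by cross-validation on the modeling set, which matters here because only 12 samples are available for fitting.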
For the modeling set, among the 25 preprocessing methods, the model constructed from the feature bands screened by the SPA algorithm after envelope removal (CR) predicted relatively poorly, with \(R^{2}\) of 0.6771 and RMSE of 9.63%. The PLSR models constructed under the remaining 24 preprocessing methods predicted better, with \(R^{2}\) between 0.9911 and 0.9998 and RMSE between 0.22% and 1.59%. Across all 25 preprocessing methods, the model constructed from the feature bands extracted by the CARS algorithm after the first-order derivative transform (\(R'\)) had the highest prediction accuracy on the modeling set, with \(R^{2}\) of 0.9998 and RMSE of 0.22%.
For the validation set, among the 25 preprocessing methods, the \(R''\) + SPA preprocessing gave a negative \(R^{2}\), which indicates that the model constructed under this preprocessing predicts extremely poorly. Among the remaining 24 preprocessing methods, CR + SPA, \(R'\) + r-SPA, \(R''\) + r, \(R''\) + CARS, \(R''\) + r-SPA, and \(R''\) + r-CARS predicted the validation set poorly, with \(R^{2}\) all lower than 0.9. The remaining 18 predicted the validation set better, with \(R^{2}\) all higher than 0.9 and RMSE between 0.95% and 6.68%. Across all 25 preprocessing methods, \(\log\left(\frac{1}{R}\right)\) + CARS and \(\log\left(\frac{1}{R}\right)\) + r-CARS gave the best validation results, and the two were identical: \(R^{2}\) of 0.9982 and RMSE of 0.95%.
A comprehensive comparison of the PLSR models constructed under \(R'\) + CARS, \(\log\left(\frac{1}{R}\right)\) + CARS, and \(\log\left(\frac{1}{R}\right)\) + r-CARS preprocessing, across both the modeling set and the validation set, shows that the models constructed from inverse-logarithm-transformed data (\(\log\left(\frac{1}{R}\right)\)) with feature bands extracted by the CARS and r-CARS methods are the most effective for the inversion of magnetite grade.
Inversion of magnetite iron grade based on MLSR modeling
The characteristic wavelength data selected in Section 3.1 were used as the independent variables and the magnetite grade data as the dependent variable to establish the multiple linear stepwise regression (MLSR) model. Stepwise regression removed unnecessary variables and retained only meaningful ones for modeling. The results are shown in Table 3.
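The idea of keeping only meaningful variables can be sketched with a greedy forward selection, a simplified stand-in for stepwise regression. Full stepwise procedures normally use significance tests for both entering and removing variables; the entry rule below (a minimum share of the total variance explained, `min_gain`) and the function name are assumptions made for illustration, and the text does not state which criteria were used.

```python
def forward_select(X, y, min_gain=1e-3):
    """Greedy forward selection of columns of X for a linear model of y.

    At each step the candidate that most reduces the residual sum of squares
    is added; selection stops when the best reduction falls below min_gain
    (as a fraction of the total sum of squares). Returns indices in entry order.
    """
    def center(values):
        m = sum(values) / len(values)
        return [v - m for v in values]

    p = len(X[0])
    cols = {j: center([row[j] for row in X]) for j in range(p)}
    resid = center(y)
    total = sum(v * v for v in resid)
    if total == 0:
        return []  # constant response, nothing to explain
    chosen = []
    while cols:
        best_j, best_gain = None, 0.0
        for j, c in cols.items():
            cc = sum(v * v for v in c)
            if cc < 1e-12:
                continue  # column already explained by the chosen ones
            gain = sum(a * b for a, b in zip(c, resid)) ** 2 / cc
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None or best_gain / total < min_gain:
            break
        q = cols.pop(best_j)
        qq = sum(v * v for v in q)
        coef = sum(a * b for a, b in zip(q, resid)) / qq
        resid = [a - coef * b for a, b in zip(resid, q)]
        # Orthogonalize the remaining candidates against the new variable.
        for j in list(cols):
            k = sum(a * b for a, b in zip(cols[j], q)) / qq
            cols[j] = [a - k * b for a, b in zip(cols[j], q)]
        chosen.append(best_j)
    return chosen
```

Because the selection is greedy, a band that is strongly correlated with the grade can enter before the bands that actually drive it, which is one reason stepwise procedures also include a removal step.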
For the modeling set, among the 25 preprocessing methods, CR + SPA, \(R''\) + CARS, and \(R''\) + SPA gave \(R^{2}\) lower than 0.9, indicating relatively poor predictive ability. The remaining 22 preprocessing methods predicted the modeling set relatively well, with \(R^{2}\) all higher than 0.9 (between 0.9210 and 1) and RMSE between 0.00001% and 4.76%. Comparing the multiple linear stepwise regression models across all 25 preprocessing methods, the best prediction on the modeling set was obtained when the raw magnetite-grade spectra were transformed by the first-order derivative (\(R'\)) and the characteristic bands were extracted by correlation analysis (r), with \(R^{2}\) of 1 and RMSE of 0.00001%.
For the validation set, among the 25 preprocessing methods, the \(R''\) + SPA preprocessing gave a negative \(R^{2}\), indicating that the multiple linear stepwise regression model constructed under this preprocessing predicts extremely poorly. Among the remaining 24 preprocessing methods, CR + SPA, \(R'\) + SPA, \(R'\) + CARS, \(R'\) + r-SPA, \(R''\) + r, \(R''\) + CARS, \(R''\) + r-SPA, and \(R''\) + r-CARS gave \(R^{2}\) lower than 0.9 on the validation set, indicating relatively poor predictive ability. The remaining 16 preprocessing methods all gave \(R^{2}\) higher than 0.9, with \(R^{2}\) between 0.9138 and 0.9977 and RMSE between 1.06% and 6.51%. Across all 25 preprocessing methods, \(\log\left(\frac{1}{R}\right)\) + r, \(\log\left(\frac{1}{R}\right)\) + SPA, and \(\log\left(\frac{1}{R}\right)\) + r-SPA gave the best validation results, and the three were identical: \(R^{2}\) of 0.9977 and RMSE of 1.06%.
A comprehensive comparison of the multiple linear stepwise regression models constructed under R + r, \(\log\left(\frac{1}{R}\right)\) + r, \(\log\left(\frac{1}{R}\right)\) + SPA, and \(\log\left(\frac{1}{R}\right)\) + r-SPA preprocessing, across both the modeling set and the validation set, shows that the models constructed from inverse-logarithm-transformed data (\(\log\left(\frac{1}{R}\right)\)) with characteristic bands extracted by the correlation analysis (r) method, the SPA algorithm, and the r-SPA method are the most effective for the inversion of magnetite grade.
Model optimization
The best-performing multiple linear stepwise regression (MLSR) and partial least squares regression (PLSR) models were selected for comparison. As shown in Table 4, both models have the same \(R^{2}\) and RMSE on the modeling set, but on the validation set the PLSR model has a higher \(R^{2}\) and a lower RMSE than the MLSR model. Therefore, the PLSR model constructed under \(\log\left(\frac{1}{R}\right)\) + CARS and \(\log\left(\frac{1}{R}\right)\) + r-CARS preprocessing is the best for the inversion of magnetite grade.
Since the inversion equations of the two best PLSR models are the same, one of them was selected for comparison with the MLSR model. The fit between the predicted and measured values of the two inversion models is shown in Figs. 4 and 5.

Predicted versus measured values of PLSR model for magnetite grade based on characteristic band fitting.

Predicted and measured values of the MLSR model of magnetite grade based on characteristic bands fitting.
Figs. 4 and 5 show that, for the model based on partial least squares regression, the line fitted to the predicted and measured values is \(y = -0.07327 + 1.00488x\) with \(R^{2}\) of 0.99703; for the model based on multiple linear stepwise regression, the fitted line is \(y = -0.04011 + 1.00432x\) with \(R^{2}\) of 0.99681. The fitting accuracy of the PLSR-based model is therefore higher, so it has the best effect on the inversion of magnetite grade.
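A fitted line of this kind can be obtained by ordinary least squares on the predicted and measured grades. The sketch below is illustrative only: the function name is hypothetical, and taking x as the measured value and y as the predicted value is an assumption, since the text does not state which is which.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

A slope close to 1 and an intercept close to 0, as in both reported equations, mean that predicted grades track measured grades almost one to one; \(R^{2}\) then separates the two models by how tightly the points cluster around the line.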