Characterisation of rocks in drill core logging affects all downstream exploration and exploitation decisions. However, in multi-year projects, evolving geological understanding, the involvement of several different geologists at different times, and time pressure lead to inconsistencies in drill core logs. To alleviate this issue, machine learning (ML) is proposed as a way to give exploration and mine planning a consistent and timely basis for informed decisions. However, ML model outputs are usually deterministic and come without justification. Thus, the objective of this thesis is to lay the methodological foundations for a future decision support tool that is grounded in geological knowledge and best practices in geodata science, can handle uncertainty, and can justify its decisions to the exploration geologist.
The Rävliden North Zn-Pb-Ag-Cu volcanogenic massive sulphide (VMS) deposit, located in the Palaeoproterozoic Skellefte district in the Fennoscandian shield of northern Sweden, serves as a case study for this thesis. The deposit is situated at the contact between the 1.89–1.88 Ga Skellefte group, composed of metavolcanic rocks, and the overlying 1.89–1.87 Ga Vargfors group, composed dominantly of metasiliciclastic rocks. The deposit is hosted in tremolite-rich calc-silicate rocks, chlorite and sericite schists, and graphitic phyllite. The schists originated from felsic protoliths, with three distinct rhyolitic precursors identified. Less altered andesite and dacite also occur in the stratigraphy. The deposit is interpreted as replacement-style mineralisation characterised by multiple alteration stages, including early calcitic alteration in permeable rhyolitic facies, overprinted by sericitic and chloritic alteration associated with massive sulphides. The early calcitic alteration phase is recognised by mass gains in Ca, whereas the later ore-proximal chloritic alteration is characterised by mass gains in Mg and Fe, alongside mass losses in K and Na.
Precursor characterisation and mass change calculations were based on whole-rock lithogeochemical samples. Mass change calculations are associated with several uncertainties, and to quantify these, a new method called propagated mass change error (PROMACE) has been developed. The propagated errors for Na, Mg, K, Ca and Fe are on average ±1.1 wt%, whereas for Si they are on average ±11.1 wt%. Notably, large mass gains are associated with larger errors than mass losses of the same magnitude.
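The asymmetry between gains and losses can be illustrated with a generic first-order error propagation through a single-precursor mass change calculation that uses an immobile element (here Zr) for normalisation. The sketch below is not the PROMACE method, whose formulation is not given in this abstract. The formula, the function name `mass_change_with_error`, the assumption of independent errors, and all concentrations and error values are illustrative assumptions.

```python
import math

def mass_change_with_error(c_alt, c_pre, zr_alt, zr_pre,
                           s_c_alt, s_c_pre, s_zr_alt, s_zr_pre):
    """Return (mass change, 1-sigma error) for one element, in wt%.

    Assumed model: reconstructed composition rc = c_alt * zr_pre / zr_alt,
    mass change dm = rc - c_pre, with independent errors on every input.
    All concentrations must be greater than zero.
    """
    rc = c_alt * zr_pre / zr_alt
    dm = rc - c_pre
    # Relative variances add for a product/quotient of independent terms
    rel_var = ((s_c_alt / c_alt) ** 2
               + (s_zr_pre / zr_pre) ** 2
               + (s_zr_alt / zr_alt) ** 2)
    # The error on rc scales with rc itself; the precursor error adds in quadrature
    sigma = math.sqrt(rc ** 2 * rel_var + s_c_pre ** 2)
    return dm, sigma

# Hypothetical values: precursor at 3 wt%, 5 % relative error on every input
gain = mass_change_with_error(c_alt=5.0, c_pre=3.0, zr_alt=100.0, zr_pre=100.0,
                              s_c_alt=0.25, s_c_pre=0.15,
                              s_zr_alt=5.0, s_zr_pre=5.0)
loss = mass_change_with_error(c_alt=1.0, c_pre=3.0, zr_alt=100.0, zr_pre=100.0,
                              s_c_alt=0.05, s_c_pre=0.15,
                              s_zr_alt=5.0, s_zr_pre=5.0)
print("gain: %+.2f wt%%, sigma %.3f" % gain)
print("loss: %+.2f wt%%, sigma %.3f" % loss)
```

Under these assumptions, the propagated error scales with the reconstructed composition, which is large for a gain and small for a loss, so a gain and a loss of equal magnitude do not carry the same error.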
Machine learning models were trained on X-ray fluorescence (XRF) drill core scan data, with 15 drill holes used as training data and three as test data. Random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP) models were tested on rock type classification. Intra-site generalisability is low for all models, with RF achieving the highest mean F1 test score of 0.476 ± 0.034. Intra-dataset generalisability is higher, with SVM yielding the highest mean F1 training score of 0.863 ± 0.015. Two cross-validation training strategies were tested, and it is shown that K-fold cross-validation is not representative of intra-site generalisability. For this level of generalisability, stratified group K-fold cross-validation is recommended.
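The difference between the two strategies lies in whether samples from the same group may appear in both the training and the validation fold. The following is a minimal sketch of group-wise fold assignment, assuming that the grouping variable is the drill hole (an assumption, as are the function name `group_k_fold` and the hole identifiers). It omits the stratification step of stratified group K-fold, so class proportions are not balanced across folds.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) so that no group is split across the two sets."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    if k > len(by_group):
        raise ValueError("k cannot exceed the number of groups")
    # Greedy balancing: place the largest groups first into the smallest fold
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_group[g])
    for f in range(k):
        val = sorted(folds[f])
        val_set = set(val)
        train = [i for i in range(len(groups)) if i not in val_set]
        yield train, val

# Hypothetical drill hole label for each scanned sample
holes = ["DH01"] * 4 + ["DH02"] * 3 + ["DH03"] * 3 + ["DH04"] * 2
splits = list(group_k_fold(holes, k=2))
for train, val in splits:
    print(sorted({holes[i] for i in val}), "held out;", len(train), "training samples")
```

Because every validation fold consists of drill holes that the model has not seen, the validation score approximates performance on a new hole at the same site, which plain K-fold cannot do when neighbouring samples from one hole land on both sides of the split.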
Different pre-processing variants were explored, and it is found that SVM benefits from a centred log-ratio transform (mean training F1 = 0.863 ± 0.015 with the transform and 0.720 ± 0.016 without). Notably, model-based imputation of missing values and data augmentation with a synthetic minority oversampling technique have limited benefit for any of the ML models.
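For reference, the centred log-ratio transform maps each component of a composition to its logarithm minus the mean logarithm of all components. A minimal sketch follows, assuming strictly positive inputs; the function name `clr` and the sample values are hypothetical, and the handling of zeros or missing values is left out.

```python
import math

def clr(composition):
    """Centred log-ratio transform of one compositional sample."""
    if any(v <= 0 for v in composition):
        raise ValueError("clr requires strictly positive components")
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical element concentrations for one XRF scan interval
sample = [60.0, 15.0, 5.0, 2.0]
transformed = clr(sample)
# Rescaling the composition (for example a unit change) leaves the result unchanged
rescaled = clr([v * 10 for v in sample])
print([round(v, 3) for v in transformed])
```

The transformed values sum to zero and are unaffected by a common scaling factor, which removes the closure effect of compositional data before distance-based models such as SVM are applied.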
A more detailed study of MLP models was conducted, in which their performance on precursor, alteration type and rock type classification was assessed. The F1 training score for precursor classification is 0.599 ± 0.223, whereas performance on alteration type and rock type is lower, with F1 scores of 0.431 ± 0.038 and 0.401 ± 0.081, respectively. Model uncertainty was assessed using Monte Carlo dropout (MCD), which showed higher uncertainty for alteration and rock type classification than for precursor classification. This, together with generally lower model performance, suggests that classification tasks that take alteration into account are more difficult for MLP models to resolve.
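The principle of MCD is to keep dropout active at prediction time, repeat the forward pass many times, and summarise the spread of the resulting class probabilities. The sketch below shows this for a tiny two-layer network, with the entropy of the mean prediction as the uncertainty measure. The weights, input, dropout rate, number of passes and the choice of entropy as the measure are all illustrative assumptions and do not describe the models used in the thesis.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward_with_dropout(x, w1, w2, p_drop, rng):
    """One stochastic pass: ReLU hidden layer, inverted dropout, softmax output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    keep = 1.0 - p_drop
    # Each hidden unit is kept with probability `keep` and rescaled accordingly
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return softmax(logits)

def mc_dropout_predict(x, w1, w2, p_drop=0.5, n_passes=200, seed=0):
    """Return the mean class probabilities and the entropy of that mean."""
    rng = random.Random(seed)
    passes = [forward_with_dropout(x, w1, w2, p_drop, rng) for _ in range(n_passes)]
    n_classes = len(passes[0])
    mean_probs = [sum(p[c] for p in passes) / n_passes for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return mean_probs, entropy

# Hypothetical weights: 2 inputs, 3 hidden units, 2 classes
w1 = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]
w2 = [[1.2, -0.4, 0.5], [-0.6, 0.9, 0.3]]
mean_probs, entropy = mc_dropout_predict([0.5, 1.0], w1, w2)
print([round(p, 3) for p in mean_probs], round(entropy, 3))
```

A higher entropy indicates that the stochastic passes disagree or that the averaged prediction is spread over several classes, which is the kind of signal a geologist could use to flag intervals for manual review.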
SHapley Additive exPlanations (SHAP) were used to justify model classifications by identifying the features that contribute most to predictions. The SHAP analysis reveals that the model relies on geologically intuitive features, such as Zr and Ti to distinguish precursors or Ca to identify calcitic alteration. However, in some cases, indirect feature-to-target relationships are learned. For such classes, model performance is also generally lower.
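SHAP builds on Shapley values, which distribute the difference between a prediction and a baseline prediction among the input features. The sketch below computes exact Shapley values by enumerating all feature subsets for a toy scoring function. It is intended only to show the attribution principle: practical SHAP explainers rely on approximations, and the scoring function, feature names, input and baseline values used here are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_score(features):
    """Hypothetical class score with one additive term and one interaction."""
    zr_ti, ca, k = features
    return 2.0 * zr_ti + 0.5 * ca * k

x = [1.5, 2.0, 1.0]         # hypothetical sample
baseline = [1.0, 0.0, 0.0]  # hypothetical reference sample
phi = shapley_values(toy_score, x, baseline)
print(dict(zip(["Zr/Ti", "Ca", "K"], [round(v, 3) for v in phi])))
```

The attributions sum to the difference between the score of the sample and the score of the baseline, and the interaction term is shared equally between the two features involved, which shows how a feature can receive credit through an indirect relationship.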
The results of this thesis show how geological knowledge can be structured and applied to best-practice procedures in model training and pre-processing. Additionally, uncertainty estimates from PROMACE for mass changes and from MCD for rock classification, together with SHAP for model interpretability, can provide geologists with transparent and justifiable outputs. On these foundations, future research should investigate how to incorporate model uncertainty and justification into a drill core logging workflow, thereby reducing inconsistencies in drill core logging.
Luleå: Luleå University of Technology, 2026.