The characterisation of rocks during drill core logging affects downstream exploration and extraction decisions. However, geological complexity, evolving geological understanding over multi-year projects, and time pressure can introduce inconsistencies into drill core logs. To address these challenges, machine learning (ML) has been proposed as a way to give geologists a consistent basis for timely, informed decisions, while also offering a potential route to deeper geological insight from drilling data. However, ML model outputs are usually deterministic and come without justification. The objective of this thesis is therefore to help lay the methodological foundations for a future decision support tool that is grounded in geological knowledge and best practices in geodata science, can handle uncertainty, and can justify its decisions to the exploration geologist.
To fulfil this objective, the Rävliden North Zn-Pb-Ag-Cu volcanogenic massive sulphide (VMS) deposit in the Palaeoproterozoic Skellefte district, Sweden, was chosen as a case study. The volcanic facies and alteration patterns of its host rocks were characterised to formulate a geological knowledge base, which was then used to inform the training of ML models. Methods for quantifying model uncertainty were assessed, together with methods from explainable artificial intelligence (XAI) for justifying model outputs.
The Rävliden North VMS deposit is hosted by tremolite-rich calc-silicate rocks, chlorite and sericite schists, and graphitic phyllite at the contact between the 1.89–1.88 Ga Skellefte group and the overlying 1.89–1.87 Ga Vargfors group. The precursors are a heterogeneous succession of volcaniclastic and coherent rhyolites, dacites and andesites. The VMS deposit formed by replacement-style mineralisation in carbonate-rich, porous volcaniclastic facies beneath an impermeable barrier of organic-rich mudstone. The carbonate-rich rocks resulted from early calcitic alteration associated with mass gains of CaO. During mineralisation, ore-proximal chloritic alteration was associated with mass gains in MgO and FeO, alongside mass losses in K2O and Na2O. To quantify the uncertainty of mass change calculations, a method called propagated mass change error (PROMACE) was developed. The propagated errors for Na2O, MgO, K2O, CaO and FeO were on average ±1.1 wt%. For SiO2 they were on average ±11.1 wt%. Notably, large mass gains were found to be associated with larger errors than mass losses of the same magnitude.
Random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP) models were applied to X-ray fluorescence drill core scan data to classify rock types. Drill core scans from 15 exploration holes were used as training data and scans from three holes as test data. Intra-site generalisability was low for all models, with RF achieving the highest mean F1 test score of 0.476 ± 0.034. Intra-dataset generalisability was higher, with SVM yielding the highest mean F1 training score of 0.863 ± 0.015. For training results to be representative of intra-site generalisability, stratified group K-fold cross-validation is recommended.
Different variants of pre-processing were explored. SVM benefited from a centred log-ratio transform (mean training F1 = 0.863 ± 0.015 with the transform and 0.720 ± 0.016 without). Notably, model-based imputation of missing values and data augmentation with a synthetic minority oversampling technique made little difference for any of the ML models.
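For readers unfamiliar with the centred log-ratio (CLR) transform, a minimal sketch in NumPy follows. It uses the standard definition, the logarithm of each component minus the mean log of its row. The function name `clr` and the example compositions are illustrative assumptions, and the sketch simply rejects non-positive values rather than assuming any particular treatment of zeros or missing data.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform applied to each row of a composition table."""
    x = np.asarray(compositions, dtype=float)
    if np.any(x <= 0):
        # Zero handling is left to the user; this sketch does not choose a strategy.
        raise ValueError("CLR requires strictly positive components")
    log_x = np.log(x)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical four-component compositions (arbitrary units).
compositions = np.array([[60.0, 15.0, 5.0, 20.0],
                         [45.0, 10.0, 25.0, 20.0]])
clr_values = clr(compositions)
```

Each transformed row sums to zero, and rescaling a row (for example from wt% to fractions) leaves the result unchanged, which is the property that makes the transform suited to closed compositional data.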
A more detailed study of MLP models was conducted in which performance on precursor, alteration type and rock type classification was assessed. The F1 training score for precursor classification was 0.599 ± 0.223, whereas performance on alteration type and rock type was lower, with F1 scores of 0.431 ± 0.038 and 0.401 ± 0.081, respectively. Model uncertainty was quantified with Monte Carlo dropout (MCD), which indicated higher uncertainty for alteration and rock type classification than for precursor classification, suggesting that classification tasks that take alteration into account are more challenging.
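The idea behind Monte Carlo dropout can be sketched as follows: dropout stays active at prediction time, the same input is passed through the network many times, and the spread of the resulting class probabilities is summarised. The sketch below is a NumPy illustration under stated assumptions. The weights are random and untrained, and the layer sizes, dropout rate, number of passes and the use of predictive entropy as the summary are all hypothetical choices, not the architecture or settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, untrained weights standing in for a fitted one-hidden-layer MLP.
n_features, n_hidden, n_classes = 6, 16, 3
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stochastic_forward(x, p_drop):
    """One forward pass with a fresh random dropout mask on the hidden layer."""
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden layer
    keep = rng.random(h.shape) >= p_drop    # units kept in this pass
    h = h * keep / (1.0 - p_drop)           # inverted dropout scaling
    return softmax(h @ W2 + b2)

def mc_dropout_predict(x, n_passes=200, p_drop=0.2):
    """Mean class probabilities and predictive entropy over stochastic passes."""
    probs = np.stack([stochastic_forward(x, p_drop) for _ in range(n_passes)])
    mean_probs = probs.mean(axis=0)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy

x = rng.normal(size=(4, n_features))        # four placeholder samples
mean_probs, entropy = mc_dropout_predict(x)
```

A sample whose averaged probabilities are spread across several classes receives a high entropy, which is one way such an approach can flag intervals where a prediction deserves closer inspection by the geologist.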
SHapley Additive exPlanations (SHAP) were used to justify MLP predictions. This revealed that the model relies on geologically meaningful features, such as Zr and Ti to distinguish precursors or Ca to identify calcitic alteration. However, in some cases, indirect feature-to-target relationships were learned. Model performance was generally lower for such classes.
The results of this thesis show how geological knowledge can be structured and applied together with best practice procedures in model training and pre-processing. Additionally, uncertainty quantification with PROMACE for mass change calculations and MCD for rock classification, together with SHAP for model interpretability, can provide geologists with transparent and justifiable outputs.
Luleå: Luleå University of Technology, 2026.