Technical Language Supervision and Agentic AI for Condition Monitoring
Löwenmark, Karl
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
ORCID iD: 0000-0002-0188-9337
2025 (English)
Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Språkteknologi och agenter för AI-assisterad diagnostik av maskinskador (Swedish)
Abstract [en]

Recent advances in reasoning artificial intelligence (AI) agents powered by language models (LMs) and custom tools open new opportunities for AI-assisted condition monitoring (CM) involving unlabelled but annotated, complex industrial data. Technical language annotations written by domain experts include unstructured information regarding machine condition, maintenance actions, and tacit knowledge. This thesis investigates how LMs and agents can improve human-machine interaction and facilitate training of AI models on CM industry data using annotations as surrogate ground-truth labels. The main contribution is the introduction of technical language supervision to address the long-standing gap between idealised labelled lab datasets and complex unlabelled field data, and the development of AI agents for condition monitoring including a multimodal vector store with domain-specific retrieval and generation modules.

The main contributions are: (1) the introduction and implementation of technical language supervision (TLS) for CM inspired by contrastive learning on images with natural language captions, including a literature survey and implementations of zero-shot fault diagnosis on unlabelled industry data; (2) the creation of a method to improve technical language processing by augmenting out-of-vocabulary technical words with natural language descriptions and evaluating semantic similarities of technical language representations; (3) the development of a human-centric method for language-based fault classification using visualisation and clustering; (4) the development of an open-source chatbot agent as well as a published annotated industry dataset; (5) an investigation of specific CM data processing challenges, such as different data modalities, time delays between annotations and signal properties, component-specific noise and feature levels, and nonlinear fault development over time in different data sources.

The results from the studies indicate that annotations are a viable substitute for labels when processed with regard to the technical language therein, and that integrating LM-powered agents on annotated CM data facilitates answering real industry queries more efficiently than current systems. Augmenting out-of-vocabulary technical words with natural language descriptions improves LM performance, as demonstrated in initial work on classifying technical fault descriptions with the BERT LM, where accuracy improved from 88.3% to 94.2%. In the industrial datasets analysed, gathered from Kraftliner paper machines over four years, the most common faults and alarms were cable and sensor faults, while bearing faults were the most common causes of follow-up analysis and maintenance stops. Clustering CM data based on both signal and language properties indicates that cable and sensor faults can be differentiated from bearing faults with an F1-score of 92.6%, which suggests that the high number of false and redundant alarms that require follow-up analysis can be reduced. Finally, the usefulness of the developed agents was evaluated in typical CM workflows based on response usefulness and truthfulness, and the results indicate that AI agents with custom tools are capable of generating historic insight and meaningful fault descriptions. In particular, using a novel multimodal CM retrieval-augmented generation approach with a custom CM vector store, the false alarm rate for sensor and cable faults is shown to be lowered from over 80% in current workflows to under 30% with the proposed system.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2025.
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords [en]
Natural language processing, Technical language processing, technical language supervision, natural language supervision, intelligent fault diagnosis, condition monitoring, predictive maintenance, prognostics and health management, large language models, agentic AI, retrieval augmented generation, contrastive learning, weak supervision, self-supervision
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-112326
ISBN: 978-91-8048-811-2 (print)
ISBN: 978-91-8048-812-9 (electronic)
OAI: oai:DiVA.org:ltu-112326
DiVA, id: diva2:1950819
Public defence
2025-06-03, C305, Luleå University of Technology, Luleå, 10:00 (English)
Opponent
Supervisors
Funder
Vinnova, 364160
Available from: 2025-04-09 Created: 2025-04-09 Last updated: 2025-04-10
Bibliographically approved
List of papers
1. Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study
2022 (English)
In: Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022 / [ed] Phuc Do, Gabriel Michau, Cordelia Ezhilarasu, PHM Society, 2022, Vol. 7, p. 306-314
Conference paper, Published paper (Refereed)
Abstract [en]

Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic due to out-of-vocabulary (OOV) technical terms, resulting in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on word embeddings with the objective of recreating the output from a keyword-based annotation classifier. The K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters with technical language substitution, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that the substitution of OOV technical terms can improve the representation accuracy of the embeddings of the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
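
The substitution step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the glossary entries, the function name `substitute_technical_terms`, and the whole-word, case-sensitive matching rule are all assumptions made for illustration. The substituted text would then be passed to a language model for embedding, which is omitted here.

```python
import re

# Hypothetical glossary: terms and descriptions are illustrative assumptions,
# not taken from the paper.
GLOSSARY = {
    "BPFO": "outer ring bearing fault frequency",
    "env": "envelope spectrum",
}


def substitute_technical_terms(text, glossary):
    """Replace whole-word glossary terms with natural language descriptions."""
    if not glossary:
        return text
    # Try longer terms first so overlapping terms resolve predictably.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # A single pass: inserted descriptions are not themselves re-substituted.
    return pattern.sub(lambda m: glossary[m.group(1)], text)


annotation = "High BPFO in env"
print(substitute_technical_terms(annotation, GLOSSARY))
```

Whole-word matching keeps a short term such as `env` from rewriting longer words like `envelope`; a real glossary would likely also need to handle case and inflection.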

Place, publisher, year, edition, pages
PHM Society, 2022
Series
PHM Society European Conference ; 1
Keywords
Technical Language Processing, Natural Language Processing, Condition Monitoring, Intelligent Fault Diagnosis
National Category
Natural Language Processing
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
urn:nbn:se:ltu:diva-95407 (URN)
10.36001/phme.2022.v7i1.3356 (DOI)
978-1-936263-36-3 (ISBN)
Conference
7th European Conference of the Prognostics and Health Management Society 2022 (PHME22), July 6-8 2022, Turin, Italy
Projects
KnowIT FAST
Note

Funder: Process industrial IT and Automation (PiIA), (2019-02533)

Available from: 2023-01-27 Created: 2023-01-27 Last updated: 2025-04-09
Bibliographically approved
2. Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
2022 (English)
In: International Journal of Prognostics and Health Management, E-ISSN 2153-2648, Vol. 13, no 2
Article in journal (Refereed) Published
Abstract [en]

In the process industry, condition monitoring systems with automated fault diagnosis methods assist human experts and thereby improve maintenance efficiency, process sustainability, and workplace safety. Improving the automated fault diagnosis methods using data and machine learning-based models is a central aspect of intelligent fault diagnosis (IFD). A major challenge in IFD is to develop realistic datasets with accurate labels needed to train and validate models, and to transfer models trained with labelled lab data to heterogeneous process industry environments. However, fault descriptions and work orders written by domain experts are increasingly digitised in modern condition monitoring systems, for example in the context of rotating equipment monitoring. Thus, domain-specific knowledge about fault characteristics and severities exists as technical language annotations in industrial datasets. Furthermore, recent advances in natural language processing enable weakly supervised model optimisation using natural language annotations, most notably in the form of natural language supervision (NLS). This creates a timely opportunity to develop technical language supervision (TLS) solutions for IFD systems grounded in industrial data, for example as a complement to pre-training with lab data to address problems like overfitting and inaccurate out-of-sample generalisation. We survey the literature and identify a considerable improvement in the maturity of NLS over the last two years, facilitating applications beyond natural language; a rapid development of weak supervision methods; and transfer learning as a current trend in IFD which can benefit from these developments. Finally, we describe a general framework for TLS and implement a TLS case study based on Sentence-BERT and contrastive learning based zero-shot inference on annotated industry data.
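
The zero-shot inference mentioned at the end of this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's pipeline: it assumes signal and text embeddings already live in a shared space (as contrastive training would aim to produce), and the three-dimensional vectors, class names, and function names are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(signal_embedding, class_text_embeddings):
    """Pick the class whose text embedding is closest to the signal embedding."""
    scores = {
        label: cosine_similarity(signal_embedding, embedding)
        for label, embedding in class_text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (assumed values); real ones would come from trained
# signal and text encoders.
CLASS_TEXT_EMBEDDINGS = {
    "bearing fault": [0.9, 0.1, 0.0],
    "cable or sensor fault": [0.1, 0.9, 0.1],
}

label, scores = zero_shot_classify([0.8, 0.2, 0.1], CLASS_TEXT_EMBEDDINGS)
print(label)
```

No class-specific training is needed at inference time: adding a fault class only requires embedding a new text description.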

Place, publisher, year, edition, pages
Prognostics and Health Management Society, 2022
Keywords
Intelligent Fault Diagnosis, Natural Language Supervision, Technical Language Processing, Condition Monitoring, Technical Language Supervision, Natural Language Processing
National Category
Computer Systems
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
urn:nbn:se:ltu:diva-93815 (URN)
10.36001/ijphm.2022.v13i2.3137 (DOI)
000879222400001 ()
2-s2.0-85140639510 (Scopus ID)
Funder
Swedish Energy Agency; Vinnova; Swedish Research Council Formas
Note

Validated; 2022; Level 2; 2022-11-09 (sofila);

Funder: Strategic innovation program Process industrial IT and Automation (PiIA) (grant no. 2019-02533)

Available from: 2022-11-09 Created: 2022-11-09 Last updated: 2025-04-09
Bibliographically approved
3. Labelling of Annotated Condition Monitoring Data Through Technical Language Processing
2023 (English)
In: Proceedings of the Annual Conference of the PHM Society 2023 / [ed] Chetan S. Kulkarni; Indranil Roychoudhury, The Prognostics and Health Management Society, 2023
Conference paper, Published paper (Refereed)
Abstract [en]

We propose a novel approach to facilitate supervised fault diagnosis on unlabelled but annotated industry datasets using human-centric technical language processing and weak supervision. Fault diagnosis through Condition Monitoring (CM) is vital for high safety and resource efficiency in the green transition and digital transformation of the process industry. Learning-based Intelligent Fault Diagnosis (IFD) methods are required to automate maintenance decisions and improve decision support for analysts. A major challenge is the lack of labelled industry datasets, limiting supervised IFD research to lab datasets. However, features learned from lab environments generalise poorly to field environments due to different signal distributions, artificial induction or acceleration of lab faults, and lab set-up properties such as average frequency profiles affecting learned features. In this study, we investigate how the unstructured free-text fault annotations and maintenance work orders that are present in many industrial CM systems can be used for IFD through technical language processing, based on recent advances in natural language supervision. We introduce two distinct pipelines, one based on contrastive pre-training on large datasets, and one based on a small-data human-centric approach with unsupervised clustering methods. Finally, we showcase one example of the small-data fault classification implementation on a CM industry dataset with a SentenceBERT language model, kMeans clustering, and conventional signal processing methods. Fault class imbalance and time-shift uncertainty are overcome with weak supervision through aggregates of features, and human-centric clustering is used to integrate technical knowledge with the annotation-based fault classes. We show that our model can separate cable and sensor fault recordings from bearing-related fault recordings with an F1-score of 93. To our knowledge, this is the first system to classify faults in field industry CM data based only on associated unstructured fault annotations.
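
For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall for a chosen positive class. The sketch below shows the standard computation; the label names and toy predictions are assumptions for illustration and do not reproduce the paper's data or result.

```python
def f1_score(y_true, y_pred, positive):
    """F1-score for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed): one cable fault is misclassified as a bearing fault.
y_true = ["cable", "cable", "bearing", "bearing"]
y_pred = ["cable", "bearing", "bearing", "bearing"]
print(f1_score(y_true, y_pred, positive="cable"))
```

Because F1 ignores true negatives, it is less flattering than accuracy when one fault class dominates, which fits the class imbalance the abstract describes.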

Place, publisher, year, edition, pages
The Prognostics and Health Management Society, 2023
Series
Annual Conference of the PHM Society (PHM), ISSN 2325-0178
Keywords
Intelligent Fault Diagnosis, Technical Language Processing, Natural Language Processing, Condition Monitoring, Technical Language Supervision, Natural Language Supervision, Prognostics and Health Management, Industry Data
National Category
Natural Language Processing
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
urn:nbn:se:ltu:diva-95406 (URN)
10.36001/phmconf.2023.v15i1.3507 (DOI)
2-s2.0-85178380051 (Scopus ID)
Conference
15th Annual Conference of the Prognostics and Health Management Society, Salt Lake City, Utah, USA, October 28 - November 2, 2023
Projects
KnowIT FAST
Funder
Vinnova; Swedish Research Council Formas; Swedish Energy Agency
Note

Funder: Process industrial IT and Automation (PiIA) (2019-02533);

Full text license: CC BY;

This paper has previously appeared as a manuscript in a thesis;

ISBN for host publication: 978-1-936263-29-5

Available from: 2023-01-27 Created: 2023-01-27 Last updated: 2025-04-09
Bibliographically approved
4. Towards Agentic Predictive Maintenance through Multimodal Industrial Retrieval Augmented Generation
(English)
Manuscript (preprint) (Other academic)
National Category
Natural Language Processing; Artificial Intelligence; Computer Sciences; Human Computer Interaction
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-112324 (URN)
Available from: 2025-04-09 Created: 2025-04-09 Last updated: 2025-04-09
Bibliographically approved
5. Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden
2023 (English)
Data set, Primary data
Alternative title [sv]
Dataset med tillståndsövervakningsvibrationsdata annoterat med tekniskt språk, från pappersmaskinsindustri i norra Sverige
Abstract [en]

Labelled industry datasets are one of the most valuable assets in prognostics and health management (PHM) research. However, creating labelled industry datasets is both difficult and expensive, making publicly available industry datasets rare at best, in particular labelled datasets.

Recent studies have showcased that industry annotations can be used to train artificial intelligence models directly on industry data ( https://doi.org/10.36001/ijphm.2022.v13i2.3137 , https://doi.org/10.36001/phmconf.2023.v15i1.3507 ), but while many industry datasets also contain text descriptions or logbooks in the form of annotations and maintenance work orders, few, if any, are publicly available.

Therefore, we release a dataset consisting of annotated signal data from two large (80 m × 10 m × 10 m) paper machines, from a Kraftliner production company in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish, by on-site Swedish experts, and the signals consist primarily of accelerometer vibration measurements from the two machines.

The dataset is structured as a Pandas dataframe and serialized as a pickle (.pkl) file and a JSON (.json) file. The first column (‘id’) is the ID of the samples; the second column (‘Spectra’) contains the fast Fourier transform and envelope-transformed vibration signals; the third column (‘Notes’) contains the associated annotations, mapped so that each annotation is associated with all signals from ten days before the annotation date, up to the annotation date; and finally the fourth column (‘Embeddings’) contains pre-computed embeddings using Swedish SentenceBERT. Each row corresponds to a vibration measurement sample, though there is no distinction in this data between which sensor or machine part each measurement is from.
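
The ten-day mapping between annotations and signals described above can be sketched with the standard library alone. This is an illustration under assumptions, not the dataset's actual preprocessing code: the function name, the example dates, and the choice to treat both ends of the window as inclusive are hypothetical.

```python
from datetime import date, timedelta


def signals_for_annotation(annotation_date, signal_dates, window_days=10):
    """Return indices of signals recorded within window_days before the
    annotation date, up to and including the annotation date itself.
    Inclusive boundaries are an assumption; the record does not specify them."""
    window_start = annotation_date - timedelta(days=window_days)
    return [
        index
        for index, recorded in enumerate(signal_dates)
        if window_start <= recorded <= annotation_date
    ]


# Hypothetical measurement dates and one annotation date.
signal_dates = [
    date(2021, 3, 1),
    date(2021, 3, 5),
    date(2021, 3, 12),
    date(2021, 3, 20),
]
matched = signals_for_annotation(date(2021, 3, 12), signal_dates)
print(matched)
```

One consequence of this mapping is that a single annotation labels several signals, some recorded before the fault was noticed, which is one source of the time-shift uncertainty discussed in the related papers.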

Abstract [sv]

Labelled industry datasets are among the most valuable assets available in prognostics and condition monitoring research. Creating labelled datasets is both difficult and expensive, which makes publicly available industry datasets rare, especially labelled ones. Studies have nevertheless shown that industry annotations can be used to train AI models directly on industry data ( https://doi.org/10.36001/ijphm.2022.v13i2.3137 , https://doi.org/10.36001/phmconf.2023.v15i1.3507 ), but although many industry datasets contain the necessary texts, few, if any, such datasets are publicly available.

Therefore, we release a dataset containing annotated signal data from two large (80 m × 10 m × 10 m) paper machines at a paper mill in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish by on-site experts, and the signals consist primarily of accelerometer vibration measurements from the two machines.

The dataset consists of one year of annotated vibration sensor measurements from two paper machines, structured as a Pandas dataframe and serialized as a pickle file (.pkl) and a JSON file (.json). The first column (’id’) is the ID of each sample; the second column (’Spectra’) contains the fast Fourier transformed and envelope-transformed vibration signals; the third column (’Notes’) contains the associated annotations, mapped so that each annotation is linked to all signals from ten days before the annotation date up to the annotation date; and finally the fourth column (’Embeddings’) contains pre-computed text representations from Swedish SentenceBERT. Each row corresponds to a vibration measurement sample, although the data does not distinguish which sensor or machine part each measurement comes from.

Place, publisher, year
Svensk nationell datatjänst (SND), 2023
Keywords
Paper industry, Condition monitoring, Language technology, Signal processing, Fault detection, Natural language processing, Technical language processing, Technical language supervision, Natural language supervision, Fault diagnosis, Intelligent fault diagnosis, Prognostics and health management
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
urn:nbn:se:ltu:diva-103146 (URN)
10.5878/z34p-qj52 (DOI)
Funder
Vinnova, 2019-02533
Note

CC BY-NC 4.0 

Available from: 2023-12-01 Created: 2023-12-01 Last updated: 2025-04-09
Bibliographically approved
6. Integration of Large Language Models into Control Systems for Shared Appliances
2024 (English)
In: AMBIENT 2024: The Fourteenth International Conference on Ambient Computing, Applications, Services and Technologies / [ed] Hiroshi Tanaka, Lorena Parra Boronoat, International Academy, Research and Industry Association (IARIA), 2024, p. 6-11
Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
International Academy, Research and Industry Association (IARIA), 2024
Series
AMBIENT, ISSN 2326-9324
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-112323 (URN)
978-1-68558-185-5 (ISBN)
Conference
AMBIENT 2024: The Fourteenth International Conference on Ambient Computing, Applications, Services and Technologies, Venice, Italy, September 29 - October 3, 2024
Available from: 2025-04-09 Created: 2025-04-09 Last updated: 2025-04-10
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Löwenmark, Karl
