Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-0188-9337
SKF Research & Technology Development, Meidoornkade 14, 3992 AE, Houten, P.O. Box 2350, 3430 DT, Nieuwegein, The Netherlands.
SKF Condition Monitoring Center Luleå AB, 977 75, Luleå, Sweden.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: International Journal of Prognostics and Health Management, E-ISSN 2153-2648, Vol. 13, no. 2. Article in journal (Refereed), Published
Abstract [en]

In the process industry, condition monitoring systems with automated fault diagnosis methods assist human experts and thereby improve maintenance efficiency, process sustainability, and workplace safety. Improving automated fault diagnosis methods using data and machine-learning-based models is a central aspect of intelligent fault diagnosis (IFD). A major challenge in IFD is to develop realistic datasets with the accurate labels needed to train and validate models, and to transfer models trained with labeled lab data to heterogeneous process industry environments. However, fault descriptions and work orders written by domain experts are increasingly digitised in modern condition monitoring systems, for example in the context of rotating equipment monitoring. Thus, domain-specific knowledge about fault characteristics and severities exists as technical language annotations in industrial datasets. Furthermore, recent advances in natural language processing enable weakly supervised model optimisation using natural language annotations, most notably in the form of natural language supervision (NLS). This creates a timely opportunity to develop technical language supervision (TLS) solutions for IFD systems grounded in industrial data, for example as a complement to pre-training with lab data to address problems like overfitting and inaccurate out-of-sample generalisation. We survey the literature and identify a considerable improvement in the maturity of NLS over the last two years, facilitating applications beyond natural language; a rapid development of weak supervision methods; and transfer learning as a current trend in IFD that can benefit from these developments. Finally, we describe a general framework for TLS and implement a TLS case study based on Sentence-BERT and contrastive-learning-based zero-shot inference on annotated industry data.

Place, publisher, year, edition, pages
Prognostics and Health Management Society, 2022. Vol. 13, no. 2
Keywords [en]
Intelligent Fault Diagnosis, Natural Language Supervision, Technical Language Processing, Condition Monitoring, Technical Language Supervision, Natural Language Processing
National Category
Computer Systems
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
URN: urn:nbn:se:ltu:diva-93815
DOI: 10.36001/ijphm.2022.v13i2.3137
ISI: 000879222400001
Scopus ID: 2-s2.0-85140639510
OAI: oai:DiVA.org:ltu-93815
DiVA, id: diva2:1709631
Funder
Swedish Energy Agency; Vinnova; Swedish Research Council Formas
Note

Validated; 2022; Level 2; 2022-11-09 (sofila);

Funder: Strategic innovation program Process Industrial IT and Automation (PiIA) (grant no. 2019-02533)

Available from: 2022-11-09. Created: 2022-11-09. Last updated: 2025-04-09. Bibliographically approved
In thesis
1. Technical Language Supervision for Intelligent Fault Diagnosis
2023 (English). Licentiate thesis, comprehensive summary (Other academic)
Alternative title [sv]
Språkteknologi för intelligent diagnostik av maskinskador
Abstract [en]

Condition Monitoring (CM) is widely used in industry to meet sustainability, safety, and equipment efficiency requirements. Intelligent Fault Diagnosis (IFD) research focuses on automating CM data analysis tasks, to detect and prevent machine faults, and provide decision support. IFD enables trained analysts to focus their efforts on advanced tasks such as fault severity estimation and preventive maintenance optimization, instead of performing routine tasks. Industry datasets are rarely labelled, and IFD models are therefore typically trained on labelled data generated in laboratory environments with artificial or accelerated fault development. In the process industry, fault characteristics are often context-dependent and difficult to predict in sufficient detail due to the heterogeneous environment of machine parts. Furthermore, fault development is non-linear and measurements are subject to varying background noise. Thus, IFD models trained on lab data are not expected to transfer well to process industry environments, and require on-site pre-training or fine-tuning to facilitate accurate and advanced fault diagnosis. While ground truth labels are absent in industrial CM datasets, analysts sometimes write annotations of faults and maintenance work orders that describe the fault characteristics and required actions. These annotations deviate from typical natural language due to the technical language used, characterised by a high frequency of technical terms and abbreviations. Recent advances in natural language processing have enabled simultaneous learning from unlabelled pairs of images and captions through Natural Language Supervision (NLS). In this thesis, opportunities to enable weakly supervised IFD using annotated but otherwise unlabelled CM data are investigated. This thesis proposes novel machine learning methods for joint representation learning for IFD directly on annotated CM data. 
The main contributions are: (1) the introduction and implementation of technical language supervision to merge advances in natural language processing and intelligent fault diagnosis, including a literature survey; (2) the creation of a method to improve technical language processing by substituting out-of-vocabulary technical words with natural language descriptions, and to evaluate language model performance without explicit labels or downstream tasks; (3) the creation of a method for small-data language-based fault classification using human-centric visualisation and clustering. Preliminary results for sensor and cable fault detection show an accuracy of over 90%. These results imply a considerable increase in the value of annotated CM datasets through the implementation of IFD models directly on industry data, e.g. for improving decision support to avoid unplanned stops.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2023
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
Keywords
Technical Language Processing, Natural Language Processing, Intelligent Fault Diagnosis, Natural Language Supervision, Condition Monitoring
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-95414 (URN)
978-91-8048-254-7 (ISBN)
978-91-8048-255-4 (ISBN)
Presentation
2023-03-24, C305, Laboratorievägen 14, Luleå tekniska universitet, Luleå, 09:00 (English)
Projects
KnowIT FAST
Funder
Vinnova, 20190253
Available from: 2023-01-30. Created: 2023-01-27. Last updated: 2025-02-07. Bibliographically approved
2. Technical Language Supervision and Agentic AI for Condition Monitoring
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Språkteknologi och agenter för AI-assisterad diagnostik av maskinskador
Abstract [en]

Recent advances in reasoning artificial intelligence (AI) agents powered by language models (LMs) and custom tools open new opportunities for AI-assisted condition monitoring (CM) involving unlabelled but annotated, complex industrial data. Technical language annotations written by domain experts include unstructured information regarding machine condition, maintenance actions, and tacit knowledge. This thesis investigates how LMs and agents can improve human-machine interaction and facilitate training of AI models on CM industry data using annotations as surrogate ground-truth labels. The main contribution is the introduction of technical language supervision to address the long-standing gap between idealised labelled lab datasets and complex unlabelled field data, and the development of AI agents for condition monitoring, including a multimodal vector store with domain-specific retrieval and generation modules.

The main contributions are: (1) the introduction and implementation of technical language supervision (TLS) for CM inspired by contrastive learning on images with natural language captions, including a literature survey and implementations of zero-shot fault diagnosis on unlabelled industry data; (2) the creation of a method to improve technical language processing by augmenting out-of-vocabulary technical words with natural language descriptions and evaluating semantic similarities of technical language representations; (3) the development of a human-centric method for language-based fault classification using visualisation and clustering; (4) the development of an open-source chatbot agent as well as a published annotated industry dataset; (5) an investigation of specific CM data processing challenges, such as different data modalities, time delays between annotations and signal properties, component-specific noise and feature levels, and nonlinear fault development over time in different data sources.

The results from the studies indicate that annotations are a viable substitute for labels when processed with regard to the technical language therein, and that integrating LM-powered agents on annotated CM data facilitates answering real industry queries more efficiently than current systems. Augmenting out-of-vocabulary technical words with natural language descriptions improves LM performance, as demonstrated in initial work on classifying technical fault descriptions with the BERT LM, where accuracy improved from 88.3% to 94.2%. In the industrial datasets analysed, gathered from Kraftliner paper machines over four years, the most common faults and alarms were cable and sensor faults, while bearing faults were the most common causes of follow-up analysis and maintenance stops. Clustering CM data based on both signal and language properties indicates that cable and sensor faults can be differentiated from bearing faults with an F1-score of 92.6%, which suggests that the high number of false and redundant alarms that require follow-up analysis can be reduced. Finally, the developed agents were evaluated in typical CM workflows in terms of response usefulness and truthfulness, and the results indicate that AI agents with custom tools are capable of generating historical insight and meaningful fault descriptions. In particular, using a novel multimodal CM retrieval-augmented generation approach with a custom CM vector store, the false alarm rate for sensor and cable faults is shown to be lowered from over 80% in current workflows to under 30% with the proposed system.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2025
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords
Natural language processing, Technical language processing, technical language supervision, natural language supervision, intelligent fault diagnosis, condition monitoring, predictive maintenance, prognostics and health management, large language models, agentic AI, retrieval augmented generation, contrastive learning, weak supervision, self-supervision
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-112326 (URN)
978-91-8048-811-2 (ISBN)
978-91-8048-812-9 (ISBN)
Public defence
2025-06-03, C305, Luleå University of Technology, Luleå, 10:00 (English)
Funder
Vinnova, 364160
Available from: 2025-04-09. Created: 2025-04-09. Last updated: 2025-04-10. Bibliographically approved

Authors
Löwenmark, Karl; Schnabel, Stephan; Liwicki, Marcus; Sandin, Fredrik

Related dataset
Löwenmark, K., Sandin, F., Liwicki, M. & Schnabel, S. (2023). Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden. Svensk nationell datatjänst (SND)
