2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Språkteknologi och agenter för AI-assisterad diagnostik av maskinskador [Language technology and agents for AI-assisted diagnosis of machine faults]
Abstract [en]
Recent advances in reasoning artificial intelligence (AI) agents powered by language models (LMs) and custom tools open new opportunities for AI-assisted condition monitoring (CM) involving unlabelled but annotated, complex industrial data. Technical language annotations written by domain experts include unstructured information regarding machine condition, maintenance actions, and tacit knowledge. This thesis investigates how LMs and agents can improve human-machine interaction and facilitate training of AI models on CM industry data using annotations as surrogate ground-truth labels. The main contribution is the introduction of technical language supervision to address the long-standing gap between idealised labelled lab datasets and complex unlabelled field data, and the development of AI agents for condition monitoring, including a multimodal vector store with domain-specific retrieval and generation modules.
In detail, the contributions are: (1) the introduction and implementation of technical language supervision (TLS) for CM inspired by contrastive learning on images with natural language captions, including a literature survey and implementations of zero-shot fault diagnosis on unlabelled industry data; (2) the creation of a method to improve technical language processing by augmenting out-of-vocabulary technical words with natural language descriptions and evaluating semantic similarities of technical language representations; (3) the development of a human-centric method for language-based fault classification using visualisation and clustering; (4) the development of an open-source chatbot agent as well as a published annotated industry dataset; (5) an investigation of specific CM data processing challenges, such as different data modalities, time delays between annotations and signal properties, component-specific noise and feature levels, and nonlinear fault development over time in different data sources.
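As a rough illustration of the zero-shot idea behind contribution (1), the sketch below assigns a fault label by comparing a signal embedding with the embeddings of candidate text descriptions, using cosine similarity. It is not the thesis implementation: the function names, labels, and three-dimensional vectors are hypothetical toy values, and in practice both embeddings would come from encoders trained contrastively on signal and annotation pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(signal_embedding, prompt_embeddings):
    """Return the label whose text embedding is most similar to the signal embedding."""
    return max(
        prompt_embeddings,
        key=lambda label: cosine(signal_embedding, prompt_embeddings[label]),
    )


# Hypothetical toy embeddings (assumption): real ones would be produced by
# a text encoder and a signal encoder sharing one embedding space.
prompts = {
    "bearing fault": [0.9, 0.1, 0.0],
    "cable or sensor fault": [0.1, 0.9, 0.2],
}
signal = [0.2, 0.8, 0.1]

predicted = zero_shot_label(signal, prompts)
print(predicted)
```

No labelled training examples of either class are needed at prediction time; the candidate classes are defined only by their text descriptions.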
The results from the studies indicate that annotations are a viable substitute for labels when processed with regard to the technical language therein, and that integrating LM-powered agents on annotated CM data facilitates answering real industry queries more efficiently than current systems. Augmenting out-of-vocabulary technical words with natural language descriptions improves LM performance, as demonstrated in initial work on classifying technical fault descriptions with the BERT LM, where accuracy improved from 88.3% to 94.2%. In the industrial datasets analysed, gathered from Kraftliner paper machines over four years, the most common faults and alarms were cable and sensor faults, while bearing faults were the most common causes of follow-up analysis and maintenance stops. Clustering CM data based on both signal and language properties indicates that cable and sensor faults can be differentiated from bearing faults with an F1-score of 92.6%, which suggests that the high number of false and redundant alarms that require follow-up analysis can be reduced. Finally, the usefulness of the developed agents was evaluated in typical CM workflows based on response usefulness and truthfulness, and the results indicate that AI agents with custom tools are capable of generating historical insight and meaningful fault descriptions. In particular, using a novel multimodal CM retrieval-augmented generation approach with a custom CM vector store, the false alarm rate for sensor and cable faults is shown to be lowered from over 80% in current workflows to under 30% with the proposed system.
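The out-of-vocabulary augmentation mentioned above can be pictured with a minimal sketch: technical terms that a language model's vocabulary does not cover are followed by a plain-language description before the text is passed to the model. The function name, the vocabulary, and the glossary entry are hypothetical, and the thesis method may differ in how terms are detected and described.

```python
def augment_oov(text, vocabulary, glossary):
    """Append a description after each token that is missing from the
    vocabulary but present in the glossary; other tokens are kept as-is."""
    augmented = []
    for token in text.split():
        key = token.strip(".,;:()").lower()
        if key not in vocabulary and key in glossary:
            augmented.append(f"{token} ({glossary[key]})")
        else:
            augmented.append(token)
    return " ".join(augmented)


# Hypothetical inputs (assumption): a tiny model vocabulary and a
# hand-written glossary of technical terms.
vocabulary = {"high", "on", "dryer"}
glossary = {"bpfo": "ball pass frequency of the outer race"}

annotation = "High BPFO on dryer"
result = augment_oov(annotation, vocabulary, glossary)
print(result)
```

The augmented text keeps the original term, so the annotation stays searchable, while the added description gives the model familiar words to work with.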
Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2025
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords
Natural language processing, Technical language processing, technical language supervision, natural language supervision, intelligent fault diagnosis, condition monitoring, predictive maintenance, prognostics and health management, large language models, agentic AI, retrieval augmented generation, contrastive learning, weak supervision, self-supervision
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-112326 (URN); 978-91-8048-811-2 (ISBN); 978-91-8048-812-9 (ISBN)
Public defence
2025-06-03, C305, Luleå University of Technology, Luleå, 10:00 (English)
Opponent
Supervisors
Funder
Vinnova, 364160
2025-04-09; 2025-04-09; 2025-04-10. Bibliographically approved