Dataset with condition monitoring vibration data annotated with technical language, from paper machine industries in northern Sweden
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-0188-9337
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-5662-825X
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
Svenska Kullagerfabriken.
Responsible organisation
2023 (English) Data set, Primary data
Alternative title
Dataset med tillståndsövervakningsvibrationsdata annoterat med tekniskt språk, från pappersmaskinsindustri i norra Sverige (Swedish)
Physical description [en]

Vibration data collected through accelerometers (SKF IMx-system with CMSS sensors)

Physical description [sv, translated to English]

Vibration data collected with accelerometers (SKF IMx system with CMSS sensors)

Abstract [en]

Labelled industry datasets are one of the most valuable assets in prognostics and health management (PHM) research. However, creating labelled industry datasets is both difficult and expensive, so publicly available industry datasets, labelled ones in particular, are rare at best.

Recent studies have shown that industry annotations can be used to train artificial intelligence models directly on industry data ( https://doi.org/10.36001/ijphm.2022.v13i2.3137 , https://doi.org/10.36001/phmconf.2023.v15i1.3507 ). However, while many industry datasets also contain text descriptions or logbooks in the form of annotations and maintenance work orders, few, if any, are publicly available.

Therefore, we release a dataset consisting of annotated signal data from two large (80 m × 10 m × 10 m) paper machines at a Kraftliner production company in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish by on-site Swedish experts, and the signals consist primarily of accelerometer vibration measurements from the two machines.

The dataset is structured as a Pandas dataframe and serialized as a pickle (.pkl) file and a JSON (.json) file. The first column (‘id’) is the ID of the sample; the second column (‘Spectra’) contains the fast Fourier transform and envelope-transformed vibration signals; the third column (‘Notes’) contains the associated annotations, mapped so that each annotation is associated with all signals from ten days before the annotation date up to the annotation date; and the fourth column (‘Embeddings’) contains pre-computed embeddings from Swedish SentenceBERT. Each row corresponds to one vibration measurement sample, though the data does not distinguish which sensor or machine part each measurement comes from.
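The ten-day mapping between annotations and signals described above can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the record layout, the function name `pair_signals_with_notes`, the example dates and the placeholder note text are all assumptions made for illustration, as is the choice to treat both ends of the window as inclusive.

```python
from datetime import date, timedelta

# Assumption: the window is ten calendar days, inclusive at both ends.
WINDOW = timedelta(days=10)


def pair_signals_with_notes(signals, notes, window=WINDOW):
    """Pair every note with all signals recorded in the `window` before it.

    signals: list of (signal_id, measurement_date) tuples (hypothetical layout)
    notes:   list of (annotation_date, note_text) tuples (hypothetical layout)
    Returns a list of (signal_id, note_text) pairs.
    """
    pairs = []
    for note_date, note_text in notes:
        for signal_id, signal_date in signals:
            # Keep signals from ten days before the note up to the note date.
            if note_date - window <= signal_date <= note_date:
                pairs.append((signal_id, note_text))
    return pairs


# Made-up example data; the real dataset does not expose these dates.
signals = [
    (0, date(2021, 3, 1)),   # eleven days before the note: outside the window
    (1, date(2021, 3, 8)),   # four days before the note: inside
    (2, date(2021, 3, 12)),  # same day as the note: inside
    (3, date(2021, 3, 20)),  # after the note: outside
]
notes = [(date(2021, 3, 12), "example note A")]

pairs = pair_signals_with_notes(signals, notes)
print(pairs)
```

Under this rule one annotation is repeated across many rows, which is consistent with the dataset containing far more signal and annotation pairs than a site is likely to have written annotations in a year.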

Abstract [sv, translated to English]

Labelled industry datasets are among the most valuable assets available in prognostics and condition monitoring research. Creating labelled datasets is both difficult and expensive, which makes publicly available industry datasets rare, especially labelled ones. Studies have shown, however, that industry annotations can be used to train AI models directly on industry data ( https://doi.org/10.36001/ijphm.2022.v13i2.3137 , https://doi.org/10.36001/phmconf.2023.v15i1.3507 ), but although many industry datasets contain the necessary texts, few, if any, such datasets are publicly available.

We therefore release a dataset containing annotated signal data from two large (80 m × 10 m × 10 m) paper machines at a paper mill in northern Sweden. The data consists of 21 090 pairs of signals and annotations from one year of production. The annotations are written in Swedish by on-site experts, and the signals consist mainly of accelerometer vibration measurements from the two machines.

The dataset consists of one year of annotated vibration sensor measurements from two paper machines, structured as a Pandas dataframe and serialized as a pickle file (.pkl) and a JSON file (.json). The first column (’id’) is the ID of each sample; the second column (’Spectra’) contains the fast-Fourier-transformed and envelope-transformed vibration signals; the third column (’Notes’) contains the associated annotations, mapped so that each annotation is linked to all signals from ten days before the annotation date up to the annotation date; and the fourth column (’Embeddings’) contains pre-computed text representations from Swedish SentenceBERT. Each row corresponds to one vibration measurement sample, although the data does not distinguish which sensor or machine part each measurement comes from.

Place, publisher, year
Svensk nationell datatjänst (SND), 2023.
Keywords [en]
Paper industry, Condition monitoring, Language technology, Signal processing, Fault detection, Natural language processing, Technical language processing, Technical language supervision, Natural language supervision, Fault diagnosis, Intelligent fault diagnosis, Prognostics and health management
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning; Cyber-Physical Systems
Identifiers
URN: urn:nbn:se:ltu:diva-103146
DOI: 10.5878/z34p-qj52
OAI: oai:DiVA.org:ltu-103146
DiVA, id: diva2:1816144
Funder
Vinnova, 2019-02533
Note

CC BY-NC 4.0 

Available from: 2023-12-01 Created: 2023-12-01 Last updated: 2025-10-21 Bibliographically approved
In thesis
1. Technical Language Supervision and AI Agents for Condition Monitoring
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Språkteknologi och agenter för AI-assisterad diagnostik av maskinskador
Abstract [en]

Recent advances in reasoning artificial intelligence (AI) agents powered by language models (LMs) and custom tools open new opportunities for AI-assisted condition monitoring (CM) involving unlabelled but annotated, complex industrial data. Technical language annotations written by domain experts include unstructured information regarding machine condition, maintenance actions, and tacit knowledge. This thesis investigates how LMs and LM agents can improve human-machine interaction and facilitate training of AI models on CM industry data using annotations as surrogate ground-truth labels. The main contribution is the introduction of technical language supervision (TLS) to address the long-standing gap between idealised labelled lab datasets and complex unlabelled field data, and the development of AI agents for condition monitoring, including a multimodal vector store with domain-specific retrieval and generation modules.

Specific contributions are: (1) the introduction and implementation of TLS for CM through contrastive learning with annotated sensor data, including a literature survey and implementations of zero-shot fault diagnosis on unlabelled industry data; (2) the creation of a method to improve technical language processing by augmenting out-of-vocabulary technical words with natural language descriptions and evaluating semantic similarities of technical language representations; (3) the development of a human-centric method for language-based fault classification using visualisation and clustering; (4) the development of an open-source chatbot agent which facilitates natural language interaction with industry data and models through a custom CM vector store with data-specific retrieval-augmented generation, LM analysis of annotations and hierarchy data, and LM-assisted CM; (5) the compilation of a publicly available annotated industry dataset; (6) an investigation of specific CM data processing challenges, such as different data modalities, time delays between annotations and signal properties, component-specific noise and feature levels, and non-linear fault development over time.

The results of the studies indicate that annotations are a viable substitute for labels when processed with regard to the technical language therein, and integrating LM-based agents on annotated CM data facilitates answering queries corresponding to industrial analysis tasks. By augmenting out-of-vocabulary technical words with natural language descriptions, LM performance can be improved, as demonstrated in initial work on classifying technical fault descriptions with the BERT LM improving accuracy from 88.3% to 94.2%, thereby halving the error rate.
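The "halving" of the error rate follows directly from the two reported accuracies. A quick check, using only the figures quoted above (the variable names are illustrative):

```python
# Accuracies reported in the abstract, in percent.
accuracy_before = 88.3
accuracy_after = 94.2

# Error rate is the complement of accuracy.
error_before = 100.0 - accuracy_before
error_after = 100.0 - accuracy_after

# Ratio of the new error rate to the old one; about 0.5 means "halved".
ratio = error_after / error_before

print(f"{error_before:.1f}% -> {error_after:.1f}% (ratio {ratio:.2f})")
```

The error rate drops from 11.7% to 5.8%, a ratio of roughly 0.50.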

In the industrial datasets analysed, gathered from kraftliner paper machines over four years, the most common faults and alarms were cable and sensor faults, while bearing faults were the most common causes of follow-up analysis and maintenance stops. Clustering CM data based on both signal and language properties indicates that cable and sensor faults can be differentiated from bearing faults with an F1-score of 92.6%. The usefulness of the developed agents was evaluated in typical CM workflows, and the results indicate that AI agents with custom tools are capable of generating historic insight and meaningful fault descriptions. In particular, using a custom multimodal CM retrieval-augmented generation approach with a custom CM vector store, the false alarm rate for sensor and cable faults is shown to be lowered from over 80% in current workflows to under 30% with the proposed method. This suggests that false and redundant alarms, which negatively impact maintenance planning by prompting time-consuming human analysis, can be reduced.

The main takeaways of this thesis are that annotations can facilitate the development of AI models on field industry data, and bring meaningful historic insights. This approach has the potential to augment existing CM practices by reducing false alarm prevalence, providing more meaningful alarms, and improving upskilling.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2025
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords
Natural language processing, Technical language processing, technical language supervision, natural language supervision, intelligent fault diagnosis, condition monitoring, predictive maintenance, prognostics and health management, large language models, agentic AI, retrieval augmented generation, contrastive learning, weak supervision, self-supervision
National Category
Natural Language Processing; Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-112326 (URN)
978-91-8048-811-2 (ISBN)
978-91-8048-812-9 (ISBN)
Public defence
2025-06-03, C305, Luleå University of Technology, Luleå, 10:00 (English)
Funder
Vinnova, 364160
Available from: 2025-04-09 Created: 2025-04-09 Last updated: 2025-10-21 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Löwenmark, Karl; Sandin, Fredrik; Liwicki, Marcus

Löwenmark, K., Taal, C., Vurgaft, A., Nivre, J., Liwicki, M. & Sandin, F. (2023). Labelling of Annotated Condition Monitoring Data Through Technical Language Processing. In: Chetan S. Kulkarni; Indranil Roychoudhury (Ed.), Proceedings of the Annual Conference of the PHM Society 2023. Paper presented at 15th Annual Conference of the Prognostics and Health Management Society, Salt Lake City, Utah, USA, October 28 - November 2, 2023. The Prognostics and Health Management Society.

Löwenmark, K., Taal, C., Schnabel, S., Liwicki, M. & Sandin, F. (2022). Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry. International Journal of Prognostics and Health Management, 13(2).

Löwenmark, K. (2023). Technical Language Supervision for Intelligent Fault Diagnosis. (Licentiate dissertation). Luleå: Luleå University of Technology.

Löwenmark, K., Taal, C., Nivre, J., Liwicki, M. & Sandin, F. (2022). Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study. In: Phuc Do, Gabriel Michau, Cordelia Ezhilarasu (Ed.), Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022. Paper presented at 7th European Conference of the Prognostics and Health Management Society 2022 (PHME22), July 6-8 2022, Turin, Italy (pp. 306-314). PHM Society, 7
