Subword Semantic Hashing for Intent Classification on Small Datasets
Grund Pihlgren, Gustav; Alonso, Pedro; Kovács, G; Liwicki, Marcus
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab (Machine Learning). ORCID iD: 0000-0003-0100-4030
2019 (English). In: 2019 International Joint Conference on Neural Networks (IJCNN), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we introduce the use of Semantic Hashing as an embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on small datasets is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome this challenge and learn robust text classification. Current word-embedding based methods [11], [13], [14] are dependent on vocabularies. A major drawback of such methods is out-of-vocabulary terms, especially with small training datasets and a wide vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets miss many of the terms needed to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, models for intent classification are not trained on spelling errors, and it is difficult to anticipate all the ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications [3]. Our benchmarks are available online.
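The abstract argues that a subword representation sidesteps out-of-vocabulary terms and spelling errors. As a concrete illustration, the Python sketch below is hedged: the trigram scheme, the toy utterances, and the classifier choice are assumptions for illustration, not the authors' published pipeline. It pads each word with '#', splits it into overlapping character trigrams, and trains a small intent classifier on those subword features.

    # Illustrative sketch only: the record's abstract does not give the exact procedure,
    # so this assumes a common subword-hashing formulation in which each word is padded
    # with '#' and split into overlapping character trigrams that replace whole-word tokens.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def subword_tokens(text, n=3):
        """Split each word into character n-grams, e.g. 'book' -> '#bo', 'boo', 'ook', 'ok#'."""
        tokens = []
        for word in text.lower().split():
            padded = f"#{word}#"
            tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
        return tokens

    # Hypothetical toy utterances standing in for a small intent dataset such as Chatbot or Ask Ubuntu.
    train_texts = ["how do i reset my password", "find me a bus to the airport"]
    train_intents = ["account", "travel"]

    # Trigram features feed an ordinary linear classifier; the classifier choice is arbitrary here.
    model = make_pipeline(TfidfVectorizer(tokenizer=subword_tokens, lowercase=False), LogisticRegression())
    model.fit(train_texts, train_intents)
    print(model.predict(["how to resett my pasword"]))  # misspellings still share most trigrams

Because 'pasword' and 'password' share trigrams such as '#pa', 'pas', 'wor', and 'ord', the misspelled query still lands near the 'account' intent, which is the robustness to out-of-vocabulary terms and spelling errors the abstract describes.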

Place, publisher, year, edition, pages
2019.
Keywords [en]
Internet, learning (artificial intelligence), natural language processing, pattern classification, semantic networks, text analysis, vocabulary, word processing, intent classification, robust text classification, spelling errors, subword semantic hashing, deep learning based systems, Internet communication, ideal classifier, Chatbots, Semantic Hashing, Machine Learning, State-of-the-art
National Category
Language Technology (Computational Linguistics)
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-76841
DOI: 10.1109/IJCNN.2019.8852420
ISBN: 978-1-7281-1985-4 (electronic)
OAI: oai:DiVA.org:ltu-76841
DiVA, id: diva2:1372648
Conference
International Joint Conference on Neural Networks (IJCNN)
Available from: 2019-11-25 Created: 2019-11-25 Last updated: 2019-12-04

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text: https://ieeexplore.ieee.org/abstract/document/8852420
