Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging
University of Kaiserslautern, Kaiserslautern, Germany; German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany.
University of Kaiserslautern, Kaiserslautern, Germany. ORCID iD: 0000-0003-4029-6574
University of Kaiserslautern, Kaiserslautern, Germany.
2015 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 63, p. 23-29. Article in journal (Refereed). Published
Abstract [en]

This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images; since training data for such applications is not available in large quantities, we describe an approach to generating artificial training data, and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including Convolutional Neural Networks (ConvNet), while using two orders of magnitude fewer parameters. We further report an experiment on a recently published dataset, the outdoor scene attribute dataset, for fair comparison of scene attribute learning, which showed a significant performance improvement (ca. 21%). Finally, our approach is successfully applied to a real-world application, automatic web-image tagging.

Place, publisher, year, edition, pages
2015. Vol. 63, p. 23-29
Keywords [en]
LSTM, Mid-level attribute learning, Recurrent neural network, Scene analysis, Web-image tagging
Identifiers
URN: urn:nbn:se:ltu:diva-72203
DOI: 10.1016/j.patrec.2015.06.003
OAI: oai:DiVA.org:ltu-72203
DiVA, id: diva2:1271595
Available from: 2018-12-17 Created: 2018-12-17 Last updated: 2019-01-29. Bibliographically approved

Open Access in DiVA

fulltext (2043 kB), 50 downloads
File information
File name: FULLTEXT01.pdf
File size: 2043 kB
Checksum (SHA-512):
27e904ef6d072a8344122eec65bca2119a4a0f1632fdf70fff4548ce694b436ecfb62818dd3ab16bdc8cad2ca3c3e915d398a727e548ffc568f4ce750a11c7f7
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text: http://www.sciencedirect.com/science/article/pii/S0167865515001634

Search further in DiVA

By author/editor
Liwicki, Marcus