BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems. ORCID iD: 0000-0001-8132-4178
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems. ORCID iD: 0000-0002-0108-6286
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Computer Science. ORCID iD: 0000-0002-7921-8568
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems. ORCID iD: 0000-0001-8870-6718
2024 (English). In: 2024 32nd Mediterranean Conference on Control and Automation (MED), IEEE, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Object detection and global localization play a crucial role in robotics, spanning a broad spectrum of applications from autonomous cars to multi-layered 3D scene graphs for semantic scene understanding. This article proposes BOX3D, a novel multi-modal and lightweight scheme for localizing objects of interest by fusing information from an RGB camera and a 3D LiDAR. BOX3D is structured around a three-layered architecture, building up from the local perception of incoming sequential sensor data to a global perception refinement that accounts for outliers and for the overall consistency of each object's observations. More specifically, the first layer handles the low-level fusion of camera and LiDAR data for initial 3D bounding box extraction. The second layer transforms the 3D bounding boxes of each LiDAR scan into the world coordinate frame and applies a spatial pairing and merging mechanism to maintain the uniqueness of objects observed from different viewpoints. Finally, the third layer iteratively supervises the consistency of the results on the global map, using a point-to-voxel comparison to identify all points in the global map that belong to each object. Benchmarking results of the proposed architecture are showcased in multiple experimental trials on a public, state-of-the-art, large-scale dataset of urban environments.
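Since this record contains only the abstract, the paper's actual algorithms are not given here. As a rough illustration of the second layer's pairing-and-merging idea, the sketch below greedily fuses world-frame boxes by 3D intersection-over-union. The axis-aligned simplification, the threshold value, and all names (Box3D, to_world, pair_and_merge) are assumptions for illustration, not the paper's implementation.

    # Minimal sketch, assuming axis-aligned boxes and a simple 3D IoU criterion.
    from dataclasses import dataclass

    import numpy as np


    @dataclass
    class Box3D:
        """Axis-aligned 3D bounding box, stored as its two extreme corners."""
        min_pt: np.ndarray  # (3,) lower corner
        max_pt: np.ndarray  # (3,) upper corner


    def to_world(box: Box3D, R: np.ndarray, t: np.ndarray) -> Box3D:
        """Map a sensor-frame box into the world frame (axis-aligned hull of its corners)."""
        corners = np.array([[x, y, z]
                            for x in (box.min_pt[0], box.max_pt[0])
                            for y in (box.min_pt[1], box.max_pt[1])
                            for z in (box.min_pt[2], box.max_pt[2])])
        world = corners @ R.T + t  # R: (3, 3) rotation, t: (3,) translation
        return Box3D(world.min(axis=0), world.max(axis=0))


    def _volume(b: Box3D) -> float:
        return float(np.prod(b.max_pt - b.min_pt))


    def iou_3d(a: Box3D, b: Box3D) -> float:
        """Intersection-over-union of two axis-aligned 3D boxes."""
        overlap = np.clip(np.minimum(a.max_pt, b.max_pt)
                          - np.maximum(a.min_pt, b.min_pt), 0.0, None)
        inter = float(np.prod(overlap))
        union = _volume(a) + _volume(b) - inter
        return inter / union if union > 0.0 else 0.0


    def pair_and_merge(global_boxes: list, scan_boxes: list,
                       iou_thr: float = 0.3) -> list:
        """Greedily fuse each new scan box into its best-overlapping global box,
        so an object observed from several viewpoints keeps a single box."""
        for new in scan_boxes:
            best, best_iou = None, iou_thr
            for g in global_boxes:
                score = iou_3d(g, new)
                if score > best_iou:
                    best, best_iou = g, score
            if best is None:
                global_boxes.append(new)  # first observation of this object
            else:                         # repeated observation: grow the fused extent
                best.min_pt = np.minimum(best.min_pt, new.min_pt)
                best.max_pt = np.maximum(best.max_pt, new.max_pt)
        return global_boxes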

Place, publisher, year, edition, pages
IEEE, 2024.
National Category
Computer graphics and computer vision
Research subject
Robotics and Artificial Intelligence
Identifiers
URN: urn:nbn:se:ltu:diva-105327
DOI: 10.1109/MED61351.2024.10566236
Scopus ID: 2-s2.0-85198221796
OAI: oai:DiVA.org:ltu-105327
DiVA, id: diva2:1855835
Conference
The 32nd Mediterranean Conference on Control and Automation (MED2024), Chania, Crete, Greece, June 11-14, 2024
Funder
Swedish Energy Agency
EU, Horizon Europe, 101091462 m4mining
Note

Funder: SP14 ‘Autonomous Drones for Underground Mining Operations’;

ISBN for host publication: 979-8-3503-9545-7; 979-8-3503-9544-0

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Towards human-inspired perception in robotic systems by leveraging computational methods for semantic understanding
2024 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

This thesis presents a collection of developments and results towards human-like semantic understanding of the environment for robotic systems. Achieving a level of understanding in robots comparable to humans has proven to be a significant challenge in robotics: although modern sensors such as stereo cameras and neuromorphic cameras enable robots to perceive the world in a manner akin to human senses, extracting and interpreting semantic information remains significantly less efficient by comparison. This thesis explores different aspects of the machine vision field, leveraging computational methods to address real-life challenges in semantic scene understanding, in both everyday environments and challenging unstructured environments.

The works included in this thesis present key contributions towards three main research directions. The first direction establishes novel perception algorithms for object detection and localization, aimed at real-life deployments on onboard mobile devices in perceptually degraded, unstructured environments. Along this direction, the contributions focus on the development of robust detection pipelines as well as fusion strategies for different sensor modalities, including stereo cameras, neuromorphic cameras, and LiDARs.

The second research direction establishes a computational method for leveraging semantic information into meaningful knowledge representations, enabling human-inspired behaviors for the task of traversability estimation in reactive navigation. The contribution presents a novel exponential-decay function for generating soft traversability images, fusing semantic and geometric information to obtain density images that represent the pixel-wise traversability of the scene. Additionally, it presents a novel lightweight encoder-decoder network architecture for coarse semantic segmentation of terrain, integrated with a memory module based on a dynamic certainty filter.
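The thesis' exact decay formula and class weights are not given in this record; the snippet below is only an assumed sketch of the general idea, in which geometric obstacle evidence suppresses traversability with exponential falloff and a per-class semantic weight scales the result. All names and values are illustrative.

    # Hedged sketch: semantic base weight x exponential geometric falloff.
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    # assumed base traversability per semantic class (0 = ground, 1 = rough, 2 = obstacle)
    CLASS_WEIGHT = np.array([1.0, 0.5, 0.0])


    def traversability_image(semantic: np.ndarray, obstacle_mask: np.ndarray,
                             decay: float = 0.5) -> np.ndarray:
        """Pixel-wise soft traversability in [0, 1] from semantics + geometry."""
        # distance (in pixels) from every free pixel to the nearest obstacle pixel
        dist = distance_transform_edt(~obstacle_mask)
        geometric = 1.0 - np.exp(-decay * dist)  # 0 at obstacles, -> 1 far from them
        weights = CLASS_WEIGHT[semantic]         # semantic base weight per pixel
        return weights * geometric               # fused soft traversability image


    # toy usage: one obstacle pixel in the middle of a 5x5 flat-ground scene
    sem = np.zeros((5, 5), dtype=int)
    obs = np.zeros((5, 5), dtype=bool)
    obs[2, 2] = True
    print(traversability_image(sem, obs).round(2))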

Finally, the third research direction establishes the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs that enable efficient high-level task planning with partial information. The research thus presents an approach to meaningfully incorporate unobserved objects as nodes into an incomplete 3D scene graph using the proposed method, Computation of Expectation based on Correlation Information (CECI), which reasonably approximates the probability distribution of the scene by learning histograms from available training data. Extensive simulations and real-life experimental setups support the results and assumptions presented in this work.
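The record does not detail CECI, so the following is a loose, assumed reconstruction of the histogram idea: count how often class B co-occurs with class A across training scenes, normalize the counts into conditional frequencies, and propose high-frequency unobserved classes as belief nodes. The data layout, function names, and threshold are hypothetical.

    # Assumed sketch of learning co-occurrence histograms from training scenes.
    from collections import Counter
    from itertools import permutations


    def learn_cooccurrence(training_rooms):
        """training_rooms: list of sets of object classes seen together in a room."""
        pair_counts, solo_counts = Counter(), Counter()
        for room in training_rooms:
            solo_counts.update(room)
            pair_counts.update(permutations(room, 2))  # ordered (A, B) pairs
        # normalized count: how often B appears given that A was in the room
        return {ab: pair_counts[ab] / solo_counts[ab[0]] for ab in pair_counts}


    def expected_unobserved(observed, cooc, threshold=0.5):
        """Classes worth adding as belief nodes, given what was already observed."""
        candidates = {b for (a, b) in cooc if a in observed and b not in observed}
        return {b for b in candidates
                if max(cooc.get((a, b), 0.0) for a in observed) >= threshold}


    rooms = [{"desk", "chair", "monitor"}, {"desk", "chair"}, {"bed", "lamp"}]
    cooc = learn_cooccurrence(rooms)
    print(expected_unobserved({"desk"}, cooc))  # likely {'chair', 'monitor'}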

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2024
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
National Category
Computer graphics and computer vision
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-105329 (URN)
978-91-8048-568-5 (ISBN)
978-91-8048-569-2 (ISBN)
Presentation
2024-06-17, A117, Luleå University of Technology, Luleå, 09:00 (English)
Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Saucedo, Mario Alberto Valdes; Stathoulopoulos, Nikolaos; Mololoth, Vidya; Kanellakis, Christoforos; Nikolakopoulos, George
