Towards human-inspired perception in robotic systems by leveraging computational methods for semantic understanding
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems. ORCID iD: 0000-0001-8132-4178
2024 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis presents a collection of developments and results towards human-like semantic understanding of the environment for robotic systems. Achieving a level of understanding in robots comparable to that of humans has proven to be a significant challenge: although modern sensors such as stereo cameras and neuromorphic cameras enable robots to perceive the world in a manner akin to human senses, extracting and interpreting semantic information remains significantly less efficient by comparison. This thesis explores different aspects of the machine vision field, leveraging computational methods to address real-life challenges in semantic scene understanding, both in everyday environments and in challenging unstructured environments.

The works included in this thesis present key contributions along three main research directions. The first direction establishes novel perception algorithms for object detection and localization, aimed at real-life deployments on onboard mobile devices in perceptually degraded unstructured environments. Along this direction, the contributions focus on the development of robust detection pipelines as well as fusion strategies for different sensor modalities, including stereo cameras, neuromorphic cameras, and LiDARs.

The second research direction establishes a computational method for leveraging semantic information into meaningful knowledge representations to enable human-inspired behaviors for the task of traversability estimation for reactive navigation. The contribution presents a novel exponential decay function for generating soft traversability images, fusing semantic and geometric information to obtain density images that represent the pixel-wise traversability of the scene. Additionally, it presents a novel lightweight encoder-decoder network architecture for coarse semantic segmentation of terrain, integrated with a memory module based on a dynamic certainty filter, as sketched below.
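As an illustration of how such a certainty-filter memory module could be realized, the following minimal Python sketch keeps a per-pixel, per-class certainty score and blends each new segmentation output into it with an exponential moving average, so that stale predictions fade over time. The class count, image size, decay rate, and update rule are assumptions for illustration, not the implementation used in the thesis.

```python
import numpy as np


class CertaintyMemory:
    """Illustrative per-pixel memory for coarse terrain segmentation.

    Maintains a running certainty score for each class at each pixel;
    new observations are blended in with an exponential moving average
    so that predictions not re-observed gradually lose weight.
    (Sketch only; decay rate and update rule are assumptions.)
    """

    def __init__(self, num_classes, height, width, decay=0.9):
        self.decay = decay
        self.certainty = np.zeros((num_classes, height, width), dtype=np.float32)

    def update(self, class_probs):
        """class_probs: (num_classes, H, W) softmax output of the segmentation net."""
        # Decay the previous certainty, then blend in the new observation.
        self.certainty = self.decay * self.certainty + (1.0 - self.decay) * class_probs
        return self.fused_labels()

    def fused_labels(self):
        """Return the per-pixel class with the highest accumulated certainty."""
        return np.argmax(self.certainty, axis=0)


# Usage: feed each new softmax map from the encoder-decoder network.
memory = CertaintyMemory(num_classes=4, height=120, width=160)
frame_probs = np.random.rand(4, 120, 160).astype(np.float32)
frame_probs /= frame_probs.sum(axis=0, keepdims=True)
labels = memory.update(frame_probs)
```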

Finally, the third research direction establishes the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs that enable efficient high-level task planning with partial information. The research presents an approach to meaningfully incorporate unobserved objects as nodes into an incomplete 3D scene graph using the proposed method, Computation of Expectation based on Correlation Information (CECI), which reasonably approximates the probability distribution of the scene by learning histograms from available training data. Extensive simulations and real-life experimental setups support the results and assumptions presented in this work.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2024.
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
National Category
Computer graphics and computer vision
Research subject
Robotics and Artificial Intelligence
Identifiers
URN: urn:nbn:se:ltu:diva-105329; ISBN: 978-91-8048-568-5 (print); ISBN: 978-91-8048-569-2 (print); OAI: oai:DiVA.org:ltu-105329; DiVA, id: diva2:1855854
Presentation
2024-06-17, A117, Luleå University of Technology, Luleå, 09:00 (English)
Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2025-02-07 Bibliographically approved
List of papers
1. Event Camera and LiDAR based Human Tracking for Adverse Lighting Conditions in Subterranean Environments
2023 (English). In: 22nd IFAC World Congress: Proceedings / [ed] Hideaki Ishii; Yoshio Ebihara; Jun-ichi Imura; Masaki Yamakita, Elsevier, 2023, Vol. 56, no 2, p. 9257-9262. Conference paper, Published paper (Refereed)
Abstract [en]

In this article, we propose a novel LiDAR and event camera fusion modality for subterranean (SubT) environments for fast and precise object and human detection in a wide variety of adverse lighting conditions, such as low or no light, high-contrast zones and in the presence of blinding light sources. In the proposed approach, information from the event camera and LiDAR are fused to localize a human or an object-of-interest in a robot's local frame. The local detection is then transformed into the inertial frame and used to set references for a Nonlinear Model Predictive Controller (NMPC) for reactive tracking of humans or objects in SubT environments. The proposed novel fusion uses intensity filtering and K-means clustering on the LiDAR point cloud and frequency filtering and connectivity clustering on the events induced in an event camera by the returning LiDAR beams. The centroids of the clusters in the event camera and LiDAR streams are then paired to localize reflective markers present on safety vests and signs in SubT environments. The efficacy of the proposed scheme has been experimentally validated in a real SubT environment (a mine) with a Pioneer 3AT mobile robot. The experimental results show real-time performance for human detection and the NMPC-based controller allows for reactive tracking of a human or object of interest, even in complete darkness.
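To make the fusion pipeline more concrete, the following Python sketch illustrates the LiDAR side of the scheme, intensity filtering followed by K-means clustering, and a simple nearest-neighbour pairing of cluster centroids. The threshold, cluster count, pairing distance, and the assumption that the event-camera cluster centroids have already been lifted into the LiDAR frame are illustrative simplifications, not the parameters or pairing rule used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def lidar_marker_centroids(points, intensities, intensity_thresh=0.8, k=2):
    """Keep only high-intensity returns (reflective markers) and cluster them.

    points: (N, 3) LiDAR points; intensities: (N,) normalized reflectivity.
    intensity_thresh and k are illustrative values, not the paper's settings.
    """
    bright = points[intensities > intensity_thresh]
    if len(bright) < k:
        return np.empty((0, 3))
    km = KMeans(n_clusters=k, n_init=10).fit(bright)
    return km.cluster_centers_


def pair_centroids(lidar_centroids, event_centroids_3d, max_dist=0.5):
    """Greedily pair LiDAR cluster centroids with event-camera cluster centroids
    (assumed here to be already expressed in the LiDAR frame)."""
    pairs = []
    for lc in lidar_centroids:
        if len(event_centroids_3d) == 0:
            break
        d = np.linalg.norm(event_centroids_3d - lc, axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist:
            pairs.append((lc, event_centroids_3d[j]))
    return pairs


# Usage with synthetic data standing in for a filtered LiDAR scan.
pts = np.random.rand(500, 3) * 10.0
inten = np.random.rand(500)
centroids = lidar_marker_centroids(pts, inten)
matches = pair_centroids(centroids, centroids + 0.1)
```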

Place, publisher, year, edition, pages
Elsevier, 2023
Series
IFAC-PapersOnLine, ISSN 2405-8971, E-ISSN 2405-8963
Keywords
Event-based vision, Event camera and LiDAR fusion, Human detection and tracking, NMPC-based tracking
National Category
Computer graphics and computer vision; Atom and Molecular Physics and Optics
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-104460 (URN); 10.1016/j.ifacol.2023.10.008 (DOI); 001122557300481 (); 2-s2.0-85183658513 (Scopus ID)
Conference
22nd IFAC World Congress, Yokohama, Japan, July 9-14, 2023
Funder
EU, Horizon 2020, 101003591
Note

Full text license: CC BY-NC-ND

Available from: 2024-03-12 Created: 2024-03-12 Last updated: 2025-02-01 Bibliographically approved
2. BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization
2024 (English). In: 2024 32nd Mediterranean Conference on Control and Automation (MED), IEEE, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Object detection and global localization play a crucial role in robotics, spanning a wide spectrum of applications from autonomous cars to multi-layered 3D Scene Graphs for semantic scene understanding. This article proposes BOX3D, a novel multi-modal and lightweight scheme for localizing objects of interest by fusing information from an RGB camera and a 3D LiDAR. BOX3D is structured around a three-layered architecture, building up from local perception of the incoming sequential sensor data to global perception refinement that accounts for outliers and the overall consistency of each object's observations. More specifically, the first layer handles the low-level fusion of camera and LiDAR data for initial 3D bounding box extraction. The second layer converts each LiDAR scan's 3D bounding boxes to the world coordinate frame and applies a spatial pairing and merging mechanism to maintain the uniqueness of objects observed from different viewpoints. Finally, the third layer iteratively supervises the consistency of the results on the global map, using a point-to-voxel comparison to identify all points in the global map that belong to the object. Benchmarking results of the proposed novel architecture are showcased in multiple experimental trials on a public state-of-the-art large-scale dataset of urban environments.
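The following Python sketch illustrates the kind of spatial pairing and merging step described for the second layer: a new world-frame detection is either fused with an overlapping existing box or registered as a new object. The axis-aligned overlap test and union-of-extents fusion are simplified assumptions, not necessarily the criterion used in BOX3D.

```python
import numpy as np


def boxes_overlap(a, b):
    """Axis-aligned 3D overlap test. Boxes are (min_xyz, max_xyz) tuples."""
    return bool(np.all(a[1] >= b[0]) and np.all(b[1] >= a[0]))


def merge_boxes(existing, incoming):
    """Merge a new world-frame detection into the running set of unique objects.

    If the incoming box overlaps an existing one, fuse them by taking the
    union of their extents; otherwise register it as a new object.
    (Simplified pairing rule for illustration only.)
    """
    for i, box in enumerate(existing):
        if boxes_overlap(box, incoming):
            fused = (np.minimum(box[0], incoming[0]), np.maximum(box[1], incoming[1]))
            existing[i] = fused
            return existing
    existing.append(incoming)
    return existing


# Usage: boxes expressed as (min corner, max corner) in the world frame.
objects = []
objects = merge_boxes(objects, (np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])))
objects = merge_boxes(objects, (np.array([0.5, 0.5, 0.0]), np.array([1.5, 1.2, 1.0])))
# The two overlapping detections are fused into a single object.
```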

Place, publisher, year, edition, pages
IEEE, 2024
National Category
Computer graphics and computer vision
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-105327 (URN); 10.1109/MED61351.2024.10566236 (DOI); 2-s2.0-85198221796 (Scopus ID)
Conference
The 32nd Mediterranean Conference on Control and Automation (MED2024), Chania, Crete, Greece, June 11-14, 2024
Funder
Swedish Energy Agency; EU, Horizon Europe, 101091462 m4mining
Note

Funder: SP14 ‘Autonomous Drones for Underground Mining Operations’;

ISBN for host publication: 979-8-3503-9545-7; 979-8-3503-9544-0

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2025-02-07 Bibliographically approved
3. Memory Enabled Segmentation of Terrain for Traversability based Reactive Navigation
2023 (English). In: 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), IEEE, 2023. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2023
National Category
Computer graphics and computer vision; Robotics and automation
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-103974 (URN); 10.1109/ROBIO58561.2023.10354930 (DOI); 2-s2.0-85182558371 (Scopus ID); 979-8-3503-2570-6 (ISBN); 979-8-3503-2571-3 (ISBN)
Conference
2023 IEEE International Conference on Robotics and Biomimetics, ROBIO 2023, Koh Samui, Thailand, December 4-9, 2023
Funder
EU, Horizon 2020, 101003591
Available from: 2024-01-29 Created: 2024-01-29 Last updated: 2025-02-05 Bibliographically approved
4. Belief Scene Graphs: Expanding Partial Scenes with Objects through Computation of Expectation
2024 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this article, we propose the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs that enable efficient high-level task planning with partial information. We propose a graph-based learning methodology for the computation of belief (also referred to as expectation) on any given 3D scene graph, which is then used to strategically add new nodes (referred to as blind nodes) that are relevant to a robotic mission. We propose the method Computation of Expectation based on Correlation Information (CECI) to reasonably approximate the real belief/expectation by learning histograms from available training data. A novel Graph Convolutional Neural Network (GCN) model is developed to learn CECI from a repository of 3D scene graphs. As no database of 3D scene graphs exists for training the novel CECI model, we present a novel methodology for generating a 3D scene graph dataset based on semantically annotated real-life 3D spaces. The generated dataset is then utilized to train the proposed CECI model and for extensive validation of the proposed method. We establish the novel concept of Belief Scene Graphs (BSG) as a core component to integrate expectations into abstract representations. This new concept is an evolution of the classical 3D scene graph concept and aims to enable high-level reasoning for task planning and optimization of a variety of robotics missions. The efficacy of the overall framework has been evaluated in an object search scenario, and has also been tested in a real-life experiment to emulate human common sense about unseen objects.

For a video of the article, showcasing the experimental demonstration, please refer to the following link: https://youtu.be/hsGlSCa12iY
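A minimal sketch of the histogram-based expectation idea is shown below: for each (observed anchor class, target class) pair, a learned histogram over co-occurrence counts yields an expected number of target objects, and candidates whose expectation passes a threshold are proposed as blind nodes. In the paper the histograms are learned by the GCN from a scene graph dataset; here the histogram values, class names, and threshold are hard-coded assumptions for illustration.

```python
import numpy as np

# Illustrative learned histograms: for each (anchor class, target class) pair,
# a distribution over how many target objects co-occur with that anchor.
# These values are assumptions for the sketch, not learned quantities.
histograms = {
    ("table", "chair"): np.array([0.05, 0.15, 0.50, 0.30]),   # P(0..3 chairs | table)
    ("table", "monitor"): np.array([0.40, 0.45, 0.15, 0.00]),
}


def expected_count(anchor, target):
    """Expectation of the target-object count given an observed anchor object."""
    hist = histograms.get((anchor, target))
    if hist is None:
        return 0.0
    counts = np.arange(len(hist))
    return float(np.sum(counts * hist))


def blind_nodes(observed_classes, candidate_classes, threshold=0.5):
    """Propose unobserved ('blind') nodes whose expected count passes a threshold."""
    proposals = []
    for target in candidate_classes:
        expectation = max(expected_count(a, target) for a in observed_classes)
        if expectation >= threshold:
            proposals.append((target, expectation))
    return proposals


# Example: a partial scene graph containing only a table.
print(blind_nodes({"table"}, ["chair", "monitor"]))  # both pass; chairs rank highest
```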

Place, publisher, year, edition, pages
IEEE, 2024
National Category
Computer graphics and computer vision
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-105326 (URN); 10.1109/ICRA57147.2024.10611352 (DOI); 2-s2.0-85202433848 (Scopus ID)
Conference
The 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, Japan, May 13-17, 2024
Note

Funder: European Union’s Horizon Europe Research and Innovation Programme (101119774 SPEAR);

ISBN for host publication: 979-8-3503-8457-4;

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2025-02-07 Bibliographically approved
5. MSL3D: Pointcloud-based muck pile Segmentation and Localization in Unknown SubT Environments
2023 (English). In: 2023 31st Mediterranean Conference on Control and Automation, MED 2023, Institute of Electrical and Electronics Engineers Inc., 2023, p. 269-274. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents MSL3D, a novel framework for point-cloud-based muck pile segmentation and localization in unknown subterranean (SubT) environments. The proposed framework is capable of progressively segmenting muck piles and extracting their locations in a globally constructed point cloud map, using the autonomy sensor payload of mining or robotic platforms. MSL3D is structured as a novel two-layer architecture that relies on the geometric properties of muck piles in underground tunnels: the first layer extracts a local Volume Of Interest (VOI) proposal area out of the registered point cloud, and the second layer refines the muck pile extraction of each VOI proposal in the globally optimized point cloud map. The first layer extracts local VOIs bounded to the look-ahead surroundings of the platform. More specifically, the ceiling, the left and right walls, and the ground are continuously segmented using progressive RANSAC, searching for an inclination in the segmented ground area to keep as the next-best local VOI. Once a local VOI is extracted, it is transmitted to the second layer, where it is converted to world frame coordinates. Subsequently, a morphological filter is applied to segment ground and non-ground points, followed once again by RANSAC to extract the remaining points corresponding to the right and left walls. Euclidean clustering is then utilized to keep the cluster with the majority of points, which is assumed to belong to the muck pile. The efficacy of the proposed novel scheme was successfully and experimentally validated in real and large-scale SubT environments using a custom-made UAV.
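As a rough illustration of the plane-stripping step that precedes the clustering, the sketch below fits and removes dominant planes (ground, ceiling, walls) from a point cloud with a plain NumPy RANSAC; the iteration counts, distance threshold, and number of planes are illustrative values, not MSL3D's settings. The largest Euclidean cluster of the remaining points would then be kept as the muck pile candidate, e.g. with DBSCAN or a KD-tree region-growing step.

```python
import numpy as np


def ransac_plane(points, iterations=200, threshold=0.05, rng=np.random.default_rng(0)):
    """Fit a dominant plane (e.g. ground or wall) and return its inlier mask."""
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - sample[0]) @ normal)
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers


def strip_planes(points, num_planes=4):
    """Remove the dominant planes one at a time, mimicking a progressive plane
    segmentation; the remaining points are candidates for the muck pile cluster."""
    remaining = points
    for _ in range(num_planes):
        if len(remaining) < 3:
            break
        remaining = remaining[~ransac_plane(remaining)]
    return remaining


# Usage: synthetic scan with a flat ground plane plus a mound of points.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(-5, 5, 500), rng.uniform(-5, 5, 500), np.zeros(500)])
mound = rng.normal([0.0, 0.0, 1.0], 0.3, size=(200, 3))
candidates = strip_planes(np.vstack([ground, mound]), num_planes=1)
```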

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
Series
Mediterranean Conference on Control and Automation, ISSN 2325-369X, E-ISSN 2473-3504
Keywords
Automatic muck pile extraction, Muck pile localization, Muck pile segmentation, Pointcloud processing
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Computer and Information Sciences
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-101102 (URN); 10.1109/MED59994.2023.10185912 (DOI); 001042336800045 (); 2-s2.0-85167798931 (Scopus ID); 979-8-3503-1544-8 (ISBN); 979-8-3503-1543-1 (ISBN)
Conference
31st Mediterranean Conference on Control and Automation, MED 2023, Limassol, Cyprus, June 26-29, 2023
Available from: 2023-08-30 Created: 2023-08-30 Last updated: 2024-05-03 Bibliographically approved
6. EAT: Environment Agnostic Traversability for reactive navigation
2024 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 244, article id 122919. Article in journal (Refereed). Published
Abstract [en]

This work presents EAT (Environment Agnostic Traversability for Reactive Navigation), a novel framework for traversability estimation in indoor, outdoor, subterranean (SubT), and other unstructured environments. The architecture provides online updates on traversable regions during the mission and adapts to varying environments, while being robust to noisy semantic image segmentation. The proposed framework considers terrain prioritization based on a novel exponential decay function that fuses the semantic information and geometric features extracted from RGB-D images to obtain the traversability of the scene. Moreover, EAT introduces an obstacle inflation mechanism on the traversability image, based on a mean-window weighting module, allowing the proximity to untraversable regions to be adapted. The overall architecture uses two LRASPP MobileNetV3-Large Convolutional Neural Networks (CNNs) for semantic segmentation over RGB images, where the first one classifies the terrain types and the second one classifies see-through obstacles in the scene. Additionally, the geometric features profile the underlying surface properties of the local scene, extracting normals from depth images. The proposed scheme was integrated with a control architecture in reactive navigation scenarios and was experimentally validated in indoor, outdoor, and subterranean environments with a Pioneer 3AT mobile robot.
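The sketch below illustrates one way such a semantic-geometric fusion with an exponential term could look: a per-pixel prior from the terrain class is scaled by an exponential penalty on the deviation of the surface normal from the vertical. The exponential form, decay constant, class priorities, and the choice of a gravity-aligned reference axis are assumptions for illustration, not the exact function used in EAT.

```python
import numpy as np


def traversability_image(semantic_labels, surface_normals, terrain_priority, decay=3.0):
    """Fuse semantic and geometric cues into a per-pixel traversability image.

    semantic_labels: (H, W) integer terrain classes from the segmentation CNN.
    surface_normals: (H, W, 3) unit normals estimated from the depth image.
    terrain_priority: dict mapping class id -> prior traversability in [0, 1].
    """
    up = np.array([0.0, 0.0, 1.0])
    # Geometric term: decays exponentially as the surface tilts away from vertical.
    tilt = 1.0 - np.clip(surface_normals @ up, 0.0, 1.0)
    geometric = np.exp(-decay * tilt)
    # Semantic term: per-pixel prior looked up from the terrain class.
    semantic = np.vectorize(lambda c: terrain_priority.get(int(c), 0.0))(semantic_labels)
    return semantic * geometric


# Usage with illustrative classes: 0 = gravel, 1 = grass, 2 = water.
labels = np.random.randint(0, 3, (120, 160))
normals = np.dstack([np.zeros((120, 160)), np.zeros((120, 160)), np.ones((120, 160))])
trav = traversability_image(labels, normals, {0: 1.0, 1: 0.7, 2: 0.0})
```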

Place, publisher, year, edition, pages
Elsevier Ltd, 2024
Keywords
Navigation in unstructured environments, Traversability estimation with RGB-D data, Traversability guided reactive navigation, Vision based autonomous systems
National Category
Computer graphics and computer vision; Robotics and automation
Research subject
Robotics and Artificial Intelligence
Identifiers
urn:nbn:se:ltu:diva-103739 (URN); 10.1016/j.eswa.2023.122919 (DOI); 001144940800001 (); 2-s2.0-85180941472 (Scopus ID)
Note

Validated; 2024; Level 2; 2024-02-12 (joosat);

Funder: European Union's Horizon 2020 Research and Innovation Programme (101003591 NEXGEN-SIMS);

Full text license: CC BY

Available from: 2024-01-16 Created: 2024-01-16 Last updated: 2025-02-05 Bibliographically approved

Open Access in DiVA

fulltext (FULLTEXT02.pdf, 121983 kB, application/pdf)

Authority records

Saucedo, Mario Alberto Valdes
