Search results
Dan Feng, Zhenyu Yin, Xiaohui Wang, Feiqing Zhang and Zisong Wang
Abstract
Purpose
Traditional visual simultaneous localization and mapping (SLAM) systems are primarily based on the assumption that the environment is static, which makes them struggle with the interference caused by dynamic objects in complex industrial production environments. This paper aims to improve the stability of visual SLAM in complex dynamic environments through semantic segmentation and its optimization.
Design/methodology/approach
This paper proposes a real-time visual SLAM system for complex dynamic environments based on YOLOv5s semantic segmentation, named YLS-SLAM. The system combines semantic segmentation results with a boundary semantic enhancement algorithm. By recognizing and completing the semantic masks of dynamic objects from coarse to fine, it effectively eliminates the interference of dynamic feature points with pose estimation and enhances the retention and extraction of prominent background features, enabling stable operation in complex dynamic environments.
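The core filtering step described above can be illustrated with a minimal sketch (NumPy only; the function name, argument layout and square-dilation margin are illustrative assumptions, not taken from the paper): feature points are discarded when they fall on, or within a few pixels of, a segmented dynamic object, which approximates rejecting mask boundaries as well as mask interiors.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask, dilate_px=3):
    """Discard keypoints that fall inside (dilated) dynamic-object masks.

    keypoints    : (N, 2) array of (x, y) pixel coordinates.
    dynamic_mask : (H, W) boolean array, True where a dynamic object
                   was segmented.
    dilate_px    : margin (pixels) around the mask; points this close
                   to a mask are also rejected, a crude stand-in for
                   boundary-aware mask completion.
    """
    h, w = dynamic_mask.shape
    keep = []
    for x, y in keypoints:
        x, y = int(round(x)), int(round(y))
        # Square neighborhood around the keypoint (Chebyshev distance).
        x0, x1 = max(0, x - dilate_px), min(w, x + dilate_px + 1)
        y0, y1 = max(0, y - dilate_px), min(h, y + dilate_px + 1)
        # Keep the point only if no dynamic pixel lies in the window.
        keep.append(not dynamic_mask[y0:y1, x0:x1].any())
    return np.asarray(keypoints)[np.asarray(keep)]
```

Only the surviving (presumably static) keypoints would then be passed to pose estimation; a real system would use a morphological dilation of the mask once per frame rather than a per-point window.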
Findings
Experiments on the Technische Universität München and Bonn data sets show that, under monocular and RGB-D modes, the localization accuracy of YLS-SLAM is significantly better than that of existing advanced dynamic SLAM methods, effectively improving the robustness of visual SLAM. Additionally, the authors conducted tests with a monocular camera in a real industrial production environment, validating the system's effectiveness and application potential in complex dynamic environments.
Originality/value
This paper combines a semantic segmentation algorithm with a boundary semantic enhancement algorithm to achieve precise removal of dynamic objects and their edges while preserving the system's real-time performance, offering significant application value.
Li Shaochen, Zhenyu Liu, Yu Huang, Daxin Liu, Guifang Duan and Jianrong Tan
Abstract
Purpose
Assembly action recognition plays an important role in assembly process monitoring and human-robot collaborative assembly. Previous works overlook the interaction relationship between hands and operated objects and lack the modeling of subtle hand motions, which leads to a decline in accuracy for fine-grained action recognition. This paper aims to model the hand-object interactions and hand movements to realize high-accuracy assembly action recognition.
Design/methodology/approach
In this paper, a novel multi-stream hand-object interaction network (MHOINet) is proposed for assembly action recognition. To learn the hand-object interaction relationship in an assembly sequence, an interaction modeling network (IMN) comprising both geometric and visual modeling is exploited in the interaction stream. The former captures the spatial relation of the hand and the interacted parts/tools according to their detected bounding boxes, and the latter mines the visual context of hand and object at pixel level through a position attention model. To model hand movements, a temporal enhancement module (TEM) with multiple convolution kernels is developed in the hand stream, capturing the temporal dependencies of hand sequences over short and long ranges. Finally, assembly action prediction is accomplished by merging the outputs of the different streams through weighted score-level fusion. A robotic arm component assembly dataset is created to evaluate the effectiveness of the proposed method.
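The final weighted score-level fusion can be sketched as follows (a minimal illustration; the stream weights and score values are hypothetical, and the paper's learned or tuned weights are not reproduced here): each stream's class scores are softmax-normalized, scaled by a per-stream weight, summed, and the fused distribution determines the predicted action.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_streams(stream_scores, weights):
    """Weighted score-level fusion across recognition streams.

    stream_scores : list of per-stream class-score vectors
                    (e.g. hand, object and interaction streams).
    weights       : one scalar weight per stream.
    Returns (predicted_class, fused_probabilities).
    """
    assert len(stream_scores) == len(weights)
    fused = np.sum(
        [w * softmax(s) for s, w in zip(stream_scores, weights)],
        axis=0,
    )
    return int(np.argmax(fused)), fused
```

If the weights sum to one, the fused vector remains a valid probability distribution; a stream that is more reliable for fine-grained actions can simply be given a larger weight.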
Findings
The method achieves recognition accuracies of 97.31% and 95.32% for coarse and fine assembly actions, respectively, outperforming comparative methods. Experiments on human-robot collaboration show that the method can be applied to industrial production.
Originality/value
The authors propose a novel framework for assembly action recognition, which simultaneously leverages the features of hands, objects and hand-object interactions. The TEM enhances the representation of hand dynamics and facilitates the recognition of assembly actions with various time spans. The IMN learns semantic information from hand-object interactions, which is significant for distinguishing fine assembly actions.