DHP-Mapping: A Dense Panoptic Mapping System with Hierarchical World Representation and Label Optimization Techniques

Tianshuai Hu1 Jianhao Jiao3 Yucheng Xu4 Hongji Liu1 Sheng Wang1 Ming Liu2*
*Corresponding Author.
1Hong Kong University of Science and Technology 2Hong Kong University of Science and Technology (GuangZhou) 3University College London 4University of Edinburgh



Maps provide robots with crucial environmental knowledge, thereby enabling them to perform interactive tasks effectively. Easily accessing accurate abstract-to-detailed geometric and semantic concepts from maps is crucial for robots to make informed and efficient decisions.

We propose DHP-Mapping, a dense mapping system that represents the environment as a collection of TSDF submaps, with each submap representing a unique object. This map structure enables hierarchical modeling: voxel-level geometry and label information are stored in the TSDF and label layers of each submap, while instance-level information is maintained through the submap collection.
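As a rough illustration of this hierarchy, the sketch below models a submap collection in which every submap owns a TSDF layer and a label layer. All class and field names (`TsdfVoxel`, `LabelVoxel`, `Submap`, `SubmapCollection`, `counts`) are hypothetical and chosen for illustration only; the actual system's data layout may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

VoxelIndex = Tuple[int, int, int]


@dataclass
class TsdfVoxel:
    distance: float = 0.0  # truncated signed distance to the nearest surface
    weight: float = 0.0    # accumulated integration weight


@dataclass
class LabelVoxel:
    # Assumption: a label voxel stores observation counts per candidate submap ID.
    counts: Dict[int, int] = field(default_factory=dict)

    def best_label(self) -> Optional[int]:
        """Return the most frequently observed ID, or None if unobserved."""
        return max(self.counts, key=self.counts.get) if self.counts else None


@dataclass
class Submap:
    submap_id: int
    class_name: str
    tsdf_layer: Dict[VoxelIndex, TsdfVoxel] = field(default_factory=dict)
    label_layer: Dict[VoxelIndex, LabelVoxel] = field(default_factory=dict)


@dataclass
class SubmapCollection:
    # Instance-level information lives at this level: one entry per object.
    submaps: Dict[int, Submap] = field(default_factory=dict)

    def add(self, submap: Submap) -> None:
        self.submaps[submap.submap_id] = submap

    def get(self, submap_id: int) -> Submap:
        return self.submaps[submap_id]


collection = SubmapCollection()
chair = Submap(submap_id=7, class_name="chair")
chair.tsdf_layer[(1, 2, 3)] = TsdfVoxel(distance=0.02, weight=1.0)
chair.label_layer[(1, 2, 3)] = LabelVoxel(counts={7: 4, 9: 1})
collection.add(chair)
```

Voxel-level queries go through a submap's layers, while object-level queries only touch the collection's dictionary.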

Our system converts sensor data into point segments. (A): A data association process assigns each segment a submap ID. (B): Segment information is integrated into the assigned submap's TSDF and label layers through ray-tracing. Two modules are proposed to enhance mapping quality. (C): Information is fused among voxels sharing identical spatial information to avoid submap overlapping. (D): A CRF algorithm encourages label consistency among voxels with similar colors and nearby positions.
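Step (B) can be pictured with the standard weighted running-average TSDF update applied to a single voxel along a ray. This is a minimal sketch under assumptions: the function name `integrate_observation`, the dictionary layout, the truncation distance, and the per-label counting are all illustrative and are not confirmed as the system's exact update rule.

```python
def integrate_observation(voxel, sdf, label, truncation=0.1, obs_weight=1.0):
    """Fuse one signed-distance measurement and one label into a voxel.

    voxel is a dict with keys "distance", "weight", and "label_counts"
    (a hypothetical layout used only for this sketch).
    """
    # Clamp the measurement to the truncation band.
    sdf = max(-truncation, min(truncation, sdf))
    # Weighted running average of the signed distance.
    new_weight = voxel["weight"] + obs_weight
    voxel["distance"] = (
        voxel["distance"] * voxel["weight"] + sdf * obs_weight
    ) / new_weight
    voxel["weight"] = new_weight
    # Assumption: the label layer counts how often each ID was observed.
    voxel["label_counts"][label] = voxel["label_counts"].get(label, 0) + 1
    return voxel


voxel = {"distance": 0.0, "weight": 0.0, "label_counts": {}}
integrate_observation(voxel, sdf=0.04, label=7)
integrate_observation(voxel, sdf=0.08, label=7)
integrate_observation(voxel, sdf=0.5, label=9)  # clamped to the truncation distance
```

Because each update is an average, a single noisy measurement or mislabeled segment shifts the stored values only slightly once a voxel has been observed several times.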

We conduct experiments on an indoor simulation dataset (flat) and an outdoor real-world dataset (SemanticKITTI). Qualitative results demonstrate our system's advancement in comprehensively reconstructing scenes. Compared to Panmap, our system categorizes semantic classes and tracks object IDs with higher accuracy, and produces a denser map with high precision.

Fig. 1 - Visualization results of dense panoptic mapping systems run on the flat and SemanticKITTI datasets. Meshes are extracted from the TSDF map using the marching cubes algorithm. The first row displays our system's map reconstruction results using the color values stored in the TSDF layers. The second to fifth rows show the maps with label results. Different colors in each sub-figure represent unique object IDs. Compared with Panmap, our DHP-Mapping produces more consistent labels and reconstructs denser and more accurate geometry (obvious in columns d-e-f). The proposed refinement techniques help to reduce submap overlaps and enhance label accuracy; without them, submaps tend to intertwine with each other (highlighted by black circles).

The exclusion of the refinement module results in an obvious decline in performance. Directly integrating imperfect panoptic segmentation results into the map leads to unclear submap boundaries, causing submaps to mix with others.
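One way to picture how overlap between submaps could be removed, in the spirit of step (C), is to let each spatial location belong only to the submap that observed it most often. The sketch below is an assumption made for illustration: the function name `resolve_overlaps`, the count-based ownership rule, and the tie-breaking behavior are hypothetical and not taken from the system itself.

```python
def resolve_overlaps(submaps):
    """Make submaps spatially disjoint.

    submaps maps submap_id -> {voxel_index: observation_count}.
    Each voxel index is kept only in the submap with the highest count;
    on a tie, the submap encountered first keeps the voxel.
    """
    owner = {}
    for submap_id, voxels in submaps.items():
        for index, count in voxels.items():
            best = owner.get(index)
            if best is None or count > best[1]:
                owner[index] = (submap_id, count)
    return {
        submap_id: {
            index: count
            for index, count in voxels.items()
            if owner[index][0] == submap_id
        }
        for submap_id, voxels in submaps.items()
    }


submaps = {
    1: {(0, 0, 0): 5, (1, 0, 0): 2},
    2: {(1, 0, 0): 6, (2, 0, 0): 3},
}
disjoint = resolve_overlaps(submaps)
```

In this toy input the voxel `(1, 0, 0)` is claimed by both submaps and ends up only in submap 2, which observed it more often.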

Quantitative results show that our method provides a more accurate metric-semantic map. It categorizes semantic classes and tracks object IDs with higher accuracy, and produces a denser map with high precision.
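Instance-level scores such as mAP-(x) depend on the intersection over union (IoU) between a predicted object and a ground-truth object. The following sketch computes IoU over sets of voxel indices and checks a match at a given threshold; the function names and the voxel-set representation are assumptions for illustration, not the evaluation code used for the tables.

```python
def voxel_iou(pred, gt):
    """Intersection over union of two collections of voxel indices."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 0.0


def is_match(pred, gt, iou_threshold):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return voxel_iou(pred, gt) >= iou_threshold


pred = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
gt = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
iou = voxel_iou(pred, gt)  # 2 shared voxels out of 4 in the union
```

With this input the prediction matches at a threshold of 0.5 but not at 0.75, which is why stricter thresholds generally yield lower scores.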

Tab. 1 - Label accuracy comparisons with a SOTA panoptic mapping method and the impact of our proposed label refinement modules. This table displays the quantitative evaluation results of our method versus the SOTA panoptic mapping approach across two datasets. We compare their performance using the Panoptic, Semantic, and Instance Metrics. Additionally, we compare results with and without our label refinement module. We bold the best results and underline the second-best results. Note: mAP-(x) indicates that the evaluation is conducted with IoU = x.
Tab. 2 - Comparison of semantic label accuracy between Kimera and DHP-Mapping across two datasets.
Tab. 3 - Comparisons of geometry reconstruction quality between DHP-Mapping and SOTA metric-semantic mapping algorithms. We bold the best results and underline the second best results. Note: Acc., Comp. and C-L1 are reported in meters.
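For readers unfamiliar with Acc., Comp. and C-L1, the sketch below uses one common set of definitions: accuracy is the mean distance from reconstructed points to the nearest ground-truth point, completeness is the reverse, and Chamfer-L1 is their average. These definitions and the function names are assumptions; the exact formulation behind Tab. 3 may differ.

```python
import math


def mean_nearest_distance(source, target):
    """Mean distance from each source point to its nearest target point."""
    return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)


def geometry_metrics(reconstructed, ground_truth):
    """Return (accuracy, completeness, chamfer_l1), all in the input units."""
    accuracy = mean_nearest_distance(reconstructed, ground_truth)
    completeness = mean_nearest_distance(ground_truth, reconstructed)
    return accuracy, completeness, 0.5 * (accuracy + completeness)


# Toy point sets in meters: the reconstruction misses one ground-truth point.
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
acc, comp, chamfer = geometry_metrics(reconstructed, ground_truth)
```

Here accuracy is zero because every reconstructed point lies on the ground truth, while completeness is penalized by the missing point. This brute-force search is quadratic and only suitable for small examples.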

The use of a comprehensive labeling system in our mapping system greatly enriches scene representation. In addition, the hierarchical submap-based data structure facilitates submap-level manipulation and ensures rapid information retrieval.

Fig. 2 - An example of object manipulations using the submap data structure. We add a new object (green) by duplicating an existing submap (blue) and adjusting object position by modifying the relative pose between the submap and the global coordinate (pink).
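The duplication described in Fig. 2 can be sketched as copying a submap and changing only its pose relative to the global frame. The example below simplifies the pose to a pure translation and stores geometry as a list of local points; the `Submap` fields and the `duplicate_submap` helper are hypothetical names used only for this illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Submap:
    submap_id: int
    translation: tuple = (0.0, 0.0, 0.0)  # simplified pose: translation only
    local_points: list = field(default_factory=list)

    def global_points(self):
        """Express the submap's local geometry in the global frame."""
        tx, ty, tz = self.translation
        return [(x + tx, y + ty, z + tz) for x, y, z in self.local_points]


def duplicate_submap(collection, source_id, new_id, new_translation):
    """Clone an existing submap under a new ID and place it at a new pose."""
    clone = copy.deepcopy(collection[source_id])
    clone.submap_id = new_id
    clone.translation = new_translation
    collection[new_id] = clone
    return clone


collection = {1: Submap(1, (0.0, 0.0, 0.0), [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)])}
clone = duplicate_submap(
    collection, source_id=1, new_id=2, new_translation=(2.0, 0.0, 0.0)
)
```

The voxel data of the clone is untouched; only the submap-to-global transform changes, so moving an object does not require re-integrating any sensor data.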
Tab. 4 - Object retrieval time with a single TSDF map structure. With our hierarchical submap-based data structure, the retrieval time is negligible.
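The difference in retrieval cost can be illustrated with a toy comparison: a single labeled voxel map must be scanned in full to collect one object, whereas a submap collection returns the object with one keyed lookup. The dictionaries and function names below are assumptions made for this sketch and do not reproduce the measured setup.

```python
def retrieve_from_single_map(voxel_labels, object_id):
    """Scan every voxel of a single map; cost grows with map size."""
    return [index for index, label in voxel_labels.items() if label == object_id]


def retrieve_from_submaps(submaps, object_id):
    """Look up the object's submap directly by its ID."""
    return submaps[object_id]


# Single map: voxel index -> object ID.
voxel_labels = {(0, 0, 0): 1, (1, 0, 0): 2, (2, 0, 0): 2, (3, 0, 0): 3}

# Submap collection: object ID -> voxel indices belonging to that object.
submaps = {}
for index, label in voxel_labels.items():
    submaps.setdefault(label, []).append(index)

scanned = retrieve_from_single_map(voxel_labels, 2)
looked_up = retrieve_from_submaps(submaps, 2)
```

Both calls return the same voxels, but the lookup does not depend on the total number of voxels in the scene.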

Conclusion

In this work, we design a dense volumetric mapping system that uses multiple TSDF submaps and panoptic labels to represent the scene hierarchically and holistically, while maintaining voxel-level and submap-level metric and label information. The proposed inter-submap label management module ensures that the spatial information in each submap is disjoint. The label refinement module improves the accuracy of panoptic labels by taking advantage of the inherent cohesion of objects and incorporating contextual information from the entire scene. This data structure of hierarchical TSDF submaps with panoptic labels enables high-level interactive tasks and dynamic environment modeling. In future work, more abstract and high-level representations can be extracted and integrated on top of this data structure. This includes establishing topological connections between entities and integrating language-based expressions that go beyond metric and symbolic representations.

BibTeX

@misc{hu2024dhpmapping,
  title={DHP-Mapping: A Dense Panoptic Mapping System with Hierarchical World Representation and Label Optimization Techniques},
  author={Tianshuai Hu and Jianhao Jiao and Yucheng Xu and Hongji Liu and Sheng Wang and Ming Liu},
  year={2024},
  eprint={2403.16880},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
