Co-GLANCE:

Uncertainty-Aware Active Perception
for Heterogeneous Robot Teaming

Abstract

Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from sources such as occlusions, manifests differently across robot viewpoints depending on scene structure. Detecting and resolving sources of perceptual uncertainty requires both scene-based contextual reasoning and capability-aware robot allocation. While vision-language models provide strong semantic priors for both, they are computationally prohibitive for onboard inference and lack calibrated uncertainty quantification.

We introduce Co-GLANCE, a real-time onboard perception and decision-making system for uncertainty resolution in heterogeneous robot teams. Co-GLANCE distills the semantic reasoning capabilities of a vision-language model into an end-to-end model for occlusion segmentation and robot allocation, eliminating the need for cloud-based inference. To quantify perceptual uncertainty, Co-GLANCE combines conformal prediction with selective abstention to provide statistically valid coverage guarantees for segmentation, robot allocation, and detection outputs. These calibrated uncertainty estimates directly trigger active perception, dispatching the most appropriate robot to acquire informative viewpoints and resolve uncertainty.

Across real-world scenarios, Co-GLANCE outperforms cloud-based vision-language model baselines in occlusion segmentation and robot allocation accuracy by 25% and 36%, respectively, while reducing per-frame inference latency by 350×. We also release an air-ground dataset for future research. Code, videos, and the dataset are available at: co-glance.github.io.

An aerial robot and ground robot coordinate through communication for active perception.

Framework Overview

System overview: (1) perceptual uncertainty detection, (2) occlusion uncertainty, (3) resolution of high-uncertainty areas, (4) object detection, (5) detection uncertainty, and (6) uncertainty-driven active perception.
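
How a calibrated uncertainty estimate can trigger active perception, as in steps (5) and (6), can be illustrated with a minimal split conformal prediction sketch. This is not the Co-GLANCE implementation. The label names (`aerial`, `ground`), the nonconformity score (one minus the predicted probability), and the rule that any non-singleton prediction set leads to abstention are all assumptions made for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    cal_scores: scores (1 - probability of the true label) on held-out data.
    alpha: target miscoverage rate, e.g. 0.25 for 75% coverage.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        # Too few calibration points for this alpha: every label is included.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Labels whose nonconformity score (1 - probability) is within qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def decide(probs, qhat):
    """Return a label if the prediction set is a singleton, else abstain."""
    labels = prediction_set(probs, qhat)
    if len(labels) == 1:
        return next(iter(labels))
    return None  # abstain; assumed here to be the trigger for active perception


# Hypothetical calibration scores and predictions, for illustration only.
qhat = conformal_threshold([0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60], alpha=0.25)
confident = decide({"aerial": 0.9, "ground": 0.1}, qhat)
uncertain = decide({"aerial": 0.55, "ground": 0.45}, qhat)
```

In this sketch an empty or multi-label prediction set is treated as "too uncertain to act on", which is one simple way to pair conformal prediction with selective abstention.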

VLM Distillation

Perceptual uncertainty detection: (1) occlusion segmentation and robot allocation by VLM with self-review, (2) knowledge distillation, and (3) onboard inference using the distilled model.
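
Step (2) can be sketched in its simplest form as training a small student model on labels produced by the VLM teacher. The loss below is ordinary cross-entropy against teacher pseudo-labels. The source does not state which distillation loss Co-GLANCE uses. The `accepted` flag is likewise a hypothetical stand-in for the outcome of the VLM's self-review, whose exact role is not specified here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_loss(student_logits, teacher_label):
    """Cross-entropy of the student's prediction against a teacher label."""
    probs = softmax(student_logits)
    return -math.log(probs[teacher_label])


def distillation_batch_loss(samples):
    """Mean loss over samples whose teacher label passed self-review.

    samples: iterable of (student_logits, teacher_label, accepted) tuples,
    where `accepted` is an assumed boolean flag from the teacher's self-review.
    """
    kept = [pseudo_label_loss(logits, label)
            for logits, label, accepted in samples if accepted]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical batch: two accepted teacher labels and one rejected.
batch = [
    ([2.0, 0.0], 0, True),
    ([0.0, 0.0], 1, True),
    ([5.0, -5.0], 1, False),  # ignored: rejected by the assumed self-review
]
loss = distillation_batch_loss(batch)
```

A real segmentation student would apply such a loss per pixel and minimize it with a gradient-based optimizer; the sketch only shows the loss computation.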

Dataset

Real air-ground data is costly to collect, requiring two robots operating outdoors simultaneously with synchronized sensing and metric localization across platforms. The Co-GLANCE dataset provides more than 4,000 synchronized aerial and ground RGB frames across semi-structured outdoor scenarios, recorded with a DJI Matrice 600 and a Boston Dynamics Spot.

Depending on the scenario, available streams include RGB, estimated depth, RTK GPS, and IMU data. Raw ROS 2 bags from both platforms are also released to support evaluation of perception and autonomy stacks beyond static image benchmarks.

Scenario	Run	Frame Pairs
Construction	1	118
Construction	2	326
Construction	3	280
Construction	4	485
Construction	Total	1,209
Camouflage	1	186
Camouflage	2	545
Camouflage	3	131
Camouflage	Total	862

The construction scenario contains four runs and 1,209 annotated frame pairs. The camouflage scenario contains three runs and 862 annotated frame pairs, with two camouflage-wearing individuals moving through visually occluded areas.
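
The totals above can be checked directly from the per-run counts in the table. The short sketch below does that, and assumes each frame pair contributes exactly one aerial and one ground frame when converting pairs to individual frames.

```python
# Per-run annotated frame pairs, copied from the table above.
frame_pairs = {
    "Construction": [118, 326, 280, 485],
    "Camouflage": [186, 545, 131],
}

# Total frame pairs per scenario.
totals = {scenario: sum(runs) for scenario, runs in frame_pairs.items()}

# Total pairs across scenarios, and individual frames under the
# one-aerial-plus-one-ground-frame-per-pair assumption.
total_pairs = sum(totals.values())
total_frames = 2 * total_pairs

print(totals)        # {'Construction': 1209, 'Camouflage': 862}
print(total_pairs)   # 2071
print(total_frames)  # 4142
```

Under that assumption the two annotated scenarios account for 4,142 frames, consistent with the stated figure of more than 4,000 synchronized aerial and ground frames.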

