Collaborative Robotics · Winter 2026
Bimanual Granular Scooping
A bimanual mobile robot that perceives objects through RGB-D sensing, plans grasps in 3D, navigates autonomously, and scoops granular materials using coordinated two-arm control.
Overview
We demonstrate object manipulation, navigation-integrated grasping, and coordinated bimanual candy scooping with tool use, in simulation and in the real world.
Task 3. Bimanual candy scooping in simulation.
Tasks 1 & 2. Manipulation and locomotion in simulation.
Tasks 1 & 2. Manipulation and locomotion with the real robot.
System Architecture
The control architecture connects perception, navigation, grasp planning, and a high-level controller. An initial state-machine approach was augmented with an LLM-powered controller (Claude Opus 4.6) for generalizable voice-driven interaction.
Figure 1. Robot system architecture. External GPU servers handle perception inference and voice processing. A high-level controller orchestrates all modules via services and actions. Execution nodes translate high-level plans into hardware commands. The system is implemented in ROS 2.
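The original state-machine controller's flow can be sketched as a minimal dispatcher. The state names and step functions below are illustrative placeholders, not the project's actual ROS 2 service or action interfaces:

```python
# Minimal state-machine sketch of the high-level controller's original
# control flow. State names and step functions are illustrative; the real
# system dispatches to ROS 2 services and actions instead.
from typing import Callable, Dict, List, Optional

class StateMachine:
    def __init__(self) -> None:
        # Each state maps to a step function returning the next state
        # (or None to stop).
        self.states: Dict[str, Callable[[], Optional[str]]] = {}

    def add(self, name: str, step: Callable[[], Optional[str]]) -> None:
        self.states[name] = step

    def run(self, start: str) -> List[str]:
        trace, state = [], start
        while state is not None:
            trace.append(state)
            state = self.states[state]()
        return trace

sm = StateMachine()
sm.add("PERCEIVE", lambda: "PLAN_GRASP")   # e.g. query the perception server
sm.add("PLAN_GRASP", lambda: "NAVIGATE")   # e.g. request a 6-DOF grasp pose
sm.add("NAVIGATE", lambda: "EXECUTE")      # e.g. drive to the target
sm.add("EXECUTE", lambda: None)            # e.g. run the grasp action

order = sm.run("PERCEIVE")
```

Replacing the hard-coded transition table with an LLM that chooses the next action is what makes the controller generalizable to voice-driven requests.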
Manipulation & Locomotion
Bimanual pick-and-place powered by RGB-D perception, real-time tracking, and autonomous navigation. AnyGrasp processes depth and color data to generate 6-DOF grasp poses, while SAM3 provides segmentation masks for continuous object tracking as the arms move. A PID-based controller handles straight-line path following and turning, coordinating base locomotion with arm manipulation.
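The PID path-following idea can be sketched as a heading controller that steers the base toward the next goal point. Gains and timestep here are illustrative, not the values tuned on the robot:

```python
import math

class HeadingPID:
    """Hypothetical PID loop for straight-line following: steer the base
    so its heading matches the bearing to the goal point. Gains are
    illustrative, not the tuned values used on the robot."""
    def __init__(self, kp=1.5, ki=0.0, kd=0.2, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, heading, pose, goal):
        # Bearing from the current base position (x, y) to the goal point.
        bearing = math.atan2(goal[1] - pose[1], goal[0] - pose[0])
        # Wrap the heading error into (-pi, pi].
        error = math.atan2(math.sin(bearing - heading),
                           math.cos(bearing - heading))
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Angular-velocity command for the base.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = HeadingPID()
w = pid.step(heading=0.0, pose=(0.0, 0.0), goal=(1.0, 1.0))  # goal bears +45°
```

A goal to the left produces a positive (counter-clockwise) turn command, a goal to the right a negative one.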
Key Components
- AnyGrasp server — generates 6-DOF grasp poses from RGB-D point clouds
- Iterative closest point (ICP) — aligns known 3D object mesh models (STL) to the observed point cloud, refining it before grasp detection
- SAM3 tracker — provides segmentation masks for continuous object tracking with proportional camera control
- PID navigation controller — handles straight-line path following and turning for autonomous locomotion
- Execute grasp server — plans and executes 6-DOF grasps from AnyGrasp poses
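ICP alternates correspondence matching with a closed-form rigid alignment. As a simplified 2D sketch of that alignment step with known correspondences (the real pipeline solves the analogous 3D problem, via SVD, against the STL mesh models):

```python
import math

def align_2d(src, dst):
    """One rigid-alignment step of ICP in 2D with known correspondences:
    find the rotation theta and translation t that minimize
    sum ||R p + t - q||^2 over paired points p in src, q in dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n
    cy_d = sum(q[1] for q in dst) / n
    # Cross- and dot-products of centered point pairs give the angle.
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_d, qy - cy_d
        s_cross += ax * by - ay * bx
        s_dot += ax * bx + ay * by
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)

# A unit square rotated by +90° about the origin and shifted by (2, 0)
# is recovered exactly in a single step.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 0), (2, 1), (1, 1), (1, 0)]
theta, t = align_2d(src, dst)
```

In practice the correspondences are unknown, so ICP re-pairs each source point with its nearest neighbor in the target cloud and repeats this step until the alignment converges.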
Figure 2. Sequence of manipulation actions on the real robot — CV detection, object tracking, navigation, and task execution.
Figure 3a. AnyGrasp point cloud with grasp pose for a banana.
Figure 3b. AnyGrasp point cloud with grasp pose for a cube.
Scooping Granular Objects
Full candy scooping pipeline: the robot grasps a scoop and bucket, navigates to a candy box, scoops the candy, and transfers it to the bucket. This integrates bimanual grasping, autonomous navigation, and coordinated tool use in one continuous sequence.
Candy Scooping Pipeline
Figure 4. Candy scooping pipeline with six coordinated steps.
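The sequencing logic can be sketched as an ordered executor that aborts on the first failed step. The step names below are a plausible reading of the six stages, not the project's actual node or action names:

```python
# Illustrative sequencing of the scooping pipeline. Each step would be a
# ROS 2 action call in the real system; the names here are assumptions.
PIPELINE = [
    "grasp_scoop",         # one arm picks up the scoop
    "grasp_bucket",        # the other arm picks up the bucket
    "navigate_to_box",     # base drives to the candy box
    "scoop_candy",         # waypoint trajectory through the candy
    "transfer_to_bucket",  # pour the scoop into the held bucket
    "retreat",             # back away from the box
]

def run_pipeline(execute, steps=PIPELINE):
    """Run steps in order; abort on the first failure so the arms never
    continue from an inconsistent state."""
    done = []
    for step in steps:
        if not execute(step):
            return done, step  # completed steps, failing step
        done.append(step)
    return done, None

done, failed = run_pipeline(lambda step: True)
```

Aborting rather than skipping matters here: a failed bimanual grasp would leave later steps operating on a tool the robot is not actually holding.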
Key Components
- ArUco detection node — locates the scoop and bucket using ArUco fiducial markers
- Execute grasp server — coordinates bimanual grasp sequences
- Waypoint trajectories — arm paths for scooping and transfer motions
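A waypoint trajectory like those in the list above can be time-parameterized by linear interpolation between keyframes. The waypoints and timings below are made up for illustration:

```python
def interpolate(waypoints, t):
    """Linearly interpolate along a waypoint trajectory.
    `waypoints` is a list of (time, value) pairs with increasing times;
    values here are 1-D for brevity, whereas the real trajectories are
    6-DOF end-effector poses."""
    if t <= waypoints[0][0]:
        return waypoints[0][1]
    if t >= waypoints[-1][0]:
        return waypoints[-1][1]
    for (t0, p0), (t1, p1) in zip(waypoints, waypoints[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return p0 + a * (p1 - p0)

# A hypothetical dip-and-lift profile for the scoop-tip height (metres):
# descend into the candy, sweep at constant depth, then lift to transfer.
scoop_z = [(0.0, 0.30), (1.0, 0.05), (2.0, 0.05), (3.0, 0.35)]
z = interpolate(scoop_z, 0.5)
```

Clamping to the endpoint values outside the time range keeps the arm parked at the first and last waypoints rather than extrapolating.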
Figure 5a. Bimanual candy scooping scene setup.
Figure 5b. ArUco tag-based pose estimation for tool localization.
Figure 5c. Scooping perspective during candy pickup.
Development & Debugging Tools
Custom tools for visualizing sensor streams, diagnosing perception failures, and validating grasp plans before deploying to hardware.
Figure 6a. Teleoperation interface for manual robot control.
Figure 6b. Real-time sensor and state display.
Figure 7a. Graphical interface with Claude Opus integrated as an autonomous controller. Originally built to record examples for training a vision-language-action (VLA) or action chunking with transformers (ACT) policy.
Figure 7b. Claude control panel for voice-driven interaction, with Google Cloud speech processing, Gemini task parsing, and active listening with keyword detection.
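The active-listening behavior can be sketched as a wake-word gate: transcripts are ignored until a keyword is heard, and only the remainder is forwarded for task parsing. The keyword and the string-based parsing below are assumptions, not the project's actual configuration:

```python
# Illustrative wake-word gate for the voice pipeline. The keyword and the
# downstream handling are assumptions, not the real configuration.
WAKE_WORD = "robot"

def gate(transcript):
    """Return the command that follows the wake word, or None if the
    wake word is absent (or nothing follows it)."""
    words = transcript.lower().split()
    if WAKE_WORD in words:
        idx = words.index(WAKE_WORD)
        return " ".join(words[idx + 1:]) or None
    return None

cmd = gate("hey robot scoop the candy")
```

Commands that pass the gate would then go to the task parser, while background speech never reaches the controller.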