Robotics machine learning case study
This project trains LeRobot ACT policies for simulated robot manipulation. The work is organized as an empirical debugging loop: observe rollout failures, make one targeted change, then verify the change on benchmark initial states.
Project setup
LIBERO demonstration data was converted into local LeRobot datasets with agent-view RGB, wrist RGB, robot state, actions, and task IDs for multitask conditioning.
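The sketch below shows roughly what that conversion looks like; it is a hedged illustration, not the project's script. The feature names, shapes, fps, and the `libero_demos` iterable are assumptions, and the `LeRobotDataset` API differs between lerobot versions.

```python
# Hedged sketch of the LIBERO -> LeRobot conversion (not the exact script).
# Feature names, shapes, and fps are illustrative; task handling varies
# across lerobot versions (per-episode here, per-frame in newer releases).
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

features = {
    "observation.images.agentview": {
        "dtype": "video", "shape": (256, 256, 3), "names": ["height", "width", "channels"],
    },
    "observation.images.wrist": {
        "dtype": "video", "shape": (256, 256, 3), "names": ["height", "width", "channels"],
    },
    "observation.state": {"dtype": "float32", "shape": (8,), "names": None},
    "action": {"dtype": "float32", "shape": (7,), "names": None},
}

dataset = LeRobotDataset.create(repo_id="local/libero_spatial", fps=20, features=features)

for instruction, steps in libero_demos:  # hypothetical (task string, episode) pairs
    for step in steps:
        dataset.add_frame({
            "observation.images.agentview": step["agentview_rgb"],
            "observation.images.wrist": step["wrist_rgb"],
            "observation.state": step["robot_state"],
            "action": step["action"],
        })
    # the task string is what drives multitask conditioning downstream
    dataset.save_episode(task=instruction)
```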
The experiments focus on LIBERO-Spatial bowl-placement tasks. The single-task track starts with placing the black bowl on the plate; the multitask track covers all ten spatial variants.
Evaluation runs in LIBERO off-screen simulation on pruned benchmark initial states, not demonstration starts, so reported success reflects benchmark rollout behavior.
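Concretely, the rollout loop follows LIBERO's standard benchmark evaluation pattern, sketched below under assumptions: the camera size, horizon, and `policy_act` wrapper are placeholders rather than the project's settings.

```python
# Hedged sketch of off-screen LIBERO evaluation on benchmark initial states.
import os
from libero.libero import benchmark, get_libero_path
from libero.libero.envs import OffScreenRenderEnv

task_suite = benchmark.get_benchmark_dict()["libero_spatial"]()
task = task_suite.get_task(0)  # e.g. the black-bowl placement task
init_states = task_suite.get_task_init_states(0)  # benchmark starts, not demo starts

env = OffScreenRenderEnv(
    bddl_file_name=os.path.join(
        get_libero_path("bddl_files"), task.problem_folder, task.bddl_file
    ),
    camera_heights=256,  # placeholder resolution
    camera_widths=256,
)

successes = 0
for state in init_states:
    env.reset()
    obs = env.set_init_state(state)
    for _ in range(600):  # placeholder horizon
        action = policy_act(obs)  # hypothetical wrapper around the trained ACT policy
        obs, reward, done, info = env.step(action)
        if done:  # LIBERO's wrappers signal task success through done
            successes += 1
            break
env.close()
```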
Section 1: single-task policy
Goal: make one ACT policy reliably solve the black-bowl placement task, then use rollout failures to decide which training changes are worth testing.
Best result: 41 / 50 benchmark episodes
Single-task thinking process
Before the KL trick
Success plateaued at 31 / 50, with the remaining failures concentrated in the precision-sensitive grasp and placement phases.
After the KL trick
Reducing the KL penalty gave the policy enough action flexibility to reach 41 / 50.
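In LeRobot's ACT implementation this penalty is the `kl_weight` term of the VAE objective, so the change is a one-line config tweak. The value below is an assumption for illustration, not the setting the project landed on.

```python
# Minimal sketch of the KL trick; the project's exact kl_weight is not
# stated, so 1.0 is an assumed example (the library default is 10.0).
from lerobot.common.policies.act.configuration_act import ACTConfig
from lerobot.common.policies.act.modeling_act import ACTPolicy

cfg = ACTConfig(
    kl_weight=1.0,  # lower KL penalty -> looser latent prior, more varied action chunks
)
policy = ACTPolicy(cfg)
```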
Section 2: multitask policy
Goal: train one task-conditioned ACT policy across all ten LIBERO-Spatial tasks and understand why weak tasks do not improve just by sampling them more often.
69.4% average success across the 10-task benchmark
Multi-task thinking process
| Task | Scenario | Success rate |
|---|---|---|
| 02 | from table center | 0.94 |
| 03 | on cookie box | 0.92 |
| 09 | on wooden cabinet | 0.86 |
| 00 | between plate and ramekin | 0.82 |
| 05 | on ramekin | 0.16 |
Before the oversampling trick
The baseline's 69.4% mean is strong, but task 05 (on ramekin) sits at 0.16 and remains a hard failure case.
After the oversampling trick
The plausible fix backfires: oversampling the weak task drags the benchmark mean from 69.4% down to 22.0%.
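Mechanically, the experiment is a sampler change. The sketch below uses PyTorch's `WeightedRandomSampler` to upweight frames from the weak task; the 4x factor and the `task_index` column read from the dataset's underlying `hf_dataset` are assumptions, not the project's exact setup.

```python
# Hedged sketch of the oversampling experiment. `dataset` is the local
# LeRobotDataset from the conversion sketch; the 4x weight is an assumption.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# per-frame task labels, assuming the dataset exposes a task_index column
task_ids = torch.tensor([int(t) for t in dataset.hf_dataset["task_index"]])

weights = torch.ones(len(task_ids))
weights[task_ids == 5] = 4.0  # sample task 05 frames 4x more often

sampler = WeightedRandomSampler(
    weights=weights.double(),
    num_samples=len(weights),
    replacement=True,
)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```

Weighting individual frames is one of several possible granularities; weighting whole episodes would be the natural alternative.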