Progress Report · KTH RPL

Unitree G1 — Whole-body
Teleoperation & Data Collection

Platform
Unitree G1 (humanoid)

Window
Weeks 1–12

Status
VLA policy deployed · scaling up

Legend: Completed · Current direction · Decision / transition · Hardware risk · Data collection

Goal · Project north star

Humanoid whole-body training data on G1

Objective

Collect coordinated locomotion + manipulation demonstrations on the Unitree G1 to support future whole-body policy / VLA training.

01 · Input

Human VR motion

02 · Control

Whole-body controller

03 · Capture

Demonstration data

04 · Output

VLA / policy training

Week 1–2 · Upper-body validation · sim & real

Upper-body VR teleoperation is working

Done

Deployed upper-body teleoperation using Meta Quest 2.
Validated in simulation and on the real G1.

Upper-body teleoperation in simulation — Simulation · upper-body tracking

Upper-body teleoperation on real G1 — Real G1 · Meta Quest 2 input

Week 3 · VLA scope expansion

Extend VLA to humanoid

Start

Over the past two years, VLA models (such as the π-series) have shown promising results on both single-arm and dual-arm manipulation tasks.
A growing research direction is to extend VLA from arm-centric manipulation to humanoid whole-body control (such as GR00T).

A humanoid VLA using SONIC as the controller

Week 4 · Pico 4 hardware switch

New VR device for teleoperation

Purchase done

The Meta Quest covers hands and head but has no official leg motion tracker.
Switched to Pico 4 + body trackers so whole-body teleoperation has full-pose input.

Week 5–6 · Option A · IK upper, RL lower

Option A — IK upper body + RL lower body

Hardware risk

Method

Upper body: IK-based joint-position control

Lower body: pretrained RL locomotion policy

Control objectives

AMO uses upper-body IK for sparse target tracking and a lower-body policy for locomotion: $$ (\mathbf{q}^{u}, \mathbf{c}) = \mathrm{IK}(\mathbf{x}^{\star}), \qquad \mathbf{a}^{l} = \pi^{l}(\mathbf{o}, \mathbf{v}^{\star}, \mathbf{c}). $$ Here, $\mathbf{x}^{\star}$ are head/wrist target poses, $\mathbf{q}^{u}$ upper-body joints, $\mathbf{c}$ torso/height commands, $\mathbf{o}$ proprioception, $\mathbf{v}^{\star}$ target velocity, $\pi^{l}$ the lower-body policy, and $\mathbf{a}^{l}$ lower-body actions.
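As a rough illustration of the decoupled structure above, the sketch below wires a stand-in IK function and a stand-in lower-body policy into a single control step. Everything in it is hypothetical: `toy_ik`, `toy_lower_policy`, the joint counts, and the zero outputs are placeholders chosen for illustration, not AMO's actual solver or policy.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

N_UPPER = 14  # assumed number of upper-body joints (illustrative only)
N_LOWER = 12  # assumed number of lower-body joints (illustrative only)


def toy_ik(x_star: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Placeholder for IK(x*): returns (q_u, c).

    x_star holds head/wrist target positions as (x, y, z) triples.
    A real solver would compute joint targets that reach x*; this stub
    returns zeros and uses the mean target height as the command c.
    """
    q_u = [0.0] * N_UPPER
    mean_height = sum(p[2] for p in x_star) / len(x_star)
    return q_u, [mean_height]


def toy_lower_policy(o: Vector, v_star: Vector, c: Vector) -> Vector:
    """Placeholder for pi^l(o, v*, c): returns lower-body actions a_l."""
    policy_input = list(o) + list(v_star) + list(c)  # c is part of the input
    assert len(policy_input) > 0
    return [0.0] * N_LOWER


def decoupled_step(x_star, o, v_star, ik=toy_ik, lower_policy=toy_lower_policy):
    """One control step: IK for the upper body, policy for the lower body."""
    q_u, c = ik(x_star)               # (q^u, c) = IK(x*)
    a_l = lower_policy(o, v_star, c)  # a^l = pi^l(o, v*, c)
    return q_u, c, a_l
```

Note that the only coupling between the two halves is the command `c`: the lower-body policy never sees the upper-body joint targets directly, which is one plausible reading of why balance can degrade under aggressive manipulation.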

Under upper-body manipulation, the lower body struggled to maintain balance.
Robot damage occurred during data collection runs.

IK + RL demonstration · decoupled whole-body control

Trade-off observed

+ IK advantage — better hand accuracy at the end-effector.
− IK problem — unnatural motion and poor stability during manipulation.

Decision: Not suitable as the main data-collection method.

Week 7–8 · Option B · SONIC deployment

Option B — One policy for whole-body

Current main direction

Method

Policy: SONIC · pretrained whole-body motion tracking

Input: VR device motion / VLA command

Output: coordinated full-body G1 motion

Training objective

$$ \mathcal{L}_{\text{SONIC}} = \mathcal{L}_{\text{RL}} + \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{token}} + \mathcal{L}_{\text{cycle}}, $$ where $\mathcal{L}_{\text{RL}}$ is the PPO surrogate that trains physically stable motion tracking, $\mathcal{L}_{\text{recon}}$ reconstructs robot motion from the learned tokens, $\mathcal{L}_{\text{token}}$ aligns human and robot motion tokens, and $\mathcal{L}_{\text{cycle}}$ enforces cross-embodiment consistency.
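To make the composition of the objective concrete, here is a minimal sketch that combines four already-computed scalar loss terms. The equation above is an unweighted sum; the optional `weights` argument is an assumption added for illustration and defaults to the unweighted case. The function name is hypothetical and is not taken from any SONIC codebase.

```python
from typing import Sequence


def sonic_total_loss(
    l_rl: float,
    l_recon: float,
    l_token: float,
    l_cycle: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),  # assumed; source is unweighted
) -> float:
    """Sum the four loss terms: RL surrogate, reconstruction, token, cycle."""
    terms = (l_rl, l_recon, l_token, l_cycle)
    if len(weights) != len(terms):
        raise ValueError("expected one weight per loss term")
    return sum(w * t for w, t in zip(weights, terms))
```

With the default weights this reduces to the sum as written in the report; how the terms are actually balanced in training is not stated here.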

Produces more natural whole-body motion than IK.
Better balance during manipulation.
Main limitation: hand / end-effector accuracy is lower than IK.

SONIC deployment · run 1

SONIC deployment · run 2

SONIC deployment · run 3

| Dimension | IK + RL policy | SONIC |
| --- | --- | --- |
| Hand accuracy | High | Medium |
| Whole-body naturalness | Low | High |
| Balance during manipulation | Weak | Better |
| Suitability for data collection | Risky | Promising |

Decision: Use SONIC as the current main whole-body teleoperation direction for data collection.

Week 9 · Collecting first-person demos

First-person VLA data collection

Collecting data

Target data: human teleoperation command + first-person visual observations.
Goal scope: ~300 trajectories to fine-tune a locomotion + manipulation policy.
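One way to organise the target data is a per-step record that pairs the teleoperation command with the first-person frame captured at the same time. The sketch below is illustrative only: the class names, the field layout, the on-disk image path, and the validity check are assumptions, not the logging format actually used in this project.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

TARGET_TRAJECTORIES = 300  # approximate goal stated in the report


@dataclass
class Step:
    t: float               # seconds since the start of the episode
    command: List[float]   # teleoperation command (vector layout assumed)
    image_path: str        # first-person frame on disk (hypothetical path)


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, t: float, command: Sequence[float], image_path: str) -> None:
        self.steps.append(Step(t, list(command), image_path))

    def is_valid(self) -> bool:
        """Non-empty, with strictly increasing timestamps."""
        times = [s.t for s in self.steps]
        return len(times) > 0 and all(a < b for a, b in zip(times, times[1:]))


def progress(trajectories: Sequence[Trajectory]) -> float:
    """Fraction of the ~300-trajectory goal covered by valid trajectories."""
    valid = sum(1 for tr in trajectories if tr.is_valid())
    return min(valid / TARGET_TRAJECTORIES, 1.0)
```

Keeping command and image in the same step record makes it harder for the two streams to drift out of sync, which matters when the pairs are later used as (observation, action) samples for fine-tuning.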

Third-person · session view

First-person · onboard camera

Week 10 · Camera · OAK switch

New camera for humanoid VLA

Purchase done

The G1's onboard camera (RealSense D435) has a narrow field of view.
Switched to an OAK camera.
Recent egocentric datasets for humanoid VLA pretraining, such as EgoScale, use the OAK camera.

EgoScale dataset · OAK-camera egocentric capture

Week 11 · Lower body locked · upper-body data only

Lock lower body, collect data

Hardware risk

The G1 repeatedly raises a right-ankle overheating warning after ~5 minutes of walking.
Workaround: lock the lower body and continue collecting upper-body manipulation data.
A replacement right leg has shipped from Hong Kong; ETA ~3 weeks.

Upper-body-only data collection · lower body suspended

Risk: The right-ankle thermal warning blocks safe walking; downstream loco-manipulation collection is paused until the replacement leg arrives.

Week 12 · Psi0 VLA policy deployed

Psi0 VLA policy works

Done

Successfully fine-tuned and deployed Psi0 VLA for upper-body manipulation on G1.
Fully autonomous, no teleoperation — the policy drives the arms directly from on-board observations.
Next step: train on more ambitious tasks using the Berzelius supercomputer.

Psi0 policy rollout · autonomous, no teleoperation

Milestone: First end-to-end VLA policy running on the G1 from collected demonstrations; this validates the data-collection pipeline from Weeks 1–11.

Summary · Timeline takeaway

Timeline summary

Takeaway

Upper-body teleoperation works in simulation and on the real G1.
Whole-body teleoperation is necessary for loco-manipulation VLA training.
IK + RL was tested but is not reliable enough as the main data-collection method.
SONIC is the current main direction — more natural whole-body motion, better suited to coordinated demonstrations.
Hardware reliability remains a real constraint; robot damage was incurred during earlier collection runs.
Camera upgrade to OAK aligns the onboard view with recent humanoid VLA pretraining datasets (e.g. EgoScale).
The right-ankle thermal issue forced a temporary lock of the lower body; the replacement leg has been in transit since Week 11.
Psi0 VLA policy deployed for upper-body manipulation — first autonomous rollout on G1 from collected data (Week 12).

End · Week 12