Progress Report · KTH RPL

Unitree G1 — Whole-body
Teleoperation & Data Collection

Platform
Unitree G1 (humanoid)
Window
Weeks 1–12
Status
VLA policy deployed · scaling up
Legend: Completed · Current direction · Decision / transition · Hardware risk · Data collection
Goal · project north star

Humanoid whole-body training data on G1

Objective

Collect coordinated locomotion + manipulation demonstrations on the Unitree G1 to support future whole-body policy / VLA training.

01 · Input: Human VR motion
02 · Control: Whole-body controller
03 · Capture: Demonstration data
04 · Output: VLA / policy training
Week 1–2 Upper-body validation · sim & real

Upper-body VR teleoperation is working

Done
  • Deployed upper-body teleoperation using Meta Quest 2.
  • Validated in simulation and on the real G1.
Simulation · upper-body tracking
Real G1 · Meta Quest 2 input
Week 3 VLA scope expansion

Extend VLA to humanoid

Start
  • Over the past two years, VLA models (such as the π-series) have shown promising results on both single-arm and dual-arm manipulation tasks.
  • A growing research direction is to extend VLA from arm-centric manipulation to humanoid whole-body control (such as GR00T).
A humanoid VLA using SONIC as the controller
Week 4 Pico 4 hardware switch

New VR device for teleoperation

Purchase done
  • The Meta Quest 2 tracks the hands and head, but has no official leg motion tracker.
  • Switched to Pico 4 + body trackers so whole-body teleoperation has full-pose input.
Pico 4 VR setup
Week 5–6 Option A IK upper · RL lower

Option A — IK upper body + RL lower body

Hardware risk
Method
Upper body: IK-based joint-position control
Lower body: pretrained RL locomotion policy
Control objectives
AMO uses upper-body IK for sparse target tracking and a lower-body policy for locomotion:
$$ (\mathbf{q}^{u}, \mathbf{c}) = \mathrm{IK}(\mathbf{x}^{\star}), \qquad \mathbf{a}^{l} = \pi^{l}(\mathbf{o}, \mathbf{v}^{\star}, \mathbf{c}). $$
Here, \(\mathbf{x}^{\star}\) are the head/wrist target poses, \(\mathbf{q}^{u}\) the upper-body joint positions, \(\mathbf{c}\) the torso/height commands, \(\mathbf{o}\) the proprioceptive observation, \(\mathbf{v}^{\star}\) the target velocity, \(\pi^{l}\) the lower-body policy, and \(\mathbf{a}^{l}\) the lower-body actions.
  • During upper-body manipulation, the lower body struggled to maintain balance.
  • Robot damage occurred during data collection runs.
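The decoupled structure in the control objectives above can be sketched in a few lines of Python. This is a minimal illustration, not the AMO implementation: the class name `DecoupledController` and the two toy functions are hypothetical stand-ins, and only the data flow (IK produces upper-body joints and a torso command; the lower-body policy consumes proprioception, target velocity, and that command) is taken from the equations.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class DecoupledController:
    """Upper-body IK plus a separate lower-body policy (Option A structure)."""

    ik: Callable[[Sequence[float]], Tuple[Vector, Vector]]  # x* -> (q_u, c)
    lower_policy: Callable[
        [Sequence[float], Sequence[float], Sequence[float]], Vector
    ]  # (o, v*, c) -> a_l

    def step(self, x_star, o, v_star):
        # (q_u, c) = IK(x*)
        q_u, c = self.ik(x_star)
        # a_l = pi_l(o, v*, c): the lower body sees the upper body only via c
        a_l = self.lower_policy(o, v_star, c)
        return q_u, a_l


def toy_ik(x_star):
    # Hypothetical stand-in: copy targets to joints, use their mean as command c.
    q_u = tuple(x_star)
    c = (sum(x_star) / len(x_star),)
    return q_u, c


def toy_lower_policy(o, v_star, c):
    # Hypothetical stand-in: concatenate target velocity and torso command.
    return tuple(v_star) + tuple(c)


controller = DecoupledController(ik=toy_ik, lower_policy=toy_lower_policy)
q_u, a_l = controller.step(x_star=[0.25, 0.75], o=[0.0], v_star=[0.5])
```

In this structure the lower-body policy receives information about the arms only through the command \(\mathbf{c}\), which is the coupling point to look at when reasoning about balance during manipulation.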
IK + RL demonstration · decoupled whole-body control
Hardware damage during teleoperation
Trade-off observed
+ IK advantage — better hand accuracy at the end-effector.
− IK problem — unnatural motion and poor stability during manipulation.
Decision Not suitable as the main data-collection method.
Week 7–8 Option B SONIC deployment

Option B — One policy for whole-body

Current main direction
Method
Policy: SONIC · pretrained whole-body motion tracking
Input: VR device motion / VLA command
Output: coordinated full-body G1 motion
Training objective
$$ \mathcal{L}_{\text{SONIC}} = \mathcal{L}_{\text{RL}} + \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{token}} + \mathcal{L}_{\text{cycle}}, $$ where $\mathcal{L}_{\text{RL}}$ is the PPO surrogate that trains physically stable motion tracking, $\mathcal{L}_{\text{recon}}$ reconstructs robot motion from the learned tokens, $\mathcal{L}_{\text{token}}$ aligns human and robot motion tokens, and $\mathcal{L}_{\text{cycle}}$ enforces cross-embodiment consistency.
  • Produces more natural whole-body motion than IK.
  • Better balance during manipulation.
  • Main limitation: hand / end-effector accuracy is lower than IK.
SONIC deployment · run 1
SONIC deployment · run 2
SONIC deployment · run 3
Dimension | IK + RL policy | SONIC
Hand accuracy | High | Medium
Whole-body naturalness | Low | High
Balance during manipulation | Weak | Better
Suitability for data collection | Risky | Promising
Decision Use SONIC as the current main whole-body teleoperation direction for data collection.
Week 9 Collecting first-person demos

First-person VLA data collection

Data collection
  • Target data: human teleoperation command + first-person visual observations.
  • Goal scope: ~300 trajectories to fine-tune a locomotion + manipulation policy.
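One way to picture the target data is as a list of time-stamped steps, each pairing a teleoperation command with a first-person frame. The sketch below is an assumption about layout, not the recording format actually used: the names `Step`, `Trajectory`, and `collection_progress`, the flat command vector, and the image-path field are all hypothetical. Only the two data streams and the ~300-trajectory target come from the report.

```python
from dataclasses import dataclass, field
from typing import List

TARGET_TRAJECTORIES = 300  # approximate goal stated in the report


@dataclass
class Step:
    timestamp_s: float
    command: List[float]  # teleoperation command (hypothetical flat vector)
    image_path: str       # first-person frame on disk (hypothetical layout)


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, timestamp_s, command, image_path):
        # Reject out-of-order samples so commands and frames stay aligned in time.
        if self.steps and timestamp_s <= self.steps[-1].timestamp_s:
            raise ValueError("timestamps must be strictly increasing")
        self.steps.append(Step(timestamp_s, list(command), image_path))


def collection_progress(trajectories, target=TARGET_TRAJECTORIES):
    """Fraction of the target reached, counting only non-empty trajectories."""
    done = sum(1 for t in trajectories if t.steps)
    return min(done / target, 1.0)


demo = Trajectory(task="pick")
demo.add(0.0, [0.1, 0.2], "frame_000.png")
demo.add(0.1, [0.1, 0.3], "frame_001.png")
```

Counting only non-empty trajectories keeps aborted sessions from inflating the progress figure.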
Third-person · session view
First-person · onboard camera
Week 10 Camera OAK switch

New camera for humanoid VLA

Purchase done
  • The G1's onboard camera (RealSense D435) has a narrow field of view.
  • Switched to an OAK camera.
  • Recent egocentric datasets for humanoid VLA pretraining, such as EgoScale, use the OAK camera.
EgoScale dataset · OAK-camera egocentric capture
Week 11 Lower locked upper-body data only

Lock lower body, collect data

Hardware risk
  • The G1 repeatedly raises a right-ankle overheating warning after ~5 minutes of walking.
  • Workaround: lock the lower body and continue collecting upper-body manipulation data.
  • A replacement right leg has shipped from Hong Kong; ETA ~3 weeks.
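A software-side version of this workaround could look like the sketch below, which overrides lower-body joint targets with a fixed hold pose while upper-body targets pass through. How the lower body was actually locked is not described in this report, so everything here is an assumption: the function names, the joint names, and the temperature values are hypothetical and do not come from the Unitree SDK.

```python
from typing import Dict, List


def overheated_joints(temps_c: Dict[str, float], limit_c: float) -> List[str]:
    """Names of joints whose reported temperature is at or above limit_c."""
    return sorted(name for name, temp in temps_c.items() if temp >= limit_c)


def lock_lower_body(
    action: Dict[str, float], hold_pose: Dict[str, float]
) -> Dict[str, float]:
    """Return a copy of `action` with every joint in `hold_pose` overridden."""
    locked = dict(action)      # leave the caller's action untouched
    locked.update(hold_pose)   # lower-body joints are pinned to the hold pose
    return locked


# Hypothetical joint names and readings, for illustration only.
temps = {"right_ankle_pitch": 80.0, "left_ankle_pitch": 45.0}
action = {"right_elbow": 0.7, "right_ankle_pitch": 0.3}
hold = {"right_ankle_pitch": 0.0}

if overheated_joints(temps, limit_c=70.0):
    action_out = lock_lower_body(action, hold)
else:
    action_out = action
```

Returning a new dictionary rather than mutating the input keeps the original teleoperation command available for logging alongside the command actually sent.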
Upper-body-only data collection · lower body suspended
Risk Right ankle thermal warning blocks safe walking; downstream loco-manipulation collection is paused until the replacement leg arrives.
Week 12 Psi0 VLA policy deployed

Psi0 VLA policy works

Done
  • Successfully fine-tuned and deployed Psi0 VLA for upper-body manipulation on G1.
  • Fully autonomous, no teleoperation — the policy drives the arms directly from on-board observations.
  • Next step: train on more ambitious tasks using the Berzelius supercomputer.
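The difference between teleoperation and an autonomous rollout is that nothing in the control loop reads operator input. A generic closed loop of that kind is sketched below. It is not the Psi0 interface: `rollout`, `policy`, `read_observation`, and `send_arm_command` are hypothetical callables standing in for the deployed model, the onboard sensors, and the arm command channel.

```python
from typing import Callable, List, Sequence


def rollout(
    policy: Callable[[Sequence[float]], List[float]],
    read_observation: Callable[[], Sequence[float]],
    send_arm_command: Callable[[List[float]], None],
    max_steps: int,
) -> List[List[float]]:
    """Closed loop: observation -> policy -> arm command, with no operator input."""
    sent = []
    for _ in range(max_steps):
        obs = read_observation()   # onboard observation only
        action = policy(obs)       # the policy decides the arm targets
        send_arm_command(action)
        sent.append(action)
    return sent


# Toy stand-ins so the loop can be run without a robot.
_counter = {"t": 0}


def fake_observation():
    _counter["t"] += 1
    return [float(_counter["t"])]


def fake_policy(obs):
    return [2.0 * x for x in obs]


log = []
history = rollout(fake_policy, fake_observation, log.append, max_steps=3)
```

A bounded `max_steps` is used here so that a rollout always terminates, which is a reasonable default when testing on hardware with known reliability issues.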
Psi0 policy rollout · autonomous, no teleoperation
Milestone First end-to-end VLA policy running on the G1 from collected demonstrations — validates the data-collection pipeline from Weeks 1–11.
Summary · timeline takeaway

Timeline summary

Takeaway
  1. Upper-body teleoperation works in simulation and on the real G1.
  2. Whole-body teleoperation is necessary for loco-manipulation VLA training.
  3. IK + RL was tested but is not reliable enough as the main data-collection method.
  4. SONIC is the current main direction — more natural whole-body motion, better suited to coordinated demonstrations.
  5. Hardware reliability remains a real constraint; robot damage was incurred during earlier collection runs.
  6. Camera upgrade to OAK aligns the onboard view with recent humanoid VLA pretraining datasets (e.g. EgoScale).
  7. The right-ankle thermal issue forced a temporary lower-body lock; a replacement leg is in transit (Week 11).
  8. Psi0 VLA policy deployed for upper-body manipulation — first autonomous rollout on G1 from collected data (Week 12).
End · Week 12