Completed
Current direction
Decision / transition
Hardware risk
Data collection
Goal
project north star
Humanoid whole-body training data on G1
Objective
Collect coordinated locomotion + manipulation demonstrations on the Unitree G1 to support future whole-body policy / VLA training.
01 · Input
Human VR motion
02 · Control
Whole-body controller
03 · Capture
Demonstration data
04 · Output
VLA / policy training
Week 1–2
Upper-body
validation · sim & real
Upper-body VR teleoperation is working
Done
- Deployed upper-body teleoperation using Meta Quest 2.
- Validated in simulation and on the real G1.
Week 3
VLA
scope expansion
Extend VLA to humanoid
Start
- Over the past two years, VLA models (such as the π-series) have shown promising results on both single-arm and dual-arm manipulation tasks.
- A growing research direction is to extend VLA from arm-centric manipulation to humanoid whole-body control (such as GR00T).
Week 4
Pico 4
hardware switch
New VR device for teleoperation
Purchase done
- Meta Quest 2 tracks hands and head but has no official leg motion tracker.
- Switched to Pico 4 + body trackers so whole-body teleoperation has full-pose input.
Week 5–6
Option A
IK upper · RL lower
Option A — IK upper body + RL lower body
Hardware risk
Method
Upper body: IK-based joint-position control
Lower body: pretrained RL locomotion policy
Control objectives
AMO uses upper-body IK for sparse target tracking and a lower-body policy for locomotion:
$$ (\mathbf{q}^{u}, \mathbf{c}) = \mathrm{IK}(\mathbf{x}^{\star}), \qquad \mathbf{a}^{l} = \pi^{l}(\mathbf{o}, \mathbf{v}^{\star}, \mathbf{c}). $$
Here, \(\mathbf{x}^{\star}\) are head/wrist target poses, \(\mathbf{q}^{u}\) upper-body joints, \(\mathbf{c}\) torso/height commands, \(\mathbf{o}\) proprioception, \(\mathbf{v}^{\star}\) target velocity, \(\pi^{l}\) the lower-body policy, and \(\mathbf{a}^{l}\) lower-body actions.
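The data flow in the two equations above can be sketched as a single control tick. This is only an illustrative sketch: `control_step`, `IKResult`, `toy_ik`, and `toy_policy` are hypothetical names, and the toy functions are stand-ins that show how \(\mathbf{c}\) is passed from the IK stage to the lower-body policy, not a real solver or a trained policy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class IKResult:
    q_upper: Vector    # q^u: upper-body joint targets
    torso_cmd: Vector  # c: torso/height commands handed to the lower body


def control_step(
    ik: Callable[[Sequence[Vector]], IKResult],
    lower_policy: Callable[[Vector, Vector, Vector], Vector],
    x_star: Sequence[Vector],
    obs: Vector,
    v_star: Vector,
) -> Tuple[Vector, Vector]:
    """One tick: (q^u, c) = IK(x*), then a^l = pi^l(o, v*, c)."""
    result = ik(x_star)
    a_lower = lower_policy(obs, v_star, result.torso_cmd)
    return result.q_upper, a_lower


def toy_ik(x_star: Sequence[Vector]) -> IKResult:
    # Stand-in only: takes the x-coordinate of each target as a "joint target"
    # and the mean target height as the single torso command.
    q = tuple(p[0] for p in x_star)
    height = sum(p[2] for p in x_star) / len(x_star)
    return IKResult(q_upper=q, torso_cmd=(height,))


def toy_policy(obs: Vector, v_star: Vector, c: Vector) -> Vector:
    # Stand-in only: echoes the velocity target and torso command as "actions".
    return tuple(v_star) + tuple(c)


q_u, a_l = control_step(
    toy_ik,
    toy_policy,
    x_star=[(0.1, 0.0, 1.0), (0.3, 0.2, 1.5)],
    obs=(0.0,),
    v_star=(0.5, 0.0),
)
```

The point of the split is visible in the signature: the lower-body policy never sees the wrist targets directly, only the torso/height command derived from them.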
- During upper-body manipulation, the lower body struggled to maintain balance.
- The robot was damaged during data-collection runs.
Trade-off observed
+ IK advantage — better hand accuracy at the end-effector.
− IK problem — unnatural motion and poor stability during manipulation.
Decision
Not suitable as the main data-collection method.
Week 7–8
Option B
SONIC deployment
Option B — One policy for whole-body
Current main direction
Method
Policy: SONIC · pretrained whole-body motion tracking
Input: VR device motion / VLA command
Output: coordinated full-body G1 motion
Training objective
$$ \mathcal{L}_{\text{SONIC}} = \mathcal{L}_{\text{RL}} + \mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{token}} + \mathcal{L}_{\text{cycle}}, $$
where $\mathcal{L}_{\text{RL}}$ is the PPO surrogate that trains physically stable motion tracking, $\mathcal{L}_{\text{recon}}$ reconstructs robot motion from the learned tokens, $\mathcal{L}_{\text{token}}$ aligns human and robot motion tokens, and $\mathcal{L}_{\text{cycle}}$ enforces cross-embodiment consistency.
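As a small worked instance of the sum above, the sketch below adds the four named terms. The function name `total_loss`, the optional per-term weights, and the numeric values are all assumptions for illustration; the objective as written here uses unit weights, which is the default in the sketch.

```python
from typing import Mapping, Optional

# The four terms named in the objective: RL, recon, token, cycle.
TERMS = ("rl", "recon", "token", "cycle")


def total_loss(
    terms: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Sum the four loss terms; any weight not given defaults to 1.0."""
    missing = [name for name in TERMS if name not in terms]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    w = weights or {}
    return sum(w.get(name, 1.0) * terms[name] for name in TERMS)


# Made-up per-batch values, chosen only so the arithmetic is easy to follow.
example = {"rl": 0.5, "recon": 0.25, "token": 0.125, "cycle": 0.125}
unweighted = total_loss(example)                   # 0.5 + 0.25 + 0.125 + 0.125
reweighted = total_loss(example, {"recon": 2.0})   # doubles only the recon term
```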
- Produces more natural whole-body motion than IK.
- Better balance during manipulation.
- Main limitation: hand / end-effector accuracy is lower than IK.
| Dimension | IK + RL Policy | SONIC |
| --- | --- | --- |
| Hand accuracy | High | Medium |
| Whole-body naturalness | Low | High |
| Balance during manipulation | Weak | Better |
| Suitability for data collection | Risky | Promising |
Decision
Use SONIC as the current main whole-body teleoperation direction for data collection.
Week 9
Collecting
first-person demos
First-person VLA data collection
Data collecting
- Target data: human teleoperation command + first-person visual observations.
- Goal scope: ~300 trajectories to fine-tune a locomotion + manipulation policy.
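One possible way to organise the target data is a per-step record that pairs the teleoperation command with the first-person frame. The sketch below is an assumption, not the project's actual storage format: the class and field names (`Step`, `Trajectory`, `command`, `image_path`), the example file names, and the `progress` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence, Tuple

TARGET_TRAJECTORIES = 300  # approximate goal stated in the text


@dataclass
class Step:
    t: float                     # timestamp in seconds
    command: Tuple[float, ...]   # teleoperation command (layout is hypothetical)
    image_path: str              # first-person camera frame saved on disk


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)

    def add(self, t: float, command: Sequence[float], image_path: str) -> None:
        # Reject out-of-order samples so command and image stay aligned in time.
        if self.steps and t <= self.steps[-1].t:
            raise ValueError("timestamps must be strictly increasing")
        self.steps.append(Step(t, tuple(command), image_path))


def progress(trajectories: Iterable[Trajectory], target: int = TARGET_TRAJECTORIES) -> float:
    """Fraction of the target reached, counting only non-empty trajectories."""
    usable = sum(1 for tr in trajectories if tr.steps)
    return min(1.0, usable / target)


demo = Trajectory(task="pick and place")
demo.add(0.00, [0.1, 0.2, 0.3], "frame_000000.jpg")
demo.add(0.05, [0.1, 0.2, 0.4], "frame_000001.jpg")
done = progress([demo, Trajectory(task="empty run")])
```

Storing frames by path rather than inline keeps each record small; whether that suits the downstream training code is a design choice to check against the VLA pipeline in use.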
Week 10
Camera
OAK switch
New camera for humanoid VLA
Purchase done
- The G1's onboard camera (RealSense D435) has a narrow field of view.
- Switched to an OAK camera.
- Recent egocentric datasets for humanoid VLA pretraining, such as EgoScale, use the OAK camera.
Week 11
Lower locked
upper-body data only
Lock lower body, collect data
Hardware risk
- The G1 repeatedly reports a right-ankle overheating warning after ~5 minutes of walking.
- Workaround: lock the lower body and continue collecting upper-body manipulation data.
- A replacement right leg has shipped from Hong Kong; ETA ~3 weeks.
Risk
Right ankle thermal warning blocks safe walking; downstream loco-manipulation collection is paused until the replacement leg arrives.
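The lock-the-lower-body workaround could in principle be automated with a simple guard that has hysteresis. This is a hypothetical sketch: `ThermalGuard` is not part of any robot SDK, and the temperature thresholds are placeholders rather than G1 specifications.

```python
from dataclasses import dataclass


@dataclass
class ThermalGuard:
    limit_c: float        # placeholder threshold at which walking is stopped
    resume_c: float       # must cool to this value before unlocking (hysteresis)
    locked: bool = False

    def update(self, ankle_temp_c: float) -> bool:
        """Return True while the lower body should stay locked."""
        if ankle_temp_c >= self.limit_c:
            self.locked = True
        elif ankle_temp_c <= self.resume_c:
            self.locked = False
        # Between the two thresholds the previous state is kept.
        return self.locked


guard = ThermalGuard(limit_c=70.0, resume_c=55.0)
states = [guard.update(t) for t in (50.0, 65.0, 72.0, 60.0, 54.0)]
# states == [False, False, True, True, False]
```

The gap between the two thresholds keeps the lock from toggling rapidly when the reading hovers near the limit.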
Week 12
Psi0 VLA
policy deployed
Psi0 VLA policy works
Done
- Successfully fine-tuned and deployed Psi0 VLA for upper-body manipulation on G1.
- Fully autonomous, no teleoperation — the policy drives the arms directly from on-board observations.
- Next step: train on more ambitious tasks using the Berzelius supercomputer.
Milestone
First end-to-end VLA policy running on the G1 from collected demonstrations — validates the data-collection pipeline from Weeks 1–11.
Summary
timeline takeaway
Timeline summary
Takeaway
- Upper-body teleoperation works in simulation and on the real G1.
- Whole-body teleoperation is necessary for loco-manipulation VLA training.
- IK + RL was tested but is not reliable enough as the main data-collection method.
- SONIC is the current main direction — more natural whole-body motion, better suited to coordinated demonstrations.
- Hardware reliability remains a real constraint; robot damage was incurred during earlier collection runs.
- Camera upgrade to OAK aligns the onboard view with recent humanoid VLA pretraining datasets (e.g. EgoScale).
- A right-ankle thermal issue forced a temporary lock of the lower body; the replacement leg is in transit (Week 11).
- Psi0 VLA policy deployed for upper-body manipulation — first autonomous rollout on G1 from collected data (Week 12).
End · Week 12