Whenever you encounter a Sim2Real gap like this, there are two options. The easy option is to introduce a new reward, telling the robot not to do whatever bad thing it is doing. But the problem is that these rewards are a bit like duct tape on the robot: inelegant, and missing the root causes. They pile up, and they cloud the original objective of the policy with many other terms. The result is a policy that might work, but is not understandable, and behaves unpredictably when composed with new rewards.
The other, harder, option is to take a step back and figure out what it is about the simulation that differs from reality. Agility as a company has always been focused on understanding the physical intuition behind what we do. It's how we designed our robot, all the way from the actuators to the software.
Our RL approach is no different. We want to understand the why and use that to drive the how. So we began a six-month journey to figure out why our simulated toes don't do the same thing as our real toes.
It turns out there are a lot of reasons. There were simplifying assumptions in the collision geometry, inaccuracies in how energy propagated through our actuators and transmissions, and instabilities in how constraints are solved in our unique closed-chain kinematics (formed by the connecting rods attached to our toe plates and tarsus). And we've been systematically studying, fixing, and eliminating these gaps.
The net result has been a huge step forward in our RL software stack. Instead of a pile of stacked reward functions covering everything from "Stop wiggling your foot" to "Stand up straighter," we have a handful of rewards around things like energy consumption and symmetry that are not only simpler, but also follow our basic intuitions about how Digit should move.
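To make the idea of "a handful of rewards" concrete, here is a minimal sketch of what a compact reward built from an energy term and a symmetry term could look like. It is an illustration only, not our actual reward code: the function names, the weights, and the simple left/right joint comparison are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardWeights:
    # Hypothetical weights, chosen only for illustration.
    energy: float = 0.01
    symmetry: float = 0.5


def energy_penalty(torques: Sequence[float],
                   joint_velocities: Sequence[float],
                   dt: float) -> float:
    """Approximate mechanical energy used in one step: sum(|torque * velocity|) * dt."""
    return sum(abs(t * w) for t, w in zip(torques, joint_velocities)) * dt


def symmetry_penalty(left_joint_pos: Sequence[float],
                     right_joint_pos: Sequence[float]) -> float:
    """Mean squared difference between left and right joint positions.

    Assumes the right-side values are already mirrored into the left-side
    convention, and ignores gait phase; a real symmetry term would likely
    compare the two legs half a gait cycle apart.
    """
    diffs = [(l - r) ** 2 for l, r in zip(left_joint_pos, right_joint_pos)]
    return sum(diffs) / len(diffs)


def compact_reward(torques: Sequence[float],
                   joint_velocities: Sequence[float],
                   left_joint_pos: Sequence[float],
                   right_joint_pos: Sequence[float],
                   dt: float,
                   weights: RewardWeights = RewardWeights()) -> float:
    """Combine the two penalties into one scalar reward (higher is better)."""
    return -(weights.energy * energy_penalty(torques, joint_velocities, dt)
             + weights.symmetry * symmetry_penalty(left_joint_pos, right_joint_pos))
```

The point of the sketch is the shape rather than the numbers: two physically meaningful terms with one weight each, instead of a long list of behavior-specific patches.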
Investing the time to understand why the simulation differed has taught us a lot more about why we want Digit to move a certain way in the first place. And most importantly, coupled with the fast NVIDIA Isaac Sim, a reference application built on NVIDIA Omniverse for simulating and testing AI-driven robots, it has enabled us to explore the impact of different physical characteristics that we might want in future generations of Digit.