State#
MetaSim uses a state to describe a simulation environment at a given time.
A unified state representation is the key to aligning different simulators.
state Structure#
The state is a dictionary that contains the following keys:
objects: a dictionary that maps each object name to its state, object_state.
robots: a dictionary that maps each robot name to its state, robot_state.
cameras: a dictionary that maps each camera name to its state, camera_state.
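The top-level layout described above can be checked with a few lines of plain Python. This is a minimal sketch, not part of MetaSim: the helper name `check_state_layout` is hypothetical, and it assumes all three keys are always present, even when a scene has no entities of a given kind.

```python
REQUIRED_KEYS = ("objects", "robots", "cameras")


def check_state_layout(state: dict) -> None:
    """Raise ValueError if a top-level key is missing or is not a dict.

    Hypothetical helper for illustration; assumes every state carries
    all three top-level keys.
    """
    for key in REQUIRED_KEYS:
        if key not in state:
            raise ValueError(f"state is missing the '{key}' key")
        if not isinstance(state[key], dict):
            raise ValueError(f"state['{key}'] must map names to per-entity states")


# An empty scene still has the three top-level dictionaries.
empty_state = {"objects": {}, "robots": {}, "cameras": {}}
check_state_layout(empty_state)
```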
object_state Structure#
The object_state is a dictionary that contains the following keys:
pos: the position of the object, as a tensor([x, y, z]).
rot: the quaternion of the object, as a tensor([w, x, y, z]).
vel: the linear velocity of the object, as a tensor([vx, vy, vz]).
ang_vel: the angular velocity of the object, as a tensor([wx, wy, wz]).
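Because rot is stored scalar-first as [w, x, y, z], it may need reordering before being passed to code that expects scalar-last quaternions (SciPy's `Rotation.from_quat`, for example, defaults to [x, y, z, w]). The sketch below shows the reordering with plain Python lists standing in for tensors; the function names are hypothetical and not part of MetaSim.

```python
def wxyz_to_xyzw(rot):
    """Reorder a scalar-first quaternion [w, x, y, z] to [x, y, z, w]."""
    w, x, y, z = rot
    return [x, y, z, w]


def xyzw_to_wxyz(quat):
    """Reorder a scalar-last quaternion [x, y, z, w] to [w, x, y, z]."""
    x, y, z, w = quat
    return [w, x, y, z]


# The identity rotation in the state's convention has w first.
identity_wxyz = [1.0, 0.0, 0.0, 0.0]
identity_xyzw = wxyz_to_xyzw(identity_wxyz)
```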
The following keys are optional and only used for articulation objects:
dof_pos: the joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.
dof_vel: the joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.
body: a dictionary that maps each body link name to its state, body_state.
The body_state is a dictionary that contains pos, rot, vel and ang_vel keys. The definition is the same as above, but for the body link.
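Since the articulation keys are optional, code that consumes an object_state usually has to branch on their presence. The following is a minimal sketch under that assumption; `is_articulated` and `link_positions` are hypothetical helpers, and plain lists stand in for tensors so the snippet runs without extra dependencies.

```python
ARTICULATION_KEYS = ("dof_pos", "dof_vel", "body")


def is_articulated(object_state: dict) -> bool:
    """Treat an object as articulated if it carries any articulation-only key."""
    return any(key in object_state for key in ARTICULATION_KEYS)


def link_positions(object_state: dict) -> dict:
    """Map each body link name to its position; empty for rigid objects."""
    return {name: link["pos"] for name, link in object_state.get("body", {}).items()}


# A rigid object only has the four pose and velocity keys.
cube = {
    "pos": [0.0, 0.0, 0.0],
    "rot": [1.0, 0.0, 0.0, 0.0],
    "vel": [0.0, 0.0, 0.0],
    "ang_vel": [0.0, 0.0, 0.0],
}

# An articulated object adds joint and body-link information.
box = {
    **cube,
    "dof_pos": {"box_joint": 0.0},
    "dof_vel": {"box_joint": 0.0},
    "body": {"box_lid": dict(cube)},
}

print(is_articulated(cube), is_articulated(box))
print(link_positions(box))
```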
robot_state Structure#
The robot_state contains all the keys of an articulation object described above, plus the following keys:
dof_pos_target: the target joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.
dof_vel_target: the target joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.
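Keeping dof_pos and dof_pos_target as dicts keyed by joint name makes it straightforward to compare them joint by joint. A minimal sketch follows, assuming both dicts share the same joint names; `joint_tracking_error` is a hypothetical helper, and the robot_state is trimmed to the two keys the example needs.

```python
def joint_tracking_error(robot_state: dict) -> dict:
    """Return target minus current position for every joint that has a target."""
    current = robot_state["dof_pos"]
    target = robot_state["dof_pos_target"]
    return {name: target[name] - current[name] for name in target}


# Trimmed robot_state: only the keys used by this example are included.
robot_state = {
    "dof_pos": {"panda_joint1": 0.0, "panda_joint2": -0.5},
    "dof_pos_target": {"panda_joint1": 0.0, "panda_joint2": -0.785398},
}

errors = joint_tracking_error(robot_state)
for joint_name, error in errors.items():
    print(f"{joint_name}: {error:+.6f}")
```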
camera_state Structure#
The camera_state is a dictionary that contains the following keys:
rgb: the RGB images, as a tensor of shape [H, W, 3].
depth: the depth images, as a tensor of shape [H, W].
pos: the position of the camera, as a tensor([x, y, z]). (not supported yet)
look_at: the look-at point of the camera, as a tensor([x, y, z]). (not supported yet)
intrinsic: the intrinsic matrix of the camera, as a tensor of shape [3, 3]. (not supported yet)
extrinsic: the extrinsic matrix of the camera, as a tensor of shape [4, 4]. (not supported yet)
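Because several camera keys are marked as not supported yet, consumers may want to read them defensively rather than index them directly. The sketch below assumes unsupported keys are simply absent from the dictionary, which the text does not confirm; `missing_camera_fields` is a hypothetical helper, and a 1x1 nested list stands in for each image tensor.

```python
NOT_YET_SUPPORTED = ("pos", "look_at", "intrinsic", "extrinsic")


def missing_camera_fields(camera_state: dict) -> list:
    """List the not-yet-supported keys that are absent from a camera_state."""
    return [key for key in NOT_YET_SUPPORTED if key not in camera_state]


# Stand-in images: a 1x1 RGB image and a 1x1 depth map as nested lists.
camera_state = {
    "rgb": [[[0, 0, 0]]],
    "depth": [[0.0]],
}

print(missing_camera_fields(camera_state))

# .get() returns None instead of raising KeyError for an absent key.
intrinsic = camera_state.get("intrinsic")
```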
State Example#
Here is a valid example of a state:
{
    "objects": {
        "cube": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
        },
        "box": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": { "box_joint": 0.0 },
            "dof_vel": { "box_joint": 0.0 },
            "body": {
                "box_lid": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
                "box_body": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
            }
        },
    },
    "robots": {
        "franka": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
            "dof_pos_target": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel_target": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
        }
    },
    "cameras": {
        "camera0": {
            "rgb": torch.zeros((H, W, 3)),
            "depth": torch.zeros((H, W)),
        }
    },
}
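A state with this layout can be traversed using ordinary dictionary operations. The sketch below collects the root position of every object and robot; `root_positions` is a hypothetical helper, the state is trimmed to the pos key, and plain lists stand in for tensors.

```python
def root_positions(state: dict) -> dict:
    """Collect root positions of all objects and robots.

    Keys are prefixed with the group name ("objects/" or "robots/") so an
    object and a robot that share a name do not overwrite each other.
    """
    positions = {}
    for group in ("objects", "robots"):
        for name, entity_state in state[group].items():
            positions[f"{group}/{name}"] = entity_state["pos"]
    return positions


# Trimmed state: only the "pos" key of each entity is included here.
state = {
    "objects": {
        "cube": {"pos": [0.0, 0.0, 0.0]},
        "box": {"pos": [0.0, 0.0, 0.0]},
    },
    "robots": {
        "franka": {"pos": [0.0, 0.0, 0.0]},
    },
    "cameras": {},
}

for key, pos in root_positions(state).items():
    print(key, pos)
```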
state with Functions#
MetaSim APIs always handle states as a list of State, one entry per environment, so the length of the list equals the number of environments. The observation returned by env.reset() and env.step() is also unified into this list of states.
handler.get_states() -> list[State]
handler.set_states(states: list[State]) -> None
env.reset(init_states: list[State]) -> tuple[list[State], Extra]
env.step(actions: list[Action]) -> tuple[list[State], list[Reward], list[Success], list[TimeOut], Extra]
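The one-state-per-environment convention can be illustrated with a toy stand-in. `ToyHandler` below is hypothetical and is not MetaSim's handler; it only mirrors the get_states and set_states signatures listed above, and its copy-on-read behavior is an assumption made for the sketch.

```python
import copy


class ToyHandler:
    """Hypothetical stand-in, NOT MetaSim's handler: one state per environment."""

    def __init__(self, num_envs: int) -> None:
        self._states = [
            {"objects": {}, "robots": {}, "cameras": {}} for _ in range(num_envs)
        ]

    def get_states(self) -> list:
        # Return copies so callers cannot mutate the stored states by accident.
        return copy.deepcopy(self._states)

    def set_states(self, states: list) -> None:
        if len(states) != len(self._states):
            raise ValueError("expected exactly one state per environment")
        self._states = copy.deepcopy(states)


handler = ToyHandler(num_envs=2)

# Read all environments, modify only environment 1, and write everything back.
states = handler.get_states()
states[1]["objects"]["cube"] = {"pos": [0.0, 0.0, 0.5]}
handler.set_states(states)

print(len(handler.get_states()))
```

Index i of the list always refers to environment i, so per-environment edits are made by indexing into the list before passing it back.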