Robot Perception and Control

Robot Perception in 3D

Last updated: Jul 25, 2024
Kashu Yamazaki
kyamazak@andrew.cmu.edu

Homogeneous Transformations

Rigid motions can be represented by matrices of the following form, so that the composition of rigid motions reduces to matrix multiplication:

$$ H = \begin{bmatrix} R & d \\ 0 & 1 \end{bmatrix}, \quad R \in SO(3),\; d \in \mathbb{R}^3 $$

This is a homogeneous transformation matrix H, where R is a rotation matrix from the special orthogonal group SO(3) and d is a translation vector in 3D. The inverse transformation is given by:

$$ H^{-1} = \begin{bmatrix} R^T & -R^T d \\ 0 & 1 \end{bmatrix} $$
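A minimal NumPy sketch of the two facts above: composing rigid motions is a 4×4 matrix product, and the inverse of a homogeneous matrix with rotation R and translation d can be formed from Rᵀ and −Rᵀd without a general matrix inverse. The helper names `rot_z`, `make_h`, and `invert_h` are hypothetical, chosen for illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z axis (an element of SO(3))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def make_h(R, d):
    """Pack a rotation R (3x3) and a translation d (3,) into a 4x4 homogeneous matrix."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = d
    return H

def invert_h(H):
    """Closed-form inverse: the rotation block becomes R^T and the translation -R^T d."""
    R, d = H[:3, :3], H[:3, 3]
    return make_h(R.T, -R.T @ d)

# Compose two rigid motions by matrix multiplication.
H1 = make_h(rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))
H2 = make_h(np.eye(3), np.array([0.0, 2.0, 0.0]))
H12 = H1 @ H2

# Points are transformed in homogeneous coordinates (a trailing 1 is appended).
p = np.array([0.0, 0.0, 0.0, 1.0])
print(H12 @ p)                                    # approximately [-1, 0, 0, 1]
print(np.allclose(H12 @ invert_h(H12), np.eye(4)))
```

Using the closed-form inverse instead of `np.linalg.inv` keeps the result exactly in the rigid-motion form and avoids a general 4×4 inversion.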

Kashu Yamazaki, 2024

Homogeneous Transformations

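The most general homogeneous transformation that we consider may be written as:

$$ H = \begin{bmatrix} n_x & s_x & a_x & d_x \\ n_y & s_y & a_y & d_y \\ n_z & s_z & a_z & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} n & s & a & d \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

Here, n = (n_x, n_y, n_z)^T is a vector representing the direction of the x axis of the new frame in the original frame, s represents the direction of the new y axis, and a represents the direction of the new z axis. The vector d represents the position of the new origin in the original frame.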
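To see what the columns mean, the NumPy sketch below builds H from the column vectors n, s, a, and d and checks that mapping the new frame's origin and a unit point on its x axis gives d and d + n. The function name `frame_from_axes` and the example frame are hypothetical.

```python
import numpy as np

def frame_from_axes(n, s, a, d):
    """Stack the new frame's axes (n, s, a) and origin d into a 4x4 matrix."""
    H = np.eye(4)
    H[:3, 0] = n   # direction of the new x axis, in original-frame coordinates
    H[:3, 1] = s   # direction of the new y axis
    H[:3, 2] = a   # direction of the new z axis
    H[:3, 3] = d   # position of the new origin
    return H

# Example: a frame rotated 90 degrees about z and shifted to (1, 2, 3).
n = np.array([0.0, 1.0, 0.0])
s = np.array([-1.0, 0.0, 0.0])
a = np.array([0.0, 0.0, 1.0])
d = np.array([1.0, 2.0, 3.0])
H = frame_from_axes(n, s, a, d)

origin_new = np.array([0.0, 0.0, 0.0, 1.0])   # origin of the new frame
x_new = np.array([1.0, 0.0, 0.0, 1.0])        # unit point on the new x axis

print(H @ origin_new)   # [1. 2. 3. 1.], i.e. d
print(H @ x_new)        # [1. 3. 3. 1.], i.e. d + n

# For the 3x3 block to lie in SO(3), n, s, a must be orthonormal with det = +1.
R = H[:3, :3]
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```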


Rotation Matrices

Translation along the axis:

Rotation around the axis:
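As a worked example, taking the x axis for illustration, a translation by a distance t and a rotation by an angle α have the following homogeneous forms (the names Trans and Rot are notation introduced here):

```latex
\mathrm{Trans}_{x,t} =
\begin{bmatrix}
1 & 0 & 0 & t \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\mathrm{Rot}_{x,\alpha} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\alpha & -\sin\alpha & 0 \\
0 & \sin\alpha & \cos\alpha & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
```

Both are homogeneous transformations: the translation has R = I, and the rotation has d = 0. A translation along y or z places t in the second or third entry of the last column instead, and each is inverted by negating its parameter (t to −t, α to −α).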


Traditional 3D representations

Voxel: a simple extension of the pixel concept to 3D
✓ can reuse the techniques developed for images (CNNs, etc.)
✗ occupies too much memory (memory grows cubically with resolution, so grids are usually limited to low resolutions)
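A quick back-of-the-envelope sketch of why dense voxel grids get expensive: a grid with N cells per side stores N³ values, so doubling the resolution multiplies memory by eight. The assumption of one float32 value (4 bytes) per cell and the name `dense_grid_bytes` are illustrative.

```python
def dense_grid_bytes(resolution, bytes_per_cell=4):
    """Memory of a dense resolution^3 voxel grid, assuming a fixed size per cell."""
    return resolution ** 3 * bytes_per_cell

# 32^3 -> 0.125 MiB, 256^3 -> 64 MiB, 512^3 -> 512 MiB (at 4 bytes per cell)
for n in (32, 64, 128, 256, 512):
    mib = dense_grid_bytes(n) / 2**20
    print(f"{n:>4}^3 grid: {mib:10.3f} MiB")
```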

Octree: hierarchical voxels
✓ high-quality 3D with less memory
✗ hard to generate and store
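A minimal sketch, assuming points normalized to the unit cube [0, 1)³, of why a hierarchical grid saves memory: only cells that contain points are subdivided, so the number of stored nodes tracks the occupied region rather than the full 8^depth grid. This only counts occupied cells per level with NumPy; it is not a full octree data structure, and `octree_node_count` is a hypothetical name.

```python
import numpy as np

def octree_node_count(points, max_depth):
    """Number of occupied cells at each level of a sparse octree over [0, 1)^3.

    Level k splits the cube into 2^k cells per axis; a cell is stored only
    if at least one point falls inside it.
    """
    points = np.asarray(points, dtype=float)
    counts = []
    for depth in range(max_depth + 1):
        cells_per_axis = 2 ** depth
        idx = np.floor(points * cells_per_axis).astype(np.int64)
        idx = np.clip(idx, 0, cells_per_axis - 1)  # keep points at exactly 1.0 inside
        counts.append(len(np.unique(idx, axis=0)))
    return counts

# 1,000 points on a small patch of the plane z = 0.5 (a surface, not a volume).
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 0.25, size=(1000, 2))
pts = np.column_stack([xy, np.full(1000, 0.5)])

depth = 6
sparse = sum(octree_node_count(pts, depth))   # nodes stored across all levels
dense_leaves = 8 ** depth                     # leaves of an equivalent dense grid
print(sparse, dense_leaves)
```

Because real scenes are mostly empty space and surfaces, the sparse count stays orders of magnitude below the dense one; the price is the bookkeeping that makes octrees harder to generate and store.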

Point Cloud: a set of points representing the 3D scene
✓ much more compact than voxels
✗ cannot represent surfaces (no connectivity between points)
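Converting a point cloud into a voxel grid is straightforward: the NumPy sketch below quantizes an (N, 3) point cloud into a dense boolean occupancy grid. The function name, the axis-aligned bounding box, and the resolution are assumptions for illustration. The conversion discards exact point positions, and the resulting grid still has no explicit surface.

```python
import numpy as np

def voxelize(points, bounds_min, bounds_max, resolution):
    """Convert an (N, 3) point cloud into a resolution^3 boolean occupancy grid.

    Points outside [bounds_min, bounds_max] are dropped.
    """
    points = np.asarray(points, dtype=float)
    lo = np.asarray(bounds_min, dtype=float)
    hi = np.asarray(bounds_max, dtype=float)
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    scaled = (points[inside] - lo) / (hi - lo) * resolution
    idx = np.clip(np.floor(scaled).astype(np.int64), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True   # mark occupied voxels
    return grid

pts = np.array([[0.05, 0.05, 0.05],
                [0.95, 0.95, 0.95],
                [0.06, 0.04, 0.01],   # falls in the same voxel as the first point
                [2.0, 0.0, 0.0]])     # outside the bounds, ignored
grid = voxelize(pts, [0, 0, 0], [1, 1, 1], resolution=4)
print(grid.sum())   # 2 occupied voxels
```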

Mesh: a set of triangles (polygons) representing the 3D scene
✓ very compact
✗ hard to obtain
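A mesh stores shared vertices plus faces that index into them, which is why it is compact: a flat square needs only 4 vertices and 2 triangles regardless of its size. This NumPy sketch, an illustrative example not tied to any mesh library, represents a unit square this way and computes its surface area from triangle cross products, something a bare point cloud cannot provide directly.

```python
import numpy as np

# Vertex positions (V, 3) and triangle faces (F, 3) that index into them.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])

def mesh_area(vertices, faces):
    """Total surface area: half the norm of each triangle's edge cross product."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    cross = np.cross(v1 - v0, v2 - v0)
    return 0.5 * np.linalg.norm(cross, axis=1).sum()

print(mesh_area(vertices, faces))  # 1.0
```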


Neural Fields

A field is a physical quantity that has a value for each point in space and time. A field can be expressed as a function that takes spatial coordinates as independent variables. A neural field is a field that is parameterized fully or partially by neural networks.
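To make the definition concrete, here is a sketch contrasting an analytic field with a neural one, both with the same signature (3D points in, one value per point out). The sphere signed distance function is exact; the tiny MLP is randomly initialized and untrained, so its outputs carry no meaning until its weights are fit to data. The layer sizes and the names `sphere_sdf` and `MLPField` are assumptions for illustration.

```python
import numpy as np

def sphere_sdf(x, radius=1.0):
    """Analytic distance field: signed distance from points x (N, 3) to a sphere."""
    return np.linalg.norm(x, axis=-1) - radius

class MLPField:
    """A field f_theta(x) parameterized by a small MLP with weights theta."""

    def __init__(self, in_dim=3, hidden=64, out_dim=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)   # ReLU hidden layer
        return (h @ self.W2 + self.b2)[..., 0]       # one scalar per query point

pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(sphere_sdf(pts))     # -1 at the center (inside), +1 at distance 2 (outside)
field = MLPField()
print(field(pts).shape)    # (2,): same interface as the analytic field
```

Training a neural field amounts to adjusting theta so that `field(x)` matches observed values, for example distances sampled from a shape.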

Field              Input                   Output             Example
Occupancy Field    position                occupancy          Occupancy Networks
Distance Field     position                distance           DeepSDF, PIFu
Radiance Field     position + direction    color + density    NeRF
Scene Flow Field   position                scene flow         Neural Scene Flow Fields
Semantic Field     position                semantics          LERF

NeRF arxiv

Neural Radiance Field (NeRF) is a field over a 5D input, a 3D location x = (x, y, z) and a 2D viewing direction (θ, φ), that assigns a color c = (r, g, b) and a volume density σ to each point in space. NeRF approximates this continuous 5D scene representation with an MLP.

  • the MLP weights are the model of the world: one network is fit (overfit) to a single scene.
  • the best-known instance of neural fields.
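The color and density that a NeRF predicts are turned into a pixel by volume rendering along the camera ray. Below is a minimal NumPy sketch of the discrete quadrature used in the NeRF paper, C = Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ with Tᵢ = exp(−Σ_{j<i} σⱼ δⱼ). In practice the per-sample densities, colors, and spacings δᵢ come from querying the MLP at points along the ray; here they are made-up inputs, and `render_ray` is a hypothetical name.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite samples along one ray into an RGB value (NeRF-style quadrature).

    sigmas: (S,) volume densities, colors: (S, 3) RGB, deltas: (S,) sample spacings.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    alpha = 1.0 - np.exp(-sigmas * deltas)   # opacity of each ray segment
    # Transmittance T_i: fraction of light that reaches sample i unabsorbed.
    optical_depth = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    trans = np.exp(-optical_depth)
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0), weights

# A nearly opaque red sample in front of a green one: the ray sees red.
rgb, w = render_ray(sigmas=[100.0, 100.0],
                    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                    deltas=[1.0, 1.0])
print(np.round(rgb, 3))  # [1. 0. 0.]
```

Because every step is differentiable, the rendered color can be compared with a photo pixel and the error backpropagated into the MLP weights, which is how a single scene gets baked into the network.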


Dex-NeRF arxiv


Gaussian Splatting
