Robot Perception and Control

Introduction

Last updated: Jul 25, 2024
Kashu Yamazaki
kyamazak@andrew.cmu.edu

Logistics

Lectures

  • Time: 00:00 - 00:00 (CST), MWF / TuTh
  • Location: JBHT 000 (in person)

Office Hours

  • Instructor: By appointment via email
  • TA: 00:00 - 00:00 (CST), MWF

Logistics

Grading Policy

  • HWs (30%): 5 homework assignments
  • Quizzes (10%): 5 quizzes, each worth 20 points
  • Midterm (30%)
  • Final Project (30%): project report + presentation

A: 90% ~, B: 80% ~ 90%, C: 70% ~ 80%, D: 60% ~ 70%

Submission Policy

  • HWs: a per-day penalty is deducted from the total points of the assignment after the due date.
  • Final Project: No late submission.

Every submission is due at midnight (11:59 pm) on the date specified.


Robot Learning

What is Robot Learning?

Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques allowing a robot to acquire novel skills or adapt to its environment through learning algorithms.

  • Sensing: observe the physical world through multimodal senses
  • Perception: acquiring knowledge from sensor data
  • Action: act on the environment to execute a task / acquire new observations

A key challenge in Robot Learning is to close the perception-action loop.
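
A minimal sketch of such a loop; the robot, perceive, and policy interfaces below are hypothetical placeholders:

def perception_action_loop(robot, perceive, policy):
    obs = robot.sense()            # Sensing: observe the world with multimodal sensors
    while not robot.task_done():
        state = perceive(obs)      # Perception: extract knowledge from sensor data
        action = policy(state)     # decide how to act on the environment
        obs = robot.act(action)    # Action: execute the task, acquire a new observation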


Applications of Robot Learning

  • Manipulation
  • Locomotion
  • Mobile Manipulation


When Should Robots Learn?

Robots should be designed to learn in situations where pre-existing knowledge or established protocols are insufficient or non-existent, requiring them to discover knowledge from data:

  • High Environmental Uncertainty
  • Significant Variation in Observations
  • Lack of Reliable Priors
  • Complex or Unstructured Environments
  • Continuous Improvement

Learning is NOT the solution to every problem in robotics.

When the task can be modeled without knowledge extracted from data, a learning algorithm is not required (and learning algorithms tend to perform worse than well-engineered classical methods). Learning systems can also be combined with classical techniques.


How to Make Robots Learn?

These days, many robot learning methods are based on deep neural networks trained with various learning algorithms (supervised learning, unsupervised learning, reinforcement learning, etc.).

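A minimal sketch of one supervised-learning update in PyTorch (illustrative model, shapes, and data; e.g., behavior cloning of expert actions):

import torch
import torch.nn as nn

# Toy behavior-cloning step: regress actions from observations
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

obs = torch.randn(32, 16)        # batch of observations (hypothetical sizes)
target = torch.randn(32, 4)      # expert actions to imitate (placeholder data)

loss = nn.functional.mse_loss(model(obs), target)
optimizer.zero_grad()
loss.backward()                  # backpropagation computes the gradients
optimizer.step()                 # gradient step updates the parameters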


Multi-modal Sensing

  • LiDAR sensor
  • Stereo depth sensor
  • RGB-D camera, microphone
  • IMU (gyro / accelerometer / barometer)
  • Tactile sensor
  • Joint position / velocity / torque


Deep Learning

Basics


Backpropagation
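
A minimal PyTorch sketch of backpropagation via automatic differentiation (values are illustrative):

import torch

# Tiny computation graph: y = sum((w * x + b)^2)
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

y = ((w * x + b) ** 2).sum()    # forward pass records the graph
y.backward()                    # backward pass applies the chain rule

print(w.grad)                   # dy/dw = sum(2 * (w*x + b) * x)
print(b.grad)                   # dy/db = sum(2 * (w*x + b))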


Linear/Dense Layer
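
A minimal PyTorch sketch (sizes are illustrative):

import torch
import torch.nn as nn

layer = nn.Linear(in_features=8, out_features=4)   # computes y = x W^T + b
x = torch.randn(2, 8)                              # batch of 2 input vectors
y = layer(x)                                       # output shape: (2, 4)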


Convolution Layer
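
A minimal PyTorch sketch (shapes are illustrative):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
img = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
out = conv(img)                   # (1, 16, 32, 32): padding=1 keeps the spatial size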


Recurrent Cells

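A minimal PyTorch sketch using a GRU cell (sizes are illustrative):

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=10, hidden_size=20)
h = torch.zeros(1, 20)            # initial hidden state
for t in range(5):                # unroll the cell over a short sequence
    x_t = torch.randn(1, 10)      # input at time step t
    h = cell(x_t, h)              # hidden state carries information across steps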


Scaled Dot-Product Attention

An attention mechanism where the dot products are scaled down by $\sqrt{d_k}$.

  • Motivated by the concern that when the dot products grow large, the softmax function may have an extremely small gradient, which makes learning inefficient.

  • Calculates the similarity between the queries $Q$ and the keys $K$, and weights the values $V$ by this (softmax-normalized) similarity.

  • Can be viewed as a differentiable dictionary lookup.
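
The standard formulation (Vaswani et al., 2017), where $d_k$ is the dimension of the keys:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$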


Multi-Head Attention (MHA)

A module that runs the attention mechanism several times in parallel and concatenates the heads' outputs (see the sketch after the list below).

  • The multiple attention heads allow the model to attend to different parts of the sequence in different ways.
  • When $Q$, $K$, and $V$ are derived from the same input, this is called self-attention.
  • Only a small subset of heads appears to be important for the translation task; in particular, many encoder self-attention heads can be removed without seriously affecting performance [1].
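
A minimal sketch using PyTorch's built-in module (sizes are illustrative):

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(2, 10, 64)       # (batch, sequence, embedding)
out, weights = mha(x, x, x)      # query = key = value -> self-attention
print(out.shape)                 # torch.Size([2, 10, 64])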

Masked Multi-Head Attention

Masking of the unwanted tokens can be done by setting their scores to $-\infty$. The binary mask is applied to the attention scores so that, after the softmax, the attention weight on those unwanted tokens is zero.

Sample implementation:

import torch
import numpy as np
from einops import rearrange

# assumes: to_qvk = nn.Linear(dim, dim_head * heads * 3, bias=False)
qkv = to_qvk(x)                                               # (b, t, 3 * heads * dim_head)
q, k, v = tuple(rearrange(qkv, 'b t (d k h) -> k b h t d', k=3, h=num_heads))
scale = dim_head ** -0.5                                      # 1 / sqrt(d_k)
scaled_dot_prod = torch.einsum('b h i d , b h j d -> b h i j', q, k) * scale
scaled_dot_prod = scaled_dot_prod.masked_fill(mask, -np.inf)  # mask is True on unwanted tokens
attention = torch.softmax(scaled_dot_prod, dim=-1)            # -inf scores give zero weight
out = torch.einsum('b h i j , b h j d -> b h i d', attention, v)
out = rearrange(out, 'b h t d -> b t (h d)')                  # concatenate the heads
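
For example, a causal (look-ahead) mask for autoregressive decoding can be built with torch.triu; the True entries mark the future positions that get filled with -inf above:

import torch

t = 5  # sequence length (illustrative)
# True strictly above the diagonal: tokens each position must not attend to
mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)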

Self-Attention vs Cross-Attention

Self-Attention: all of the queries $Q$, keys $K$, and values $V$ come from the same input source.

  • Each position in the encoder can attend to all positions in the previous layer of the encoder.

Cross-Attention: the keys $K$ and values $V$ come from the reference source, while the queries $Q$ come from the querying source.

  • This allows every position in the decoder to attend over all positions in the input sequence.
  • One way to realize cross-modal fusion (see the sketch below).
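
A minimal PyTorch sketch of the distinction (module and shapes are illustrative):

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(2, 10, 64)                 # querying source (e.g., decoder states)
memory = torch.randn(2, 20, 64)            # reference source (e.g., encoder output)

self_out, _ = attn(x, x, x)                # self-attention: Q = K = V = x
cross_out, _ = attn(x, memory, memory)     # cross-attention: Q from x, K/V from memory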

Inductive Bias


Inductive bias (IB) is the set of assumptions about the data that the model holds [1].

  • CNN: information in the data is aggregated locally. (strong IB)
  • RNN: the data is strongly correlated with previous time steps. (strong IB)
  • Self-Attention: simply correlates all features with each other. (weak IB)

Resources: Books


Deep Learning
by Ian Goodfellow, Yoshua Bengio, Aaron Courville


Modern Robotics: Mechanics, Planning, and Control
by Kevin M. Lynch, Frank C. Park


Resources: Online Materials
