01 — How it runs

The closed loop

The system runs as a closed feedback loop at the camera frame rate. The same six steps repeat at every control tick.

① Camera RealSense D435i RGB frame ② Perception ArUco + Kalman X ③ Controller PID · MPC · PID+resid. (ux, uy) ④ Joint mapping tilt → joint motion Δq ⑤ Arm firmware internal torque loop arm moves ⑥ Plate tilts; ball rolls ẍ ∝ θy next camera frame
Figure. One control cycle. The camera and perception stages recover the ball state from the image; the controller and joint-mapping stages turn that state into a joint command; the arm firmware actuates and the plate tilts. The next camera frame closes the loop.
01

Perception

Estimates the ball position and velocity in the plate frame from the camera image.

  • ArUco markers + homography
  • HSV blob detection
  • Kalman filter (position + velocity)
output: $(x, y, \dot x, \dot y)$
02

Control

Maps the tracking error to a joint trajectory consumable by the UR firmware.

  • PID per axis (P · I · D)
  • Jacobian inverse (least squares)
  • Smoothing + safety
output: $q_{\text{tgt}}$
03

Actuation

Sends the joint increment to the arm and lets the manufacturer's controller close the inner torque loop.

  • joint trajectory at 30 Hz
  • gravity compensation built-in
  • PD on each joint
→ UR7e arm

02 — Inputs and reference frames

Inputs and reference frames

Plate frame $\{P\}$

Origin at the plate centre. $\hat x_P$ along the right edge, $\hat y_P$ up (overhead view), $\hat z_P$ along the plate normal.

World frame $\{W\}$

Fixed at the UR7e base, with $\hat z_W$ pointing up against gravity. The Jacobian acts on rotations expressed in $\{W\}$.

Ball state & reference

At every control step the ball position and velocity in $\{P\}$ are measured: $X = (x, y, \dot x, \dot y) \in \mathbb{R}^4$. A reference trajectory $r(t)$ supplies the desired ball position and velocity at the same instant: $(x_r, y_r, \dot x_r, \dot y_r)$.

$X = (x, y, \dot x, \dot y) \in \mathbb{R}^4$

Tracking errors

The position error $e_p = (x - x_r, y - y_r)$ measures how far the ball is from where it should be; the velocity error $e_v$ measures the rate at which that gap is changing.

Controller output: target tilt

The position-level controller consumes the errors and emits a scalar tilt pair $(u_x, u_y) \in \mathbb{R}^2$ for the plate. Three schemes are compared: a stand-alone PID, a model-predictive controller (MPC), and the PID baseline augmented by a learned residual policy. By convention $u_x = 0$ is horizontal; $u_x > 0$ specifies a tilt that drives the ball toward $-\hat x_P$.

$(u_x, u_y) \in \mathbb{R}^2$
P ŷP {P} target (xr, yr) ball (x, y) −ep

Plate frame $\{P\}$, ball position, reference target, and tracking error vector.

A 03 — Position-level controller

PID — feedback on the error

Two independent scalar loops, one per plate axis. Each loop maps its tracking error to a tilt command.

1

The error signal

Per axis, two scalars: $e_p = x - x_r$ (position error) and $e_v = \dot x - \dot x_r$ (velocity error). Both come from the perception module.

2

P, I, D terms

$K_p\, e_p$ pulls the ball toward the reference, $K_d\, e_v$ damps the response, $K_i \!\int\! e_p\, dt$ cancels slow steady-state offsets. Three gains per axis.

3

One loop per axis

At small tilts the $x$ and $y$ dynamics decouple. Two scalar loops with shared gains; no cross-coupling.

4

Tuned on the real plate

Gains hand-picked on the live arm. This is the controller deployed on the live system.

$u_x \;=\;$ $K_p\, e_p$ + $K_i\!\!\int\! e_p\, dt$ + $K_d\, e_v$

position  +  integral  +  damping  ·  per axis

B 04 — Position-level controller

MPC — planning ahead

An optimisation-based controller that plans the immediate future. At each control tick it picks the sequence of future tilt commands that would best follow the reference, applies only the first command, and replans on the next tick from the new state.

1

Plant model

Under small tilts the ball-on-plate becomes a double integrator: a tilt $u$ produces a constant horizontal acceleration $\ddot x = -\beta\, u$, with $\beta = \tfrac{5}{7}\, g$ from the rolling equation.

Each plate axis sees its own copy of this two-state, one-input system, with state $(x, \dot x)$.

2

Plan $N$ steps ahead

At every control tick the controller proposes a candidate sequence of $N = 25$ future tilt commands $u_0, u_1, \dots, u_{N-1}$.

The plant model is rolled forward under this sequence to predict the resulting ball trajectory $x_1, x_2, \dots, x_N$ over the horizon.

3

Score the plan

Each candidate sequence is scored by a single number: the sum, over the horizon, of the tracking error $\|x_k - r_k\|$ (weighted by $Q$) and the control effort $\|u_k\|$ (weighted by $R$).

The MPC keeps the candidate sequence that minimises this cost.

4

Apply the first input, replan

Only the first command $u_0$ of the optimal sequence is sent to the plate. By the next tick the state has changed, so the controller replans from scratch.

This receding-horizon principle is what makes MPC a feedback controller: every plan uses fresh state measurements.

$\displaystyle \arg\min_{u_{0:N}}$ $\sum_{k=0}^{N} \|x_k - r_k\|_Q^2$ + $\|u_k\|_R^2$

tracking  +  effort  ·  horizon $N{=}25$

C 05 — Position-level controller

Residual PPO — learned correction on top of PID

The hand-tuned PID is reliable at the conditions it was tuned for, and brittle outside them. The goal of the Residual PPO scheme is to extend that robust operating envelope: keep the PID, add a small neural correction that activates when conditions drift away from nominal.

1

The problem PID alone cannot solve

At nominal conditions the PID from § 03 tracks the reference reliably. But ball mass varies a few percent, surface friction depends on humidity and dust, the camera has latency, the model is not quite the real plate.

Outside the tuning point the PID degrades, and it has no parameter to compensate.

2

The residual idea

Keep the PID running. Add a small learned correction on top:

$u = \mathrm{PID}(e) + 0.3 \cdot \pi_\theta(o)$

where $\pi_\theta(o) \in [-1, 1]^2$ is the output of a neural policy. The factor $0.3$ bounds the correction: the network can fine-tune but cannot overrule the baseline.

3

A richer observation

The network sees more than the PID does: a window of recent ball trajectory, plate orientation, and past commands.

With that context it can infer, for example, that friction is higher than usual, and inject the corrective tilt the PID would have needed.

4

Trained across many worlds

Training in MuJoCo sweeps the parameters the PID is blind to: ball mass, surface friction, sensing delay, gravity scale.

The residual learns to recognise each regime and apply the correction that closes the gap to the nominal performance.

$u \;=\;$ $\mathrm{PID}(e)$ + $0.3 \cdot \pi_\theta(o)$

baseline  +  residual, bounded

On the hardest benchmark domain (gravity scaled by $1.5\times$), the residual keeps the ball on the plate $97\%$ of the time, while the stand-alone PID baseline drops it on $70\%$ of episodes.

PID only

70%
drop rate · high gravity

Residual PPO

3%
drop rate · high gravity

06 — Control pipeline

From tracking error to joint trajectory

With the input, frame and controller conventions fixed, the system maps the tracking error in $\{P\}$ to a joint trajectory in five sequential stages. The first stage is one of the schemes above (PID, MPC, or PID with a learned residual); the remaining four are shared.

INPUT  ball position & velocity error OUTPUT  joint trajectory to the arm
1

Position-level controller

$(e_x,\, e_y,\, \dot e_x,\, \dot e_y) \;\longmapsto\; (u_x,\, u_y)$

Maps the tracking error to a pair of scalar target tilt angles, one per plate axis. Three schemes expose the same $(u_x, u_y)$ output: PID, MPC, or PID augmented by a learned residual policy. See the controller section above for their equations and roles.

Output: $u_x, u_y \in \mathbb{R}$, scalar tilt angles in the plate frame. The next step lifts them into a 3-D rotation expressed in world coordinates so the manipulator Jacobian can act on them.
2

Wiring

$\theta_{\text{tar}} \;=\; u_y\, \hat x_P^{\,W}(q) \;-\; u_x\, \hat y_P^{\,W}(q)$

The plate's body axes expressed in world coordinates, $\hat x_P^{\,W}(q)$ and $\hat y_P^{\,W}(q)$, are the first two columns of $R_{WP}(q)$. The wiring formula composes the two scalar tilts into a single rotation 3-vector in $\{W\}$. The minus sign follows from the rolling-without-slipping equation (derivation).

Output: $\theta_{\text{tar}} \in \mathbb{R}^3$, the absolute target rotation of the plate. The Jacobian, however, is a local linearisation; passing it the absolute target produces inconsistent joint motion as soon as the plate is no longer near home.
3

Increment

$\Delta\theta \;=\; \theta_{\text{tar}} \;-\; \theta_{\text{cur}}$

The current plate rotation $\theta_{\text{cur}}$ is read off the forward-kinematics of the arm and subtracted from the target. The residual passed to the inverse-kinematics step is genuinely small, so the linearisation $J\,\Delta q \approx \Delta\theta$ holds.

Output: $\Delta\theta \in \mathbb{R}^3$, a small rotation increment from the current pose. The IK now needs to find a joint displacement $\Delta q$ that produces this rotation through the manipulator's wrist.
4

Inverse kinematics

$\Delta q \;=\; J_r(q_{\text{cur}})^{+}\, \Delta\theta$

The rotational Jacobian $J_r \in \mathbb{R}^{3\times 6}$ relates a small joint displacement to the resulting plate rotation. Locking the upper-arm joints at their home configuration reduces the system to a well-conditioned $3\times 3$ on the three wrist columns, and the pseudoinverse maps $\Delta\theta$ back to a wrist-joint correction.

Output: $\Delta q \in \mathbb{R}^6$, a per-axis joint correction. Before reaching the arm it is clipped and low-pass filtered to remove the high-frequency content that would excite the structure of the manipulator.
5

Smoothing and dispatch

$q_{\text{tgt}} \;=\; q_{\text{cur}} \;+\; \Delta q_{\text{filt}}$

The filtered joint correction is added to the current configuration and sent to the arm at 30 Hz. The Universal Robots firmware tracks the command and the plate moves accordingly.

07 — Wiring

Lifting the scalars to a world-frame rotation

The controller outputs $u_x, u_y \in \mathbb{R}$ are scalar tilt angles about the plate's local $\hat x_P$ and $\hat y_P$ axes. The manipulator Jacobian acts on rotation vectors expressed in $\{W\}$, so the wiring stage assembles the target rotation 3-vector from the scalars and the plate-axis columns of $R_{WP}(q)$ at the current configuration.

$$\theta_{\text{tar}} = u_y\, \hat x_P^{\,W}(q) - u_x\, \hat y_P^{\,W}(q)$$

The sign convention is fixed by the rolling-without-slipping equation $\ddot x \approx \tfrac{5}{7} g\, \theta_y$: a negative tilt about the world's $\hat y_W$ axis is required to produce a negative ball acceleration along $\hat x_P$.

W world P θy wrist (pivot)

08 — Increment

Rotation increment from current to target

The rotational Jacobian $J_r(q_{\text{cur}})$ is a first-order Taylor expansion around the current configuration; the relation $J_r\,\Delta q \approx \Delta\theta$ holds only for small rotation increments from $q_{\text{cur}}$. Subtracting the current plate rotation $\theta_{\text{cur}}$ from the absolute target yields a residual $\Delta\theta = \theta_{\text{tar}} - \theta_{\text{cur}}$ small enough for the linearisation to be accurate.

$$\Delta\theta = \theta_{\text{tar}} - \theta_{\text{cur}}$$
target θ_tar current θ_cur Δθ to IK

09 — Inverse kinematics

Inverse kinematics via the rotational Jacobian

The rotational Jacobian $J_r(q) \in \mathbb{R}^{3\times 6}$ relates joint displacements to plate rotations. The system $J_r\,\Delta q = \Delta\theta$ is underdetermined (three equations in six unknowns); the minimum-norm solution is given by the Moore–Penrose pseudoinverse $J_r^{+}$. With the upper-arm joints locked at the home configuration, only the three wrist columns of $J_r$ are free, so the effective system is $3\times 3$ and well-conditioned.

$$\Delta q \;=\; J_r(q_{\text{cur}})^{+}\, \Delta\theta$$

The pseudoinverse is solved via a singular-value decomposition for numerical stability.

Jr (3×6) Δq (6×1) Δθ (3×1) · 0 0 0 = locked (zero) wrist (free)

10 — Rolling-without-slipping dynamics

Underlying physics

The in-plane acceleration of a ball rolling without slipping on a plate tilted by $\theta$ is $\tfrac{5}{7}\, g \sin\theta$. The factor $5/7$ accounts for the rotational inertia of a solid sphere; $g \sin\theta$ is the gravity component along the plate.

rolling equation, per axis $$\ddot x \;\approx\; \tfrac{5}{7}\, g\, \theta_y, \qquad \ddot y \;\approx\; -\tfrac{5}{7}\, g\, \theta_x$$
Full derivation