Derivative-Free Trajectory Optimization

Model Predictive Path Integral control vs Cross-Entropy Method on Dynamic Bicycle and Perching Glider.

CEM and MPPI sample a large batch of trajectories every iteration, giving them quasi-global exploration and near-free GPU parallelization, and this project mainly explores these two zero-order methods.

MPPI for Perching Glider

Model as given in Appendix, the flat-plate perching–glider model is deterministic. As a result, the model does not explicitly represent physical uncertainty; MPPI can still optimize a nominal trajectory, but it cannot reason about distribution of future trajectories under same control sequence.

To place glider within MPPI framework, dynamics is augmented to a stochastic differential equation from \(dX_t = f(X_t,t)\,dt + g(X_t,t)\,dW_t\) to the form \(dX_t = f(X_t,t)\,dt + g(X_t,t)u_t\,dt + g(X_t,t)\,\delta u_t\,dt,\) where $\delta u_t$ is a zero-mean Gaussian control disturbance. In discrete time,

\[dX_t = f(X_t,t)\,dt + g(X_t,t)u_t\,dt + \frac{1}{\sqrt{\rho}}\frac{\epsilon_t}{\sqrt{dt}}\,dt, \quad \epsilon_t \sim \mathcal{N}(0,\sigma^2).\]

Here $\epsilon_t$ is white Gaussian noise, and the Euler–Maruyama scaling $\epsilon_t \sqrt{dt}$ gives the correct $dt$-proportional variance for a Wiener increment. The factor $1/\sqrt{\rho}$ sets the overall noise magnitude relative to the control cost. This construction assumes Gaussian, white noise entering through the control channel, which is exactly the condition under which the path–integral formulation produces the MPPI update via reward-weighted averages of the sampled perturbations $\delta u_t$.


MPPI for Dynamic bicycle

Cross-Entropy Method (CEM)

CEM views trajectory optimization as probabilistic inference: it defines an “elite” distribution \(\tilde p(U)\propto \mathbf{1}_{\{J(U)\le\gamma\}}\,p_{\text{old}}(U)\) and projects it onto a tractable family $p(U;v)$ via reverse KL, \(v_{\text{new}}=\arg\min_v D_{\mathrm{KL}}\big(\tilde p(U)\,\|\,p(U;v)\big).\) The indicator encodes the elite event, so the update reduces to estimating the mean and covariance of the lowest-cost samples (with EMA/variance floors), shifting the sampling distribution toward low-cost modes—analogous to MPPI’s exponential reweighting but using hard elite selection.

CE Bicycle

Comparision

Across equal horizons and rollout budgets, MPPI reduces cost quickly in early iterations because exponential reweighting emphasizes low-cost trajectories, as shown in the figure below; its performance depends on the inverse-temperature $ \kappa $ from $ w_i \propto \exp[-S_i/\kappa] $ and the noise covariance $ \Sigma $ from $ \epsilon_t \sim \mathcal{N}(0,\Sigma) $ in the stochastic dynamics. CEM improves steadily via elite-set refinement, is less sensitive to hyperparameters beyond the elite ratio, but requires more rollouts to match MPPI’s initial descent; with sufficient sampling, both reach similar terminal cost. Both methods parallelize well, though MPPI maps more directly to GPU throughput, while CEM’s mean and covariance updates add minor overhead. Consequently, MPPI is suited for real-time MPC once $ \kappa $ and $ \Sigma $ are tuned, whereas CEM is better for offline or global search due to its robustness and simple tuning.

[ code →]


Appendix: Models

Flat-Plate Perching Glider (2D Longitudinal Model)

We use the planar flat-plate model from Underactuated Robotics (§10.6.1) to describe a glider that perches using only an elevator.

Assumptions
Motion is restricted to the vertical plane $(x,z)$; the vehicle experiences still air and gravity $g$.
The body pitch angle is $\theta$ (nose-up positive).
Two lifting surfaces act: a rigid main wing $w$ and a hinged elevator $e$ deflected by angle $\phi$.
Each surface is modeled as a flat plate generating a normal force

\[f_n = \rho\,S\,\sin\alpha\,\|\dot{\mathbf p}\|^2,\]

where $\alpha$ is its local angle of attack.

State and Input
\(\mathbf x = [x,\,z,\,\theta,\,\phi,\,\dot x,\,\dot z,\,\dot\theta]^\top, \qquad u = \dot\phi.\) The control is the elevator angular rate as servo command.

Let $\mathbf F_w,\mathbf F_e$ be the flat-plate aerodynamic forces acting at the wing and elevator centers of pressure, and $\tau_w,\tau_e$ their moments about the center of mass. Then

\[\begin{aligned} m\,\ddot x &= f_w\sin\theta + f_e\sin(\theta+\phi),\\ m\,\ddot z &= f_w\cos\theta + f_e\cos(\theta+\phi) - m g,\\ I\,\ddot\theta &= l_w f_w + (l_h\cos\phi+l_e) f_e,\\ \dot\phi &= u, \end{aligned}\]

with \(f_w = \rho\,S_w\,\sin\alpha_w\,\|\dot{\mathbf p}_w\|^2, \qquad f_e = \rho\,S_e\,\sin\alpha_e\,\|\dot{\mathbf p}_e\|^2.\)

Parameters
$m, I$: mass and pitch inertia
$\rho$: air density
$S_w, S_e$: surface areas
$l_w, l_h, l_e$: geometry lengths defining the moment arms of each surface.

Flat-Plate Glider Diagram

Dynamic Bicycle Model (Discrete-Time, Euler Integration)

The system is represented by
\(x = [p_x,\, p_y,\, \theta,\, v,\, \delta]^\top, u = [a,\, \dot{\delta}]^\top,\) where $p_x, p_y$: vehicle position in world frame, $\theta$: heading angle, $v$: forward speed, $\delta$: steering angle, $a$: longitudinal acceleration (control), $\dot{\delta}$: steering rate (control).

For time step $\Delta t$ and wheelbase $L$:

\[\begin{aligned} p_x^{+} &= p_x + v\cos\theta\,\Delta t,\\ p_y^{+} &= p_y + v\sin\theta\,\Delta t,\\ \theta^{+} &= \theta + \frac{v}{L}\tan\delta\,\Delta t,\\ v^{+} &= v + a\,\Delta t,\\ \delta^{+} &= \delta + \dot{\delta}\,\Delta t. \end{aligned}\]

Physical and numerical constraints:
\(v \in [v_{\min}, v_{\max}], \quad \delta \in [\delta_{\min}, \delta_{\max}], \quad a \in [a_{\min}, a_{\max}], \quad \dot{\delta} \in [\dot{\delta}_{\min}, \dot{\delta}_{\max}],\) and all angles are wrapped to $[-\pi, \pi]$.


References

[1] Russ Tedrake. Underactuated Robotics Ch.10 Trajectory Optimization Course Notes for MIT 6.832, 2023. https://underactuated.csail.mit.edu/trajopt.html#perching