Shooting vs. Direct Transcription for Trajectory Optimization
Transcription with fmincon/SNOPT and Backpropagation, iLQR, DDP on the Cart–Pole System
Direct transcription
The shooting method (shown below) does not achieve particularly high performance on the cart–pole swing-up task. The most apparent issue is that in the final few time steps the control inputs fail to precisely drive both the cart and the pole velocities to zero. Fundamentally, the shooting method's gradients involve high powers of the system matrix, $A^{n}$, so the early control input $u_{[0]}$ exerts a much stronger influence on the overall trajectory and cost than the late control input $u_{[N-1]}$. Direct transcription, in contrast, treats the state $x_{[n]}$ as an optimization variable, making it straightforward to impose state constraints, for instance limiting the cart's range of motion. Achieving the same behavior in iLQR or DDP is much more complicated, demanding repeated rollouts and fine-tuning of the weighting matrices.
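To make the structure concrete, here is a minimal sketch of a transcription NLP. Everything in it is an illustrative assumption: SciPy's SLSQP stands in for fmincon/SNOPT, and a double integrator stands in for the cart–pole to keep the dynamics short; the full problem is structurally identical, just with nonlinear defect constraints.

```python
import numpy as np
from scipy.optimize import minimize

N, h = 20, 0.1                               # knot points, time step
A = np.array([[1.0, h], [0.0, 1.0]])         # double-integrator dynamics (toy stand-in)
B = np.array([[0.5 * h**2], [h]])
x0, xf = np.array([1.0, 0.0]), np.array([0.0, 0.0])

def unpack(z):
    x = z[: 2 * (N + 1)].reshape(N + 1, 2)   # states are decision variables too
    u = z[2 * (N + 1):]
    return x, u

def cost(z):
    _, u = unpack(z)
    return h * np.sum(u**2)

def defects(z):
    # Dynamics enter only as equality constraints x[k+1] - f(x[k], u[k]) = 0,
    # so state limits (e.g. cart travel) could be added as simple box bounds on z.
    x, u = unpack(z)
    d = [x[k + 1] - (A @ x[k] + B.ravel() * u[k]) for k in range(N)]
    return np.concatenate(d + [x[0] - x0, x[N] - xf])

res = minimize(cost, np.zeros(2 * (N + 1) + N), method="SLSQP",
               constraints={"type": "eq", "fun": defects},
               options={"maxiter": 200, "ftol": 1e-10})
x_opt, _ = unpack(res.x)
print(res.success, np.abs(x_opt[-1]).max())  # terminal state pinned by constraint
```

Note how the terminal condition is an exact equality constraint rather than a soft penalty, which is exactly what makes the final velocities land on zero here while shooting struggles.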
Solved by fmincon (interior-point) without analytical gradient
Solved by SNOPT with analytical gradient — Final state \([x,\theta,\dot{x},\dot{\theta}] = [0.0005,\,3.1416,\,-0.0005,\,0.0005]\)
During the optimization process, SNOPT is significantly faster than the fmincon solver, especially when I increase the time-horizon length or tighten the terminal-state constraints. One important point is that I provided analytical gradients when using SNOPT. In addition, I generally find that primal-dual interior-point methods (IPM) tend to be slower than SQP.
In each iteration, an IPM solves a large KKT linear system over all constraints, while an SQP iteration solves a single quadratic subproblem, which can be much cheaper when the number of active constraints is small and the QP remains sparse. I suspect the SNOPT solver is also better maintained and makes more use of the banded structure of the constraints, their gradients, and Hessians. For large-scale problems with many constraints, though, IPM can be more robust and scalable.
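Since the analytical gradients are what make the SNOPT run fast, it is worth sanity-checking them against finite differences before handing them to the solver. A minimal sketch of that check, on a simple pendulum with Euler discretization rather than the full cart–pole (an assumption to keep it short; the cart–pole check is identical in shape):

```python
import numpy as np

g, l, m, h = 9.81, 1.0, 1.0, 0.02            # pendulum parameters (illustrative)

def f(x, u):                                  # one Euler step of the pendulum
    th, thd = x
    return np.array([th + h * thd,
                     thd + h * (-(g / l) * np.sin(th) + u / (m * l**2))])

def fx(x, u):                                 # analytic A_k = df/dx
    th, _ = x
    return np.array([[1.0, h],
                     [-h * (g / l) * np.cos(th), 1.0]])

def fx_fd(x, u, eps=1e-6):                    # central-difference Jacobian
    J = np.zeros((2, 2))
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        J[:, i] = (f(x + e, u) - f(x - e, u)) / (2 * eps)
    return J

x, u = np.array([0.7, -0.3]), 0.5
err = np.abs(fx(x, u) - fx_fd(x, u)).max()
print(err)                                    # agreement to roughly 1e-10
```

The same pattern checks $\partial f/\partial u$ and the cost gradients; a mismatch here usually explains mysterious solver stalls long before anything else does.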
Shooting
Backpropagation
In the special case of direct shooting without state constraints, the states are not decision variables in this NLP: with $x_{[0]}$ given, the rollout is $x_{[k+1]}=f(x_{[k]},u_{[k]})$. Introduce multipliers and augment the cost
\[\mathcal{L} =\ell_f\big(x_{[N]},u_{[N]}\big) +\sum_{k=0}^{N-1}\ell\big(x_{[k]},u_{[k]}\big) +\sum_{k=0}^{N-1}\lambda_{[k]}^{\top}\Big(f\big(x_{[k]},u_{[k]}\big)-x_{[k+1]}\Big).\]Stationarity w.r.t. the states yields the adjoint recursion
\[\lambda_{[N-1]}=\partial_{x_{[N]}}\ell_f\big(x_{[N]},u_{[N]}\big),\qquad \lambda_{[k-1]}=\partial_{x_{[k]}}\ell\big(x_{[k]},u_{[k]}\big) +\Big(\partial_{x_{[k]}}f\big(x_{[k]},u_{[k]}\big)\Big)^{\top}\lambda_{[k]}, \quad k=1,\dots,N-1,\] which is exactly backpropagation through time. The control gradients then follow directly from the Lagrangian:
\[\frac{\partial J}{\partial u_{[k]}} =\partial_{u_{[k]}}\ell\big(x_{[k]},u_{[k]}\big) +\Big(\partial_{u_{[k]}}f\big(x_{[k]},u_{[k]}\big)\Big)^{\top}\lambda_{[k]}, \quad k=0,\dots,N-1.\]The algorithm loops through three steps: 1. forward rollout; 2. backward adjoint sweep; 3. gradient extraction.
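The three-step loop can be sketched in a few lines. As an assumption to keep the example compact, it uses a pendulum with Euler discretization and a quadratic running cost around an upright goal; the cart–pole version only swaps in its own $f$, $\partial_x f$, and $\partial_u f$. The adjoint gradient is verified against finite differences at the end.

```python
import numpy as np

g, l, m, h, N = 9.81, 1.0, 1.0, 0.05, 40     # pendulum parameters (illustrative)
Q, R = np.diag([1.0, 0.1]), 0.01
x_goal = np.array([np.pi, 0.0])

def f(x, u):                                  # one Euler step
    return np.array([x[0] + h * x[1],
                     x[1] + h * (-(g / l) * np.sin(x[0]) + u / (m * l**2))])

def fx(x, u):                                 # df/dx
    return np.array([[1.0, h], [-h * (g / l) * np.cos(x[0]), 1.0]])

def fu(x, u):                                 # df/du
    return np.array([0.0, h / (m * l**2)])

def rollout(u):                               # step 1: forward rollout
    x = np.zeros((N + 1, 2))
    for k in range(N):
        x[k + 1] = f(x[k], u[k])
    return x

def total_cost(u):
    e = rollout(u) - x_goal
    return sum(e[k] @ Q @ e[k] + R * u[k]**2 for k in range(N)) + e[N] @ Q @ e[N]

def grad(u):
    x = rollout(u)
    lam = 2 * Q @ (x[N] - x_goal)             # lambda_[N-1] from the terminal cost
    gJ = np.zeros(N)
    for k in range(N - 1, -1, -1):            # step 2: backward adjoint sweep
        gJ[k] = 2 * R * u[k] + fu(x[k], u[k]) @ lam   # step 3: gradient extraction
        lam = 2 * Q @ (x[k] - x_goal) + fx(x[k], u[k]).T @ lam
    return gJ

u = 0.1 * np.ones(N)
eps = 1e-6
g_fd = np.array([(total_cost(u + eps * np.eye(N)[k])
                  - total_cost(u - eps * np.eye(N)[k])) / (2 * eps) for k in range(N)])
err = np.abs(grad(u) - g_fd).max()
print(err)                                    # adjoint gradient matches finite differences
```

One backward sweep delivers all $N$ control gradients for the price of roughly one extra rollout, versus $2N$ rollouts for finite differences.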
Backpropagation — Final state \([x,\theta,\dot{x},\dot{\theta}] = [-0.4979,\,3.1548,\,-1.0040,\,2.6198]\)
Iterative LQR and Differential Dynamic Programming
iLQR treats trajectory optimization as sequential LQ optimal control: linearize the dynamics and quadratize the cost around the current trajectory, then exploit the Riccati structure to produce a trajectory update from a single backward pass and a forward rollout. The only difference in DDP is that its quadratic approximation of the HJB backup keeps the second-order dynamics terms, whose Hessians are nontrivial to compute.
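One iLQR iteration (backward Riccati sweep, then a forward rollout with the feedback policy) can be sketched as follows. As an assumption for brevity, the system is a double integrator with quadratic costs, where the LQ approximation is exact, so a single iteration already reaches the optimum; on the cart–pole the same loop simply repeats until convergence.

```python
import numpy as np

N, h = 30, 0.1
A = np.array([[1.0, h], [0.0, 1.0]])          # double-integrator stand-in
B = np.array([[0.5 * h**2], [h]])
Q, R, Qf = np.diag([1.0, 0.1]), np.array([[0.1]]), np.diag([100.0, 10.0])
x0 = np.array([1.0, 0.0])

def rollout(u, x_ref=None, k_ff=None, K=None):
    x = np.zeros((N + 1, 2)); x[0] = x0
    u_new = np.zeros((N, 1))
    for k in range(N):
        du = u[k] if k_ff is None else u[k] + k_ff[k] + K[k] @ (x[k] - x_ref[k])
        u_new[k] = du
        x[k + 1] = A @ x[k] + B @ du
    return x, u_new

def cost(x, u):
    run = sum(0.5 * x[k] @ Q @ x[k] + 0.5 * u[k] @ R @ u[k] for k in range(N))
    return run + 0.5 * x[N] @ Qf @ x[N]

def backward(x, u):                           # Riccati sweep on the LQ approximation
    k_ff, K = np.zeros((N, 1)), np.zeros((N, 1, 2))
    Vx, Vxx = Qf @ x[N], Qf.copy()
    for k in range(N - 1, -1, -1):
        Qx = Q @ x[k] + A.T @ Vx
        Qu = R @ u[k] + B.T @ Vx
        Qxx = Q + A.T @ Vxx @ A               # iLQR: no dynamics Hessians here
        Quu = R + B.T @ Vxx @ B
        Qux = B.T @ Vxx @ A
        k_ff[k] = -np.linalg.solve(Quu, Qu)
        K[k] = -np.linalg.solve(Quu, Qux)
        Vx = Qx + K[k].T @ (Quu @ k_ff[k] + Qu) + Qux.T @ k_ff[k]
        Vxx = Qxx + K[k].T @ (Quu @ K[k] + Qux) + Qux.T @ K[k]
    return k_ff, K

u0 = np.zeros((N, 1))
x_nom, _ = rollout(u0)                        # nominal trajectory
k_ff, K = backward(x_nom, u0)                 # one backward pass
x_new, u_new = rollout(u0, x_nom, k_ff, K)    # one forward rollout with feedback
c0, c1 = cost(x_nom, u0), cost(x_new, u_new)
print(c0, c1)                                 # cost drops after a single iteration
```

DDP would add terms like $\lambda^{\top}\partial^2_{xx} f$ into `Qxx`, `Qux`, and `Quu`; dropping them is exactly what makes iLQR a Gauss–Newton-style method.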