Trajectory Optimization with Continuous-Constraint Relaxation

Inter-sample Violation, Minimum Time/Energy Optimal Control, Shooting vs. Direct Transcription


This project started from a standard constrained nonlinear optimal control formulation, using a cart–pole swing-up task, modeled as a 4D nonlinear ODE with viscous cart friction and joint damping, as a testbed. A typical problem can be written as

\[\begin{align} \min_{x(\cdot),\,u(\cdot)} \quad & \ell_f\bigl(x(t_f)\bigr) + \int_{0}^{t_f} \ell\bigl(x(t),u(t)\bigr)\,dt \\ \text{s.t.}\quad & \dot x(t)=f\bigl(x(t),u(t)\bigr),\quad x(0)=x_0,\quad x(t_f)\in\mathcal{X}_f,\\ & u(t)\in \mathcal{U}, \quad \mathbf g\bigl(x(t)\bigr)\le 0 \quad \text{(path constraints)}. \end{align}\]

Here the path constraint $\mathbf g(x)\le 0$ is a box constraint keeping both the cart position and the pole-tip horizontal position in $[-c,+c]$.


Discrete feasibility $\neq$ continuous-time feasibility

Direct transcription methods enforce path constraints only at finitely many discretization nodes. However, when the resulting ZOH control is simulated through the continuous-time dynamics, the state trajectory can violate the constraints between nodes, a phenomenon known as inter-sample violation. This effect becomes more pronounced on coarser grids.

Therefore, following Section 3.1 of [1], a measure of the accumulated intensity of continuous-time constraint violation is defined:

\[\begin{equation} y(t) \;=\; \frac{1}{2}\int_{0}^{t} \left\| \max\!\bigl\{ \mathbf{g}(x(\sigma)),\, \vec{\mathbf{0}} \bigr\} \right\|_2^{2}\, d\sigma; \quad \dot y(t)=\frac12\left\|\max\{\mathbf g(x(t)),\vec{\mathbf{0}}\}\right\|_2^2,\quad y(0)=0. \end{equation}\]

The violation functional $y(t)$ is then bounded by a small tolerance $\delta$ to encourage continuous-time feasibility.
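As a concrete check of this definition, here is a minimal Python sketch (an illustration for this write-up, not the project's MATLAB code; function names are mine) that accumulates the violation functional for a scalar box constraint $|x|\le c$:

```python
import numpy as np

def g(x, c=1.0):
    """Box path constraint written componentwise so g(x) <= 0 means |x| <= c."""
    return np.array([x - c, -x - c])

def violation_rate(x, c=1.0):
    """ydot(t) = 0.5 * || max(g(x), 0) ||_2^2."""
    gp = np.maximum(g(x, c), 0.0)
    return 0.5 * float(gp @ gp)

def accumulated_violation(x_samples, dt, c=1.0):
    """Trapezoidal approximation of y(t_f), the integral of ydot over [0, t_f]."""
    rates = np.array([violation_rate(x, c) for x in x_samples])
    return float(np.sum(0.5 * (rates[1:] + rates[:-1])) * dt)
```

States inside the bound contribute nothing, while an overshoot of $0.1$ beyond $c=1$ contributes at rate $\tfrac12\cdot 0.1^2 = 0.005$, so small inter-sample excursions are penalized quadratically.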

Direct transcription is used to convert the continuous-time problem into an NLP:

  • Decision variables: $\{x_k\}$, $\{u_k\}$, and, for the relaxed formulation, additionally $\{y_k\}$.
  • Dynamics discretization: RK4 is used to discretize both the original dynamics $\dot x=f(x,u)$ and the augmented violation dynamics $\dot y$.

In the (A) hard formulation, I enforce $\mathbf g(x_k)\le 0$ at the nodes. In the (B) relaxed formulation, I accumulate violation via the augmented $y$ dynamics and constrain $y$ by $\delta$, thereby indirectly controlling inter-sample constraint violation.
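The augmented RK4 step can be sketched as follows (again a Python illustration under the same definitions; the helper names are mine):

```python
import numpy as np

def rk4_step(f, z, u, h):
    """One classical RK4 step for zdot = f(z, u) with the ZOH input u held over h."""
    k1 = f(z, u)
    k2 = f(z + 0.5 * h * k1, u)
    k3 = f(z + 0.5 * h * k2, u)
    k4 = f(z + h * k3, u)
    return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def augment(f, g):
    """Stack the state dynamics with the violation dynamics
    ydot = 0.5 * ||max(g(x), 0)||_2^2, so the augmented state is z = [x; y]."""
    def fz(z, u):
        x = z[:-1]
        gp = np.maximum(g(x), 0.0)
        return np.concatenate([f(x, u), [0.5 * float(gp @ gp)]])
    return fz
```

The same `rk4_step` integrates both formulations; the relaxed one simply rolls the stacked $z=[x;y]$ forward and bounds the trailing component by $\delta$.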


Continuous-time validation

To evaluate continuous-time feasibility, the 60 Hz ZOH control sequence $\{u_k\}$ returned by the solver is simulated through the original continuous-time ODE in MATLAB using $\texttt{ode45}$.
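An equivalent rollout can be written in Python (a sketch assuming SciPy's adaptive RK45, which plays the role of ode45 here; integrating each interval separately keeps the piecewise-constant input exact):

```python
import numpy as np
from scipy.integrate import solve_ivp

def zoh_rollout(f, x0, u_seq, dt):
    """Simulate xdot = f(x, u) under zero-order-hold inputs.

    Each dt-interval is integrated separately so the piecewise-constant
    input is handled exactly, mirroring an ode45 rollout of the u_k."""
    ts, xs = [0.0], [np.asarray(x0, float)]
    x = np.asarray(x0, float)
    for k, u in enumerate(u_seq):
        sol = solve_ivp(lambda t, x: f(x, u), (k * dt, (k + 1) * dt), x,
                        rtol=1e-8, atol=1e-10)
        x = sol.y[:, -1]
        ts.append((k + 1) * dt)
        xs.append(x)
    return np.array(ts), np.array(xs)
```

Sampling the rolled-out trajectory on the integrator's fine internal grid (rather than only at the nodes) is what exposes the inter-sample violations discussed below.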

The figures show the cart position, the pole-tip horizontal position, and the constraint violation. Although both solutions satisfy the node constraints, the hard solution exhibits a small inter-sample overshoot near the active constraint, whereas the relaxed solution typically stays within the bound.

To summarize feasibility, $g_{\max}(t):=\max_i g_i\bigl(x(t)\bigr)$ is also plotted, where $g_{\max}(t)>0$ indicates constraint violation. The hard solution produces inter-sample violations, while the relaxed solution suppresses these excursions and keeps $g_{\max}(t)$ closer to the feasible region. Figures 3 and 4 overlay the RK4-discretized, ZOH control $u$ from the optimizer with the ode45 rollout: the curve is the continuous trajectory and the markers are the discrete nodes. The hard-constrained solution (blue) incurs inter-sample violations even though the constraint is met tangentially at the discrete nodes; after relaxation, the violation is mitigated (red).


Cart-pole simulation (animation)
Figure 1: constraint violation. Figure 2: cart position. Figure 3: pole-tip horizontal position.

The table below summarizes continuous-time feasibility under ode45 rollouts. Compared with the node-only hard constraint, the $y$-budget relaxation (with $\delta=10^{-6}$) reduces the worst-case overshoot, the total time spent outside the feasible set, and the integrated constraint violation.

| Formulation | Peak violation $\max_t \max_i [g_i(x(t))]_+$ | Time in violation $\int \mathbb{1}[g_{\max}(t)>0]\,dt$ | Integrated violation $\int \|\max(\mathbf g(x(t)),\mathbf 0)\|_2^2\,dt$ |
| --- | --- | --- | --- |
| Hard | $8.085\times 10^{-4}$ | $7.778\times 10^{-2}\ \mathrm{s}$ | $7.053\times 10^{-9}$ |
| Relax | $2.243\times 10^{-4}$ | $2.028\times 10^{-2}\ \mathrm{s}$ | $4.446\times 10^{-10}$ |
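The three metrics can be computed from a fine-grid rollout as follows (a Python sketch; for simplicity it summarizes samples of $g_{\max}(t)$, which matches the exact integrated violation only when at most one constraint component is active at a time):

```python
import numpy as np

def feasibility_metrics(gmax_samples, dt):
    """Peak violation, time in violation, and integrated squared violation,
    from samples of g_max(t) on a uniform fine grid of spacing dt."""
    viol = np.maximum(np.asarray(gmax_samples, float), 0.0)
    peak = float(viol.max())
    time_in_violation = float(np.count_nonzero(viol) * dt)
    integrated = float(np.sum(viol ** 2) * dt)
    return peak, time_in_violation, integrated
```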

Minimum-Time Optimal Control

Next, consider the free-final-time minimum-time problem under the same state/input bounds, terminal constraints, and violation budget $y(t)\le\delta$.

Here I introduce a normalized time variable $\tau\in[0,1]$ with a fixed grid $\tau_k = k/(N-1)$ for $k=0,\ldots,N-1$ (so $\Delta\tau = 1/(N-1)$), and treat $T$ as the free final time; under a ZOH control parameterization on each $\tau$-interval, the physical step size is $\Delta t = T\,\Delta\tau = T/(N-1)$.

Then the continuous-time dynamics

\[\dot x(t) = f\bigl(x(t),u(t)\bigr),\qquad \dot y(t)=\frac12\left\|\max\!\bigl\{\mathbf g(x(t)),\mathbf 0\bigr\}\right\|_2^2,\]

are discretized on the fixed $\tau$-grid with $\Delta t = T\Delta\tau$ using a forward-Euler transcription for presentation:

\[x_{k+1} = x_k + (T\Delta\tau)\, f(x_k,u_k),\quad k=0,\dots,N-2.\]

The violation-functional dynamics discretize accordingly as

\[y_{k+1} = y_k + (T\Delta\tau)\cdot \frac12\left\|\max\!\bigl\{\mathbf g(x_k),\mathbf 0\bigr\}\right\|_2^2,\quad y_0=0,\quad k=0,\dots,N-2.\]

Forward Euler is used above for notational clarity; in the code, both the original dynamics and the $y$-dynamics are discretized with an RK4 integrator inside the NLP. The resulting minimum-time trajectory optimization problem is

\[\begin{aligned} \min_{\{x_k\},\{u_k\},\{y_k\},\,T}\quad & T\\ \text{s.t.}\quad & x_0=\hat x_0,\qquad x_{N-1}\in\mathcal X_f,\\ & T_{\min}\le T\le T_{\max},\\ & x_{k+1} = x_k + (T\Delta\tau)\, f(x_k,u_k),\quad k=0,\dots,N-2,\\ & x_{\min}\le x_k\le x_{\max},\quad k=0,\dots,N-1,\\ & u_{\min}\le u_k\le u_{\max},\quad k=0,\dots,N-2,\\ & y_{k+1} = y_k + (T\Delta\tau)\cdot \frac12\left\|\max\!\bigl\{\mathbf g(x_k),\mathbf 0\bigr\}\right\|_2^2,\\ & y_0=0,\quad y_{k+1}\le \delta,\quad k=0,\dots,N-2. \end{aligned}\]
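For intuition, the dynamics constraints of this normalized-time Euler transcription can be evaluated as defect residuals (a Python sketch with hypothetical names; a feasible $(\{x_k\},\{u_k\},T)$ drives every defect to zero):

```python
import numpy as np

def euler_defects(xs, us, T, f, N):
    """Dynamics defects x_{k+1} - x_k - (T * dtau) * f(x_k, u_k) for the
    normalized-time forward-Euler transcription; all zero when feasible."""
    dtau = 1.0 / (N - 1)
    return np.array([xs[k + 1] - xs[k] - T * dtau * f(xs[k], us[k])
                     for k in range(N - 1)])
```

An NLP solver treats these residuals as equality constraints while $T$ itself is a decision variable, which is what couples the step size to the objective.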

In practice, I first solve a fixed-horizon version of the problem to obtain a trajectory, which is then used to warm-start the solver for the corresponding free-final-time formulation.


Minimum-Fuel Optimal Control

Similarly, the minimum-fuel trajectory optimization problem is

\[\begin{aligned} \min_{\{x_k\},\{u_k\},\{y_k\}} \quad & \sum_{k=0}^{N-2} \Delta t\, \|u_k\|_1 \\ \text{s.t.}\quad & x_0 = \hat x_0, \qquad x_{N-1}\in \mathcal X_f,\\ & x_{k+1}=x_k+\Delta t\, f(x_k,u_k),\quad k=0,\dots,N-2,\\ & x_{\min}\le x_k\le x_{\max},\quad k=0,\dots,N-1,\\ & u_{\min}\le u_k\le u_{\max},\quad k=0,\dots,N-2,\\ & y_{k+1} = y_k + \Delta t\cdot \frac12\left\|\max\!\bigl\{\mathbf g(x_k),\mathbf 0\bigr\}\right\|_2^2,\\ & y_0=0,\quad y_{k+1}\le \delta,\quad k=0,\dots,N-2. \end{aligned}\]
Minimum-Time
Minimum-Fuel
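The discrete objective above can be evaluated directly; the $\ell_1$ penalty is what drives the characteristic sparse, bang-off-bang fuel-optimal profile (a small Python sketch, names mine):

```python
import numpy as np

def fuel_cost(u_seq, dt):
    """Discrete minimum-fuel objective: sum_k dt * ||u_k||_1.
    Works for scalar or vector inputs u_k."""
    return float(sum(dt * np.sum(np.abs(np.atleast_1d(u))) for u in u_seq))
```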

Direct transcription

The shooting method (in the following section) does not achieve particularly high performance on the cart–pole swing-up task. The most apparent issue is in the final few time steps, where the control inputs fail to precisely drive both the cart and pole velocities to zero. Fundamentally, shooting composes the linearized dynamics across the horizon (effectively computing high powers of the system matrix, $A^{n}$), so the early control input $u_{[0]}$ exerts a much stronger influence on the overall trajectory and cost than the later input $u_{[N-1]}$. By contrast, direct transcription treats the state $x_{[n]}$ as an optimization variable, making it straightforward to impose state constraints, e.g., limiting the cart's range of motion. Achieving the same behavior in iLQR or DDP is much more complicated, demanding repeated rollouts and fine-tuning of the weighting matrices.

Solved by fmincon (interior-point) without analytical gradient
Solved by SNOPT with analytical gradient — Final state \([x,\theta,\dot{x},\dot{\theta}] = [0.0005,\,3.1416,\,-0.0005,\,0.0005]\)

During optimization, I consistently observe that SNOPT is significantly faster than fmincon and IPOPT, especially as I increase the time horizon or tighten terminal-state constraints. With analytical gradients enabled, primal–dual interior-point methods still tend to be slower in my experiments than SQP-style methods. For highly nonlinear, strongly nonconvex, structured optimal control problems, sequential methods—such as SQP and sequential convex approaches (e.g., prox-linear variants)—often appear more effective, although part of this gap may reflect differences in solver maturity and implementation quality. Looking ahead, it would be valuable to explore high-performance OCP solvers such as acados+HPIPM and FATROP. ([4],[5],[6])


Shooting

backpropagation

In the special case of direct shooting without state constraints, the states are not decision variables in the NLP: with $x_{[0]}$ given, the rollout is $x_{[k+1]}=f(x_{[k]},u_{[k]})$. Introducing multipliers and augmenting the cost, stationarity with respect to the states yields the adjoint recursion, which is exactly backpropagation through time; the control gradients come directly from the Lagrangian. The algorithm loops through three steps: 1. forward rollout; 2. backward adjoint sweep; 3. gradient extraction. (See [2], Section 10.4.2 for details.)
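The three steps can be sketched as follows (a Python illustration against generic cost and dynamics derivatives; the helper names are mine, and the recursion follows the adjoint derivation in [2]):

```python
import numpy as np

def shooting_gradient(x0, us, f, fx, fu, lx, lu, lfx):
    """Gradient of the shooting cost w.r.t. the controls via the adjoint
    (backpropagation-through-time) recursion.

    f(x, u): discrete dynamics; fx, fu: its Jacobians.
    lx, lu: running-cost gradients; lfx: terminal-cost gradient.
    """
    # 1. forward rollout: x_{k+1} = f(x_k, u_k)
    xs = [np.asarray(x0, float)]
    for u in us:
        xs.append(f(xs[-1], u))
    # 2. backward adjoint sweep: lam_N = d l_f / dx
    lam = lfx(xs[-1])
    grads = [None] * len(us)
    for k in reversed(range(len(us))):
        # 3. gradient extraction: dJ/du_k = l_u + B_k^T lam_{k+1}
        grads[k] = lu(xs[k], us[k]) + fu(xs[k], us[k]).T @ lam
        # adjoint recursion: lam_k = l_x + A_k^T lam_{k+1}
        lam = lx(xs[k], us[k]) + fx(xs[k], us[k]).T @ lam
    return grads
```

One backward sweep yields all control gradients at the cost of a single rollout, which is exactly why this special case avoids forming the states as NLP variables.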

Backpropagation — Final state \([x,\theta,\dot{x},\dot{\theta}] = [-0.4979,\,3.1548,\,-1.0040,\,2.6198]\)

Iterative LQR and Differential Dynamic Programming

iLQR treats trajectory optimization as sequential LQ optimal control: linearize the dynamics and quadratize the cost, then exploit the Riccati structure to produce a second-order trajectory update with just one backward pass and a forward rollout. The only difference in DDP is that it retains the second-order terms of the dynamics in its quadratic approximation of the value function, which requires Hessians that are nontrivial to compute.
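The Riccati backward pass at the heart of each iLQR iteration reduces, for the LQ subproblem, to the standard time-varying recursion (a Python sketch, names mine; DDP would add the dynamics-Hessian contractions into the same recursion):

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, N):
    """Time-varying feedback gains K_k from the discrete Riccati recursion
    P_k = Q + A^T P_{k+1} (A - B K_k), K_k = (R + B^T P B)^{-1} B^T P A."""
    P = Qf
    Ks = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    Ks.reverse()  # gains in forward-time order
    return Ks, P
```

With time-varying $(A_k,B_k)$ from linearizing along the nominal trajectory, the same sweep produces the feedback and feedforward terms for the iLQR forward rollout.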

DDP — Final state \([x,\theta,\dot{x},\dot{\theta}] = [-0.2494,\,3.1052,\,0.0636,\,0.5906]\)
ILQR — Final state \([x,\theta,\dot{x},\dot{\theta}] = [-0.2793,\,3.0954,\,0.0426,\,0.7569]\)

[code →]


References

[1]. Elango, P., Luo, D., Kamath, A. G., Uzun, S., Kim, T., & Açıkmeşe, B. Continuous-time successive convexification for constrained trajectory optimization. Automatica, vol. 180, Art. no. 112464, Oct. 2025. doi:10.1016/j.automatica.2025.112464.

[2]. Russ Tedrake. Underactuated Robotics Ch.10 Trajectory Optimization Course Notes for MIT 6.832, 2023. https://underactuated.csail.mit.edu/trajopt.html

[3]. Yue Yu. Numerical trajectory optimization Course Notes for UMN AEM 5431

[4]. Robin Verschueren, Gianluca Frison, Dimitris Kouzoupis, Jonathan Frey, Niels van Duijkeren, Andrea Zanelli, Branimir Novoselnik, Thivaharan Albin, Rien Quirynen, and Moritz Diehl. acados: a modular open-source framework for fast embedded optimal control. arXiv:1910.13753, 2019.

[5]. Gianluca Frison and Moritz Diehl. HPIPM: a high-performance quadratic programming framework for model predictive control. arXiv:2003.02547, 2020.

[6]. Lander Vanroye, Ajay Sathya, Joris De Schutter, and Wilm Decré. FATROP: A Fast Constrained Optimal Control Problem Solver for Robot Trajectory Optimization and Control. arXiv:2303.16746, 2023.