Model-Based Fixed-Wing Perching


This project studies fixed-wing perching: landing a small glider on a perch by pitching aggressively into high-angle-of-attack, post-stall flight to shed speed quickly, then reaching the perch with near-zero relative velocity. The control problem combines strongly nonlinear dynamics with severe model uncertainty. Rather than turning to model-free approaches, this project explores model-based methods: once I commit to a minimal physics-based baseline (a flat-plate glider) and learn only the control-relevant residual structure, model-based trajectory optimization and linear optimal control become remarkably effective. The work spans identification, control, estimation, and learning, organized into three projects: (1) online system identification and residual learning under feedback control, (2) zero-order trajectory optimization, and (3) iterative model-based RL via iLQR.


Flat-Plate Glider Model

The controller model is the standard planar (longitudinal) 7D glider with state $\mathbf{x}=[x,z,\theta,\phi,\dot x,\dot z,\dot\theta]^\top\in\mathbb{R}^7$, where $\phi$ is the elevator angle, and elevator-rate input $u=\dot\phi$. It uses a flat-plate baseline in which each lifting surface contributes a normal force

\[f_n(S,\mathbf{n},\mathbf{v})=\rho S\sin\alpha\,\|\mathbf{v}\|^2=-\rho S(\mathbf{n}\!\cdot\!\mathbf{v})\|\mathbf{v}\|,\]

evaluated at the surface's center of pressure, with geometry fixed by $(l_w,l_h,l_e)$. With wing and elevator normals $\mathbf{n}_w=[\sin\theta,\ \cos\theta]^\top$ and $\mathbf{n}_e=[\sin(\theta+\phi),\ \cos(\theta+\phi)]^\top$, and forces $f_w=f_n(S_w,\mathbf{n}_w,\dot{\mathbf{p}}_w)$ and $f_e=f_n(S_e,\mathbf{n}_e,\dot{\mathbf{p}}_e)$ computed from the center-of-pressure velocities $\dot{\mathbf{p}}_w,\dot{\mathbf{p}}_e$ (still air in the controller model), the equations of motion are

$$ \begin{aligned} \ddot x &= \frac{1}{m}\!\left(f_w \sin\theta + f_e \sin(\theta+\phi)\right), \\ \ddot z &= \frac{1}{m}\!\left(f_w \cos\theta + f_e \cos(\theta+\phi)\right) - g, \\ \ddot\theta &= \frac{1}{I}\!\left(l_w f_w + (l_h\cos\phi + l_e) f_e\right), \\ \dot\phi &= u. \end{aligned} $$

All three projects handle model mismatch through learned or estimated low-dimensional parameterizations of the aerodynamic residuals, while stabilizing nominal perching motions with local linear feedback (finite-difference linearization + LQR/TVLQR) of the form $u(t)=u^\star(t)-K(t)\bigl(x(t)-x^\star(t)\bigr)$.
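As a concrete reference, the flat-plate model above can be sketched in NumPy. This is a minimal sketch, not the project's code: the values in the hypothetical `PARAMS` dictionary are assumed placeholders, and the center-of-pressure velocities assume the wing and elevator hinge are offset along the body axis $[\cos\theta,\,-\sin\theta]^\top$ (perpendicular to $\mathbf{n}_w$) by $l_w$ and $l_h$, with the elevator center of pressure a further $l_e$ from the hinge. The function names are illustrative.

```python
import numpy as np

# Illustrative placeholder parameters (assumed, not the project's glider):
# mass [kg], pitch inertia [kg m^2], areas [m^2], lever arms [m], air density [kg/m^3].
PARAMS = dict(m=0.08, I=0.0015, S_w=0.0885, S_e=0.0147,
              l_w=0.0, l_h=0.27, l_e=0.022, rho=1.204, g=9.81)


def flat_plate_normal_force(S, n, v, rho):
    """Flat-plate normal force f_n = -rho * S * (n . v) * ||v||."""
    return -rho * S * np.dot(n, v) * np.linalg.norm(v)


def glider_dynamics(x, u, p=PARAMS):
    """xdot = f(x, u) for x = [x, z, theta, phi, xdot, zdot, thetadot], u = phidot (still air)."""
    _, _, th, ph, xd, zd, thd = x
    n_w = np.array([np.sin(th), np.cos(th)])
    n_e = np.array([np.sin(th + ph), np.cos(th + ph)])
    # Assumed geometry: wing CP at -l_w and elevator hinge at -l_h along the body
    # axis [cos th, -sin th]; elevator CP a further -l_e along [cos(th+ph), -sin(th+ph)].
    # Differentiating those positions gives the CP velocities below.
    v = np.array([xd, zd])
    v_w = v + p["l_w"] * thd * n_w
    v_e = v + p["l_h"] * thd * n_w + p["l_e"] * (thd + u) * n_e
    f_w = flat_plate_normal_force(p["S_w"], n_w, v_w, p["rho"])
    f_e = flat_plate_normal_force(p["S_e"], n_e, v_e, p["rho"])
    # Equations of motion exactly as written in the text.
    xdd = (f_w * np.sin(th) + f_e * np.sin(th + ph)) / p["m"]
    zdd = (f_w * np.cos(th) + f_e * np.cos(th + ph)) / p["m"] - p["g"]
    thdd = (p["l_w"] * f_w + (p["l_h"] * np.cos(ph) + p["l_e"]) * f_e) / p["I"]
    return np.array([xd, zd, thd, u, xdd, zdd, thdd])
```

A forward-Euler step `x + dt * glider_dynamics(x, u)` is all that the rollout-based methods in the later sections need from this model.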

Online SysID and residual learning

The first project pairs (A) aerodynamic residual learning with (B) online parameter identification inside the state estimator, both on the same 7D flat-plate glider. For (A), a nonlinear correction to the flat-plate lift model is learned via an RBF feature map that is linear in the weights:

\[\widehat{C}_l(\alpha,e)=\theta^\top\phi([\alpha;e]),\qquad \phi(x)=\bigl[\exp(-\tfrac12(x-\mu_i)^\top\Sigma^{-1}(x-\mu_i))\bigr]_{i=1}^{N_p}\oplus 1,\]

where $\{\mu_i\}$ tile the $(\alpha,e)$ domain and $\Sigma$ sets the spread; stacking $N$ samples yields $\Phi\in\mathbb{R}^{N\times(N_p+1)}$ and the ridge-regression solution

\[\theta=\arg\min_\theta \|\Phi\theta-y\|_2^2+\gamma\|\theta\|_2^2 \;=\;(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top y,\]

so that the corrected coefficient $C_{l,\mathrm{fp}}(\alpha,e)+\widehat{C}_l(\alpha,e)$ matches the measured data; the fitted model is then compared against the stored coefficients ($\texttt{model\_coeffs.mat}$). (B) For online identification under closed-loop flight, I model the nonlinear dynamics as

\[\dot x=f(x,u;\theta,w),\qquad \theta=\begin{bmatrix}l_w\\ S_e\end{bmatrix},\qquad w=[v_x^0,v_z^0],\]

stabilize around a desired operating point with an LQR designed from numerical linearization $(A,B)\approx\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial u}\right)\ \text{evaluated at }(x_d,u_d)$, and estimate state and aerodynamic parameters jointly using an augmented-state EKF with $\bar x=[x;\theta]$ and constant-parameter process model

\[\dot{\bar x}=\bar f(\bar x,u)=\begin{bmatrix}f(x,u;\theta,w)\\0\end{bmatrix},\qquad y=H\bar x+v,\ \ H=[I\ \ 0].\]
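The numerical linearization and LQR design can be sketched generically: central-difference Jacobians of any $f(x,u)$, an Euler discretization $A_d=I+A\,dt$, $B_d=B\,dt$, and a discrete-time Riccati iteration. The helper names (`fd_jacobians`, `dlqr`, `lqr_control`) and the Euler discretization are assumptions of this sketch rather than the project's implementation; the returned gain is applied as $u=u_d-K(x-x_d)$, matching the sign convention above.

```python
import numpy as np


def fd_jacobians(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du of xdot = f(x, u) at (x, u)."""
    x = np.asarray(x, dtype=float)
    u = np.atleast_1d(np.asarray(u, dtype=float))
    A = np.column_stack([(f(x + eps * e, u) - f(x - eps * e, u)) / (2 * eps)
                         for e in np.eye(x.size)])
    B = np.column_stack([(f(x, u + eps * e) - f(x, u - eps * e)) / (2 * eps)
                         for e in np.eye(u.size)])
    return A, B


def dlqr(Ad, Bd, Q, R, iters=5000, tol=1e-10):
    """Infinite-horizon discrete LQR by Riccati iteration; returns gain K and cost-to-go P."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        P_next = Q + Ad.T @ P @ (Ad - Bd @ K)   # Riccati recursion
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    return K, P


def lqr_control(x, x_d, u_d, K):
    """Local feedback u = u_d - K (x - x_d) around the operating point (x_d, u_d)."""
    return u_d - K @ (np.asarray(x) - np.asarray(x_d))
```

Here `f` must accept `u` as a length-1 array; for the scalar-input glider a thin wrapper such as `lambda x, u: glider_dynamics(x, u[0])` would do.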

The outputs (perching trajectory and $\hat\theta(t)$ traces) demonstrate that (i) compact nonlinear residual models can be fit by linear least squares, and (ii) identifiability-driven parameter estimation can be integrated into closed-loop perching on the same glider platform.
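A minimal augmented-state EKF can be sketched as follows, assuming an Euler-discretized process model and finite-difference Jacobians (both choices of this sketch, not necessarily the project's). The parameter block of $\bar x$ is propagated unchanged, so the parameters move only through the measurement update, via their cross-covariance with the measured states. `AugmentedEKF` and `num_jac` are hypothetical names, and `f(x, u, theta)` stands for any continuous-time model.

```python
import numpy as np


def num_jac(fun, z, eps=1e-6):
    """Central-difference Jacobian of fun at z."""
    z = np.asarray(z, dtype=float)
    return np.column_stack([(fun(z + eps * e) - fun(z - eps * e)) / (2 * eps)
                            for e in np.eye(z.size)])


class AugmentedEKF:
    """EKF on the augmented state xbar = [x; theta] with theta_dot = 0."""

    def __init__(self, f, nx, xbar0, P0, Qd, Rm, dt):
        self.f, self.nx, self.dt = f, nx, dt              # f(x, u, theta) -> xdot
        self.xbar = np.asarray(xbar0, dtype=float)
        self.P, self.Qd, self.Rm = np.asarray(P0, dtype=float), Qd, Rm
        n = self.xbar.size
        self.H = np.hstack([np.eye(nx), np.zeros((nx, n - nx))])  # y = [I 0] xbar + v

    def _step(self, xbar, u):
        """Euler step of the augmented model; the parameter block is held constant."""
        x, th = xbar[:self.nx], xbar[self.nx:]
        return np.concatenate([x + self.dt * self.f(x, u, th), th])

    def predict(self, u):
        F = num_jac(lambda z: self._step(z, u), self.xbar)  # discrete-time Jacobian
        self.xbar = self._step(self.xbar, u)
        self.P = F @ self.P @ F.T + self.Qd

    def update(self, y):
        S = self.H @ self.P @ self.H.T + self.Rm
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.xbar = self.xbar + K @ (np.atleast_1d(y) - self.H @ self.xbar)
        self.P = (np.eye(self.xbar.size) - K @ self.H) @ self.P
```

Giving the parameter block a small nonzero entry in `Qd` keeps the filter able to follow slowly varying parameters; with exactly zero process noise on $\theta$, its covariance can shrink until new data barely moves the estimate.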

The lift coefficient is represented as a flat-plate baseline plus a learned residual,

\[C_L(\alpha,\delta_e)=C_{L,\mathrm{fp}}(\alpha,\delta_e)+\Delta C_L(\alpha,\delta_e), \qquad \widehat{\Delta C_L}(\alpha,\delta_e)=\theta_{Cl}^\top \phi([\alpha;\delta_e]),\]

and the RBF weights are estimated via ridge regression,

\[\theta_{Cl}=(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top y,\qquad \gamma=0.1.\]
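The RBF feature map and ridge fit reduce to a few lines of NumPy. This is a sketch assuming inputs stacked as rows $[\alpha,\delta_e]$ and a single shared covariance $\Sigma$; `rbf_features` and `ridge` are hypothetical names, and the appended bias column implements the $\oplus 1$ term.

```python
import numpy as np


def rbf_features(X, centers, Sigma):
    """phi(x) = [exp(-0.5 (x - mu_i)^T Sigma^{-1} (x - mu_i))]_i followed by a bias 1.

    X: (N, d) inputs, e.g. columns [alpha, delta_e]; centers: (Np, d).
    Returns Phi with shape (N, Np + 1).
    """
    Sinv = np.linalg.inv(Sigma)
    D = X[:, None, :] - centers[None, :, :]              # (N, Np, d) differences
    q = np.einsum("npi,ij,npj->np", D, Sinv, D)          # squared Mahalanobis distances
    return np.hstack([np.exp(-0.5 * q), np.ones((X.shape[0], 1))])


def ridge(Phi, y, gamma=0.1):
    """theta = (Phi^T Phi + gamma I)^{-1} Phi^T y, computed with a linear solve."""
    A = Phi.T @ Phi + gamma * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)
```

Placing the centers on a regular grid over $(\alpha,\delta_e)$ (e.g. built with `np.meshgrid`) is one way to tile the domain; solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit inverse.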

Residual-fit validation: data $C_L^{\mathrm{data}}$, flat-plate baseline $C_{L,\mathrm{fp}}$, and corrected prediction $C_{L,\mathrm{fp}}+\widehat{\Delta C_L}$.

Model comparison: corrected prediction using our learned weights $\theta_{Cl}$ versus the provided weights in model_coeffs.mat.

Nonlinear parameter estimation (closed-loop flight)

EKF estimates of the embedded aerodynamic parameters $\hat l_w(t)$ and $\hat S_e(t)$

Resulting planar trajectory $(x(t),z(t))$ under the simulated wind.

Zero-order trajectory optimization: MPPI & CEM

Let's explore perching as a finite-horizon, derivative-free trajectory optimization problem in which computation is dominated by rollouts. Over a horizon $T=0.75$ s with $dt=0.01$ s and $N=\lfloor T/dt\rfloor=75$ steps, each candidate open-loop control sequence $U=[u_1,\dots,u_N]$ is evaluated by an Euler rollout of the nominal still-air model and scored with the quadratic running/terminal objective.

To account for dynamics-level uncertainty (unmodeled aerodynamic residuals, gusts, and actuation variability), MPPI replaces the deterministic rollout model with a stochastic control-channel perturbation model: instead of applying $u(t)$ exactly, the executed input is corrupted by white Gaussian noise,

\[dX_t \;=\; f(X_t)\,dt \;+\; g(X_t,t)\!\left( u(X_t,t) \;+\; \frac{1}{\sqrt{\rho}}\frac{\epsilon_t}{\sqrt{dt}} \right)dt,\]

where $\epsilon_t \sim \mathcal N(0,\nu)$ is a white-noise sample, $\rho>0$ scales the noise magnitude (larger $\rho$ means smaller perturbations), and $g(X_t,t)$ determines which channels the noise enters. This is a control-corrupted stochastic differential equation (SDE). The goal is still to find a policy that minimizes the expected cost $J(u) = \mathbb{E}\!\left[ \phi(x_T) + \int_{0}^{T} \ell(t, x_t, u_t)\, dt \right]$, with terminal cost $\phi$ and running cost $\ell$.

Under the time-discretization used in our implementation, this is realized by sampling i.i.d. perturbations at each step,

\[u_k^{(j)} = U_k + \delta u_k^{(j)},\qquad \delta u_k^{(j)} \sim \mathcal N\!\left(0,\ \frac{\nu}{\rho\,dt}\right),\]

and rolling out the perturbed trajectories $x_{k+1}^{(j)} = x_k^{(j)} + f\bigl(t_k,x_k^{(j)},u_k^{(j)}\bigr)\,dt.$

Cross-Entropy Method (CEM) maintains a diagonal Gaussian over the open-loop sequence $u_k\sim\mathcal N(U_k,\sigma_k^2)$ and iteratively updates mean/variance by elite-set MLE,

\[U_k\leftarrow\frac{1}{|\mathcal E|}\sum_{j\in\mathcal E}u_k^{(j)},\qquad \sigma_k^2\leftarrow\mathrm{Var}\bigl(\{u_k^{(j)}\}_{j\in\mathcal E}\bigr).\]
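The CEM update can be sketched as follows, assuming a user-supplied `cost(U)` that performs the Euler rollout and evaluates the objective. The function name, sample count, elite count, and variance floor `min_std` are illustrative choices, not the project's settings.

```python
import numpy as np


def cem(cost, U0, sigma0, n_samples=200, n_elite=20, iters=30, min_std=1e-3, seed=0):
    """Cross-entropy method over an open-loop sequence with a diagonal Gaussian."""
    rng = np.random.default_rng(seed)
    U = np.array(U0, dtype=float)                 # mean sequence U_k
    sig = np.full(U.shape, float(sigma0))         # per-step standard deviations sigma_k
    history = []
    for _ in range(iters):
        samples = U + sig * rng.standard_normal((n_samples,) + U.shape)
        costs = np.array([cost(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        U = elite.mean(axis=0)                          # elite-set MLE of the mean
        sig = np.maximum(elite.std(axis=0), min_std)    # elite-set MLE of the std (floored)
        history.append(cost(U))
    return U, history
```

The floor `min_std` is a common guard against premature collapse of the sampling distribution; it is not part of the update equations above.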

In contrast, MPPI explicitly uses the above stochastic rollouts: for each rollout $j$, it computes the cost-to-go $S_k^{(j)}=\sum_{\ell=k}^{N+1}J_\ell^{(j)}$, forms exponential path-integral weights $w_k^{(j)}\propto\exp(-S_k^{(j)}/\kappa)$, and applies the reward-weighted perturbation update

\[U_k\leftarrow U_k+\frac{\sum_{j=1}^{M}w_k^{(j)}\,\delta u_k^{(j)}}{\sum_{j=1}^{M}w_k^{(j)}}.\]
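One MPPI iteration with the per-time-step cost-to-go weighting described above might look like this. It is a sketch under assumptions: `f(t, x, u)`, `stage_cost`, and `terminal_cost` are user-supplied, `mppi_iteration` is a hypothetical name, and subtracting the per-step minimum of $S_k^{(j)}$ before exponentiating is a standard numerical-stability trick that leaves the normalized weights unchanged.

```python
import numpy as np


def mppi_iteration(f, stage_cost, terminal_cost, x0, U, dt, nu, rho, kappa, M, rng):
    """One MPPI update of an open-loop sequence U with shape (N, m)."""
    N, m = U.shape
    # i.i.d. control perturbations, delta u ~ N(0, nu / (rho * dt))
    dU = rng.normal(0.0, np.sqrt(nu / (rho * dt)), size=(M, N, m))
    J = np.zeros((M, N + 1))  # per-step costs; the last column is the terminal cost
    for j in range(M):
        x = np.array(x0, dtype=float)
        for k in range(N):
            u = U[k] + dU[j, k]
            J[j, k] = stage_cost(x, u) * dt
            x = x + f(k * dt, x, u) * dt          # Euler rollout of the perturbed input
        J[j, N] = terminal_cost(x)
    # Cost-to-go S_k^{(j)} = sum_{l >= k} J_l^{(j)} for k = 0..N-1
    S = np.cumsum(J[:, ::-1], axis=1)[:, ::-1][:, :N]
    # Exponential path-integral weights, normalized separately at each time step
    w = np.exp(-(S - S.min(axis=0, keepdims=True)) / kappa)
    w /= w.sum(axis=0, keepdims=True)
    # Reward-weighted perturbation update
    return U + np.einsum("jk,jkm->km", w, dU)
```

Because the perturbation covariance $\nu/(\rho\,dt)$ stays fixed, MPPI keeps exploring at the same scale every iteration, whereas CEM adapts its sampling variance from the elite set.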

Empirically, under matched horizons and rollout budgets, MPPI typically achieves a sharper early cost reduction thanks to its exponential reweighting, which also makes it sensitive to the temperature $\kappa$ and the perturbation covariance $\nu/(\rho\,dt)$. CEM improves more steadily through elite refinement and is simpler to tune, but often needs more rollouts to match MPPI's initial descent; with sufficient sampling, both reach comparable terminal cost. The practical takeaway is that sampling alone can solve perching but is rollout-expensive, which motivates the structured model learning and feedback-stabilized planning that follow.

MPPI sampled rollouts

MPPI closed-loop flight

CEM sampled rollouts

CEM closed-loop flight

Model-based learning

Finally, I implement a guided-policy-search-style loop that alternates between planning on a learned dynamics model, executing the resulting feedback policy on a separate (higher-fidelity) simulator under uncertainty, and re-identifying the model parameters from rollout data:

\[\mathrm{plan\ (iLQR)}\ \rightarrow\ \mathrm{execute\ (sim)}\ \rightarrow\ \mathrm{identify\ (ridge\ LS)}\ \rightarrow\ \mathrm{re\text{-}plan}.\]

The learned model uses a linear-in-parameters aerodynamic augmentation (RBF features) so that

\[\dot x = f(x,u) + g(x,u)\theta, \qquad \theta=\begin{bmatrix}\theta_{Cl}\\\theta_{Cd}\\\theta_{Cm}\end{bmatrix},\]

with deliberate model mismatch between planner and simulator via different aerodynamic inputs (planner uses $(\alpha,\delta_e)$ while simulator uses $(\alpha,\delta_e,V)$).

At iteration $j$, iLQR optimizes on the current learned model using a terminal objective and a trust-region-style quadratic shaping term around the previous nominal $(\bar x_k,\bar u_k)$, producing an updated nominal trajectory $\{x_k^{\mathrm{nom}},u_k^{\mathrm{nom}}\}_{k=0}^{N-1}$ and time-varying gains $\{K_k,k_k\}$. Concretely, iLQR maintains a nominal pair $(x_k^{\mathrm{nom}},u_k^{\mathrm{nom}})$, and the backward pass computes a local policy in the deviations $\delta x_k := x_k-x_k^{\mathrm{nom}}$:

\[\delta u_k = k_k + K_k\,\delta x_k,\]

where $k_k$ is a feedforward increment and $K_k$ is the feedback gain. The forward pass then updates the nominal by rolling out the improved policy with a line-search step size $\alpha\in(0,1]$ (distinct from the angle of attack),

\[u_k^{\mathrm{new}} = u_k^{\mathrm{nom}} + \alpha k_k + K_k\bigl(x_k^{\mathrm{new}}-x_k^{\mathrm{nom}}\bigr), \qquad x_{k+1}^{\mathrm{new}} = f\!\left(x_k^{\mathrm{new}},u_k^{\mathrm{new}}\right),\]

and sets $(x^{\mathrm{nom}},u^{\mathrm{nom}})\leftarrow(x^{\mathrm{new}},u^{\mathrm{new}})$. Hence $k_k$ appears only in the forward-pass nominal update; after convergence, execution tracks the finalized nominal baseline using the standard local feedback law

\[u_k = u_k^{\mathrm{nom}} + K_k\,(x_k-x_k^{\mathrm{nom}}).\]
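The backward/forward structure above can be sketched in NumPy for a discrete model $x_{k+1}=F(x_k,u_k)$. For brevity, this sketch assumes a plain quadratic stage and terminal cost toward a goal state instead of the project's terminal objective with trust-region shaping, uses finite-difference Jacobians, and adds a small diagonal regularizer to $Q_{uu}$; the function names are hypothetical. Sign conventions follow the text: $\delta u_k=k_k+K_k\,\delta x_k$ and $u_k^{\mathrm{new}}=u_k^{\mathrm{nom}}+\alpha k_k+K_k(x_k^{\mathrm{new}}-x_k^{\mathrm{nom}})$.

```python
import numpy as np


def jac(fun, z, eps=1e-6):
    """Central-difference Jacobian of fun at z."""
    return np.column_stack([(fun(z + eps * e) - fun(z - eps * e)) / (2 * eps)
                            for e in np.eye(z.size)])


def ilqr(F, x0, U, Q, R, Qf, x_goal, iters=50, reg=1e-6):
    """iLQR with cost sum_k 0.5[(x_k-x_g)'Q(x_k-x_g) + u_k'R u_k] + 0.5 (x_N-x_g)'Qf(x_N-x_g)."""
    N, m = U.shape
    n = x0.size

    def rollout(U):
        X = np.zeros((N + 1, n)); X[0] = x0
        for k in range(N):
            X[k + 1] = F(X[k], U[k])
        return X

    def cost(X, U):
        dX = X - x_goal
        return 0.5 * (np.einsum("ki,ij,kj->", dX[:-1], Q, dX[:-1])
                      + np.einsum("ki,ij,kj->", U, R, U) + dX[-1] @ Qf @ dX[-1])

    X = rollout(U); J = cost(X, U)
    K = np.zeros((N, m, n)); kff = np.zeros((N, m))
    for _ in range(iters):
        # Backward pass: quadratic model of the cost-to-go around the nominal (X, U).
        Vx, Vxx = Qf @ (X[-1] - x_goal), Qf.copy()
        for k in reversed(range(N)):
            A = jac(lambda x: F(x, U[k]), X[k])
            B = jac(lambda u: F(X[k], u), U[k])
            Qx = Q @ (X[k] - x_goal) + A.T @ Vx
            Qu = R @ U[k] + B.T @ Vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B + reg * np.eye(m)
            Qux = B.T @ Vxx @ A
            kff[k] = -np.linalg.solve(Quu, Qu)           # feedforward increment k_k
            K[k] = -np.linalg.solve(Quu, Qux)            # feedback gain K_k
            Vx = Qx + K[k].T @ Quu @ kff[k] + K[k].T @ Qu + Qux.T @ kff[k]
            Vxx = Qxx + K[k].T @ Quu @ K[k] + K[k].T @ Qux + Qux.T @ K[k]
            Vxx = 0.5 * (Vxx + Vxx.T)
        # Forward pass with a backtracking line search on alpha.
        for alpha in (1.0, 0.5, 0.25, 0.125, 0.0625):
            Xn, Un = np.zeros_like(X), np.zeros_like(U); Xn[0] = x0
            for k in range(N):
                Un[k] = U[k] + alpha * kff[k] + K[k] @ (Xn[k] - X[k])
                Xn[k + 1] = F(Xn[k], Un[k])
            Jn = cost(Xn, Un)
            if Jn < J:
                X, U, J = Xn, Un, Jn                      # accept the new nominal
                break
        else:
            break  # no step improved the cost: treat as converged
    return X, U, K, kff, J
```

After convergence, the returned `X`, `U`, and `K` plug directly into the execution law $u_k = u_k^{\mathrm{nom}} + K_k(x_k-x_k^{\mathrm{nom}})$; `kff` is only needed inside the forward pass.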

Each rollout provides a dataset with finite-difference derivatives $\dot x_i\approx (x_{i+1}-x_i)/dt$. Identification exploits the linear parameterization by stacking $g(x_i,u_i)$ into $\Phi$ (typically only on aerodynamically affected components) and solving the ridge-regularized least squares

\[\theta_{j+1} =\arg\min_\theta \|\Phi\theta-(\dot x^{\text{data}}-f)\|_2^2+\gamma\|\theta\|_2^2 \;=\;(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top(\dot x^{\text{data}}-f),\]

then writing $\theta_{j+1}$ back into the planner model. Tracking $J_j=\ell_f(x_N)$ and $\|\theta_{j+1}-\theta_j\|$ illustrates the intended behavior: per-iteration improvement in perching performance alongside convergence of the learned aerodynamic residual parameters.
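The identification step can be sketched as follows, assuming a user-supplied nominal model `f(x, u)`, a regressor `g(x, u)` returning an $n\times p$ matrix, forward-difference derivatives as in the text, and an optional `rows` selector for the aerodynamically affected state components. The function name `identify_residual` is hypothetical.

```python
import numpy as np


def identify_residual(X, U, dt, f, g, gamma=1e-3, rows=None):
    """Ridge LS for theta in xdot = f(x, u) + g(x, u) theta from one rollout.

    X: (T+1, n) states, U: (T, m) inputs. xdot is approximated by forward
    differences; `rows` optionally restricts the fit to selected components.
    """
    Xdot = (X[1:] - X[:-1]) / dt                          # finite-difference derivatives
    rows = slice(None) if rows is None else rows
    Phi = np.vstack([g(x, u)[rows] for x, u in zip(X[:-1], U)])
    r = np.concatenate([(xd - f(x, u))[rows] for x, xd, u in zip(X[:-1], Xdot, U)])
    p = Phi.shape[1]
    # theta = (Phi^T Phi + gamma I)^{-1} Phi^T (xdot_data - f)
    return np.linalg.solve(Phi.T @ Phi + gamma * np.eye(p), Phi.T @ r)
```

The forward difference $(x_{i+1}-x_i)/dt$ is exact only for Euler-integrated data; rollouts from a higher-fidelity simulator add an $O(dt)$ derivative error to the regression targets.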

Evolution of the nominal planar trajectory $(x^{\mathrm{traj}}(t), z^{\mathrm{traj}}(t))$ during iLQR inner iterations on the current learned model. At each inner iteration $i$, the trajectory is obtained by a forward pass of the learned model under the updated gains $(K_i, k_i)$ from the backward pass; the displayed cost $J_i$ illustrates convergence of the iLQR solve.

Final converged nominal trajectory and control policy produced by iLQR on the learned dynamics model. This solution $\{x^\star(t), u^\star(t)\}$, together with the time-varying gains $K^\star(t)$, is used as the reference for closed-loop execution via $u(t) = u^\star(t) + K^\star(t)\bigl(x(t) - x^\star(t)\bigr)$.

Learning progress under model-based RL via iLQR

Cost vs. Learning Iteration

Parameter Change vs. Learning Iteration
