Model-Based Fixed-Wing Perching
Minimal flat-plate dynamics; data-efficient identification; zero-order trajectory optimization; model-based RL via iLQR
This project studies fixed-wing perching: landing a small glider on a perch by pitching aggressively into high-angle-of-attack, post-stall flight to shed speed quickly, then reaching the perch with near-zero relative velocity. This is a control setting with strongly nonlinear dynamics and severe model uncertainty. Nevertheless, this project explores model-based rather than model-free methods: once I commit to a minimal physics-based baseline (a flat-plate glider) and learn only the control-relevant residual structure, model-based trajectory optimization and linear optimal control become remarkably effective. The work spans identification, estimation, control, and learning, organized as three sub-projects: aerodynamic residual learning with online parameter identification; zero-order trajectory optimization; and iterative model-based RL via iLQR.
Flat-Plate Glider Model
I use the standard planar (longitudinal) 7D glider with state $\mathbf{x}=[x,z,\theta,\phi,\dot x,\dot z,\dot\theta]^\top\in\mathbb{R}^7$ and elevator-rate input $u=\dot\phi$, and a flat-plate baseline in which each lifting surface contributes a normal force
\[f_n(S,\mathbf{n},\mathbf{v})=\rho S\sin\alpha\,\|\mathbf{v}\|^2=-\rho S(\mathbf{n}\!\cdot\!\mathbf{v})\|\mathbf{v}\|,\]evaluated at the surface center of pressure with geometry fixed by $(l_w,l_h,l_e)$. With wing/elevator normals $\mathbf{n}_w=[\sin\theta,\ \cos\theta]^\top$ and $\mathbf{n}_e=[\sin(\theta+\phi),\ \cos(\theta+\phi)]^\top$ and forces $f_w=f_n(S_w,\mathbf{n}_w,\dot{\mathbf{p}}_w)$, $f_e=f_n(S_e,\mathbf{n}_e,\dot{\mathbf{p}}_e)$ (still air in the controller model), the equations of motion are
\[m\begin{bmatrix}\ddot x\\ \ddot z\end{bmatrix}=f_w\,\mathbf{n}_w+f_e\,\mathbf{n}_e-\begin{bmatrix}0\\ mg\end{bmatrix},\qquad I\,\ddot\theta=l_w f_w-\bigl(l_h\cos\phi+l_e\bigr)f_e,\]
following the flat-plate glider conventions of [1], [2].
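A minimal sketch of this model in code, assuming placeholder parameter values (the mass, inertia, areas, and moment arms below are illustrative, not the identified values):

```python
import numpy as np

# Illustrative constants only; not the identified values.
RHO, G = 1.204, 9.81             # air density [kg/m^3], gravity [m/s^2]
M, I = 0.08, 0.0015              # mass [kg], pitch inertia [kg m^2]
S_W, S_E = 0.09, 0.015           # wing / elevator areas [m^2]
L_W, L_H, L_E = 0.0, 0.27, 0.04  # CoM-to-wing, CoM-to-hinge, hinge-to-elevator [m]

def f_n(S, n, v):
    """Flat-plate normal force f_n = -rho * S * (n . v) * ||v||."""
    return -RHO * S * (n @ v) * np.linalg.norm(v)

def glider_dynamics(x, u):
    """Still-air 7D flat-plate glider; x=[x,z,theta,phi,xd,zd,thd], u=phid."""
    _, _, th, phi, xd, zd, thd = x
    n_w = np.array([np.sin(th), np.cos(th)])
    n_e = np.array([np.sin(th + phi), np.cos(th + phi)])
    v = np.array([xd, zd])
    # Center-of-pressure velocities (wing ahead of the CoM, elevator behind);
    # note the elevator velocity depends on the input u = phid.
    v_w = v - L_W * thd * n_w
    v_e = v + L_H * thd * n_w + L_E * (thd + u) * n_e
    f_w, f_e = f_n(S_W, n_w, v_w), f_n(S_E, n_e, v_e)
    acc = (f_w * n_w + f_e * n_e) / M - np.array([0.0, G])
    thdd = (L_W * f_w - (L_H * np.cos(phi) + L_E) * f_e) / I
    return np.array([xd, zd, thd, u, acc[0], acc[1], thdd])
```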
All three projects treat model mismatch through learned/estimated low-dimensional parameterizations of the aerodynamic residuals while stabilizing nominal perching motions with local linear feedback (finite-difference linearization + LQR/TVLQR) of the form $u(t)=u^\star(t)-K(t)\,(x(t)-x^\star(t))$.
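The finite-difference linearization and LQR synthesis can be sketched as follows; `glider_dynamics` is the sketch above, and the operating point `x_star, u_star` and the weights `Q, R` are hypothetical placeholders:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def linearize(f, x0, u0, eps=1e-6):
    """Central-difference Jacobians A = df/dx, B = df/du at (x0, u0)."""
    n = len(x0)
    A = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n); d[i] = eps
        A[:, i] = (f(x0 + d, u0) - f(x0 - d, u0)) / (2 * eps)
    B = ((f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)).reshape(n, 1)
    return A, B

# Hypothetical operating point and weights, for illustration only.
x_star, u_star = np.array([0.0, 0.0, 0.1, 0.0, 6.0, -0.5, 0.0]), 0.0
A, B = linearize(glider_dynamics, x_star, u_star)
Q, R = np.eye(7), 0.1 * np.eye(1)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # feedback law: u = u* - K (x - x*)
```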
The first project combines (A) aerodynamic residual learning with (B) online parameter identification/calibration inside the state estimator, both on the same 7D flat-plate glider. For (A), I learn a nonlinear correction to the flat-plate lift model via an RBF feature map that is linear in the weights:
\[\widehat{C}_l(\alpha,e)=\theta^\top\phi([\alpha;e]),\qquad \phi(x)=\bigl[\exp(-\tfrac12(x-\mu_i)^\top\Sigma^{-1}(x-\mu_i))\bigr]_{i=1}^{N_p}\oplus 1,\]where $\{\mu_i\}$ tile the $(\alpha,e)$ domain and $\Sigma$ sets the spread; stacking $N$ samples yields $\Phi\in\mathbb{R}^{N\times(N_p+1)}$ and ridge regression
\[\theta=\arg\min_\theta \|\Phi\theta-y\|_2^2+\gamma\|\theta\|_2^2 \;=\;(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top y,\]so that the corrected coefficient $C_{l,\mathrm{fp}}(\alpha,e)+\widehat{C}_l(\alpha,e)$ matches the measured data and the stored models (\texttt{model\_coeffs.mat}). (B) For online identification under closed-loop flight, I model the nonlinear dynamics as
\[\dot x=f(x,u;\theta,w),\qquad \theta=\begin{bmatrix}l_w\\ S_e\end{bmatrix},\qquad w=[v_x^0,v_z^0],\]stabilize around a desired operating point with an LQR designed from numerical linearization $(A,B)\approx\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial u}\right)\ \text{evaluated at }(x_d,u_d)$, and estimate state and aerodynamic parameters jointly using an augmented-state EKF with $\bar x=[x;\theta]$ and constant-parameter process model
\[\dot{\bar x}=\bar f(\bar x,u)=\begin{bmatrix}f(x,u;\theta,w)\\0\end{bmatrix},\qquad y=H\bar x+v,\ \ H=[I\ \ 0].\]The outputs (perching trajectory and $\hat\theta(t)$ traces) demonstrate that (i) compact nonlinear residual models can be fit by linear least squares, and (ii) identifiability-driven parameter estimation can be integrated into closed-loop perching on the same glider platform.
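A sketch of one augmented-state EKF step under these assumptions; `f_aug` is a hypothetical wrapper returning $[f(x,u;\theta);\,0]$, and the covariances `Qn`, `Rn` are placeholders:

```python
import numpy as np

def ekf_step(xbar, P, u, y, f_aug, dt, Qn, Rn):
    """One predict/update on the augmented state xbar = [x; theta].

    f_aug(xbar, u) is a hypothetical wrapper returning [f(x,u;theta); 0],
    i.e. the constant-parameter process model; Qn, Rn are noise covariances.
    """
    n = len(xbar)
    # Predict: Euler-discretized flow map and its finite-difference Jacobian.
    F = np.eye(n)
    for i in range(n):
        d = np.zeros(n); d[i] = 1e-6
        F[:, i] += dt * (f_aug(xbar + d, u) - f_aug(xbar - d, u)) / 2e-6
    xbar = xbar + dt * f_aug(xbar, u)
    P = F @ P @ F.T + Qn
    # Update: only the physical state is measured, y = H xbar + v, H = [I 0].
    H = np.hstack([np.eye(7), np.zeros((7, n - 7))])
    S = H @ P @ H.T + Rn
    Kg = P @ H.T @ np.linalg.inv(S)
    xbar = xbar + Kg @ (y - H @ xbar)
    P = (np.eye(n) - Kg @ H) @ P
    return xbar, P
```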
For the second project (derivative-free trajectory optimization), the lift coefficient is again represented as a flat-plate baseline plus a learned residual,
\[C_L(\alpha,\delta_e)=C_{L,\mathrm{fp}}(\alpha,\delta_e)+\Delta C_L(\alpha,\delta_e), \qquad \widehat{\Delta C_L}(\alpha,\delta_e)=\theta_{Cl}^\top \phi([\alpha;\delta_e]),\]and I estimate the RBF weights via ridge regression,
\[\theta_{Cl}=(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top y,\qquad \gamma=0.1,\]with the fitted weights stored in \texttt{model\_coeffs.mat}.
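A sketch of this residual fit, assuming isotropic RBF kernels ($\Sigma=\sigma^2 I$) and illustrative domain ranges for the centers:

```python
import numpy as np

def rbf_features(X, centers, sigma):
    """phi(x): Gaussian RBFs over (alpha, delta_e) plus a constant bias."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.exp(-0.5 * sq / sigma**2), np.ones((len(X), 1))])

def ridge_fit(Phi, y, gamma=0.1):
    """theta = (Phi^T Phi + gamma I)^{-1} Phi^T y."""
    reg = gamma * np.eye(Phi.shape[1])
    return np.linalg.solve(Phi.T @ Phi + reg, Phi.T @ y)

# Centers tiling the (alpha, delta_e) domain; ranges are illustrative.
centers = np.array([[a, e] for a in np.linspace(-1.5, 1.5, 7)
                           for e in np.linspace(-0.8, 0.8, 5)])
# With samples X = [alpha_i, delta_e_i] and residual targets
# y_i = C_L_measured - C_L_fp:
#   theta_Cl = ridge_fit(rbf_features(X, centers, sigma=0.3), y, gamma=0.1)
```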
I then treat perching as a finite-horizon, derivative-free trajectory optimization problem in which computation is dominated by rollouts: over a horizon $T=0.75$ with $dt=0.01$ and $N=\lfloor T/dt\rfloor$ steps, each candidate open-loop control sequence $U=[u_1,\dots,u_N]$ is evaluated by an Euler rollout of the nominal still-air model and scored by a quadratic running/terminal objective. To account for dynamics-level uncertainty (unmodeled aerodynamic residuals, gusts, and actuation variability), MPPI replaces the deterministic rollout with a stochastic control-channel perturbation model: instead of applying $u(t)$ exactly, the executed input is corrupted by white Gaussian noise,
\[dX_t \;=\; f(X_t)\,dt \;+\; g(X_t,t)\!\left( u(X_t,t) \;+\; \frac{1}{\sqrt{\rho}}\frac{\epsilon_t}{\sqrt{dt}} \right)dt,\]where $\epsilon_t \sim \mathcal N(0,\nu)$ is a white-noise sample, $\rho>0$ scales the noise magnitude, and $g(X_t,t)$ determines which channels the noise is injected into; this is a control-corrupted stochastic differential equation (SDE). Under the time discretization used in the implementation, it is realized by sampling i.i.d. perturbations at each step,
\[u_k^{(j)} = U_k + \delta u_k^{(j)},\qquad \delta u_k^{(j)} \sim \mathcal N\!\left(0,\ \frac{\nu}{\rho\,dt}\right),\]and rolling out the perturbed trajectories $x_{k+1}^{(j)} = x_k^{(j)} + f\bigl(t_k,x_k^{(j)},u_k^{(j)}\bigr)\,dt$.
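A sketch of this stochastic rollout primitive, reusing `glider_dynamics` from the model sketch above; `running_cost` and `terminal_cost` stand in for the quadratic objective, and `nu_over_rho` is $\nu/\rho$:

```python
import numpy as np

def rollout_cost(x0, U, dt, running_cost, terminal_cost, rng, nu_over_rho):
    """Euler rollout under control-channel noise du ~ N(0, nu/(rho*dt))."""
    dU = rng.normal(0.0, np.sqrt(nu_over_rho / dt), size=U.shape)
    x, S = x0.copy(), 0.0
    for uk, duk in zip(U, dU):
        u = uk + duk                        # executed input = command + noise
        S += running_cost(x, u) * dt
        x = x + glider_dynamics(x, u) * dt  # Euler step of the nominal model
    return S + terminal_cost(x), dU
```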
On top of this rollout-and-evaluate primitive, I compare two sampling-based optimizers. The Cross-Entropy Method (CEM) maintains a diagonal Gaussian over the open-loop sequence, $u_k\sim\mathcal N(U_k,\sigma_k^2)$, and iteratively updates the mean and variance by a maximum-likelihood refit to the elite set,
\[U_k\leftarrow\frac{1}{|\mathcal E|}\sum_{j\in\mathcal E}u_k^{(j)},\qquad \sigma_k^2\leftarrow\mathrm{Var}\bigl(\{u_k^{(j)}\}_{j\in\mathcal E}\bigr).\]In contrast, MPPI explicitly uses the above stochastic rollouts: for each rollout $j$, it computes the cost-to-go $S_k^{(j)}=\sum_{\ell=k}^{N+1}J_\ell^{(j)}$, forms exponential path-integral weights $w_k^{(j)}\propto\exp(-S_k^{(j)}/\kappa)$, and applies the reward-weighted perturbation update
\[U_k\leftarrow U_k+\frac{\sum_{j=1}^{M}w_k^{(j)}\,\delta u_k^{(j)}}{\sum_{j=1}^{M}w_k^{(j)}}.\]
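Both update rules can be sketched compactly. As a simplification, this version weights each perturbation by the total rollout cost rather than the per-step cost-to-go $S_k^{(j)}$:

```python
import numpy as np

def cem_update(samples, costs, n_elite=20):
    """CEM: refit the diagonal Gaussian (mean, std) to the elite rollouts."""
    elite = samples[np.argsort(costs)[:n_elite]]   # best open-loop sequences
    return elite.mean(axis=0), elite.std(axis=0)

def mppi_update(U, dU, costs, kappa=1.0):
    """MPPI: reward-weighted perturbation update with w_j = exp(-S_j/kappa)."""
    w = np.exp(-(costs - costs.min()) / kappa)     # shift by min for stability
    return U + (w[:, None] * dU).sum(axis=0) / w.sum()
```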
Finally, I implement a guided-policy-search-style loop that alternates planning on a learned dynamics model, executing the resulting feedback policy on a separate (higher-fidelity) simulator under uncertainty, and re-identifying model parameters from rollout data:
\[\mathrm{plan\ (iLQR)}\ \rightarrow\ \mathrm{execute\ (sim)}\ \rightarrow\ \mathrm{identify\ (ridge\ LS)}\ \rightarrow\ \mathrm{re\text{-}plan}.\]The learned model uses a linear-in-parameters aerodynamic augmentation (RBF features) so that
\[\dot x = f(x,u) + g(x,u)\theta, \qquad \theta=\begin{bmatrix}\theta_{Cl}\\\theta_{Cd}\\\theta_{Cm}\end{bmatrix},\]with deliberate model mismatch between planner and simulator via different aerodynamic inputs (planner uses $(\alpha,\delta_e)$ while simulator uses $(\alpha,\delta_e,V)$).
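A sketch of the linear-in-parameters planner model, reusing `glider_dynamics` and `rbf_features` from above; the angle-of-attack computation and the placement of the $C_l$, $C_d$, $C_m$ residuals into the acceleration rows are deliberately simplified placeholders:

```python
import numpy as np

def learned_dynamics(x, u, theta, centers, sigma):
    """Planner model: xdot = f(x, u) + g(x, u) @ theta, linear in theta."""
    alpha = x[2] - np.arctan2(-x[5], x[4])  # angle of attack (convention-dependent)
    feats = rbf_features(np.array([[alpha, x[3]]]), centers, sigma)[0]
    p = len(feats)
    V2 = x[4]**2 + x[5]**2                  # dynamic-pressure scaling
    g = np.zeros((7, 3 * p))
    # Simplified placement of the Cl/Cd/Cm residuals into acceleration rows;
    # the full model would rotate lift and drag into the body axes.
    g[5, 0:p] = V2 * feats                  # lift residual   -> zdd row
    g[4, p:2 * p] = -V2 * feats             # drag residual   -> xdd row
    g[6, 2 * p:3 * p] = V2 * feats          # moment residual -> thetadd row
    return glider_dynamics(x, u) + g @ theta, g
```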
At iteration $j$, iLQR optimizes on the current learned model using a terminal objective and a trust-region-style quadratic shaping term around the previous nominal $(\bar x_k,\bar u_k)$, thereby producing an updated nominal trajectory $\{x_k^{\mathrm{nom}},u_k^{\mathrm{nom}}\}_{k=0}^{N-1}$ and time-varying gains $\{K_k,k_k\}$. Concretely, iLQR maintains a nominal pair $(x_k^{\mathrm{nom}},u_k^{\mathrm{nom}})$ and the backward pass computes a local policy in deviations $\delta x_k := x_k-x_k^{\mathrm{nom}}$:
\[\delta u_k = k_k + K_k\,\delta x_k,\]where $k_k$ is a feedforward increment and $K_k$ is the feedback gain. The forward pass then updates the nominal by rollout under the improved policy,
\[u_k^{\mathrm{new}} = u_k^{\mathrm{nom}} + \alpha k_k + K_k\bigl(x_k^{\mathrm{new}}-x_k^{\mathrm{nom}}\bigr), \qquad x_{k+1}^{\mathrm{new}} = f\!\left(x_k^{\mathrm{new}},u_k^{\mathrm{new}}\right),\]and sets $(x^{\mathrm{nom}},u^{\mathrm{nom}})\leftarrow(x^{\mathrm{new}},u^{\mathrm{new}})$. Hence $k_k$ appears only in the forward-pass nominal update; after convergence, execution tracks the finalized nominal baseline using the standard local feedback law
\[u_k = u_k^{\mathrm{nom}} + K_k\,(x_k-x_k^{\mathrm{nom}}).\]
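The forward pass and the execution-time feedback can be sketched as follows, with `f` the learned planner dynamics and `k_ff`, `K_fb` the feedforward and feedback gains from the backward pass:

```python
import numpy as np

def ilqr_forward_pass(f, x_nom, u_nom, k_ff, K_fb, dt, alpha=1.0):
    """Roll out the improved local policy to get the new nominal trajectory."""
    N = len(u_nom)
    x_new = np.zeros_like(x_nom)
    u_new = np.zeros_like(u_nom)
    x_new[0] = x_nom[0]
    for k in range(N):
        dx = x_new[k] - x_nom[k]
        u_new[k] = u_nom[k] + alpha * k_ff[k] + K_fb[k] @ dx
        x_new[k + 1] = x_new[k] + f(x_new[k], u_new[k]) * dt  # Euler step
    return x_new, u_new

# After convergence, execution tracks the nominal with feedback only:
#   u_k = u_nom[k] + K_fb[k] @ (x_k - x_nom[k])
```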
Each rollout provides a dataset with finite-difference derivatives $\dot x_i\approx (x_{i+1}-x_i)/dt$. Identification exploits the linear parameterization by stacking $g(x_i,u_i)$ into $\Phi$ (typically only over the aerodynamically affected components) and solving the ridge-regularized least squares
\[\theta_{j+1} =\arg\min_\theta \|\Phi\theta-(\dot x^{\text{data}}-f)\|_2^2+\gamma\|\theta\|_2^2 \;=\;(\Phi^\top\Phi+\gamma I)^{-1}\Phi^\top(\dot x^{\text{data}}-f),\]then writing $\theta_{j+1}$ back into the planner model. Tracking $J_j=\ell_f(x_N)$ and $\|\theta_{j+1}-\theta_j\|$ illustrates the intended behavior: per-iteration improvement in perching performance alongside convergence of the learned aerodynamic residual parameters.
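A sketch of the identification step and the outer loop; `plan_ilqr`, `execute`, `f_base`, and `g_feat` are hypothetical helpers standing in for the planner, the higher-fidelity simulator, the base dynamics, and the feature map:

```python
import numpy as np

def identify_residual(xs, us, dt, f, g, gamma=0.1):
    """Ridge LS refit of theta from one rollout: xdot ~= f(x,u) + g(x,u) theta.

    xs: (N+1, 7) visited states; us: (N,) applied inputs; f, g are the
    planner's base dynamics and feature map (hypothetical callables here).
    """
    Phi, resid = [], []
    for x, u, x_next in zip(xs[:-1], us, xs[1:]):
        xdot = (x_next - x) / dt               # finite-difference derivative
        Phi.append(g(x, u))
        resid.append(xdot - f(x, u))
    Phi, resid = np.vstack(Phi), np.hstack(resid)
    reg = gamma * np.eye(Phi.shape[1])
    return np.linalg.solve(Phi.T @ Phi + reg, Phi.T @ resid)

# Outer loop (hypothetical helpers): plan -> execute -> identify -> re-plan.
# for j in range(n_iters):
#     x_nom, u_nom, k_ff, K_fb = plan_ilqr(theta)   # iLQR on the learned model
#     xs, us = execute(x_nom, u_nom, K_fb)          # higher-fidelity simulator
#     theta = identify_residual(xs, us, dt, f_base, g_feat)
```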
References
[1] Tedrake, R. Underactuated Robotics, Ch. 10: Trajectory Optimization. Course notes for MIT 6.832, 2023. https://underactuated.csail.mit.edu/trajopt.html#perching
[2] Cory, R., & Tedrake, R. Experiments in Fixed-Wing UAV Perching. In AIAA Guidance, Navigation, and Control Conference and Exhibit, Honolulu, HI, USA, Aug. 18–21, 2008, Paper No. 2008-7256, pp. 1–12. doi:10.2514/6.2008-7256.
[3] Roberts, J. W., Cory, R., & Tedrake, R. On the Controllability of Fixed-Wing Perching. In Proceedings of the American Control Conference (ACC 2009), St. Louis, MO, USA, Jun. 10–12, 2009, pp. 2018–2023. doi:10.1109/ACC.2009.5160526.
[4] Moore, J. L., & Tedrake, R. Control Synthesis and Verification for a Perching UAV using LQR-Trees. In Proceedings of the 51st IEEE Conference on Decision and Control (CDC 2012), Maui, HI, USA, Dec. 10–13, 2012, pp. 3707–3714. doi:10.1109/CDC.2012.6425852.
[5] Moore, J., Cory, R., & Tedrake, R. Robust Post-Stall Perching with a Simple Fixed-Wing Glider using LQR-Trees. Bioinspiration & Biomimetics, vol. 9, no. 2, Art. no. 025013, Jun. 2014. doi:10.1088/1748-3182/9/2/025013.
[6] Moore, J. Learning-Based Control for Robotics. Course notes for JHU ME696, 2025.