This repo contains MATLAB code for solving a general nonlinear optimal control problem using a gradient descent approach.
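In optimal control theory, a standard optimal control problem is defined as

$$\min_{u(t)} \; J = \phi(x(t_f)) + \int_0^{t_f} g(x(t), u(t)) \, dt \quad \text{subject to} \quad \dot{x} = f(x, u), \quad x(0) = x_0 .$$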
The solution to the problem above comes from the calculus of variations. A Hamiltonian function is defined as

$$\mathcal{H}(x, u, p) = g(x, u) + p^T f(x, u) .$$
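The optimal control input, in the absence of input limits, can be calculated using the relationships below:

$$\dot{x} = \frac{\partial \mathcal{H}}{\partial p} = f(x, u), \qquad \dot{p} = -\frac{\partial \mathcal{H}}{\partial x}, \qquad \frac{\partial \mathcal{H}}{\partial u} = 0, \qquad x(0) = x_0, \qquad p(t_f) = \frac{\partial \phi}{\partial x}\bigg|_{t_f} .$$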
Here p denotes the costates. This set of equations is generally hard to solve because it forms a nonlinear two-point boundary value problem: the initial values of x and the final values of p are known.
One way to solve this set of equations is using the gradient descent algorithm. An initial guess of the control input u is selected and the equations are solved for x and p, given the boundary values. Then u is corrected using the gradient of the Hamiltonian.
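The iteration described above can be sketched on a small scalar problem. This is a minimal illustration, not the repo's implementation: the test problem, the explicit Euler discretization, and the fixed step size `alpha` are all assumptions made to keep the example self-contained (the actual solver works on symbolic inputs and uses a line search).

```matlab
% Scalar test problem (illustrative, not taken from the repo):
%   minimize    J = 0.5 * integral_0^1 (x^2 + u^2) dt
%   subject to  dx/dt = -x + u,  x(0) = 1
% Hamiltonian:  H = 0.5*(x^2 + u^2) + p*(-x + u)
N     = 101;                    % number of grid points
t     = linspace(0, 1, N);
dt    = t(2) - t(1);
x0    = 1;
u     = zeros(1, N);            % initial guess for the control
alpha = 0.5;                    % fixed step size (assumed small enough)
nIter = 200;
Jhist = zeros(1, nIter);

for iter = 1:nIter
    % 1) Forward pass: integrate the state from x(0) = x0 (explicit Euler).
    x = zeros(1, N);
    x(1) = x0;
    for k = 1:N-1
        x(k+1) = x(k) + dt * (-x(k) + u(k));
    end

    % Cost of the current control (rectangle rule on the same grid).
    Jhist(iter) = 0.5 * dt * sum(x(1:N-1).^2 + u(1:N-1).^2);

    % 2) Backward pass: integrate the costate from p(tf) = dPhi/dx = 0,
    %    using dp/dt = -dH/dx = -(x - p).
    p = zeros(1, N);
    for k = N-1:-1:1
        p(k) = p(k+1) + dt * (x(k) - p(k+1));
    end

    % 3) Gradient of the Hamiltonian with respect to u: dH/du = u + p.
    dHdu = u(1:N-1) + p(2:N);

    % 4) Gradient step on the control.
    u(1:N-1) = u(1:N-1) - alpha * dHdu;
end

fprintf('J: %.4f -> %.4f, ||dH/du|| = %.2e\n', Jhist(1), Jhist(end), norm(dHdu));
```

The costate recursion here is the adjoint of the Euler state update, so `dHdu` is consistent with the discretized cost and the cost history is non-increasing for this step size.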
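When the final time $t_f$ is free, it becomes an additional unknown of the problem.
For a problem with free $t_f$, an extra necessary condition (the transversality condition) applies.
At the optimal solution, not only must the control satisfy $\frac{\partial \mathcal{H}}{\partial u} = 0$, but the final time must also satisfy $\mathcal{H}(t_f) + \frac{\partial \phi}{\partial t}\big|_{t_f} = 0$.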
This condition arises because when $t_f$ is perturbed by $\delta t_f$:
- the terminal cost changes as $\frac{\partial \phi}{\partial t} \delta t_f$
- the integral cost changes by adding a time slice of width $\delta t_f$ with integrand $g + p^T f = \mathcal{H}$.
At optimality, these combined effects must sum to zero.
The solver implements this by performing alternating optimization: first updating the control u via gradient descent on $\mathcal{H}$, then adjusting the final time $t_f$ toward satisfying the transversality condition.
This exact logic has been implemented in the function optimalControlSolver. Here, we go over the variables, inputs and outputs of the function.
Problem:
Usage:
[sol, info] = optimalControlSolver(symF, symG, symPhi, xSym, uSym, tGrid, x0, U0, opts)
Inputs:
- symF: symbolic vector field f(x,u) of size [n x 1]
- symG: symbolic scalar running cost g(x,u)
- symPhi: symbolic scalar terminal cost Phi(x)
- xSym: symbolic state vector [x1; x2; ...; xn]
- uSym: symbolic control vector [u1; u2; ...; um]
- tGrid: time grid (column or row) of size [N x 1] or [1 x N], increasing, with tGrid(1) = 0
- x0: initial state (numeric) [n x 1]
- U0: initial control trajectory over tGrid [N x m]
- opts: options struct (all fields optional):
  - maxIters (default 50)
  - alpha (default 1.0): initial step size for gradient descent
  - beta (default 0.5): backtracking reduction factor (0 < beta < 1)
  - c1 (default 1e-4): Armijo condition constant
  - tol (default 1e-6): stopping tolerance on ||grad_u||_F
  - odeOptions (default []): options set by odeset
  - interp (default 'linear'): 'linear' or 'zoh' for u/x interpolation
  - uLower (default []): lower bounds on u (1 x m) or scalar
  - uUpper (default []): upper bounds on u (1 x m) or scalar
  - maxLineSearch (default 10)
  - verbose (default true)
Outputs:
- sol.t: time grid [N x 1]
- sol.X: state trajectory along tGrid [N x n]
- sol.U: control trajectory along tGrid [N x m]
- sol.P: costate trajectory along tGrid [N x n]
- sol.J: final cost value at solution
- sol.J_hist: cost history per iteration
- sol.grad_norm_hist: gradient-norm history per iteration
- info.iters: number of iterations performed
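As a rough illustration of how the inputs and outputs above fit together, here is a sketch of a call for a double-integrator problem. The problem itself is hypothetical and not one of the repo's sample scripts; only the function signature and the option and output field names come from the description above. It assumes `optimalControlSolver` is on the MATLAB path and the Symbolic Math Toolbox is installed.

```matlab
% Hypothetical double-integrator problem (not one of the repo's demos).
syms x1 x2 u1 real
xSym   = [x1; x2];                      % symbolic state vector
uSym   = u1;                            % symbolic control vector (m = 1)
symF   = [x2; u1];                      % dynamics f(x,u)
symG   = 0.5 * (x1^2 + x2^2 + u1^2);    % running cost g(x,u)
symPhi = x1^2 + x2^2;                   % terminal cost Phi(x)

tGrid = linspace(0, 5, 101)';           % increasing, tGrid(1) = 0
x0    = [1; 0];                         % initial state [n x 1]
U0    = zeros(numel(tGrid), 1);         % initial control guess [N x m]

% Override a few options; unspecified fields keep their defaults.
opts = struct();
opts.maxIters = 100;
opts.tol      = 1e-5;
opts.uLower   = -1;
opts.uUpper   = 1;
opts.verbose  = false;

[sol, info] = optimalControlSolver(symF, symG, symPhi, xSym, uSym, ...
                                   tGrid, x0, U0, opts);

% Inspect the result: control trajectory and cost history.
figure; plot(sol.t, sol.U); xlabel('t'); ylabel('u');
figure; plot(sol.J_hist);   xlabel('iteration'); ylabel('J');
```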
Requirements:
- MATLAB Symbolic Math Toolbox
- Free time support has been added. You can use the available options to set it up. See Free time CSTR for an example of how to use it.
Notes:
- Instead of a simple gradient descent with a constant step size, the Armijo condition is checked every time and backtracking is used to find an appropriate step size
- Three sample scripts have been provided:
  - demo.m contains a linear system with two states and two inputs.
  - CSTR.m solves the optimal control problem for a CSTR system (example 6.2-2 from Kirk's book).
  - Free time CSTR solves the same optimal control problem as CSTR.m, but with free final time.
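
The Armijo backtracking mentioned in the notes can be sketched as follows. This is a minimal example on a hypothetical quadratic objective, not the solver's internal code. The names `alpha`, `beta`, `c1` and `maxLineSearch` mirror the options listed above, while `J`, `gradJ` and `uTrial` are made up for illustration.

```matlab
% Illustrative objective and gradient (hypothetical; not from the repo).
J     = @(u) sum((u - [1; -2]).^2);
gradJ = @(u) 2 * (u - [1; -2]);

u     = [0; 0];          % current iterate
alpha = 1.0;             % initial step size
beta  = 0.5;             % backtracking reduction factor, 0 < beta < 1
c1    = 1e-4;            % Armijo constant
maxLineSearch = 10;      % maximum number of backtracking steps

g  = gradJ(u);
J0 = J(u);
accepted = false;
for ls = 1:maxLineSearch
    uTrial = u - alpha * g;
    % Armijo (sufficient decrease) condition along the direction -g.
    if J(uTrial) <= J0 - c1 * alpha * (g' * g)
        accepted = true;
        break;
    end
    alpha = beta * alpha;   % shrink the step and try again
end
if accepted
    u = uTrial;             % take the accepted step
end
```

Starting from a large step and shrinking it only when the sufficient-decrease test fails avoids hand-tuning a constant step size, at the price of extra cost evaluations per iteration.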