GitHub - dnv-opensource/crane-controller: Control a crane using AI-based controllers

Introduction

The package provides AI-based control of a crane system using reinforcement learning, developed at DNV AS.

The primary goal is to solve the anti-pendulum problem: training an agent to dampen (or start) the swing of a load hanging from a mobile crane, using only horizontal crane acceleration as the control input.

Getting Started

Environments

AntiPendulumEnv

The main environment. A mobile crane with a swinging load modelled via real crane physics (crane-controller library). The agent controls horizontal crane acceleration and must either start or stop the pendulum motion.

 -rail_limit                   0                  +rail_limit
      |                        |                        |
──────┼────────────────────────┼────────────────────────┼──── rail
                         ┌─────┴─────┐
      ← ẍ ────────────── │   crane   │ ────────────── ẍ →
                         └─────┬─────┘
                               │   obs[0] = x    crane position
                               │   obs[1] = ẋ    crane velocity
                               │   reward: −|x|, −ẋ²
                            L  │
                               │╲ θ
                               │ ╲
                               │  ╲
                               │   ●  load
                                     obs[2] = θ   polar angle from vertical
                                     obs[3] = θ̇   angular velocity (pure)
                                     reward: KE + PE

      episode truncated (terminal_penalty) when |x| > rail_limit
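
The reward annotations in the diagram can be read as one weighted per-step value. The following is a minimal sketch under stated assumptions, not the package's implementation: the names `pendulum_energy` and `step_reward`, the point-mass energy formula in the crane frame, the unit mass and rope length, and the sign convention (energy rewarded in start mode, penalised in stop mode) are all illustrative. The default weights mirror the values of experiments/hybrid_cv01.yaml shown later in this README.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def pendulum_energy(theta, theta_dot, length=1.0, mass=1.0):
    """KE + PE of a point-mass load, measured in the crane frame (assumption)."""
    kinetic = 0.5 * mass * (length * theta_dot) ** 2
    potential = mass * G * length * (1.0 - math.cos(theta))
    return kinetic + potential


def step_reward(obs, *, mode="stop", w_energy=1.0, w_velocity=0.1,
                w_position=0.02, terminal_penalty=-5.0, rail_limit=2.0):
    """Return (reward, truncated) for obs = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = obs
    if abs(x) > rail_limit:
        # Leaving the rail ends the episode with the terminal penalty.
        return terminal_penalty, True
    energy = pendulum_energy(theta, theta_dot)
    # Assumed sign convention: start mode rewards energy, stop mode penalises it.
    energy_term = energy if mode == "start" else -energy
    reward = w_energy * energy_term - w_velocity * x_dot**2 - w_position * abs(x)
    return reward, False
```

With this convention a load at rest under a centred, stationary crane scores zero in stop mode, and any swing, crane speed, or offset lowers the reward.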

Observation: crane x-position, crane x-velocity, load polar angle, pure angular velocity θ̇ (rad/s)
Actions: Discrete(3) by default (Q-agent compatible) — accelerate left / coast / accelerate right; Box([-1, 1]) when continuous_actions=True (PPO default) — continuous acceleration command
Modes: start (build pendulum energy) or stop (dampen swing)
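
The two action spaces listed above can both be reduced to a single acceleration command. The sketch below illustrates one plausible mapping; the function name `action_to_acceleration`, the index order (0 = left, 1 = coast, 2 = right), and the `max_acceleration` scale are assumptions made for illustration and may differ from the environment's actual code.

```python
def action_to_acceleration(action, *, continuous=False, max_acceleration=1.0):
    """Map an agent action to a horizontal crane acceleration command."""
    if continuous:
        # Box([-1, 1]): clip the raw command into the valid range.
        command = max(-1.0, min(1.0, float(action)))
    else:
        # Discrete(3), assumed order: 0 = accelerate left, 1 = coast, 2 = accelerate right.
        if action not in (0, 1, 2):
            raise ValueError(f"discrete action must be 0, 1 or 2, got {action!r}")
        command = float(action - 1)
    return command * max_acceleration
```

For example, `action_to_acceleration(0)` yields a full acceleration to the left, while `action_to_acceleration(0.25, continuous=True)` yields a quarter of the maximum acceleration to the right.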

ControlledCraneEnv

A more general mobile crane environment for future work.

Algorithms

Three RL algorithms are implemented, each as a self-contained agent class:

PPO (ppo_agent.py) — Proximal Policy Optimization via stable-baselines3. Supports vectorized environments for faster training. Models saved as .zip files.
Q-Learning (q_agent.py) — Tabular Q-learning with epsilon-greedy exploration. Uses a discretized observation space. Q-tables saved/loaded as JSON for incremental training.
AlgorithmAgent (algorithm.py) — Brute-force search over all 81 handcoded strategies (3⁴ combinations). Useful as a baseline.
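
To make the Q-learning entry concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration and a JSON round trip. It is not the code in q_agent.py: the function names, the learning rate `alpha`, the discount `gamma`, and the comma-joined key format used for JSON are assumptions chosen for illustration.

```python
import json
import random
from collections import defaultdict

N_ACTIONS = 3  # accelerate left / coast / accelerate right


def new_table():
    """Q-table mapping a discretised state tuple to one value per action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else pick the best action."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=values.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One-step Q-learning update towards reward + gamma * max_a Q(next_state, a)."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])


def to_json(q_table):
    # JSON object keys must be strings, so state tuples are joined with commas.
    return json.dumps({",".join(map(str, s)): v for s, v in q_table.items()})


def from_json(text):
    table = new_table()
    for key, values in json.loads(text).items():
        table[tuple(int(part) for part in key.split(","))] = values
    return table
```

Saving the table as JSON is what makes incremental training possible: a later run can load the table with `from_json` and continue calling `q_update` on it.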

Wrappers

Generic Gymnasium wrappers (from the Farama Foundation examples) are included for reference:

ClipReward — clips immediate rewards to a valid range
DiscreteActions — restricts the action space to a finite subset
RelativePosition — computes relative position between agent and target
ReacherRewardWrapper — weights multiple reward terms

Learning Examples

Two classic Gymnasium environments were used as stepping stones when developing this project:

GridWorldEnv — minimal grid navigation, ideal for learning the Gymnasium API. See the environment creation tutorial and the Gymnasium examples repo.
CartPoleEnv — cart-pole balancing, useful for verifying RL algorithms before applying them to the crane. Available via gymnasium.make("CartPole-v1").

Installation

pip install crane-controller

Usage

Running

Install dependencies and run the test suite with uv:

uv run pytest tests/ -v

Test files are organised by algorithm:

tests/test_environment.py -- environment and observation space tests
tests/test_algorithm.py -- brute-force algorithm tests
tests/test_q.py -- Q-learning smoke and analysis tests
tests/test_ppo.py -- PPO training, VecNormalize, and inference tests

Tests are suitable for CI/CD — no plot windows are produced.

Training

Experiment configs

PPO training is driven by YAML experiment config files in experiments/. Each file encodes both reward weights and training hyperparameters:

# experiments/hybrid_cv01.yaml
reward:
  energy: 1.0
  crane_velocity: 0.1
  position: 0.02
  terminal_penalty: -5.0
training:
  steps: 3000000
  n_envs: 32
  gamma: 0.99
  n_steps: 4096
  rail_limit: 2.0
  randomize_start: true
  start_speed: 1.0

Pass a config with --config PATH; any key not present falls back to the dataclass defaults. A JSON sidecar (*_meta.json) is written alongside every saved model so play_ppo.py can reconstruct the environment automatically — no --config needed at playback time.
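
The fallback to dataclass defaults can be pictured as follows. This is a sketch only: the class name `TrainingConfig`, the helper `apply_overrides`, and the choice to reject unknown keys are assumptions, and the dictionary stands in for the parsed `training:` section of a YAML file. The `steps` and `n_envs` defaults are the documented CLI defaults; the other defaults are placeholders.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TrainingConfig:
    steps: int = 100_000     # documented default for --steps
    n_envs: int = 4          # documented default for --n-envs
    gamma: float = 0.99      # placeholder default (assumption)
    rail_limit: float = 2.0  # placeholder default (assumption)


def apply_overrides(defaults, overrides):
    """Return a copy of defaults with the keys present in overrides replaced."""
    known = {f.name for f in fields(defaults)}
    unknown = set(overrides) - known
    if unknown:
        # Failing loudly on typos is a design choice of this sketch.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return replace(defaults, **overrides)
```

Calling `apply_overrides(TrainingConfig(), {"steps": 3_000_000, "n_envs": 32})` changes only the two listed keys and leaves `gamma` and `rail_limit` at their defaults.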

PPO:

uv run python scripts/train_ppo.py --config experiments/hybrid_cv01.yaml \
    --save-path models/my_model.zip --seed 42

Key options:

--config PATH — load a YAML experiment config (reward weights + training hyperparams)
--steps N — total training timesteps (default: 100 000)
--n-envs N — number of parallel environments (default: 4)
--seed N — RNG seed for reproducibility
--continuous-actions / --no-continuous-actions — use Box([-1,1]) or Discrete(3) action space (default: continuous)
--randomize-start / --start-speed SPEED — randomise initial crane speed up to ±SPEED each episode
--save-path PATH — where to write the trained model (default: models/ppo_AntiPendulumEnv.zip)
--resume-from PATH — continue training from a saved checkpoint; preserves VecNormalize statistics and learning rate schedule
--dry-run — run 1 000 steps with a live reward-tracking plot and no model saved
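
For readers who want to see how such options fit together, the following argparse sketch reproduces the documented flags and defaults. It is not the parser used by scripts/train_ppo.py; the function name `build_parser` and the handling of options without a documented default (set to `None` here) are assumptions.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the PPO training options listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the PPO training options")
    parser.add_argument("--config", default=None, help="YAML experiment config")
    parser.add_argument("--steps", type=int, default=100_000)
    parser.add_argument("--n-envs", type=int, default=4)
    parser.add_argument("--seed", type=int, default=None)
    # BooleanOptionalAction (Python 3.9+) creates both --continuous-actions
    # and --no-continuous-actions from a single declaration.
    parser.add_argument("--continuous-actions",
                        action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--randomize-start", action="store_true")
    parser.add_argument("--start-speed", type=float, default=None)  # default not documented
    parser.add_argument("--save-path", default="models/ppo_AntiPendulumEnv.zip")
    parser.add_argument("--resume-from", default=None)
    parser.add_argument("--dry-run", action="store_true")
    return parser
```

Parsing an empty argument list returns the documented defaults, and `--no-continuous-actions` switches the sketch to the Discrete(3) action space.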

Q-learning:

uv run python scripts/train_q.py

Key options:

--episodes N — total training episodes (default: 10 000)
--v0 F — initial crane speed; negative = stop mode, positive = start mode (default: -1.0)
--reward-limit F — per-episode termination threshold (default: -0.05)
--save-path PATH — where to write the Q-table (default: models/q_AntiPendulumEnv.json)
--trained PATH — continue training from an existing Q-table JSON
--intervals N — run interval training: N rounds of 10 episodes each
--dry-run — run 50 episodes with a reward plot and no model saved

Playing

Run a trained agent visually. Both scripts accept --render-mode with the following options:

plot — 6-panel figure per episode (load angle, load speed, crane position, crane speed, rewards, acceleration)
play-back — animated crane trajectory after each episode
reward-tracking — live reward line plot updating every step

Pre-trained models

Four pre-trained PPO models are included in models/ (trained with experiments/hybrid_cv01.yaml, 3M steps, 32 parallel envs): two action-space variants (Discrete and Box/continuous) across two random seeds (42 and 5775). All generalise well beyond the training range across the full ±10 m/s speed sweep (see docs/source/reward_comparison.md for detailed analysis).

Each model bundle requires three files: .zip (policy), _vecnorm.pkl (observation normalisation statistics), _meta.json (reward config + flags). The play_ppo.py script locates the sidecar files automatically from --model-path.
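
The sidecar lookup can be sketched with pathlib. The helper name `sidecar_paths` is hypothetical, and the sketch assumes that the sidecar names are formed by appending the listed suffixes to the model file name without its .zip extension; play_ppo.py may resolve them differently.

```python
from pathlib import Path


def sidecar_paths(model_path):
    """Derive the three bundle files from the path given to --model-path."""
    model = Path(model_path)
    base = model.with_suffix("")  # drop the trailing ".zip"
    return {
        "policy": model,
        "vecnorm": base.with_name(base.name + "_vecnorm.pkl"),
        "meta": base.with_name(base.name + "_meta.json"),
    }
```

For `models/hybrid_cv01_s42.zip` this yields `models/hybrid_cv01_s42_vecnorm.pkl` and `models/hybrid_cv01_s42_meta.json`, so all three files need to stay in the same directory.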

PPO (default render-mode: play-back):

uv run python scripts/play_ppo.py --model-path models/hybrid_cv01_disc_s42.zip --episodes 3 --render-mode plot
uv run python scripts/play_ppo.py --model-path models/hybrid_cv01_s42.zip --episodes 3 --render-mode plot

Out-of-distribution (OOD) evaluation (randomised start speed, 7× the training range):

uv run python scripts/play_ppo.py --model-path models/hybrid_cv01_disc_s42.zip \
    --episodes 6 --render-mode plot --randomize-start --start-speed 7.0

Q-learning (default render-mode: plot):

uv run python scripts/play_q.py --model-path models/q_trained.json
uv run python scripts/play_q.py --model-path models/q_trained.json --render-mode play-back --episodes 3

Analysing

Inspect a trained Q-table without running the environment:

uv run python scripts/analyse_q.py --model-path tests/anti-pendulum.json

Prints the average Q-values for each pos/speed combination as a quick sanity check. To drill into specific states, pass --obs with 5 integers (use -1 as a wildcard):

uv run python scripts/analyse_q.py --model-path tests/anti-pendulum.json --obs -1 0 0 -1 -1

The five observation dimensions are: [energy, pos, speed, distance, sector].
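
The wildcard semantics of --obs can be illustrated with a small filter. This is a sketch, not the logic of analyse_q.py: the helper name `select_states` and the in-memory table layout (state tuples mapped to per-action values) are assumptions.

```python
OBS_DIMS = ("energy", "pos", "speed", "distance", "sector")


def select_states(q_table, pattern):
    """Return the entries whose state matches pattern; -1 matches any value."""
    if len(pattern) != len(OBS_DIMS):
        raise ValueError(f"pattern needs {len(OBS_DIMS)} integers, got {len(pattern)}")
    return {
        state: values
        for state, values in q_table.items()
        if all(p == -1 or p == s for s, p in zip(state, pattern))
    }
```

With the pattern `(-1, 0, 0, -1, -1)` from the command above, every state with pos 0 and speed 0 is selected, regardless of its energy, distance, and sector.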

Development Setup

1. Install uv

This project uses uv as package manager.

If you haven't already, install uv, preferably using its "Standalone installer" method:

..on Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

..on macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

(see docs.astral.sh/uv for all / alternative installation methods.)

Once installed, you can update uv to its latest version at any time by running:

uv self update

2. Clone the repository

Clone the crane-controller repository into your local development directory:

git clone https://github.com/dnv-opensource/crane-controller path/to/your/dev/crane-controller

Change into the project directory after cloning:

cd path/to/your/dev/crane-controller

3. Install dependencies

Run uv sync -U to create a virtual environment and install all project dependencies into it:

uv sync -U

Note: Using --no-dev will omit installing development dependencies.

Explanation: The -U option stands for --upgrade. It allows uv to upgrade all dependencies to their latest compatible versions, ignoring the versions pinned in uv.lock, so that your environment is up to date.

Note: uv will create a new virtual environment called .venv in the project root directory when running uv sync -U the first time. Optionally, you can create your own virtual environment using e.g. uv venv, before running uv sync -U.

4. (Optional) Activate the virtual environment

When using uv, there is almost never a need to activate the virtual environment manually.

uv will find the .venv virtual environment in the working directory or any parent directory, and activate it on the fly whenever you run a command via uv inside your project folder structure:

uv run <command>

However, you can still activate the virtual environment manually if needed. When developing in an IDE, for instance, this can be necessary depending on your IDE settings. To activate the virtual environment manually, run one of the familiar legacy commands:

..on Windows:

.venv\Scripts\activate.bat

..on macOS and Linux:

source .venv/bin/activate

6. Install pre-commit hooks

The .pre-commit-config.yaml file in the project root directory contains a configuration for pre-commit hooks. To install the pre-commit hooks defined therein in your local git repository, run:

uv run pre-commit install

All pre-commit hooks configured in .pre-commit-config.yaml will now run each time you commit changes.

pre-commit can also be invoked manually at any time, using:

uv run pre-commit run --all-files

To skip the pre-commit validation on commits (e.g. when intentionally committing broken code), run:

uv run git commit -m <MSG> --no-verify

To update the hooks configured in .pre-commit-config.yaml to their newest versions, run:

uv run pre-commit autoupdate

7. Test that the installation works

To test that the installation works, run pytest in the project root folder:

uv run pytest

Contributing

Fork it https://github.com/dnv-opensource/crane-controller/fork/
Create an issue in your GitHub repo
Create your branch based on the issue number and type (git checkout -b issue-name)
Evaluate and stage the changes you want to commit (git add -i)
Commit your changes (git commit -am 'place a descriptive commit message here')
Push to the branch (git push origin issue-name)
Create a new Pull Request in GitHub

For your contribution, please make sure you follow the STYLEGUIDE before creating the Pull Request.
