This directory contains GitHub-ready versions of the local training and inference scripts. Private paths and API keys have been removed. Configure paths through environment variables.

- sft.sh: supervised fine-tuning with swift sft.
- RL.sh: GRPO/RLHF training with an external rollout server.
- plugin_reward.py: custom reward functions used by RL.sh.
- 1.infer_task_no_prompt.py: task-level inference without injecting the prompt tags.
- 1.infer_task_prompt_tags.py: task-level inference with the built-in GeoGebra prompt.
- .env.example: example configuration values. Do not commit real API keys.

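The contents of plugin_reward.py are not shown here, so the following is only a minimal sketch of what a custom reward function could look like. It assumes a callable that receives a list of completion strings and returns one float per completion. The class name `NonEmptyReward` and the scoring rule are hypothetical and are not taken from the actual plugin.

```python
from typing import List


class NonEmptyReward:
    """Hypothetical reward: 1.0 for a non-empty completion, 0.0 otherwise."""

    def __call__(self, completions: List[str], **kwargs) -> List[float]:
        # One score per completion, in the same order as the input.
        # Extra keyword arguments are accepted and ignored (assumed call shape).
        return [1.0 if text.strip() else 0.0 for text in completions]
```

Calling `NonEmptyReward()(["point A", "   "])` yields one score per input string. A real reward used by RL.sh would replace the whitespace check with task-specific scoring.
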
Set up the environment:

    cp .env.example .env
    # Edit .env, then:
    source .env

Run SFT:

    OUTPUT_DIR=./outputs/sft bash sft.sh

Run RL training:

    OUTPUT_DIR=./outputs/rl_both bash RL.sh

Run evaluation:

    CHECKPOINT=/path/to/checkpoint TASK_DATASET_DIR=./task_datasets python 1.infer_task_prompt_tags.py
    CHECKPOINT=/path/to/checkpoint TASK_DATASET_DIR=./task_datasets WORLD_SIZE=8 python 1.infer_task_no_prompt.py

- SWANLAB_API_KEY is intentionally not set in the scripts. Export it locally only when needed.
- REWARD_PLUGIN, SYSTEM_PROMPT_FILE, model checkpoints, and datasets are expected to be provided by the user or the surrounding project.
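Since all paths are configured through environment variables, a script can validate them once at startup and fail early with a clear message. The sketch below shows one way to do that. The variable names CHECKPOINT, TASK_DATASET_DIR, WORLD_SIZE, and SWANLAB_API_KEY come from the commands and notes in this README. The `InferenceConfig` class, the `load_inference_config` helper, and the default of 1 for WORLD_SIZE are assumptions for illustration and may not match what the inference scripts actually do.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceConfig:
    checkpoint: Path
    task_dataset_dir: Path
    world_size: int
    swanlab_api_key: Optional[str]  # None unless exported locally


def load_inference_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings from the environment (hypothetical helper)."""
    # Report every missing required variable in a single error.
    required = ("CHECKPOINT", "TASK_DATASET_DIR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing required environment variables: " + ", ".join(missing))

    # Assumption: fall back to a single process when WORLD_SIZE is unset.
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size < 1:
        raise ValueError("WORLD_SIZE must be a positive integer")

    return InferenceConfig(
        checkpoint=Path(env["CHECKPOINT"]),
        task_dataset_dir=Path(env["TASK_DATASET_DIR"]),
        world_size=world_size,
        # An empty string is treated the same as unset.
        swanlab_api_key=env.get("SWANLAB_API_KEY") or None,
    )
```

Passing the environment as a mapping keeps the helper easy to test: call `load_inference_config({...})` with a plain dictionary, or call it with no arguments to read the variables exported from .env.
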