Improve AMD accelerator example #3901

Merged
peterschmidt85 merged 10 commits into master from improve-amd-accelerator-example on May 26, 2026

@peterschmidt85
Contributor

Summary

  • Rework the AMD accelerator example around fleets, inference, training, dev environments, Docker image, and metrics
  • Use MI300X examples that can request at least four GPUs for inference and training
  • Remove stale AMD links from the TRL and Axolotl training examples

@peterschmidt85 peterschmidt85 requested a review from Bihan May 23, 2026 19:36

```yaml
resources:
  gpu: MI300X:4..
  disk: 100GB..
```
Collaborator

@Bihan Bihan May 26, 2026

I recommend adding

volumes:
  - /checkpoints:/checkpoints

and setting --output_dir /checkpoints
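
A minimal sketch of how this suggestion might look in the example's task configuration. The `resources` and `volumes` entries come from this thread; the task name, the `train.py` script, and the rest of the command line are hypothetical placeholders, not taken from the PR.

```yaml
type: task
# Hypothetical name, for illustration only
name: amd-train-sketch

commands:
  # Hypothetical training entry point; only `--output_dir /checkpoints`
  # comes from the review comment
  - python train.py --output_dir /checkpoints

resources:
  gpu: MI300X:4..
  disk: 100GB..

# Mount the instance's /checkpoints directory at /checkpoints
# inside the container, as suggested in the review
volumes:
  - /checkpoints:/checkpoints
```

With the mount in place, checkpoints written to `/checkpoints` would land on the host path rather than in the container's own filesystem, so they can outlive the run, assuming the host directory itself persists.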

@peterschmidt85 peterschmidt85 merged commit 39d3453 into master May 26, 2026
25 checks passed
@peterschmidt85 peterschmidt85 deleted the improve-amd-accelerator-example branch May 26, 2026 11:46