| 1 | +--- |
| 2 | +title: Object Storage |
| 3 | +description: Configure where Sim stores uploaded files — local disk, AWS S3, or Azure Blob |
| 4 | +--- |
| 5 | + |
| 6 | +import { Tab, Tabs } from 'fumadocs-ui/components/tabs' |
| 7 | +import { Callout } from 'fumadocs-ui/components/callout' |
| 8 | +import { Step, Steps } from 'fumadocs-ui/components/steps' |
| 9 | +import { FAQ } from '@/components/ui/faq' |
| 10 | + |
| 11 | +Sim stores every uploaded file — knowledge base documents, chat attachments, execution outputs, profile pictures, and more — in object storage. Three backends are supported: |
| 12 | + |
| 13 | +| Backend | When to use | |
| 14 | +|---------|-------------| |
| 15 | +| **Local disk** | Single-node Docker, local development, evaluation | |
| 16 | +| **[AWS S3](https://aws.amazon.com/s3/)** | Production, especially when running more than one app replica | |
| 17 | +| **[Azure Blob](https://learn.microsoft.com/azure/storage/blobs/)** | Production on Azure | |
| 18 | + |
| 19 | +<Callout type="warning"> |
| 20 | + Local disk writes to the container's `/uploads` directory. Files are lost when the container is recreated unless that path is on a persistent volume, and they are **not** shared across replicas. For any multi-replica or production deployment, use S3 or Azure Blob. |
| 21 | +</Callout> |
| 22 | + |
| 23 | +## How the backend is selected |
| 24 | + |
| 25 | +Sim picks the backend automatically from environment variables — there is no explicit "provider" flag. The logic, in order of precedence: |
| 26 | + |
| 27 | +1. **Azure Blob** — used if `AZURE_STORAGE_CONTAINER_NAME` is set **and** either (`AZURE_ACCOUNT_NAME` + `AZURE_ACCOUNT_KEY`) or `AZURE_CONNECTION_STRING` is set. |
| 28 | +2. **AWS S3** — used if `S3_BUCKET_NAME` **and** `AWS_REGION` are set (and Azure is not configured). |
| 29 | +3. **Local disk** — the fallback when neither is configured. |
| 30 | + |
| 31 | +If both Azure and S3 are configured, **Azure wins**. Set only the variables for the backend you intend to use. |
| 32 | + |
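The precedence rules above can be sketched as a small shell check to run before deploying. This is a minimal sketch, not Sim's actual selection code: `sim_storage_backend` is a hypothetical helper, and it assumes a variable counts as "set" when it is non-empty.

```bash
# Hypothetical helper (not part of Sim): prints the backend the documented
# precedence rules would select for the current environment.
# Assumption: "set" means non-empty.
sim_storage_backend() {
  local azure_creds=no
  if [ -n "${AZURE_ACCOUNT_NAME:-}" ] && [ -n "${AZURE_ACCOUNT_KEY:-}" ]; then
    azure_creds=yes
  elif [ -n "${AZURE_CONNECTION_STRING:-}" ]; then
    azure_creds=yes
  fi

  if [ -n "${AZURE_STORAGE_CONTAINER_NAME:-}" ] && [ "$azure_creds" = yes ]; then
    echo azure   # rule 1: container name plus one credential form
  elif [ -n "${S3_BUCKET_NAME:-}" ] && [ -n "${AWS_REGION:-}" ]; then
    echo s3      # rule 2: bucket name plus region, Azure not configured
  else
    echo local   # rule 3: fallback
  fi
}
```

Running `sim_storage_backend` in the same shell that holds your `.env` values gives a quick way to catch a leftover Azure variable that would silently override an S3 setup.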
| 33 | +## Set up AWS S3 |
| 34 | + |
| 35 | +<Steps> |
| 36 | + |
| 37 | +<Step> |
| 38 | + |
| 39 | +### Create the buckets |
| 40 | + |
| 41 | +Sim separates files into purpose-specific buckets. At minimum you need the general workspace bucket; create the others to match the env vars you plan to set. A purpose without its own bucket falls back to the general bucket where the code allows it, but the recommended setup is one bucket per purpose.
| 42 | + |
| 43 | +```bash |
| 44 | +# Set your region once. The loop adds LocationConstraint only outside us-east-1.
| 45 | +export AWS_REGION=us-east-1
| 46 | +
| 47 | +# Create buckets (names must be globally unique, so prefix them with your org)
| 48 | +for name in workspace-files knowledge-base execution-files chat-files \
| 49 | +            copilot-files profile-pictures og-images workspace-logos; do
| 50 | +  aws s3api create-bucket \
| 51 | +    --bucket "myorg-sim-$name" \
| 52 | +    --region "$AWS_REGION" \
| 53 | +    $([ "$AWS_REGION" = us-east-1 ] || echo "--create-bucket-configuration LocationConstraint=$AWS_REGION")
| 54 | +done
| 55 | +```
| 56 | +
| 57 | +<Callout type="info">
| 58 | + `us-east-1` rejects an explicit `LocationConstraint`, so the loop passes `--create-bucket-configuration` only for other regions. The substitution is left unquoted on purpose so it expands to either zero or two arguments.
| 59 | +</Callout>
| 60 | + |
| 61 | +Keep all buckets **private** (block public access). Sim reaches them with its own credentials: uploads use short-lived presigned URLs and downloads are proxied through the app, so the buckets never need public read access.
| 62 | + |
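To enforce the private setting explicitly, the AWS CLI's `put-public-access-block` call can be applied to each bucket. The sketch below assumes the bucket names from the loop above; `block_public_access` and `block_public_access_all` are hypothetical helper names, not part of Sim.

```bash
# Hypothetical helper: turns on all four "block public access" settings for one bucket.
block_public_access() {
  aws s3api put-public-access-block \
    --bucket "$1" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
}

# Hypothetical helper: applies the setting to every purpose bucket under a name prefix.
block_public_access_all() {
  local prefix="$1" name
  for name in workspace-files knowledge-base execution-files chat-files \
              copilot-files profile-pictures og-images workspace-logos; do
    block_public_access "$prefix-$name"
  done
}
```

With the naming used in this guide, `block_public_access_all myorg-sim` covers all eight buckets.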
| 63 | +</Step> |
| 64 | + |
| 65 | +<Step> |
| 66 | + |
| 67 | +### Grant access with an IAM policy |
| 68 | + |
| 69 | +Create an IAM policy scoped to your buckets and attach it to the user (or role) Sim runs as: |
| 70 | + |
| 71 | +```json |
| 72 | +{ |
| 73 | +  "Version": "2012-10-17",
| 74 | +  "Statement": [
| 75 | +    {
| 76 | +      "Effect": "Allow",
| 77 | +      "Action": [
| 78 | +        "s3:GetObject",
| 79 | +        "s3:PutObject",
| 80 | +        "s3:DeleteObject",
| 81 | +        "s3:ListBucket"
| 82 | +      ],
| 83 | +      "Resource": [
| 84 | +        "arn:aws:s3:::myorg-sim-*",
| 85 | +        "arn:aws:s3:::myorg-sim-*/*"
| 86 | +      ]
| 87 | +    }
| 88 | +  ]
| 89 | +} |
| 90 | +``` |
| 91 | + |
| 92 | +You then have two ways to supply credentials: |
| 93 | + |
| 94 | +- **Static keys** — create an IAM user with this policy and set `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`. |
| 95 | +- **Instance/role credentials (recommended)** — attach the policy to the EC2 instance role, ECS task role, or EKS IRSA role. Leave `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` unset and Sim falls back to the default AWS credential chain automatically. |
| 96 | + |
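For the role-based option, creating and attaching the policy can be scripted with the AWS CLI. This is a sketch under assumptions: the policy JSON above is saved to a local file, and `attach_sim_policy`, the policy name, and the role name are placeholders chosen for illustration.

```bash
# Hypothetical helper: creates the policy from a JSON file and attaches it to a role.
# Usage: attach_sim_policy <policy-name> <policy-json-file> <role-name>
attach_sim_policy() {
  local policy_arn
  # create-policy returns the new policy's ARN; --query/--output extract it as plain text
  policy_arn=$(aws iam create-policy \
    --policy-name "$1" \
    --policy-document "file://$2" \
    --query 'Policy.Arn' --output text) || return 1
  aws iam attach-role-policy --role-name "$3" --policy-arn "$policy_arn"
}
```

An invocation might look like `attach_sim_policy sim-object-storage sim-s3-policy.json my-sim-task-role`. For the static-key option, `aws iam attach-user-policy --user-name <user> --policy-arn <arn>` is the equivalent call for an IAM user.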
| 97 | +</Step> |
| 98 | + |
| 99 | +<Step> |
| 100 | + |
| 101 | +### Configure environment variables |
| 102 | + |
| 103 | +Set the region, optionally the credentials, and the bucket names: |
| 104 | + |
| 105 | +```bash |
| 106 | +# Region + credentials |
| 107 | +AWS_REGION=us-east-1 |
| 108 | +AWS_ACCESS_KEY_ID=AKIA... # omit when using an instance/IRSA role |
| 109 | +AWS_SECRET_ACCESS_KEY=... # omit when using an instance/IRSA role |
| 110 | + |
| 111 | +# Buckets (per purpose) |
| 112 | +S3_BUCKET_NAME=myorg-sim-workspace-files |
| 113 | +S3_KB_BUCKET_NAME=myorg-sim-knowledge-base |
| 114 | +S3_EXECUTION_FILES_BUCKET_NAME=myorg-sim-execution-files |
| 115 | +S3_CHAT_BUCKET_NAME=myorg-sim-chat-files |
| 116 | +S3_COPILOT_BUCKET_NAME=myorg-sim-copilot-files |
| 117 | +S3_PROFILE_PICTURES_BUCKET_NAME=myorg-sim-profile-pictures |
| 118 | +S3_OG_IMAGES_BUCKET_NAME=myorg-sim-og-images |
| 119 | +S3_WORKSPACE_LOGOS_BUCKET_NAME=myorg-sim-workspace-logos |
| 120 | +``` |
| 121 | + |
| 122 | +Only `AWS_REGION` and `S3_BUCKET_NAME` are strictly required to switch Sim into S3 mode. Add the others so each file type lands in its own bucket. |
| 123 | + |
| 124 | +</Step> |
| 125 | + |
| 126 | +</Steps> |
| 127 | + |
| 128 | +### S3 bucket reference |
| 129 | + |
| 130 | +| Variable | Stores | Required | |
| 131 | +|----------|--------|----------| |
| 132 | +| `AWS_REGION` | Region for all buckets | **Yes** (enables S3) | |
| 133 | +| `AWS_ACCESS_KEY_ID` | Access key | No (uses credential chain if unset) | |
| 134 | +| `AWS_SECRET_ACCESS_KEY` | Secret key | No (uses credential chain if unset) | |
| 135 | +| `S3_BUCKET_NAME` | General workspace files | **Yes** (enables S3) | |
| 136 | +| `S3_KB_BUCKET_NAME` | Knowledge base documents | Recommended | |
| 137 | +| `S3_EXECUTION_FILES_BUCKET_NAME` | Workflow execution files (default: `sim-execution-files`) | Recommended | |
| 138 | +| `S3_CHAT_BUCKET_NAME` | Deployed chat assets | Recommended | |
| 139 | +| `S3_COPILOT_BUCKET_NAME` | Copilot attachments | Recommended | |
| 140 | +| `S3_PROFILE_PICTURES_BUCKET_NAME` | User avatars | Recommended | |
| 141 | +| `S3_OG_IMAGES_BUCKET_NAME` | OpenGraph preview images (falls back to `S3_BUCKET_NAME`) | Optional | |
| 142 | +| `S3_WORKSPACE_LOGOS_BUCKET_NAME` | Workspace logos (falls back to `S3_BUCKET_NAME`) | Optional | |
| 143 | +| `S3_LOGS_BUCKET_NAME` | Stored logs | Optional | |
| 144 | +| `S3_ENDPOINT` | Custom endpoint for S3-compatible storage (R2, MinIO, B2) | Optional (AWS S3 if unset) | |
| 145 | +| `S3_FORCE_PATH_STYLE` | `true` for path-style addressing (MinIO/Ceph) | Optional (defaults to `false`) |
| 146 | + |
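The fallbacks listed in the table can be illustrated with shell parameter expansion. This is only a sketch of the documented behaviour, not Sim's code: `resolve_s3_bucket` and its purpose labels are hypothetical, and it covers only the variables whose fallback or default the table states.

```bash
# Hypothetical helper: prints the bucket a purpose would resolve to, using only
# the fallbacks and defaults stated in the reference table.
resolve_s3_bucket() {
  case "$1" in
    workspace)       echo "${S3_BUCKET_NAME:-}" ;;
    og-images)       echo "${S3_OG_IMAGES_BUCKET_NAME:-${S3_BUCKET_NAME:-}}" ;;
    workspace-logos) echo "${S3_WORKSPACE_LOGOS_BUCKET_NAME:-${S3_BUCKET_NAME:-}}" ;;
    execution)       echo "${S3_EXECUTION_FILES_BUCKET_NAME:-sim-execution-files}" ;;
    *)               echo "no documented fallback for: $1" >&2; return 1 ;;
  esac
}
```

With only `S3_BUCKET_NAME=myorg-sim-workspace-files` set, `resolve_s3_bucket og-images` prints `myorg-sim-workspace-files`, while `resolve_s3_bucket execution` prints the default `sim-execution-files`, a bucket you would still need to create.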
| 147 | +## Apply the configuration |
| 148 | + |
| 149 | +<Tabs items={['Docker Compose', 'Kubernetes (Helm)']}> |
| 150 | + <Tab value="Docker Compose"> |
| 151 | + |
| 152 | +Add the storage variables to the `.env` file used by `docker-compose.prod.yml`, then restart: |
| 153 | + |
| 154 | +```bash |
| 155 | +docker compose -f docker-compose.prod.yml up -d |
| 156 | +``` |
| 157 | + |
| 158 | +Because files now live in S3, you no longer depend on a local `/uploads` volume for durability. |
| 159 | + |
| 160 | + </Tab> |
| 161 | + <Tab value="Kubernetes (Helm)"> |
| 162 | + |
| 163 | +Set the variables under `app.env` (non-secret, e.g. region and bucket names) and supply credentials through a secret. The chart ships a complete example at `helm/sim/examples/values-aws.yaml`: |
| 164 | + |
| 165 | +```yaml |
| 166 | +app: |
| 167 | +  env:
| 168 | +    AWS_REGION: "us-east-1"
| 169 | +    S3_BUCKET_NAME: "myorg-sim-workspace-files"
| 170 | +    S3_KB_BUCKET_NAME: "myorg-sim-knowledge-base"
| 171 | +    S3_EXECUTION_FILES_BUCKET_NAME: "myorg-sim-execution-files"
| 172 | +    # ...remaining buckets
| 173 | +``` |
| 174 | + |
| 175 | +On EKS, prefer **IRSA**: attach the IAM policy to the service account's role and leave the access-key variables unset. |
| 176 | + |
| 177 | + </Tab> |
| 178 | +</Tabs> |
| 179 | + |
| 180 | +## Set up Azure Blob |
| 181 | + |
| 182 | +Azure Blob uses one container per purpose, mirroring the S3 layout. Authenticate with either a connection string or an account name + key. |
| 183 | + |
| 184 | +```bash |
| 185 | +# Credentials — provide ONE of these forms |
| 186 | +AZURE_ACCOUNT_NAME=mystorageaccount |
| 187 | +AZURE_ACCOUNT_KEY=... |
| 188 | +# or |
| 189 | +AZURE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net |
| 190 | + |
| 191 | +# Containers (per purpose) |
| 192 | +AZURE_STORAGE_CONTAINER_NAME=workspace-files |
| 193 | +AZURE_STORAGE_KB_CONTAINER_NAME=knowledge-base |
| 194 | +AZURE_STORAGE_EXECUTION_FILES_CONTAINER_NAME=execution-files |
| 195 | +AZURE_STORAGE_CHAT_CONTAINER_NAME=chat-files |
| 196 | +AZURE_STORAGE_COPILOT_CONTAINER_NAME=copilot-files |
| 197 | +AZURE_STORAGE_PROFILE_PICTURES_CONTAINER_NAME=profile-pictures |
| 198 | +AZURE_STORAGE_OG_IMAGES_CONTAINER_NAME=og-images |
| 199 | +AZURE_STORAGE_WORKSPACE_LOGOS_CONTAINER_NAME=workspace-logos |
| 200 | +``` |
| 201 | + |
| 202 | +A full Helm example lives at `helm/sim/examples/values-azure.yaml`. |
| 203 | + |
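The containers themselves can be created with the Azure CLI. The sketch below assumes account name + key authentication and the container names from the example above; `create_sim_containers` is a hypothetical helper name.

```bash
# Hypothetical helper: creates one container per purpose in the storage account
# named by AZURE_ACCOUNT_NAME, authenticating with AZURE_ACCOUNT_KEY.
create_sim_containers() {
  local name
  for name in workspace-files knowledge-base execution-files chat-files \
              copilot-files profile-pictures og-images workspace-logos; do
    az storage container create \
      --name "$name" \
      --account-name "$AZURE_ACCOUNT_NAME" \
      --account-key "$AZURE_ACCOUNT_KEY"
  done
}
```

If you authenticate with a connection string instead, replace the two account flags with `--connection-string "$AZURE_CONNECTION_STRING"`.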
| 204 | +## Set up an S3-compatible provider (R2, MinIO, B2) |
| 205 | + |
| 206 | +Sim works with any S3-compatible store by pointing the S3 client at a custom endpoint. Configure it exactly like AWS S3 (buckets, access key, secret), then add `S3_ENDPOINT` — and `S3_FORCE_PATH_STYLE` where the provider requires path-style addressing. Verified with [Cloudflare R2](https://developers.cloudflare.com/r2/), [MinIO](https://min.io/), [Backblaze B2](https://www.backblaze.com/cloud-storage), and [RustFS](https://rustfs.com/). |
| 207 | + |
| 208 | +<Callout type="info"> |
| 209 | + `S3_ENDPOINT` is trusted operator configuration, so it is used as-is — `http://` and private hosts are accepted (no SSRF/HTTPS gate). Don't wire it to untrusted input. |
| 210 | +</Callout> |
| 211 | + |
| 212 | +<Callout type="warning"> |
| 213 | + **The endpoint must be reachable from your users' browsers, and the bucket needs CORS.** Uploads use presigned `PUT` requests sent **directly from the browser** to `S3_ENDPOINT` (downloads are proxied back through the app, so they only need server-side reachability). This means: |
| 214 | + |
| 215 | + - A purely internal endpoint (e.g. `https://minio.internal:9000` that only the app pods can resolve) will let the server start cleanly but **uploads will fail in the browser**. Use an endpoint your users can reach. |
| 216 | + - Configure a **CORS policy** on the bucket that allows your Sim origin (`PUT`, `GET`, and the `Authorization` / `Content-Type` / `x-amz-*` headers). This applies to AWS S3 too — R2 and MinIO are no different. |
| 217 | +</Callout> |
| 218 | + |
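A CORS policy matching the requirements above might look like the following sketch. It assumes the provider accepts the S3 `PutBucketCors` call (AWS S3 does; other providers may configure CORS differently, so check their documentation). `write_cors_config` and `apply_cors_config` are hypothetical helpers, and `https://sim.example.com` stands in for your Sim origin.

```bash
# Hypothetical helper: writes an S3 CORS configuration for one Sim origin.
# Usage: write_cors_config <sim-origin> <output-file>
write_cors_config() {
  cat > "$2" <<EOF
{
  "CORSRules": [
    {
      "AllowedOrigins": ["$1"],
      "AllowedMethods": ["PUT", "GET"],
      "AllowedHeaders": ["Authorization", "Content-Type", "x-amz-*"]
    }
  ]
}
EOF
}

# Hypothetical helper: applies the file to a bucket.
# ${S3_ENDPOINT:+...} adds --endpoint-url only when S3_ENDPOINT is set.
apply_cors_config() {
  aws s3api put-bucket-cors \
    --bucket "$1" \
    --cors-configuration "file://$2" \
    ${S3_ENDPOINT:+--endpoint-url "$S3_ENDPOINT"}
}
```

For example, `write_cors_config https://sim.example.com cors.json` followed by `apply_cors_config myorg-sim-workspace-files cors.json`, repeated for each bucket that receives browser uploads.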
| 219 | +<Tabs items={['Cloudflare R2', 'MinIO', 'RustFS']}> |
| 220 | + <Tab value="Cloudflare R2"> |
| 221 | + |
| 222 | +[Cloudflare R2](https://developers.cloudflare.com/r2/api/s3/) uses virtual-hosted style (the default) and the region `auto`: |
| 223 | + |
| 224 | +```bash |
| 225 | +AWS_REGION=auto |
| 226 | +S3_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com |
| 227 | +AWS_ACCESS_KEY_ID=<r2-access-key-id> |
| 228 | +AWS_SECRET_ACCESS_KEY=<r2-secret-access-key> |
| 229 | +S3_BUCKET_NAME=myorg-sim-workspace-files |
| 230 | +# ...remaining S3_*_BUCKET_NAME vars, one R2 bucket each |
| 231 | +``` |
| 232 | + |
| 233 | +Leave `S3_FORCE_PATH_STYLE` unset — R2 supports the default virtual-hosted addressing. |
| 234 | + |
| 235 | + </Tab> |
| 236 | + <Tab value="MinIO"> |
| 237 | + |
| 238 | +[MinIO](https://min.io/docs/minio/linux/index.html) (and [Ceph RGW](https://docs.ceph.com/en/latest/radosgw/)) need path-style addressing and accept any region string: |
| 239 | + |
| 240 | +```bash |
| 241 | +AWS_REGION=us-east-1 |
| 242 | +S3_ENDPOINT=https://minio.example.com # must be reachable from users' browsers, not app-pods-only |
| 243 | +S3_FORCE_PATH_STYLE=true |
| 244 | +AWS_ACCESS_KEY_ID=<minio-access-key> |
| 245 | +AWS_SECRET_ACCESS_KEY=<minio-secret-key> |
| 246 | +S3_BUCKET_NAME=myorg-sim-workspace-files |
| 247 | +# ...remaining S3_*_BUCKET_NAME vars, one bucket each |
| 248 | +``` |
| 249 | + |
| 250 | +`http://` works server-side, but since the browser uploads directly to this endpoint, prefer a TLS endpoint your users can reach (a mixed-content `http://` target will be blocked on an `https://` Sim origin). |
| 251 | + |
| 252 | + </Tab> |
| 253 | + <Tab value="RustFS"> |
| 254 | + |
| 255 | +[RustFS](https://rustfs.com/) is a Rust-based, S3-compatible store (a MinIO drop-in). Configure it exactly like MinIO — path-style, any region string, SigV4 access key/secret: |
| 256 | + |
| 257 | +```bash |
| 258 | +AWS_REGION=us-east-1 |
| 259 | +S3_ENDPOINT=https://rustfs.example.com # must be reachable from users' browsers |
| 260 | +S3_FORCE_PATH_STYLE=true |
| 261 | +AWS_ACCESS_KEY_ID=<rustfs-access-key> |
| 262 | +AWS_SECRET_ACCESS_KEY=<rustfs-secret-key> |
| 263 | +S3_BUCKET_NAME=myorg-sim-workspace-files |
| 264 | +# ...remaining S3_*_BUCKET_NAME vars, one bucket each |
| 265 | +``` |
| 266 | + |
| 267 | +The same browser-reachability and CORS requirements apply. |
| 268 | + |
| 269 | + </Tab> |
| 270 | +</Tabs> |
| 271 | + |
| 272 | +## Verify it works |
| 273 | + |
| 274 | +After restarting with the new configuration: |
| 275 | + |
| 276 | +1. Open the app and upload a document to a knowledge base (or set a profile picture). |
| 277 | +2. Confirm an object appears in the corresponding bucket/container. |
| 278 | +3. Reload the page — the file should still render (downloads stream back through the app at `/api/files/serve`). |
| 279 | + |
| 280 | +If uploads fail, check the app logs for credential or permission errors (see [Troubleshooting](/self-hosting/troubleshooting)). |
| 281 | + |
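Step 2 can also be checked from the command line. The sketch below covers S3 and S3-compatible backends only and assumes the AWS CLI is configured with the same credentials as Sim; `recent_objects` is a hypothetical helper name.

```bash
# Hypothetical helper: lists the five most recently modified objects in a bucket.
# `aws s3 ls` lines start with the modification date and time, so a plain sort
# orders them chronologically. --endpoint-url is added only when S3_ENDPOINT is set.
recent_objects() {
  aws s3 ls "s3://$1/" --recursive ${S3_ENDPOINT:+--endpoint-url "$S3_ENDPOINT"} \
    | sort | tail -n 5
}
```

After uploading a knowledge base document, `recent_objects "$S3_KB_BUCKET_NAME"` should show the new object on the last line.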
| 282 | +<FAQ items={[ |
| 283 | + { question: "What happens if I do not configure any storage variables?", answer: "Sim falls back to local disk, writing files to the /uploads directory inside the app container. This is fine for evaluation but not durable across container recreation and not shared across replicas — use S3 or Azure Blob for production." }, |
| 284 | + { question: "Do I have to create all eight S3 buckets?", answer: "No. Only AWS_REGION and S3_BUCKET_NAME are required to enable S3 mode. The purpose-specific buckets are recommended so each file type is isolated; og-images and workspace-logos fall back to the general bucket if their variables are unset." }, |
| 285 | + { question: "How do I avoid storing AWS keys in plaintext?", answer: "On EC2/ECS/EKS, attach the IAM policy to the instance role, task role, or IRSA service-account role and leave AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY unset. Sim resolves credentials through the default AWS SDK provider chain automatically." }, |
| 286 | + { question: "Can I use both S3 and Azure Blob at the same time?", answer: "No. Sim selects a single backend. If both are configured, Azure Blob takes precedence. Set only the variables for the backend you want." }, |
| 287 | + { question: "Are the buckets exposed publicly?", answer: "No, and they should not be. Keep them private with public access blocked. Uploads use short-lived presigned URLs and downloads are proxied through the app, so the buckets never need public read permissions." },
| 288 | + { question: "Can I use MinIO or Cloudflare R2?", answer: "Yes. Configure it like AWS S3, then set S3_ENDPOINT to your provider's endpoint. For R2, set AWS_REGION=auto and leave S3_FORCE_PATH_STYLE unset. For MinIO/Ceph, set S3_FORCE_PATH_STYLE=true. See the S3-compatible provider section above." }, |
| 289 | +]} /> |