Dumpus API

API to extract statistics from the Discord Data Packages (GDPR packages). This API is completely open-source, self-hostable and documented.

Architecture Documentation

The architecture has been designed to meet the following constraints:

users' Discord Data Package must be entirely encrypted on the server side.
the encryption key must always remain on the client side, and must never be stored on the server side.
Discord Data Package processing must be fast and scalable.

In short, neither Dumpus admins nor operators hosting their own Dumpus instance must ever be able to access users' Discord Data Packages, even if the server is compromised.

A Discord Data Package download link contains a UPN KEY, so anyone holding the UPN KEY can download the corresponding Discord Data Package:

https://click.discord.com/ls/click?upn={UPN_KEY}

Thus:

a Discord Data Package identifier is created from a function that hashes the package's UPN KEY (called package_id).
when a Discord Data Package is to be stored in a database, it is encrypted with its UPN KEY.
when the client queries the server, it must always provide its UPN KEY to prove that it is the owner of the Discord Data Package, and to enable the server to return the decrypted data (if the client makes a data request).
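
As a rough illustration of the scheme above, the following Python sketch extracts the UPN KEY from a download link and derives an identifier from it. The text only states that package_id comes from a function that hashes the UPN KEY; the choice of SHA-256 with hex encoding and the names extract_upn_key and derive_package_id are assumptions made for this example, not the actual Dumpus implementation.

```python
import hashlib
from urllib.parse import urlparse, parse_qs


def extract_upn_key(package_link: str) -> str:
    """Return the value of the `upn` query parameter of a package link."""
    # Note: parse_qs percent-decodes the value it returns.
    query = parse_qs(urlparse(package_link).query)
    values = query.get("upn")
    if not values:
        raise ValueError("link has no upn parameter")
    return values[0]


def derive_package_id(upn_key: str) -> str:
    """Hash the UPN KEY into an identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(upn_key.encode("utf-8")).hexdigest()


link = "https://click.discord.com/ls/click?upn=EXAMPLE_KEY"
upn_key = extract_upn_key(link)
package_id = derive_package_id(upn_key)
print(package_id)
```

Because the identifier is a one-way hash, knowing a package_id alone should not be enough to recover the UPN KEY or download the package.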

Self-hosting

Anyone can host their own Dumpus instance. The official Dumpus client can then be configured to use it.

The worker no longer runs as a separate Celery process — set QUEUE_BACKEND=sync and the API processes packages inline on the request thread (good enough at small volume), or set QUEUE_BACKEND=sqs to dispatch to AWS SQS (used by the Lambda deployment).

clone https://github.com/dumpus-app/dumpus-api
easy way: cp .env.example .env, fill it in, then make up
manual:
- install requirements with pip
- start a PostgreSQL server
- fill the .env file with your PostgreSQL creds
- start the API: QUEUE_BACKEND=sync waitress-serve --port=5000 app:app

By default, Dumpus API only processes zip files served from https://discord.click. Set the DL_ZIP_WHITELISTED_DOMAINS environment variable to allow additional domains.
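
A domain allow-list of this kind could be checked roughly as follows. This is only a sketch: the comma-separated format of DL_ZIP_WHITELISTED_DOMAINS and the helper names allowed_domains and is_allowed are assumptions, so check .env.example for the format the API actually expects.

```python
import os
from urllib.parse import urlparse

# Default domain taken from the README; extras come from the environment.
DEFAULT_DOMAINS = {"discord.click"}


def allowed_domains(env=os.environ) -> set:
    """Default domain plus any extras from DL_ZIP_WHITELISTED_DOMAINS."""
    # Assumption: the variable holds a comma-separated list of hostnames.
    raw = env.get("DL_ZIP_WHITELISTED_DOMAINS", "")
    extras = {d.strip().lower() for d in raw.split(",") if d.strip()}
    return DEFAULT_DOMAINS | extras


def is_allowed(url: str, env=os.environ) -> bool:
    """True when the URL's hostname is in the allow-list."""
    hostname = urlparse(url).hostname or ""
    return hostname in allowed_domains(env)


print(is_allowed("https://discord.click/package.zip", env={}))
```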

Deploy to AWS

A Terraform stack under infra/terraform/ provisions a deployment of the API on AWS:

Component	AWS service
API	Lambda (container image) behind API Gateway HTTP API
Forwarder	Lambda triggered by SQS, fires one Fargate task per message
Worker	Fargate task (no time / memory caps, pay-per-run)
Database	RDS Postgres in private subnets
Outbound NAT	fck-nat instance (NAT Gateway replacement)
Secrets	Secrets Manager + Lambda/task env
TLS / DNS	ACM cert + Route53 alias to API Gateway
CI	GitHub OIDC role; build → ECR → `update-function-code` + `register-task-definition`

Bootstrap

Create a public Route53 hosted zone for your domain and point your registrar's nameservers at it.
cp infra/terraform/terraform.tfvars.example infra/terraform/terraform.tfvars and fill in discord_secret, domain_name, github_repository, region, etc.
cd infra/terraform && terraform init && terraform apply. This single apply does everything: a null_resource pushes a placeholder image (the public AWS Lambda Python base) into ECR with the :bootstrap tag, then the Lambda functions are created against that placeholder. Requires docker and aws CLI on the apply host.
Set the GitHub repo secret AWS_DEPLOY_ROLE_ARN from terraform output -raw github_deploy_role_arn. From here on, every push to main builds the real image in CI and rolls both Lambdas — no more local builds needed.

Day-to-day deploys

Push to main → .github/workflows/deploy.yml builds both container images (Lambda for the API + forwarder, plain Python for the Fargate worker), pushes them to ECR tagged with the git SHA, rolls both Lambdas, and registers a new ECS task definition revision. The next runTask call picks up the new image. No long-lived AWS keys in GitHub.

Operations

aws logs tail /aws/lambda/<name-prefix>-<env>-api --follow
aws logs tail /aws/lambda/<name-prefix>-<env>-forwarder --follow
aws logs tail /aws/ecs/<name-prefix>-<env>-worker --follow
aws sqs receive-message --queue-url "$(terraform output -raw sqs_dlq_url)"

Things to keep in mind:

API cold start is a few seconds while pandas imports. Invisible on the async submit/poll flow; use provisioned concurrency if a sync endpoint must be sub-second.
Worker /tmp cap defaults to 30 GiB. Bump worker_task_ephemeral_storage_gib if users upload very large Discord exports (Fargate ceiling is 200 GiB).
Worker has no time cap. Heavy packages just take as long as they need; failures show up as ERRORED package rows + a Discord webhook from process_package, not in the DLQ.
Forwarder DLQ. If the forwarder Lambda itself fails to launch a Fargate task twice (capacity / IAM / network), the SQS message lands in the DLQ — alarmed via Discord through monitoring.tf.
fck-nat is a single instance. Switch to a managed NAT Gateway if you need the extra availability — at the cost of a much higher fixed monthly bill.

API Documentation

Every request except POST /process (and the unauthenticated demo blob endpoint) requires the following header:

Authorization: Bearer <UPN_KEY>

Process a package

POST /process

Request body:

{
    "package_link": "https://click.discord.com/ls/click?upn=<UPN_KEY>"
}

Response:

{
    "isAccepted": true, // whether or not the package has been accepted for processing (if false, the error message will be in errorMessageCode)
    "packageId": "a1b2c3d4e5f6g7h8i9j0", // the package ID

    "errorMessageCode": null // if an error occurs, the error message code will show up here
}

Current error message codes:

INVALID_LINK: the link provided is not a valid Discord Data Package link.

Note: if the package has already been processed, the API does not return a distinct response. Instead, isDataAvailable will be true in the first status response.
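
A client could submit a package along these lines. The sketch uses only the Python standard library; BASE_URL is a hypothetical instance address, the Content-Type header is an assumption based on the JSON body shown above, and the helper names are invented for the example.

```python
import json
import urllib.request

BASE_URL = "https://dumpus.example.com"  # hypothetical instance URL


def build_process_request(package_link: str) -> urllib.request.Request:
    """Build (but do not send) the POST /process request."""
    body = json.dumps({"package_link": package_link}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/process",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed
        method="POST",
    )


def parse_process_response(raw: bytes) -> str:
    """Return packageId, or raise with the errorMessageCode."""
    payload = json.loads(raw)
    if not payload["isAccepted"]:
        raise RuntimeError(payload["errorMessageCode"])
    return payload["packageId"]


request = build_process_request("https://click.discord.com/ls/click?upn=EXAMPLE_KEY")
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and feed the response body to parse_process_response.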

Fetch a package status

GET /process/<package_id>/status

Response:

{
    "isDataAvailable": false, // whether or not the data is available (meaning the processing is ended)

    "isUpgraded": false, // whether or not the user has paid for the "queue skip" feature

    "isErrored": false, // whether or not an error occurred during the processing
    "errorMessageCode": null, // if an error occurs, the error message code will show up here

    "isProcessing": true, // whether or not the package is still being processed
    "processingStep": "messages", // the current processing step
    "processingQueuePosition": {
        "premiumQueueTotal": 20, // the number of premium packages in the queue
        "standardQueueTotal": 300, // the number of standard packages in the queue
        "premiumQueueUser": null, // the number of premium packages in the queue before the user's package
        "standardQueueUser": 63, // the number of standard packages in the queue before the user's package
        "standardWhenJoined": 150, // the number of standard packages in the queue when the user's package joined the queue
        "premiumWhenJoined": 10 // the number of premium packages in the queue when the user's package joined the queue
    }
}
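
One way a client might turn processingQueuePosition into a progress indicator is to compare the number of packages currently ahead with the queue size at the time of joining. This interpretation, and the assumption that only one of premiumQueueUser and standardQueueUser is non-null, are guesses based on the sample response; queue_progress is a hypothetical helper.

```python
from typing import Optional


def queue_progress(position: dict) -> Optional[float]:
    """Fraction of the queue already cleared ahead of the package, in [0, 1]."""
    # Assumption: upgraded packages report premiumQueueUser, others standardQueueUser.
    if position.get("premiumQueueUser") is not None:
        ahead, joined = position["premiumQueueUser"], position["premiumWhenJoined"]
    elif position.get("standardQueueUser") is not None:
        ahead, joined = position["standardQueueUser"], position["standardWhenJoined"]
    else:
        return None  # not queued, or position unknown
    if not joined:
        return 1.0  # the queue was empty when the package joined
    return max(0.0, min(1.0, 1 - ahead / joined))


sample = {
    "premiumQueueTotal": 20,
    "standardQueueTotal": 300,
    "premiumQueueUser": None,
    "standardQueueUser": 63,
    "standardWhenJoined": 150,
    "premiumWhenJoined": 10,
}
print(round(queue_progress(sample), 2))  # → 0.58
```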

Current error message codes:

UNKNOWN_PACKAGE_ID: the requested package does not exist in the database.
UNKNOWN_ERROR: an unknown error occurred on the server side. Please contact us on GitHub or Discord.
UNAUTHORIZED: the UPN KEY provided in the Authorization header is not valid.
EXPIRED_LINK: the link provided is a valid Discord Data Package link, but it has expired.

Available steps:

LOCKED: the package is locked, meaning it is waiting for a worker to process it. It can still be aborted by calling the DELETE endpoint.
DOWNLOADING: the package is being downloaded from Discord's servers.
ANALYZING: the package is being analyzed to determine the number of messages, channels, etc.
PROCESSED: the package has been processed and the data is available.
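
Since processing is asynchronous, a client typically polls the status endpoint until isDataAvailable or isErrored becomes true. The loop below is a minimal sketch: fetch_status stands for any callable that performs GET /process/<package_id>/status with the Authorization header and returns the decoded JSON, and the polling interval and attempt limit are arbitrary values chosen for the example.

```python
import time
from typing import Callable


def wait_for_package(fetch_status: Callable[[], dict],
                     interval: float = 5.0,
                     max_polls: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until data is available or an error is reported."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["isErrored"]:
            raise RuntimeError(status["errorMessageCode"])
        if status["isDataAvailable"]:
            return status
        sleep(interval)
    raise TimeoutError("package still processing after max_polls attempts")


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([
    {"isErrored": False, "isDataAvailable": False, "errorMessageCode": None},
    {"isErrored": False, "isDataAvailable": True, "errorMessageCode": None},
])
final = wait_for_package(lambda: next(canned), sleep=lambda seconds: None)
print(final["isDataAvailable"])  # → True
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP library and easy to test.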

Fetch a package data

GET /process/<package_id>/blob

Returns a short-lived presigned S3 URL from which the client downloads the encrypted SQLite database directly. Decryption happens client-side using the UPN KEY as the key material, so the encryption key never reaches the server.

Response:

{
    "url": "https://<bucket>.s3.<region>.amazonaws.com/...",
    "iv": "abc123...",
    "ttl": 300
}

iv is a hex string for real packages, null for the demo (the demo blob is unencrypted). Decrypt with AES-CBC using SHA-256(UPN) as the key and the returned IV; the plaintext is a gzipped SQLite file.
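
The decryption step could look roughly like this in Python. It is a sketch under several assumptions: the UPN KEY is encoded as UTF-8 before hashing, the ciphertext uses PKCS7 padding (the padding scheme is not stated here), and the third-party cryptography package is installed. derive_key and decrypt_blob are illustrative names.

```python
import gzip
import hashlib


def derive_key(upn_key: str) -> bytes:
    """SHA-256 of the UPN KEY, used as a 32-byte AES key (UTF-8 assumed)."""
    return hashlib.sha256(upn_key.encode("utf-8")).digest()


def decrypt_blob(ciphertext: bytes, upn_key: str, iv_hex: str) -> bytes:
    """Decrypt an AES-CBC blob and gunzip it into raw SQLite bytes."""
    # Third-party dependency, imported lazily: pip install cryptography
    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    iv = bytes.fromhex(iv_hex)
    decryptor = Cipher(algorithms.AES(derive_key(upn_key)), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    # PKCS7 padding is an assumption; adjust if the server pads differently.
    unpadder = padding.PKCS7(128).unpadder()
    compressed = unpadder.update(padded) + unpadder.finalize()
    return gzip.decompress(compressed)


print(len(derive_key("EXAMPLE_KEY")))  # → 32
```

When iv is null (the demo blob), the decryption step is skipped because that blob is not encrypted.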

Status codes:

200: presigned URL returned.
401: the UPN KEY provided in the Authorization header is not valid.
404: no blob for this package id.
501: server doesn't have the S3 backend wired up.

Demo (/process/demo/blob) is unauthenticated and lazy-seeds the blob on the first call after deploy.

Fetch a package user

GET /process/<package_id>/user/<user_id>

Response:

{
    "avatar_url": "https://cdn.discordapp.com/avatars/422820341791064085/af0c1960a90d98e69bce68d206b56c9a.png",
    "display_name": "Androz",
    "user_id": "422820341791064085"
}

Status codes:

200: the data is available and has been returned.
401: the UPN KEY provided in the Authorization header is not valid, or the package does not exist.
404: unknown user ID.
500: an error occurred while fetching the data (can often happen).
429: you are being rate limited. Wait 500ms and send the request again.
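
The 429 guidance above can be wrapped in a small retry helper. In this sketch, send is any callable that performs the request and returns a (status_code, body) pair; the helper name and the attempt limit are assumptions, while the 500 ms delay follows the recommendation above.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(send: Callable[[], Tuple[int, dict]],
                     max_attempts: int = 5,
                     delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send() again after `delay` seconds while it returns HTTP 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(delay)
    # Still rate limited after max_attempts: hand the last response back.
    return status, body


# Demonstration with canned responses instead of real HTTP calls.
canned = iter([(429, {}), (200, {"display_name": "Androz"})])
status, body = fetch_with_retry(lambda: next(canned), sleep=lambda seconds: None)
print(status, body["display_name"])  # → 200 Androz
```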

Delete a package (and abort the processing)

DELETE /process/<package_id>

Response:

{
    "isDeleted": true, // whether or not the package has been deleted
    "errorMessageCode": null // if an error occurs, the error message code will show up here
}

Current error message codes:

UNKNOWN_PACKAGE_ID: the package you are trying to delete does not exist in the database.
UNAUTHORIZED: the UPN KEY provided in the Authorization header is not valid.

SQLite Database Documentation can be found here

Troubleshooting

API server is crashing and says that Postgres is not supported. Make sure that your PostgreSQL server URL starts with postgresql:// and not postgres://, which is no longer supported by SQLAlchemy.
