20 commits
c0730ef
docs: focus coding-agents guide on Claude Code with translation proxy
typhoonzero Jun 9, 2026
b0becaf
docs: fix coding agent inference guide
typhoonzero Jun 9, 2026
7871e1f
docs: remove non-existent --default-chat-template-kwargs flag
typhoonzero Jun 9, 2026
285e68d
docs: clarify vllm reasoning effort support
typhoonzero Jun 9, 2026
b18b5cd
docs: refine agentic mlops tuning guidance
typhoonzero Jun 9, 2026
79d27c9
Merge branch 'master' of https://github.com/alauda/aml-docs into docs…
typhoonzero Jun 10, 2026
3c79b62
docs: fix lint error in pipelines-mlflow-integration guide
Jun 12, 2026
a6d351b
Merge branch 'master' of https://github.com/alauda/aml-docs into docs…
typhoonzero Jun 15, 2026
ddff8e5
update
typhoonzero Jun 15, 2026
02774a4
docs: rewrite KFP+MLflow integration guide to a cluster-verified example
typhoonzero Jun 15, 2026
595ea04
docs: align MLflow auth with the kubernetes-auth plugin's canonical m…
typhoonzero Jun 15, 2026
76eef44
docs: frame MLflow auth as the kubernetes-auth user_identity_token flow
typhoonzero Jun 15, 2026
4536ae9
docs: use only the user_identity_token method; add user-identity smok…
typhoonzero Jun 15, 2026
03ea72d
docs: add MLflow Python SDK auth + RBAC guide
typhoonzero Jun 15, 2026
deb8677
docs: SDK guide authenticates through the OAuth proxy (no app-port ac…
typhoonzero Jun 15, 2026
b627b4a
docs: use the Dex refresh-token grant for headless MLflow auth
typhoonzero Jun 15, 2026
6ee45a7
docs: use the Dex password grant (ROPC) for headless MLflow auth
typhoonzero Jun 15, 2026
ad96d01
docs: make ROPC (username/password) the primary MLflow SDK auth method
typhoonzero Jun 16, 2026
23a816d
docs: in-cluster Service URL + MLflow client for pipelines
typhoonzero Jun 16, 2026
cdf097c
docs: cross-reference the MLflow SDK auth guide from training guides
typhoonzero Jun 16, 2026
135 changes: 135 additions & 0 deletions docs/en/kubeflow/how_to/mlflow-python-sdk.mdx
@@ -0,0 +1,135 @@
---
weight: 46
---

# Using the MLflow Python SDK with Authentication and RBAC

On Alauda AI the [MLflow Tracking Server](./mlflow.mdx) runs behind single sign-on and multi-tenancy: an OAuth proxy authenticates every caller, and the server records each run under the calling user and authorizes it against Kubernetes RBAC. This guide shows how to drive the stock **MLflow Python SDK** through that OAuth proxy with your own identity, using the OAuth2 **password grant** to obtain a token from a username and password — no browser, and never the MLflow container port.

## Platform setup (administrator, one-time) \{#platform-setup-administrator-one-time}

The password grant needs two settings, which an administrator enables once:

- **Accept bearer tokens at the proxy.** Add `--skip-jwt-bearer-tokens=true` to the MLflow OAuth proxy so it accepts a Dex OIDC token alongside browser sessions:

```yaml
# MLflow plugin values
auth:
  oauth:
    extraArgs:
      - --skip-jwt-bearer-tokens=true
```

- **Allow the password grant.** Dex must have the password connector enabled (`enablePasswordDB: true`), and the OAuth client you authenticate with must list `password` in its `grantTypes`. Register a **dedicated** client for this rather than the platform's interactive-login client.

## Prerequisites

- `mlflow` **3.10 or later** (`pip install "mlflow>=3.10"`). Workspace selection (`mlflow.set_workspace`) is a 3.10+ feature.
- A platform **username and password** — ideally a dedicated service account, not a person's login — that can access the target workspace (see [Workspace Access](./mlflow.mdx)).
- The Dex **client id and secret** allowed to use the password grant (from your administrator).

## How authentication works

Two layers sit in front of your runs:

1. The **OAuth proxy** (`oauth2-proxy`) authenticates the request. With `--skip-jwt-bearer-tokens`, it accepts a Dex-issued OIDC **id token** sent as `Authorization: Bearer …`.
2. The MLflow server's `kubernetes-auth` plugin reads your identity from that token, records it as the run **owner**, and authorizes it against your Kubernetes permissions in the workspace.

The client always goes through the OAuth proxy — never connect to the MLflow container port directly.
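
To make the two layers concrete, here is a minimal sketch of the headers that end up on each request. The SDK builds these for you; the helper name `auth_headers` and the token value are hypothetical and only illustrate the header names this guide mentions.

```python
def auth_headers(id_token: str, workspace: str) -> dict:
    """Return the two headers described above (illustration only)."""
    return {
        # Layer 1: the OAuth proxy validates this bearer token.
        "Authorization": f"Bearer {id_token.strip()}",
        # Layer 2: the server authorizes the caller inside this workspace.
        "X-MLFLOW-WORKSPACE": workspace,
    }


# Placeholder token with a trailing newline, which .strip() removes.
print(auth_headers("example-token\n", "team-a"))
```

Stripping the token here mirrors the advice later in this guide: stray whitespace in a header value makes the HTTP client reject the request.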

## Connect the SDK

### 1. Mint an id token with the password grant

Exchange the username and password for a Dex **id token** in a single call (no browser, no cookie):

```bash
export ID_TOKEN=$(curl -sk "https://<platform>/dex/token" \
-d grant_type=password \
--data-urlencode "username=$MLFLOW_USERNAME" \
--data-urlencode "password=$MLFLOW_PASSWORD" \
-d scope="openid email groups" \
-d client_id="$DEX_CLIENT_ID" --data-urlencode "client_secret=$DEX_CLIENT_SECRET" \
| jq -r .id_token)
```
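
If the job has no `curl` or `jq`, the same exchange can be sketched with the Python standard library. This is a sketch under assumptions, not part of the platform tooling: the function names are hypothetical, the form fields simply mirror the `curl` call above, and unlike `curl -k` this version keeps TLS verification on.

```python
import json
import urllib.parse
import urllib.request


def password_grant_form(username: str, password: str,
                        client_id: str, client_secret: str) -> bytes:
    """URL-encode the same form fields the curl command sends."""
    fields = {
        "grant_type": "password",
        "username": username,
        "password": password,
        "scope": "openid email groups",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def mint_id_token(token_url: str, form: bytes) -> str:
    """POST the form to the Dex token endpoint and return the stripped id token."""
    request = urllib.request.Request(token_url, data=form, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id_token"].strip()


# Usage (reads the same variables as the shell example):
#   form = password_grant_form(os.environ["MLFLOW_USERNAME"], os.environ["MLFLOW_PASSWORD"],
#                              os.environ["DEX_CLIENT_ID"], os.environ["DEX_CLIENT_SECRET"])
#   os.environ["MLFLOW_TRACKING_TOKEN"] = mint_id_token("https://<platform>/dex/token", form)
```

Building the form separately from sending it keeps the credential handling easy to check without contacting Dex.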

### 2. Point the SDK at the MLflow route with the token

The SDK reads `MLFLOW_TRACKING_TOKEN` and sends it as `Authorization: Bearer …`:

```python
import os
import mlflow

os.environ["MLFLOW_TRACKING_TOKEN"] = os.environ["ID_TOKEN"].strip() # → Authorization: Bearer
mlflow.set_tracking_uri("http://mlflow-tracking-server.kubeflow:5000") # in-cluster Service (fronted by the OAuth proxy)
mlflow.set_workspace("team-a") # workspace namespace → X-MLFLOW-WORKSPACE
mlflow.set_experiment("my-experiment")

with mlflow.start_run(run_name="sdk-quickstart") as run:
    mlflow.log_param("learning_rate", 2e-4)
    mlflow.log_metric("loss", 0.123)
    print("run:", run.info.run_id)
```

The run appears under **Alauda AI → Tools → MLFlow**, owned by the username you authenticated as. (Verified end-to-end on a secured install: the run owner is the token's user identity.)

Use the in-cluster Service URL `http://mlflow-tracking-server.kubeflow:5000` when the client runs **inside** the cluster (pipeline components, Workbench notebooks). From **outside** the cluster, point at the platform route `https://<platform>/clusters/<cluster>/mlflow` instead — both reach the same OAuth proxy.

:::warning
The password grant sends the password to the token endpoint, so use a **dedicated service account** and keep the credentials and client secret in a Kubernetes `Secret`, never in code. Always `.strip()` the token (a trailing newline produces `Invalid … character(s) in header value: 'Bearer …\n'`). id tokens expire (24 h by default), so re-run step 1 to refresh for long-running jobs. If you use the external HTTPS route and the platform certificate is not trusted by your machine, set `MLFLOW_TRACKING_INSECURE_TLS=true`.
:::
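
For long-running jobs, one way to decide when to re-run step 1 is to read the token's `exp` claim. The sketch below assumes the id token is a standard three-part JWT. It decodes the payload without verifying the signature, so it is suitable for inspection only, and the helper names and the five-minute margin are hypothetical choices.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.strip().split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def needs_refresh(token: str, margin_seconds: int = 300,
                  now: Optional[float] = None) -> bool:
    """True when the token expires within margin_seconds of now."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] - current <= margin_seconds
```

The same `jwt_claims` helper also exposes `aud`, which is useful when checking that the token was issued for the proxy's client.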

## Selecting a workspace

Runs are recorded in the workspace you select; if you select none, the server's default workspace is used. Any of these set it (the SDK turns them into the `X-MLFLOW-WORKSPACE` header):

- `mlflow.set_workspace("team-a")` in code,
- the `MLFLOW_WORKSPACE=team-a` environment variable.

You can only use a workspace your account has access to; see [Workspace Access](./mlflow.mdx).

## Registering models

The model registry is workspace-scoped and authorized the same way, so the usual SDK calls work once connected:

```python
mlflow.set_workspace("team-a")
with mlflow.start_run():
    # sk_model is assumed to be a scikit-learn estimator you have already fitted.
    mlflow.sklearn.log_model(sk_model, name="model", registered_model_name="fraud-detector")
```

Promote the registered version to **Staging** or **Production** from the MLflow UI.

## Interactive alternative: browser session

If you cannot use the password grant (for example you only have an interactive SSO login), present your browser session instead — this works without the `--skip-jwt-bearer-tokens` setting. Sign in at **Alauda AI → Tools → MLFlow**, copy the `_oauth2_proxy` cookie from the browser developer tools (**Application/Storage → Cookies**; include any `_oauth2_proxy_N` chunks, joined with `; `), and attach it to every request with a header provider:

```python
import os, mlflow
from mlflow.tracking.request_header.abstract_request_header_provider import RequestHeaderProvider
from mlflow.tracking.request_header.registry import _request_header_provider_registry

class ProxySessionHeader(RequestHeaderProvider):
    def in_context(self):
        # Active only when the cookie is exported:
        #   export MLFLOW_PROXY_COOKIE='_oauth2_proxy=<value>'
        return bool(os.environ.get("MLFLOW_PROXY_COOKIE"))

    def request_headers(self):
        return {"Cookie": os.environ["MLFLOW_PROXY_COOKIE"]}

_request_header_provider_registry.register(ProxySessionHeader)
mlflow.set_tracking_uri("https://<platform>/clusters/<cluster>/mlflow")
mlflow.set_workspace("team-a")
```

The session cookie expires — copy a fresh one when calls start returning a login redirect.
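
When the session is split into `_oauth2_proxy_N` chunks, assembling the `MLFLOW_PROXY_COOKIE` value by hand is error-prone. The following is a minimal sketch, with a hypothetical helper name, that joins the base cookie and any numbered chunks in order from a name-to-value mapping copied out of the browser.

```python
import re


def join_proxy_cookies(cookies: dict, name: str = "_oauth2_proxy") -> str:
    """Join the proxy cookie and its numbered chunks with '; ', ignoring other cookies."""
    chunk_pattern = re.compile(rf"^{re.escape(name)}_(\d+)$")
    parts = []
    if name in cookies:
        parts.append(f"{name}={cookies[name]}")
    numbered = []
    for key in cookies:
        match = chunk_pattern.match(key)
        if match:
            numbered.append((int(match.group(1)), key))
    # Sort by chunk number so _10 follows _9 rather than _1.
    for _, key in sorted(numbered):
        parts.append(f"{key}={cookies[key]}")
    return "; ".join(parts)
```

The result can be exported as `MLFLOW_PROXY_COOKIE` for the header provider above.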

## Troubleshooting

| Symptom | Check |
|---------|-------|
| `/dex/token` returns `unsupported_grant_type` / "password grant … not allowed" | The Dex client does not permit the password grant. Use a client whose `grantTypes` include `password` (see [Platform setup](#platform-setup-administrator-one-time)). |
| Call returns HTML or a redirect (`302` to the login page) | The OAuth proxy rejected the bearer token. Confirm `--skip-jwt-bearer-tokens` is enabled and the token is a valid Dex id token (`aud` = the proxy's client). For the cookie alternative, your `_oauth2_proxy` value is missing or expired. |
| `Invalid … character(s) in header value: 'Bearer …\n'` | The token has trailing whitespace. Set `MLFLOW_TRACKING_TOKEN` to the `.strip()`-ed value. |
| `Failed to query /api/3.0/mlflow/server-info` | The SDK could not reach the server through the proxy. Verify that the tracking URI is the in-cluster Service URL (inside the cluster) or the platform MLflow route (outside it), and that the token is valid. |
| `403 PERMISSION_DENIED` | Your account lacks access to the workspace namespace. Request access to the workspace (see [Workspace Access](./mlflow.mdx)); no ServiceAccount is involved. |
| Run shows the wrong owner or workspace | The owner is your authenticated identity; the workspace is `set_workspace()` / `MLFLOW_WORKSPACE` (else the server default). Check both. |
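
The first checks in the table can be sketched as a small triage helper for raw HTTP responses. This is an illustration only: the function name, the set of redirect codes, and the returned hint strings are assumptions, and the mapping covers just the redirect/HTML and `403` rows above.

```python
def diagnose(status: int, content_type: str = "") -> str:
    """Map an HTTP status and Content-Type to a hint from the troubleshooting table."""
    if status in (301, 302, 303, 307, 308) or "text/html" in content_type.lower():
        return ("proxy rejected the credentials: check --skip-jwt-bearer-tokens, "
                "the token's aud, or the session cookie")
    if status == 403:
        return "no access to the workspace namespace: request workspace access"
    return "no match in the troubleshooting table"
```
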
2 changes: 2 additions & 0 deletions docs/en/kubeflow/how_to/mlflow.mdx
@@ -69,6 +69,8 @@ subjects:

## Client Configuration

For authenticating the MLflow Python SDK with a user identity token — including the in-cluster connection details and RBAC — see [Using the MLflow Python SDK with Authentication and RBAC](./mlflow-python-sdk.mdx).

Set the MLflow tracking URI to the platform route and select the workspace:

```python
12 changes: 2 additions & 10 deletions docs/en/training_guides/fine-tune-with-trainer-v2.ipynb
@@ -947,15 +947,7 @@
"cell_type": "markdown",
"id": "27d2b476",
"metadata": {},
"source": [
"## Step 5: View Training Metrics in MLflow\n",
"\n",
"If `MLFLOW_TRACKING_URI` is set and the MLflow server is reachable from the training pod, LlamaFactory will log metrics (loss, learning rate, etc.) to MLflow automatically via `report_to: mlflow` in the training config.\n",
"\n",
"To open the MLflow UI, go to **Alauda AI** - **Tools** - **MLFlow** (need MLFlow Cluster plugin installed). Look for the experiment named by `MLFLOW_EXPERIMENT_NAME`.\n",
"\n",
"Each `TrainJob` run will appear as a separate MLflow **run** under the same experiment, making it easy to compare training curves across different models and hyperparameters."
]
"source": "## Step 5: View Training Metrics in MLflow\n\nIf `MLFLOW_TRACKING_URI` is set and the MLflow server is reachable from the training pod, LlamaFactory will log metrics (loss, learning rate, etc.) to MLflow automatically via `report_to: mlflow` in the training config.\n\nOn a secured (SSO + multi-tenant) MLflow install the trainer must also authenticate: set `MLFLOW_TRACKING_TOKEN` and select a workspace. See [Using the MLflow Python SDK with Authentication and RBAC](../kubeflow/how_to/mlflow-python-sdk.mdx) for how to obtain the token and how authorization/RBAC work.\n\nTo open the MLflow UI, go to **Alauda AI** - **Tools** - **MLFlow** (requires the MLFlow Cluster plugin to be installed). Look for the experiment named by `MLFLOW_EXPERIMENT_NAME`.\n\nEach `TrainJob` run will appear as a separate MLflow **run** under the same experiment, making it easy to compare training curves across different models and hyperparameters."
},
{
"cell_type": "markdown",
@@ -1060,4 +1052,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
6 changes: 4 additions & 2 deletions docs/en/training_guides/fine-tuning-using-notebooks.mdx
@@ -325,7 +325,9 @@ After success the merged model is pushed to a date-stamped branch (`sft-YYYYMMDD

## 8. Experiment tracking

Setting `report_to: mlflow` in the LLaMA-Factory config plus the `MLFLOW_TRACKING_URI` / `MLFLOW_EXPERIMENT_NAME` env vars routes metrics to MLflow. Find runs in **Alauda AI → Advanced → MLFlow**, compare loss curves, and pin the winning run.
Setting `report_to: mlflow` in the LLaMA-Factory config plus the `MLFLOW_TRACKING_URI` / `MLFLOW_EXPERIMENT_NAME` env vars routes metrics to MLflow. Find runs in **Alauda AI → Tools → MLFlow**, compare loss curves, and pin the winning run.

On a secured (SSO + multi-tenant) MLflow install the job must also authenticate — supply an `MLFLOW_TRACKING_TOKEN` and select a workspace. See [Using the MLflow Python SDK with Authentication and RBAC](../kubeflow/how_to/mlflow-python-sdk.mdx) for how to obtain the token and configure the client.

## 9. Publish the fine-tuned model

@@ -412,4 +414,4 @@ spec:

### Experiment tracking on other devices

LLaMA-Factory and Transformers integrate with MLflow / wandb directly. Set the destination in the framework config (e.g. `report_to: mlflow` for LLaMA-Factory) and supply `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` env vars. View results under **Alauda AI → Advanced → MLFlow**.
LLaMA-Factory and Transformers integrate with MLflow / wandb directly. Set the destination in the framework config (e.g. `report_to: mlflow` for LLaMA-Factory) and supply `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` env vars (plus `MLFLOW_TRACKING_TOKEN` on a secured install — see [Using the MLflow Python SDK with Authentication and RBAC](../kubeflow/how_to/mlflow-python-sdk.mdx)). View results under **Alauda AI → Tools → MLFlow**.