25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Reserved the `sub::` marker for runtime-generated sub-orchestration instance ids.
`Client::start_orchestration` and `Client::start_orchestration_versioned` now
return `ClientError::InvalidInput` for root instance ids that start with `sub::`
or contain `::sub::`; other uses of `::` remain supported. Applications that used
the reserved marker in root instance ids must rename those ids before upgrading.
See [docs/migration-guide.md](docs/migration-guide.md) for guidance.
- **`ctx.new_guid()` now returns a standard UUID v4.** The previous
implementation derived the value from `SystemTime::now()` nanoseconds plus a
thread-local counter, which produced low-entropy, structured values (the
@@ -19,6 +25,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
`nanos + process id`, removing a predictable-token pattern in work-item
ownership checks.

### Fixed

- **Parent hang on sub-orchestration instance-id collision** — When an auto-generated
child instance id already named a terminal instance, the scheduling parent could await
a completion that never arrived. The runtime now notifies the parent with a
sub-orchestration failure so it fails fast. The parent execution that scheduled the child
is stamped onto the child start at schedule time and persisted in the child's
`OrchestrationStarted` event, so the failure (and all sub-orchestration completion/failure
notifications) is routed to exactly that parent execution. This is correct across runtime
restarts and multiple dispatcher nodes, and avoids a TOCTOU window where the parent's
*current* execution at completion time could differ from the execution that scheduled the
child. When the stamp is absent (children started by an older runtime, or work items from
before this change), routing falls back to a durable provider read, keeping mixed-version
clusters correct during rolling upgrades.
- **Sub-orchestration id reuse across continue-as-new** — Child instance ids generated
after a parent `continue_as_new` now include the parent execution id
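(`{parent}::sub::{execution_id}_{event_id}`). Because event ids reset on `continue_as_new`,
a parent that schedules a child at the same position on each iteration would otherwise
regenerate the id of the previous iteration's terminal child and collide with it.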

## [0.1.29] - 2026-05-08

**Release:** <https://crates.io/crates/duroxide/0.1.29>
52 changes: 52 additions & 0 deletions docs/migration-guide.md
@@ -2,6 +2,58 @@

This guide helps you migrate between Duroxide versions and handle orchestration versioning.

## Reserved `sub::` instance-id marker (Unreleased)

The `sub::` marker is now reserved for runtime-generated sub-orchestration instance ids.
`Client::start_orchestration` and `Client::start_orchestration_versioned` reject root
instance ids that:

- start with `sub::`, or
- contain the `::sub::` infix.

Such ids return `ClientError::InvalidInput`. Ordinary uses of `::` in instance ids remain
valid (e.g. `tenant-7::order-42`); only the `sub::` marker is reserved.

This prevents a root instance id from pre-occupying an auto-generated child id. Child
sub-orchestration ids take the form `{parent}::sub::{event_id}` on the first parent
execution and `{parent}::sub::{execution_id}_{event_id}` after `continue_as_new`.
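
The two id shapes above can be sketched as a small standalone function. This is an illustrative sketch only, not the crate's code: `child_instance_id` is a hypothetical name, and the value `1` for `INITIAL_EXECUTION_ID` is an assumption (the crate defines its own constant).

```rust
/// Marker that prefixes auto-generated child suffixes.
const SUB_ORCH_AUTO_PREFIX: &str = "sub::";

/// ASSUMPTION: the first execution is numbered 1. The real constant lives in the crate.
const INITIAL_EXECUTION_ID: u64 = 1;

/// Hypothetical helper: build a child id from the parent id, the parent's
/// execution id, and the id of the scheduling event.
fn child_instance_id(parent: &str, execution_id: u64, event_id: u64) -> String {
    let suffix = if execution_id == INITIAL_EXECUTION_ID {
        // First execution keeps the short form.
        format!("{SUB_ORCH_AUTO_PREFIX}{event_id}")
    } else {
        // Later executions add the execution id so that ids differ
        // even when the event id repeats after continue-as-new.
        format!("{SUB_ORCH_AUTO_PREFIX}{execution_id}_{event_id}")
    };
    format!("{parent}::{suffix}")
}
```

Under these assumptions, the same `event_id` on two different parent executions yields two distinct child ids, which is why a root id containing `::sub::` could otherwise pre-occupy one of them.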

Before upgrading client code, audit your root instance-id scheme for the reserved marker:

```text
# Rejected: starts with `sub::` or contains `::sub::`
sub::job-1
tenant-7::sub::order-42

# Accepted: ordinary `::` is fine
tenant-7::order-42
order-2026-06-09
```

Rename any root instance ids that use the reserved marker before upgrading.
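
For the audit step, a small helper along these lines may be enough to scan an existing list of root ids. It is a sketch that restates the two rules from this section; `uses_reserved_marker` and `ids_to_rename` are hypothetical names, not part of the Duroxide API.

```rust
/// Hypothetical helper: true if a root instance id would be rejected
/// under the rules described in this section.
fn uses_reserved_marker(instance: &str) -> bool {
    instance.starts_with("sub::") || instance.contains("::sub::")
}

/// Hypothetical helper: collect the ids that need renaming before upgrading.
fn ids_to_rename<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    ids.iter()
        .copied()
        .filter(|id| uses_reserved_marker(id))
        .collect()
}
```

Running `ids_to_rename` over the ids an application has issued (for example, from its own id-generation tests) gives the set to rename; ids that only use `::` as a separator are left alone.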

## Durable sub-orchestration routing (`parent_execution_id`)

Sub-orchestration completion and failure notifications are now routed to the exact parent
execution that scheduled the child. To do this, the scheduling parent's execution id is
stamped onto the child's start and persisted in the child's history:

- `WorkItem::StartOrchestration` gains an optional `parent_execution_id` field.
- `EventKind::OrchestrationStarted` gains an optional `parent_execution_id` field.

Both fields are `Option<u64>`, serialized with `#[serde(default, skip_serializing_if = "Option::is_none")]`,
so the wire and history formats remain backward compatible:

- **Old → new:** A new runtime reading an old child history (or an old work item) sees
`parent_execution_id = None` and falls back to a durable provider read of the parent's
current execution — the previous behavior.
- **New → old:** An old runtime ignores the extra field when it is present. When the value is
`None`, the field is omitted from the serialized form, so the payload matches the old format.

No action is required to upgrade. Mixed-version clusters route correctly during a rolling
upgrade. The provider-read fallback is retained only for histories and work items that lack
the stamp: those created before this change, or by an older runtime during a rolling upgrade.
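
The routing rule described above (use the stamp when present, otherwise fall back to a provider read) can be sketched as follows. This is an illustration of the decision only, not the runtime's code: `resolve_parent_execution` is a hypothetical name, and the closure stands in for the durable provider read.

```rust
/// Hypothetical sketch: pick the parent execution that should receive a
/// child's completion or failure notification.
///
/// `stamped` models the optional `parent_execution_id` persisted with the child.
/// `read_current_execution` stands in for the durable provider read and is
/// only invoked when the stamp is absent.
fn resolve_parent_execution(
    stamped: Option<u64>,
    read_current_execution: impl FnOnce() -> u64,
) -> u64 {
    stamped.unwrap_or_else(read_current_execution)
}
```

Because the fallback is lazy, a stamped child never triggers the provider read, which matches the intent that the fallback only serves unstamped histories and work items.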

## Orchestration Versioning

Duroxide supports versioning to handle code evolution while maintaining compatibility with running instances.
36 changes: 34 additions & 2 deletions src/client/mod.rs
@@ -168,6 +168,25 @@ pub struct Client {
store: Arc<dyn Provider>,
}

/// Reject instance ids that collide with the reserved sub-orchestration markers.
///
/// Child sub-orchestration instance ids reserve the `sub::` marker (see
/// [`crate::auto_sub_orch_suffix`], the canonical formatter). The first parent
/// execution uses `{parent}::sub::{event_id}`; executions after continue-as-new use
/// `{parent}::sub::{execution_id}_{event_id}`. A user-supplied id matching either form
/// could pre-occupy a future child id, so the `sub::` prefix and `::sub::` infix are
/// reserved. Other uses of `::` remain valid.
fn validate_instance_id(instance: &str) -> Result<(), ClientError> {
if instance.starts_with(crate::SUB_ORCH_AUTO_PREFIX) || instance.contains("::sub::") {
return Err(ClientError::InvalidInput {
message: format!(
"instance id '{instance}' uses the reserved sub-orchestration marker 'sub::'"
),
});
}
Ok(())
}

impl Client {
/// Create a client bound to a Provider instance.
///
@@ -211,6 +230,9 @@ impl Client {
/// - Must be unique across all orchestrations
/// - Can be any string except as noted below (alphanumeric + hyphens recommended)
/// - Reusing an instance ID that already exists will fail
/// - Must not use the reserved sub-orchestration marker `sub::` (as a prefix
/// or in the `::sub::` form); these are reserved for auto-generated child
/// instance ids. Such ids are rejected with [`ClientError::InvalidInput`].
///
/// # Example
///
@@ -230,20 +252,25 @@ impl Client {
///
/// # Errors
///
/// Returns `ClientError::InvalidInput` if the instance id uses the reserved
/// `sub::` marker.
/// Returns `ClientError::Provider` if the provider fails to enqueue the orchestration.
pub async fn start_orchestration(
&self,
instance: impl Into<String>,
orchestration: impl Into<String>,
input: impl Into<String>,
) -> Result<(), ClientError> {
let instance = instance.into();
validate_instance_id(&instance)?;
let item = WorkItem::StartOrchestration {
- instance: instance.into(),
+ instance,
orchestration: orchestration.into(),
input: input.into(),
version: None,
parent_instance: None,
parent_id: None,
parent_execution_id: None,
execution_id: crate::INITIAL_EXECUTION_ID,
};
self.store
@@ -256,6 +283,8 @@ impl Client {
///
/// # Errors
///
/// Returns `ClientError::InvalidInput` if the instance id uses the reserved
/// `sub::` marker.
/// Returns `ClientError::Provider` if the provider fails to enqueue the orchestration.
pub async fn start_orchestration_versioned(
&self,
@@ -264,13 +293,16 @@ impl Client {
version: impl Into<String>,
input: impl Into<String>,
) -> Result<(), ClientError> {
let instance = instance.into();
validate_instance_id(&instance)?;
let item = WorkItem::StartOrchestration {
- instance: instance.into(),
+ instance,
orchestration: orchestration.into(),
input: input.into(),
version: Some(version.into()),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
execution_id: crate::INITIAL_EXECUTION_ID,
};
self.store
46 changes: 45 additions & 1 deletion src/lib.rs
@@ -873,6 +873,24 @@ pub fn is_auto_generated_sub_orch_id(instance: &str) -> bool {
instance.starts_with(SUB_ORCH_AUTO_PREFIX)
}

/// Build the auto-generated sub-orchestration suffix for a given parent execution
/// and scheduling event id.
///
/// The first execution uses `sub::{event_id}` for backward compatibility. Later
/// executions (after `continue_as_new`) include the execution id as
/// `sub::{execution_id}_{event_id}`: event ids reset on continue-as-new, so a parent
/// that schedules a sub-orchestration at the same position on each iteration would
/// otherwise regenerate an identical child id and collide with the now-terminal
/// child from the previous iteration.
#[inline]
pub(crate) fn auto_sub_orch_suffix(execution_id: u64, event_id: u64) -> String {
if execution_id == INITIAL_EXECUTION_ID {
format!("{SUB_ORCH_AUTO_PREFIX}{event_id}")
} else {
format!("{SUB_ORCH_AUTO_PREFIX}{execution_id}_{event_id}")
}
}

/// Build the full child instance ID, adding parent prefix only for auto-generated IDs.
///
/// - Auto-generated IDs (starting with "sub::"): `{parent}::{child}` (e.g., `parent-1::sub::5`)
@@ -1099,6 +1117,13 @@ pub enum EventKind {
input: String,
parent_instance: Option<String>,
parent_id: Option<u64>,
/// Execution id of the parent that scheduled this sub-orchestration, persisted at
/// child-start time. Used to route this child's completion/failure back to the
/// exact parent execution that awaited it. `None` for root orchestrations and for
/// children started by older runtimes (routing then falls back to a provider read).
#[serde(skip_serializing_if = "Option::is_none")]
#[serde(default)]
parent_execution_id: Option<u64>,
/// Persistent events carried forward from the previous execution during continue-as-new.
/// Present only on CAN-initiated executions for audit trail. Each tuple is (event_name, data).
#[serde(skip_serializing_if = "Option::is_none")]
@@ -3732,7 +3757,22 @@ impl OrchestrationContext {
/// without any parent prefix. Use this when you need to control the exact
/// instance ID for the sub-orchestration.
///
- /// For auto-generated instance IDs, use [`schedule_sub_orchestration`] instead.
+ /// For auto-generated instance IDs, use [`schedule_sub_orchestration`](Self::schedule_sub_orchestration)
+ /// instead.
///
/// # Reserved marker (advanced escape hatch)
///
/// Unlike [`crate::Client::start_orchestration`], explicit child ids are **not**
/// validated against the reserved `sub::` marker — they are an advanced escape hatch
/// where the caller owns the full id space. Two consequences to be aware of:
///
/// - An explicit id of the runtime-generated shape (e.g. `parent::sub::2`) is allowed
/// and may therefore collide with an auto-generated child id. The runtime defends
/// against the resulting collision: if the id already names a terminal instance the
/// scheduling parent receives a sub-orchestration failure instead of hanging.
/// - An explicit id that *starts with* `sub::` is treated as auto-generated by
/// [`crate::build_child_instance_id`] and therefore gets the parent prefix added,
/// so it is **not** used verbatim. Avoid leading `sub::` in explicit ids.
pub fn schedule_sub_orchestration_with_id(
&self,
name: impl Into<String>,
@@ -3760,6 +3800,10 @@ impl OrchestrationContext {
/// The provided `instance` value is used exactly as the child instance ID,
/// without any parent prefix.
///
/// Like [`schedule_sub_orchestration_with_id`](Self::schedule_sub_orchestration_with_id),
/// explicit child ids are an advanced escape hatch and are **not** validated against the
/// reserved `sub::` marker; see that method for the collision and leading-`sub::` caveats.
///
/// Returns a [`DurableFuture`] that supports cancellation on drop. If the future
/// is dropped without completing, a `CancelInstance` work item will be enqueued
/// for the child orchestration.
6 changes: 6 additions & 0 deletions src/provider_validation/atomicity.rs
@@ -37,6 +37,7 @@ pub async fn test_atomicity_failure_rollback<F: ProviderFactory>(factory: &F) {
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
@@ -80,6 +81,7 @@ pub async fn test_atomicity_failure_rollback<F: ProviderFactory>(factory: &F) {
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
@@ -142,6 +144,7 @@ pub async fn test_multi_operation_atomic_ack<F: ProviderFactory>(factory: &F) {
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
@@ -327,6 +330,7 @@ pub async fn test_lock_released_only_on_successful_ack<F: ProviderFactory>(facto
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
@@ -402,6 +406,7 @@ pub async fn test_concurrent_ack_prevention<F: ProviderFactory>(factory: &F) {
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
@@ -430,6 +435,7 @@ pub async fn test_concurrent_ack_prevention<F: ProviderFactory>(factory: &F) {
input: "{}".to_string(),
parent_instance: None,
parent_id: None,
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},
2 changes: 2 additions & 0 deletions src/provider_validation/bulk_deletion.rs
@@ -351,6 +351,7 @@ async fn create_completed_instance_with_parent(
version: Some("1.0.0".to_string()),
parent_instance: Some(parent_id.to_string()),
parent_id: Some(1),
parent_execution_id: None,
execution_id: INITIAL_EXECUTION_ID,
},
Some(parent_id.to_string()),
@@ -378,6 +379,7 @@ async fn create_completed_instance_with_parent(
input: "{}".to_string(),
parent_instance: parent_instance_id.clone(),
parent_id: if parent_instance_id.is_some() { Some(1) } else { None },
parent_execution_id: None,
carry_forward_events: None,
initial_custom_status: None,
},