diff --git a/PARALLELIZATION_STRATEGY.md b/PARALLELIZATION_STRATEGY.md
index e888452c..59e7624d 100644
--- a/PARALLELIZATION_STRATEGY.md
+++ b/PARALLELIZATION_STRATEGY.md
@@ -15,8 +15,8 @@ using `concurrent.futures.ProcessPoolExecutor`.
 The entry point is the `n_jobs` constructor parameter (default `1`).
 
 ```
-n_jobs == 1  →  _optimize_sequential_run()   (single process, synchronous)
-n_jobs  > 1  →  _optimize_steady_state()     (multi-process, asynchronous)
+n_jobs == 1  →  optimize_sequential_run()   (single process, synchronous)
+n_jobs  > 1  →  optimize_steady_state()     (multi-process, asynchronous)
 ```
 
 ### 1.2 Steady-State Architecture
@@ -254,7 +254,7 @@ search tasks.  The surrogate model and all data are shared by reference.
 Memory footprint per active search task drops from O(surrogate size) to O(1).
 
 **Compatibility:** The public API (`n_jobs`, `optimize()`) does not change.
-The internal `_optimize_steady_state` method is refactored.
+The internal `optimize_steady_state` method is refactored.
 
 **Risk:** Low.  Thread safety must be verified for surrogate `predict()` calls
 made concurrently.  sklearn's `GaussianProcessRegressor.predict` is
@@ -333,7 +333,7 @@ computation inside `suggest_next_infill_point()` — no code change is required
 for that path.
 
 **Change:** Added GIL-status detection via a module-level helper and
-`contextlib.ExitStack`-based executor selection in `_optimize_steady_state`:
+`contextlib.ExitStack`-based executor selection in `optimize_steady_state`:
 
 ```python
 import sys
@@ -343,7 +343,7 @@ def _is_gil_disabled() -> bool:
     return not getattr(sys, "_is_gil_enabled", lambda: True)()
 ```
 
-Executor selection inside `_optimize_steady_state`:
+Executor selection inside `optimize_steady_state`:
 
 ```python
 from contextlib import ExitStack
diff --git a/_quarto.yml b/_quarto.yml
index e74d98e5..a610e52a 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -49,8 +49,12 @@ website:
         file: docs/examples.qmd
       - text: Download
         file: docs/download.qmd
-      - text: SpotOptim `__init__()` method
+      - text: Initialization
         file: docs/spotoptim_init.qmd
+      - text: Sequential Optimization
+        file: docs/optimize_seq.qmd
+      - text: Parallel Optimization
+        file: docs/optimize_parallel.qmd
       - section: API Reference
         contents:
           - text: Overview
diff --git a/docs/optimize_parallel.qmd b/docs/optimize_parallel.qmd
new file mode 100644
index 00000000..cefda30e
--- /dev/null
+++ b/docs/optimize_parallel.qmd
@@ -0,0 +1,725 @@
+---
+title: "SpotOptim: Parallel Optimization"
+description: "Step-by-step walkthrough of execute_optimization_run() and every method it calls along the parallel (steady-state) path, with executable examples validated by pytest."
+---
+
+This document traces every step executed by `SpotOptim.execute_optimization_run()` along
+the parallel code path (`n_jobs > 1`), in the order they occur.
+Each section describes one method or phase with a `{python}` code block that can be executed directly.
+
+The public entry point is `optimize()`, which manages the outer restart loop and delegates
+each cycle to `execute_optimization_run()`.
+When `n_jobs > 1`, that dispatcher routes to `optimize_steady_state()`,
+which implements a hybrid steady-state parallelisation strategy that overlaps surrogate
+search with objective function evaluation.
+This document covers that path in full.
+
+Run all related tests with:
+
+```bash
+uv run pytest tests/test_spotoptim_deep.py -v
+```
+
+---
+
+## Step 1 — Dispatch (`execute_optimization_run()`)
+
+```python
+if self.n_jobs > 1:
+    return self.optimize_steady_state(...)
+else:
+    return self.optimize_sequential_run(...)
+```
+
+`execute_optimization_run()` is the routing layer between the outer restart loop in
+`optimize()` and the actual optimisation engine.
+Its sole responsibility is to examine `n_jobs` and forward all arguments to either
+`optimize_steady_state()` (parallel) or `optimize_sequential_run()` (sequential).
+It returns a `(status, OptimizeResult)` tuple in both cases, which `optimize()` uses to
+decide whether to restart or terminate.
+The optional `shared_best_y` and `shared_lock` parameters support inter-worker
+coordination; they are unused in the current steady-state implementation and reserved
+for future multi-restart parallelism.
+
+```{python}
+import time
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0, n_jobs=2)
+status, result = opt.execute_optimization_run(timeout_start=time.time())
+print(f"status : {status}")
+print(f"best   : {result.fun:.6f}")
+assert status == "FINISHED"
+print("dispatch check passed.")
+```
+
+---
+
+## Step 2 — Parallel Run Orchestration (`optimize_steady_state()`)
+
+```python
+self.set_seed()
+X0 = self.get_initial_design(X0)
+X0 = self.curate_initial_design(X0)
+# Phase 1: parallel initial evaluation
+# Phase 2: steady-state loop
+return "FINISHED", OptimizeResult(...)
+```
+
+`optimize_steady_state()` is the parallel orchestrator.
+It follows the same preparatory sequence as its sequential counterpart — seeding the
+random number generator, generating and curating the initial design — before switching
+to a pool-based execution model.
+The method never returns `"RESTART"`: unlike the sequential loop, the steady-state engine
+does not implement a zero-success-rate restart criterion, and always exits with
+`"FINISHED"`.
+All worker pools are managed inside a single `contextlib.ExitStack` that guarantees
+orderly shutdown on success and on exception.
+
+```{python}
+import time
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0, n_jobs=2)
+status, result = opt.optimize_steady_state(timeout_start=time.time(), X0=None)
+print(f"status      : {status}")
+print(f"evaluations : {result.nfev}")
+print(f"best        : {result.fun:.6f}")
+assert status == "FINISHED"
+print("parallel orchestration check passed.")
+```
+
+---
+
+## Step 3 — Seed and Initial Design (`set_seed()`, `get_initial_design()`, `curate_initial_design()`)
+
+```python
+self.set_seed()
+X0 = self.get_initial_design(X0)
+X0 = self.curate_initial_design(X0)
+```
+
+The parallel path begins with the same three preparatory calls as the sequential path.
+`set_seed()` re-seeds Python's `random` module and NumPy's global generator to ensure
+reproducibility.
+`get_initial_design()` either processes the user-supplied `X0` or generates a Latin
+Hypercube sample in the transformed, reduced search space.
+`curate_initial_design()` removes duplicate points and generates replacements as needed.
+Because points are dispatched concurrently to worker processes or threads in the next
+phase, curation must be complete before submission: worker processes operate on a
+snapshot of the optimiser serialised with `dill` and cannot interact with the
+main-process state.
+Immediately after curation, restart injection is applied: a `y0_prefilled` array (all
+`NaN` by default) is populated with `y0_known` at the position of the matching `self.x0`
+point, using the same distance tolerance as `_initialize_run()` in the sequential path.
+This pre-filled array is consumed by Phase 1 to skip one pool submission per restart.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)], n_initial=5, seed=42, n_jobs=2)
+opt.set_seed()
+X0 = opt.get_initial_design(None)
+X0 = opt.curate_initial_design(X0)
+print(f"initial design shape : {X0.shape}")
+assert X0.shape == (5, 2)
+assert len(np.unique(X0, axis=0)) == 5
+print("seed and initial design check passed.")
+```
+
+---
+
+## Step 4 — GIL Detection and Executor Construction (`_is_gil_disabled()`)
+
+```python
+_no_gil = _is_gil_disabled()
+with ExitStack() as _stack:
+    eval_pool = _stack.enter_context(
+        ThreadPoolExecutor(max_workers=self.n_jobs) if _no_gil
+        else ProcessPoolExecutor(max_workers=self.n_jobs)
+    )
+    search_pool = _stack.enter_context(
+        ThreadPoolExecutor(max_workers=self.n_jobs)
+    )
+```
+
+`_is_gil_disabled()` queries `sys._is_gil_enabled()` (Python 3.13+) to detect whether
+the interpreter was built without the Global Interpreter Lock.
+On standard GIL builds — Python 3.12 or GIL-enabled 3.13 — objective evaluations are
+dispatched to a `ProcessPoolExecutor` so that each worker runs in a separate process,
+giving true CPU-level parallelism and safe isolation of arbitrary callables.
+Surrogate search tasks are always dispatched to a `ThreadPoolExecutor` because they
+share the main-process heap and require no serialisation.
+On free-threaded builds (`python3.13t`) both pools are `ThreadPoolExecutor` instances:
+threads achieve true parallelism without the GIL, `dill` serialisation is eliminated,
+and the objective function is called directly from the shared heap.
+The `_surrogate_lock` (a `threading.Lock`) is used in both configurations to serialise
+concurrent surrogate reads and refits.
+
+```{python}
+from spotoptim.SpotOptim import _is_gil_disabled
+
+result = _is_gil_disabled()
+print(f"GIL disabled: {result}")
+assert isinstance(result, bool)
+print("GIL detection check passed.")
+```
+
+---
+
+## Step 5 — Phase 1: Parallel Initial Submission
+
+```python
+n_to_submit = 0
+for i, x in enumerate(X0):
+    if np.isfinite(y0_prefilled[i]):
+        # Restart injection: store directly, skip the pool.
+        self._update_storage_steady(x, y0_prefilled[i])
+        continue
+    if _no_gil:
+        fut = eval_pool.submit(_thread_eval_task_single, x)
+    else:
+        pickled_args = dill.dumps((self, x))
+        fut = eval_pool.submit(remote_eval_wrapper, pickled_args)
+    futures[fut] = "eval"
+    n_to_submit += 1
+```
+
+The initial design is partitioned into two groups before any worker is touched.
+Points that carry a pre-filled `y0_prefilled` value — set during Step 4 for the
+restart-injected best point — are stored on the main thread via
+`_update_storage_steady()` and skipped entirely.
+The remaining `n_to_submit` points are submitted to `eval_pool` concurrently.
+On GIL builds each point is serialised together with a snapshot of the optimiser using
+`dill.dumps()`; the TensorBoard writer is temporarily set to `None` before serialisation
+because `SummaryWriter` objects are not picklable.
+`remote_eval_wrapper()` unpickles the arguments in the worker process, reshapes the
+point to a `(1, d)` array, calls `evaluate_function()`, and returns the `(x, y)` pair.
+On free-threaded builds `_thread_eval_task_single()` calls `evaluate_function()` without
+any serialisation overhead.
+All submitted futures are tracked in a `futures` dictionary keyed by `Future` object
+with the string tag `"eval"`.
+When `y0_known` is `None` (no restart), `y0_prefilled` is all-NaN and `n_to_submit`
+equals `n_initial`, so behaviour is identical to a fresh run.
+
+```{python}
+import time
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+# Fresh run — all n_initial points are submitted to the pool.
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=8, max_iter=8, seed=0, n_jobs=2)
+result = opt.optimize()
+assert result.nfev == 8
+print(f"evaluations (no injection): {result.nfev}")
+
+# Restart injection — the known best point is stored directly, not re-evaluated.
+x_inject = np.array([0.5, -0.5])
+opt2 = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                 n_initial=6, max_iter=6, seed=1, n_jobs=2,
+                 x0=x_inject)
+opt2.optimize_steady_state(
+    timeout_start=time.time(),
+    X0=None,
+    y0_known=float(np.sum(x_inject**2)),
+)
+assert float(np.sum(x_inject**2)) in opt2.y_
+print(f"injected value present in y_: {float(np.sum(x_inject**2)):.4f}")
+print("parallel initial submission check passed.")
+```
+
+---
+
+## Step 6 — Initial Design Collection (`_update_storage_steady()`)
+
+```python
+while initial_done_count < len(X0):
+    done, _ = wait(futures.keys(), return_when=FIRST_COMPLETED)
+    for fut in done:
+        ftype = futures.pop(fut)
+        if ftype != "eval":
+            continue
+        x_done, y_done = fut.result()
+        if not isinstance(y_done, Exception):
+            self._update_storage_steady(x_done, y_done)
+        initial_done_count += 1
+```
+
+The main thread waits in a loop using `concurrent.futures.wait()` with
+`return_when=FIRST_COMPLETED`, processing each completed future as it arrives.
+Futures tagged `"eval"` are the only type active during Phase 1; any unexpected type
+is silently skipped.
+When a result is an `Exception` instance — indicating a worker-side failure — the point
+is dropped and, if `verbose=True`, the error is printed; the initial design count still
+advances so the loop terminates correctly.
+For valid results, `_update_storage_steady()` appends the point in original scale to
+`X_` and its objective value to `y_`, initialising both arrays on the first call.
+It also updates `best_x_`, `best_y_`, `min_y`, and `min_X` in-place whenever the new
+value improves on the current best.
+Because the main thread is the only writer during Phase 1, no locking is required here.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=5, seed=0, n_jobs=2)
+opt.optimize()
+print(f"X_ shape  : {opt.X_.shape}")
+print(f"y_        : {opt.y_}")
+assert opt.X_.shape[0] >= 1
+assert np.isfinite(opt.best_y_)
+print("_update_storage_steady check passed.")
+```
+
+---
+
+## Step 7 — Initial Postprocessing (`_init_tensorboard()`, `update_stats()`, `get_best_xy_initial_design()`)
+
+```python
+self._init_tensorboard()
+if self.y_ is None or len(self.y_) == 0:
+    raise RuntimeError(...)
+self.update_stats()
+self.get_best_xy_initial_design()
+```
+
+Once all initial evaluations have completed, three postprocessing steps are applied in
+fixed order.
+`_init_tensorboard()` logs each initial-design point to TensorBoard as a separate
+hyperparameter run; when `tensorboard_log=False` (the default) it is a no-op.
+A guard check follows: if `y_` is `None` or empty, every worker evaluation failed and
+a `RuntimeError` is raised with a diagnostic message pointing to likely causes such as
+unpicklable callables or missing imports inside the worker process.
+`update_stats()` refreshes `min_y`, `min_X`, `counter`, and, for noisy objectives,
+aggregated per-point means and variances.
+`get_best_xy_initial_design()` identifies the initial best solution and writes it to
+`best_x_` and `best_y_`; unlike the sequential path, these attributes are updated
+incrementally by `_update_storage_steady()` throughout the loop, so this call aligns
+the verbose best-solution display with the true running best after Phase 1.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5)], tensorboard_log=False,
+                n_initial=5, max_iter=5, seed=0, n_jobs=2)
+opt.optimize()
+assert opt.tb_writer is None
+assert opt.counter == 5
+print(f"counter : {opt.counter}")
+print(f"min_y   : {opt.min_y:.6f}")
+print("initial postprocessing check passed.")
+```
+
+---
+
+## Step 8 — First Surrogate Fit (`_fit_scheduler()`)
+
+```python
+# No lock needed — no search threads active yet
+self._fit_scheduler()
+```
+
+After the initial postprocessing, the surrogate model is fitted to the complete initial
+design for the first time.
+No surrogate lock is acquired here because the `search_pool` has not yet been populated:
+this is the only point in the parallel path where `_fit_scheduler()` is called without
+holding `_surrogate_lock`.
+`_fit_scheduler()` selects the most recent `window_size` training points according to
+`selection_method`, fits the surrogate, and prepares it for acquisition-function
+queries.
+When a list of surrogates was specified at construction, one is chosen probabilistically
+according to `prob_surrogate` before fitting.
+
+```{python}
+import time
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, window_size=10, seed=0, n_jobs=2)
+result = opt.optimize()
+print(f"window_size : {opt.window_size}")
+print(f"evaluations : {result.nfev}")
+assert opt.window_size == 10
+print("first surrogate fit check passed.")
+```
+
+---
+
+## Step 9 — Steady-State Loop Overview (`optimize_steady_state()`)
+
+```python
+pending_cands: list = []
+_future_n_pts: dict = {}
+
+while (len(self.y_) < effective_max_iter) and (
+    time.time() < timeout_start + self.max_time * 60
+):
+    if _batch_ready():
+        _flush_batch()
+    # fill open slots with search tasks
+    # flush again if threshold crossed
+    # wait for any future to complete
+    # route result by ftype: "search" or "batch_eval"
+```
+
+The main iteration loop runs until either the evaluation budget `effective_max_iter` or
+the wall-clock limit `max_time` minutes is exhausted.
+Two data structures coordinate the flow: `pending_cands` accumulates candidates returned
+by completed search tasks, and `_future_n_pts` maps each in-flight batch-eval future to
+the number of points it carries.
+Both structures are required for accurate budget accounting: the reserved count is
+`len(y_) + n_in_flight + n_active_searches + len(pending_cands)`, ensuring the total
+never exceeds `effective_max_iter` regardless of concurrency.
+Each loop iteration performs four actions in order — batch flush, slot filling, second
+flush, and future wait — before routing completed results by their type tag.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=20, seed=0, n_jobs=2)
+result = opt.optimize()
+print(f"evaluations : {result.nfev}")
+print(f"best        : {result.fun:.6f}")
+assert result.nfev <= 20
+assert result.success
+print("steady-state loop check passed.")
+```
+
+---
+
+## Step 10 — Batch Readiness (`_batch_ready()`)
+
+```python
+def _batch_ready() -> bool:
+    if not pending_cands:
+        return False
+    if len(pending_cands) >= self.eval_batch_size:
+        return True
+    return not any(t == "search" for t in futures.values())
+```
+
+`_batch_ready()` determines whether the accumulated candidates in `pending_cands` should
+be dispatched as a batch evaluation.
+The primary condition is that the number of pending candidates meets or exceeds
+`eval_batch_size`.
+A secondary starvation-guard condition also triggers a flush when no search tasks remain
+in flight: without this guard, pending candidates would block indefinitely if the budget
+is nearly exhausted and no further search tasks can be submitted.
+With `eval_batch_size=1` (the default), `_batch_ready()` returns `True` whenever any
+candidate is pending, preserving the one-point-at-a-time behaviour of the sequential
+path.
+Larger values of `eval_batch_size` amortise process-spawn and inter-process
+communication overhead across multiple points, improving throughput for expensive
+objectives on GIL builds.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+# Default batch size = 1: each search result is dispatched immediately
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0, n_jobs=2, eval_batch_size=1)
+result = opt.optimize()
+assert opt.eval_batch_size == 1
+print(f"eval_batch_size : {opt.eval_batch_size}")
+print(f"evaluations     : {result.nfev}")
+print("_batch_ready check passed.")
+```
+
+---
+
+## Step 11 — Batch Dispatch (`_flush_batch()`)
+
+```python
+def _flush_batch() -> None:
+    X_batch = np.vstack(pending_cands)
+    n_in_batch = len(pending_cands)
+    pending_cands.clear()
+    if _no_gil:
+        fut_eval = eval_pool.submit(_thread_batch_eval_task, X_batch)
+    else:
+        pickled_args = dill.dumps((self, X_batch))
+        fut_eval = eval_pool.submit(remote_batch_eval_wrapper, pickled_args)
+    futures[fut_eval] = "batch_eval"
+    _future_n_pts[fut_eval] = n_in_batch
+```
+
+`_flush_batch()` stacks all pending candidates into a single `(n, d)` array `X_batch`,
+clears `pending_cands`, and dispatches the batch to `eval_pool` as one future.
+On GIL builds `remote_batch_eval_wrapper()` is used: it unpickles the optimiser and
+batch in the worker process, calls `evaluate_function(X_batch)` once, and returns
+`(X_batch, y_batch)`.
+Evaluating the whole batch in a single `fun()` call avoids repeated process-spawn
+overhead and allows vectorised objective implementations to exploit NumPy-level
+parallelism within the worker.
+On free-threaded builds `_thread_batch_eval_task()` performs the same operation
+directly in a thread without serialisation.
+The future is registered under the `"batch_eval"` tag and its point count is recorded in
+`_future_n_pts` so that budget accounting remains correct while the evaluation is
+in flight.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=20, seed=0, n_jobs=2, eval_batch_size=3)
+result = opt.optimize()
+assert opt.eval_batch_size == 3
+print(f"eval_batch_size : {opt.eval_batch_size}")
+print(f"evaluations     : {result.nfev}")
+print("_flush_batch check passed.")
+```
+
+---
+
+## Step 12 — Search Slot Management (`_thread_search_task()`)
+
+```python
+n_in_flight = sum(_future_n_pts.values())
+n_searches = sum(1 for t in futures.values() if t == "search")
+reserved = len(self.y_) + n_in_flight + n_searches + len(pending_cands)
+if reserved < effective_max_iter:
+    fut = search_pool.submit(_thread_search_task)
+    futures[fut] = "search"
+```
+
+At each loop iteration, the main thread fills any open slots in the search pool up to
+`n_jobs` concurrent tasks.
+Before submitting a new search task, the budget guard checks that adding one more
+in-flight search would not push the reserved count over `effective_max_iter`: if it
+would, no further search tasks are submitted and the loop drains existing futures until
+the budget is consumed.
+`_thread_search_task()` is a nested closure that acquires `_surrogate_lock` before
+calling `suggest_next_infill_point()` and releases it on return, so that a concurrent
+surrogate refit on the main thread cannot corrupt the model while a search thread reads
+it.
+Because search tasks are always submitted to a `ThreadPoolExecutor`, they share the main
+process heap and require no serialisation regardless of the GIL state.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=12, seed=0, n_jobs=3)
+result = opt.optimize()
+assert result.nfev <= 12
+print(f"n_jobs      : {opt.n_jobs}")
+print(f"evaluations : {result.nfev}")
+print("search slot management check passed.")
+```
+
+---
+
+## Step 13 — Candidate Generation (`suggest_next_infill_point()`)
+
+```python
+def _thread_search_task():
+    with _surrogate_lock:
+        return self.suggest_next_infill_point()
+```
+
+`suggest_next_infill_point()` runs the acquisition function to identify the most
+promising candidate for the next objective evaluation.
+The acquisition strategy is controlled by `acquisition` (default `"y"`, minimising the
+surrogate prediction directly).
+The acquisition optimiser (default `"differential_evolution"`) searches the transformed,
+reduced search space.
+When the optimiser fails, the fallback strategy selected by
+`acquisition_failure_strategy` applies; `"random"` (the default) draws a uniform random
+point.
+In the parallel path this method is always called inside `_thread_search_task()`, which
+holds `_surrogate_lock` for the duration of the call, preventing simultaneous surrogate
+access by multiple search threads or by the main-thread refit.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, acquisition="ei", seed=0, n_jobs=2)
+result = opt.optimize()
+assert opt.acquisition == "ei"
+print(f"acquisition : {opt.acquisition}")
+print(f"best        : {result.fun:.6f}")
+print("suggest_next_infill_point check passed.")
+```
+
+---
+
+## Step 14 — Future Completion and Routing
+
+```python
+done, _ = wait(futures.keys(), return_when=FIRST_COMPLETED)
+for fut in done:
+    ftype = futures.pop(fut)
+    res = fut.result()
+    if ftype == "search":
+        x_cand = res
+        pending_cands.append(x_cand)
+        if _batch_ready():
+            _flush_batch()
+    elif ftype == "batch_eval":
+        _future_n_pts.pop(fut, None)
+        X_done, y_done = res
+        # process batch...
+```
+
+`concurrent.futures.wait()` with `return_when=FIRST_COMPLETED` blocks until at least
+one future finishes, then returns all futures that are done.
+Each completed future is popped from `futures` and routed by its type tag.
+For a `"search"` future the returned candidate is appended to `pending_cands`; if this
+pushes the list over the batch threshold, `_flush_batch()` is called immediately to
+avoid an unnecessary extra loop iteration.
+For a `"batch_eval"` future the corresponding entry in `_future_n_pts` is removed so
+that budget accounting is unblocked before the storage update begins.
+If a result is an `Exception` — indicating a remote failure — the error is optionally
+printed, the budget entry is removed, and the loop continues; failed search slots are
+refilled in the next iteration, and failed eval points are lost without charging the
+budget counter.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0, n_jobs=2)
+result = opt.optimize()
+assert len(opt.y_) <= 15
+assert np.all(np.isfinite(opt.y_))
+print(f"evaluations stored : {len(opt.y_)}")
+print("future routing check passed.")
+```
+
+---
+
+## Step 15 — Batch Evaluation Processing (`update_success_rate()`, `_update_storage_steady()`, `_fit_scheduler()`)
+
+```python
+for xi, yi in zip(X_done, y_done):
+    self.update_success_rate(np.array([yi]))
+    self._update_storage_steady(xi, yi)
+    self.n_iter_ += 1
+with _surrogate_lock:
+    self._fit_scheduler()
+```
+
+When a batch evaluation completes successfully, the main thread processes every point in
+the returned `(X_done, y_done)` pair before refitting the surrogate.
+For each point, `update_success_rate()` records whether the new value improves on
+`best_y_`, maintaining the rolling `success_rate` attribute.
+`_update_storage_steady()` appends the point to `X_` and `y_`, updates `best_x_` and
+`best_y_` if an improvement is found, and synchronises `min_y` and `min_X`.
+`n_iter_` is incremented once per point so that it reflects the total number of
+post-initial-design evaluations.
+After all points in the batch have been stored, `_fit_scheduler()` is called once under
+`_surrogate_lock`, refitting the surrogate on the updated training window.
+Batching the refit in this way — one call per batch rather than one call per point —
+improves efficiency when `eval_batch_size > 1` and ensures that in-flight search
+threads always read a self-consistent model.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0, n_jobs=2)
+opt.optimize()
+print(f"success_rate : {opt.success_rate:.4f}")
+print(f"X_ shape     : {opt.X_.shape}")
+print(f"y_ shape     : {opt.y_.shape}")
+assert 0.0 <= opt.success_rate <= 1.0
+assert opt.X_.shape[0] == opt.y_.shape[0]
+print("batch processing check passed.")
+```
+
+---
+
+## Step 16 — Termination and Result Assembly
+
+```python
+return "FINISHED", OptimizeResult(
+    x=self.best_x_,
+    fun=self.best_y_,
+    nfev=len(self.y_),
+    nit=self.n_iter_,
+    success=True,
+    message="Optimization finished (Steady State)",
+    X=self.X_,
+    y=self.y_,
+)
+```
+
+`optimize_steady_state()` exits the while loop when `len(y_) >= effective_max_iter` or
+the wall-clock limit is exceeded.
+It always returns `"FINISHED"` — there is no restart mechanism in the parallel path.
+The `OptimizeResult` object carries the best solution (`x`, `fun`), total evaluation
+count (`nfev`), iteration count (`nit`), the full evaluation history (`X`, `y`), and a
+fixed termination message that identifies the result as coming from the steady-state
+engine.
+The `success` flag is always `True` because partial results are still valid whenever at
+least one evaluation completed successfully; the guard at the end of Phase 1 ensures
+the method does not return an empty result.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0, n_jobs=2)
+result = opt.optimize()
+first_line = result.message.splitlines()[0]
+print(f"termination : {first_line}")
+assert "Steady State" in result.message
+assert result.success
+print("termination check passed.")
+```
+
+---
+
+## Complete Parallel Run Summary
+
+@tbl-par summarises every step executed along the parallel path in call order:
+
+| Step | Method / Phase | Purpose |
+|-----:|----------------|---------|
+| 1 | `execute_optimization_run()` | Dispatch to parallel path when `n_jobs > 1` |
+| 2 | `optimize_steady_state()` | Parallel orchestrator; manages pools and phases |
+| 3 | `set_seed()`, `get_initial_design()`, `curate_initial_design()` | Seed RNG, generate and curate initial design; pre-fill `y0_prefilled` for restart-injected point |
+| 4 | `_is_gil_disabled()` | Detect GIL state; select `ProcessPoolExecutor` or `ThreadPoolExecutor` for eval |
+| 5 | Phase 1 submission | Store injected points directly; submit remaining `n_to_submit` points to `eval_pool` |
+| 6 | `_update_storage_steady()` | Collect initial results; append each valid `(x, y)` to storage |
+| 7 | `_init_tensorboard()`, `update_stats()`, `get_best_xy_initial_design()` | Log initial design; compute statistics; identify initial best |
+| 8 | `_fit_scheduler()` | First surrogate fit (no lock needed, no search threads active) |
+| 9 | `optimize_steady_state()` while loop | Main loop: iterate until budget or time exhausted |
+| 10 | `_batch_ready()` | Check whether `pending_cands` should be flushed |
+| 11 | `_flush_batch()` | Dispatch all pending candidates as one batch eval to `eval_pool` |
+| 12 | `_thread_search_task()` slot fill | Submit up to `n_jobs` search tasks under budget guard |
+| 13 | `suggest_next_infill_point()` | Optimise acquisition under `_surrogate_lock` to propose candidate |
+| 14 | `wait(FIRST_COMPLETED)` routing | Block until any future completes; route by `"search"` or `"batch_eval"` tag |
+| 15 | `update_success_rate()`, `_update_storage_steady()`, `_fit_scheduler()` | Process batch result; update storage and best; refit surrogate under lock |
+| 16 | Return `"FINISHED"` + `OptimizeResult` | Assemble and return final result |
+
+: Complete Parallel Run Summary {#tbl-par}
diff --git a/docs/optimize_seq.qmd b/docs/optimize_seq.qmd
new file mode 100644
index 00000000..9a5e7bb1
--- /dev/null
+++ b/docs/optimize_seq.qmd
@@ -0,0 +1,727 @@
+---
+title: "SpotOptim: Sequential Optimization"
+description: "Step-by-step walkthrough of execute_optimization_run() and every method it calls along the sequential path, with executable examples validated by pytest."
+---
+
+This document traces every step executed by `SpotOptim.execute_optimization_run()` along
+the sequential code path (`n_jobs=1`), in the order they occur.
+Each section describes one method with a `{python}` code block that can be executed directly.
+
+The public entry point is `optimize()`, which manages the outer restart loop and delegates
+each cycle to `execute_optimization_run()`.
+When `n_jobs == 1` (the default), that dispatcher routes to `optimize_sequential_run()`,
+which coordinates initialisation, storage setup, and the main iteration loop.
+This document covers that path in full.
+
+Run all related tests with:
+
+```bash
+uv run pytest tests/test_spotoptim_deep.py -v
+```
+
+---
+
+## Step 1 — Dispatch (`execute_optimization_run()`)
+
+```python
+if self.n_jobs > 1:
+    return self.optimize_steady_state(...)
+else:
+    return self.optimize_sequential_run(...)
+```
+
+`execute_optimization_run()` is the routing layer between the outer restart loop in
+`optimize()` and the actual optimisation engine.
+Its sole responsibility is to examine `n_jobs` and forward all arguments to either
+`optimize_steady_state()` (parallel) or `optimize_sequential_run()` (sequential).
+It returns a `(status, OptimizeResult)` tuple in both cases, which `optimize()` uses to
+decide whether to restart or terminate.
+The optional `shared_best_y` and `shared_lock` parameters support inter-worker
+coordination in the parallel path; they are `None` in sequential mode.
+
+```{python}
+import time
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0, n_jobs=1)
+status, result = opt.execute_optimization_run(timeout_start=time.time())
+print(f"status : {status}")
+print(f"best   : {result.fun:.6f}")
+assert status == "FINISHED"
+print("dispatch check passed.")
+```
+
+---
+
+## Step 2 — Sequential Run Orchestration (`optimize_sequential_run()`)
+
+```python
+X0, y0 = self._initialize_run(X0, y0_known)
+X0, y0, n_evaluated = self.rm_NA_values(X0, y0)
+self.check_size_initial_design(y0, n_evaluated)
+self.init_storage(X0, y0)
+self._zero_success_count = 0
+self._success_history = []
+self.update_stats()
+self._init_tensorboard()
+self.get_best_xy_initial_design()
+return self._run_sequential_loop(timeout_start, effective_max_iter)
+```
+
+`optimize_sequential_run()` is the sequential orchestrator.
+It calls eight methods in fixed order to prepare internal state before handing off
+to `_run_sequential_loop()` for the iterative acquisition phase.
+The zero-success counter and success-history list are reset here so that each fresh
+run (including restarts) begins with a clean convergence record.
+When `max_iter_override` is supplied by `optimize()`, it replaces the configured
+`max_iter` as the effective evaluation budget for this run.
+
+```{python}
+import time
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0)
+status, result = opt.optimize_sequential_run(timeout_start=time.time())
+print(f"status      : {status}")
+print(f"evaluations : {result.nfev}")
+print(f"best        : {result.fun:.6f}")
+assert status == "FINISHED"
+assert result.nfev == 10
+print("sequential run check passed.")
+```
+
+---
+
+## Step 3 — Initialisation (`_initialize_run()`)
+
+```python
+self.set_seed()
+X0 = self.get_initial_design(X0)
+X0 = self.curate_initial_design(X0)
+y0 = self.evaluate_function(X0)
+return X0, y0
+```
+
+`_initialize_run()` performs three preparatory actions before the first surrogate can
+be fitted.
+`set_seed()` re-seeds Python's `random` module and NumPy's global generator to ensure
+reproducibility within the run.
+`get_initial_design()` either processes the user-supplied `X0` or generates a
+Latin Hypercube sample in the transformed, reduced search space.
+`curate_initial_design()` removes duplicate points and generates replacements as needed.
+The curated design is then evaluated in batch via `evaluate_function()`.
+
+When a best-known value `y0_known` is provided (restart injection), the point that
+matches `self.x0` is not re-evaluated; its objective value is taken directly from the
+previous run, saving one function call.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)], n_initial=5, seed=42)
+X0, y0 = opt._initialize_run(X0=None, y0_known=None)
+print(f"initial design shape : {X0.shape}")
+print(f"evaluations          : {len(y0)}")
+assert X0.shape == (5, 2)
+assert len(y0) == 5
+assert np.all(np.isfinite(y0))
+print("_initialize_run check passed.")
+```
+
+---
+
+## Step 4 — Filtering Invalid Evaluations (`rm_NA_values()`)
+
+```python
+finite_mask = np.isfinite(y0)
+X0 = X0[finite_mask]
+y0 = y0[finite_mask]
+return X0, y0, len(finite_mask)
+```
+
+Initial design points whose objective value is `NaN` or `±inf` are removed rather than
+penalised.
+This is the correct policy for the initial phase: a penalty value would corrupt the
+surrogate's training data, whereas removal simply reduces the effective initial design
+size.
+The method also converts object-dtype arrays (which may contain Python `None`) to
+`float` before applying the mask.
+The original count is returned as the third value so that `check_size_initial_design()`
+can report how many points were lost.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)], n_initial=5)
+X0 = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
+y0 = np.array([5.0, np.nan, 0.0])
+X0_clean, y0_clean, n_original = opt.rm_NA_values(X0, y0)
+assert X0_clean.shape == (2, 2)
+assert len(y0_clean) == 2
+assert n_original == 3
+assert np.all(np.isfinite(y0_clean))
+print(f"1 NaN removed; {len(y0_clean)} of {n_original} points retained.")
+print("rm_NA_values check passed.")
+```
+
+---
+
+## Step 5 — Size Validation (`check_size_initial_design()`)
+
+```python
+min_required = min(n_initial, 3 if n_dim > 1 else 2)
+if len(y0) < min_required:
+    raise ValueError(...)
+```
+
+Before fitting the first surrogate, the optimizer verifies that enough valid initial
+points remain.
+The minimum accepted count is the smaller of `n_initial` and the surrogate's structural
+minimum — 3 for multi-dimensional problems, 2 for one-dimensional ones.
+If the filtered design falls below this threshold, a `ValueError` is raised with a
+diagnostic message.
+The threshold adapts to the user's intent: when `n_initial` was set to 2, only 2 points
+are required; the structural minimum only applies when more points were requested than
+survived filtering.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)], n_initial=10)
+
+y0_ok = np.array([1.0, 2.0, 3.0, 4.0])
+opt.check_size_initial_design(y0_ok, n_evaluated=10)
+print("Sufficient points: OK")
+
+y0_tiny = np.array([1.0])
+try:
+    opt.check_size_initial_design(y0_tiny, n_evaluated=10)
+    raise AssertionError("Expected ValueError not raised")
+except ValueError as e:
+    print(f"Caught expected error: {e}")
+print("check_size_initial_design check passed.")
+```
+
+---
+
+## Step 6 — Storage Initialisation (`init_storage()`)
+
+```python
+self.X_ = self.inverse_transform_X(X0.copy())
+self.y_ = y0.copy()
+self.n_iter_ = 0
+```
+
+`init_storage()` populates the two primary data arrays that persist throughout the run.
+`X_` is stored in natural (original) scale by applying `inverse_transform_X()` to the
+internally scaled design; `y_` is stored as-is.
+The iteration counter `n_iter_` is reset to zero.
+All subsequent storage operations in the main loop append to these arrays rather than
+replacing them, so the complete evaluation history is available at the end of the run.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+
+opt = SpotOptim(fun=lambda X: np.sum(X**2, axis=1),
+                bounds=[(-5, 5), (-5, 5)], n_initial=3)
+X0 = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]])
+y0 = np.array([5.0, 0.0, 10.0])
+opt.init_storage(X0, y0)
+print(f"X_ shape  : {opt.X_.shape}")
+print(f"y_        : {opt.y_}")
+print(f"n_iter_   : {opt.n_iter_}")
+assert opt.X_.shape == (3, 2)
+assert opt.n_iter_ == 0
+print("init_storage check passed.")
+```
+
+---
+
+## Step 7 — Statistics Update (`update_stats()`)
+
+```python
+self.min_y = np.min(self.y_)
+self.min_X = self.X_[np.argmin(self.y_)]
+self.counter = len(self.y_)
+```
+
+`update_stats()` refreshes the summary statistics derived from the current `X_` and
+`y_` arrays.
+It always sets `min_y`, `min_X`, and `counter`.
+When the problem is noisy (`repeats_initial > 1` or `repeats_surrogate > 1`), it
+additionally computes per-point means and variances via `aggregate_mean_var()`,
+populating `mean_X`, `mean_y`, `var_y`, `min_mean_X`, `min_mean_y`, and `min_var_y`.
+The method is called during setup — after `init_storage()` — and once per iteration
+inside the main loop after new points have been appended.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                max_iter=10, n_initial=5, seed=0)
+opt.optimize()
+print(f"counter : {opt.counter}")
+print(f"min_y   : {opt.min_y:.6f}")
+assert opt.counter == 10
+assert np.isclose(opt.min_y, np.min(opt.y_))
+print("update_stats check passed.")
+```
+
+---
+
+## Step 8 — TensorBoard Logging of the Initial Design (`_init_tensorboard()`)
+
+```python
+if self.tb_writer is not None:
+    for i in range(len(self.y_)):
+        self._write_tensorboard_hparams(self.X_[i], self.y_[i])
+    self._write_tensorboard_scalars()
+```
+
+`_init_tensorboard()` logs each point of the initial design to TensorBoard as a
+separate hyperparameter run, together with global scalar summaries.
+When `tensorboard_log=False` (the default), the writer is `None` and the method is a
+no-op with no runtime cost.
+When logging is enabled and no writer exists yet, `_init_tensorboard()` creates the
+`SummaryWriter`, choosing a timestamped directory if `tensorboard_path` was not
+specified.
+This lazy creation avoids producing stale log directories for runs that fail during
+initialisation.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5)], tensorboard_log=False)
+opt.optimize()
+assert opt.tb_writer is None
+print("TensorBoard disabled: writer not created.")
+print("_init_tensorboard check passed.")
+```
+
+---
+
+## Step 9 — Initial Best (`get_best_xy_initial_design()`)
+
+```python
+best_idx = np.argmin(self.y_)
+self.best_x_ = self.X_[best_idx].copy()
+self.best_y_ = self.y_[best_idx]
+```
+
+After the initial design is stored and its statistics computed, the point with the
+minimum objective value is identified and written to `best_x_` and `best_y_`.
+These two attributes define the running best solution, updated by
+`_update_best_main_loop()` in every subsequent iteration.
+When `verbose=True`, the initial best is printed; for noisy problems the mean best
+(`min_mean_y`) is also reported alongside the raw minimum.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)], n_initial=5, verbose=False)
+opt.X_ = np.array([[1.0, 2.0], [0.0, 0.0], [2.0, 1.0]])
+opt.y_ = np.array([5.0, 0.0, 5.0])
+opt.get_best_xy_initial_design()
+assert np.array_equal(opt.best_x_, [0.0, 0.0])
+assert opt.best_y_ == 0.0
+print(f"best_x_ : {opt.best_x_}")
+print(f"best_y_ : {opt.best_y_}")
+print("get_best_xy_initial_design check passed.")
+```
+
+---
+
+## Step 10 — Main Iteration Loop (`_run_sequential_loop()`)
+
+```python
+while len(self.y_) < effective_max_iter and \
+      time.time() < timeout_start + max_time * 60:
+    self.n_iter_ += 1
+    self._fit_scheduler()
+    X_ocba = self.apply_ocba()
+    x_next = self.suggest_next_infill_point()
+    x_next_repeated = self.update_repeats_infill_points(x_next)
+    if X_ocba is not None:
+        x_next_repeated = append(X_ocba, x_next_repeated, axis=0)
+    y_next = self.evaluate_function(x_next_repeated)
+    x_next_repeated, y_next = self._handle_NA_new_points(x_next_repeated, y_next)
+    self.update_success_rate(y_next)
+    # restart check
+    self.update_storage(x_next_repeated, y_next)
+    self.update_stats()
+    self._update_best_main_loop(x_next_repeated, y_next, start_time=timeout_start)
+```
+
+`_run_sequential_loop()` executes iterations until either the evaluation budget
+(`effective_max_iter`) or the wall-clock limit (`max_time` minutes) is exhausted.
+Each iteration increments `n_iter_`, fits the surrogate, selects and evaluates one or
+more candidate points, and updates internal state.
+A safety counter tracks consecutive failures: more than `max_iter` consecutive
+NaN/inf evaluations triggers an early exit with `success=False`.
+
+The loop returns `("RESTART", result)` when `success_rate` has been zero for
+`restart_after_n` consecutive iterations, signalling `optimize()` to begin a fresh run.
+It returns `("FINISHED", result)` when the budget or time limit is reached.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0)
+result = opt.optimize()
+print(f"iterations  : {result.nit}")
+print(f"evaluations : {result.nfev}")
+print(f"best        : {result.fun:.6f}")
+assert result.nfev == 15
+assert result.success
+print("_run_sequential_loop check passed.")
+```
+
+---
+
+## Step 11 — Surrogate Fitting (`_fit_scheduler()`)
+
+```python
+self._fit_scheduler()
+```
+
+At the start of each iteration, the surrogate model is refitted to the current training
+window.
+`_fit_scheduler()` selects the most recent `window_size` observations according to
+`selection_method` (default `"distant"`) and calls the surrogate's `fit()` method.
+When a list of surrogates was supplied at construction, one surrogate is chosen
+probabilistically according to `prob_surrogate` before fitting, and per-surrogate
+point caps from `_max_surrogate_points_list` are respected.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, window_size=10, seed=0)
+result = opt.optimize()
+print(f"window_size : {opt.window_size}")
+print(f"evaluations : {result.nfev}")
+assert opt.window_size == 10
+print("_fit_scheduler check passed.")
+```
+
+---
+
+## Step 12 — OCBA Re-evaluations (`apply_ocba()`)
+
+```python
+X_ocba = self.apply_ocba()
+```
+
+`apply_ocba()` implements Optimal Computing Budget Allocation for noisy objective
+functions.
+When `ocba_delta > 0`, it identifies the `ocba_delta` best-mean points and schedules
+additional evaluations at those locations, returning them as `X_ocba`.
+These points are concatenated with the acquisition candidate before the objective call,
+so that noisy regions near the current optimum receive extra replication.
+When `ocba_delta == 0` (the default), the method returns `None` and adds no overhead.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, ocba_delta=0, seed=0)
+result = opt.optimize()
+assert opt.ocba_delta == 0
+print(f"ocba_delta  : {opt.ocba_delta}  (no OCBA overhead)")
+print("apply_ocba check passed.")
+```
+
+---
+
+## Step 13 — Candidate Generation (`suggest_next_infill_point()`)
+
+```python
+x_next = self.suggest_next_infill_point()
+```
+
+`suggest_next_infill_point()` runs the acquisition function to identify the most
+promising candidate for the next objective evaluation.
+The acquisition strategy is controlled by `acquisition` (default `"y"`, minimising the
+surrogate prediction directly).
+The acquisition optimiser (default `"differential_evolution"`) searches the
+transformed, reduced search space using the bounds `[lower, upper]`.
+When the optimiser fails — no improvement found or a numerical issue — the strategy
+selected by `acquisition_failure_strategy` takes effect; `"random"` (the default)
+draws a point uniformly at random.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, acquisition="ei", seed=0)
+result = opt.optimize()
+assert opt.acquisition == "ei"
+print(f"acquisition : {opt.acquisition}")
+print(f"best        : {result.fun:.6f}")
+print("suggest_next_infill_point check passed.")
+```
+
+---
+
+## Step 14 — Repeat Infill Points (`update_repeats_infill_points()`)
+
+```python
+x_next_repeated = self.update_repeats_infill_points(x_next)
+```
+
+When `repeats_surrogate > 1`, each surrogate-suggested candidate is evaluated multiple
+times to reduce noise.
+`update_repeats_infill_points()` tiles the candidate point `repeats_surrogate` times,
+returning a 2-D array of shape `(repeats_surrogate, n_dim)`.
+When `repeats_surrogate == 1` (the default) the array is simply `x_next` reshaped to
+`(1, n_dim)`, adding no computational cost.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, repeats_surrogate=1, seed=0)
+result = opt.optimize()
+assert opt.repeats_surrogate == 1
+assert result.nfev == 10
+print(f"repeats_surrogate : {opt.repeats_surrogate}")
+print("update_repeats_infill_points check passed.")
+```
+
+---
+
+## Step 15 — Objective Evaluation (`evaluate_function()`)
+
+```python
+y_next = self.evaluate_function(x_next_repeated)
+```
+
+`evaluate_function()` calls the user-supplied objective `fun` on the batch of
+candidate points.
+The input is always in transformed, reduced internal scale;
+`evaluate_function()` applies `inverse_transform_X()` and `to_all_dim()` before the
+call, so `fun` always receives points in natural scale with all dimensions present.
+When `fun_mo2so` is set, the multi-objective output is first converted to a scalar
+using that aggregation function.
+The result is a 1-D array whose length equals the number of candidates evaluated.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0)
+result = opt.optimize()
+assert len(opt.y_) == 10
+assert np.all(np.isfinite(opt.y_))
+print(f"total evaluations : {len(opt.y_)}")
+print(f"best value        : {opt.min_y:.6f}")
+print("evaluate_function check passed.")
+```
+
+---
+
+## Step 16 — NaN Handling for Sequential Evaluations (`_handle_NA_new_points()`)
+
+```python
+x_next_repeated, y_next = self._handle_NA_new_points(x_next_repeated, y_next)
+```
+
+Unlike the initial design (where invalid points are removed), NaN or inf values
+returned during the sequential loop are replaced with a penalty derived from the worst
+finite value seen so far, scaled by a large factor.
+This preserves the storage structure — one row in `X_` per candidate — and prevents
+the surrogate from ignoring pathological regions.
+If every candidate in the batch is invalid, the method returns `(None, None)`, causing
+the iteration to be skipped and the consecutive-failure counter to increment.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=12, seed=0)
+opt.optimize()
+assert np.all(np.isfinite(opt.y_))
+print("All stored values are finite after NaN handling.")
+print("_handle_NA_new_points check passed.")
+```
+
+---
+
+## Step 17 — Success Rate (`update_success_rate()`)
+
+```python
+self.update_success_rate(y_next)
+```
+
+`update_success_rate()` measures whether the current iteration produced an improvement
+relative to `best_y_`.
+It records a binary outcome in `_success_history` and computes `success_rate` as the
+fraction of recent iterations that showed improvement.
+When `success_rate` remains at `0.0` for `restart_after_n` consecutive iterations,
+`_zero_success_count` reaches the threshold and the loop returns `"RESTART"` to the
+caller.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0)
+opt.optimize()
+print(f"success_rate : {opt.success_rate:.4f}")
+assert 0.0 <= opt.success_rate <= 1.0
+print("update_success_rate check passed.")
+```
+
+---
+
+## Step 18 — Storage Update (`update_storage()`)
+
+```python
+self.update_storage(x_next_repeated, y_next)
+```
+
+`update_storage()` appends newly evaluated candidates to `X_` and `y_`.
+Like `init_storage()`, it converts points to natural scale via `inverse_transform_X()`
+before storing.
+After each call `X_.shape[0]` increases by the number of candidates evaluated — usually
+1, but more when `repeats_surrogate > 1` or OCBA is active.
+The growing `X_` and `y_` arrays serve both as the surrogate training window and as
+the final output embedded in `OptimizeResult`.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0)
+opt.optimize()
+assert opt.X_.shape == (10, 2)
+assert opt.y_.shape == (10,)
+print(f"X_ shape : {opt.X_.shape}")
+print(f"y_ shape : {opt.y_.shape}")
+print("update_storage check passed.")
+```
+
+---
+
+## Step 19 — Best Solution Update (`_update_best_main_loop()`)
+
+```python
+self._update_best_main_loop(x_next_repeated, y_next, start_time=timeout_start)
+```
+
+At the end of each iteration, `_update_best_main_loop()` checks whether any of the
+newly evaluated points improves on `best_y_`.
+If so, `best_x_` and `best_y_` are updated in-place.
+When `verbose=True`, the improvement is printed together with elapsed time and the
+current evaluation count.
+For noisy problems, improvement is judged against `min_mean_y` rather than the raw
+minimum.
+
+```{python}
+import numpy as np
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=15, seed=0)
+result = opt.optimize()
+assert np.isclose(opt.best_y_, result.fun)
+assert np.array_equal(opt.best_x_, result.x)
+print(f"best_x_ : {opt.best_x_}")
+print(f"best_y_ : {opt.best_y_:.6f}")
+print("_update_best_main_loop check passed.")
+```
+
+---
+
+## Step 20 — Termination (`determine_termination()`)
+
+```python
+status_message = self.determine_termination(timeout_start)
+```
+
+After the while condition fails, `determine_termination()` produces the human-readable
+termination message embedded in `OptimizeResult.message`.
+It distinguishes three cases: the evaluation budget was exhausted
+(`nfev >= effective_max_iter`), the wall-clock limit was exceeded (`max_time`), or the
+tolerance criterion was met — consecutive best-point improvements smaller than
+`tolerance_x` measured by `min_tol_metric`.
+The formatted message also includes the final function value, iteration count, and
+total evaluation count, matching the style of `scipy.optimize.minimize`.
+
+```{python}
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+opt = SpotOptim(fun=sphere, bounds=[(-5, 5), (-5, 5)],
+                n_initial=5, max_iter=10, seed=0)
+result = opt.optimize()
+first_line = result.message.splitlines()[0]
+print(f"termination : {first_line}")
+assert "10" in result.message
+print("determine_termination check passed.")
+```
+
+---
+
+## Complete Sequential Run Summary
+
+@tbl-seq summarises every step executed along the sequential path in call order:
+
+| Step | Method | Purpose |
+|-----:|--------|---------|
+| 1 | `execute_optimization_run()` | Dispatch to sequential or parallel path |
+| 2 | `optimize_sequential_run()` | Sequential orchestrator |
+| 3 | `_initialize_run()` | Seed RNG, generate and evaluate initial design |
+| 4 | `rm_NA_values()` | Remove NaN/inf from initial evaluations |
+| 5 | `check_size_initial_design()` | Validate minimum initial design size |
+| 6 | `init_storage()` | Initialise `X_`, `y_`, `n_iter_` |
+| 7 | `update_stats()` | Compute `min_y`, `min_X`, `counter` |
+| 8 | `_init_tensorboard()` | Log initial design to TensorBoard |
+| 9 | `get_best_xy_initial_design()` | Identify initial `best_x_`, `best_y_` |
+| 10 | `_run_sequential_loop()` | Main iteration loop (Steps 11–20 per iteration) |
+| 11 | `_fit_scheduler()` | Fit surrogate to current training window |
+| 12 | `apply_ocba()` | Schedule OCBA re-evaluations (noisy problems only) |
+| 13 | `suggest_next_infill_point()` | Optimise acquisition to propose candidate |
+| 14 | `update_repeats_infill_points()` | Replicate candidate for noisy evaluation |
+| 15 | `evaluate_function()` | Call `fun` in natural scale |
+| 16 | `_handle_NA_new_points()` | Penalise or skip invalid evaluations |
+| 17 | `update_success_rate()` | Track improvement rate; trigger restart if stalled |
+| 18 | `update_storage()` | Append candidates to `X_`, `y_` |
+| 19 | `update_stats()` | Refresh statistics after new evaluations |
+| 20 | `_update_best_main_loop()` | Update `best_x_`, `best_y_` |
+| 21 | `determine_termination()` | Produce termination message and `OptimizeResult` |
+
+: Complete Sequential Run Summary {#tbl-seq}
diff --git a/docs/spotoptim_init.qmd b/docs/spotoptim_init.qmd
index 741db063..48ad47f5 100644
--- a/docs/spotoptim_init.qmd
+++ b/docs/spotoptim_init.qmd
@@ -109,14 +109,14 @@ if var_trans is None:
 Before any validation, `__init__` tries to read missing parameters directly
 from the callable `fun` using `getattr`.
 This allows self-describing objective functions to carry their own metadata:
+`fun.bounds` as a list of `(lower, upper)` tuples, `fun.var_type` as a
+per-dimension type string list, `fun.var_name` as human-readable parameter
+names, and `fun.var_trans` as a list of transformation strings.
 
-- `fun.bounds` — list of `(lower, upper)` tuples
-- `fun.var_type` — per-dimension type string list
-- `fun.var_name` — list of human-readable parameter names
-- `fun.var_trans` — list of transformation strings
-
-If neither the constructor argument nor the function attribute provide the
-value, it remains `None` and downstream steps apply their own defaults.
+`bounds` is the only mandatory parameter: if it is `None` after this
+inference step, `__init__` raises a `ValueError` immediately. The remaining
+three attributes (`var_type`, `var_name`, `var_trans`) are permitted to
+remain `None`; downstream steps supply their own defaults when needed.
 
 ```{python}
 import numpy as np
@@ -148,15 +148,28 @@ print("Parameter inference check passed.")
 ```python
 if max_iter < n_initial:
     raise ValueError(...)
+if n_jobs == -1:
+    n_jobs = os.cpu_count() or 1
+elif n_jobs == 0 or n_jobs < -1:
+    raise ValueError(...)
+if eval_batch_size < 1:
+    raise ValueError(...)
+if acquisition_optimizer_kwargs is None:
+    acquisition_optimizer_kwargs = {"maxiter": 10000, "gtol": 1e-9}
 ```
 
-`__init__` enforces that `max_iter >= n_initial`.
-`max_iter` is the total evaluation budget including the initial design,
-so it must accommodate at least `n_initial` points before any sequential
-iterations can begin.
+Four checks run before the configuration object is assembled. `max_iter`
+must be at least `n_initial` because the total budget must accommodate the
+initial design. `n_jobs` follows the scikit-learn convention: `-1` is
+resolved to `os.cpu_count()`, while `0` and values below `-1` are rejected.
+`eval_batch_size` controls how many candidate points accumulate before a
+single vectorised call is dispatched to the process pool and must be at
+least `1`. Finally, when `acquisition_optimizer_kwargs` is not supplied it
+is initialised to `{"maxiter": 10000, "gtol": 1e-9}`, providing tight
+convergence tolerances for the default differential-evolution run.
 
 ```{python}
-import pytest
+import os
 from spotoptim import SpotOptim
 from spotoptim.function import sphere
 
@@ -170,6 +183,23 @@ try:
     raise AssertionError("Expected ValueError was not raised")
 except ValueError as e:
     print(f"Caught expected error: {e}")
+
+# n_jobs=-1 resolves to all available CPU cores
+opt_p = SpotOptim(fun=sphere, bounds=[(-5, 5)], n_jobs=-1)
+assert opt_p.n_jobs == (os.cpu_count() or 1)
+print(f"n_jobs=-1 resolved to: {opt_p.n_jobs}")
+
+# n_jobs=0 is rejected
+try:
+    SpotOptim(fun=sphere, bounds=[(-5, 5)], n_jobs=0)
+    raise AssertionError("Expected ValueError was not raised")
+except ValueError as e:
+    print(f"Caught expected error: {e}")
+
+# acquisition_optimizer_kwargs default
+opt_aq = SpotOptim(fun=sphere, bounds=[(-5, 5)])
+assert opt_aq.acquisition_optimizer_kwargs == {"maxiter": 10000, "gtol": 1e-9}
+print(f"acquisition_optimizer_kwargs (default): {opt_aq.acquisition_optimizer_kwargs}")
 print("Validation check passed.")
 ```
 
@@ -863,25 +893,24 @@ print("TensorBoard no-op check passed.")
 | 2 | Machine epsilon | `self.eps` |
 | 3 | `tolerance_x` default | `tolerance_x` |
 | 4 | Parameter inference from `fun` | `bounds`, `var_type`, `var_name`, `var_trans` |
-| 5 | Validate `max_iter >= n_initial` | — |
+| 5 | Validate `max_iter`, `n_jobs`, `eval_batch_size`; default `acquisition_optimizer_kwargs` | — |
 | 6 | Create `SpotOptimConfig` | `self.config` |
 | 7 | Create `SpotOptimState` | `self.state` |
 | 8 | Store callable and `objective_names` | `self.fun`, `self.objective_names` |
 | 9 | Seed RNG | `self.rng` |
-| 10 | Factor maps and original bounds | `self._factor_maps`, `self._original_bounds` |
-| 11 | `process_factor_bounds()` | `self.bounds` (integer-encoded) |
-| 12 | Compute `n_dim` | `self.n_dim` |
-| 13 | `detect_var_type()` | `self.var_type` |
-| 14 | `modify_bounds_based_on_var_type()` | `self.bounds` |
-| 15 | Unpack bounds to arrays | `self.lower`, `self.upper` |
-| 16 | Default variable names | `self.var_name` |
-| 17 | `handle_default_var_trans()` | `self.var_trans` |
-| 18 | Snapshot bounds, `transform_bounds()` | `self._original_lower`, `self._original_upper`, `self.bounds` |
-| 19 | `setup_dimension_reduction()` | `self.ident`, `self.red_dim`, `all_*` attributes |
-| 20 | `validate_x0()` | `self.x0` (transformed, reduced) |
-| 21 | `init_surrogate()` | `self.surrogate`, `self._surrogates_list` |
-| 22 | Create `lhs_sampler` | `self.lhs_sampler` |
-| 23 | `window_size` default | `self.window_size` |
-| 24 | TensorBoard setup | `self.tb_writer` |
+| 10 | Factor maps, original bounds, `process_factor_bounds()` | `self._factor_maps`, `self._original_bounds`, `self.bounds` |
+| 11 | Compute `n_dim` | `self.n_dim` |
+| 12 | `detect_var_type()` | `self.var_type` |
+| 13 | `modify_bounds_based_on_var_type()` | `self.bounds` |
+| 14 | Unpack bounds to arrays | `self.lower`, `self.upper` |
+| 15 | Default variable names | `self.var_name` |
+| 16 | `handle_default_var_trans()` | `self.var_trans` |
+| 17 | Snapshot bounds, `transform_bounds()` | `self._original_lower`, `self._original_upper`, `self.bounds` |
+| 18 | `setup_dimension_reduction()` | `self.ident`, `self.red_dim`, `all_*` attributes |
+| 19 | `validate_x0()` | `self.x0` (transformed, reduced) |
+| 20 | `init_surrogate()` | `self.surrogate`, `self._surrogates_list` |
+| 21 | Create `lhs_sampler` | `self.lhs_sampler` |
+| 22 | `window_size` default | `self.window_size` |
+| 23 | TensorBoard setup | `self.tb_writer` |
 
 : Complete Initialization Sequence Summary {#tbl-init}
diff --git a/src/spotoptim/SpotOptim.py b/src/spotoptim/SpotOptim.py
index 9f9a0a47..91dfc7f0 100644
--- a/src/spotoptim/SpotOptim.py
+++ b/src/spotoptim/SpotOptim.py
@@ -2398,11 +2398,13 @@ def map_to_factor_values(self, X: np.ndarray) -> np.ndarray:
     # ====================
 
     def get_initial_design(self, X0: Optional[np.ndarray] = None) -> np.ndarray:
-        """Generate or process initial design points.
+        """Generate or process initial design points. Ensures that design points are in
+        internal (transformed and reduced) scale.
+        Calls `generate_initial_design()` if `X0` is None, otherwise processes user-provided `X0`.
         Handles three scenarios:
-            * X0 is None: Generate space-filling design using LHS
-            * X0 is None but x0 is provided: Generate LHS and include x0 as first point
-            * X0 is provided: Transform and prepare user-provided initial design
+            * `X0` is None: Generate space-filling design using LHS
+            * `X0` is None but starting point(s) `x0` is provided: Generate LHS and include `x0` as first point(s)
+            * `X0` is provided: Transform and prepare user-provided initial design
 
         Args:
             X0 (ndarray, optional): User-provided initial design points in original scale,
@@ -2418,6 +2420,7 @@ def get_initial_design(self, X0: Optional[np.ndarray] = None) -> np.ndarray:
             import numpy as np
             from spotoptim import SpotOptim
             from spotoptim.function import sphere
+            from spotoptim.plot.visualization import plot_design_points
             opt = SpotOptim(
                 fun=sphere,
                 bounds=[(-5, 5), (-5, 5)],
@@ -2426,17 +2429,30 @@ def get_initial_design(self, X0: Optional[np.ndarray] = None) -> np.ndarray:
             # Generate default LHS design
             X0 = opt.get_initial_design()
             print(X0.shape)
-            # Provide custom initial design
-            X0_custom = np.array([[0, 0], [1, 1], [2, 2]])
-            X0_processed = opt.get_initial_design(X0_custom)
-            print(X0_processed.shape)
+            plot_design_points(X0)
+            ```
+
+            ```{python}
+            import numpy as np
+            from spotoptim import SpotOptim
+            from spotoptim.function import sphere
+            from spotoptim.plot.visualization import plot_design_points
+            opt = SpotOptim(
+                fun=sphere,
+                bounds=[(-5, 5), (-5, 5)],
+                n_initial=10,
+                x0=np.array([0, 0])  # Starting point to include in initial design
+            )
+            X0 = opt.get_initial_design()
+            print(X0.shape)
+            plot_design_points(X0)
             ```
         """
         # Generate or use provided initial design
         if X0 is None:
             X0 = self.generate_initial_design()
 
-            # If starting point x0 was provided, include it in initial design
+            # If starting point(s) x0 was provided, include it/them in initial design
             if self.x0 is not None:
                 # x0 is already validated and in internal scale
                 # Check if x0 is 1D or 2D
@@ -2474,9 +2490,8 @@ def generate_initial_design(self) -> np.ndarray:
         """Generate initial space-filling design using Latin Hypercube Sampling.
         Used in the optimize() method to create the initial set of design points.
 
-
         Returns:
-            ndarray: Initial design points, shape (n_initial, n_features).
+            ndarray: Initial design points, shape (n_initial, n_features). Points are in the intervals defined by `self.bounds`.
 
         Examples:
             ```{python}
@@ -2484,7 +2499,9 @@ def generate_initial_design(self) -> np.ndarray:
             from spotoptim.function import sphere
             opt = SpotOptim(fun=sphere,
                             bounds=[(-5, 5), (-5, 5)],
-                            n_initial=10)
+                            n_initial=3,
+                            var_type=['float', 'int'],
+                            var_trans=['log10', None])
             X0 = opt.generate_initial_design()
             print(X0.shape)
             ```
@@ -4497,7 +4514,7 @@ def optimize(self, X0: Optional[np.ndarray] = None) -> OptimizeResult:
 
             # Execute one optimization run using the remaining budget; dispatcher
             # selects sequential vs parallel based on `n_jobs` and returns status/result.
-            status, result = self._execute_optimization_run(
+            status, result = self.execute_optimization_run(
                 timeout_start,
                 current_X0,
                 y0_known=y0_known_val,
@@ -4562,7 +4579,7 @@ def optimize(self, X0: Optional[np.ndarray] = None) -> OptimizeResult:
 
         return best_result
 
-    def _execute_optimization_run(
+    def execute_optimization_run(
         self,
         timeout_start: float,
         X0: Optional[np.ndarray] = None,
@@ -4572,7 +4589,7 @@ def _execute_optimization_run(
         shared_lock=None,  # New arg
     ) -> Tuple[str, OptimizeResult]:
         """Dispatcher for optimization run (Sequential vs Steady-State Parallel).
-        Depending on n_jobs, calls _optimize_steady_state (n_jobs > 1) or _optimize_sequential_run (n_jobs == 1).
+        Depending on n_jobs, calls optimize_steady_state (n_jobs > 1) or optimize_sequential_run (n_jobs == 1).
 
         Args:
             timeout_start (float): Start time for timeout.
@@ -4590,9 +4607,7 @@ def _execute_optimization_run(
             import time
             import numpy as np
             from spotoptim import SpotOptim
-            def sphere(X):
-                X = np.atleast_2d(X)
-                return np.sum(X**2, axis=1)
+            from spotoptim.function import sphere
             opt = SpotOptim(
                 fun=sphere,
                 bounds=[(-5, 5), (-5, 5)],
@@ -4602,7 +4617,7 @@ def sphere(X):
                 n_jobs=1,  # Use sequential optimization for deterministic output
                 verbose=True
             )
-            status, result = opt._execute_optimization_run(timeout_start=time.time())
+            status, result = opt.execute_optimization_run(timeout_start=time.time())
             print(status)
             print(result.message.splitlines()[0])
             ```
@@ -4610,14 +4625,14 @@ def sphere(X):
 
         # Dispatch to steady-state optimizer if proper parallelization is requested
         if self.n_jobs > 1:
-            return self._optimize_steady_state(
+            return self.optimize_steady_state(
                 timeout_start,
                 X0,
                 y0_known=y0_known,
                 max_iter_override=max_iter_override,
             )
         else:
-            return self._optimize_sequential_run(
+            return self.optimize_sequential_run(
                 timeout_start,
                 X0,
                 y0_known=y0_known,
@@ -4626,7 +4641,7 @@ def sphere(X):
                 shared_lock=shared_lock,
             )
 
-    def _optimize_sequential_run(
+    def optimize_sequential_run(
         self,
         timeout_start: float,
         X0: Optional[np.ndarray] = None,
@@ -4654,23 +4669,23 @@ def _optimize_sequential_run(
             ValueError: If the initial design has no valid points after removing NaN/inf values, or if the initial design is too small to proceed.
 
         Examples:
-            >>> import time
-            >>> import numpy as np
-            >>> from spotoptim import SpotOptim
-            >>> opt = SpotOptim(
-            ...     fun=lambda X: np.sum(X**2, axis=1),
-            ...     bounds=[(-5, 5), (-5, 5)],
-            ...     n_initial=5,
-            ...     max_iter=10,
-            ...     seed=0,
-            ...     n_jobs=1,  # Use sequential optimization for deterministic output
-            ...     verbose=True
-            ... )
-            >>> status, result = opt._optimize_sequential_run(timeout_start=time.time())
-            >>> print(status)
-            FINISHED
-            >>> print(result.message.splitlines()[0])
-            Optimization terminated: maximum evaluations (20) reached
+            ```{python}
+            import time
+            import numpy as np
+            from spotoptim import SpotOptim
+            from spotoptim.function import sphere
+            opt = SpotOptim(fun=sphere,
+                            bounds=[(-5, 5), (-5, 5)],
+                            n_initial=5,
+                            max_iter=10,
+                            seed=0,
+                            n_jobs=1,  # Use sequential optimization for deterministic output
+                            verbose=True
+             )
+            status, result = opt.optimize_sequential_run(timeout_start=time.time())
+            print(status)
+            print(result.message.splitlines()[0])
+            ```
         """
 
         # Store shared variable if provided
@@ -4710,7 +4725,7 @@ def _initialize_run(
         self, X0: Optional[np.ndarray], y0_known: Optional[float]
     ) -> Tuple[np.ndarray, np.ndarray]:
         """Initialize optimization run: seed, design generation, initial evaluation.
-        Called from _optimize_sequential_run.
+        Called from optimize_sequential_run (sequential path only).
 
         Args:
             X0 (Optional[np.ndarray]): Initial design points in Natural Space, shape (n_initial, n_features).
@@ -5037,7 +5052,7 @@ def _update_storage_steady(self, x, y):
         self.min_y = self.best_y_
         self.min_X = self.best_x_
 
-    def _optimize_steady_state(
+    def optimize_steady_state(
         self,
         timeout_start: float,
         X0: Optional[np.ndarray],
@@ -5045,10 +5060,8 @@ def _optimize_steady_state(
         max_iter_override: Optional[int] = None,
     ) -> Tuple[str, OptimizeResult]:
         """Perform steady-state asynchronous optimization (n_jobs > 1).
-
         This method implements a hybrid steady-state parallelization strategy.
         The executor types are selected at runtime based on GIL availability:
-
         **Standard GIL build (Python ≤ 3.12 or GIL-enabled 3.13+):**
 
         * ``ProcessPoolExecutor`` (``eval_pool``) — objective function evaluations.
@@ -5103,40 +5116,53 @@ def _optimize_steady_state(
         Args:
             timeout_start (float): Start time for timeout.
             X0 (Optional[np.ndarray]): Initial design points in Natural Space, shape (n_initial, n_features).
-            y0_known (Optional[float]): Known best value for initial design.
+            y0_known (Optional[float]): Known best objective value from a previous run.
+                When provided together with ``self.x0``, the matching point in the initial
+                design is pre-filled with this value and not re-submitted to the worker
+                pool, saving one evaluation per restart (restart injection).
             max_iter_override (Optional[int]): Override for maximum number of iterations.
 
         Raises:
-            RuntimeError: If all initial design evaluations fail, likely due to pickling issues or missing imports in the worker process.
-            The error message provides guidance on how to address this issue.
+            RuntimeError: If all initial design evaluations fail, likely due to pickling issues or missing imports in the worker process. The error message provides guidance on how to address this issue.
 
         Returns:
             Tuple[str, OptimizeResult]: Tuple containing status and optimization result.
 
         Examples:
-            >>> import time
-            >>> import numpy as np
-            >>> from spotoptim import SpotOptim
-            >>> opt = SpotOptim(
-            ...     fun=lambda X: np.sum(X**2, axis=1),
-            ...     bounds=[(-5, 5), (-5, 5)],
-            ...     n_initial=5,
-            ...     max_iter=10,
-            ...     seed=0,
-            ...     n_jobs=2,  # Use parallel optimization
-            ...     verbose=True
-            ... )
-            >>> status, result = opt._optimize_steady_state(timeout_start=time.time(), X0=None)
-            >>> print(status)
-            FINISHED
-            >>> print(result.message.splitlines()[0])
-            Optimization finished (Steady State)
+            ```{python}
+            import time
+            from spotoptim import SpotOptim
+            from spotoptim.function import sphere
+            opt = SpotOptim(
+                 fun=sphere,
+                 bounds=[(-5, 5), (-5, 5)],
+                 n_initial=5,
+                 max_iter=10,
+                 seed=0,
+                 n_jobs=2,
+            )
+            status, result = opt.optimize_steady_state(timeout_start=time.time(), X0=None)
+            print(status)
+            print(result.message.splitlines()[0])
+            ```
         """
         # Setup similar to _optimize_single_run
         self.set_seed()
         X0 = self.get_initial_design(X0)
         X0 = self.curate_initial_design(X0)
 
+        # Restart injection: if a known best value is provided, pre-fill the matching
+        # point in the initial design so it is not re-submitted to the worker pool.
+        # This mirrors the logic in _initialize_run() used by the sequential path.
+        y0_prefilled = np.full(len(X0), np.nan)
+        if y0_known is not None and self.x0 is not None:
+            dists = np.linalg.norm(X0 - self.x0, axis=1)
+            matches = dists < 1e-9
+            if np.any(matches):
+                if self.verbose:
+                    print("Skipping re-evaluation of injected best point.")
+                y0_prefilled[matches] = y0_known
+
         # We need to know how many evaluations to do
         effective_max_iter = (
             max_iter_override if max_iter_override is not None else self.max_iter
@@ -5201,10 +5227,14 @@ def _thread_batch_eval_task(X_batch):
             futures = {}  # map future -> type ('eval', 'search')
 
             # --- Phase 1: Initial Design Evaluation ---
-            if self.verbose:
-                print(f"Submitted {len(X0)} initial points for parallel evaluation...")
-
+            # Pre-filled points (restart injection) are stored on the main thread;
+            # all remaining points are submitted to eval_pool concurrently.
+            n_to_submit = 0
             for i, x in enumerate(X0):
+                if np.isfinite(y0_prefilled[i]):
+                    # Known from restart injection — store directly, skip the pool.
+                    self._update_storage_steady(x, y0_prefilled[i])
+                    continue
                 if _no_gil:
                     # Free-threaded: call fun directly in a thread — no dill.
                     fut = eval_pool.submit(_thread_eval_task_single, x)
@@ -5218,10 +5248,22 @@ def _thread_batch_eval_task(X_batch):
                         self.tb_writer = _tb_writer_temp
                     fut = eval_pool.submit(remote_eval_wrapper, pickled_args)
                 futures[fut] = "eval"
+                n_to_submit += 1
+
+            if self.verbose:
+                n_injected = int(np.sum(np.isfinite(y0_prefilled)))
+                suffix = (
+                    f" ({n_injected} injected from restart, skipped re-evaluation)."
+                    if n_injected
+                    else "."
+                )
+                print(
+                    f"Submitted {n_to_submit} initial points for parallel evaluation{suffix}"
+                )
 
-            # Wait for all initial to complete
+            # Wait for all submitted initial evaluations to complete.
             initial_done_count = 0
-            while initial_done_count < len(X0):
+            while initial_done_count < n_to_submit:
                 done, _ = wait(futures.keys(), return_when=FIRST_COMPLETED)
                 for fut in done:
                     # Clean up
diff --git a/tests/test_execute_optimization_run_example.py b/tests/test_execute_optimization_run_example.py
index 566aace8..65e36790 100644
--- a/tests/test_execute_optimization_run_example.py
+++ b/tests/test_execute_optimization_run_example.py
@@ -7,7 +7,7 @@
 from spotoptim import SpotOptim
 
 
-def test_execute_optimization_run_example():
+def testexecute_optimization_run_example():
     opt = SpotOptim(
         fun=lambda X: np.sum(X**2, axis=1),
         bounds=[(-5, 5), (-5, 5)],
@@ -18,7 +18,7 @@ def test_execute_optimization_run_example():
         verbose=False,
     )
 
-    status, result = opt._execute_optimization_run(timeout_start=time.time())
+    status, result = opt.execute_optimization_run(timeout_start=time.time())
 
     assert status == "FINISHED"
     assert result.message.splitlines()[0] == (
diff --git a/tests/test_no_gil_awareness.py b/tests/test_no_gil_awareness.py
index 51a3d131..212e2221 100644
--- a/tests/test_no_gil_awareness.py
+++ b/tests/test_no_gil_awareness.py
@@ -102,7 +102,7 @@ class TestExecutorSelection:
     """Verify the correct executor types are chosen based on GIL status."""
 
     def test_gil_build_uses_process_pool_for_eval(self):
-        """On a GIL build _optimize_steady_state must use ProcessPoolExecutor
+        """On a GIL build optimize_steady_state must use ProcessPoolExecutor
         for eval.  We verify by checking that _is_gil_disabled() is False on
         this interpreter, which is the precondition for that path."""
         # If GIL is enabled (standard build), the eval pool is a Process pool.
@@ -218,7 +218,7 @@ class TestSimulatedNoGilPath:
     """Test the thread-based eval path by mocking _is_gil_disabled() to True.
 
     These tests exercise _thread_eval_task_single and _thread_batch_eval_task
-    on the current interpreter by making _optimize_steady_state believe it is
+    on the current interpreter by making optimize_steady_state believe it is
     running on a free-threaded build.
     """
 
diff --git a/tests/test_optimize_sequential_run_example.py b/tests/test_optimize_sequential_run_example.py
index 93551b8c..1ba17746 100644
--- a/tests/test_optimize_sequential_run_example.py
+++ b/tests/test_optimize_sequential_run_example.py
@@ -9,7 +9,7 @@
 from spotoptim import SpotOptim
 
 
-def test_optimize_sequential_run_example():
+def testoptimize_sequential_run_example():
     opt = SpotOptim(
         fun=lambda X: np.sum(X**2, axis=1),
         bounds=[(-5, 5), (-5, 5)],
@@ -20,7 +20,7 @@ def test_optimize_sequential_run_example():
         verbose=False,
     )
 
-    status, result = opt._optimize_sequential_run(timeout_start=time.time())
+    status, result = opt.optimize_sequential_run(timeout_start=time.time())
 
     assert status == "FINISHED"
     assert result.message.splitlines()[0] == (
diff --git a/tests/test_optimize_steady_state_example.py b/tests/test_optimize_steady_state_example.py
index 05e8b7e3..2e34c7ee 100644
--- a/tests/test_optimize_steady_state_example.py
+++ b/tests/test_optimize_steady_state_example.py
@@ -9,7 +9,7 @@
 from spotoptim import SpotOptim
 
 
-def test_optimize_steady_state_example():
+def testoptimize_steady_state_example():
     opt = SpotOptim(
         fun=lambda X: np.sum(X**2, axis=1),
         bounds=[(-5, 5), (-5, 5)],
@@ -20,7 +20,7 @@ def test_optimize_steady_state_example():
         verbose=True,
     )
 
-    status, result = opt._optimize_steady_state(timeout_start=time.time(), X0=None)
+    status, result = opt.optimize_steady_state(timeout_start=time.time(), X0=None)
 
     assert status == "FINISHED"
     assert result.message.splitlines()[0] == "Optimization finished (Steady State)"
diff --git a/tests/test_refactored_optimize.py b/tests/test_refactored_optimize.py
index 9f884d6f..2826d606 100644
--- a/tests/test_refactored_optimize.py
+++ b/tests/test_refactored_optimize.py
@@ -31,10 +31,10 @@ def sphere(X):
         )
 
     def test_optimize_structure_sequential(self, spot_optim):
-        """Test that optimize calls _execute_optimization_run and loop works for sequential."""
+        """Test that optimize calls execute_optimization_run and loop works for sequential."""
         spot_optim.n_jobs = 1
 
-        # Mock _execute_optimization_run to return a finished result immediately
+        # Mock execute_optimization_run to return a finished result immediately
         mock_result = OptimizeResult(
             x=np.zeros(2),
             fun=0.0,
@@ -45,16 +45,16 @@ def test_optimize_structure_sequential(self, spot_optim):
             X=np.zeros((10, 2)),
             y=np.zeros(10),
         )
-        spot_optim._execute_optimization_run = MagicMock(
+        spot_optim.execute_optimization_run = MagicMock(
             return_value=("FINISHED", mock_result)
         )
 
         result = spot_optim.optimize()
 
         assert result == mock_result
-        spot_optim._execute_optimization_run.assert_called_once()
+        spot_optim.execute_optimization_run.assert_called_once()
         # Verify arguments passed (checking structure)
-        call_args = spot_optim._execute_optimization_run.call_args
+        call_args = spot_optim.execute_optimization_run.call_args
         assert call_args[1]["y0_known"] is None
 
     def test_optimize_structure_restart(self, spot_optim):
@@ -83,7 +83,7 @@ def test_optimize_structure_restart(self, spot_optim):
         )
 
         # Side effect: First call returns RESTART, second returns FINISHED
-        spot_optim._execute_optimization_run = MagicMock(
+        spot_optim.execute_optimization_run = MagicMock(
             side_effect=[("RESTART", res1), ("FINISHED", res2)]
         )
 
@@ -92,39 +92,39 @@ def test_optimize_structure_restart(self, spot_optim):
 
         result = spot_optim.optimize()
 
-        assert spot_optim._execute_optimization_run.call_count == 2
+        assert spot_optim.execute_optimization_run.call_count == 2
         assert result.fun == 0.1
         assert len(spot_optim.restarts_results_) == 2
 
         # Check if second call received the best value from first run
-        args2 = spot_optim._execute_optimization_run.call_args_list[1]
+        args2 = spot_optim.execute_optimization_run.call_args_list[1]
         assert args2[1]["y0_known"] == 1.0
 
     def test_dispatch_sequential(self, spot_optim):
-        """Verify dispatch calls _optimize_sequential_run when n_jobs=1."""
+        """Verify dispatch calls optimize_sequential_run when n_jobs=1."""
         spot_optim.n_jobs = 1
-        spot_optim._optimize_sequential_run = MagicMock(
+        spot_optim.optimize_sequential_run = MagicMock(
             return_value=("FINISHED", MagicMock())
         )
-        spot_optim._optimize_steady_state = MagicMock()
+        spot_optim.optimize_steady_state = MagicMock()
 
-        spot_optim._execute_optimization_run(timeout_start=time.time())
+        spot_optim.execute_optimization_run(timeout_start=time.time())
 
-        spot_optim._optimize_sequential_run.assert_called_once()
-        spot_optim._optimize_steady_state.assert_not_called()
+        spot_optim.optimize_sequential_run.assert_called_once()
+        spot_optim.optimize_steady_state.assert_not_called()
 
     def test_dispatch_parallel(self, spot_optim):
-        """Verify dispatch calls _optimize_steady_state when n_jobs>1."""
+        """Verify dispatch calls optimize_steady_state when n_jobs>1."""
         spot_optim.n_jobs = 2
-        spot_optim._optimize_sequential_run = MagicMock()
-        spot_optim._optimize_steady_state = MagicMock(
+        spot_optim.optimize_sequential_run = MagicMock()
+        spot_optim.optimize_steady_state = MagicMock(
             return_value=("FINISHED", MagicMock())
         )
 
-        spot_optim._execute_optimization_run(timeout_start=time.time())
+        spot_optim.execute_optimization_run(timeout_start=time.time())
 
-        spot_optim._optimize_steady_state.assert_called_once()
-        spot_optim._optimize_sequential_run.assert_not_called()
+        spot_optim.optimize_steady_state.assert_called_once()
+        spot_optim.optimize_sequential_run.assert_not_called()
 
     def test_initialize_run_calls(self, spot_optim):
         """Verify _initialize_run calls necessary setup methods."""
@@ -142,8 +142,8 @@ def test_initialize_run_calls(self, spot_optim):
         # _init_tensorboard should NOT be called here anymore
         spot_optim._init_tensorboard.assert_not_called()
 
-    def test_optimize_sequential_run_calls_init_tensorboard(self, spot_optim):
-        """Verify _optimize_sequential_run calls _init_tensorboard."""
+    def testoptimize_sequential_run_calls_init_tensorboard(self, spot_optim):
+        """Verify optimize_sequential_run calls _init_tensorboard."""
         spot_optim._initialize_run = MagicMock(
             return_value=(np.zeros((5, 2)), np.zeros(5))
         )
@@ -159,7 +159,7 @@ def test_optimize_sequential_run_calls_init_tensorboard(self, spot_optim):
         )
         spot_optim._init_tensorboard = MagicMock()
 
-        spot_optim._optimize_sequential_run(timeout_start=time.time())
+        spot_optim.optimize_sequential_run(timeout_start=time.time())
 
         spot_optim._init_tensorboard.assert_called_once()
 
diff --git a/tests/test_restart_inject_parallel.py b/tests/test_restart_inject_parallel.py
new file mode 100644
index 00000000..7e314c9d
--- /dev/null
+++ b/tests/test_restart_inject_parallel.py
@@ -0,0 +1,149 @@
+# SPDX-FileCopyrightText: 2026 bartzbeielstein
+#
+# SPDX-License-Identifier: AGPL-3.0-or-later
+
+"""Tests for restart injection in the parallel (steady-state) optimisation path.
+
+Verify that when y0_known and self.x0 are set, the matching point in the
+initial design is not re-evaluated by the worker pool, and that the stored
+best solution from the previous run is correctly carried into the next run.
+"""
+
+import time
+
+import numpy as np
+
+from spotoptim import SpotOptim
+from spotoptim.function import sphere
+
+
+class CountingObjective:
+    """Sphere function that records every point it is called with."""
+
+    def __init__(self):
+        self.calls = []
+
+    def __call__(self, X):
+        for x in X:
+            self.calls.append(x.copy())
+        return np.sum(X**2, axis=1)
+
+    def was_called_with(self, x, tol=1e-9):
+        return any(np.linalg.norm(c - x) < tol for c in self.calls)
+
+
+def test_restart_inject_skips_reeval_parallel():
+    """The injected best point must not be re-evaluated in the parallel path."""
+    obj = CountingObjective()
+
+    opt = SpotOptim(
+        fun=obj,
+        bounds=[(-5, 5), (-5, 5)],
+        n_initial=5,
+        max_iter=10,
+        seed=0,
+        n_jobs=2,
+        x0=np.array([1.0, 1.0]),
+    )
+
+    # Inject a known best value; x0=[1,1] matches the injected point.
+    y0_known = 2.0  # sphere([1, 1]) = 2
+    obj.calls.clear()
+
+    status, result = opt.optimize_steady_state(
+        timeout_start=time.time(),
+        X0=None,
+        y0_known=y0_known,
+    )
+
+    assert status == "FINISHED"
+    assert not obj.was_called_with(
+        np.array([1.0, 1.0])
+    ), "Injected best point [1, 1] should not have been re-evaluated by the worker pool."
+
+
+def test_restart_inject_result_stored_parallel():
+    """The injected best value must appear in the stored y_ array."""
+    opt = SpotOptim(
+        fun=sphere,
+        bounds=[(-5, 5), (-5, 5)],
+        n_initial=5,
+        max_iter=10,
+        seed=0,
+        n_jobs=2,
+        x0=np.array([1.0, 1.0]),
+    )
+
+    y0_known = 2.0
+    opt.optimize_steady_state(
+        timeout_start=time.time(),
+        X0=None,
+        y0_known=y0_known,
+    )
+
+    assert (
+        y0_known in opt.y_
+    ), "The pre-filled y0_known value must be present in opt.y_ after the run."
+
+
+def test_restart_inject_no_known_unchanged():
+    """Without y0_known, all n_initial points must reach storage (none skipped)."""
+    opt = SpotOptim(
+        fun=sphere,
+        bounds=[(-5, 5), (-5, 5)],
+        n_initial=5,
+        max_iter=5,
+        seed=0,
+        n_jobs=2,
+    )
+
+    status, result = opt.optimize_steady_state(
+        timeout_start=time.time(),
+        X0=None,
+        y0_known=None,
+    )
+
+    assert status == "FINISHED"
+    # All n_initial points must be stored — no injection means nothing was skipped.
+    assert len(opt.y_) == opt.n_initial
+    assert np.all(np.isfinite(opt.y_))
+
+
+def test_restart_inject_parallel_eval_count():
+    """With one injected point, y_ still contains n_initial entries and the
+    injected value is stored at exact float precision (not re-evaluated).
+
+    Note: evaluations run in separate worker processes (ProcessPoolExecutor on
+    GIL builds), so in-process call counters cannot be used here.  Instead we
+    verify the observable storage state: n_initial rows in X_/y_ and the
+    injected value appearing verbatim in y_.
+    """
+    x_inject = np.array([0.5, -0.5])
+    y_inject = float(np.sum(x_inject**2))
+
+    opt = SpotOptim(
+        fun=sphere,
+        bounds=[(-5, 5), (-5, 5)],
+        n_initial=6,
+        max_iter=6,
+        seed=1,
+        n_jobs=2,
+        x0=x_inject,
+    )
+
+    opt.optimize_steady_state(
+        timeout_start=time.time(),
+        X0=None,
+        y0_known=y_inject,
+    )
+
+    # All n_initial points must be in storage (injected + n_initial-1 via pool).
+    assert (
+        len(opt.y_) == opt.n_initial
+    ), f"Expected {opt.n_initial} stored evaluations, got {len(opt.y_)}."
+    # The injected value must appear at exact float precision — proof it was
+    # stored directly and not recalculated in a worker (sphere may differ by
+    # floating-point rounding for transformed input coordinates).
+    assert (
+        y_inject in opt.y_
+    ), f"Injected y_known={y_inject} not found in opt.y_={opt.y_}."
diff --git a/tests/test_thread_pool_search.py b/tests/test_thread_pool_search.py
index 79ff7b86..66a382cd 100644
--- a/tests/test_thread_pool_search.py
+++ b/tests/test_thread_pool_search.py
@@ -5,7 +5,7 @@
 """Tests for Release 0.8.0 — Improvement C: ThreadPoolExecutor for search tasks.
 
 Verifies that:
-- _optimize_steady_state uses ThreadPoolExecutor for search and ProcessPoolExecutor
+- optimize_steady_state uses ThreadPoolExecutor for search and ProcessPoolExecutor
   for evaluation (hybrid executor design).
 - The surrogate lock prevents concurrent refit/search races.
 - End-to-end results are correct for n_jobs > 1 with various callable types.
diff --git a/uv.lock b/uv.lock
index 17e0b664..44158d21 100644
--- a/uv.lock
+++ b/uv.lock
@@ -2795,7 +2795,7 @@ wheels = [
 
 [[package]]
 name = "spotoptim"
-version = "0.10.0"
+version = "0.10.1"
 source = { editable = "." }
 dependencies = [
     { name = "black" },
@@ -2875,7 +2875,7 @@ requires-dist = [
     { name = "requests", specifier = ">=2.32.3" },
     { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.3.0" },
     { name = "safety", marker = "extra == 'dev'", specifier = ">=3.0.0" },
-    { name = "scikit-learn", specifier = ">=1.3.0" },
+    { name = "scikit-learn", specifier = ">=1.5.0" },
     { name = "scipy", specifier = ">=1.10.1" },
     { name = "seaborn", specifier = ">=0.13.2" },
     { name = "spotdesirability", specifier = ">=0.0.1" },