Non-deterministic TopK results with default SessionOptions #28018

@lrcyyds1

Describe the issue

ort_bug_report.tar.gz

Bug Description

ONNX Runtime produces non-deterministic results for models containing TopK operators when an InferenceSession is created without explicit SessionOptions. With the same model and identical inputs, outputs can differ by a factor of up to 64,000 across runs.

Severity: Critical

Reproduction Rate: ~15% of runs

Environment

  • ONNX Runtime Version: 1.19.2 (built from source)
  • Platform: Windows 11 Pro
  • Execution Provider: CPUExecutionProvider
  • Python Version: 3.9+

Minimal Reproduction

import numpy as np
import onnxruntime as ort

# Load the model and inputs (see attached files)
model_path = "original.onnx"
inputs = np.load("inputs.npz")
feed_dict = {k: inputs[k] for k in inputs.files}

# BUG: non-deterministic results when no SessionOptions is passed
for i in range(50):
    sess = ort.InferenceSession(model_path)  # No SessionOptions
    output = sess.run(None, feed_dict)
    print(f"Run {i}: {output[2].max():.2f}")
# Observed: the printed maximum varies by 60,000+ across runs

# FIX: deterministic with explicit SessionOptions
opts = ort.SessionOptions()
sess = ort.InferenceSession(model_path, opts)
output = sess.run(None, feed_dict)
print(f"With SessionOptions: {output[2].max():.2f}")
# Observed: always the same result
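To put a number on the reproduction rate instead of eyeballing printed maxima, the runs can be fingerprinted and tallied. The following is a minimal sketch using only the standard library. The helper names `fingerprint` and `tally_runs` are hypothetical (they are not part of ONNX Runtime), and the sketch assumes every output is a numeric, buffer-like object such as a NumPy array.

```python
import hashlib
from collections import Counter


def fingerprint(outputs):
    """Hash a list of buffer-like outputs (e.g. numeric NumPy arrays) byte for byte."""
    digest = hashlib.sha256()
    for out in outputs:
        # memoryview(...).tobytes() copies the raw bytes of any buffer-protocol object.
        digest.update(memoryview(out).tobytes())
    return digest.hexdigest()


def tally_runs(run_once, n_runs=50):
    """Call run_once() n_runs times and count how often each distinct output occurs."""
    return Counter(fingerprint(run_once()) for _ in range(n_runs))


# Hypothetical usage with the files from this report:
#   import numpy as np
#   import onnxruntime as ort
#   inputs = np.load("inputs.npz")
#   feed_dict = {k: inputs[k] for k in inputs.files}
#   tally = tally_runs(lambda: ort.InferenceSession("original.onnx").run(None, feed_dict))
#   print(len(tally), tally.most_common())
```

More than one key in the tally means at least two runs disagreed bit for bit. The comparison is deliberately strict: it treats any byte-level difference as a mismatch, so it may flag differences that a tolerance-based check would accept.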

Reproduction Materials

I have prepared a complete reproduction package containing:

  • Detailed bug report with root cause analysis
  • Minimal reproduction script
  • ONNX model files
  • Test input data

Package: ort_bug_report.tar.gz (113 KB)

Impact

This bug affects any production system that:

  • Uses TopK operators
  • Creates InferenceSession without explicit SessionOptions
  • Expects deterministic inference (critical for ML systems)

Workaround

Always create SessionOptions explicitly and pass it to InferenceSession:

opts = ort.SessionOptions()
opts.log_severity_level = 3  # Optional; any explicit setting works
sess = ort.InferenceSession(model_path, opts)

Discovery Method

This bug was discovered through metamorphic testing of ONNX Runtime optimizations. We tested 1,000 diverse models and detected this non-determinism in sample #606.
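The report does not say which metamorphic relations were used. One plausible relation, sketched here as an assumption, is that two configurations of the same model (for example, with and without explicit SessionOptions) should agree within a tolerance. The names `sequences_agree` and `relation_holds` and the default tolerance are hypothetical.

```python
def sequences_agree(ref, other, abs_tol):
    """True if two flat float sequences have equal length and agree element-wise.

    Any NaN or infinity fails the comparison, so it counts as a disagreement.
    """
    return len(ref) == len(other) and all(
        abs(x - y) <= abs_tol for x, y in zip(ref, other)
    )


def relation_holds(run_a, run_b, abs_tol=1e-4):
    """Check that two configurations of one model produce matching outputs.

    run_a and run_b are zero-argument callables that each return a list of
    flat float sequences, one per model output.
    """
    outs_a, outs_b = run_a(), run_b()
    return len(outs_a) == len(outs_b) and all(
        sequences_agree(a, b, abs_tol) for a, b in zip(outs_a, outs_b)
    )
```

With ONNX Runtime, each callable could wrap a session, for instance `lambda: [o.ravel().tolist() for o in sess.run(None, feed_dict)]`. A relation that fails only on some repetitions points to non-determinism rather than to a consistent difference between the two configurations.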

Request

Please investigate this critical non-determinism issue. I can provide additional information or testing as needed.

To reproduce

ort_bug_report.tar.gz

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response
