Incorrect `pytest` error classification and parallel execution issues #75

## Description

While reviewing PR #71, I observed several unexpected test errors in the submitted results. Some appear to be caused by running `pytest` with multiple worker processes; others appear to be misclassified as failures and may indicate problems in how test outcomes are handled or executed.

## Expected Behavior

- Tests that are not applicable under certain conditions (e.g., some operators have not yet been implemented for some platforms) should be marked as skipped, rather than reported as errors.
- Running tests with different levels of parallelism (e.g., multiple workers) should not introduce unexpected errors.
- Some level of non-determinism may occasionally introduce transient errors, but such errors should not be persistent.
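
As a rough illustration of the first point, an unsupported operator/platform combination could be skipped before the operator is ever invoked. This is only a sketch: the `UNIMPLEMENTED` table, the `example_op` and `example_device` names, and both helper functions are hypothetical and do not come from the project. Only `pytest.skip` and `pytest.mark.parametrize` are real `pytest` APIs.

```python
from typing import Optional

import pytest

# Hypothetical table of (operator, device) pairs without an implementation.
# A real project would derive this from its own backend registry.
UNIMPLEMENTED = {
    ("example_op", "example_device"),
}


def skip_reason(op: str, device: str) -> Optional[str]:
    """Return a skip message if `op` is not implemented on `device`, else None."""
    if (op, device) in UNIMPLEMENTED:
        return f"{op} is not implemented on {device}"
    return None


def skip_unless_supported(op: str, device: str) -> None:
    """Mark the current test as skipped when the combination is unsupported."""
    reason = skip_reason(op, device)
    if reason is not None:
        # pytest.skip raises an exception that pytest reports as SKIPPED.
        pytest.skip(reason)


@pytest.mark.parametrize("device", ["cpu", "example_device"])
def test_example_op(device):
    # Check support first, so the operator is never called when unsupported.
    skip_unless_supported("example_op", device)
    # ... the real operator call and comparison would go here ...
```

The check has to run before the operator call: if the call crashes the worker process, as in the logs in this issue, the test body never gets the chance to report a skip.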

## Actual Behavior

1. Before #71, there were ~58 errors; the logs submitted with this PR show ~5. The reason for this reduction is unclear, but environment changes (e.g., a different PyTorch version) are suspected.
2. Running `pytest -n 1` instead of using multiple workers reduces the number of reported errors.
3. The errors that persist should be classified as skipped rather than reported as errors.

## Logs / Screenshots

For instance, the final submitted `nvidia.log`, which uses only a single worker (i.e., `pytest -n 1`):
```bash
=================================== FAILURES ===================================
______________________________ tests/test_cast.py ______________________________
[gw0] linux -- Python 3.10.16 /home/huangjiacheng/.venv/bin/python
worker 'gw0' crashed while running 'tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape0-None-None]'
_____________________________ tests/test_linear.py _____________________________
[gw1] linux -- Python 3.10.16 /home/huangjiacheng/.venv/bin/python
worker 'gw1' crashed while running 'tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape0-b_shape0-out_shape0]'
______________________________ tests/test_cast.py ______________________________
[gw2] linux -- Python 3.10.16 /home/huangjiacheng/.venv/bin/python
worker 'gw2' crashed while running 'tests/test_cast.py::test_cast[cuda-input_dtype4-out_dtype4-0.01-0.005-shape4-None-None]'
_____________________________ tests/test_linear.py _____________________________
[gw3] linux -- Python 3.10.16 /home/huangjiacheng/.venv/bin/python
worker 'gw3' crashed while running 'tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape1-b_shape1-out_shape1]'
______________________________ tests/test_cast.py ______________________________
[gw4] linux -- Python 3.10.16 /home/huangjiacheng/.venv/bin/python
worker 'gw4' crashed while running 'tests/test_cast.py::test_cast[cuda-input_dtype4-out_dtype4-0.01-0.005-shape3-None-None]'
================== xdist: maximum crashed workers reached: 4 ===================
=========================== short test summary info ============================
FAILED tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape0-None-None]
FAILED tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape0-b_shape0-out_shape0]
FAILED tests/test_cast.py::test_cast[cuda-input_dtype4-out_dtype4-0.01-0.005-shape4-None-None]
FAILED tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape1-b_shape1-out_shape1]
FAILED tests/test_cast.py::test_cast[cuda-input_dtype4-out_dtype4-0.01-0.005-shape3-None-None]
=========== 5 failed, 3108 passed, 1000 skipped in 191.93s (0:03:11) ===========
```

With more than one worker:
```bash
================== xdist: maximum crashed workers reached: 16 ==================
=========================== short test summary info ============================
FAILED tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape0-None-None]
FAILED tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape0-b_shape0-out_shape0]
FAILED tests/test_linear.py::test_linear[cuda-dtype1-0.01-0.01-True-True-False-a_shape2-b_shape2-out_shape2]
FAILED tests/test_matmul.py::test_matmul[cuda-dtype1-0.01-0.01-False-False-a_shape1-b_shape1-c_shape1]
FAILED tests/test_mul.py::test_mul[cuda-dtype0-1e-07-1e-07-shape0-None-None-None]
FAILED tests/test_mul.py::test_mul[cuda-dtype7-0-0-shape3-None-None-None] - w...
FAILED tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape1-input_strides1-out_strides1]
FAILED tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-False-False-False-a_shape1-b_shape1-out_shape1]
FAILED tests/test_linear.py::test_linear[cuda-dtype0-0.01-0.05-True-True-True-a_shape4-b_shape4-out_shape4]
FAILED tests/test_mul.py::test_mul[cuda-dtype0-1e-07-1e-07-shape1-input_strides1-other_strides1-out_strides1]
FAILED tests/test_mul.py::test_mul[cuda-dtype3-0-0-shape5-input_strides5-other_strides5-None]
FAILED tests/test_mul.py::test_mul[cuda-dtype6-0-0-shape8-input_strides8-other_strides8-out_strides8]
FAILED tests/test_matmul.py::test_matmul[cpu-dtype0-0.01-0.01-True-False-a_shape0-b_shape0-c_shape0]
FAILED tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape2-None-None]
FAILED tests/test_linear.py::test_linear[cuda-dtype2-0.01-0.01-True-False-True-a_shape3-b_shape3-out_shape3]
FAILED tests/test_mul.py::test_mul[cuda-dtype2-0.01-0.005-shape8-input_strides8-other_strides8-out_strides8]
FAILED tests/test_mul.py::test_mul[cuda-dtype6-0-0-shape3-None-None-None] - w...
FAILED tests/test_cat.py::test_cat[cuda-dtype1-0.001-0.001-shapes3-1-out_shape3]
========== 18 failed, 4087 passed, 1000 skipped in 125.52s (0:02:05) ===========
```
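
To compare the two runs more systematically, the final summary lines can be parsed into counts. The sketch below uses only the Python standard library; the `parse_summary` helper is hypothetical and simply assumes the summary-line format shown in the logs above.

```python
import re

# Matches pairs such as "5 failed" or "3108 passed" in a pytest summary line.
_COUNT = re.compile(r"(\d+) ([a-z]+)")


def parse_summary(line: str) -> dict:
    """Turn a pytest summary line into a {outcome: count} mapping."""
    # Drop the "=" padding, then the trailing " in <duration>" part.
    body = line.strip("= \n").split(" in ", 1)[0]
    return {name: int(count) for count, name in _COUNT.findall(body)}


single = parse_summary(
    "=========== 5 failed, 3108 passed, 1000 skipped in 191.93s (0:03:11) ==========="
)
multi = parse_summary(
    "========== 18 failed, 4087 passed, 1000 skipped in 125.52s (0:02:05) =========="
)

print(single, sum(single.values()))  # total: 4113
print(multi, sum(multi.values()))    # total: 5105
```

The totals differ (4113 vs. 5105 tests reported), so neither run appears to have covered the same set of tests. One plausible explanation, given the `maximum crashed workers reached` lines, is that each session stopped early once the crash limit was hit, in which case the failure counts of the two runs are not directly comparable.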