Silent failure with --runner=DataflowRunner

System information

- Environment: Google Colab, Vertex AI (`KubeflowV2DagRunner`)
- TensorFlow version: 2.8.0
- TFX Version: 1.7.0
- Python version: 3.7.12

---

Here's my `preprocessing_fn` (redacted for clarity):

```python
_FEATURES = [# list of str
]
_SPECIAL_IMPUTE = {
    'special_foo': 1,
}
HOURS = [1, 2, 3, 4]
TABLE_KEYS = {
    'XXX': ['XXX_1', 'XXX_2', 'XXX_3'],
    'YYY': ['YYY_1', 'YYY_2', 'YYY_3'],
}

@tf.function
def _divide(a, b):
  return tf.math.divide_no_nan(tf.cast(a, tf.float32), tf.cast(b, tf.float32))

def preprocessing_fn(inputs):
  x = {}

  for name, tensor in sorted(inputs.items()):
    if tensor.dtype == tf.bool:
      tensor = tf.cast(tensor, tf.int64)

    if isinstance(tensor, tf.sparse.SparseTensor):
      default_value = '' if tensor.dtype == tf.string else 0
      tensor = tft.sparse_tensor_to_dense_with_shape(tensor, [None, 1], default_value)
    
    x[name] = tensor

  x['foo'] = _divide((x['foo1'] - x['foo2']), x['foo_denom'])
  x['bar'] = tf.cast(x['bar'] > 0, tf.int64)

  for hour in HOURS:
      total = tf.constant(0, dtype=tf.int64)
      for device_type in DEVICE_TYPES.keys():
          total = total + x[f'some_device_{device_type}_{hour}h']

  # one hot encode categorical values
  for name, keys in TABLE_KEYS.items():
    with tf.init_scope():
      initializer = tf.lookup.KeyValueTensorInitializer(
          tf.constant(keys), 
          tf.constant([i for i in range(len(keys))]))
      table = tf.lookup.StaticHashTable(initializer, default_value=-1)
      
    indices = table.lookup(tf.squeeze(x[name], axis=1))
    one_hot = tf.one_hot(indices, len(keys), dtype=tf.int64)

    for i, _tensor in enumerate(tf.split(one_hot, num_or_size_splits=len(keys), axis=1)):
      x[f'{name}_{keys[i]}'] = _tensor

  return {name: tft.scale_to_0_1(x[name]) for name in _FEATURES}
```

Here's the `beam_pipeline_args`:

```python
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
   '--project=' + GOOGLE_CLOUD_PROJECT,
   '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
   '--runner=DataflowRunner',
   '--region=us-central1',
  '--experiments=upload_graph', # must be enabled, otherwise fails with 413
   '--dataflow_service_options=enable_prime',
   '--autoscaling_algorithm=THROUGHPUT_BASED',
   ]
```

Not sure if related but with the above `preprocessing_fn`, my transform first failed with the error:

```
RuntimeError: The order of analyzers in your `preprocessing_fn` appears to be non-deterministic. This can be fixed either by changing your `preprocessing_fn` such that tf.Transform analyzers are encountered in a deterministic order or by passing a unique name to each analyzer API call.
```

I then added names to the `tft.scale_to_0_1` analyzers:

```python
return {name: tft.scale_to_0_1(x[name], name=f'{name}_scale_to_0_1') for name in _FEATURES}
```

After which my transform just silently failed without logs (see first screenshot). I check the worker logs but there's nothing substantial, only warnings (see second screenshot).

It's worth noting that I have the `enable_prime` flag.

<img width="1048" alt="Screen Shot 2022-03-25 at 2 01 59 PM" src="https://user-images.githubusercontent.com/42384776/160201090-4b37e7e3-a1ba-4835-a098-3e703bf47d57.png">

<img width="901" alt="Screen Shot 2022-03-25 at 2 01 40 PM" src="https://user-images.githubusercontent.com/42384776/160201117-fb1e676f-a76e-4e05-85dc-ca76ccd65520.png">



Provide feedback

Saved searches

Use saved searches to filter your results more quickly