[ntuple][python][ATLAS experiment] Re-Implement context management pr… #22432

Open
rybkine wants to merge 1 commit into root-project:master from rybkine:master-ntuple-python-ctx-mgr

rybkine commented May 29, 2026

…otocol for RNTupleReader/Writer

bindings/pyroot/pythonizations/python/ROOT/_pythonization/_rntuple.py: add __enter__ method - returns self (an instance of RNTupleReader/RNTupleWriter), __exit__ method - calls RNTupleReader/RNTupleWriter destructor (if not destructed yet).
tree/ntuple/test/ntuple_basics.py: update tests
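The commit message describes a transparent context manager: `__enter__` hands back the object itself, and `__exit__` destroys it at most once. Below is a minimal pure-Python sketch of that pattern. `Resource`, `destroy()` and `_alive` are hypothetical stand-ins for the C++-backed RNTupleReader/RNTupleWriter and its destructor, not ROOT API, and the null-pointer message simply mirrors the one quoted later in this thread.

```python
class Resource:
    """Hypothetical stand-in for a C++-backed object such as RNTupleReader."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._alive = True

    def _check_alive(self):
        if not self._alive:
            # Mirrors the generic error reported for a destroyed C++ object.
            raise ReferenceError("attempt to access a null-pointer")

    def __iter__(self):
        self._check_alive()
        return iter(self._entries)

    def destroy(self):
        """Hypothetical placeholder for calling the C++ destructor."""
        self._entries = []
        self._alive = False

    # Context management protocol, modelled on io.IOBase
    def __enter__(self):
        return self  # the object itself, so all of its methods stay usable

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._alive:  # destroy at most once
            self.destroy()
        return False  # never suppress exceptions


with Resource([1, 2, 3]) as r:
    total = sum(r)  # iteration works inside the block

try:
    iter(r)
    after_exit = "no error"
except ReferenceError as err:
    after_exit = str(err)
```

Because nothing wraps the object, every method (including `__iter__`) remains reachable inside the `with` block. The trade-off is that access after exit only produces the generic error.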

This Pull request:

Changes or fixes:

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

This PR fixes #22431

rybkine force-pushed the master-ntuple-python-ctx-mgr branch from fcff6da to c4e56cb on May 29, 2026 21:08
ferdymercury requested a review from silverweed on May 30, 2026 13:05
rybkine force-pushed the master-ntuple-python-ctx-mgr branch from c4e56cb to 52bc377 on May 30, 2026 17:36
jblomer (Contributor) commented Jun 1, 2026

@vepadulano @silverweed, your reviews would be useful.

rybkine (Author) commented Jun 1, 2026

Perhaps to state the obvious: the proposed implementation is virtually the same as that of the Python file object, and this is exactly what is needed here.

vepadulano (Member) left a comment

I understand the feature request, but I'm not sure deleting the current implementation is the right approach. This PR modifies quite a few existing tests, which is a sign of major changes that need to be carefully evaluated. I believe @jblomer and ultimately @silverweed should decide whether and how the feature request should be addressed.

rybkine (Author) commented Jun 1, 2026

It is not a feature request; rather, it proposes an alternative implementation that closely follows the implementation of the Python file object. Hence the changes made.

vepadulano (Member) commented

closely follows the implementation in the Python file object

I am not sure what you mean with "Python file object".

I think at the current stage the changes are too drastic. I would be happy to review a PR that proposes to keep the functionality of current context manager as-is, while also adding the iterator capabilities on top of that. In such case, the current test suite should not be changed. New tests would then need to be added.

rybkine (Author) commented Jun 2, 2026

I am not sure what you mean with "Python file object".

https://docs.python.org/3/glossary.html#term-file-object

I think at the current stage the changes are too drastic.

This is what an alternative implementation means - replacement. The current implementation is not needed.

I would be happy to review a PR that proposes to keep the functionality of current context manager as-is, while also adding the iterator capabilities on top of that.

The current context manager does not provide useful functionality. Quite the contrary: what it does is counterproductive, as it interferes with the available RNTupleReader functionality, e.g., it breaks iterability. The only sensible approach is to get rid of it altogether and simply add the context manager functionality in a transparent way. That is what the PR proposes, and that is how it is done for the Python file object, for that matter: https://github.com/python/cpython/blob/629da5c914b4407e01c1dc06cbcbd8dce825fef3/Lib/_pyio.py#L473-L490.
Incidentally, the context manager is implemented in almost the same way for the ROOT TFile as well:

def _TFileExit(obj, exc_type, exc_val, exc_tb):
    """
    Close the TFile object.
    Signature and return value are imposed by Python, see
    https://docs.python.org/3/library/stdtypes.html#typecontextmanager.
    """
    # A TFile might be storing references to objects retrieved by the user in
    # a cache. Make sure the cache is cleaned at exit time rather than having
    # to wait for the garbage collector.
    try:
        delattr(obj, "_cached_items")
    except AttributeError:
        pass
    obj.Close()
    return False


@pythonization('TFile')
def pythonize_tfile(klass):
    """
    TFile inherits from
    - TDirectory the pythonized attr syntax (__getattr__) and WriteObject method.
    - TDirectoryFile the pythonized Get method (pythonized only in Python)
    and defines the __enter__ and __exit__ methods to work as a context manager.
    """
    # Pythonizations for TFile::Open
    klass.Open.__creates__ = True
    klass._OriginalOpen = klass.Open
    klass.Open = classmethod(_TFileOpen)

    # Pythonization for TFile constructor
    klass._OriginalConstructor = klass.__init__
    klass.__init__ = _TFileConstructor

    # Pythonization for __enter__ and __exit__ methods
    # These make TFile usable in a `with` statement as a context manager
    klass.__enter__ = lambda tfile: tfile
    klass.__exit__ = _TFileExit
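The file-object behaviour referred to here can be checked with the standard library alone. The sketch below uses `io.StringIO` as an in-memory file object. This is an illustrative choice; objects returned by `open()` should behave the same way, since they share the same `IOBase` context-manager methods. The object bound by `with` is the file itself, iteration is unaffected, and re-entering after exit raises `ValueError`, so it is a single use context manager.

```python
import io

# io.StringIO is an in-memory text file object; its __enter__ returns the
# object itself and its __exit__ calls close().
buf = io.StringIO("first\nsecond\n")
with buf as f:
    same_object = f is buf
    lines = [line.rstrip("\n") for line in f]  # iteration is unaffected

closed_after_exit = buf.closed

# Single use: entering a closed file object again raises ValueError.
try:
    with buf:
        pass
    reentered = True
except ValueError:
    reentered = False
```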

In such case, the current test suite should not be changed. New tests would then need to be added.

First of all, the changes to the tests reflect the crucial improvement of context management protocol implementation that the proposed PR brings. But they also test in a more useful and natural way: for example, they fill and read an ntuple with several events (rather than one) and test iterability as well. They demonstrate in detail that the RNTupleReader/RNTupleWriter context managers are single use context managers (see test_singleuse_ctxmanager), using standard Python terminology (rather than "weird", for example). They use more specific exceptions, e.g., ROOT.RException instead of Exception, and ReferenceError instead of RuntimeError. They also make better use of the unittest functionality, in particular by moving all the code comments into the test assertion messages, which are displayed on error or failure. All in all, the proposed changes to the tests are also a major improvement. And they are indispensable as they are.
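The testing style described in this comment might look like the following `unittest` sketch. It uses a dedicated single use test, a specific exception type, and explanations placed in assertion messages. `SingleUse` is a hypothetical stand-in, and this is not the actual test code from `ntuple_basics.py`. Only the test name `test_singleuse_ctxmanager` is taken from the comment.

```python
import unittest


class SingleUse:
    """Hypothetical stand-in for RNTupleReader/RNTupleWriter in a test."""

    def __init__(self):
        self._alive = True

    def _check_alive(self):
        if not self._alive:
            raise ReferenceError("attempt to access a null-pointer")

    def GetNEntries(self):
        self._check_alive()
        return 3

    def __enter__(self):
        self._check_alive()  # refuse to re-enter once exited
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._alive = False
        return False


class TestSingleUseContextManager(unittest.TestCase):
    def test_singleuse_ctxmanager(self):
        obj = SingleUse()
        with obj as ctx:
            self.assertEqual(ctx.GetNEntries(), 3,
                             "the object must be usable inside the with block")
        with self.assertRaises(ReferenceError,
                               msg="the object must not be usable after exit"):
            obj.GetNEntries()
        with self.assertRaises(ReferenceError,
                               msg="a single use context manager cannot be re-entered"):
            with obj:
                pass
```

Saved as a module, the sketch can be run with `python -m unittest`. The `msg` arguments replace explanatory comments and are shown only when an assertion fails.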

vepadulano (Member) commented

https://docs.python.org/3/glossary.html#term-file-object

Thanks for the pointer. This is a glossary term; there is no "Python file object" in general.

This is what an alternative implementation means - replacement. The current implementation is not needed.

I disagree on both parts of this sentence.

The current context manager does not provide useful functionality. Quite the contrary: what it does is counterproductive, as it interferes with the available RNTupleReader functionality, e.g., it breaks iterability.

I understand that is your opinion, but it cannot be used to claim that this PR is fixing an existing bug.

The only sensible approach is to get rid of it altogether and simply add the context manager functionality in a transparent way.

I disagree.

Just because the TFile context manager was implemented in a certain way it does not mean that the RNTupleReader/Writer context managers must be implemented in the same way. After all, TFile and RNTupleReader/Writer are different classes.

First of all, the changes to the tests reflect the crucial improvement of context management protocol implementation that the proposed PR brings.

As I have already argued, this PR is not bringing any crucial improvements, rather an opinionated dismissal of existing implementation.

they fill and read an ntuple with several events (rather than one)

I agree that in general we should have tests on the Python side of reading/writing more than one event via RNTupleReader/Writer.

All in all, the proposed changes to the tests are also a major improvement. And they are indispensable as they are.

The proposed test changes are wrong in general. I appreciate that the style of the testing can be improved, e.g. by using terms like "single use context manager" instead of "weird" and by adding a bit more context to the assertion messages. Everything else needs to be seriously reconsidered.

Once more, this PR should be reviewed by the original author of the Pythonization of RNTupleReader/Writer just to evaluate if in principle the idea of providing the iterator protocol on the Python side is desirable or not. Every other decision will derive from this first one. Until then, there's nothing else to discuss.

rybkine (Author) commented Jun 3, 2026

https://docs.python.org/3/glossary.html#term-file-object

Thanks for the pointer, this is a glossary term

This term is as specific as we need here.

there is no "Python file object" in general.

There is - anything returned by the Python built-in function open (on success).

The current context manager does not provide useful functionality. Quite the contrary: what it does is counterproductive, as it interferes with the available RNTupleReader functionality, e.g., it breaks iterability.

I understand that is your opinion, but it cannot be used to claim that this PR is fixing an existing bug.

This PR restores iterability and thus does fix a bug; why can that not be affirmed?

The only sensible approach is to get rid of it altogether and simply add the context manager functionality in a transparent way.

I disagree.

Without arguments to support this disagreement, what is it worth?

Just because the TFile context manager was implemented in a certain way it does not mean that the RNTupleReader/Writer context managers must be implemented in the same way. After all, TFile and RNTupleReader/Writer are different classes.

It does mean that the RNTupleReader/Writer context managers must be implemented in a way that is not inferior to the known implementations. This PR does so by essentially borrowing one (the Python file object implementation).

First of all, the changes to the tests reflect the crucial improvement of context management protocol implementation that the proposed PR brings.

As I have already argued, this PR is not bringing any crucial improvements, rather an opinionated dismissal of existing implementation.

This PR is an improvement as it fixes an issue and has every right to dismiss the existing implementation simply because it proposes a superior implementation.

The proposed test changes are wrong in general.

We are expected to be very specific when making statements of this sort. All the test changes in this PR are perfectly correct and relevant until shown otherwise.

evaluate if in principle the idea of providing the iterator protocol on the Python side is desirable or not.

The iterator protocol and anything else available on the C++ side (with very rare exceptions) are supposed to be available on the Python side. This is what the Python bindings to the ROOT framework written in C++ are about.
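The sketch below shows one likely mechanism behind the lost iterability. It assumes that the current wrapper forwards attributes dynamically; the traceback further down in this thread shows a `__getattribute__` override in `_rntuple.py`. Python looks up special methods such as `__iter__` on the type and bypasses instance-level `__getattr__`/`__getattribute__`. As a result, `iter()` fails even though explicit attribute access is forwarded. Both classes are hypothetical and are not the actual ROOT code.

```python
class Reader:
    """Hypothetical stand-in for the wrapped C++ object."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __iter__(self):
        return iter(self._entries)

    def GetNEntries(self):
        return len(self._entries)


class AttributeForwardingWrapper:
    """Hypothetical wrapper that forwards attribute lookups to the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only consulted for explicit attribute access that fails on the wrapper.
        return getattr(self._wrapped, name)


w = AttributeForwardingWrapper(Reader([10, 20]))
n = w.GetNEntries()              # explicit access is forwarded
explicit = list(w.__iter__())    # explicit access to __iter__ is forwarded too

try:
    iter(w)                      # implicit lookup of __iter__ happens on type(w)
    iterable = True
except TypeError:
    iterable = False
```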

silverweed (Contributor) commented Jun 8, 2026

Dear @rybkine,

Thank you for your contribution. While reintroducing the iterability of RNTupleReader/Writer is definitely a good thing which we should do, and while less code is always better than more code, all else being equal, your change removes the one piece of functionality that the wrapper class was made to provide: a more helpful error message in case the reader/writer is accessed after exiting the context.

Given something like:

import ROOT

with ROOT.RNTupleReader.Open("Contributors", "RNTuple.root") as reader:
   pass

reader.GetNEntries()

Output before this change:

Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    reader.GetNEntries()
    ^^^^^^^^^^^^^^^^^^
  File "/home/jp/root_build/debug_pyroot/lib/ROOT/_pythonization/_rntuple.py", line 148, in __getattribute__
    raise RuntimeError(
        f"cannot access {super().__getattribute__('_pretty_name')} after the `with` statement is exited"
    )
RuntimeError: cannot access RNTupleReader after the `with` statement is exited

After this change:

Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    reader.GetNEntries()
    ~~~~~~~~~~~~~~~~~~^^
ReferenceError: attempt to access a null-pointer

A solution like the one you're proposing here was considered as one of the first options (one could say it's the obvious solution), but after internal discussions with @vepadulano we agreed that it's valuable to give the user a better feedback in case of accessing the object past the context lifetime.

That said, breaking the iteration functionality was clearly not intended and we should reintroduce it, but not at the cost of removing the custom exception.
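Here is a rough sketch of the compromise described here, which keeps the custom message while restoring iteration. The wrapper defines `__iter__` on its class so that `iter()` finds it, and it routes every access through a closed check. `ClosableProxy`, `_Reader` and `_target` are hypothetical names. The error text is copied from the traceback above, and this is not the actual `_rntuple.py` implementation.

```python
class ClosableProxy:
    """Hypothetical context-manager wrapper: friendly error after exit,
    iteration forwarded explicitly."""

    def __init__(self, wrapped, pretty_name):
        self._wrapped = wrapped
        self._pretty_name = pretty_name
        self._closed = False

    def _target(self):
        if self._closed:
            raise RuntimeError(
                f"cannot access {self._pretty_name} after the `with` statement is exited"
            )
        return self._wrapped

    def __getattr__(self, name):
        # Ordinary attribute access (e.g. proxy.GetNEntries()) is forwarded.
        return getattr(self._target(), name)

    def __iter__(self):
        # Special methods must live on the class to be found by iter().
        return iter(self._target())

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._closed = True
        return False


class _Reader:
    """Hypothetical stand-in for RNTupleReader."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __iter__(self):
        return iter(self._entries)

    def GetNEntries(self):
        return len(self._entries)


with ClosableProxy(_Reader([1, 2, 3]), "RNTupleReader") as reader:
    entries = list(reader)
    n = reader.GetNEntries()

try:
    reader.GetNEntries()
    message = None
except RuntimeError as err:
    message = str(err)
```

Any other special method the wrapped class supports (for example `__len__`) would need the same explicit forwarding.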

rybkine (Author) commented Jun 9, 2026

RuntimeError: cannot access RNTupleReader after the with statement is exited

This error message is in general incorrect, as it does not correspond to the cause of the error. For example,

>>> reader = ROOT.RNTupleReader.Open("ntuple", "RootFile.root")
>>> reader.GetNEntries()
250
>>> reader._closed = True
>>> reader.GetNEntries()
Traceback (most recent call last):
  File "<python-input-19>", line 1, in <module>
    reader.GetNEntries()
    ^^^^^^^^^^^^^^^^^^
  File "<masked>/x86_64-el9-gcc15-opt/lib/ROOT/_pythonization/_rntuple.py", line 148, in __getattribute__
    raise RuntimeError(
        f"cannot access {super().__getattribute__('_pretty_name')} after the `with` statement is exited"
    )
RuntimeError: cannot access RNTupleReader after the `with` statement is exited

There was no with statement involved whatsoever. What is more, the RuntimeError exception is supposed to be raised when an error is detected that doesn’t fall into any of the other categories.
In the case of access after exiting the context, the cause of the error is "attempt to access a null-pointer", and the relevant type is ReferenceError, in full agreement with its purpose in Python (and in the cppyy documentation, for that matter).
Appropriate usage of exception messages and types is another reason why the proposed alternative implementation of the context management protocol is superior to the current implementation.
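For reference, the Python documentation describes `ReferenceError` as the exception raised when a weak reference proxy created by `weakref.proxy()` is used after its referent has been garbage collected. The following is a minimal demonstration. It assumes CPython, where deleting the last reference frees the object immediately, and `Payload` is a hypothetical class.

```python
import weakref


class Payload:
    """Any plain class instance supports weak references."""
    value = 42


obj = Payload()
proxy = weakref.proxy(obj)
before = proxy.value   # referent alive: attribute access works

del obj                # CPython frees the object once its refcount drops to 0

try:
    proxy.value
    error = None
except ReferenceError as err:
    error = err
```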

silverweed (Contributor) commented

This error message is in general incorrect, as it does not correspond to the cause of the error. For example [...]

In your example you are modifying the "private" field _closed, so it's not really a counter argument.

What is more the RuntimeError exception is supposed to be raised when an error is detected that doesn’t fall in any of the other categories.

Sure, we can change the type of the error raised; this does not really undermine the point of having this check in the first place.
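If the exception type were changed, one option is a dedicated subclass of `ReferenceError`. This is an assumption, not something agreed in this thread. Handlers written for the generic cppyy error would keep working, and users would still get the context-specific message. `ClosedContextError` and `check_open` are hypothetical names.

```python
class ClosedContextError(ReferenceError):
    """Hypothetical exception: a ReferenceError with a context-specific message."""


def check_open(closed, pretty_name):
    # Hypothetical guard, e.g. called before each forwarded access.
    if closed:
        raise ClosedContextError(
            f"cannot access {pretty_name} after the `with` statement is exited"
        )


try:
    check_open(True, "RNTupleReader")
    caught = None
except ReferenceError as err:  # handlers written for ReferenceError still match
    caught = err
```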

Appropriate usage of exception messages and types is another reason why the proposed alternative implementation of the context management protocol is superior to the current implementation.

This is a non-sequitur. Could you please provide an actual reason why you think it's acceptable to sacrifice the user-friendly error message in favor of the generic "null pointer access" that your code changes imply?


Projects

None yet

Development

Successfully merging this pull request may close these issues.

In Python, RNTupleReader no longer iterable

4 participants