MRG: Use `mffpy` for EGI MFF event reading by PragnyaKhandelwal · Pull Request #13932 · mne-tools/mne-python

PragnyaKhandelwal · 2026-05-29T10:03:41Z

Reference issue (if any)

None.

What does this implement/fix?

This replaces the internal EGI MFF event-reading path with mffpy and keeps the existing RawMff event contract intact. It also adds a small compatibility shim in mne/fixes.py for mffpy timestamp parsing so real-world MFF files with nanosecond fractional seconds can still be read correctly.

Additional information

I tested this change against the EGI MFF test module with the current testing dataset, and the full mne/io/egi/tests/test_egi.py suite passes locally.

This branch intentionally keeps the scope narrow. The separate changelog fragment should use the final PR number once the PR is opened on GitHub.

AI disclosure:
I used GitHub Copilot to help draft the PR text and review the scope.

PragnyaKhandelwal · 2026-05-29T13:47:35Z

Hi @scott-huberty! Here is the first micro-PR for the event reader refactor, exactly as we discussed yesterday.
Quick status update on what this PR includes:
Refactored the EGI MFF event-reading path to use mffpy while keeping the existing RawMff event contract intact.
Removed legacy unused event helper code to address the Vulture dead-code warnings (commit b10ea86).
Added a temporary compatibility shim in mne/fixes.py for 9-digit fractional-second beginTime values (using defusedxml).
Kept the scope intentionally narrow to event-reading internals only.
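For readers unfamiliar with the timestamp problem, here is a minimal sketch of the idea behind such a shim. It is not the code in mne/fixes.py: the function name `parse_mff_timestamp` and the regular expression are hypothetical, and it assumes ISO-8601-like strings whose fractional part may have up to 9 digits. Python's `datetime` stores only microseconds, so the sketch truncates the fraction to 6 digits.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: date-time, optional fraction, optional UTC offset.
_TS_PATTERN = re.compile(
    r"^(?P<base>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?"
    r"(?P<tz>Z|[+-]\d{2}:?\d{2})?$"
)


def parse_mff_timestamp(text):
    """Parse a timestamp, truncating fractional seconds to microseconds."""
    match = _TS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {text!r}")
    # datetime cannot hold nanoseconds, so keep at most 6 fractional digits.
    frac = (match["frac"] or "")[:6].ljust(6, "0")
    dt = datetime.strptime(match["base"], "%Y-%m-%dT%H:%M:%S")
    dt = dt.replace(microsecond=int(frac))
    tz = match["tz"]
    if tz is None:
        return dt  # no UTC offset in the input: return a naive datetime
    if tz == "Z":
        return dt.replace(tzinfo=timezone.utc)
    sign = 1 if tz[0] == "+" else -1
    digits = tz[1:].replace(":", "")
    offset = timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
    return dt.replace(tzinfo=timezone(sign * offset))


# Made-up example value with a 9-digit fraction.
parsed = parse_mff_timestamp("2017-02-01T15:52:56.123456789+01:00")
```

Truncating (rather than rounding) the extra digits is a design choice of this sketch; either way the difference is below one microsecond.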

Upstream Update:
I reported the timestamp bug upstream in BEL-Public/mffpy#138. The maintainer got back to me immediately and confirmed that their pending PR #133 fixes this case! Once we review this PR and figure out the test failures, I will update the TODO comment in our shim to explicitly track their PR.

Failing Checks:
The style checks are green now, but we are hitting some functional test failures in the CI. I am going to look into the logs to track down the mismatch, but if you have a minute to look at the failing CI checks and see if it's something obvious that I'm missing, I'd really appreciate the help!

pmolfese · 2026-05-29T14:58:46Z

+    # mffpy.Reader for locating the Events.xml files inside the MFF.
+    _soft_import("mffpy", "reading EGI MFF data")
+    _soft_import("defusedxml", "reading EGI MFF data")
+    import defusedxml.ElementTree as DET


Is there a reason you're using defusedxml instead of mffpy.XML.from_file()?

Thanks so much for taking a look! There are two reasons for this:

MNE-Python has a strict internal policy requiring defusedxml for all XML parsing to protect against XML vulnerability attacks.

Because of the 9-digit fractional timestamp bug we discussed in issue BEL-Public/mffpy#138, calling mffpy.XML directly crashes the CI right now. I'm using defusedxml to route the parsing through a temporary shim in mne/fixes.py until your PR gets merged and released!
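To make the approach concrete, here is a rough sketch of reading event fields with `defusedxml.ElementTree` while leaving `beginTime` as a raw string for the shim to parse. It is not the PR's code: `read_event_fields`, the sample XML, and its namespace URI are assumptions for illustration, and the stdlib fallback exists only so the sketch runs where defusedxml is not installed.

```python
try:
    # Preferred: hardened parser, as MNE requires for its own XML parsing.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback only so this sketch runs without defusedxml installed.
    from xml.etree.ElementTree import fromstring

# Made-up, minimal event-track content for illustration only.
EXAMPLE_XML = """<eventTrack xmlns="http://www.egi.com/event_mff">
  <name>ECI</name>
  <event>
    <beginTime>2017-02-01T15:52:56.123456789+01:00</beginTime>
    <code>stm+</code>
  </event>
  <event>
    <beginTime>2017-02-01T15:52:57.000000000+01:00</beginTime>
  </event>
</eventTrack>"""


def _local(tag):
    """Strip an XML namespace prefix, e.g. '{uri}event' -> 'event'."""
    return tag.rsplit("}", 1)[-1]


def read_event_fields(xml_text):
    """Return one {child_tag: text} dict per <event> element."""
    root = fromstring(xml_text)
    events = []
    for elem in root.iter():
        if _local(elem.tag) != "event":
            continue
        events.append({_local(child.tag): child.text for child in elem})
    return events


events = read_event_fields(EXAMPLE_XML)
# Events without a code are skipped; beginTime stays a string at this stage.
coded = [ev for ev in events if ev.get("code") is not None]
```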

The struggle is real as we say. This might be something more for @scott-huberty and @drammock to ponder beyond this GSOC.

The mffpy Reader() that is currently used for reading epochs or evoked files already relies on mffpy's XML.from_file(), at least for getting header information and events (mne/io/egi/egimff.py). So there is already some exposure to another XML reader in the current codebase.

The advantages of keeping the mffpy way of doing things are how easy some things become: typed return objects (.sensors, .epochs, .events), recover=True for malformed XML, and the fact that Reader() already uses it.

for evfile in sorted(glob(op.join(input_fname, "Events_*.xml"))):
    track = XML.from_file(evfile)
    for event in track.events:
        code = event.get("code")
        if code is None:
            continue
        ...

I suppose we could have a conversation with BEL and other mffpy maintainers about switching the current lxml backend to defusedxml.

My thesis is currently: "if the point of this project is to use mffpy to parse MFF files, then you should use mffpy's way of doing things and if those need changing, then we should probably change those upstream". Perhaps the compromise could be using the XML.* functions for this with the goal of changing things upstream in mffpy.

Hmm - okay faster than expected I have a mostly working defusedxml backend for mffpy. Might be able to crank this out in a week or so.

@drammock / @larsoner : 1) I assume it's ok if one of our dependencies parses XML files using lxml?

and 2) If mffpy devs cut a release after the aforementioned timestamp bug is fixed upstream... Is there any chance that MNE is comfortable with putting a lower pin on mffpy's latest release, so that we do not have to keep our shim around? 😁

Pins are fine for newly introduced dependencies. It's only an issue if we move from no pin to pin, or bump the pin.

I assume it's ok if one of our dependencies parses XML files using lxml?

we use defusedxml in our code because it closes some vulnerabilities present in the standard library. We cannot force dependencies to do likewise, but can suggest/encourage it upstream. IIRC, lxml has similar safeguards to defusedxml, but is a much more heavyweight package (does more stuff). I'd need to look into it to be sure though.

If a required dependency used lxml then I'd actually consider switching us to use it too --- why have 2 libs installed if one will suffice? But mffpy will be an optional dep (right?) so that logic doesn't necessarily apply, at least not as strongly, esp if there are salient differences in security or install size or API convenience.

Edit: sorry for the infelicity of my last message; I just saw that @pmolfese is already looking into what it would take to switch to defusedxml upstream. Thanks! LMK if I can be of help there.

Pins are fine for newly introduced dependencies. It's only an issue if we move from no pin to pin, or bump the pin.

OK - we already depend on mffpy for exporting to MFF.. so this could cause problems for users who already have mffpy installed on their machine. If we do add this pin, we should use MNE's _check_version helper to ensure that the user has the minimum necessary version of mffpy at runtime.

I looked through the MFFPy commits since 0.10.0 (2024 june) and it looks like all bugfixes / CI / maintenance stuff since then. So the risk of a pin breaking user code is low. No objection to setting lower pin of 0.11 when it's released.

pmolfese · 2026-05-29T15:07:00Z

+        files_list = []
+    tracks = []
+    for xml_name in files_list:
+        if not xml_name.lower().endswith(".xml"):


This will have you parsing all XML files instead of just event ones. Probably unnecessary file IO.

Great catch here! You are completely right—parsing info.xml and the others here is totally unnecessary I/O. I will update this loop to explicitly filter for the event files (e.g., startswith("Events_") or similar) in my next commit. Thank you!
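A minimal sketch of that filter, assuming the list holds bare file names rather than full paths. `select_event_files` is a hypothetical name, and every file name in the example except `Events_ECI.xml` is made up.

```python
def select_event_files(filenames):
    """Keep only names that look like 'Events_*.xml' (case-insensitive)."""
    selected = []
    for name in filenames:
        lowered = name.lower()
        if lowered.startswith("events_") and lowered.endswith(".xml"):
            selected.append(name)
    return sorted(selected)


files = ["info.xml", "Events_ECI.xml", "epochs.xml", "signal1.bin", "Events_DIN.XML"]
event_files = select_event_files(files)
```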

pmolfese · 2026-05-29T15:10:40Z

+            if event_start is None:
+                continue
+            start_sec = (event_start - start_time).total_seconds()
+            code_str = ev.get("code", "")


There are probably a couple checks you could add:

codes are supposed to be 4 characters (e.g. "STIM") and if not, then you're probably not reading a real MFF file

You probably also want to read in the labels for use if/when you add annotations
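For later reference, the two checks suggested above could look roughly like this. It is only a sketch: `check_event` is a hypothetical name, events are assumed to be plain dicts with optional "code" and "label" entries, and warning (rather than raising) on an unusual code length is my own choice.

```python
import warnings


def check_event(event):
    """Return (code, label) for an event dict, warning on unusual codes."""
    code = event.get("code")
    if code is None:
        return None  # nothing to validate
    if len(code) != 4:
        warnings.warn(f"Event code {code!r} is not 4 characters long")
    # Fall back to the code itself when no label is present.
    label = event.get("label") or code
    return code, label


result = check_event({"code": "STIM", "label": "stimulus"})
```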

These are really great suggestions. Because this is the very first step of my GSoC project, my mentors requested that I keep this initial micro-PR strictly confined to 1:1 functional parity with MNE's legacy reader. The legacy reader didn't enforce the 4-character limit or map the labels, so I want to avoid expanding the scope just yet.
However, adding proper annotations is on my roadmap for Phase 2 here #13926, so I will definitely be referencing those labels when we get there!

scott-huberty · 2026-05-29T17:39:55Z

Thx for taking a look at this @pmolfese !!

pmolfese · 2026-06-01T14:01:40Z

    """Parse an MFF timestamp with nanosecond fractional seconds.

-    TODO VERSION: Remove once mffpy fixes EventTrack.beginTime parsing.
+    TODO VERSION: Remove once BEL-Public/mffpy#133 is released.


This PR is now merged. If you pull against the develop branch you should be able to test it against your code without the monkey patch.

Hey @pmolfese, just pulled the develop branch and tested it locally—it works perfectly! No more crashes on the nanosecond timestamps. Thank you so much for turning that around so fast!
I'll leave the shim in this PR for now just to keep the MNE CI happy (since it pulls from PyPI), but I'll open a quick PR to rip it out as soon as you guys cut the next official release.

I requested a new release from mffpy: BEL-Public/mffpy#141

However even if they do cut a release, we will need to keep the shim if MNE decides against pinning to the latest release.

@PragnyaKhandelwal see #13932 (comment) - we are good to bump the lower pin (when mffpy cuts a new release).

Awesome, thanks for digging into the commit history @drammock, and thanks for the update @scott-huberty!
Since mffpy hasn't officially published v0.11 to PyPI yet, how would you prefer to handle this PR?
We can merge it as-is (with the shim), and I can open a quick follow-up PR to bump the pin and remove the shim the day the release drops. Or, we can just leave this PR open and do it all right here once v0.11 is live.
Happy to do whichever fits the workflow best!

@PragnyaKhandelwal I suppose that we can merge it with the shim and remove it later

@scott-huberty should the xml parse be switched out to the mffpy one first? Or make that a follow up PR?

@pmolfese do you mean should we use lxml for our shim instead of defusedxml?

That is a good question. Given the discussion at BEL-Public/mffpy#139, I guess I would vote for a follow-up PR that is entirely dedicated to swapping out all uses of defusedxml for lxml in our codebase.

And hopefully we get a new mffpy release soon and can remove our shim, which would make that follow up PR a little simpler!

@scott-huberty - I meant more to use the mffpy methods to get the XML info (like events) which abstract away the actual XML parser. For example, I might do something like the following to get event codes, labels, and onset times:

markers = []  # list of (code, sample_int, label)
track = XML.from_file(evfile)
for event in track.events:
    code = event.get("code")
    if code is None:
        continue
    sample = _dt_to_sample(event["beginTime"], start_dt, sfreq)  # convert time to sample
    label = event.get("label")
    if label is None or label == "None":
        label = code
    if code not in codes:
        codes.append(code)
    markers.append((code, sample, label))

Obviously can be done after initial merge as well. And of course happy to help with other conversions to lxml but I suspect for EGI files, the mffpy functions should wrap that functionality entirely. They have methods for doing events, epochs, etc. Should abstract away some complexity and lay the burden on other (likely my) shoulders to make sure they're working with any updates or iterations of MFF.
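The `_dt_to_sample` helper in the snippet above is not defined in this thread. One plausible sketch is shown below; the name `dt_to_sample`, the rounding rule, and the example values are assumptions, and MNE's legacy reader may round differently.

```python
from datetime import datetime, timedelta, timezone


def dt_to_sample(event_dt, start_dt, sfreq):
    """Convert an absolute event time to a sample index relative to start_dt."""
    offset_s = (event_dt - start_dt).total_seconds()
    # Round to the nearest sample; truncation would be another valid choice.
    return int(round(offset_s * sfreq))


# Made-up values: an event 1.5 s after recording start at 250 Hz.
start_dt = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
event_dt = start_dt + timedelta(seconds=1.5)
sample = dt_to_sample(event_dt, start_dt, sfreq=250.0)
```

Both datetimes must be either timezone-aware or naive; Python raises a TypeError when subtracting a mix of the two.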

Ah Ok! Yes, I think that ideally we should let mffpy do the XML parsing, and we just use their API to get the events, etc.

@PragnyaKhandelwal after BEL cuts v0.11, can you make sure that we rely on mffpy API to extract the events, particularly in our function _read_mff_events? Ideally we should not have to do any xml parsing ourselves.

scott-huberty

@PragnyaKhandelwal this looks good, but the code coverage is low!

I ran pytest --cov=mne/io/egi --cov-report=term-missing:skip-covered mne/io/egi/tests/test_egi.py

On this branch I got 67% for events.py, but on main it is 91%

Here is the CodeCov report for events.py

It looks like there are some fallbacks / conditionals that are never hit in the tests. So we need to add tests, or if we really don't expect to hit these conditions, we can discuss removing them.

scott-huberty · 2026-06-03T18:38:51Z

+    # Use defusedxml to parse Events XML directly (avoid mffpy's strict
+    # datetime parsing which may include nanosecond fractions). We still use
+    # mffpy.Reader for locating the Events.xml files inside the MFF.


Suggested change

-    # Use defusedxml to parse Events XML directly (avoid mffpy's strict
-    # datetime parsing which may include nanosecond fractions). We still use
-    # mffpy.Reader for locating the Events.xml files inside the MFF.
+    # Use defusedxml to parse Events XML directly until
+    # mffpy v0.11 is released

scott-huberty · 2026-06-03T20:13:57Z

    """Parse an MFF timestamp with nanosecond fractional seconds.

-    TODO VERSION: Remove once mffpy fixes EventTrack.beginTime parsing.
+    TODO VERSION: Remove once BEL-Public/mffpy#133 is released.


@pmolfese do you mean should we use lxml for our shim instead of defusedxml?

That is a good question. Given the discussion at BEL-Public/mffpy#139, I guess I would vote for a follow-up PR that is entirely dedicated to swapping out all uses of defusedxml for lxml in our codebase.

And hopefully we get a new mffpy release soon and can remove our shim, which would make that follow up PR a little simpler!

pmolfese · 2026-06-03T21:31:59Z

BEL is cutting the v0.11 release now. Waiting for one more reviewer, but the tests look solid.

pmolfese · 2026-06-03T23:28:55Z

Realizing the new docs aren't posted! XML.from_file() is a context-aware parser for MFF XML files:

`XML.from_file(filepointer, recover=True)`

Parses an MFF XML file and returns the appropriate typed object based on the file's root tag.

Common return types

| File | Type returned |
| --- | --- |
| `Events_*.xml` | `EventTrack` |
| `categories.xml` | `Categories` |
| `info.xml` | `FileInfo` |
| `epochs.xml` | `Epochs` |

Example — reading events

import mffpy
from mffpy.xml_files import XML

folder = mffpy.Reader("my_recording.mff")
fp = folder.directory.filepointer("Events_ECI.xml")
event_track = XML.from_file(fp)

for evt in event_track.events:
    print(evt['code'], evt['beginTime'], evt.get('duration'))

Useful patterns

All stimulus codes

codes = [evt['code'] for evt in event_track.events]

Filter to a specific code

targets = [evt for evt in event_track.events if evt['code'] == 'fix+']

Onset times in seconds relative to recording start

onsets_s = [evt['relativeBeginTime'] / 1e6 for evt in event_track.events]

Access a key value safely

cell = evt.get('keys', {}).get('cel#')

scott-huberty · 2026-06-04T16:45:01Z

BEL is cutting the v0.11 release now. Waiting for one more reviewer, but the tests look solid.

awesome!

@PragnyaKhandelwal and I met today. It looks like some of the lines of code (LOC) in this PR that are not covered by tests are related to the shim. Once mffpy v0.11 is released, @PragnyaKhandelwal will remove the shim (and with it, some of the uncovered LOC)! Then we can address any remaining gaps in the test coverage, and merge.

scott-huberty · 2026-06-04T17:00:24Z

Access a key value safely

cell = evt.get('keys', {}).get('cel#')

Incorporating the cel# value when reading events would be really helpful I think.

In my lab, experiments are designed such that for example the event code is always stm+ and the cel# is used to differentiate conditions. (e.g. {'standard': 1, 'deviant': 2}). I've always relied on my own bespoke scripts to extract this information. I believe EEGLAB's mff reader would read these events as stm+_1, stm+_2, whereas MNE's reader would just return stm+ for all events.

I think this can be a dedicated follow-up PR, though.
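As a sketch of what that follow-up might do, the snippet below builds names like stm+_1 from the event code and the cel# key. It uses plain dicts shaped like the events in the XML.from_file() example above; `event_name`, the separator, and the sample events are hypothetical.

```python
def event_name(evt, sep="_"):
    """Return 'code<sep>cel#' when a cel# key exists, else just the code."""
    code = evt["code"]
    # 'keys' may be missing or None, so fall back to an empty dict.
    cell = (evt.get("keys") or {}).get("cel#")
    if cell is None:
        return code
    return f"{code}{sep}{cell}"


# Made-up events: two conditions sharing one code, plus one without cel#.
events = [
    {"code": "stm+", "keys": {"cel#": 1}},
    {"code": "stm+", "keys": {"cel#": 2}},
    {"code": "fix+"},
]
names = [event_name(evt) for evt in events]
```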

PragnyaKhandelwal and others added 2 commits May 29, 2026 15:24

MRG: use mffpy for EGI MFF events; add timestamp parsing shim in mne.…

e465273

…fixes

[pre-commit.ci] auto fixes from pre-commit.com hooks

5a2248e

for more information, see https://pre-commit.ci

PragnyaKhandelwal mentioned this pull request May 29, 2026

GSoC 2026: Use mffpy for EGI Reader #13926

Open

11 tasks

PragnyaKhandelwal and others added 3 commits May 29, 2026 15:40

Add changelog entry

e8a58ad

MAINT: remove unused helper functions from EGI event reader

b10ea86

[pre-commit.ci] auto fixes from pre-commit.com hooks

31d8824

for more information, see https://pre-commit.ci

PragnyaKhandelwal marked this pull request as ready for review May 29, 2026 13:55

PragnyaKhandelwal requested review from agramfort, drammock and larsoner as code owners May 29, 2026 13:55

pmolfese reviewed May 29, 2026

PragnyaKhandelwal added 2 commits May 29, 2026 23:51

MAINT: narrow EGI MFF XML parsing to event files

b815651

TEST: align EGI bad XML coverage with event files

f4119cf

pmolfese reviewed Jun 1, 2026

scott-huberty mentioned this pull request Jun 2, 2026

Release v0.11? BEL-Public/mffpy#141

Open

scott-huberty reviewed Jun 3, 2026
