Skip to content

tools/mpremote: Poll for friendly REPL prompt before raw mode entry.#48

Draft
andrewleech wants to merge 73 commits into
masterfrom
mpremote_raw_repl_retry
Draft

tools/mpremote: Poll for friendly REPL prompt before raw mode entry.#48
andrewleech wants to merge 73 commits into
masterfrom
mpremote_raw_repl_retry

Conversation

@andrewleech
Copy link
Copy Markdown
Owner

Summary

Issue micropython#13504 has been open since 2023 and affects users primarily on macOS: mpremote fails with TransportError: could not enter raw repl on the first invocation after plugging in a device. The thread has accumulated a workaround (a 2-second sleep inside enter_raw_repl), confirmations across boards and Python versions, and a comment from @pavelrevak noting his mpytool does not hit it because of a different polling strategy.

The root cause is that opening the serial port toggles DTR/RTS on common USB-serial converters, which resets ESP32-class boards. enter_raw_repl then sends a single Ctrl-C, flushes input, sends Ctrl-A, and blocks waiting for the raw banner. If the device is still booting, the banner never arrives and the call times out. The time.sleep(2) workaround works only because the device finishes booting in that window.

This change replaces the single Ctrl-C + flush with _wait_for_friendly_prompt, which polls for \r\n>>> with a 0.2s interval, alternating Ctrl-C (interrupt program) and Ctrl-B (exit raw REPL, every third attempt). After four unsuccessful attempts it also sends the VFS-hook ACK byte (0x18) to unblock a fs_hook agent stuck on a host read. Once the friendly prompt is confirmed, Ctrl-A is sent and the existing banner-read logic runs unchanged. The prompt-confirmation phase is capped at min(3, timeout_overall), so existing callers see at most a small additional delay before the original failure mode is reported, and successful connections on already-responsive devices are not slowed.

Approach is modelled on the polling pattern in mpytool that @pavelrevak described in micropython#13504, adapted to mpremote's existing read_until and serial abstractions.

Testing

Not yet tested on hardware. The change needs to be validated against the macOS + ESP32 + CP2102/CP210x configuration that reproduces micropython#13504, and against a representative set of other ports (STM32, RP2, nRF) to confirm no regression in the fast path where devices respond to the first Ctrl-C. I'm planning to do this before this PR is ready for merge; flagging it now so the approach can be reviewed in parallel.

Trade-offs and Alternatives

The friendly-prompt phase adds up to 3 seconds of wall time on a device that never responds, before the same TransportError is raised. In practice this replaces the 10s banner-read timeout that previously fired in the same scenario, so the failure path is faster, not slower.

An alternative would be to keep the current code path and add time.sleep(2) as the issue thread suggests. That works around the macOS case but does not address devices stuck in raw REPL or with a stalled fs_hook agent, and it penalises every invocation rather than only the ones that need it.

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.

dpgeorge and others added 30 commits May 18, 2026 14:14
At version 5.9.0, to replace the existing copy at `lib/cmsis`.

Signed-off-by: Damien George <damien@micropython.org>
At version v6.3.0.

Signed-off-by: Damien George <damien@micropython.org>
This is a no-op change to the firmware because these CMSIS sources are
equivalent (verified building the default board for all 6 affected ports).

Signed-off-by: Damien George <damien@micropython.org>
This is now replaced by the `lib/CMSIS_5` submodule, which has exactly the
same content at the 5.9.0 tag in the directory `CMSIS/Core/Include`.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Built all 14 boards, all have binary equivalent firmware.

Signed-off-by: Damien George <damien@micropython.org>
Built all 8 PCA100xx boards, and they have binary equivalent firmware with
this change.

Signed-off-by: Damien George <damien@micropython.org>
Built all 18 boards, all have binary equivalent firmware.

Signed-off-by: Damien George <damien@micropython.org>
Built the 30 NUCLEO_xxx boards, all have binary equivalent firmware except
for NUCLEO_N657X0 (although it's firmware is the same size before and after
this commit).  Ran full test suite on OPENMV_N6 with CMSIS 6 to verify the
N6 and it passed.

Built mboot for NUCLEO_WB55, PYBV10, PYBD_SF2 and PYBD_SF6.  All have
binary equivalent firmware.

Signed-off-by: Damien George <damien@micropython.org>
This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Issue turned out to be something else, but I think there's still a
potential race here without this fix.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Was causing the next test to fail (maybe only when the other instance
skipped, not sure.)

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Test always fails on this chip, looks like possibly an ESP-IDF v5.5.1 bug.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Feature check .py.exp files are not needed at all (and most of them are
empty).  Instead, the output of the feaure test is checked against a known
value within `run-tests.py` to see if that feature is enabled or not (and
in some cases like `byteorder.py` there are multiple valid output
values).

Signed-off-by: Damien George <damien@micropython.org>
They are provided in a private header that is already included.

Signed-off-by: Damien George <damien@micropython.org>
Also support v5.5.4, but don't use it yet due to possible issues discussed
in micropython#19149.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Instead of unconditionally (ie when the `select` module is enabled).  This
way, a minimal board can enable the `select` module (eg for asyncio) but
still have `select.select()` disabled by default to save size.

Signed-off-by: Damien George <damien@micropython.org>
Prior to this change some of the smaller boards (eg nRF51) would have
the .py asyncio code frozen but be missing both the `select` module and
the core C-level `_asyncio` module, making `asyncio` useless.

This commit lets asyncio be configured at the makefile level, in order to
freeze the correct code.  Both the `select` and `_asyncio` modules will be
enabled if asyncio is enabled.

asyncio is now enabled by default on all boards except MICROBIT (which is
too small due to the other modules is has).  This costs about +2100 bytes
on boards that did not already have it enabled, eg PCA10028.

Signed-off-by: Damien George <damien@micropython.org>
To match other ports, and allow the `pyproject.toml` F821 ignore rule to
cover it.

Signed-off-by: Damien George <damien@micropython.org>
The UART DRE flag is always set if the DATA register is empty.  That would
lead to a TXIDLE event each time the IRQ handler was run.  To fix that,
only fire the TXIDLE event if the DRE interrupt flag is enabled.

Partially fixes the `tests/extmod/machine_uart_irq_txidle.py` test on
`ADAFRUIT_ITSYBITSY_M0_EXPRESS`.

Signed-off-by: Damien George <damien@micropython.org>
The C IRQ handler can be called with multiple IRQ flags active, eg both
RXC and DRE set.  But the Python IRQ handler should only see the events
that it registered.  So mask the flags as appropriate.

Combined with the parent commit, `tests/extmod/machine_uart_irq_txidle.py`
now passes on `ADAFRUIT_ITSYBITSY_M0_EXPRESS`.

Signed-off-by: Damien George <damien@micropython.org>
Clear TRCENA (if it was enabled for DWT CYCCNT) before
calling NVIC_SystemReset(), otherwise it spins forever.

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
The LPTIMER does not reset when the CPU is reset and may have state left
over, eg a timer interrupt running.  So clear that when the device starts
up.

This fixes `machine.deepsleep()` in the follow scenario: the device first
wakes via LPTIMER but then goes to deepsleep again without it, and is
unable to wake again via LPGPIO.

Signed-off-by: Damien George <damien@micropython.org>
This allows LPTIMER to wake the device from lightsleep mode.

Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Damien George <damien@micropython.org>
Passing in a millisecond timeout argument to `machine.lightslep()` will
configure the LPTIMER to wake the system.

Signed-off-by: Damien George <damien@micropython.org>
This allows the GPIO IRQ to be enabled even if a Python `Pin.irq()` is not
configured for that pin.

Signed-off-by: Damien George <damien@micropython.org>
robert-hh and others added 28 commits May 27, 2026 16:07
Based on original work by Kwabena W. Agyeman.

Signed-off-by: robert-hh <robert@hammelrath.com>
First version, working to some extent:
- create CAN
- init()
- deinit()
- send()
- cancel_send()
- get_counters()
- get_timings()
- state()
- recv()
- restart()
- irq handlers (TX irq is missing)

All port specific functions are implemented, but not all of them are
tested, and some aspects still need consideration.

Open topics:
- Filter masks.
- TX IRQ. Maybe not useful.
- Get the number of FIFO elements instead just a single flag.
- Use the timing settings from init().
- Consider, if the actual value of RFFN is appropriate
  (25 TXMB and 128 Filters). Maybe more TXMB's
- Add setting for the other boards.
- More TESTING, espcially with other boards.

Signed-off-by: robert-hh <robert@hammelrath.com>
Of course only for the boards that support CAN.
Moved CAN1 at Teensy to Pins A8 and A9.
Added a CAN pinout section to the documentation.

Signed-off-by: robert-hh <robert@hammelrath.com>
Supporting filter masks now matches the documentation.

Signed-off-by: robert-hh <robert@hammelrath.com>
Add a conditional compile to either use the timing parameters
as provided by calculate_brp() or use the timing values as determined
by the NXP library. In that case, report back the final settings for
the object print.
Provide C constants for the min/max values of the timing parameters.

Signed-off-by: robert-hh <robert@hammelrath.com>
- Use a single setting to define the number of filters and TX MBs.
  The number of filters is set in machine_can_port_init() to
  CAN_IDFILTERNUM_MAX, and then RFFN and the number of TX MBs
  are derived from that number. At the moment it's 128 Filter
  and 25 TX MBs. Another reasonable number would be 64 filters as there
  are 64 filter masks and 41 TX MBs.
- Remove code duplication, especially the port's print function.
  The common print function is good and provides more information.

Signed-off-by: robert-hh <robert@hammelrath.com>
Not just a single write of the abort command. Step 3 of the instructions
is omitted, because the IFLAG is never set. It returns True if
the message was waiting for transmission or is actually being
transmitted when calling, and False, when no message is waiting or
is transferred.

The code cannot tell the difference between a pending message and a
message being transferred. So it may happen that can.cancel_send()
returns True, but the message is transmitted.

Finally, the CS field of the MB is set to the initial state.

Signed-off-by: robert-hh <robert@hammelrath.com>
Switching between modes requires to call can.deinit() before a new
call to can.init(). Calling can.deinit() will clear the device
including the filters, but must not clear the CAN object.
That is deferred until soft reset.

MP_CAN_MODE_SILENT only receives acknowledged messages.

Signed-off-by: robert-hh <robert@hammelrath.com>
These were define with default values, but port specific values can
be used instead.

Signed-off-by: robert-hh <robert@hammelrath.com>
Signed-off-by: robert-hh <robert@hammelrath.com>
tests/multi-extmod:
- Test 4: Use a hard IRQ avoiding event loss.
- Test 5: Count the result of can.cancel_send() instead of the
  non-supported IRQ_TX_FAILED event.
- Test 8: Add some dummy print statements expected by the test script
  for situations which cannot be handled by MIMXRT.
  Change irq handler to deal either with RX or TX.

tests/tests/extmod_hardware/machine_can_instances.py:
- Set the mode to CAN.MODE_SILENT_LOOPBACK.

Signed-off-by: robert-hh <robert@hammelrath.com>
- The approach does not stop with fail at the first non-empty MB
  with a matching ID, but keeps scanning for a free MB with a higher
  index number and therefore lower priority.
- If CAN_MSG_FLAG_UNORDERED is set, allow to pick the first unused MB
  if the MB with the matching ID is busy.
- Add MACROs to make can_find_txmb() better readable.

Signed-off-by: robert-hh <robert@hammelrath.com>
Changes:
- Add a TX interrupt.
- Do not enable Warning and overflow interrupts for IRQ_RX, avoiding
  additional calls of the IRQ handler.
- Add flags for bus state interrupts.
- The recv() code is slightly shortened with the assumption, the data
  buffer for a received message is always sufficiently large.
  That applies most likely to send() as well, but it's out of scope
  for machine_can.

Comments:
- The handling of interrupts follows the scheme, that the respective
  interrupt is disabled in the handler, if it was triggered. The flags
  telling which interrupt happened stay enabled, such that they can
  be reported by can.irq().flags().
- Reading a message or defining the trigger (re-)enables the RX IRQ.
- The IRQ_TX interrupt happens when any message have been successfully
  sent. At the moment there is not feedback which message of possibly
  many in the TX queue.
- Call can.irq().flags() repeatedly to get the message indexes of all
  transmitted messages. These are coded in the upper 16 bits of
  the value returned by can.irq().flags() (as documented).
- The IRQ_TX interrupt for a MMB is enabled when sending a message or
  when can.irq().flags() is called.

Signed-off-by: robert-hh <robert@hammelrath.com>
Make it consistent to the documentation.
The code now counts and the errors when state transitions happen to
a more severe state and triggers an callback if requested.
A state IRQ is to be signalled for every bus state change. Not all
bus state changes cause a dedicated hardware interrupt. In that case
the state change will only be noticed during processing of another
event, like general bus error, RX, TX, ....
Consider the value of CAN_TX_QUEUE_LEN depending on the value of
FSL_FEATURE_FLEXCAN_HAS_ERRATA_5829.

Signed-off-by: robert-hh <robert@hammelrath.com>
Signed-off-by: robert-hh <robert@hammelrath.com>
- Declare local functions as "static".
- Use named constants for numbers when possible
- Put the IRQ handler into RAM
- Clarify comments

Signed-off-by: robert-hh <robert@hammelrath.com>
Making it a little bit more efficient with MIMXRT.

Signed-off-by: robert-hh <robert@hammelrath.com>
Signed-off-by: robert-hh <robert@hammelrath.com>
That way, a port can do the internal update of the filter set at once.

Signed-off-by: robert-hh <robert@hammelrath.com>
If instance0 starts too quickly before instance1, it could "babble"
enough messages onto the bus without ACK that it goes into Error
Passive. Once in Error Passive, it has to leave an extra 8 bit times
("Suspend Transmission Time") after each message to allow other nodes
to communicate. This prevents the test from working as designed.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
This patch updates the protocol to the latest stable
release (v1.0.0.0.0).

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
Add support for esp-hosted v1.0.0.0.0 firmware.

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
Newer IDF/ESP seems to send some response after reset that must be
skipped. This change is backward compatible with older firmware.

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
Update the logging macros to automatically include the function name
using __func__, removing the need to manually specify it at each call
site.

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
This change refactors the driver to enqueue TX frames into a queue
instead of calling spi_transfer directly. This allows the polling
function to perform full-duplex transfers that send queued TX frames
while simultaneously receiving RX data. This change prevents RX frames
that were already queued by the firmware from being lost during TX-only
transfers.

Signed-off-by: iabdalkader <i.abdalkader@gmail.com>
- Move the "generic" mpconfigboard.cmake files to per-chip
  files in the top-level board directory.

- Replace all the copy-paste in other boards' mpconfigboard.cmake
  files to directly include those ones.

- Rename sdkconfig.c3usb config file to sdkconfig.c3 as this file no longer
  contains any USB-specific config settings. This allowed deleting some
  other sdkconfig.board files which were copies of sdkconfig.c3usb.

- Replace using set() to extend a list with CMake list(APPEND ...)

The downside is a bit of extra complexity, and had to use a CMake function
to opt-out of sdkconfig.spiram_sx on S2/S3 boards without SPIRAM.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
Signed-off-by: Angus Gratton <angus@redyak.com.au>
Replaces the common practice of setting 80MHz & QIO flash in
individual sdkconfig.board files.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
enter_raw_repl previously sent a single Ctrl-C, flushed the input
buffer, then sent Ctrl-A and waited up to 10s for the raw REPL banner.
This fails on devices that are still booting (e.g. after DTR/RTS
toggling on macOS USB-serial converters resets the board on port open),
or that are stuck in raw REPL, or that have a fs_hook agent stalled on
a VFS read; the banner never arrives because the device is not in a
state where Ctrl-A is meaningful.

Add _wait_for_friendly_prompt which polls for `\r\n>>> ` with short
0.2s timeouts, alternating Ctrl-C and Ctrl-B and (after a few attempts)
also sending the VFS-hook ACK byte (0x18). enter_raw_repl now confirms
the device is in friendly REPL before sending Ctrl-A.

Approach matches the polling pattern used successfully by mpytool to
work around the same DTR/RTS reset issue on macOS, discussed in micropython#13504.
@andrewleech andrewleech force-pushed the mpremote_raw_repl_retry branch from d640931 to 74fa650 Compare May 28, 2026 13:52
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.47%. Comparing base (1c63211) to head (74fa650).
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #48   +/-   ##
=======================================
  Coverage   98.47%   98.47%           
=======================================
  Files         176      176           
  Lines       22845    22845           
=======================================
  Hits        22497    22497           
  Misses        348      348           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.