fix(windows): use wide-char API for non-ASCII path support #386

Merged
DeusData merged 1 commit into DeusData:main from jjserenity:fix/windows-wide-path-support on May 30, 2026

@jjserenity
Contributor

Fixes #357

Replace ANSI Windows APIs with their wide-char (W) counterparts to support non-ASCII UTF-8 paths (CJK, Cyrillic, Turkish, etc.) on Windows.

Changes

  • New src/foundation/win_utf8.h: UTF-8 ↔ UTF-16 conversion helpers (cbm_utf8_to_wide, cbm_wide_to_utf8)
  • src/foundation/compat_fs.c:
    • cbm_opendir/cbm_readdir: FindFirstFileA/FindNextFileA → FindFirstFileW/FindNextFileW
    • cbm_mkdir_p: _mkdir → _wmkdir
    • cbm_unlink: _unlink → _wunlink
    • cbm_rmdir: _rmdir → _wrmdir
  • src/foundation/platform.c:
    • cbm_mmap_read: CreateFileA/CreateFileMappingA → CreateFileW/CreateFileMappingW
    • cbm_file_exists/cbm_is_dir: GetFileAttributesA → GetFileAttributesW
    • cbm_file_size: GetFileAttributesExA → GetFileAttributesExW
  • src/discover/discover.c:
    • safe_stat + try_load_nested_gitignore + cbm_discover: stat() → _wstat64() via new wide_stat() helper
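
For readers who have not seen the diff, the shape of a wide-char stat call site might look roughly like the sketch below. It is an illustration, not the code in this PR: `sketch_is_dir` is a hypothetical name, the conversion is inlined rather than calling cbm_utf8_to_wide, and the return convention is an assumption. The Windows branch converts the UTF-8 path to UTF-16, calls `_wstat64`, and frees the buffer on every return path; the POSIX branch is plain `stat()`.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical helper, not taken from the PR.
 * Returns 1 if the path is a directory, 0 if it exists but is not one,
 * and -1 on any error (NULL input, conversion failure, missing path). */
static int sketch_is_dir(const char *utf8_path)
{
    if (utf8_path == NULL) {
        return -1;
    }
#ifdef _WIN32
    /* First call: ask how many wchar_t units are needed (including NUL). */
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, NULL, 0);
    if (n <= 0) {
        return -1;
    }
    wchar_t *wpath = malloc((size_t)n * sizeof *wpath);
    if (wpath == NULL) {
        return -1;
    }
    /* Second call: perform the conversion into the sized buffer. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, n) <= 0) {
        free(wpath);
        return -1;
    }
    struct __stat64 st;
    int rc = _wstat64(wpath, &st);
    free(wpath); /* released before any further return */
    if (rc != 0) {
        return -1;
    }
    return (st.st_mode & _S_IFDIR) ? 1 : 0;
#else
    /* POSIX: the byte string is passed through unchanged. */
    struct stat st;
    if (stat(utf8_path, &st) != 0) {
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
#endif
}
```

Keeping the `#ifdef _WIN32` split inside one function means callers pass UTF-8 on every platform and never see `wchar_t`.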

Scope

Windows only — all changes are guarded by #ifdef _WIN32. POSIX paths untouched. Zero behavioral change for macOS/Linux.

Tested

  • All changed files compile clean on MinGW GCC 14.2 (-Wall -Wextra -Werror -fsyntax-only)
  • Full test suite requires POSIX build environment (CI will verify)

Replace ANSI Windows APIs (FindFirstFileA, CreateFileA, etc.) with
wide-char (W) counterparts. Add UTF-8/UTF-16 conversion helpers.
Fix stat() calls in discover pipeline to use _wstat64.

Fixes DeusData#357
@DeusData
Owner

Thank you, @jjserenity! 🙏 This is a clean, comprehensive wide-char migration — exactly what was needed for #357 (the Turkish-path report) and the broader Windows non-ASCII class. I reviewed it closely:

  • The new win_utf8.h helpers are textbook-correct (two-call MultiByteToWideChar/WideCharToMultiByte sizing, NULL-safe, malloc'd return).
  • I traced the free-discipline at every conversion site — each cbm_utf8_to_wide is freed on every return path (error and success), and the cbm_wide_to_utf8/free(u8) in readdir is balanced too. No leaks, no double-frees.
  • The safe_stat → wide_stat (_wstat64), FindFirstFileW/FindNextFileW, _wmkdir/_wunlink/_wrmdir, and CreateFileW/GetFileAttributesW swaps all look right, each gated behind #ifdef _WIN32 with the POSIX path untouched.
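
As a rough illustration of the two-call sizing pattern mentioned in the first bullet, here is a minimal sketch. It is not the contents of win_utf8.h: `sketch_utf8_to_wide` is a hypothetical stand-in for cbm_utf8_to_wide, and the non-Windows branch (using `mbstowcs`, which depends on the current locale) exists only so the sketch compiles everywhere. In both branches the first call measures, the buffer is allocated, and the second call converts; the caller owns the result and must `free` it.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical stand-in for cbm_utf8_to_wide; the real helper may differ.
 * Returns a malloc'd, NUL-terminated wide string, or NULL on failure.
 * The caller is responsible for calling free() on the result. */
static wchar_t *sketch_utf8_to_wide(const char *utf8)
{
    if (utf8 == NULL) {
        return NULL; /* NULL-safe: nothing to convert */
    }
#ifdef _WIN32
    /* Call 1: with a NULL output buffer, returns the required length
     * in wchar_t units, including the terminating NUL (cbMultiByte = -1). */
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (n <= 0) {
        return NULL;
    }
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL) {
        return NULL;
    }
    /* Call 2: convert into the buffer sized by call 1. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, w, n) <= 0) {
        free(w);
        return NULL;
    }
    return w;
#else
    /* Portable analogue of the same pattern (assumption, locale-dependent):
     * mbstowcs with a NULL destination reports the length without the NUL. */
    size_t n = mbstowcs(NULL, utf8, 0);
    if (n == (size_t)-1) {
        return NULL;
    }
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL) {
        return NULL;
    }
    if (mbstowcs(w, utf8, n + 1) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
#endif
}
```

A call site would pair every successful conversion with exactly one `free`, on both the success and the error path, which is the discipline described in the second bullet.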

Verified locally on macOS: the POSIX branches and shared refactors compile clean under -Werror and all 3,622 tests pass (the _WIN32 branches are validated by the Windows CI build). Merging via squash — authorship preserved. Closes #357, and contributes to the Windows cluster in #394. Thank you for making non-ASCII paths work on Windows! 🙏

@DeusData DeusData merged commit 2389d82 into DeusData:main May 30, 2026

Development

Successfully merging this pull request may close these issues.

Windows non-ASCII path support issue in indexing pipeline
