[pull] master from git:master#208
Merged
…tion f85b49f (diff: improve scaling of filenames in diffstat to handle UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls utf8_width() repeatedly to skip leading characters until the displayed width fits. However, utf8_width() can return problematic values:

- For invalid UTF-8 sequences, pick_one_utf8_char() sets the name pointer to NULL and utf8_width() returns 0. Since name_len does not change, the loop iterates once more and pick_one_utf8_char() dereferences the NULL pointer, crashing.
- For control characters, utf8_width() returns -1, so name_len grows when it is expected to shrink. This can cause the loop to consume more characters than the string contains, reading past the trailing NUL.

By default, fill_print_name() will C-quote filenames, which escapes control characters and invalid bytes to printable text. That prevents this bug from being triggered; however, with core.quotePath=false, most characters are no longer escaped (though some control characters still are) and raw bytes can reach this code.

Add tests exercising both failure modes with core.quotePath=false and a narrow --stat-name-width to force truncation: one with a bare 0xC0 byte (invalid UTF-8 lead byte, triggers NULL deref) and one with several C1 control characters (repeats of 0xC2 0x9F, causing the loop to read past the end of the string). The second test reliably catches the out-of-bounds read when run under ASan, though it may pass silently without sanitizers.

Fix both issues by introducing utf8_ish_width(), a thin wrapper around utf8_width() that guarantees the pointer always advances and the returned width is never negative:

- On invalid UTF-8 it restores the pointer, advances by one byte, and returns width 1 (matching the strlen()-based fallback used by utf8_strwidth()).
- On a control character it returns 0 (matching utf8_strnwidth() which skips them).

Also add a "&& *name" guard to the while-loop condition so it terminates at end-of-string even when utf8_strwidth()'s strlen() fallback causes name_len to exceed the sum of per-character widths.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
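The wrapper idea can be illustrated with a small, self-contained sketch. This is not Git's code: toy_width() is a hypothetical stand-in that only understands ASCII and two-byte sequences, but it reproduces the two failure modes described above (NULL pointer with width 0 on invalid input, width -1 on control characters). The names ish_width() and skip_to_fit() are likewise made up for illustration.

```c
#include <stddef.h>

/*
 * Toy stand-in with the two failure modes described above (NOT Git's
 * utf8_width()). It only handles ASCII and two-byte sequences:
 *  - invalid sequence: sets *s to NULL and returns 0
 *  - control character: advances *s and returns -1
 */
static int toy_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned int ch;

	if (p[0] < 0x80) {
		ch = p[0];
		*s += 1;
	} else if (p[0] >= 0xc2 && p[0] <= 0xdf && (p[1] & 0xc0) == 0x80) {
		ch = ((p[0] & 0x1fu) << 6) | (p[1] & 0x3fu);
		*s += 2;
	} else {
		*s = NULL;
		return 0;
	}
	if (ch < 0x20 || (ch >= 0x7f && ch < 0xa0))
		return -1;
	return 1;
}

/* Wrapper: the pointer always advances and the width is never negative. */
static int ish_width(const char **s)
{
	const char *start = *s;
	int w = toy_width(s);

	if (!*s) {
		/* invalid input: restore, skip one byte, count width 1 */
		*s = start + 1;
		return 1;
	}
	return w < 0 ? 0 : w; /* control character: width 0 */
}

/* Skip leading characters until the remaining width fits. */
static const char *skip_to_fit(const char *name, int name_len, int max_width)
{
	/* the "*name" guard stops at the trailing NUL no matter what */
	while (name_len > max_width && *name)
		name_len -= ish_width(&name);
	return name;
}
```

With a string made only of C1 control characters, every step has width 0, so name_len never shrinks; the `*name` guard is what ends the loop at the terminating NUL in this sketch.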
Historically, config entries like alias.foo.bar expanded the alias "foo.bar". The subsection-based alias syntax introduced in ac1f12a (alias: support non-alphanumeric names via subsection syntax, 2026-02-18) broke that behavior by treating such entries as if they were subsection syntax. Restore support for the old dotted form by falling back to the full name when the final key is not "command". Add tests covering execution and help output for simple dotted aliases. Reported-by: Michael Grossfeld <michael.grossfeld@amd.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Jonatan Holmgren <jonatan@jontes.page> Signed-off-by: Junio C Hamano <gitster@pobox.com>
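The fallback rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual alias code: alias_name_from_key() is a hypothetical helper that receives the part of the config key after "alias." and ignores case-folding and other details of Git's config parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's implementation): given the part of a
 * config key that follows "alias.", copy the alias name into out.
 *   "foo.command" -> "foo"      (subsection syntax)
 *   "foo.bar"     -> "foo.bar"  (old dotted form, full name is kept)
 *   "co"          -> "co"
 * Returns 0 on success, -1 if the name is empty or does not fit.
 */
static int alias_name_from_key(const char *rest, char *out, size_t outsz)
{
	const char *last_dot = strrchr(rest, '.');
	size_t len = strlen(rest);

	/* only a final key of "command" marks the subsection syntax */
	if (last_dot && !strcmp(last_dot + 1, "command"))
		len = (size_t)(last_dot - rest);
	if (len == 0 || len >= outsz)
		return -1;
	memcpy(out, rest, len);
	out[len] = '\0';
	return 0;
}
```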
A future patch will change the `safe.bareRepository` default from `all` to `explicit` under `WITH_BREAKING_CHANGES`. At that point, every test that operates on a bare repository through implicit discovery would fail, regardless of whether the test is actually about discovery or about how a specific command behaves once inside a bare repository. The maintainer suggested [1] setting `safe.bareRepository=all` in the test environment's global config whenever `WITH_BREAKING_CHANGES` is in effect, rather than adjusting each affected test to access bare repositories explicitly (via `--git-dir`, `GIT_DIR`, or similar). This means the test suite continues to exercise only the historical default behavior even after the user-facing default changes, relying on a small number of dedicated tests in t0035 to validate the new, stricter default. Since `$HOME` points at the trash directory (which doubles as the test repository's working tree), writing to `$HOME/.gitconfig` also creates a file inside the working tree. Exclude it via `.git/info/exclude` to limit the fallout, though this does not help tests that use `git ls-files --others` without `--exclude-standard` or `git status --ignored`; those are addressed by subsequent commits. The `.git/info/exclude` write is guarded by `test -d .git/info` rather than using `mkdir -p`, because some tests (e.g. t0008) expect to create `.git/info/` themselves and would fail with Patrick Steinhardt's `set -e` preparation (ps/test-set-e-clean) if the directory already existed. For tests using `TEST_NO_CREATE_REPO` (where no `.git/` exists at all), the guard also handles that case. [1] https://lore.kernel.org/git/xmqqse98cc51.fsf@gitster.g/ Original-patch-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The XDG config tests for `git maintenance register/unregister` create a fresh `$XDG_CONFIG_HOME/git/config` and expect git to use that location. However, if `$HOME/.gitconfig` exists (which may happen when test-lib.sh writes global config, e.g. to set `safe.bareRepository`), git prefers `$HOME/.gitconfig` over the XDG location, and the `maintenance.repo` entry ends up in the wrong file. This is an inherent consequence of setting global config in test-lib.sh rather than adjusting individual tests: writing any entry to `$HOME/.gitconfig` has side effects beyond the intended setting, because the mere existence of that file changes which global config location git prefers for all subsequent writes. Individual per-test adjustments would not have this interaction. Fix this by overriding `HOME` to a non-existent directory inside the subshells that test XDG behavior. Since these subshells already override `XDG_CONFIG_HOME`, they do not need `$HOME/.gitconfig` at all, and the subshell scoping ensures the original `HOME` is restored automatically. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since test-lib.sh now writes `safe.bareRepository=all` to the global config when `WITH_BREAKING_CHANGES` is in effect, that entry shows up in `git config --list` output. Tests in t1300 that expect exact config contents then fail because of this unexpected extra line. Unlike the working-tree contamination fixed in the preceding commits, this is not about the file's existence but about its content leaking into test expectations. Since t1300 does not use bare repositories, simply remove the injected setting in a preparatory step. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Assisted-by: Claude Opus 4.6 Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier tests in t1305 overwrite `$HOME/.gitconfig` with their own content as part of testing config includes. This clobbers the `safe.bareRepository=all` entry that test-lib.sh writes when `WITH_BREAKING_CHANGES` is in effect, causing `git -C cycle config` to fail with "not in a git directory" when it tries to access the bare repository created by `git init --bare cycle`. Use `--git-dir=.` to access the bare repo explicitly, avoiding the dependency on global config for repository discovery. Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
One test in t5601 overwrites `$HOME/.gitconfig` with an `includeIf` configuration snippet and removes the file in its cleanup. This destroys the `safe.bareRepository=all` entry that test-lib.sh writes when `WITH_BREAKING_CHANGES` is in effect, causing later tests that use `git -C <bare-repo> config` to fail with "not in a git directory". Back up `.gitconfig` before overwriting and restore it in the cleanup, so the global config survives into subsequent tests. Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global `safe.bareRepository=all` setting in test-lib.sh is written to `$HOME/.gitconfig`, which unfortunately lives inside the test repository's working tree. The `.git/info/exclude` entry added alongside it handles most commands, but `git ls-files --others` without `--exclude-standard` does not consult `info/exclude` at all, so the file appears in the output. Ideally, each test that accesses a bare repository would simply specify `--git-dir` or `GIT_DIR` explicitly, which would require no global config and produce no side effects in the working tree. As that approach was not taken, filter `.gitconfig` from the output before comparing against expected results. In t7104, the test already uses `--exclude-standard`, so it suffices to switch from the bare `git ls-files -o` to `git ls-files -o --exclude-standard` which respects the `info/exclude` entry; the other tests deliberately omit `--exclude-standard` because their purpose is to verify unfiltered `--others` output. Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since test-lib.sh creates `$HOME/.gitconfig` when `WITH_BREAKING_CHANGES` is in effect, the file appears in `git status` output as either untracked (`?? .gitconfig`) or ignored (`!! .gitconfig` / `! .gitconfig`, depending on porcelain version), because the `.git/info/exclude` entry causes git to treat it as an ignored file rather than hiding it entirely. In t7061 and t7521, which are pervasively affected, introduce a `filter_gitconfig` helper that strips all status-prefix variants of `.gitconfig` from the output before comparison. In the remaining scripts (t7060, t7064, t7508), apply targeted adjustments. Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an attacker can convince a user to clone a crafted repository that contains an embedded bare repository with malicious hooks, any Git command the user runs after entering that subdirectory will discover the bare repository and execute the hooks. The user does not even need to run a Git command explicitly: many shell prompts run `git status` in the background to display branch and dirty state information, and `git status` in turn may invoke the fsmonitor hook if so configured, making the user vulnerable the moment they `cd` into the directory. The `safe.bareRepository` configuration variable (introduced in 8959555 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02)) already provides protection against this attack vector by allowing users to set it to "explicit", but the default remained "all" for backwards compatibility. Since Git 3.0 is the natural point to change defaults to safer values, flip the default from "all" to "explicit" when built with `WITH_BREAKING_CHANGES`. This means Git will refuse to work with bare repositories that are discovered implicitly by walking up the directory tree. Bare repositories specified via `--git-dir` or `GIT_DIR` continue to work, and directories that look like `.git`, worktrees, or submodule directories are unaffected (the existing `is_implicit_bare_repo()` whitelist handles those cases). Users who rely on implicit bare repository discovery can restore the previous behavior by setting `safe.bareRepository=all` in their global or system configuration. The test for the "safe.bareRepository in the repository" scenario needed a more involved fix: it writes a `safe.bareRepository=all` entry into the bare repository's own config to verify that repo-local config does not override the protected (global) setting. 
Previously, `test_config -C` was used to write that entry, but its cleanup runs `git -C <bare-repo> config --unset`, which itself fails when the default is "explicit" and the global config has already been cleaned up. Switching to direct git config --file access avoids going through repository discovery entirely. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When replaying commits it may happen that some of the commits become empty relative to their parent. Such commits are for now automatically dropped by the replay subsystem without much control from the user. Introduce a new enum that allows the caller to drop, keep or abort in this case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
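As a rough sketch of what such a three-way policy could look like: the enum and function names below are hypothetical and chosen only for illustration; the actual names in the replay subsystem may differ.

```c
/* Hypothetical names; the real enum in the replay code may differ. */
enum empty_commit_action {
	EMPTY_COMMIT_DROP,  /* silently skip the commit (the old behavior) */
	EMPTY_COMMIT_KEEP,  /* replay it even though it changes nothing */
	EMPTY_COMMIT_ABORT  /* stop the whole replay */
};

enum replay_outcome { REPLAY_SKIPPED, REPLAY_PICKED, REPLAY_ABORTED };

/* Decide what happens to one commit given the caller's policy. */
static enum replay_outcome replay_one(int becomes_empty,
				      enum empty_commit_action action)
{
	if (!becomes_empty)
		return REPLAY_PICKED;
	switch (action) {
	case EMPTY_COMMIT_KEEP:
		return REPLAY_PICKED;
	case EMPTY_COMMIT_ABORT:
		return REPLAY_ABORTED;
	case EMPTY_COMMIT_DROP:
	default:
		return REPLAY_SKIPPED;
	}
}
```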
The function `commit_tree_with_edited_message_ext()` can be used to commit a tree with a specific list of parents with an edited commit message. This function is useful outside of editing the commit message though, as it also performs the plumbing to extract the original commit message and strip some headers from it. Refactor the function to receive a flags field that allows the caller to control whether or not the commit message should be edited, or whether it should be retained as-is. This will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly introduced git-history(1) command provides functionality to
easily edit commit history while also rebasing dependent branches. The
functionality exposed by this command is still somewhat limited though.
One common use case when editing commit history that is not yet covered
is fixing up a specific commit. Introduce a new subcommand that allows
the user to do exactly that by performing a three-way merge into the
target's commit tree, using HEAD's tree as the merge base. The flow is
thus essentially:

    $ echo changes >file
    $ git add file
    $ git history fixup HEAD~

Like with the other commands, this will automatically rebase dependent
branches, as well. Unlike the other commands though:
- The command does not work in a bare repository as it interacts with
the index.
- The command may run into merge conflicts. If so, the command will
simply abort.
Especially the second item limits the usefulness of this command a bit.
But there are plans to introduce first-class conflicts into Git, which
will help use cases like this one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function named verify_utf8, but it does more than verify: it modifies the buffer if it is not UTF-8. This is different from what most people would expect, so call the function ensure_utf8, since it mutates the buffer in some cases. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ensure_utf8 function can mutate the buffer to change its encoding, so we must call it before signing the buffer so that we do not invalidate the signature, which is made over raw bytes. Fix a bug which caused the compatibility code to not convert the compatibility buffer if the main buffer was invalid UTF-8. We expect both buffers to be valid UTF-8 or both invalid, since the only data that would differ between them would be hex object IDs, which are always valid UTF-8. Add a test for this case using 0xfe and 0xff, which are never valid in UTF-8. Reported-by: Kushal Das <kushal@sunet.se> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
index-pack and unpack-objects both read pack data from stdin through a 4 KiB static buffer. In index-pack, each fill() flushes consumed bytes to the pack file via write_or_die(), capping every write(2) at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round trip through the FUSE protocol (userspace -> kernel -> userspace -> back), so the 4 KiB buffer turns a clone into many unnecessary tiny writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the hashfile layer in csum-file (which already used 128 KiB but hardcoded the value).

Pack file writes to a FUSE filesystem with writeback caching disabled during HTTPS clones of git/git (~293 MB pack):

    74,958 -> 4,687 (94% fewer)

Wall-clock time of git clone over HTTPS onto a FUSE passthrough filesystem with writeback caching disabled, 3 runs per variant:

    vscode  (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
    git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vendored nedmalloc allocator under compat/nedmalloc/ has been unmaintained upstream for a very long time: the original repository at https://github.com/ned14/nedmalloc received its last commit on July 5, 2014, and was archived (made read-only) by its owner on March 15, 2019. Our copy has been carried forward unchanged ever since. The Git for Windows commit that introduced mimalloc as a replacement on Windows ("mingw: use mimalloc", 2019-06-24, present in the Git for Windows branch thicket but not upstream) already observed at that time that nedmalloc had ceased to see any updates for several years.

This came to a head when the Git for Windows SDK upgraded to GCC 16: the `add_segment()` function in `compat/nedmalloc/malloc.c.h` declares `int nfences = 0` and only references it inside an `assert()`, which GCC 16 now flags as `-Wunused-but-set-variable`. Combined with the `-Werror` enabled by `DEVELOPER=1`, this turns into a hard build failure:

    compat/nedmalloc/malloc.c.h: In function 'add_segment':
    compat/nedmalloc/malloc.c.h:3897:7: error: variable 'nfences' set but not used [-Werror=unused-but-set-variable=]
     3897 |   int nfences = 0;
          |       ^~~~~~~
    cc1.exe: all warnings being treated as errors

The same source built without complaint under GCC 15.2.0; the regression was bisected to the SDK package update at git-for-windows/git-sdk-64@188d93dd455 (`mingw-w64-x86_64-gcc 15.2.0-14 -> 16.1.0-1`), with the failing CI run captured at https://github.com/git-for-windows/git-sdk-64/actions/runs/25244795074.

Rather than patch the unmaintained vendored sources to silence the warning, stop opting into nedmalloc altogether on Windows. The platform allocator is what every non-MINGW build already uses, and a fresh build of git.git's master against a minimal Git for Windows SDK upgraded to GCC 16 completes successfully. The compat/nedmalloc/ subtree itself is removed by subsequent commits in this series.
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the previous commit removing every opt-in, the build-system plumbing for nedmalloc has nothing left to switch on. Remove it so that the upcoming deletion of the compat/nedmalloc/ tree is a pure file removal. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits stopped opting into nedmalloc on Windows and stripped out the build-system plumbing that referenced it; the compat/nedmalloc/ subtree now has no callers and no consumers in the build, so retire it from the tree. Note that this patch is larger than can be sent via the mailing list, and was originally sent in three pieces and merged back on the receiving end. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpacking objects from a packfile, the object size is decoded from a variable-length encoding. On platforms where unsigned long is 32-bit (such as Windows, even in 64-bit builds), the shift operation overflows when decoding sizes larger than 4GB. The result is a truncated size value, causing the unpacked object to be corrupted or rejected. Fix this by changing the size variable to size_t, which is 64-bit on 64-bit platforms, and ensuring the shift arithmetic occurs in 64-bit space. Declare the per-byte continuation variable `c` as size_t as well, matching the canonical varint decoder unpack_object_header_buffer() in packfile.c. With c as size_t the expression (c & 0x7f) << shift is naturally size_t-typed, so the explicit cast that an earlier iteration carried at the use site is no longer needed. While at it, add the same overflow guard that unpack_object_header_buffer() carries: if the cumulative shift would exceed bitsizeof(size_t) - 7, refuse the input rather than invoking undefined behavior. Unlike unpack_object_header_buffer(), which labels this case "bad object header", report it as the platform limit it actually is: a header may be perfectly well-formed and still encode a size we cannot represent locally (notably on a 32-bit build consuming a packfile produced on a 64-bit host). This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
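The decoding with its overflow guard can be sketched as below. This is a simplified illustration, not the code from index-pack or unpack-objects: decode_object_header() is a hypothetical name, and it returns 0 for both truncated input and unrepresentable sizes instead of reporting distinct errors. It relies on the pack format storing the type in bits 4-6 and the low four size bits in the first byte, followed by 7 size bits per continuation byte.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Decode a pack object header: the first byte carries the type in bits
 * 4-6 and the low 4 bits of the size; each following byte (while the
 * previous one had its MSB set) carries 7 more size bits.
 *
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the size cannot be represented in size_t on this platform.
 */
static size_t decode_object_header(const unsigned char *buf, size_t len,
				   unsigned int *type, size_t *sizep)
{
	size_t used = 0, size, c; /* c is size_t so the shift is done in size_t */
	unsigned int shift = 4;

	if (!len)
		return 0;
	c = buf[used++];
	*type = (unsigned int)((c >> 4) & 7);
	size = c & 15;
	while (c & 0x80) {
		/* refuse rather than shift past the width of size_t */
		if (used >= len || shift > sizeof(size_t) * CHAR_BIT - 7)
			return 0;
		c = buf[used++];
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
}
```

For example, a blob of 4 GiB + 1 bytes encodes as the six bytes `b1 80 80 80 80 01`; the final byte is shifted by 32 bits, which is exactly the step that is lost when the arithmetic is carried out in a 32-bit type.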
On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When processing data streams larger than 4GB, the `total_in` and `total_out` fields in zlib's `z_stream` structure wrap around, which caused the sanity checks in `zlib_post_call()` to trigger `BUG()` assertions. The git_zstream wrapper now tracks its own 64-bit totals rather than copying them from zlib. The sanity checks compare only the low bits, using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for the platform's `uLong` size. This is based on work by LordKiRon in git-for-windows#6076. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
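The "compare only the low bits" check can be sketched without zlib. In the following illustration, narrow_total is a stand-in for a 32-bit uLong and produce() simulates the library reporting progress; all names are hypothetical, and the real wrapper derives its totals from the stream state rather than from a simulation.

```c
#include <stdint.h>

typedef uint32_t narrow_total; /* stand-in for a 32-bit uLong */

struct wide_stream {
	narrow_total lib_total_out; /* library counter, wraps at 4 GiB */
	uint64_t total_out;         /* our own 64-bit running total */
};

/* Simulate the library producing n more bytes of output. */
static void produce(struct wide_stream *s, uint32_t n)
{
	s->lib_total_out += n; /* wraps modulo 2^32 */
	s->total_out += n;     /* does not wrap in practice */
}

/* Sanity check: compare only the bits the narrow counter can hold. */
static int totals_agree(const struct wide_stream *s)
{
	const uint64_t mask = (narrow_total)~(narrow_total)0;

	return (s->total_out & mask) == s->lib_total_out;
}
```

After five 1 GiB chunks the narrow counter has wrapped to 1 GiB while the wide total reads 5 GiB, yet the masked comparison still succeeds, so a check of this shape no longer fires spuriously on large streams.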
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.
Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.
Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:
- oe_get_size_slow() returns unsigned long but holds a size_t
locally; cast at the return.
- write_reuse_object() passes a size_t into check_pack_inflate(),
whose expect parameter is unsigned long; cast at the call.
- check_object() routes a size_t through SET_SIZE() and
SET_DELTA_SIZE(), both of which take unsigned long via
oe_set_size() / oe_set_delta_size(); cast at the three call
sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
non-delta default arm.
The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.
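A checked narrowing of this kind can be sketched as follows. The helper names are hypothetical and the error handling is simplified (a plain message and exit instead of Git's die()); the point is only that the cast is verified by a round trip before it is trusted.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* True if the value survives a round trip through unsigned long. */
static int fits_in_ulong(size_t a)
{
	return a == (size_t)(unsigned long)a;
}

/*
 * Narrow size_t to unsigned long, failing loudly instead of truncating.
 * Where both types have the same width the check can never fail, so the
 * function is effectively a no-op there.
 */
static unsigned long size_t_to_ulong_or_die(size_t a)
{
	if (!fits_in_ulong(a)) {
		fprintf(stderr, "fatal: size %zu does not fit in unsigned long\n", a);
		exit(128);
	}
	return (unsigned long)a;
}
```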
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta header decoding functions return unsigned long, which truncates on Windows for objects larger than 4GB. Introduce size_t variants get_delta_hdr_size_sz() and get_size_from_delta_sz() that preserve the full 64-bit size, and use them in packed_object_info() where the size is needed for streaming decisions. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
To test Git's behavior with very large pack files, we need a way to generate such files quickly. A naive approach using only readily-available Git commands would take over 10 hours for a 4GB pack file, which is prohibitive. Side-stepping Git's machinery and actual zlib compression by writing uncompressed content with the appropriate zlib header makes things much faster. The fastest method using this approach generates many small, unreachable blob objects and takes about 1.5 minutes for 4GB. However, this cannot be used because we need to test git clone, which requires a reachable commit history. Generating many reachable commits with small, uncompressed blobs takes about 4 minutes for 4GB. But this approach 1) does not reproduce the issues we want to fix (which require individual objects larger than 4GB) and 2) is comparatively slow because of the many SHA-1 calculations. The approach taken here generates a single large blob (filled with NUL bytes), along with the trees and commits needed to make it reachable. This takes about 2.5 minutes for 4.5GB, which is the fastest option that produces a valid, clonable repository with an object large enough to trigger the bugs we want to test. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shift overflow bug in index-pack and unpack-objects caused incorrect object size calculation when the encoded size required more than 32 bits of shift. This would result in corrupted or failed unpacking of objects larger than 4GB. Add a test that creates a pack file containing a 4GB+ blob using the new 'test-tool synthesize pack --reachable-large' command, then clones the repository to verify the fix works correctly. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King pointed out on the mailing list [1] that t5608's new >4GB test cases dominate the entire test suite runtime: 160 seconds on his laptop when the rest of the suite finishes in under 90 seconds, and 305-850 seconds across CI jobs. The bottleneck is that the synthesize helper hashes roughly 8 GB of data through SHA-1 (4 GB for the pack checksum plus 4 GB for the blob OID) for a 4 GB+1 blob.

Since the helper generates known test data, collision detection is unnecessary. Switch from repo->hash_algo to unsafe_hash_algo(), which uses hardware-accelerated SHA-1 (via OpenSSL or Apple CommonCrypto) when available.

Benchmarks on an x86_64 machine generating a 4 GB+1 pack (2 runs each, interleaved):

    SHA-1 backend     Run 1  Run 2
    SHA1DC (safe)     75s    80s
    OpenSSL (unsafe)  21s    19s

The effect scales linearly. At 64 MB with 10 randomized interleaved runs, the OpenSSL unsafe backend shows a 5.4x improvement (median 0.202s vs 1.088s) with tight variance (stdev 0.028s vs 0.095s).

The speedup is only realized when the build has a fast unsafe backend compiled in. The CI's linux-TEST-vars job already sets OPENSSL_SHA1_UNSAFE=YesPlease; macOS benefits from Apple CommonCrypto when configured. On builds without a separate unsafe backend (such as the default Windows builds), unsafe_hash_algo() returns the regular collision-detecting implementation and the change is a no-op.

[1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synthesize helper hashes roughly 8 GiB of data through SHA-1 to produce a 4 GiB + 1 pack (4 GiB for the pack checksum, 4 GiB for the blob OID). Since the blob content is all NUL bytes, every byte in the resulting pack file is deterministic for a given blob size and hash algorithm.

Add a fast path that writes the pack from precomputed constants: a 25-byte prefix (pack header, object header, zlib header, first block header), the zero-filled bulk with periodic 5-byte deflate block headers, and a 513-byte suffix (tree, two commits, empty tree, pack SHA-1 checksum). This eliminates all SHA-1 and adler32 computation, making the helper purely I/O-bound.

The precomputed constants are stored in a struct fast_pack array keyed by hash algorithm format_id, so that adding SHA-256 support later requires only adding another array entry with its suffix. The constants were generated by running the generic path and extracting the non-zero bytes from the resulting pack file.

Benchmarks generating a 4 GiB + 1 pack (3 runs each, SHA1DC on x86_64):

    generic path: 88s / 81s / 140s
    fast path:    14s / 13s / 15s

On CI, where t5608 currently takes 200-850 seconds depending on the job, the fast path cuts the pack-generation phase from minutes to seconds, leaving only the clone operations themselves.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
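The 5-byte block headers mentioned above follow the "stored" (uncompressed) block layout of the DEFLATE format: one byte holding the final-block flag and block type 00, then the length and its one's complement, both little-endian. A minimal sketch is shown below; the function names are hypothetical, and the assumption that every block except the last carries the maximum 65535 bytes is mine, not something stated about the helper.

```c
#include <stdint.h>

/*
 * Write the 5-byte header of a DEFLATE stored block:
 *   byte 0:    BFINAL in bit 0, BTYPE = 00 in bits 1-2
 *   bytes 1-2: LEN, little-endian
 *   bytes 3-4: NLEN, the one's complement of LEN, little-endian
 */
static void stored_block_header(unsigned char out[5], uint16_t len, int is_final)
{
	uint16_t nlen = (uint16_t)~len;

	out[0] = is_final ? 1 : 0;
	out[1] = (unsigned char)(len & 0xff);
	out[2] = (unsigned char)(len >> 8);
	out[3] = (unsigned char)(nlen & 0xff);
	out[4] = (unsigned char)(nlen >> 8);
}

/*
 * Number of stored blocks needed for n bytes, ASSUMING each block is
 * filled to the 65535-byte maximum (empty input still needs one block).
 */
static uint64_t stored_block_count(uint64_t n)
{
	return n ? (n + 65534) / 65535 : 1;
}
```

The 25-byte prefix is consistent with this layout: a 12-byte pack header, a 6-byte object header for a size of 4 GiB + 1, the 2-byte zlib header, and one 5-byte stored-block header add up to 25.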
Add a SHA-256 entry to the fast_packs[] table. The pack prefix and deflate block structure are identical to SHA-1 (the pack format does not encode the hash algorithm in its header). Only the suffix differs: SHA-256 OIDs are 32 bytes instead of 20, giving a 609-byte suffix compared to 513 for SHA-1, and a different pack checksum. The constants were generated by running the generic path inside a repository initialized with --object-format=sha256. Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even with precomputed pack constants that reduced the helper's runtime from minutes to seconds, the >4GB clone tests still take 200-850 seconds across CI jobs. The bottleneck is no longer the pack generation but the clone operations themselves: transporting, unpacking, and indexing 4 GiB of data through unpack-objects and index-pack is inherently expensive. As Jeff King pointed out [1], t5608 alone takes 160 seconds on his laptop while the rest of the entire test suite finishes in under 90 seconds, and the test's disk footprint (4+ GiB source repo, then two clones) is problematic for developers who use RAM disks for their trash directories. Gate the >4GB tests on the EXPENSIVE prereq (which requires GIT_TEST_LONG to be set) in addition to SIZE_T_IS_64BIT, keeping them out of normal local test runs. [1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/ Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee suggested [1] that expensive tests should be run at a regular cadence rather than on every PR iteration. Gate GIT_TEST_LONG on push builds to the integration branches (next, master, main, maint) so that the EXPENSIVE prereq is satisfied there but not during PR validation, where the extra minutes of wall-clock time do not justify themselves. [1] https://lore.kernel.org/git/e1e8837f-7374-4079-ba87-ab95dd156e33@gmail.com/ Helped-by: Derrick Stolee <derrickstolee@github.com> Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update code paths that assumed "unsigned long" was long enough for "size_t".

* js/objects-larger-than-4gb-on-windows:
  ci: run expensive tests on push builds to integration branches
  t5608: mark >4GB tests as EXPENSIVE
  test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
  test-tool synthesize: precompute pack for 4 GiB + 1
  test-tool synthesize: use the unsafe hash for speed
  t5608: add regression test for >4GB object clone
  test-tool: add a helper to synthesize large packfiles
  delta, packfile: use size_t for delta header sizes
  odb, packfile: use size_t for streaming object sizes
  git-zlib: handle data streams larger than 4GB
  index-pack, unpack-objects: use size_t for object size

Stop using unmaintained custom allocator in Windows build which was the last user of the code.

* js/mingw-no-nedmalloc:
  mingw: remove the vendored compat/nedmalloc/ subtree
  mingw: drop the build-system plumbing for nedmalloc
  mingw: stop using nedmalloc

The computation to shorten the filenames shown in diffstat measured width of individual UTF-8 characters to add up, but forgot to take into account error cases (e.g., an invalid UTF-8 sequence, or a control character).

* en/diffstat-utf8-truncation-fix:
  diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation

Some tests assume that bare repository accesses are by default allowed; rewrite some of them to avoid the assumption, rewrite others to explicitly set safe.bareRepository to allow them.

* js/adjust-tests-to-explicitly-access-bare-repo:
  safe.bareRepository: default to "explicit" with WITH_BREAKING_CHANGES
  status tests: filter `.gitconfig` from status output
  ls-files tests: filter `.gitconfig` from `--others` output
  t5601: restore `.gitconfig` after includeIf test
  t1305: use `--git-dir=.` for bare repo in include cycle test
  t1300: remove global config settings injected by test-lib.sh
  t7900: do not let `$HOME/.gitconfig` interfere with XDG tests
  test-lib: allow bare repository access when breaking changes are enabled

Signing commit with custom encoding was passing the data to be signed at a wrong stage in the pipeline, which has been corrected.

* bc/sign-commit-with-custom-encoding:
  commit: sign commit after mutating buffer
  commit: name UTF-8 function appropriately

Further update to the i18n alias support to avoid regressions.

* jh/alias-i18n-fixes:
  alias: restore support for simple dotted aliases

"git history" learned "fixup" command.

* ps/history-fixup:
  builtin/history: introduce "fixup" subcommand
  builtin/history: generalize function to commit trees
  replay: allow callers to control what happens with empty commits

Use a larger buffer size in the code paths to ingest pack stream.

* sb/unpack-index-pack-buffer-resize:
  index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB

Signed-off-by: Junio C Hamano <gitster@pobox.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)