Skip to content

[pull] master from git:master#208

Merged
pull[bot] merged 39 commits into
turkdevops:masterfrom
git:master
May 20, 2026
Merged

[pull] master from git:master#208
pull[bot] merged 39 commits into
turkdevops:masterfrom
git:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented May 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

newren and others added 30 commits April 20, 2026 09:30
…tion

f85b49f (diff: improve scaling of filenames in diffstat to handle
UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls
utf8_width() repeatedly to skip leading characters until the displayed
width fits.  However, utf8_width() can return problematic values:

  - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
    pointer to NULL and utf8_width() returns 0.  Since name_len does
    not change, the loop iterates once more and pick_one_utf8_char()
    dereferences the NULL pointer, crashing.

  - For control characters, utf8_width() returns -1, so name_len
    grows when it is expected to shrink.  This can cause the loop to
    consume more characters than the string contains, reading past
    the trailing NUL.

By default, fill_print_name() will C-quote filenames which escapes
control characters and invalid bytes to printable text.  That avoids
this bug from being triggered; however, with core.quotePath=false,
most characters are no longer escaped (though some control characters
still are) and raw bytes can reach this code.

Add tests exercising both failure modes with core.quotePath=false and
a narrow --stat-name-width to force truncation: one with a bare 0xC0
byte (invalid UTF-8 lead byte, triggers NULL deref) and one with
several C1 control characters (repeats of 0xC2 0x9F, causing
the loop to read past the end of the string).  The second test
reliably catches the out-of-bounds read when run under ASan, though
it may pass silently without sanitizers.

Fix both issues by introducing utf8_ish_width(), a thin wrapper
around utf8_width() that guarantees the pointer always advances and
the returned width is never negative:

  - On invalid UTF-8 it restores the pointer, advances by one byte,
    and returns width 1 (matching the strlen()-based fallback used
    by utf8_strwidth()).
  - On a control character it returns 0 (matching utf8_strnwidth()
    which skips them).

Also add a "&& *name" guard to the while-loop condition so it
terminates at end-of-string even when utf8_strwidth()'s strlen()
fallback causes name_len to exceed the sum of per-character widths.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically, config entries like alias.foo.bar expanded the alias
"foo.bar". The subsection-based alias syntax introduced in
ac1f12a (alias: support non-alphanumeric names via subsection
syntax, 2026-02-18) broke that behavior by treating such entries as
if they were subsection syntax.

Restore support for the old dotted form by falling back to the full
name when the final key is not "command". Add tests covering execution
and help output for simple dotted aliases.

Reported-by: Michael Grossfeld <michael.grossfeld@amd.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonatan Holmgren <jonatan@jontes.page>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future patch will change the `safe.bareRepository` default from
`all` to `explicit` under `WITH_BREAKING_CHANGES`. At that point,
every test that operates on a bare repository through implicit
discovery would fail, regardless of whether the test is actually
about discovery or about how a specific command behaves once inside
a bare repository.

The maintainer suggested [1] setting `safe.bareRepository=all` in
the test environment's global config whenever `WITH_BREAKING_CHANGES`
is in effect, rather than adjusting each affected test to access
bare repositories explicitly (via `--git-dir`, `GIT_DIR`, or
similar). This means the test suite continues to exercise only the
historical default behavior even after the user-facing default
changes, relying on a small number of dedicated tests in t0035 to
validate the new, stricter default.

Since `$HOME` points at the trash directory (which doubles as the
test repository's working tree), writing to `$HOME/.gitconfig` also
creates a file inside the working tree. Exclude it via
`.git/info/exclude` to limit the fallout, though this does not
help tests that use `git ls-files --others` without
`--exclude-standard` or `git status --ignored`; those are addressed
by subsequent commits.

The `.git/info/exclude` write is guarded by `test -d .git/info`
rather than using `mkdir -p`, because some tests (e.g. t0008)
expect to create `.git/info/` themselves and would fail with
Patrick Steinhardt's `set -e` preparation (ps/test-set-e-clean) if
the directory already existed. For tests using `TEST_NO_CREATE_REPO`
(where no `.git/` exists at all), the guard also handles that case.

[1] https://lore.kernel.org/git/xmqqse98cc51.fsf@gitster.g/

Original-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The XDG config tests for `git maintenance register/unregister`
create a fresh `$XDG_CONFIG_HOME/git/config` and expect git to use
that location. However, if `$HOME/.gitconfig` exists (which may
happen when test-lib.sh writes global config, e.g. to set
`safe.bareRepository`), git prefers `$HOME/.gitconfig` over the XDG
location, and the `maintenance.repo` entry ends up in the wrong
file.

This is an inherent consequence of setting global config in
test-lib.sh rather than adjusting individual tests: writing any
entry to `$HOME/.gitconfig` has side effects beyond the intended
setting, because the mere existence of that file changes which
global config location git prefers for all subsequent writes.
Individual per-test adjustments would not have this interaction.

Fix this by overriding `HOME` to a non-existent directory inside the
subshells that test XDG behavior. Since these subshells already
override `XDG_CONFIG_HOME`, they do not need `$HOME/.gitconfig` at
all, and the subshell scoping ensures the original `HOME` is
restored automatically.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since test-lib.sh now writes `safe.bareRepository=all` to the global
config when `WITH_BREAKING_CHANGES` is in effect, that entry shows
up in `git config --list` output. Tests in t1300 that expect exact
config contents then fail because of this unexpected extra line.

Unlike the working-tree contamination fixed in the preceding
commits, this is not about the file's existence but about its
content leaking into test expectations. Since t1300 does not use
bare repositories, simply remove the injected setting in a
preparatory step.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Assisted-by: Claude Opus 4.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier tests in t1305 overwrite `$HOME/.gitconfig` with their own
content as part of testing config includes. This clobbers the
`safe.bareRepository=all` entry that test-lib.sh writes when
`WITH_BREAKING_CHANGES` is in effect, causing `git -C cycle config`
to fail with "not in a git directory" when it tries to access the
bare repository created by `git init --bare cycle`.

Use `--git-dir=.` to access the bare repo explicitly, avoiding the
dependency on global config for repository discovery.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One test in t5601 overwrites `$HOME/.gitconfig` with an `includeIf`
configuration snippet and removes the file in its cleanup. This
destroys the `safe.bareRepository=all` entry that test-lib.sh
writes when `WITH_BREAKING_CHANGES` is in effect, causing later
tests that use `git -C <bare-repo> config` to fail with "not in a
git directory".

Back up `.gitconfig` before overwriting and restore it in the
cleanup, so the global config survives into subsequent tests.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global `safe.bareRepository=all` setting in test-lib.sh is
written to `$HOME/.gitconfig`, which unfortunately lives inside the
test repository's working tree. The `.git/info/exclude` entry added
alongside it handles most commands, but `git ls-files --others`
without `--exclude-standard` does not consult `info/exclude` at
all, so the file appears in the output.

Ideally, each test that accesses a bare repository would simply
specify `--git-dir` or `GIT_DIR` explicitly, which would require no
global config and produce no side effects in the working tree. As
that approach was not taken, filter `.gitconfig` from the output
before comparing against expected results. In t7104, the test
already uses `--exclude-standard`, so it suffices to switch from
the bare `git ls-files -o` to `git ls-files -o --exclude-standard`
which respects the `info/exclude` entry; the other tests
deliberately omit `--exclude-standard` because their purpose is to
verify unfiltered `--others` output.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since test-lib.sh creates `$HOME/.gitconfig` when
`WITH_BREAKING_CHANGES` is in effect, the file appears in `git
status` output as either untracked (`?? .gitconfig`) or ignored
(`!! .gitconfig` / `! .gitconfig`, depending on porcelain version),
because the `.git/info/exclude` entry causes git to treat it as an
ignored file rather than hiding it entirely.

In t7061 and t7521, which are pervasively affected, introduce a
`filter_gitconfig` helper that strips all status-prefix variants of
`.gitconfig` from the output before comparison. In the remaining
scripts (t7060, t7064, t7508), apply targeted adjustments.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an attacker can convince a user to clone a crafted repository
that contains an embedded bare repository with malicious hooks, any Git
command the user runs after entering that subdirectory will discover
the bare repository and execute the hooks. The user does not even need
to run a Git command explicitly: many shell prompts run `git status`
in the background to display branch and dirty state information, and
`git status` in turn may invoke the fsmonitor hook if so configured,
making the user vulnerable the moment they `cd` into the directory. The
`safe.bareRepository` configuration variable (introduced in 8959555
(setup_git_directory(): add an owner check for the top-level directory,
2022-03-02)) already provides protection against this attack vector by
allowing users to set it to "explicit", but the default remained "all"
for backwards compatibility.

Since Git 3.0 is the natural point to change defaults to safer
values, flip the default from "all" to "explicit" when built with
`WITH_BREAKING_CHANGES`. This means Git will refuse to work with bare
repositories that are discovered implicitly by walking up the directory
tree. Bare repositories specified via `--git-dir` or `GIT_DIR` continue
to work, and directories that look like `.git`, worktrees, or submodule
directories are unaffected (the existing `is_implicit_bare_repo()`
whitelist handles those cases).

Users who rely on implicit bare repository discovery can restore the
previous behavior by setting `safe.bareRepository=all` in their global
or system configuration.

The test for the "safe.bareRepository in the repository" scenario
needed a more involved fix: it writes a `safe.bareRepository=all`
entry into the bare repository's own config to verify that repo-local
config does not override the protected (global) setting. Previously,
`test_config -C` was used to write that entry, but its cleanup runs `git
-C <bare-repo> config --unset`, which itself fails when the default is
"explicit" and the global config has already been cleaned up. Switching
to direct git config --file access avoids going through repository
discovery entirely.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When replaying commits it may happen that some of the commits become
empty relative to their parent. Such commits are for now automatically
dropped by the replay subsystem without much control from the user.

Introduce a new enum that allows the caller to drop, keep or abort in
this case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `commit_tree_with_edited_message_ext()` can be used to
commit a tree with a specific list of parents with an edited commit
message. This function is useful outside of editing the commit message
though, as it also performs the plumbing to extract the original commit
message and strip some headers from it.

Refactor the function to receive a flags field that allows the caller to
control whether or not the commit message should be edited, or whether
it should be retained as-is. This will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly introduced git-history(1) command provides functionality to
easily edit commit history while also rebasing dependent branches. The
functionality exposed by this command is still somewhat limited though.

One common use case when editing commit history that is not yet covered
is fixing up a specific commit. Introduce a new subcommand that allows
the user to do exactly that by performing a three-way merge into the
target's commit tree, using HEAD's tree as the merge base. The flow is
thus essentially:

    $ echo changes >file
    $ git add file
    $ git history fixup HEAD~

Like with the other commands, this will automatically rebase dependent
branches, as well. Unlike the other commands though:

  - The command does not work in a bare repository as it interacts with
    the index.

  - The command may run into merge conflicts. If so, the command will
    simply abort.

Especially the second item limits the usefulness of this command a bit.
But there are plans to introduce first-class conflicts into Git, which
will help use cases like this one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function named verify_utf8, but it does more than verify, it
modifies the buffer if it is not UTF-8.  This is different from what
most people would expect, so call the function ensure_utf8, since it
mutates the buffer in some cases.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ensure_utf8 function can mutate the buffer to change its encoding,
so we must call it before signing the buffer so that we do not
invalidate the signature, which is made over raw bytes.  Fix a bug which
caused the compatibility code to not convert the compatibility buffer if
the main buffer was invalid UTF-8.  We expect both buffers to be valid
UTF-8 or both invalid, since the only data that would differ between
them would be hex object IDs, which are always valid UTF-8.

Add a test for this case using 0xfe and 0xff, which are never valid in
UTF-8.

Reported-by: Kushal Das <kushal@sunet.se>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
index-pack and unpack-objects both read pack data from stdin through
a 4 KiB static buffer. In index-pack, each fill() flushes consumed
bytes to the pack file via write_or_die(), capping every write(2)
at 4 KiB. unpack-objects uses the same buffer pattern for reads.

On FUSE-backed filesystems every write(2) is a synchronous round
trip through the FUSE protocol (userspace -> kernel -> userspace ->
back), so the 4 KiB buffer turns a clone into many unnecessary tiny
writes with noticeable latency overhead.

Increase the buffer from 4 KiB to 128 KiB. Introduce a shared
DEFAULT_IO_BUFFER_SIZE constant in git-compat-util.h (next to
MAX_IO_SIZE) and use it in index-pack, unpack-objects, and the
hashfile layer in csum-file (which already used 128 KiB but
hardcoded the value).

Pack file writes to a FUSE filesystem with writeback caching
disabled during HTTPS clones of git/git (~293 MB pack):

  74,958 -> 4,687 (94% fewer)

Wall-clock time of git clone over HTTPS onto a FUSE passthrough
filesystem with writeback caching disabled, 3 runs per variant:

  vscode (~1.26 GB pack): 84.5s -> 75.7s avg (10% faster)
  git/git (~306 MB pack):  22.6s -> 20.0s avg (11% faster)

Signed-off-by: Scott Bauersfeld <sbauersfeld@g.ucla.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vendored nedmalloc allocator under compat/nedmalloc/ has been
unmaintained upstream for a very long time: the original repository at
https://github.com/ned14/nedmalloc received its last commit on July 5,
2014, and was archived (made read-only) by its owner on March 15, 2019.
Our copy has been carried forward unchanged ever since.

The Git for Windows commit that introduced mimalloc as a replacement
on Windows ("mingw: use mimalloc", 2019-06-24, present in the Git for
Windows branch thicket but not upstream) already observed at that time
that nedmalloc had ceased to see any updates for several years.

This came to a head when the Git for Windows SDK upgraded to GCC 16:
the `add_segment()` function in `compat/nedmalloc/malloc.c.h` declares
`int nfences = 0` and only references it inside an `assert()`, which
GCC 16 now flags as `-Wunused-but-set-variable`. Combined with the
`-Werror` enabled by `DEVELOPER=1`, this turns into a hard build
failure:

	compat/nedmalloc/malloc.c.h: In function 'add_segment':
	compat/nedmalloc/malloc.c.h:3897:7: error: variable 'nfences' set but not used [-Werror=unused-but-set-variable=]
	 3897 |   int nfences = 0;
	      |       ^~~~~~~
	cc1.exe: all warnings being treated as errors

The same source built without complaint under GCC 15.2.0; the
regression was bisected to the SDK package update at
git-for-windows/git-sdk-64@188d93dd455
(`mingw-w64-x86_64-gcc 15.2.0-14 -> 16.1.0-1`), with the failing CI
run captured at
https://github.com/git-for-windows/git-sdk-64/actions/runs/25244795074.

Rather than patch the unmaintained vendored sources to silence the
warning, stop opting into nedmalloc altogether on Windows. The
platform allocator is what every non-MINGW build already uses, and a
fresh build of git.git's master against a minimal Git for Windows SDK
upgraded to GCC 16 completes successfully.

The compat/nedmalloc/ subtree itself is removed by subsequent commits
in this series.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the previous commit removing every opt-in, the build-system
plumbing for nedmalloc has nothing left to switch on. Remove it so
that the upcoming deletion of the compat/nedmalloc/ tree is a pure
file removal.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits stopped opting into nedmalloc on Windows
and stripped out the build-system plumbing that referenced it; the
compat/nedmalloc/ subtree now has no callers and no consumers in
the build, so retire it from the tree.

Note that this patch is larger than can be sent via the mailing
list, and was originally sent in three-pieces and merged back on the
receiving end.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpacking objects from a packfile, the object size is decoded
from a variable-length encoding. On platforms where unsigned long is
32-bit (such as Windows, even in 64-bit builds), the shift operation
overflows when decoding sizes larger than 4GB. The result is a
truncated size value, causing the unpacked object to be corrupted or
rejected.

Fix this by changing the size variable to size_t, which is 64-bit on
64-bit platforms, and ensuring the shift arithmetic occurs in 64-bit
space.

Declare the per-byte continuation variable `c` as size_t as well,
matching the canonical varint decoder unpack_object_header_buffer()
in packfile.c. With c as size_t the expression (c & 0x7f) << shift
is naturally size_t-typed, so the explicit cast that an earlier
iteration carried at the use site is no longer needed.

While at it, add the same overflow guard that
unpack_object_header_buffer() carries: if the cumulative shift would
exceed bitsizeof(size_t) - 7, refuse the input rather than invoking
undefined behavior. Unlike unpack_object_header_buffer(), which
labels this case "bad object header", report it as the platform
limit it actually is: a header may be perfectly well-formed and
still encode a size we cannot represent locally (notably on a
32-bit build consuming a packfile produced on a 64-bit host).

This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When
processing data streams larger than 4GB, the `total_in` and `total_out`
fields in zlib's `z_stream` structure wrap around, which caused the
sanity checks in `zlib_post_call()` to trigger `BUG()` assertions.

The git_zstream wrapper now tracks its own 64-bit totals rather than
copying them from zlib. The sanity checks compare only the low bits,
using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for
the platform's `uLong` size.

This is based on work by LordKiRon in git-for-windows#6076.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.

Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.

Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:

  - oe_get_size_slow() returns unsigned long but holds a size_t
    locally; cast at the return.
  - write_reuse_object() passes a size_t into check_pack_inflate(),
    whose expect parameter is unsigned long; cast at the call.
  - check_object() routes a size_t through SET_SIZE() and
    SET_DELTA_SIZE(), both of which take unsigned long via
    oe_set_size() / oe_set_delta_size(); cast at the three call
    sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
    non-delta default arm.

The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.

This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta header decoding functions return unsigned long, which
truncates on Windows for objects larger than 4GB. Introduce size_t
variants get_delta_hdr_size_sz() and get_size_from_delta_sz() that
preserve the full 64-bit size, and use them in packed_object_info()
where the size is needed for streaming decisions.

This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To test Git's behavior with very large pack files, we need a way to
generate such files quickly.

A naive approach using only readily-available Git commands would take
over 10 hours for a 4GB pack file, which is prohibitive.

Side-stepping Git's machinery and actual zlib compression by writing
uncompressed content with the appropriate zlib header makes things
much faster. The fastest method using this approach generates many
small, unreachable blob objects and takes about 1.5 minutes for 4GB.
However, this cannot be used because we need to test git clone, which
requires a reachable commit history.

Generating many reachable commits with small, uncompressed blobs takes
about 4 minutes for 4GB. But this approach 1) does not reproduce the
issues we want to fix (which require individual objects larger than
4GB) and 2) is comparatively slow because of the many SHA-1
calculations.

The approach taken here generates a single large blob (filled with NUL
bytes), along with the trees and commits needed to make it reachable.
This takes about 2.5 minutes for 4.5GB, which is the fastest option
that produces a valid, clonable repository with an object large enough
to trigger the bugs we want to test.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shift overflow bug in index-pack and unpack-objects caused incorrect
object size calculation when the encoded size required more than 32 bits
of shift. This would result in corrupted or failed unpacking of objects
larger than 4GB.

Add a test that creates a pack file containing a 4GB+ blob using the
new 'test-tool synthesize pack --reachable-large' command, then clones
the repository to verify the fix works correctly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King pointed out on the mailing list [1] that t5608's new >4GB
test cases dominate the entire test suite runtime: 160 seconds on his
laptop when the rest of the suite finishes in under 90 seconds, and
305-850 seconds across CI jobs. The bottleneck is that the synthesize
helper hashes roughly 8 GB of data through SHA-1 (4 GB for the pack
checksum plus 4 GB for the blob OID) for a 4 GB+1 blob.

Since the helper generates known test data, collision detection is
unnecessary. Switch from repo->hash_algo to unsafe_hash_algo(), which
uses hardware-accelerated SHA-1 (via OpenSSL or Apple CommonCrypto)
when available.

Benchmarks on an x86_64 machine generating a 4 GB+1 pack (2 runs
each, interleaved):

  SHA-1 backend      Run 1    Run 2
  SHA1DC (safe)       75s      80s
  OpenSSL (unsafe)    21s      19s

The effect scales linearly. At 64 MB with 10 randomized interleaved
runs, the OpenSSL unsafe backend shows a 5.4x improvement (median
0.202s vs 1.088s) with tight variance (stdev 0.028s vs 0.095s).

The speedup is only realized when the build has a fast unsafe backend
compiled in. The CI's linux-TEST-vars job already sets
OPENSSL_SHA1_UNSAFE=YesPlease; macOS benefits from Apple CommonCrypto
when configured. On builds without a separate unsafe backend (such as
the default Windows builds), unsafe_hash_algo() returns the regular
collision-detecting implementation and the change is a no-op.

[1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synthesize helper hashes roughly 8 GiB of data through SHA-1 to
produce a 4 GiB + 1 pack (4 GiB for the pack checksum, 4 GiB for
the blob OID). Since the blob content is all NUL bytes, every byte
in the resulting pack file is deterministic for a given blob size and
hash algorithm.

Add a fast path that writes the pack from precomputed constants:
a 25-byte prefix (pack header, object header, zlib header, first
block header), the zero-filled bulk with periodic 5-byte deflate
block headers, and a 513-byte suffix (tree, two commits, empty tree,
pack SHA-1 checksum). This eliminates all SHA-1 and adler32
computation, making the helper purely I/O-bound.

The precomputed constants are stored in a struct fast_pack array
keyed by hash algorithm format_id, so that adding SHA-256 support
later requires only adding another array entry with its suffix.

The constants were generated by running the generic path and
extracting the non-zero bytes from the resulting pack file.

Benchmarks generating a 4 GiB + 1 pack (3 runs each, SHA1DC on
x86_64):

  generic path:   88s / 81s / 140s
  fast path:      14s / 13s / 15s

On CI, where t5608 currently takes 200-850 seconds depending on the
job, the fast path cuts the pack-generation phase from minutes to
seconds, leaving only the clone operations themselves.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a SHA-256 entry to the fast_packs[] table. The pack prefix and
deflate block structure are identical to SHA-1 (the pack format does
not encode the hash algorithm in its header). Only the suffix differs:
SHA-256 OIDs are 32 bytes instead of 20, giving a 609-byte suffix
compared to 513 for SHA-1, and a different pack checksum.

The constants were generated by running the generic path inside a
repository initialized with --object-format=sha256.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even with precomputed pack constants that reduced the helper's
runtime from minutes to seconds, the >4GB clone tests still take
200-850 seconds across CI jobs. The bottleneck is no longer the
pack generation but the clone operations themselves: transporting,
unpacking, and indexing 4 GiB of data through unpack-objects and
index-pack is inherently expensive.

As Jeff King pointed out [1], t5608 alone takes 160 seconds on his
laptop while the rest of the entire test suite finishes in under 90
seconds, and the test's disk footprint (4+ GiB source repo, then
two clones) is problematic for developers who use RAM disks for
their trash directories.

Gate the >4GB tests on the EXPENSIVE prereq (which requires
GIT_TEST_LONG to be set) in addition to SIZE_T_IS_64BIT, keeping
them out of normal local test runs.

[1] https://lore.kernel.org/git/20260501063805.GA2038915@coredump.intra.peff.net/

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee suggested [1] that expensive tests should be run at a
regular cadence rather than on every PR iteration. Gate GIT_TEST_LONG
on push builds to the integration branches (next, master, main, maint)
so that the EXPENSIVE prereq is satisfied there but not during PR
validation, where the extra minutes of wall-clock time do not justify
themselves.

[1] https://lore.kernel.org/git/e1e8837f-7374-4079-ba87-ab95dd156e33@gmail.com/

Helped-by: Derrick Stolee <derrickstolee@github.com>
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitster added 9 commits May 20, 2026 10:30
Update code paths that assumed "unsigned long" was long enough for
"size_t".

* js/objects-larger-than-4gb-on-windows:
  ci: run expensive tests on push builds to integration branches
  t5608: mark >4GB tests as EXPENSIVE
  test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
  test-tool synthesize: precompute pack for 4 GiB + 1
  test-tool synthesize: use the unsafe hash for speed
  t5608: add regression test for >4GB object clone
  test-tool: add a helper to synthesize large packfiles
  delta, packfile: use size_t for delta header sizes
  odb, packfile: use size_t for streaming object sizes
  git-zlib: handle data streams larger than 4GB
  index-pack, unpack-objects: use size_t for object size
Stop using unmaintained custom allocator in Windows build which was
the last user of the code.

* js/mingw-no-nedmalloc:
  mingw: remove the vendored compat/nedmalloc/ subtree
  mingw: drop the build-system plumbing for nedmalloc
  mingw: stop using nedmalloc
The computation to shorten the filenames shown in diffstat measured
width of individual UTF-8 characters to add up, but forgot to take
into account error cases (e.g., an invalid UTF-8 sequence, or a
control character).

* en/diffstat-utf8-truncation-fix:
  diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Some tests assume that bare repository accesses are by default
allowed; rewrite some of them to avoid the assumption, rewrite
others to explicitly set safe.bareRepository to allow them.

* js/adjust-tests-to-explicitly-access-bare-repo:
  safe.bareRepository: default to "explicit" with WITH_BREAKING_CHANGES
  status tests: filter `.gitconfig` from status output
  ls-files tests: filter `.gitconfig` from `--others` output
  t5601: restore `.gitconfig` after includeIf test
  t1305: use `--git-dir=.` for bare repo in include cycle test
  t1300: remove global config settings injected by test-lib.sh
  t7900: do not let `$HOME/.gitconfig` interfere with XDG tests
  test-lib: allow bare repository access when breaking changes are enabled
Signing commit with custom encoding was passing the data to be
signed at a wrong stage in the pipeline, which has been corrected.

* bc/sign-commit-with-custom-encoding:
  commit: sign commit after mutating buffer
  commit: name UTF-8 function appropriately
Further update to the i18n alias support to avoid regressions.

* jh/alias-i18n-fixes:
  alias: restore support for simple dotted aliases
"git history" learned "fixup" command.

* ps/history-fixup:
  builtin/history: introduce "fixup" subcommand
  builtin/history: generalize function to commit trees
  replay: allow callers to control what happens with empty commits
Use a larger buffer size in the code paths to ingest pack stream.

* sb/unpack-index-pack-buffer-resize:
  index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull Bot locked and limited conversation to collaborators May 20, 2026
@pull pull Bot added the ⤵️ pull label May 20, 2026
@pull pull Bot merged commit 1c00d2d into turkdevops:master May 20, 2026
4 of 6 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants