CodeQL-inspired fixes #1891

Open
wants to merge 11 commits into
base: master
2 changes: 1 addition & 1 deletion builtin/commit.c
@@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			for (i = 0; i < the_repository->index->cache_nr; i++)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:39PM +0000, Johannes Schindelin via GitGitGadget wrote:

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 66bd91fd523d..fba0dded64a7 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
>  			for (i = 0; i < the_repository->index->cache_nr; i++)
>  				if (ce_intent_to_add(the_repository->index->cache[i]))
>  					ita_nr++;
> -			committable = the_repository->index->cache_nr - ita_nr > 0;
> +			committable = the_repository->index->cache_nr > ita_nr;

I guess it is not possible for ita_nr to be greater than cache_nr, since
we are counting up entries in the loop above. If ita_nr were greater,
the original would wrap around and set committable to true, but yours
would not.

So really, I think the original was equivalent to:

  committable = cache_nr != ita_nr;

but I think ">" probably expresses the intent better (we want to know if
there are any non-ita entries). Though in that case I'd think:

  committable = 0;
  for (i = 0; i < cache_nr; i++) {
	if (!ce_intent_to_add(...)) {
		committable = 1;
		break;
	}
  }

would be the most clear, since we do not otherwise care about the actual
number of ita entries. And lets us break out of the loop early.

I dunno if it is worth refactoring further, though. Your patch does the
correct thing and fixes the codeql complaint (which I do think is a
false positive, because ita_nr can never exceed cache_nr).

-Peff
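
The wraparound argument above can be checked in isolation. The following is a minimal sketch, not code from the patch: the function names are hypothetical, and both counters are modeled as `unsigned int`, which is an assumption of the sketch.

```c
/*
 * Hypothetical stand-ins for the two forms of the test. Both counters
 * are modeled as unsigned int (an assumption of this sketch).
 */
static int committable_by_subtraction(unsigned int cache_nr,
				      unsigned int ita_nr)
{
	/* Unsigned subtraction wraps, so this is true whenever they differ. */
	return cache_nr - ita_nr > 0;
}

static int committable_by_comparison(unsigned int cache_nr,
				     unsigned int ita_nr)
{
	/* True only when at least one entry is not intent-to-add. */
	return cache_nr > ita_nr;
}
```

The two functions agree whenever `ita_nr <= cache_nr`, which is the only case the counting loop can produce. They differ only in the impossible case `ita_nr > cache_nr`, where the subtraction form wraps around and reports true.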


On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> ... there are any non-ita entries). Though in that case I'd think:
>
>   committable = 0;
>   for (i = 0; i < cache_nr; i++) {
> 	if (!ce_intent_to_add(...)) {
> 		committable = 1;
> 		break;
> 	}
>   }
>
> would be the most clear, since we do not otherwise care about the actual
> number of ita entries. And lets us break out of the loop early.

Exactly.  If you focus on the warning too narrowly, the minimal
change in the original patch does look OK, but in the original (even
before Dscho's patch, that is) the intent is unclear, as opposed to
what you showed above.  And the update to squelch false positive
does not improve the clarity of the logic as the above rewrite does.

Thanks.


On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:37:00PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... there are any non-ita entries). Though in that case I'd think:
> >
> >   committable = 0;
> >   for (i = 0; i < cache_nr; i++) {
> > 	if (!ce_intent_to_add(...)) {
> > 		committable = 1;
> > 		break;
> > 	}
> >   }
> >
> > would be the most clear, since we do not otherwise care about the actual
> > number of ita entries. And lets us break out of the loop early.
> 
> Exactly.  If you focus on the warning too narrowly, the minimal
> change in the original patch does look OK, but in the original (even
> before Dscho's patch, that is) the intent is unclear, as opposed to
> what you showed above.  And the update to squelch false positive
> does not improve the clarity of the logic as the above rewrite does.

OK. If we do want to refactor, I think pulling it into a separate
function is the most descriptive, like:

diff --git a/builtin/commit.c b/builtin/commit.c
index 66bd91fd52..a8d43d223d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -740,6 +740,15 @@ static void change_data_free(void *util, const char *str UNUSED)
 	free(d);
 }
 
+static int has_non_ita_entries(struct index_state *index)
+{
+	int i;
+	for (i = 0; i < index->cache_nr; i++)
+		if (!ce_intent_to_add(index->cache[i]))
+			return 1;
+	return 0;
+}
+
 static int prepare_to_commit(const char *index_file, const char *prefix,
 			     struct commit *current_head,
 			     struct wt_status *s,
@@ -1015,14 +1024,10 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			parent = "HEAD^1";
 
 		if (repo_get_oid(the_repository, parent, &oid)) {
-			int i, ita_nr = 0;
-
 			/* TODO: audit for interaction with sparse-index. */
 			ensure_full_index(the_repository->index);
-			for (i = 0; i < the_repository->index->cache_nr; i++)
-				if (ce_intent_to_add(the_repository->index->cache[i]))
-					ita_nr++;
-			committable = the_repository->index->cache_nr - ita_nr > 0;
+			committable =
+				has_non_ita_entries(the_repository->index);
 		} else {
 			/*
 			 * Unless the user did explicitly request a submodule

-Peff
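
For readers who want to try the early-exit helper outside of Git, here is a minimal standalone sketch. The `mock_entry` and `mock_index` types are hypothetical simplifications, not Git's `struct cache_entry` or `struct index_state`, and the intent-to-add flag is reduced to a plain `int`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Git's index structures. */
struct mock_entry {
	int intent_to_add;
};

struct mock_index {
	struct mock_entry **cache;
	unsigned int cache_nr;
};

/* Returns 1 as soon as one entry that is not intent-to-add is found. */
static int mock_has_non_ita_entries(const struct mock_index *index)
{
	unsigned int i;

	for (i = 0; i < index->cache_nr; i++)
		if (!index->cache[i]->intent_to_add)
			return 1; /* one committable entry is enough */
	return 0;
}
```

The sketch uses an unsigned loop counter to match the unsigned count; that is a choice made here, not something the diff above does. An empty index yields 0, the same answer the counting version gives.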

 				if (ce_intent_to_add(the_repository->index->cache[i]))
 					ita_nr++;
-			committable = the_repository->index->cache_nr - ita_nr > 0;
+			committable = the_repository->index->cache_nr > ita_nr;
 		} else {
 			/*
 			 * Unless the user did explicitly request a submodule
3 changes: 2 additions & 1 deletion builtin/fetch.c
@@ -1728,7 +1728,7 @@ static int do_fetch(struct transport *transport,
 			if (transport->remote->follow_remote_head != FOLLOW_REMOTE_NEVER)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:44PM +0000, Johannes Schindelin via GitGitGadget wrote:

> As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
> case `branch_has_merge_config()` would return early, but we can even
> avoid enumerating the refs prefixes in that case, saving even more CPU
> cycles.

I am not sure how this patch changes anything with respect to CPU. If
branch is NULL, then branch_has_merge_config(branch) will always return
false.

I think this is just an issue that CodeQL is not looking inside
branch_has_merge_config(), and thus does not realize we will never hit
the rest of the short-circuit conditional (let alone the body) in that
case?

Still may be worth dealing with, but it makes me a little sad to have to
add an extra redundant check (one place is not a big deal, but as a
general pattern).

-Peff
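
The redundancy described above can be sketched with mock types. Everything here is hypothetical: `mock_branch` and `mock_has_merge_config()` only imitate the behaviour the message describes (a NULL branch has no merge config) and are not Git's real definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Git's struct branch. */
struct mock_branch {
	const char *remote_name;
	int merge_nr;
};

/* Imitates the described behaviour: a NULL branch has no merge config. */
static int mock_has_merge_config(const struct mock_branch *branch)
{
	return branch && branch->merge_nr > 0;
}

static int mock_tracks_remote(const struct mock_branch *branch,
			      const char *remote)
{
	/*
	 * The leading "branch &&" changes nothing at run time, because the
	 * helper already rejects NULL. It only makes the NULL check visible
	 * to an analyzer that does not look inside the helper.
	 */
	return branch && mock_has_merge_config(branch) &&
	       !strcmp(branch->remote_name, remote);
}
```

With or without the extra check, a NULL branch short-circuits before `branch->remote_name` is dereferenced.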

 				do_set_head = 1;
 		}
-		if (branch_has_merge_config(branch) &&
+		if (branch && branch_has_merge_config(branch) &&
 		    !strcmp(branch->remote_name, transport->remote->name)) {
 			int i;
 			for (i = 0; i < branch->merge_nr; i++) {
@@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
 		if (server_options.nr)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:40PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> As pointed out by CodeQL, it is a potentially dangerous practice to
> store local variables' addresses in non-local structs. Yet this is
> exactly what happens with the `acked_commits` attribute that is used in
> `cmd_fetch()`: The pointer to a local variable is assigned to it.
> 
> Now, it is Git's convention that `cmd_*()` functions are essentially
> only returning just before exiting the process, therefore there is
> little danger that this attribute is used after the code flow returns
> from that function.

I was going to say: the real sin here is using a global variable in the
first place, without which gtransport would not survive outside of
cmd_fetch(). But the issue is even worse than that. The acked_commits
variable is inside a conditional block, so the address is stale for the
rest of cmd_fetch(), too!

It doesn't look like we ever examine it after that, but it's hard to
trace, since it's a global. ;)

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index cda6eaf1fd6e..c1a1434c7096 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
>  		if (server_options.nr)
>  			gtransport->server_options = &server_options;
>  		result = transport_fetch_refs(gtransport, NULL);
> +		gtransport->smart_options->acked_commits = NULL;
>  
>  		oidset_iter_init(&acked_commits, &iter);
>  		while ((oid = oidset_iter_next(&iter)))

Here you unset it within that conditional block, which is the right
spot. Looks good.

-Peff
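
The pattern under discussion (a long-lived struct briefly pointing at block-scoped storage) can be illustrated with a small sketch. All names below are hypothetical; `global_options` merely stands in for a global such as `gtransport`.

```c
#include <stddef.h>

/* Hypothetical long-lived options struct. */
struct mock_options {
	int *acked; /* may only point at live storage */
};

/* Stand-in for a global that outlives the function below. */
static struct mock_options global_options;

static int mock_fetch_once(int negotiate)
{
	int result = 0;

	if (negotiate) {
		int acked_count = 0; /* lives only inside this block */

		global_options.acked = &acked_count;
		*global_options.acked += 1; /* stand-in for the real work */
		result = acked_count;
		/* Unset before the block ends so no stale address survives. */
		global_options.acked = NULL;
	}
	return result;
}
```

Resetting the pointer inside the same block that declares the variable keeps the assignment and its cleanup next to each other, which is what the hunk above does.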

 			gtransport->server_options = &server_options;
 		result = transport_fetch_refs(gtransport, NULL);
+		gtransport->smart_options->acked_commits = NULL;

 		oidset_iter_init(&acked_commits, &iter);
 		while ((oid = oidset_iter_next(&iter)))
12 changes: 7 additions & 5 deletions bundle-uri.c
@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
 	 */

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
>
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 96d2ba726d99..13a42f92387e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
>  	 */
>  	if (!repo_config_get_value(r,
>  				   "fetch.bundlecreationtoken",
> -				   &creationTokenStr) &&
> -	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
> -	    bundles.items[0]->creationToken <= maxCreationToken) {
> -		free(bundles.items);
> -		return 0;

The original said "if we successfully parsed the configured token and
the first bundle's creationToken is not larger than it, we are done",
which is probably OK. The problem is that if we were fed garbage and
failed to parse it, we may have smudged maxCreationToken to some
unknown value, and the code path that follows, which assumes that
maxCreationToken is left initialized to 0, will be broken.

So the problem is real, but I find the rewritten form a bit hard to
follow.  Namely, when sscanf() failed to grab maxCreationToken, we
never compared it with bundles.items[0]->creationToken, which makes
perfect sense to me, but now...

> +				   &creationTokenStr)) {
> +		if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
> +			maxCreationToken = 0;
> +		if (bundles.items[0]->creationToken <= maxCreationToken) {
> +			free(bundles.items);
> +			return 0;
> +		}

... the updated code does use the just-assigned-because-we-failed-to-parse
value 0 in comparison.

I have to wonder if the attached patch is simpler to reason about,
more in line with what the original wanted to do, and more correct?

When we fail to grab the configuration, or when the value we grabbed
from the configuration does not parse, then we reset
maxCreationToken to 0, but otherwise we have a valid
maxCreationToken so use it to see if we should return early.

The cases we do not return early here are either (1) we did not have
usable configured value, in which case maxCreationToken is set to 0
before reaching the loop after this code, or (2) the value of
maxCreationToken we grabbed from the configuration is smaller than
the creationToken in the bundle list, in which case that value is
used when entering the loop.

Thanks.

 bundle-uri.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git c/bundle-uri.c w/bundle-uri.c
index f3579e228e..13a43f8e32 100644
--- c/bundle-uri.c
+++ w/bundle-uri.c
@@ -531,11 +531,11 @@ static int fetch_bundles_by_token(struct repository *r,
 	 * is not strictly smaller than the maximum creation token in the
 	 * bundle list, then do not download any bundles.
 	 */
-	if (!repo_config_get_value(r,
-				   "fetch.bundlecreationtoken",
-				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
-	    bundles.items[0]->creationToken <= maxCreationToken) {
+	if (repo_config_get_value(r, "fetch.bundlecreationtoken",
+				  &creationTokenStr) ||
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+		maxCreationToken = 0;
+	else if (bundles.items[0]->creationToken <= maxCreationToken) {
 		free(bundles.items);
 		return 0;
 	}
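
The control flow of the diff above can be exercised on its own. This is a minimal sketch under stated assumptions: `mock_can_skip_download()` and its parameters are hypothetical, a NULL string stands in for "the configuration is not set", and the sketch uses `SCNu64`, the scanf-side macro from `<inttypes.h>`, where the code in this thread uses `PRIu64`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Returns 1 when no bundle needs downloading. *max_token is always left
 * well-defined: 0 unless config_value parsed as an unsigned 64-bit
 * number. A NULL config_value models an unset configuration.
 */
static int mock_can_skip_download(const char *config_value,
				  uint64_t first_bundle_token,
				  uint64_t *max_token)
{
	if (!config_value ||
	    sscanf(config_value, "%" SCNu64, max_token) != 1)
		*max_token = 0;
	else if (first_bundle_token <= *max_token)
		return 1;
	return 0;
}
```

Because the comparison sits in the `else` branch, a value that failed to parse is never compared against the bundle token. With the nested-`if` form quoted earlier, a garbage configuration combined with a first bundle token of 0 would compare `0 <= 0` and return early.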


On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:

> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
> 
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
> 
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.

Definitely an issue, but...why are we using sscanf() at all?

Wouldn't strtoul() be the usual thing in our code base? Or even just
repo_config_get_ulong()? The behavior of the latter would differ in that
we'd complain about a garbage value in fetch.bundlecreationtoken, but
wouldn't that be a good thing?

-Peff


On the Git mailing list, Phillip Wood wrote (reply to this):

On 15/05/2025 21:25, Jeff King wrote:
> On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
>> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
>> code was introduced that assumes that an `sscanf()` call leaves its
>> output variables unchanged unless the return value indicates success.
>>
>> However, the POSIX documentation makes no such guarantee:
>> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>>
>> So let's make sure that the output variable `maxCreationToken` is
>> always well-defined.
>
> Definitely an issue, but...why are we using sscanf() at all?
>
> Wouldn't strtoul() be the usual thing in our code base? Or even just
> repo_config_get_ulong()? The behavior of the latter would differ in that
> we'd complain about a garbage value in fetch.bundlecreationtoken, but
> wouldn't that be a good thing?

I had a similar thought, though to make sure that we parse 64-bit
values correctly on Windows we'd need something based on strtoumax(),
I think. There is another call to sscanf() in this file which the
analyzer does not complain about because it stores the result in a
local variable that is not used if the call to sscanf() fails. We
should stop using sscanf() there as well. I wonder if we should add
something about not using sscanf() to our coding guidelines. Apart
from this file, the only other use of sscanf() is in a test helper,
which doesn't seem so bad, though if we removed that we could add
sscanf() to banned.h.

Best Wishes

Phillip


On the Git mailing list, Phillip Wood wrote (reply to this):

On 16/05/2025 11:11, Phillip Wood wrote:

> I had a similar thought, though to make sure that we parse 64-bit
> values correctly on Windows we'd need something based on strtoumax(),
> I think.

Perhaps something like the diff below which adds strtoul_u64() in a
similar vein to strtoul_ui(). I think it's debatable whether we really
want to skip leading whitespace so we could perhaps tighten things up
by replacing "if (strchr(s, '-'))" with "if (!isdigit(*s))" though
that would mean this function would behave slightly differently to
strtoul_ui().

Best Wishes

Phillip

---- >8 ----
diff --git a/bundle-uri.c b/bundle-uri.c
index 96d2ba726d9..9dff7a1c09d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -214,7 +214,7 @@ static int bundle_list_update(const char *key, const char *value,
 	}
 
 	if (!strcmp(subkey, "creationtoken")) {
-		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+		if (strtoul_u64(value, 10, &bundle->creationToken))
 			warning(_("could not parse bundle list key %s with value '%s'"),
 				"creationToken", value);
 		return 0;
@@ -533,7 +533,7 @@ static int fetch_bundles_by_token(struct repository *r,
 	if (!repo_config_get_value(r,
 				   "fetch.bundlecreationtoken",
 				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    !strtoul_u64(creationTokenStr, 10, &maxCreationToken) &&
 	    bundles.items[0]->creationToken <= maxCreationToken) {
 		free(bundles.items);
 		return 0;
diff --git a/git-compat-util.h b/git-compat-util.h
index 36b9577c8d4..d34d07fce1e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -939,6 +939,22 @@ static inline int strtol_i(char const *s, int base, int *result)
 	return 0;
 }
 
+static inline int strtoul_u64(char const *s, int base, uint64_t *result)
+{
+	uintmax_t ul;
+	char *p;
+
+	errno = 0;
+	/* negative values would be accepted by strtoumax */
+	if (strchr(s, '-'))
+		return -1;
+	ul = strtoumax(s, &p, base);
+	if (errno || *p || p == s || (uint64_t) ul != ul)
+		return -1;
+	*result = ul;
+	return 0;
+}
+
 #ifndef REG_STARTEND
 #error "Git requires REG_STARTEND support. Compile with NO_REGEX=NeedsStartEnd"
 #endif
-- 
2.49.0.897.gfad3eb7d210
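
The helper from the diff above can be compiled on its own to see which inputs it accepts. The function body below follows the diff, with `static inline` reduced to `static` and the comment reworded; the standard headers are added here so that the sketch is self-contained.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <string.h>

/* Returns 0 and stores the value on success, -1 on any parse failure. */
static int strtoul_u64(char const *s, int base, uint64_t *result)
{
	uintmax_t ul;
	char *p;

	errno = 0;
	/* strtoumax() would accept a minus sign and negate the value */
	if (strchr(s, '-'))
		return -1;
	ul = strtoumax(s, &p, base);
	if (errno || *p || p == s || (uint64_t) ul != ul)
		return -1;
	*result = ul;
	return 0;
}
```

Note the return convention: 0 means success, so a caller that wants "parsed successfully" must negate the result. Leading whitespace is still accepted because strtoumax() skips it, which is the looseness the message above proposes tightening with an isdigit() check.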


On the Git mailing list, Jeff King wrote (reply to this):

On Fri, May 16, 2025 at 02:40:54PM +0100, Phillip Wood wrote:

> On 16/05/2025 11:11, Phillip Wood wrote:
> 
> > I had a similar thought, though to make sure that we parse 64-bit
> > values correctly on Windows we'd need something based on strtoumax(),
> > I think.
> 
> Perhaps something like the diff below which adds strtoul_u64() in a
> similar vein to strtoul_ui(). I think it's debatable whether we really
> want to skip leading whitespace so we could perhaps tighten things up
> by replacing "if (strchr(s, '-'))" with "if (!isdigit(*s))" though
> that would mean this function would behave slightly differently to
> strtoul_ui().

It feels like we would have had to deal with this before for other
large values. But poking around at a few obvious suspects (e.g.,
packSizeLimit), it looks like they are all constrained to "unsigned
long".

So yeah, we probably do need something new. IMHO we should probably have
repo_config_get_u64() or similar (with the appropriate underlying
helpers as well) and use it here. But I am happy with any solution.

And I do agree that we should consider banning *scanf(). With numeric
placeholders I don't think they're a security problem (though they are
easy to get wrong, as this discussion shows). But using them with "%s"
should generally be disallowed.

There is an fscanf() in builtin/gc.c that uses "%s", but it is careful
to construct a custom format string that limits the string size. Yuck.
The usual thing in our code base would be to read into a buffer and
parse from there.

-Peff

 	if (!repo_config_get_value(r,
 				   "fetch.bundlecreationtoken",
-				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
-	    bundles.items[0]->creationToken <= maxCreationToken) {
-		free(bundles.items);
-		return 0;
+				   &creationTokenStr)) {
+		if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+			maxCreationToken = 0;
+		if (bundles.items[0]->creationToken <= maxCreationToken) {
+			free(bundles.items);
+			return 0;
+		}
 	}

 	/*
148 changes: 77 additions & 71 deletions commit-graph.c
@@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
 		       const struct commit_graph_opts *opts)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:41PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> We do need a context to write the commit graph, but that context is only
> needed during the life time of `commit_graph_write()`, therefore it can
> easily be a stack variable.

Yay. I am in favor of using stack variables when possible as a general
rule.

> diff --git a/commit-graph.c b/commit-graph.c
> index 6394752b0b08..9f0115dac9b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
>  		       const struct commit_graph_opts *opts)
>  {
>  	struct repository *r = the_repository;
> -	struct write_commit_graph_context *ctx;
> +	struct write_commit_graph_context ctx = {
> +		.r = r,
> +		.odb = odb,
> +		.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
> +		.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
> +		.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
> +		.opts = opts,
> +		.total_bloom_filter_data_size = 0,
> +		.write_generation_data = (get_configured_generation_version(r) == 2),
> +		.num_generation_data_overflows = 0,
> +	};
>  	uint32_t i;
>  	int res = 0;
>  	int replace = 0;
> @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
>  		return 0;
>  	}
>  
> -	CALLOC_ARRAY(ctx, 1);
> -	ctx->r = r;
> -	ctx->odb = odb;
> -	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
> -	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
> -	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
> -	ctx->opts = opts;
> -	ctx->total_bloom_filter_data_size = 0;
> -	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> -	ctx->num_generation_data_overflows = 0;

OK, this moves the initialization to the top of the function. So to
review this for correctness, we must make sure that we do not change the
values of any of those variables between the two spots (i.e., in the
diff context that is omitted).

Most of it looks fine. Our call to get_configured_generation_version()
now happens earlier, before the call to prepare_repo_settings(). I think
that is OK, because the former calls repo_config_get_int() directly. It
does seem like a potential maintenance problem if that call is ever
rolled into prepare_repo_settings().

So maybe OK, but the smaller change would be to just replace the calloc
with a memset(), and s/->/./ on the subsequent lines.

-Peff
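
The two options weighed above (a designated initializer at the declaration versus a memset() where the calloc used to be) can be compared side by side. This is a minimal sketch with hypothetical names; `mock_ctx` and the `MOCK_WRITE_*` flags are not Git's real definitions.

```c
#include <string.h>

/* Hypothetical, much-reduced stand-in for the write context. */
struct mock_ctx {
	int append;
	int split;
	unsigned long total;
	char *graph_name;
};

#define MOCK_WRITE_APPEND (1u << 0)
#define MOCK_WRITE_SPLIT  (1u << 2)

/* Variant 1: designated initializer; unnamed members start as zero/NULL. */
static struct mock_ctx mock_ctx_from_initializer(unsigned int flags)
{
	struct mock_ctx ctx = {
		.append = flags & MOCK_WRITE_APPEND ? 1 : 0,
		.split = flags & MOCK_WRITE_SPLIT ? 1 : 0,
	};
	return ctx;
}

/* Variant 2: memset() in place of the calloc, then field assignments. */
static struct mock_ctx mock_ctx_from_memset(unsigned int flags)
{
	struct mock_ctx ctx;

	memset(&ctx, 0, sizeof(ctx));
	ctx.append = flags & MOCK_WRITE_APPEND ? 1 : 0;
	ctx.split = flags & MOCK_WRITE_SPLIT ? 1 : 0;
	return ctx;
}
```

Both variants produce the same field values. The difference raised in the review is about when the initializing expressions run: variant 1 evaluates them at the declaration, while variant 2 can leave the assignments at their original position in the function.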


On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> So maybe OK, but the smaller change would be to just replace the calloc
> with a memset(), and s/->/./ on the subsequent lines.

True, and it would be a bit easier to merge with other topics in
flight.  The .odb member and parameter are both renamed IIUC.

 {
 	struct repository *r = the_repository;
-	struct write_commit_graph_context *ctx;
+	struct write_commit_graph_context ctx = {
+		.r = r,
+		.odb = odb,
+		.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
+		.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
+		.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
+		.opts = opts,
+		.total_bloom_filter_data_size = 0,
+		.write_generation_data = (get_configured_generation_version(r) == 2),
+		.num_generation_data_overflows = 0,
+	};
 	uint32_t i;
 	int res = 0;
 	int replace = 0;
@@ -2531,32 +2541,21 @@ int write_commit_graph(struct object_directory *odb,
 		return 0;
 	}

-	CALLOC_ARRAY(ctx, 1);
-	ctx->r = r;
-	ctx->odb = odb;
-	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
-	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
-	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
-	ctx->opts = opts;
-	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
-	ctx->num_generation_data_overflows = 0;
-
 	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
 						      bloom_settings.bits_per_entry);
 	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
 						  bloom_settings.num_hashes);
 	bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
 							 bloom_settings.max_changed_paths);
-	ctx->bloom_settings = &bloom_settings;
+	ctx.bloom_settings = &bloom_settings;

 	init_topo_level_slab(&topo_levels);
-	ctx->topo_levels = &topo_levels;
+	ctx.topo_levels = &topo_levels;

-	prepare_commit_graph(ctx->r);
-	if (ctx->r->objects->commit_graph) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	prepare_commit_graph(ctx.r);
+	if (ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;

 		while (g) {
 			g->topo_levels = &topo_levels;
@@ -2565,15 +2564,15 @@ int write_commit_graph(struct object_directory *odb,
 	}

 	if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS)
-		ctx->changed_paths = 1;
+		ctx.changed_paths = 1;
 	if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
 		struct commit_graph *g;

-		g = ctx->r->objects->commit_graph;
+		g = ctx.r->objects->commit_graph;

 		/* We have changed-paths already. Keep them in the next graph */
 		if (g && g->bloom_filter_settings) {
-			ctx->changed_paths = 1;
+			ctx.changed_paths = 1;

 			/* don't propagate the hash_version unless unspecified */
 			if (bloom_settings.hash_version == -1)
@@ -2586,116 +2585,123 @@ int write_commit_graph(struct object_directory *odb,

 	bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;

-	if (ctx->split) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	if (ctx.split) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;

 		while (g) {
-			ctx->num_commit_graphs_before++;
+			ctx.num_commit_graphs_before++;
 			g = g->base_graph;
 		}

-		if (ctx->num_commit_graphs_before) {
-			ALLOC_ARRAY(ctx->commit_graph_filenames_before, ctx->num_commit_graphs_before);
-			i = ctx->num_commit_graphs_before;
-			g = ctx->r->objects->commit_graph;
+		if (ctx.num_commit_graphs_before) {
+			ALLOC_ARRAY(ctx.commit_graph_filenames_before, ctx.num_commit_graphs_before);
+			i = ctx.num_commit_graphs_before;
+			g = ctx.r->objects->commit_graph;

 			while (g) {
-				ctx->commit_graph_filenames_before[--i] = xstrdup(g->filename);
+				ctx.commit_graph_filenames_before[--i] = xstrdup(g->filename);
 				g = g->base_graph;
 			}
 		}

-		if (ctx->opts)
-			replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE;
+		if (ctx.opts)
+			replace = ctx.opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE;
 	}

-	ctx->approx_nr_objects = repo_approximate_object_count(the_repository);
+	ctx.approx_nr_objects = repo_approximate_object_count(the_repository);

-	if (ctx->append && ctx->r->objects->commit_graph) {
-		struct commit_graph *g = ctx->r->objects->commit_graph;
+	if (ctx.append && ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;
 		for (i = 0; i < g->num_commits; i++) {
 			struct object_id oid;
 			oidread(&oid, g->chunk_oid_lookup + st_mult(g->hash_len, i),
 				the_repository->hash_algo);
-			oid_array_append(&ctx->oids, &oid);
+			oid_array_append(&ctx.oids, &oid);
 		}
 	}

 	if (pack_indexes) {
-		ctx->order_by_pack = 1;
-		if ((res = fill_oids_from_packs(ctx, pack_indexes)))
+		ctx.order_by_pack = 1;
+		if ((res = fill_oids_from_packs(&ctx, pack_indexes)))
 			goto cleanup;
 	}

 	if (commits) {
-		if ((res = fill_oids_from_commits(ctx, commits)))
+		if ((res = fill_oids_from_commits(&ctx, commits)))
 			goto cleanup;
 	}

 	if (!pack_indexes && !commits) {
-		ctx->order_by_pack = 1;
-		fill_oids_from_all_packs(ctx);
+		ctx.order_by_pack = 1;
+		fill_oids_from_all_packs(&ctx);
 	}

-	close_reachable(ctx);
+	close_reachable(&ctx);

-	copy_oids_to_commits(ctx);
+	copy_oids_to_commits(&ctx);

-	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
+	if (ctx.commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
 		res = -1;
 		goto cleanup;
 	}

-	if (!ctx->commits.nr && !replace)
+	if (!ctx.commits.nr && !replace)
 		goto cleanup;

-	if (ctx->split) {
-		split_graph_merge_strategy(ctx);
+	if (ctx.split) {
+		split_graph_merge_strategy(&ctx);

 		if (!replace)
-			merge_commit_graphs(ctx);
+			merge_commit_graphs(&ctx);
 	} else
-		ctx->num_commit_graphs_after = 1;
+		ctx.num_commit_graphs_after = 1;

-	ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
+	ctx.trust_generation_numbers = validate_mixed_generation_chain(ctx.r->objects->commit_graph);

-	compute_topological_levels(ctx);
-	if (ctx->write_generation_data)
-		compute_generation_numbers(ctx);
+	compute_topological_levels(&ctx);
+	if (ctx.write_generation_data)
+		compute_generation_numbers(&ctx);

-	if (ctx->changed_paths)
-		compute_bloom_filters(ctx);
+	if (ctx.changed_paths)
+		compute_bloom_filters(&ctx);

-	res = write_commit_graph_file(ctx);
+	res = write_commit_graph_file(&ctx);

-	if (ctx->changed_paths)
+	if (ctx.changed_paths)
 		deinit_bloom_filters();

-	if (ctx->split)
-		mark_commit_graphs(ctx);
+	if (ctx.split)
+		mark_commit_graphs(&ctx);

-	expire_commit_graphs(ctx);
+	expire_commit_graphs(&ctx);

 cleanup:
-	free(ctx->graph_name);
-	free(ctx->base_graph_name);
-	free(ctx->commits.list);
-	oid_array_clear(&ctx->oids);
+	free(ctx.graph_name);
+	free(ctx.base_graph_name);
+	free(ctx.commits.list);
+	oid_array_clear(&ctx.oids);

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:47PM +0000, Johannes Schindelin via GitGitGadget wrote:

> The code is a bit too hard to reason about to fully assess whether the
> `fill_commit_graph_info()` function is called at all after
> `write_commit_graph()` returns (and hence the stack variable
> `topo_levels` goes out of scope).
> 
> Let's simply make sure that the stack address is no longer used at that
> stage, thereby making the code quite a bit easier to reason about.

Yep, I think this is a good practice in general. If the topo_levels
member is never used outside of writing, I wonder if it could live in a
separate data structure. But that is a much bigger refactor that I don't
think we need to tackle here.

> diff --git a/commit-graph.c b/commit-graph.c
> index 9f0115dac9b5..d052c1bf15c5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2683,6 +2683,15 @@ cleanup:
>  	oid_array_clear(&ctx.oids);
>  	clear_topo_level_slab(&topo_levels);
>  
> +	if (ctx.r->objects->commit_graph) {
> +		struct commit_graph *g = ctx.r->objects->commit_graph;
> +
> +		while (g) {
> +			g->topo_levels = NULL;
> +			g = g->base_graph;
> +		}
> +	}

This just clears the pointers to the local variable. Looks good.

-Peff
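
The pattern discussed above can be sketched in isolation. This is a minimal illustration, not Git's code: `struct graph`, `struct topo_slab`, `clear_topo_levels()` and `write_graph()` are hypothetical stand-ins that keep only the fields needed to show a stack address being lent to a chain of graphs and then revoked before the function returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's structures; only the fields used here. */
struct topo_slab { int dummy; };

struct graph {
	struct topo_slab *topo_levels; /* may point at a caller's stack variable */
	struct graph *base_graph;      /* next graph in the chain, or NULL */
};

/* Walk the chain and drop every reference to the soon-to-be-dead local. */
static void clear_topo_levels(struct graph *g)
{
	for (; g; g = g->base_graph)
		g->topo_levels = NULL;
}

/* Simulates a writer that lends a stack address to the chain, then revokes it. */
static int write_graph(struct graph *tip)
{
	struct topo_slab topo_levels = { 0 };
	struct graph *g;

	for (g = tip; g; g = g->base_graph)
		g->topo_levels = &topo_levels;

	/* ... the actual writing would happen here ... */

	clear_topo_levels(tip);
	/* Post-condition: the tip no longer refers to our local. */
	assert(!tip || !tip->topo_levels);
	return 0;
}
```

After `write_graph()` returns, any later reader of `topo_levels` sees `NULL` rather than a dangling stack address, which is what makes the lifetime easy to reason about.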

 	clear_topo_level_slab(&topo_levels);

-	for (i = 0; i < ctx->num_commit_graphs_before; i++)
-		free(ctx->commit_graph_filenames_before[i]);
-	free(ctx->commit_graph_filenames_before);
+	if (ctx.r->objects->commit_graph) {
+		struct commit_graph *g = ctx.r->objects->commit_graph;

-	for (i = 0; i < ctx->num_commit_graphs_after; i++) {
-		free(ctx->commit_graph_filenames_after[i]);
-		free(ctx->commit_graph_hash_after[i]);
+		while (g) {
+			g->topo_levels = NULL;
+			g = g->base_graph;
+		}
 	}
-	free(ctx->commit_graph_filenames_after);
-	free(ctx->commit_graph_hash_after);

-	free(ctx);
+	for (i = 0; i < ctx.num_commit_graphs_before; i++)
+		free(ctx.commit_graph_filenames_before[i]);
+	free(ctx.commit_graph_filenames_before);
+
+	for (i = 0; i < ctx.num_commit_graphs_after; i++) {
+		free(ctx.commit_graph_filenames_after[i]);
+		free(ctx.commit_graph_hash_after[i]);
+	}
+	free(ctx.commit_graph_filenames_after);
+	free(ctx.commit_graph_hash_after);

 	return res;
 }
2 changes: 1 addition & 1 deletion help.c
@@ -214,7 +214,7 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes)
 		else if (cmp == 0) {

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:45PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
> `if (i <= 0) ... else ...`, the latter is vastly easier to read because
> it avoids writing out a condition that is unnecessary. Let's drop such
> unnecessary conditions.
> 
> Pointed out by CodeQL.

Yeah, I'd agree that it is easier (otherwise, you are left wondering if
there is an "else" case you are missing).

>  help.c             | 2 +-
>  transport-helper.c | 2 +-

Both spots look good to me.

-Peff
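
To make the point concrete, here is a small sketch of a sorted-merge exclusion loop written with a plain trailing `else`. It is a simplified, hypothetical analogue of the loop in `exclude_cmds()`: it works on plain string arrays, frees nothing, and the name `exclude_sorted()` is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove from the sorted array `names` every string that also appears in
 * the sorted array `excludes`, compacting in place; returns the new length.
 */
static size_t exclude_sorted(const char **names, size_t n,
			     const char **excludes, size_t m)
{
	size_t ci = 0, cj = 0, ei = 0;

	while (ci < n && ei < m) {
		int cmp = strcmp(names[ci], excludes[ei]);

		if (cmp < 0)
			names[cj++] = names[ci++]; /* keep: not excluded */
		else if (cmp == 0) {
			ei++;
			ci++;                      /* drop: excluded */
		} else
			ei++;                      /* cmp > 0 is the only case left */
	}
	while (ci < n)
		names[cj++] = names[ci++];

	/* In-place compaction never writes ahead of the read cursor. */
	assert(cj <= ci);
	return cj;
}
```

Because the three branches cover `cmp < 0`, `cmp == 0` and everything else, a reader does not have to check whether some fourth case falls through silently.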

ei++;
free(cmds->names[ci++]);
} else if (cmp > 0)
} else
ei++;
}

55 changes: 13 additions & 42 deletions read-cache.c
@@ -1117,48 +1117,19 @@ static int has_dir_name(struct index_state *istate,
 	 *

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:43PM +0000, Johannes Schindelin via GitGitGadget wrote:

> One thing that might be non-obvious to readers (or to analyzers like
> CodeQL) is that the function essentially does nothing when the Git index
> is empty, and in particular that it does not look at the value of
> `len_eq_last` (which would be uninitialized at that point).
> 
> Let's make this much easier to understand, by returning early if the Git
> index is empty, and by avoiding empty `else` blocks.

OK, so we return early, skipping not only what's in the conditional
that you're touching here, but also the "for(;;)" loop below.

And in that one, we'll look for the next slash (and break if none).
We'll check the name up to that stage via index_name_stage_pos(). And
obviously that will not find a match if there are no index entries. So
we'd do nothing and loop again, looking for the next slash, until we
eventually hit the end.

So yeah, I agree if there are no index entries we can bail immediately.

> This commit changes indentation and is hence best viewed using
> `--ignore-space-change`.

Yeah. I was puzzled at first by the amount of dropped code, but they are
all comments that say "fall through to the code below".

So I think the change here is correct. We are losing some comments that
could be helpful, but I'm not familiar enough with that code to say
whether they would be. Just reading what you've left makes sense to me
on its own.

-Peff
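
The effect of the early return can be sketched outside of Git. The code below is an illustration under assumptions: `strcmp_offset()` is re-created here from how the diff calls it (it is not copied from Git), and `no_conflict_fast_path()` is a hypothetical helper that isolates only the fast-path check on a sorted array of paths.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical re-creation of a strcmp() variant that also reports the
 * length of the common prefix of the two strings.
 */
static int strcmp_offset(const char *a, const char *b, size_t *first_change)
{
	size_t k = 0;

	assert(a && b && first_change);
	while (a[k] && a[k] == b[k])
		k++;
	*first_change = k;
	return (unsigned char)a[k] - (unsigned char)b[k];
}

/*
 * Returns 1 if the fast path proves `name` cannot have a file/directory
 * conflict with the sorted `paths`, 0 if the full search is still needed.
 */
static int no_conflict_fast_path(const char **paths, size_t nr, const char *name)
{
	size_t len_eq_last;
	int cmp_last;

	if (!nr)
		return 1; /* empty index: nothing to conflict with */

	cmp_last = strcmp_offset(name, paths[nr - 1], &len_eq_last);
	/* len_eq_last is always initialized by the time it is read. */
	return cmp_last > 0 && name[len_eq_last] != '/';
}
```

With the empty case handled first, there is no path through the function on which `len_eq_last` is read before `strcmp_offset()` has written it, which is the property a static analyzer needs to see.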

 	 * Compare the entry's full path with the last path in the index.
 	 */
-	if (istate->cache_nr > 0) {
-		cmp_last = strcmp_offset(name,
-			istate->cache[istate->cache_nr - 1]->name,
-			&len_eq_last);
-		if (cmp_last > 0) {
-			if (name[len_eq_last] != '/') {
-				/*
-				 * The entry sorts AFTER the last one in the
-				 * index.
-				 *
-				 * If there were a conflict with "file", then our
-				 * name would start with "file/" and the last index
-				 * entry would start with "file" but not "file/".
-				 *
-				 * The next character after common prefix is
-				 * not '/', so there can be no conflict.
-				 */
-				return retval;
-			} else {
-				/*
-				 * The entry sorts AFTER the last one in the
-				 * index, and the next character after common
-				 * prefix is '/'.
-				 *
-				 * Either the last index entry is a file in
-				 * conflict with this entry, or it has a name
-				 * which sorts between this entry and the
-				 * potential conflicting file.
-				 *
-				 * In both cases, we fall through to the loop
-				 * below and let the regular search code handle it.
-				 */
-			}
-		} else if (cmp_last == 0) {
-			/*
-			 * The entry exactly matches the last one in the
-			 * index, but because of multiple stage and CE_REMOVE
-			 * items, we fall through and let the regular search
-			 * code handle it.
-			 */
-		}
-	}
+	if (!istate->cache_nr)
+		return 0;
+
+	cmp_last = strcmp_offset(name,
+			istate->cache[istate->cache_nr - 1]->name,
+			&len_eq_last);
+	if (cmp_last > 0 && name[len_eq_last] != '/')
+		/*
+		 * The entry sorts AFTER the last one in the
+		 * index and their paths have no common prefix,
+		 * so there cannot be a F/D conflict.
+		 */
+		return 0;

 	for (;;) {
 		size_t len;
9 changes: 6 additions & 3 deletions sequencer.c
@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
 	const char nick = todo_command_info[command].c;

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
>
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.

True, as the value of 'p' cannot be NULL at that point where it is
stored to the pointer variable bol points at.  The second paragraph
above does convey what the long expression really wants to achieve.

> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.

Yes.  Good thing to get rid of.

>
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sequencer.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
>  	const char nick = todo_command_info[command].c;
>  	const char *p = *bol;
>  
> -	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> -		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> -		(*bol = p);
> +	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> +	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> +		*bol = p;
> +		return 1;
> +	}
> +	return 0;
>  }

Perfect.  That's quite a natural way to express the intention.



>  
>  static int check_label_or_ref_arg(enum todo_command command, const char *arg)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:49PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

I see Junio already reviewed this one, but just so I can say I read all
of the patches: yes, I also agree the result is easier to understand. :)

-Peff
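
For readers who want to try the rewritten control flow on its own, here is a self-contained sketch. It makes two assumptions: `skip_prefix()` below is a hypothetical stand-in with the behavior the patch relies on (on a match, store the remainder and return 1), and `is_command()` takes the command name and its one-letter abbreviation directly instead of looking them up in `todo_command_info`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's skip_prefix(): if `str` starts with
 * `prefix`, store the remainder in *out and return 1; otherwise leave
 * *out untouched and return 0.
 */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	assert(str && prefix && out);
	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Simplified is_command(): `str` is the long name, `nick` the one-letter
 * abbreviation (0 if none). On a match, *bol is advanced past the command;
 * on a mismatch, *bol is left unchanged.
 */
static int is_command(const char *str, char nick, const char **bol)
{
	const char *p = *bol;

	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
		*bol = p; /* the side effect is now an ordinary statement */
		return 1;
	}
	return 0;
}
```

Calling `is_command("pick", 'p', &bol)` with `bol` pointing at `"pick abc123"` or at `"p abc123"` returns 1 and advances `bol` to the space; with `"pickle"` it returns 0 and leaves `bol` where it was.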

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Johannes

Thanks for cleaning this up - I'm not sure why I didn't just write something like this in the first place.

Best Wishes

Phillip

On 15/05/2025 14:11, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
>
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.
>
> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.
>
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>   sequencer.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
>   	const char nick = todo_command_info[command].c;
>   	const char *p = *bol;
>
> -	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> -		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> -		(*bol = p);
> +	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> +	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> +		*bol = p;
> +		return 1;
> +	}
> +	return 0;
>   }
>
>   static int check_label_or_ref_arg(enum todo_command command, const char *arg)

 	const char *p = *bol;

-	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
-		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
-		(*bol = p);
+	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
+	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
+		*bol = p;
+		return 1;
+	}
+	return 0;
 }

 static int check_label_or_ref_arg(enum todo_command command, const char *arg)