CodeQL-inspired fixes #1891

Open. This pull request wants to merge 11 commits into base: master.

dscho (Member) commented Mar 24, 2025

CodeQL pointed out a couple of issues, which are addressed in this patch series.

cc: Jeff King <peff@peff.net>

dscho self-assigned this Mar 24, 2025
dscho added 4 commits May 15, 2025 12:16
The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).

Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.

Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
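A small standalone sketch of why the unsigned comparison misleads. The helper names are made up for this illustration and do not exist in Git. The two forms agree whenever the first operand is at least the second, and they disagree only when the mathematical difference would be negative.

```c
#include <stdbool.h>

/*
 * The pattern CodeQL flags: with unsigned operands, "a - b > 0" is false
 * only when a == b, because a "negative" result wraps around to a large
 * positive value instead.
 */
bool diff_greater_than_zero(unsigned int a, unsigned int b)
{
	return a - b > 0;
}

/* The rewritten comparison: true only when a really is greater than b. */
bool direct_compare(unsigned int a, unsigned int b)
{
	return a > b;
}
```

For `a = 2, b = 3`, the first helper returns true because `2u - 3u` wraps to `UINT_MAX`, while the direct comparison returns false.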
As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.

Now, it is Git's convention that `cmd_*()` functions essentially only
return just before the process exits, therefore there is little danger
that this attribute is used after the code flow returns from that
function.

However, code in `cmd_*()` functions is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.

Let's make sure to clear out the `acked_commits` attribute after it has
been used, and before the function returns (at which point the address
would go stale).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
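The pattern can be sketched in isolation. `struct smart_options`, `global_opts` and both functions below are hypothetical stand-ins, not Git's real transport code. The local lives in an inner block, and the borrowed pointer is reset before that block ends, so nothing long-lived is left holding a stale address.

```c
#include <stddef.h>

/* Hypothetical stand-in for a long-lived transport's options. */
struct smart_options {
	int *acked_commits; /* borrowed pointer, owned by the caller */
};

static struct smart_options global_opts; /* outlives any caller's frame */

/* Hypothetical consumer that records data through the borrowed pointer. */
static void do_fetch_like_work(void)
{
	if (global_opts.acked_commits)
		(*global_opts.acked_commits)++;
}

int run_fetch_like_command(void)
{
	int result;

	{
		int acked = 0; /* its lifetime ends at the closing brace */

		global_opts.acked_commits = &acked;
		do_fetch_like_work();
		/* Clear the borrowed pointer before `acked` goes out of scope. */
		global_opts.acked_commits = NULL;
		result = acked;
	}
	return result;
}
```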
We do need a context to write the commit graph, but that context is only
needed during the lifetime of `write_commit_graph()`, therefore it can
easily be a stack variable.

This also helps CodeQL recognize that it is safe to assign the address
of other local variables to the context's fields.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
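A minimal sketch of the designated-initializer approach, using a hypothetical trimmed-down `struct write_ctx` and made-up flag names. In C, members that a designated initializer does not name are zero-initialized, so the local gets the same starting state `CALLOC_ARRAY()` used to provide. This also means explicit `= 0` entries are harmless but redundant.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down context; the real struct has many more members. */
struct write_ctx {
	const char *name;
	unsigned int append;
	unsigned int split;
	size_t total_size;
	int *scratch;
};

#define FLAG_APPEND (1 << 0)
#define FLAG_SPLIT  (1 << 1)

/*
 * Builds the context as a local with a designated initializer. Members that
 * are not named (total_size and scratch) start out as 0 and NULL. The struct
 * is returned by value only so the sketch can be inspected; as a local in the
 * writer's own frame it would need no free() on any return path.
 */
struct write_ctx make_write_ctx(const char *name, int flags)
{
	struct write_ctx ctx = {
		.name = name,
		.append = flags & FLAG_APPEND ? 1 : 0,
		.split = flags & FLAG_SPLIT ? 1 : 0,
	};

	return ctx;
}
```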
While 3145ea9 (upload-pack: introduce fetch server command,
2018-03-15) added support for the `fetch` command, from the server's
point of view it is an upload, and hence the `enum` should really be
called `upload_state` instead of `fetch_state`. Likewise, rename its
values.

This also helps unconfuse CodeQL, which would otherwise be at sixes and
sevens about having _two_ non-local definitions of the same `enum` with
the same values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho added 7 commits May 15, 2025 14:23
One thing that might be non-obvious to readers (or to analyzers like
CodeQL) is that the function essentially does nothing when the Git index
is empty, and in particular that it does not look at the value of
`len_eq_last` (which would be uninitialized at that point).

Let's make this much easier to understand, by returning early if the Git
index is empty, and by avoiding empty `else` blocks.

This commit changes indentation and is hence best viewed using
`--ignore-space-change`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
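The shape of the change can be shown on a toy function. `struct mini_index` and `can_append()` are hypothetical and much simpler than `has_dir_name()`. The early return makes it obvious that, for an empty index, nothing below it runs, including the code that reads the "last" entry.

```c
#include <string.h>

/* Hypothetical, trimmed-down index: sorted entry names and their count. */
struct mini_index {
	unsigned int nr;
	const char **names;
};

/*
 * Hypothetical helper: returns 1 if `name` sorts after the last entry,
 * i.e. it could be appended without disturbing the order.
 */
int can_append(const struct mini_index *idx, const char *name)
{
	const char *last;

	/* Return early: with no entries there is no "last" entry to inspect. */
	if (!idx->nr)
		return 1;

	last = idx->names[idx->nr - 1];
	return strcmp(last, name) < 0;
}
```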
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.

Technically, we should enclose these two statements in an
`if (branch) {...}` block, but the indentation is already quite deep,
therefore I refrained from doing that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
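A hypothetical sketch of the guard, with made-up names rather than Git's real `struct branch` or `branch_has_merge_config()`. It also reflects the point raised in review: because the callee already treats NULL as "no merge config", the short-circuit keeps the expensive step from running either way. The explicit `b &&` mainly helps an analyzer that does not look inside the callee.

```c
#include <stddef.h>

/* Hypothetical stand-in for a branch with merge configuration. */
struct branch {
	int merge_nr;
};

static int calls; /* counts how often the expensive step runs */

/* Like the real helper is described to do: NULL means "no merge config". */
static int has_merge_config(const struct branch *b)
{
	return b && b->merge_nr > 0;
}

static void expensive_prefix_enumeration(void)
{
	calls++;
}

int want_upstream_prefixes(const struct branch *b)
{
	/*
	 * `b &&` is redundant for correctness, but it makes the NULL case
	 * visibly skip the body without looking into has_merge_config().
	 */
	if (b && has_merge_config(b)) {
		expensive_prefix_enumeration();
		return 1;
	}
	return 0;
}
```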
While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
`if (i <= 0) ... else ...`, the latter is vastly easier to read because
it avoids writing out a condition that is unnecessary. Let's drop such
unnecessary conditions.

Pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
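The following merge loop over two sorted lists is loosely modeled on the kind of loop found in `exclude_cmds()`. It is a hypothetical sketch, not Git's code. It shows the readable form: once `cmp < 0` and `cmp == 0` are handled, the final branch is a plain `else`, because `cmp > 0` is the only case left.

```c
#include <string.h>

/*
 * Hypothetical sorted-list difference: keeps the entries of `a` that do not
 * appear in `b`, compacting them at the front of `a` and returning the new
 * count. Both arrays must be sorted according to strcmp().
 */
size_t exclude_sorted(const char **a, size_t a_nr,
		      const char **b, size_t b_nr)
{
	size_t ai = 0, bi = 0, kept = 0;

	while (ai < a_nr && bi < b_nr) {
		int cmp = strcmp(a[ai], b[bi]);

		if (cmp < 0)
			a[kept++] = a[ai++];	/* only in a: keep it */
		else if (cmp == 0) {
			ai++;			/* in both: drop it */
			bi++;
		} else				/* cmp > 0 is the only case left */
			bi++;
	}
	while (ai < a_nr)
		a[kept++] = a[ai++];
	return kept;
}
```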
CodeQL reports empty `if` blocks that only contain a comment as "futile
conditional". The comment talks about potential plans to turn this into
a warning, but that seems not to have been necessary. Replace the entire
construct with a concise comment.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The code is a bit too hard to reason about to fully assess whether the
`fill_commit_graph_info()` function is called at all after
`write_commit_graph()` returns (and hence the stack variable
`topo_levels` goes out of scope).

Let's simply make sure that the stack address is no longer used at that
stage, thereby making the code quite a bit easier to reason about.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
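A minimal sketch of clearing a borrowed pointer along a chain of layers. `struct graph_layer` and `clear_borrowed_levels()` are hypothetical. A `for` loop that tests the pointer itself also makes a separate up-front NULL check unnecessary.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down graph layer chained via base_graph. */
struct graph_layer {
	struct graph_layer *base_graph;
	int *topo_levels; /* borrowed; may point into a caller's stack frame */
};

/* Walk the whole chain (possibly empty) and drop every borrowed pointer. */
void clear_borrowed_levels(struct graph_layer *g)
{
	for (; g; g = g->base_graph)
		g->topo_levels = NULL;
}
```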
In c429bed (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
code was introduced that assumes that an `sscanf()` call leaves its
output variables unchanged unless the return value indicates success.

However, the POSIX documentation makes no such guarantee:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html

So let's make sure that the output variable `maxCreationToken` is
always well-defined.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
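A minimal sketch of keeping the output well-defined, using a hypothetical `parse_token()` helper. Because POSIX does not promise that `*out` survives a failed conversion, the failure path assigns an explicit value. The sketch uses `SCNu64`, the `<inttypes.h>` conversion macro intended for the `scanf()` family; `PRIu64` is the `printf()`-family counterpart.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an unsigned 64-bit token from `str`.
 * On failure, *out is set to 0 instead of being left in whatever
 * state sscanf() happened to leave it.
 */
int parse_token(const char *str, uint64_t *out)
{
	if (sscanf(str, "%" SCNu64, out) != 1) {
		*out = 0;
		return 0;
	}
	return 1;
}
```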
In 3e81bcc (sequencer: factor out todo command name parsing,
2019-06-27), a `return` statement was introduced that basically was a
long sequence of conditions, combined with `&&`, except for the last
condition which is not really a condition but an assignment.

The point of this construct was to return 1 (i.e. `true`) from the
function if all of those conditions held true, and also assign the `bol`
pointer to the end of the parsed command.

Some static analyzers are really unhappy about such constructs. And
human readers are at least puzzled, if not confused, by seeing a single
`=` inside a chain of conditions where they would have expected to see
`==` instead and, based on experience, immediately suspect a typo.

Let's help both audiences by turning this into the more verbose, more
readable form of an `if` construct that both assigns the pointer and
returns 1 if all of the conditions hold true.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
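A self-contained, simplified version of the rewritten function. It takes the command string and its one-letter nickname directly, whereas the real `is_command()` looks them up in `todo_command_info`. It also uses a local stand-in for Git's `skip_prefix()`. The assignment to `*bol` happens only in the body, once every condition has held.

```c
#include <string.h>

/* Local stand-in for Git's skip_prefix(): on a match, advance *out past it. */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Simplified is_command(): `str` is the full command name, `nick` its
 * one-letter abbreviation (0 if none). On success, *bol is moved past the
 * command word and 1 is returned; otherwise *bol is left untouched.
 */
int is_command_sketch(const char *str, char nick, const char **bol)
{
	const char *p = *bol;

	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
		*bol = p;
		return 1;
	}
	return 0;
}
```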
dscho (Member, Author) commented May 15, 2025

/submit

gitgitgadget bot commented May 15, 2025

Submitted as pull.1891.git.1747314709.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1891/dscho/codeql-fixes-v1

To fetch this version to local tag pr-1891/dscho/codeql-fixes-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1891/dscho/codeql-fixes-v1

@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
const char nick = todo_command_info[command].c;

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
>
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.

True, as the value of 'p' cannot be NULL at that point where it is
stored to the pointer variable bol points at.  The second paragraph
above does convey what the long expression really wants to achieve.

> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.

Yes.  Good thing to get rid of.

>
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sequencer.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
>  	const char nick = todo_command_info[command].c;
>  	const char *p = *bol;
>  
> -	return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> -		(*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> -		(*bol = p);
> +	if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> +	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> +		*bol = p;
> +		return 1;
> +	}
> +	return 0;
>  }

Perfect.  That's quite a natural way to express the intention.



>  
>  static int check_label_or_ref_arg(enum todo_command command, const char *arg)

@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
*/

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
>
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 96d2ba726d99..13a42f92387e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
>  	 */
>  	if (!repo_config_get_value(r,
>  				   "fetch.bundlecreationtoken",
> -				   &creationTokenStr) &&
> -	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
> -	    bundles.items[0]->creationToken <= maxCreationToken) {
> -		free(bundles.items);
> -		return 0;

The original said "if we successfully parsed and the value of the
configured token is at least as large as the bundle's token, we are
done", which is probably OK, but the problem is if we were fed garbage
and failed to parse it, we would have smudged maxCreationToken to some
unknown value, and the code path that follows here, which assumes that
maxCreationToken is left as initialized to 0, will be broken.

So the problem is real, but I find the rewritten form a bit hard to
follow.  Namely, when sscanf() failed to grab maxCreationToken, we
never compared it with bundles.items[0]->creationToken, which makes
perfect sense to me, but now...

> +				   &creationTokenStr)) {
> +		if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
> +			maxCreationToken = 0;
> +		if (bundles.items[0]->creationToken <= maxCreationToken) {
> +			free(bundles.items);
> +			return 0;
> +		}

... the updated code does use the just-assigned-because-we-failed-to-parse
value 0 in comparison.

I have to wonder if the attached patch is simpler to reason about,
more in line with what the original wanted to do, and more correct?

When we fail to grab the configuration, or when the value we grabbed
from the configuration does not parse, then we reset
maxCreationToken to 0, but otherwise we have a valid
maxCreationToken so use it to see if we should return early.

The cases we do not return early here are either (1) we did not have
usable configured value, in which case maxCreationToken is set to 0
before reaching the loop after this code, or (2) the value of
maxCreationToken we grabbed from the configuration is smaller than
the creationToken in the bundle list, in which case that value is
used when entering the loop.

Thanks.

 bundle-uri.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git c/bundle-uri.c w/bundle-uri.c
index f3579e228e..13a43f8e32 100644
--- c/bundle-uri.c
+++ w/bundle-uri.c
@@ -531,11 +531,11 @@ static int fetch_bundles_by_token(struct repository *r,
 	 * is not strictly smaller than the maximum creation token in the
 	 * bundle list, then do not download any bundles.
 	 */
-	if (!repo_config_get_value(r,
-				   "fetch.bundlecreationtoken",
-				   &creationTokenStr) &&
-	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
-	    bundles.items[0]->creationToken <= maxCreationToken) {
+	if (repo_config_get_value(r, "fetch.bundlecreationtoken",
+				  &creationTokenStr) ||
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+		maxCreationToken = 0;
+	else if (bundles.items[0]->creationToken <= maxCreationToken) {
 		free(bundles.items);
 		return 0;
 	}
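The control flow of the attached patch can be modeled in a standalone function. `should_skip_bundles()` is hypothetical: a NULL `config_value` stands in for "`fetch.bundlecreationtoken` is not set", and `bundle_token` for `bundles.items[0]->creationToken`. Unset and unparsable values both reset the token to 0, and only a successfully parsed value is ever compared.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Returns 1 if the caller should return early (no bundle is newer than the
 * configured token). Otherwise returns 0, with *max_token holding the value
 * to carry into the download loop.
 */
int should_skip_bundles(const char *config_value, uint64_t bundle_token,
			uint64_t *max_token)
{
	if (!config_value ||
	    sscanf(config_value, "%" SCNu64, max_token) != 1)
		*max_token = 0;		/* unset or unparsable: start from 0 */
	else if (bundle_token <= *max_token)
		return 1;		/* parsed, and nothing newer to fetch */
	return 0;
}
```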

@@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
if (server_options.nr)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:40PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> As pointed out by CodeQL, it is a potentially dangerous practice to
> store local variables' addresses in non-local structs. Yet this is
> exactly what happens with the `acked_commits` attribute that is used in
> `cmd_fetch()`: The pointer to a local variable is assigned to it.
> 
> Now, it is Git's convention that `cmd_*()` functions are essentially
> only returning just before exiting the process, therefore there is
> little danger that this attribute is used after the code flow returns
> from that function.

I was going to say: the real sin here is using a global variable in the
first place, without which gtransport would not survive outside of
cmd_fetch(). But the issue is even worse than that. The acked_commits
variable is inside a conditional block, so the address is stale for the
rest of cmd_fetch(), too!

It doesn't look like we ever examine it after that, but it's hard to
trace, since it's a global. ;)

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index cda6eaf1fd6e..c1a1434c7096 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
>  		if (server_options.nr)
>  			gtransport->server_options = &server_options;
>  		result = transport_fetch_refs(gtransport, NULL);
> +		gtransport->smart_options->acked_commits = NULL;
>  
>  		oidset_iter_init(&acked_commits, &iter);
>  		while ((oid = oidset_iter_next(&iter)))

Here you unset it within that conditional block, which is the right
spot. Looks good.

-Peff

gitgitgadget bot commented May 15, 2025

User Jeff King <peff@peff.net> has been added to the cc: list.

@@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
for (i = 0; i < the_repository->index->cache_nr; i++)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:39PM +0000, Johannes Schindelin via GitGitGadget wrote:

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 66bd91fd523d..fba0dded64a7 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
>  			for (i = 0; i < the_repository->index->cache_nr; i++)
>  				if (ce_intent_to_add(the_repository->index->cache[i]))
>  					ita_nr++;
> -			committable = the_repository->index->cache_nr - ita_nr > 0;
> +			committable = the_repository->index->cache_nr > ita_nr;

I guess it is not possible for ita_nr to be greater than cache_nr, since
we are counting up entries in the loop above. If ita_nr were greater,
the original would wrap around and set committable to true, but yours
would not.

So really, I think the original was equivalent to:

  committable = cache_nr != ita_nr;

but I think ">" probably expresses the intent better (we want to know if
there are any non-ita entries). Though in that case I'd think:

  committable = 0;
  for (i = 0; i < cache_nr; i++) {
	if (!ce_intent_to_add(...)) {
		committable = 1;
		break;
	}
  }

would be the most clear, since we do not otherwise care about the actual
number of ita entries. And lets us break out of the loop early.

I dunno if it is worth refactoring further, though. Your patch does the
correct thing and fixes the CodeQL complaint (which I do think is a
false positive, because ita_nr cannot be greater than cache_nr).

-Peff

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> ... there are any non-ita entries). Though in that case I'd think:
>
>   committable = 0;
>   for (i = 0; i < cache_nr; i++) {
> 	if (!ce_intent_to_add(...)) {
> 		committable = 1;
> 		break;
> 	}
>   }
>
> would be the most clear, since we do not otherwise care about the actual
> number of ita entries. And lets us break out of the loop early.

Exactly.  If you focus on the warning too narrowly, the minimal
change in the original patch does look OK, but in the original (even
before Dscho's patch, that is) the intent is unclear, as opposed to
what you showed above.  And the update to squelch false positive
does not improve the clarity of the logic as the above rewrite does.

Thanks.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:37:00PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... there are any non-ita entries). Though in that case I'd think:
> >
> >   committable = 0;
> >   for (i = 0; i < cache_nr; i++) {
> > 	if (!ce_intent_to_add(...)) {
> > 		committable = 1;
> > 		break;
> > 	}
> >   }
> >
> > would be the most clear, since we do not otherwise care about the actual
> > number of ita entries. And lets us break out of the loop early.
> 
> Exactly.  If you focus on the warning too narrowly, the minimal
> change in the original patch does look OK, but in the original (even
> before Dscho's patch, that is) the intent is unclear, as opposed to
> what you showed above.  And the update to squelch false positive
> does not improve the clarity of the logic as the above rewrite does.

OK. If we do want to refactor, I think pulling it into a separate
function is the most descriptive, like:

diff --git a/builtin/commit.c b/builtin/commit.c
index 66bd91fd52..a8d43d223d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -740,6 +740,15 @@ static void change_data_free(void *util, const char *str UNUSED)
 	free(d);
 }
 
+static int has_non_ita_entries(struct index_state *index)
+{
+	int i;
+	for (i = 0; i < index->cache_nr; i++)
+		if (!ce_intent_to_add(index->cache[i]))
+			return 1;
+	return 0;
+}
+
 static int prepare_to_commit(const char *index_file, const char *prefix,
 			     struct commit *current_head,
 			     struct wt_status *s,
@@ -1015,14 +1024,10 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			parent = "HEAD^1";
 
 		if (repo_get_oid(the_repository, parent, &oid)) {
-			int i, ita_nr = 0;
-
 			/* TODO: audit for interaction with sparse-index. */
 			ensure_full_index(the_repository->index);
-			for (i = 0; i < the_repository->index->cache_nr; i++)
-				if (ce_intent_to_add(the_repository->index->cache[i]))
-					ita_nr++;
-			committable = the_repository->index->cache_nr - ita_nr > 0;
+			committable =
+				has_non_ita_entries(the_repository->index);
 		} else {
 			/*
 			 * Unless the user did explicitly request a submodule

-Peff
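A self-contained version of the suggested helper, with hypothetical stand-ins for the cache entry and index state. The `intent_to_add` flag models what `ce_intent_to_add()` reports for entries added with `git add -N`. The loop counter is `unsigned int` here so that it matches the type of `cache_nr` and avoids a signed/unsigned comparison.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's cache entry and index state. */
struct mini_entry {
	int intent_to_add; /* models an entry added with "git add -N" */
};

struct mini_index_state {
	unsigned int cache_nr;
	struct mini_entry **cache;
};

/* Returns 1 as soon as one entry that is not intent-to-add is found. */
int has_non_ita_entries(const struct mini_index_state *index)
{
	unsigned int i;

	for (i = 0; i < index->cache_nr; i++)
		if (!index->cache[i]->intent_to_add)
			return 1;
	return 0;
}
```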

@@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
const struct commit_graph_opts *opts)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:41PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> We do need a context to write the commit graph, but that context is only
> needed during the life time of `commit_graph_write()`, therefore it can
> easily be a stack variable.

Yay. I am in favor of using stack variables when possible as a general
rule.

> diff --git a/commit-graph.c b/commit-graph.c
> index 6394752b0b08..9f0115dac9b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
>  		       const struct commit_graph_opts *opts)
>  {
>  	struct repository *r = the_repository;
> -	struct write_commit_graph_context *ctx;
> +	struct write_commit_graph_context ctx = {
> +		.r = r,
> +		.odb = odb,
> +		.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
> +		.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
> +		.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
> +		.opts = opts,
> +		.total_bloom_filter_data_size = 0,
> +		.write_generation_data = (get_configured_generation_version(r) == 2),
> +		.num_generation_data_overflows = 0,
> +	};
>  	uint32_t i;
>  	int res = 0;
>  	int replace = 0;
> @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
>  		return 0;
>  	}
>  
> -	CALLOC_ARRAY(ctx, 1);
> -	ctx->r = r;
> -	ctx->odb = odb;
> -	ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
> -	ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
> -	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
> -	ctx->opts = opts;
> -	ctx->total_bloom_filter_data_size = 0;
> -	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> -	ctx->num_generation_data_overflows = 0;

OK, this moves the initialization to the top of the function. So to
review this for correctness, we must make sure that we do not change the
values of any of those variables between the two spots (i.e., in the
diff context that is omitted).

Most of it looks fine. Our call to get_configured_generation_version()
now happens earlier, before the call to prepare_repo_settings(). I think
that is OK, because the former calls repo_config_get_int() directly. It
does seem like a potential maintenance problem if that call is ever
rolled into prepare_repo_settings().

So maybe OK, but the smaller change would be to just replace the calloc
with a memset(), and s/->/./ on the subsequent lines.

-Peff
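A sketch of the smaller change suggested here, under stated assumptions. `struct graph_ctx`, `prepare_settings()` and `configured_version()` are hypothetical, and they only model the ordering concern. Zeroing a stack struct with `memset()` at the spot where the allocation used to be keeps every assignment in its original order, so a value that depends on earlier setup is still computed after that setup.

```c
#include <string.h>

/* Hypothetical, trimmed-down context struct. */
struct graph_ctx {
	int append;
	int split;
	int version;
	unsigned long total_size;
};

#define FLAG_APPEND (1 << 0)
#define FLAG_SPLIT  (1 << 1)

/* Hypothetical stand-ins that model an ordering dependency. */
static int settings_prepared;

static void prepare_settings(void)
{
	settings_prepared = 1;
}

static int configured_version(void)
{
	/* Pretend this is only correct after prepare_settings() has run. */
	return settings_prepared ? 2 : 1;
}

int write_graph_sketch(int flags, struct graph_ctx *result)
{
	struct graph_ctx ctx;

	prepare_settings();

	/*
	 * Same spot as the old heap allocation: zero everything, then assign
	 * the members in the original order, with "->" turned into ".".
	 */
	memset(&ctx, 0, sizeof(ctx));
	ctx.append = flags & FLAG_APPEND ? 1 : 0;
	ctx.split = flags & FLAG_SPLIT ? 1 : 0;
	ctx.version = configured_version();

	/* ... the actual writing would use ctx here ... */

	*result = ctx; /* copied out only so the sketch can be inspected */
	return 0;
}
```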

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> So maybe OK, but the smaller change would be to just replace the calloc
> with a memset(), and s/->/./ on the subsequent lines.

True, and it would be a bit easier to merge with other topics in
flight.  The .oid member and parameter are both renamed IIUC.

@@ -1780,16 +1780,16 @@ static void send_shallow_info(struct upload_pack_data *data)
packet_delim(1);

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:42PM +0000, Johannes Schindelin via GitGitGadget wrote:

> While 3145ea957d (upload-pack: introduce fetch server command,
> 2018-03-15) added support for the `fetch` command, from the server's
> point of view it is an upload, and hence the `enum` should really be
> called `upload_state` instead of `fetch_state`. Likewise, rename its
> values.
> 
> This also helps unconfuse CodeQL which would otherwise be at sixes or
> sevens about having _two_ non-local definitions of the same `enum` with
> the same values.

It unconfuses me, too. Nice change.

-Peff

@@ -1117,48 +1117,19 @@ static int has_dir_name(struct index_state *istate,
*

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:43PM +0000, Johannes Schindelin via GitGitGadget wrote:

> One thing that might be non-obvious to readers (or to analyzers like
> CodeQL) is that the function essentially does nothing when the Git index
> is empty, and in particular that it does not look at the value of
> `len_eq_last` (which would be uninitialized at that point).
> 
> Let's make this much easier to understand, by returning early if the Git
> index is empty, and by avoiding empty `else` blocks.

OK, so we return early, skipping not only what's in the conditional
that you're touching here, but also the "for(;;)" loop below.

And in that one, we'll look for the next slash (and break if none).
We'll check the name up to that stage via index_name_stage_pos(). And
obviously that will not find a match if there are no index entries. So
we'd do nothing and loop again, looking for the next slash, until we
eventually hit the end.

So yeah, I agree if there are no index entries we can bail immediately.

> This commit changes indentation and is hence best viewed using
> `--ignore-space-change`.

Yeah. I was puzzled at first by the amount of dropped code, but they are
all comments that say "fall through to the code below".

So I think the change here is correct. We are losing some comments that
could be helpful, but I'm not familiar enough with that code to say
whether they would be. Just reading what you've left makes sense to me
on its own.

-Peff

@@ -1728,7 +1728,7 @@ static int do_fetch(struct transport *transport,
if (transport->remote->follow_remote_head != FOLLOW_REMOTE_NEVER)

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:44PM +0000, Johannes Schindelin via GitGitGadget wrote:

> As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
> case `branch_has_merge_config()` would return early, but we can even
> avoid enumerating the refs prefixes in that case, saving even more CPU
> cycles.

I am not sure how this patch changes anything with respect to CPU. If
branch is NULL, then branch_has_merge_config(branch) will always return
false.

I think this is just an issue that CodeQL is not looking inside
branch_has_merge_config(), and thus does not realize we will never hit
the rest of the short-circuit conditional (let alone the body) in that
case?

Still may be worth dealing with, but it makes me a little sad to have to
add an extra redundant check (one place is not a big deal, but as a
general pattern).

-Peff

@@ -214,7 +214,7 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes)
else if (cmp == 0) {

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:45PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
> `if (i <= 0) ... else ...`, the latter is vastly easier to read because
> it avoids writing out a condition that is unnecessary. Let's drop such
> unnecessary conditions.
> 
> Pointed out by CodeQL.

Yeah, I'd agree that it is easier (otherwise, you are left wondering if
there is an "else" case you are missing).

>  help.c             | 2 +-
>  transport-helper.c | 2 +-

Both spots look good to me.

-Peff

@@ -102,25 +102,11 @@ void tr2_update_final_timers(void)
struct tr2_timer *t_final = &final_timer_block.timer[tid];

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:46PM +0000, Johannes Schindelin via GitGitGadget wrote:

> CodeQL reports empty `if` blocks that only contain a comment as "futile
> conditional". The comment talks about potential plans to turn this into
> a warning, but that seems not to have been necessary. Replace the entire
> construct with a concise comment.

OK...

> -		if (t->recursion_count) {
> -			/*
> -			 * The current thread is exiting with
> -			 * timer[tid] still running.
> -			 *
> -			 * Technically, this is a bug, but I'm going
> -			 * to ignore it.
> -			 *
> -			 * I don't think it is worth calling die()
> -			 * for.  I don't think it is worth killing the
> -			 * process for this bookkeeping error.  We
> -			 * might want to call warning(), but I'm going
> -			 * to wait on that.
> -			 *
> -			 * The downside here is that total_ns won't
> -			 * include the current open interval (now -
> -			 * start_ns).  I can live with that.
> -			 */
> -		}
> +		/*
> +		 * `t->recursion_count` could technically be non-zero, which
> +		 * would constitute a bug. Reporting the bug would potentially
> +		 * cause an infinite recursion, though, so let's ignore it.
> +		 */

The original doesn't talk about infinite recursion at all, though I can
well believe that would be the case, having run into trace->die->trace
types of bugs before. Did you trace out the actual path of recursion? If
so, it might be worth summarizing it.

Obviously the code change itself cannot hurt anything, as it was a noop.

-Peff

free(ctx.graph_name);
free(ctx.base_graph_name);
free(ctx.commits.list);
oid_array_clear(&ctx.oids);

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:47PM +0000, Johannes Schindelin via GitGitGadget wrote:

> The code is a bit too hard to reason about to fully assess whether the
> `fill_commit_graph_info()` function is called at all after
> `write_commit_graph()` returns (and hence the stack variable
> `topo_levels` goes out of context).
> 
> Let's simply make sure that the stack address is no longer used at that
> stage, thereby making the code quite a bit easier to reason about.

Yep, I think this is a good practice in general. If the topo_levels
member is never used outside of writing, I wonder if it could live in a
separate data structure. But that is a much bigger refactor that I don't
think we need to tackle here.

> diff --git a/commit-graph.c b/commit-graph.c
> index 9f0115dac9b5..d052c1bf15c5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2683,6 +2683,15 @@ cleanup:
>  	oid_array_clear(&ctx.oids);
>  	clear_topo_level_slab(&topo_levels);
>  
> +	if (ctx.r->objects->commit_graph) {
> +		struct commit_graph *g = ctx.r->objects->commit_graph;
> +
> +		while (g) {
> +			g->topo_levels = NULL;
> +			g = g->base_graph;
> +		}
> +	}

This just clears the pointers to the local variable. Looks good.

-Peff
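
The pattern reviewed above can be sketched in isolation. A function lends the address of a stack variable to longer-lived structures. Before that variable goes out of scope, it walks the `base_graph` chain and resets every borrowed pointer. This is a minimal illustration, not Git's code. `struct graph`, `struct topo_level_slab`, `forget_topo_levels()` and `write_graph()` are hypothetical stand-ins for the commit-graph types and `write_commit_graph()`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Git's commit-graph structures; not the real definitions. */
struct topo_level_slab {
	int dummy;
};

struct graph {
	struct topo_level_slab *topo_levels; /* borrowed pointer, not owned */
	struct graph *base_graph;            /* next graph in the chain, or NULL */
};

/* Drop every borrowed reference to the caller's stack-allocated slab. */
static void forget_topo_levels(struct graph *g)
{
	while (g) {
		g->topo_levels = NULL;
		g = g->base_graph;
	}
}

/* Mimics the shape of write_commit_graph(): lend a local, then revoke it. */
static int write_graph(struct graph *chain)
{
	struct topo_level_slab topo_levels = { 0 }; /* lives only in this frame */
	struct graph *g;

	for (g = chain; g; g = g->base_graph)
		g->topo_levels = &topo_levels;

	/* ... work that may consult g->topo_levels goes here ... */

	/* Must run before `topo_levels` goes out of scope. */
	forget_topo_levels(chain);
	return 0;
}
```

After `write_graph()` returns, no graph in the chain can hand out a dangling pointer. A later reader that mistakenly consults `topo_levels` sees `NULL` instead of reading a dead stack frame.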

@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
*/

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:

> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
> 
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
> 
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.

Definitely an issue, but...why are we using sscanf() at all?

Wouldn't strtoul() be the usual thing in our code base? Or even just
repo_config_get_ulong()? The behavior of the latter would differ in that
we'd complain about a garbage value in fetch.bundlecreationtoken, but
wouldn't that be a good thing?

-Peff
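
The following hedged sketch contrasts the two approaches from the `sscanf()` discussion. It is not the bundle-uri code. The helper names `token_via_sscanf()` and `token_via_strtoul()` are hypothetical, and the sketch assumes only that the config value arrives as a NUL-terminated C string.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * sscanf() variant: assign the output ourselves on failure, because POSIX
 * does not promise to leave it untouched when the conversion fails.
 */
static unsigned long token_via_sscanf(const char *value)
{
	unsigned long token;

	if (!value || sscanf(value, "%lu", &token) != 1)
		token = 0; /* well-defined fallback */
	return token;
}

/*
 * strtoul() variant: rejects empty input, signs, leading whitespace,
 * overflow and trailing garbage. Returns 0 on success, -1 on error.
 */
static int token_via_strtoul(const char *value, unsigned long *out)
{
	char *end;
	unsigned long v;

	if (!value || *value < '0' || *value > '9')
		return -1;
	errno = 0;
	v = strtoul(value, &end, 10);
	if (errno || *end)
		return -1; /* out of range, or non-digits after the number */
	*out = v;
	return 0;
}
```

The `"12abc"` input shows the behavioral difference. The `sscanf()` variant quietly accepts the leading digits. The `strtoul()` variant reports the trailing garbage, which is the kind of complaint Peff suggests would be desirable (as would switching to `repo_config_get_ulong()`).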

@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
const char nick = todo_command_info[command].c;

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:49PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.

I see Junio already reviewed this one, but just so I can say I read all
of the patches: yes, I also agree the result is easier to understand. :)

-Peff
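
The shape of the sequencer change can be shown with a small, self-contained sketch. It is not Git's `is_command()`. `struct cmd_info` and `is_command_sketch()` are hypothetical, and the full-name check uses `strncmp()` in place of Git's `skip_prefix()`. The point is that the pointer update happens in an explicit `if` body instead of being the last operand of a boolean expression.

```c
#include <string.h>

/* Hypothetical command table entry; Git's todo_command_info differs. */
struct cmd_info {
	const char *str; /* full name, e.g. "pick" */
	char c;          /* one-letter nickname, e.g. 'p', or 0 for none */
};

/*
 * Advances *bol past the command name and returns 1 on a match;
 * leaves *bol untouched and returns 0 otherwise.
 *
 * The compact style being replaced looks roughly like:
 *   return prefix_match || (nick_match && (*bol = p));
 * where the assignment masquerades as a condition.
 */
static int is_command_sketch(const struct cmd_info *info, const char **bol)
{
	size_t len = strlen(info->str);
	const char *p = *bol + 1;

	/* Full command name as a prefix, e.g. "pick abc". */
	if (!strncmp(*bol, info->str, len)) {
		*bol += len;
		return 1;
	}

	/* Nickname followed by whitespace or end of string, e.g. "p abc". */
	if (info->c && **bol == info->c &&
	    (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
		*bol = p;
		return 1;
	}
	return 0;
}
```

Each successful path now states its side effect and its return value on separate lines. This is the readability gain both reviewers agreed on.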


gitgitgadget bot commented May 15, 2025

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:

> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
> are addressed in this patch series.
> 
> Johannes Schindelin (11):
>   commit: simplify code
>   fetch: carefully clear local variable's address after use
>   commit-graph: avoid malloc'ing a local variable
>   upload-pack: rename `enum` to reflect the operation
>   has_dir_name(): make code more obvious
>   fetch: avoid unnecessary work when there is no current branch
>   Avoid redundant conditions
>   trace2: avoid "futile conditional"
>   commit-graph: avoid using stale stack addresses
>   bundle-uri: avoid using undefined output of `sscanf()`
>   sequencer: stop pretending that an assignment is a condition

I read through all of these and didn't find anything incorrect. I did
leave a few comments that might or might not be worth following up on.
Thanks for fixing these.

-Peff


gitgitgadget bot commented May 15, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

> On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
>> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
>> are addressed in this patch series.
>> 
>> Johannes Schindelin (11):
>>   commit: simplify code
>>   fetch: carefully clear local variable's address after use
>>   commit-graph: avoid malloc'ing a local variable
>>   upload-pack: rename `enum` to reflect the operation
>>   has_dir_name(): make code more obvious
>>   fetch: avoid unnecessary work when there is no current branch
>>   Avoid redundant conditions
>>   trace2: avoid "futile conditional"
>>   commit-graph: avoid using stale stack addresses
>>   bundle-uri: avoid using undefined output of `sscanf()`
>>   sequencer: stop pretending that an assignment is a condition
>
> I read through all of these and didn't find anything incorrect. I did
> leave a few comments that might or might not be worth following up on.
> Thanks for fixing these.

Yup, I also looked at them and didn't see any incorrect updates.


gitgitgadget bot commented May 15, 2025

This patch series was integrated into seen via git@6fbd4fe.

gitgitgadget bot added the seen label May 15, 2025