CodeQL-inspired fixes #1891
The difference of two unsigned integers is defined to be unsigned, and therefore it is misleading to check whether it is greater than zero (instead, the more natural way would be to check whether the difference is zero or not). Let's instead avoid the subtraction altogether, and compare the two operands directly, which makes the code more obvious as a side effect. Pointed out by CodeQL's rule with the ID `cpp/unsigned-difference-expression-compared-zero`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As pointed out by CodeQL, it is a potentially dangerous practice to store local variables' addresses in non-local structs. Yet this is exactly what happens with the `acked_commits` attribute that is used in `cmd_fetch()`: The pointer to a local variable is assigned to it. Now, it is Git's convention that `cmd_*()` functions essentially return only just before exiting the process, therefore there is little danger that this attribute is used after the code flow returns from that function. However, code in `cmd_*()` functions is often so useful that it gets lifted into a library function, at which point this issue could become a real problem. Let's make sure to clear the `acked_commits` attribute out after it was used, and before the function returns (at which point the address would go stale). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We do need a context to write the commit graph, but that context is only needed during the lifetime of `write_commit_graph()`, therefore it can easily be a stack variable. This also helps CodeQL recognize that it is safe to assign the address of other local variables to the context's fields. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While 3145ea9 (upload-pack: introduce fetch server command, 2018-03-15) added support for the `fetch` command, from the server's point of view it is an upload, and hence the `enum` should really be called `upload_state` instead of `fetch_state`. Likewise, rename its values. This also helps unconfuse CodeQL which would otherwise be at sixes and sevens about having _two_ non-local definitions of the same `enum` with the same values. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
One thing that might be non-obvious to readers (or to analyzers like CodeQL) is that the function essentially does nothing when the Git index is empty, and in particular that it does not look at the value of `len_eq_last` (which would be uninitialized at that point). Let's make this much easier to understand, by returning early if the Git index is empty, and by avoiding empty `else` blocks. This commit changes indentation and is hence best viewed using `--ignore-space-change`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which case `branch_has_merge_config()` would return early, but we can even avoid enumerating the refs prefixes in that case, saving even more CPU cycles. Technically, we should enclose these two statements in an `if (branch) {...}` block, but the indentation is already quite deep, therefore I refrained from doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to `if (i <= 0) ... else ...`, the latter is vastly easier to read because it avoids writing out a condition that is unnecessary. Let's drop such unnecessary conditions. Pointed out by CodeQL. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL reports empty `if` blocks that only contain a comment as "futile conditional". The comment talks about potential plans to turn this into a warning, but that seems not to have been necessary. Replace the entire construct with a concise comment. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The code is a bit too hard to reason about to fully assess whether the `fill_commit_graph_info()` function is called at all after `write_commit_graph()` returns (and hence the stack variable `topo_levels` goes out of scope). Let's simply make sure that the stack address is no longer used at that stage, thereby making the code quite a bit easier to reason about. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In c429bed (bundle-uri: store fetch.bundleCreationToken, 2023-01-31) code was introduced that assumes that an `sscanf()` call leaves its output variables unchanged unless the return value indicates success. However, the POSIX documentation makes no such guarantee: https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html So let's make sure that the output variable `maxCreationToken` is always well-defined. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 3e81bcc (sequencer: factor out todo command name parsing, 2019-06-27), a `return` statement was introduced that basically was a long sequence of conditions, combined with `&&`, except for the last condition which is not really a condition but an assignment. The point of this construct was to return 1 (i.e. `true`) from the function if all of those conditions held true, and also assign the `bol` pointer to the end of the parsed command. Some static analyzers are really unhappy about such constructs. And human readers are at least puzzled, if not confused, by seeing a single `=` inside a chain of conditions where they would have expected to see `==` instead and, based on experience, immediately suspect a typo. Let's help all of this by turning this into the more verbose, more readable form of an `if` construct that both assigns the pointer as well as returns 1 if all of the conditions hold true. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
/submit
Submitted as pull.1891.git.1747314709.gitgitgadget@gmail.com
@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
const char nick = todo_command_info[command].c;
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 3e81bccdf3 (sequencer: factor out todo command name parsing,
> 2019-06-27), a `return` statement was introduced that basically was a
> long sequence of conditions, combined with `&&`, except for the last
> condition which is not really a condition but an assignment.
>
> The point of this construct was to return 1 (i.e. `true`) from the
> function if all of those conditions held true, and also assign the `bol`
> pointer to the end of the parsed command.
True, as the value of 'p' cannot be NULL at that point where it is
stored to the pointer variable bol points at. The second paragraph
above does convey what the long expression really wants to achieve.
> Some static analyzers are really unhappy about such constructs. And
> human readers are at least puzzled, if not confused, by seeing a single
> `=` inside a chain of conditions where they would have expected to see
> `==` instead and, based on experience, immediately suspect a typo.
Yes. Good thing to get rid of.
>
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> sequencer.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b5c4043757e9..e5e3bc6fa5ea 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
> const char nick = todo_command_info[command].c;
> const char *p = *bol;
>
> - return (skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> - (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p) &&
> - (*bol = p);
> + if ((skip_prefix(p, str, &p) || (nick && *p++ == nick)) &&
> + (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || !*p)) {
> + *bol = p;
> + return 1;
> + }
> + return 0;
> }
Perfect. That's quite a natural way to express the intention.
>
> static int check_label_or_ref_arg(enum todo_command command, const char *arg)
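The shape of the rewritten function can also be shown in isolation. The following is only a sketch: `match_word()` is a hypothetical, simplified cousin of `is_command()` (it ignores the one-letter nickname and uses `strncmp()` rather than Git's `skip_prefix()`), and it is not code from the series.

```c
#include <string.h>

/*
 * Hypothetical helper (not Git code): if *bol starts with `word`
 * followed by whitespace or the end of the string, advance *bol past
 * the word and return 1; otherwise leave *bol untouched and return 0.
 */
static int match_word(const char *word, const char **bol)
{
	size_t len = strlen(word);
	const char *p = *bol;

	if (!strncmp(p, word, len) &&
	    (p[len] == ' ' || p[len] == '\t' || p[len] == '\n' ||
	     p[len] == '\r' || !p[len])) {
		/* The side effect is a statement of its own, not a condition. */
		*bol = p + len;
		return 1;
	}
	return 0;
}
```

Keeping the assignment out of the `&&` chain means that no reader or analyzer has to decide whether a lone `=` was meant to be `==`.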
@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
*/
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
>
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index 96d2ba726d99..13a42f92387e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
> */
> if (!repo_config_get_value(r,
> "fetch.bundlecreationtoken",
> - &creationTokenStr) &&
> - sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
> - bundles.items[0]->creationToken <= maxCreationToken) {
> - free(bundles.items);
> - return 0;
The original said "if we successfully parsed the configured token and
the maximum creation token in the bundle list is no larger than it, we
are done", which is probably OK, but the problem is that if we were fed
garbage and failed to parse it, we would have smudged maxCreationToken
to some unknown value, and the code path that follows here, which
assumes that maxCreationToken is left as initialized to 0, will be broken.
So the problem is real, but I find the rewritten form a bit hard to
follow. Namely, when sscanf() failed to grab maxCreationToken, we
never compared it with bundles.items[0]->creationToken, which makes
perfect sense to me, but now...
> + &creationTokenStr)) {
> + if (sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
> + maxCreationToken = 0;
> + if (bundles.items[0]->creationToken <= maxCreationToken) {
> + free(bundles.items);
> + return 0;
> + }
... the updated code does use the just-assigned-because-we-failed-to-parse
value 0 in comparison.
I have to wonder if the attached patch is simpler to reason about,
more in line with what the original wanted to do, and more correct?
When we fail to grab the configuration, or when the value we grabbed
from the configuration does not parse, then we reset
maxCreationToken to 0, but otherwise we have a valid
maxCreationToken so use it to see if we should return early.
The cases we do not return early here are either (1) we did not have
usable configured value, in which case maxCreationToken is set to 0
before reaching the loop after this code, or (2) the value of
maxCreationToken we grabbed from the configuration is smaller than
the creationToken in the bundle list, in which case that value is
used when entering the loop.
Thanks.
bundle-uri.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git c/bundle-uri.c w/bundle-uri.c
index f3579e228e..13a43f8e32 100644
--- c/bundle-uri.c
+++ w/bundle-uri.c
@@ -531,11 +531,11 @@ static int fetch_bundles_by_token(struct repository *r,
* is not strictly smaller than the maximum creation token in the
* bundle list, then do not download any bundles.
*/
- if (!repo_config_get_value(r,
- "fetch.bundlecreationtoken",
- &creationTokenStr) &&
- sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
- bundles.items[0]->creationToken <= maxCreationToken) {
+ if (repo_config_get_value(r, "fetch.bundlecreationtoken",
+ &creationTokenStr) ||
+ sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) != 1)
+ maxCreationToken = 0;
+ else if (bundles.items[0]->creationToken <= maxCreationToken) {
free(bundles.items);
return 0;
}
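The control flow of the alternative above can be tried out on its own. This is a sketch under assumptions, not Git code: `can_skip_download()` is a made-up name, the configuration lookup is replaced by a plain string that is `NULL` when the key is unset, and the conversion uses `SCNu64`, the `<inttypes.h>` macro intended for the `scanf()` family, where the patch uses `PRIu64`.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the logic discussed above (not Git code).
 * Returns 1 if the caller may return early. Otherwise *max_token holds
 * either 0 (no usable configured value) or the parsed value.
 */
static int can_skip_download(const char *configured, uint64_t list_token,
			     uint64_t *max_token)
{
	if (!configured ||
	    sscanf(configured, "%" SCNu64, max_token) != 1)
		*max_token = 0; /* do not trust what sscanf() left behind */
	else if (list_token <= *max_token)
		return 1;
	return 0;
}
```

The comparison is reached only when parsing succeeded, which matches the original intent, while a failed parse still leaves the output variable well-defined.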
@@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
if (server_options.nr)
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:40PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> As pointed out by CodeQL, it is a potentially dangerous practice to
> store local variables' addresses in non-local structs. Yet this is
> exactly what happens with the `acked_commits` attribute that is used in
> `cmd_fetch()`: The pointer to a local variable is assigned to it.
>
> Now, it is Git's convention that `cmd_*()` functions are essentially
> only returning just before exiting the process, therefore there is
> little danger that this attribute is used after the code flow returns
> from that function.
I was going to say: the real sin here is using a global variable in the
first place, without which gtransport would not survive outside of
cmd_fetch(). But the issue is even worse than that. The acked_commits
variable is inside a conditional block, so the address is stale for the
rest of cmd_fetch(), too!
It doesn't look like we ever examine it after that, but it's hard to
trace, since it's a global. ;)
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index cda6eaf1fd6e..c1a1434c7096 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
> if (server_options.nr)
> gtransport->server_options = &server_options;
> result = transport_fetch_refs(gtransport, NULL);
> + gtransport->smart_options->acked_commits = NULL;
>
> oidset_iter_init(&acked_commits, &iter);
> while ((oid = oidset_iter_next(&iter)))
Here you unset it within that conditional block, which is the right
spot. Looks good.
-Peff
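The hazard described above, a long-lived struct that borrows the address of a block-scoped variable, can be reduced to a few lines. The names below (`struct options`, `global_options`, `run_once()`) are hypothetical and only mimic the situation; they are not Git's types.

```c
#include <stddef.h>

/* Hypothetical long-lived struct holding a borrowed pointer. */
struct options {
	int *acked; /* valid only while the pointed-to variable is in scope */
};

static struct options global_options;

static int read_acked(const struct options *o)
{
	return o->acked ? *o->acked : -1;
}

static int run_once(int want_acks)
{
	int result = 0;

	if (want_acks) {
		int acked = 3; /* block-scoped: dies at the closing brace */

		global_options.acked = &acked;
		result = read_acked(&global_options);
		/* Clear the borrowed pointer while its target is still alive. */
		global_options.acked = NULL;
	}
	return result;
}
```

After `run_once()` returns, any later reader of `global_options.acked` sees `NULL` rather than a stale stack address.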
@@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
for (i = 0; i < the_repository->index->cache_nr; i++)
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:39PM +0000, Johannes Schindelin via GitGitGadget wrote:
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 66bd91fd523d..fba0dded64a7 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1022,7 +1022,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
> for (i = 0; i < the_repository->index->cache_nr; i++)
> if (ce_intent_to_add(the_repository->index->cache[i]))
> ita_nr++;
> - committable = the_repository->index->cache_nr - ita_nr > 0;
> + committable = the_repository->index->cache_nr > ita_nr;
I guess it is not possible for ita_nr to be greater than cache_nr, since
we are counting up entries in the loop above. If ita_nr were greater,
the original would wrap around and set committable to true, but yours
would not.
So really, I think the original was equivalent to:
committable = cache_nr != ita_nr;
but I think ">" probably expresses the intent better (we want to know if
there are any non-ita entries). Though in that case I'd think:
	committable = 0;
	for (i = 0; i < cache_nr; i++) {
		if (!ce_intent_to_add(...)) {
			committable = 1;
			break;
		}
	}
would be the most clear, since we do not otherwise care about the actual
number of ita entries. And lets us break out of the loop early.
I dunno if it is worth refactoring further, though. Your patch does the
correct thing and fixes the codeql complaint (which I do think is a
false positive, because ita_nr can never exceed cache_nr).
-Peff
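The wrap-around behaviour mentioned above is easy to check in isolation. The two helpers below are hypothetical (they do not exist in Git) and only mirror the two forms of the expression, with plain `unsigned int` parameters standing in for `cache_nr` and `ita_nr`.

```c
/*
 * Hypothetical helpers (not part of Git). Both operands are unsigned,
 * so the subtraction wraps around instead of going negative.
 */
static int committable_by_difference(unsigned int cache_nr,
				     unsigned int ita_nr)
{
	/* True whenever the operands differ, even if ita_nr > cache_nr. */
	return cache_nr - ita_nr > 0;
}

static int committable_by_comparison(unsigned int cache_nr,
				     unsigned int ita_nr)
{
	/* True only if some entries are not intent-to-add entries. */
	return cache_nr > ita_nr;
}
```

The two agree whenever `ita_nr <= cache_nr`, which is the only case the counting loop can produce; they differ only for inputs such as `(1, 2)`, where the difference form wraps around and yields true.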
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff King <peff@peff.net> writes:
> ... there are any non-ita entries). Though in that case I'd think:
>
> committable = 0;
> for (i = 0; i < cache_nr; i++) {
> if (!ce_intent_to_add(...)) {
> committable = 1;
> break;
> }
> }
>
> would be the most clear, since we do not otherwise care about the actual
> number of ita entries. And lets us break out of the loop early.
Exactly. If you focus on the warning too narrowly, the minimal
change in the original patch does look OK, but in the original (even
before Dscho's patch, that is) the intent is unclear, as opposed to
what you showed above. And the update to squelch false positive
does not improve the clarity of the logic as the above rewrite does.
Thanks.
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:37:00PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > ... there are any non-ita entries). Though in that case I'd think:
> >
> > committable = 0;
> > for (i = 0; i < cache_nr; i++) {
> > if (!ce_intent_to_add(...)) {
> > committable = 1;
> > break;
> > }
> > }
> >
> > would be the most clear, since we do not otherwise care about the actual
> > number of ita entries. And lets us break out of the loop early.
>
> Exactly. If you focus on the warning too narrowly, the minimal
> change in the original patch does look OK, but in the original (even
> before Dscho's patch, that is) the intent is unclear, as opposed to
> what you showed above. And the update to squelch false positive
> does not improve the clarity of the logic as the above rewrite does.
OK. If we do want to refactor, I think pulling it into a separate
function is the most descriptive, like:
diff --git a/builtin/commit.c b/builtin/commit.c
index 66bd91fd52..a8d43d223d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -740,6 +740,15 @@ static void change_data_free(void *util, const char *str UNUSED)
free(d);
}
+static int has_non_ita_entries(struct index_state *index)
+{
+ int i;
+ for (i = 0; i < index->cache_nr; i++)
+ if (!ce_intent_to_add(index->cache[i]))
+ return 1;
+ return 0;
+}
+
static int prepare_to_commit(const char *index_file, const char *prefix,
struct commit *current_head,
struct wt_status *s,
@@ -1015,14 +1024,10 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
parent = "HEAD^1";
if (repo_get_oid(the_repository, parent, &oid)) {
- int i, ita_nr = 0;
-
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(the_repository->index);
- for (i = 0; i < the_repository->index->cache_nr; i++)
- if (ce_intent_to_add(the_repository->index->cache[i]))
- ita_nr++;
- committable = the_repository->index->cache_nr - ita_nr > 0;
+ committable =
+ has_non_ita_entries(the_repository->index);
} else {
/*
* Unless the user did explicitly request a submodule
-Peff
@@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
const struct commit_graph_opts *opts)
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:41PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> We do need a context to write the commit graph, but that context is only
> needed during the life time of `commit_graph_write()`, therefore it can
> easily be a stack variable.
Yay. I am in favor of using stack variables when possible as a general
rule.
> diff --git a/commit-graph.c b/commit-graph.c
> index 6394752b0b08..9f0115dac9b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
> const struct commit_graph_opts *opts)
> {
> struct repository *r = the_repository;
> - struct write_commit_graph_context *ctx;
> + struct write_commit_graph_context ctx = {
> + .r = r,
> + .odb = odb,
> + .append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
> + .report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
> + .split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
> + .opts = opts,
> + .total_bloom_filter_data_size = 0,
> + .write_generation_data = (get_configured_generation_version(r) == 2),
> + .num_generation_data_overflows = 0,
> + };
> uint32_t i;
> int res = 0;
> int replace = 0;
> @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
> return 0;
> }
>
> - CALLOC_ARRAY(ctx, 1);
> - ctx->r = r;
> - ctx->odb = odb;
> - ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
> - ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
> - ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
> - ctx->opts = opts;
> - ctx->total_bloom_filter_data_size = 0;
> - ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> - ctx->num_generation_data_overflows = 0;
OK, this moves the initialization to the top of the function. So to
review this for correctness, we must make sure that we do not change the
values of any of those variables between the two spots (i.e., in the
diff context that is omitted).
Most of it looks fine. Our call to get_configured_generation_version()
now happens earlier, before the call to prepare_repo_settings(). I think
that is OK, because the former calls repo_config_get_int() directly. It
does seem like a potential maintenance problem if that call is ever
rolled into prepare_repo_settings().
So maybe OK, but the smaller change would be to just replace the calloc
with a memset(), and s/->/./ on the subsequent lines.
-Peff
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff King <peff@peff.net> writes:
> So maybe OK, but the smaller change would be to just replace the calloc
> with a memset(), and s/->/./ on the subsequent lines.
True, and it would be a bit easier to merge with other topics in
flight. The .oid member and parameter are both renamed IIUC.
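The two ways of moving the context onto the stack that are weighed above can be compared side by side. Everything in this sketch is hypothetical: `struct write_ctx` is a tiny stand-in for the real context struct, and `FLAG_APPEND` and `FLAG_SPLIT` are made-up flag bits.

```c
#include <string.h>

/* Hypothetical context struct, far smaller than the real one. */
struct write_ctx {
	unsigned append : 1;
	unsigned split : 1;
	int total;
};

#define FLAG_APPEND (1u << 0)
#define FLAG_SPLIT  (1u << 1)

/* Variant 1: designated initializer at the point of declaration. */
static struct write_ctx make_ctx_initializer(unsigned flags)
{
	struct write_ctx ctx = {
		.append = flags & FLAG_APPEND ? 1 : 0,
		.split = flags & FLAG_SPLIT ? 1 : 0,
	};
	return ctx;
}

/* Variant 2: memset() where the allocation used to be, "." for "->". */
static struct write_ctx make_ctx_memset(unsigned flags)
{
	struct write_ctx ctx;

	memset(&ctx, 0, sizeof(ctx));
	ctx.append = flags & FLAG_APPEND ? 1 : 0;
	ctx.split = flags & FLAG_SPLIT ? 1 : 0;
	return ctx;
}
```

Both variants produce the same field values. The difference is where the initializing expressions are evaluated: the first runs them at the top of the function, the second at the original spot, which is why the second is the smaller change when the order of surrounding calls matters.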
@@ -1780,16 +1780,16 @@ static void send_shallow_info(struct upload_pack_data *data)
packet_delim(1);
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:42PM +0000, Johannes Schindelin via GitGitGadget wrote:
> While 3145ea957d (upload-pack: introduce fetch server command,
> 2018-03-15) added support for the `fetch` command, from the server's
> point of view it is an upload, and hence the `enum` should really be
> called `upload_state` instead of `fetch_state`. Likewise, rename its
> values.
>
> This also helps unconfuse CodeQL which would otherwise be at sixes or
> sevens about having _two_ non-local definitions of the same `enum` with
> the same values.
It unconfuses me, too. Nice change.
-Peff
@@ -1117,48 +1117,19 @@ static int has_dir_name(struct index_state *istate,
*
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:43PM +0000, Johannes Schindelin via GitGitGadget wrote:
> One thing that might be non-obvious to readers (or to analyzers like
> CodeQL) is that the function essentially does nothing when the Git index
> is empty, and in particular that it does not look at the value of
> `len_eq_last` (which would be uninitialized at that point).
>
> Let's make this much easier to understand, by returning early if the Git
> index is empty, and by avoiding empty `else` blocks.
OK, so we return early, skipping not only what's in the conditional
that you're touching here, but also the "for(;;)" loop below.
And in that one, we'll look for the next slash (and break if none).
We'll check the name up to that stage via index_name_stage_pos(). And
obviously that will not find a match if there are no index entries. So
we'd do nothing and loop again, looking for the next slash, until we
eventually hit the end.
So yeah, I agree if there are no index entries we can bail immediately.
> This commit changes indentation and is hence best viewed using
> `--ignore-space-change`.
Yeah. I was puzzled at first by the amount of dropped code, but they are
all comments that say "fall through to the code below".
So I think the change here is correct. We are losing some comments that
could be helpful, but I'm not familiar enough with that code to say
whether they would be. Just reading what you've left makes sense to me
on its own.
-Peff
@@ -1728,7 +1728,7 @@ static int do_fetch(struct transport *transport,
if (transport->remote->follow_remote_head != FOLLOW_REMOTE_NEVER)
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:44PM +0000, Johannes Schindelin via GitGitGadget wrote:
> As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
> case `branch_has_merge_config()` would return early, but we can even
> avoid enumerating the refs prefixes in that case, saving even more CPU
> cycles.
I am not sure how this patch changes anything with respect to CPU. If
branch is NULL, then branch_has_merge_config(branch) will always return
false.
I think this is just an issue that CodeQL is not looking inside
branch_has_merge_config(), and thus does not realize we will never hit
the rest of the short-circuit conditional (let alone the body) in that
case?
Still may be worth dealing with, but it makes me a little sad to have to
add an extra redundant check (one place is not a big deal, but as a
general pattern).
-Peff
@@ -214,7 +214,7 @@ void exclude_cmds(struct cmdnames *cmds, struct cmdnames *excludes)
else if (cmp == 0) {
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:45PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
> `if (i <= 0) ... else ...`, the latter is vastly easier to read because
> it avoids writing out a condition that is unnecessary. Let's drop such
> unnecessary conditions.
>
> Pointed out by CodeQL.
Yeah, I'd agree that it is easier (otherwise, you are left wondering if
there is an "else" case you are missing).
> help.c | 2 +-
> transport-helper.c | 2 +-
Both spots look good to me.
-Peff
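For a three-way comparison, the pattern under discussion looks roughly like the following. `describe_order()` is a hypothetical example and not taken from `help.c` or `transport-helper.c`.

```c
#include <string.h>

/*
 * Hypothetical example (not Git code). Once "< 0" and "== 0" have been
 * ruled out, only "> 0" remains, so the last branch is a plain "else".
 */
static const char *describe_order(const char *a, const char *b)
{
	int cmp = strcmp(a, b);

	if (cmp < 0)
		return "before";
	else if (cmp == 0)
		return "equal";
	else
		return "after";
}
```

With a final `else if (cmp > 0)` instead, a reader (and possibly a compiler) would have to work out that no fourth case can fall off the end of the function.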
@@ -102,25 +102,11 @@ void tr2_update_final_timers(void)
struct tr2_timer *t_final = &final_timer_block.timer[tid];
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:46PM +0000, Johannes Schindelin via GitGitGadget wrote:
> CodeQL reports empty `if` blocks that only contain a comment as "futile
> conditional". The comment talks about potential plans to turn this into
> a warning, but that seems not to have been necessary. Replace the entire
> construct with a concise comment.
OK...
> - if (t->recursion_count) {
> - /*
> - * The current thread is exiting with
> - * timer[tid] still running.
> - *
> - * Technically, this is a bug, but I'm going
> - * to ignore it.
> - *
> - * I don't think it is worth calling die()
> - * for. I don't think it is worth killing the
> - * process for this bookkeeping error. We
> - * might want to call warning(), but I'm going
> - * to wait on that.
> - *
> - * The downside here is that total_ns won't
> - * include the current open interval (now -
> - * start_ns). I can live with that.
> - */
> - }
> + /*
> + * `t->recursion_count` could technically be non-zero, which
> + * would constitute a bug. Reporting the bug would potentially
> + * cause an infinite recursion, though, so let's ignore it.
> + */
The original doesn't talk about infinite recursion at all, though I can
well believe that would be the case, having run into trace->die->trace
types of bugs before. Did you trace out the actual path of recursion? If
so, it might be worth summarizing it.
Obviously the code change itself cannot hurt anything, as it was a noop.
-Peff
free(ctx.graph_name);
free(ctx.base_graph_name);
free(ctx.commits.list);
oid_array_clear(&ctx.oids);
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:47PM +0000, Johannes Schindelin via GitGitGadget wrote:
> The code is a bit too hard to reason about to fully assess whether the
> `fill_commit_graph_info()` function is called at all after
> `write_commit_graph()` returns (and hence the stack variable
> `topo_levels` goes out of context).
>
> Let's simply make sure that the stack address is no longer used at that
> stage, thereby making the code quite a bit easier to reason about.
Yep, I think this is a good practice in general. If the topo_levels
member is never used outside of writing, I wonder if it could live in a
separate data structure. But that is a much bigger refactor that I don't
think we need to tackle here.
> diff --git a/commit-graph.c b/commit-graph.c
> index 9f0115dac9b5..d052c1bf15c5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2683,6 +2683,15 @@ cleanup:
> oid_array_clear(&ctx.oids);
> clear_topo_level_slab(&topo_levels);
>
> + if (ctx.r->objects->commit_graph) {
> + struct commit_graph *g = ctx.r->objects->commit_graph;
> +
> + while (g) {
> + g->topo_levels = NULL;
> + g = g->base_graph;
> + }
> + }
This just clears the pointers to the local variable. Looks good.
-Peff
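The cleanup step reviewed above amounts to walking a chain and dropping a borrowed pointer from every node. A minimal sketch, with a hypothetical `struct layer` standing in for the chain of commit graph layers:

```c
#include <stddef.h>

/* Hypothetical node type standing in for a chain of graph layers. */
struct layer {
	int *levels;        /* borrowed pointer into the writer's stack frame */
	struct layer *base; /* next layer in the chain, or NULL */
};

/* Drop every borrowed pointer so that none can outlive its target. */
static void clear_borrowed_levels(struct layer *top)
{
	struct layer *g;

	for (g = top; g; g = g->base)
		g->levels = NULL;
}
```

In this sketch the loop condition already covers an empty chain, so no separate `NULL` check is needed before walking it.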
@@ -532,11 +532,13 @@ static int fetch_bundles_by_token(struct repository *r,
*/
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:48PM +0000, Johannes Schindelin via GitGitGadget wrote:
> In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
> code was introduced that assumes that an `sscanf()` call leaves its
> output variables unchanged unless the return value indicates success.
>
> However, the POSIX documentation makes no such guarantee:
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
>
> So let's make sure that the output variable `maxCreationToken` is
> always well-defined.
Definitely an issue, but...why are we using sscanf() at all?
Wouldn't strtoul() be the usual thing in our code base? Or even just
repo_config_get_ulong()? The behavior of the latter would differ in that
we'd complain about a garbage value in fetch.bundlecreationtoken, but
wouldn't that be a good thing?
-Peff
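As a rough illustration of the `strto*()`-style parsing suggested above, here is a sketch built on the standard `strtoumax()`. The helper name `parse_token()` is an assumption made for this example; it is not Git's API, and Git's own config helpers may behave differently (for instance, by reporting an error to the user).

```c
#include <errno.h>
#include <inttypes.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not Git's API): parse a decimal uint64_t.
 * Returns 0 and stores the value in *out on success; returns -1 and
 * leaves *out untouched on any failure.
 */
static int parse_token(const char *s, uint64_t *out)
{
	char *end;
	uintmax_t v;

	if (!s || *s < '0' || *s > '9')
		return -1; /* rejects "", signs, and leading whitespace */
	errno = 0;
	v = strtoumax(s, &end, 10);
	if (errno == ERANGE || *end != '\0' || v > UINT64_MAX)
		return -1;
	*out = (uint64_t)v;
	return 0;
}
```

Unlike the `sscanf()` call, this form reports trailing garbage and out-of-range values explicitly, and the output variable is written only on success.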
@@ -2600,9 +2600,12 @@ static int is_command(enum todo_command command, const char **bol)
const char nick = todo_command_info[command].c;
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:49PM +0000, Johannes Schindelin via GitGitGadget wrote:
> Let's help all of this by turning this into the more verbose, more
> readable form of an `if` construct that both assigns the pointer as well
> as returns 1 if all of the conditions hold true.
I see Junio already reviewed this one, but just so I can say I read all
of the patches: yes, I also agree the result is easier to understand. :)
-Peff
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:
> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
> are addressed in this patch series.
>
> Johannes Schindelin (11):
> commit: simplify code
> fetch: carefully clear local variable's address after use
> commit-graph: avoid malloc'ing a local variable
> upload-pack: rename `enum` to reflect the operation
> has_dir_name(): make code more obvious
> fetch: avoid unnecessary work when there is no current branch
> Avoid redundant conditions
> trace2: avoid "futile conditional"
> commit-graph: avoid using stale stack addresses
> bundle-uri: avoid using undefined output of `sscanf()`
> sequencer: stop pretending that an assignment is a condition
I read through all of these and didn't find anything incorrect. I did
leave a few comments that might or might not be worth following up on.
Thanks for fixing these.
-Peff
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jeff King <peff@peff.net> writes:
> On Thu, May 15, 2025 at 01:11:38PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
>> CodeQL [https://codeql.github.com/] pointed out a couple of issues, which
>> are addressed in this patch series.
>>
>> Johannes Schindelin (11):
>> commit: simplify code
>> fetch: carefully clear local variable's address after use
>> commit-graph: avoid malloc'ing a local variable
>> upload-pack: rename `enum` to reflect the operation
>> has_dir_name(): make code more obvious
>> fetch: avoid unnecessary work when there is no current branch
>> Avoid redundant conditions
>> trace2: avoid "futile conditional"
>> commit-graph: avoid using stale stack addresses
>> bundle-uri: avoid using undefined output of `sscanf()`
>> sequencer: stop pretending that an assignment is a condition
>
> I read through all of these and didn't find anything incorrect. I did
> leave a few comments that might or might not be worth following up on.
> Thanks for fixing these.
Yup, I also looked at them and didn't see any incorrect updates.
This patch series was integrated into seen via git@6fbd4fe.
CodeQL pointed out a couple of issues, which are addressed in this patch series.
cc: Jeff King peff@peff.net