Initial draft defining syntax, semantics of controlling expressions #65
We describe a subset of the C constant-expression syntax for use in controlling expressions. Expression evaluation itself follows Fortran arithmetic expression semantics. Note that the tables are a bit terse as we try to keep the line length less than the 75-character limit for J3 papers.
This is pretty rough, but I wanted to produce something earlier rather than later. I have to go off for a day or so and work on other assignments for classes.
Co-authored-by: Patrick Fasano <patrick@patrickfasano.com>
Thanks @gklimowicz for making a start on this tricky area!
Initial set of feedback:
instances of ID or ID (args) will all have been replaced with their
expansions.
This last sentence falsely implies there will be no instances of ID after expansion. That is misleading: identifiers that survive expansion are actually quite common, with code like:
#if __GNUC__
which is shorthand to test whether __GNUC__ is defined to a non-zero value.
This works because of 6.10.2-13 (emphasis added):
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and evaluations of defined macro expressions, has_include expressions, has_embed expressions, and has_c_attribute expressions have been performed, all remaining identifiers other than true (including those lexically identical to keywords such as false) are replaced with the pp-number 0, true is replaced with pp-number 1, and then each preprocessing token is converted into a token.
We'll need similar rules (ignoring the C23 features we are not keeping) to explain the replacement of any ID with 0 after expansion.
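A minimal sketch of such a rule, written in Python purely for illustration. The function name and the token-list representation are assumptions of this sketch, not part of the draft, and the C23 special case for true is deliberately left out:

```python
import re

def replace_remaining_identifiers(tokens):
    """Replace every identifier still present after macro expansion with "0".

    Hypothetical helper: tokens are plain strings, and an identifier is
    anything matching the usual letter/underscore/digit pattern.
    """
    return ["0" if re.fullmatch(r"[A-Za-z_]\w*", t) else t for t in tokens]

# `#if __GNUC__` where __GNUC__ is not defined: the identifier survives
# expansion and is then replaced with 0, so the condition is false.
print(replace_remaining_identifiers(["__GNUC__"]))
print(replace_remaining_identifiers(["__GNUC__", ">=", "4"]))
```

Under this rule an undefined identifier is never an error in a controlling expression; it simply evaluates as 0.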
| ID | The expansion of the object-like macro ID |
| ID (args) | The expansion of the function-like macro ID |
I don't understand why ID and ID(args) are listed as primary expressions here. The paragraph immediately above has just explained that macros have already been expanded away before conditional expressions are evaluated, so macro invocations are NOT primary expressions in the post-expansion expression grammar.
Listing them here "for completeness" is not helpful; it's just plain wrong. No conditional expression evaluation whatsoever is performed until after macros are completely expanded, and the pre-expansion text may look nothing like a valid conditional expression.
Here is a valid input example demonstrating what I mean:
#define LPAREN (
#define RPAREN )
#define ONE_PLUS 1 +
#if ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN
integer :: tada
#endif
Pre-expansion, the list of tokens in the expression above looks like:
#if ID ID * ID ID WHOLE_NUMBER ID
post-expansion it looks like this:
#if WHOLE_NUMBER + WHOLE_NUMBER * ( WHOLE_NUMBER + WHOLE_NUMBER )
So wildly different that it's not useful to talk about grammar of the conditional expressions prior to expansion (aside from the bare minimum required to delineate arguments in FLM invocations).
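The two token listings above can be reproduced with a small Python sketch. Everything here is an illustrative assumption: the helper names, the whitespace-separated token representation, and the single-pass expansion of object-like macros with no rescanning, which happens to be enough for this example.

```python
import re

# Object-like macros from the example; ZERO is deliberately left undefined.
MACROS = {"LPAREN": ["("], "RPAREN": [")"], "ONE_PLUS": ["1", "+"]}

def is_identifier(tok):
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

def classify(tokens):
    """Map each token to its grammar category: ID, WHOLE_NUMBER, or itself."""
    out = []
    for tok in tokens:
        if is_identifier(tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("WHOLE_NUMBER")
        else:
            out.append(tok)
    return " ".join(out)

def expand(tokens, macros):
    """Single-pass expansion of object-like macros (no rescan)."""
    out = []
    for tok in tokens:
        out.extend(macros.get(tok, [tok]))
    return out

def zero_undefined(tokens):
    """Replace identifiers that survived expansion with 0."""
    return ["0" if is_identifier(t) else t for t in tokens]

pre = "ONE_PLUS ZERO * LPAREN ONE_PLUS 4 RPAREN".split()
post = zero_undefined(expand(pre, MACROS))

print(classify(pre))   # ID ID * ID ID WHOLE_NUMBER ID
print(" ".join(post))  # 1 + 0 * ( 1 + 4 )
print(classify(post))  # WHOLE_NUMBER + WHOLE_NUMBER * ( WHOLE_NUMBER + WHOLE_NUMBER )
```

Only the last line has a shape that an expression grammar can parse, which is the point of the example.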
Let's only describe the post-expansion grammar, and not the pre-expansion grammar. There is no such pre-expansion grammar. Dan points out that undefined ID replacement has to be done after processing of ## tokens.
| | defined | defined ID | nonassoc | 1 if the identifier |
| | | | | has a #defined value, |
| | | | | 0 otherwise |
On further thought, this is just plain wrong.
defined cannot appear in this table, because it needs to be applied AFTER macro expansion and BEFORE ID replacement with zero. Hence the defined operator (as in CPP) must be resolved and replaced before this post-expansion grammar is applied.
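A rough Python sketch of why the ordering matters. The helper names and the token-list representation are hypothetical, macro expansion itself is omitted, and the operand of defined is assumed not to be macro-expanded, as in CPP:

```python
import re

def is_identifier(tok):
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

def resolve_defined(tokens, macros):
    """Replace `defined ID` and `defined ( ID )` with 1 or 0."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != "defined":
            out.append(tokens[i])
            i += 1
            continue
        rest = tokens[i + 1:i + 4]
        if (len(rest) == 3 and rest[0] == "("
                and is_identifier(rest[1]) and rest[2] == ")"):
            name, i = rest[1], i + 4
        elif rest and is_identifier(rest[0]):
            name, i = rest[0], i + 2
        else:
            raise ValueError("malformed use of defined")
        out.append("1" if name in macros else "0")
    return out

def zero_undefined(tokens):
    """Replace every remaining identifier with 0."""
    return ["0" if is_identifier(t) else t for t in tokens]

macros = {"FOO": []}  # FOO is defined (to nothing); BAR and BAZ are not
tokens = "defined FOO && defined ( BAR ) || BAZ".split()

# Intended order: resolve defined first, then zero the leftover identifiers.
print(" ".join(zero_undefined(resolve_defined(tokens, macros))))  # 1 && 0 || 0

# Zeroing identifiers first destroys both `defined` and its operands.
print(" ".join(zero_undefined(tokens)))  # 0 0 && 0 ( 0 ) || 0
```

By the time the post-expansion grammar is applied, no defined token should remain, so it does not need a row in the operator table.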
| ID | The expansion of the object-like macro ID |
| ID (args) | The expansion of the function-like macro ID |
| WHOLE_NUMBER | Decimal value of WHOLE_NUMBER |
| ( expr ) | Parenthesized expressions |
Parenthesized expressions are listed in the operator table below, so listing them here is redundant.
After macro expansion and ID-replacement, I believe the only "primaries" left in valid conditional expressions should be WHOLE_NUMBER, and the operators in the table below combining them (which includes defined as an operator).
In short, I suggest we delete this "primary table" entirely and replace it with a statement to that effect.
| Prec | Op | Syntax | Assoc'y | Evaluation Semantics |
|------+---------+--------------+----------+----------------------------|
| low | ? : | e1 ? e2 : e3 | right | conditional-expr |
I'll note in passing that CPP also allows the comma operator in conditional expressions (at lower priority than conditional-expression), although it's pretty pointless in preprocessor expressions and I'm not aware of any compelling use cases.
For this reason we (implicitly) omitted it from the requirements doc in 25-114r2. I'm only raising it now in case someone has a compelling argument to include it (something other than strict compatibility with CPP), otherwise I'm fine dropping it.
I left them out on purpose, but didn't have a strong reason to do so.
I may be missing a subtlety here.
In general, conditional expression evaluation is side-effect free. So, elaborating
#if (my_complicated_expression, my_other_expression)
results in only the value of my_other_expression affecting the #if. my_complicated_expression may be evaluated, but its result is thrown away.
There's also the slippery slope in the C grammar, where the CPP conditional expressions start at conditional-expression, which I think can unfold down to primary-expression, which then includes the ( expression ) which brings in the whole barnyard of comma-expressions and assignment-expression.
So I just chopped those rules out of the grammar.
I'd like to take credit but you pinged the wrong person :)
Co-authored-by: Dan Bonachea <dobonachea@lbl.gov>
Co-authored-by: Patrick Fasano <patrick@patrickfasano.com>
Co-authored-by: Dan Bonachea <dobonachea@lbl.gov>
| | ¦¦ | e1 || e2 | left | Fortran .OR. |
|------+---------+--------------+----------+----------------------------|
| | && | e1 && e2 | left | Fortran .AND. |
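In the 5/12 call we resolved that these should use short-circuit evaluation as in CPP, to allow things like:
#if x && 1/x
#endif
which means they are NOT simply Fortran .OR. / .AND., since the right operand must not be evaluated when the left operand already determines the result.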
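A small Python sketch of the short-circuit behavior being asked for. The function names are hypothetical, the right operand is passed as a zero-argument callable so that it is only evaluated on demand, and Python's // stands in for integer division (it matches truncating division for the non-negative values used here):

```python
def logical_and(lhs, rhs):
    """Short-circuit &&: rhs() is called only when lhs is nonzero."""
    if lhs == 0:
        return 0
    return 1 if rhs() != 0 else 0

def logical_or(lhs, rhs):
    """Short-circuit ||: rhs() is called only when lhs is zero."""
    if lhs != 0:
        return 1
    return 1 if rhs() != 0 else 0

def check(x):
    # Models `#if x && 1/x`: the division is skipped when x is 0.
    return logical_and(x, lambda: 1 // x)

print(check(0))  # 0, and no division by zero occurs
print(check(1))  # 1
print(check(2))  # 0, because 1/2 truncates to 0
```

An evaluator that computes both operands before combining them would fail on check(0), which is the case the short-circuit rule is meant to permit.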
Co-authored-by: Dan Bonachea <dobonachea@lbl.gov>
There remains other work to do here, but let's make sure this change is in. Co-authored-by: Dan Bonachea <dobonachea@lbl.gov>