lake: incremental updates 0531 #22968
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Code Review
This pull request introduces comprehensive documentation for Amazon SQS (S3) integration, including new guides for the SQS (S3) IAM Role data source and integration tasks, as well as a detailed guide on Data Protection Policies (masking and row access policies). It also updates several SQL reference files and guides to clarify regular expression pattern matching for staged files. The review feedback highlights opportunities to better align with the style guide by rewriting passive voice sentences into the active voice, addressing the user in the second person ('you'), applying sentence case to headings, and consistently formatting function names, keywords, and paths with backticks.
|
|
||
| # Amazon SQS (S3) - IAM Role | ||
|
|
||
| This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS. |
Rewrite to avoid passive voice ("is used for", "delivered from") and write in the second person ("you") as recommended by the style guide.
| This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS. | |
| This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and helps you consume S3 object creation events that Amazon S3 delivers to SQS. |
References
- Avoid passive voice and write in second person. (link)
|
|
||
| This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS. | ||
|
|
||
| `Amazon SQS (S3) - IAM Role` only stores the connection and authorization information required for SQS (S3) ingestion. It does not consume messages by itself. The actual process of reading SQS messages, parsing S3 ObjectCreated events, and writing data into {{{ .lake }}} is performed by an [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md). |
Avoid passive voice ("is performed by") and use backticks for the event name ObjectCreated to maintain consistency and adhere to the style guide.
| `Amazon SQS (S3) - IAM Role` only stores the connection and authorization information required for SQS (S3) ingestion. It does not consume messages by itself. The actual process of reading SQS messages, parsing S3 ObjectCreated events, and writing data into {{{ .lake }}} is performed by an [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md). | |
| An [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) performs the actual process of reading SQS messages, parsing S3 `ObjectCreated` events, and writing data into {{{ .lake }}}. |
References
- Avoid passive voice overuse and use backticks for code snippets/event names. (link)
| > | ||
| > SQS (S3) ingestion uses the AssumeRole model. You do not need to provide AWS Access Key or Secret Key to {{{ .lake }}}. Instead, create an IAM Role in your AWS account and allow {{{ .lake }}} platform roles to obtain temporary credentials through `sts:AssumeRole` in the role trust policy. | ||
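As an illustration of the AssumeRole model described in the note above, the following is a minimal sketch of what a role trust policy could look like when built in Python. The principal ARN and the function name are hypothetical placeholders, not values from the documentation under review; the real platform role ARNs would come from the product itself.

```python
import json


def build_trust_policy(platform_role_arns):
    """Return an IAM trust policy that lets the given principals assume the role.

    The ARNs passed in are assumptions for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Principals allowed to request temporary credentials.
                "Principal": {"AWS": list(platform_role_arns)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical placeholder ARN; replace with the platform roles you are given.
trust_policy = build_trust_policy(
    ["arn:aws:iam::111122223333:role/example-lake-platform-role"]
)
print(json.dumps(trust_policy, indent=2))
```

No access key or secret key appears anywhere in the policy, which matches the point of the note: the trust relationship alone lets the platform obtain temporary credentials.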
|
|
||
| ## AWS-Side Configuration Overview |
Use sentence case for headings.
| ## AWS-Side Configuration Overview | |
| ## AWS-side configuration overview |
References
- Use sentence case for headings. (link)
| - SQS permissions are scoped to the target queue ARN. | ||
| - S3 permissions are scoped to the target bucket and object ARN. | ||
| - By default, this policy does not require S3 write or delete permissions. | ||
| - If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, meaning source objects are deleted after successful ingestion, grant `s3:DeleteObject` on the target object path. |
Avoid passive voice ("are deleted") by rewriting to active voice.
| - If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, meaning source objects are deleted after successful ingestion, grant `s3:DeleteObject` on the target object path. | |
| - If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, which deletes source objects after successful ingestion, grant `s3:DeleteObject` on the target object path. |
References
- Avoid passive voice overuse. (link)
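The scoping rules quoted in this thread (SQS permissions limited to the queue ARN, S3 permissions limited to the bucket and object ARN, and `s3:DeleteObject` only when cleanup is enabled) can be sketched as a small policy builder. This is an assumption-laden sketch: the exact action lists are not stated in the quoted lines, so the SQS and S3 read actions below are plausible choices rather than the documented policy, and all names and ARNs are hypothetical.

```python
import json


def build_permissions_policy(queue_arn, bucket, prefix="", allow_delete=False):
    """Build an IAM permissions policy scoped to one queue and one bucket path.

    The action lists are assumptions; only s3:DeleteObject is named in the
    reviewed text, and it is added only when allow_delete is True.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}*"

    object_actions = ["s3:GetObject"]
    if allow_delete:
        # Needed only if the task deletes source objects after ingestion.
        object_actions.append("s3:DeleteObject")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumeQueue",
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "AccessObjects",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": object_arn,
            },
        ],
    }


# Hypothetical queue and bucket names, for illustration only.
policy = build_permissions_policy(
    "arn:aws:sqs:us-east-1:111122223333:example-queue",
    "example-bucket",
    prefix="landing/",
)
print(json.dumps(policy, indent=2))
```

Calling the builder with `allow_delete=True` adds `s3:DeleteObject` to the object statement and leaves the other statements unchanged, which mirrors the default-read-only behavior described in the bullets.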
|
|
||
| Both policies are transparent to applications: no code changes, no extra views, no data duplication. | ||
|
|
||
| ## Choosing the Right Policy |
Use sentence case for headings.
| ## Choosing the Right Policy | |
| ## Choosing the right policy |
References
- Use sentence case for headings. (link)
| > **Note:** | ||
| > | ||
| > When changing storage configuration, existing history tables will be dropped and recreated. |
Avoid passive voice ("will be dropped and recreated") by rewriting to active voice.
| > **Note:** | |
| > | |
| > When changing storage configuration, existing history tables will be dropped and recreated. | |
| > **Note:** | |
| > | |
| > When you change the storage configuration, {{{ .lake }}} drops and recreates existing history tables. |
References
- Avoid passive voice overuse. (link)
| ## Runtime Behavior by Task Type | ||
|
|
||
| - S3 tasks can run once or continuously poll for new files. | ||
| - SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until manually stopped. |
Avoid passive voice ("manually stopped") and write in the second person ("you").
| - SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until manually stopped. | |
| - SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until you stop them manually. |
References
- Avoid passive voice and write in second person. (link)
|
|
||
| - Amazon S3 tasks are designed for file import scenarios and mainly focus on file path patterns, file formats, and ingestion behavior. | ||
| - S3 tasks are designed for file import scenarios and mainly focus on file path patterns, file formats, and ingestion behavior. | ||
| - SQS (S3) tasks are designed for S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table. |
Avoid passive voice ("are designed for") by using active voice ("enable").
| - SQS (S3) tasks are designed for S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table. | |
| - SQS (S3) tasks enable S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table. |
References
- Avoid passive voice overuse. (link)
| | Privilege | Description | | ||
| |:----------|:------------| | ||
| | CREATE MASKING POLICY | Required to create or replace a masking policy. Typically granted on `*.*`. | | ||
| | CREATE MASKING POLICY | Required to create a masking policy. Typically granted on `*.*`. | |
Avoid passive voice ("Typically granted on") and write in the second person ("you").
| | CREATE MASKING POLICY | Required to create a masking policy. Typically granted on `*.*`. | | |
| | CREATE MASKING POLICY | Required to create a masking policy. You typically grant this privilege on `*.*`. | |
References
- Avoid passive voice and write in second person. (link)
| | [FeiShuBot](/tidb-cloud-lake/guides/feishubot.md) | Stores a FeiShu bot webhook and message template for task failure notifications and similar scenarios. | | ||
|
|
||
| Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `AWS - Credentials` and `MySQL - Credentials` are referenced by actual data import or synchronization tasks. | ||
| Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials` are referenced by actual import, synchronization, or event-consuming tasks. |
Avoid passive voice ("is used for", "are referenced by") and write in the second person ("you").
| Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials` are referenced by actual import, synchronization, or event-consuming tasks. | |
| Not every data source corresponds to an integration task. For example, you use `FeiShuBot` for notification configuration, while actual import, synchronization, or event-consuming tasks reference `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials`. |
References
- Avoid passive voice and write in second person. (link)
Replace instances of 'AWS - Credentials' with 'Amazon S3 - Credentials' in tidb-cloud-lake/guides/integrate-with-amazon-s3.md to clarify the credential type; updates link text and data source field descriptions.
|
@lilin90: The following test failed, say
What is changed, added or deleted? (Required)
Incremental updates till 05/31/2026
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?