
lake: incremental updates 0531 #22968

Open
lilin90 wants to merge 2 commits into pingcap:feature/preview-cloud-lake from lilin90:update-0531

Conversation


@lilin90 lilin90 commented May 29, 2026

What is changed, added or deleted? (Required)

Incremental updates till 05/31/2026

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@lilin90 lilin90 self-assigned this May 29, 2026
@lilin90 lilin90 added the translation/no-need and area/tidb-cloud labels May 29, 2026
@ti-chi-bot ti-chi-bot Bot added the size/XXL label May 29, 2026

ti-chi-bot Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from lilin90. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces comprehensive documentation for Amazon SQS (S3) integration, including new guides for the SQS (S3) IAM Role data source and integration tasks, as well as a detailed guide on Data Protection Policies (masking and row access policies). It also updates several SQL reference files and guides to clarify regular expression pattern matching for staged files. The review feedback highlights opportunities to better align with the style guide by rewriting passive voice sentences into the active voice, addressing the user in the second person ('you'), applying sentence case to headings, and consistently formatting function names, keywords, and paths with backticks.


# Amazon SQS (S3) - IAM Role

This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS.

low

Rewrite to avoid passive voice ("is used for", "delivered from") and write in the second person ("you") as recommended by the style guide.

Suggested change
This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS.
This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and helps you consume S3 object creation events that Amazon S3 delivers to SQS.
References
  1. Avoid passive voice and write in second person. (link)


This page describes how to create an `Amazon SQS (S3) - IAM Role` data source. This data source stores the configuration required to access an Amazon SQS queue and the corresponding S3 bucket, and is used for consuming S3 object creation events delivered from Amazon S3 to SQS.

`Amazon SQS (S3) - IAM Role` only stores the connection and authorization information required for SQS (S3) ingestion. It does not consume messages by itself. The actual process of reading SQS messages, parsing S3 ObjectCreated events, and writing data into {{{ .lake }}} is performed by an [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md).

low

Avoid passive voice ("is performed by") and use backticks for the event name ObjectCreated to maintain consistency and adhere to the style guide.

Suggested change
`Amazon SQS (S3) - IAM Role` only stores the connection and authorization information required for SQS (S3) ingestion. It does not consume messages by itself. The actual process of reading SQS messages, parsing S3 ObjectCreated events, and writing data into {{{ .lake }}} is performed by an [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md).
An [Amazon SQS (S3) Integration Task](/tidb-cloud-lake/guides/integrate-with-amazon-sqs-s3.md) performs the actual process of reading SQS messages, parsing S3 `ObjectCreated` events, and writing data into {{{ .lake }}}.
References
  1. Avoid passive voice overuse and use backticks for code snippets/event names. (link)

>
> SQS (S3) ingestion uses the AssumeRole model. You do not need to provide AWS Access Key or Secret Key to {{{ .lake }}}. Instead, create an IAM Role in your AWS account and allow {{{ .lake }}} platform roles to obtain temporary credentials through `sts:AssumeRole` in the role trust policy.
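
The AssumeRole model in the note above can be illustrated with a minimal sketch of a role trust policy, built here as a plain Python dictionary and printed as JSON. The helper name `build_trust_policy` and the platform role ARN are hypothetical placeholders, not values from the documentation under review; the real platform role ARNs must come from the product itself.

```python
import json


def build_trust_policy(platform_role_arns):
    """Return an IAM trust policy that lets the given principals call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Principals that may obtain temporary credentials for this role.
                "Principal": {"AWS": list(platform_role_arns)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical placeholder ARN, used only for illustration.
EXAMPLE_PLATFORM_ROLE_ARN = "arn:aws:iam::123456789012:role/example-platform-role"

trust_policy = build_trust_policy([EXAMPLE_PLATFORM_ROLE_ARN])
print(json.dumps(trust_policy, indent=2))
```

Because the trust policy names only role principals, no long-lived access key or secret key appears anywhere in the configuration.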

## AWS-Side Configuration Overview

low

Use sentence case for headings.

Suggested change
## AWS-Side Configuration Overview
## AWS-side configuration overview
References
  1. Use sentence case for headings. (link)

- SQS permissions are scoped to the target queue ARN.
- S3 permissions are scoped to the target bucket and object ARN.
- By default, this policy does not require S3 write or delete permissions.
- If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, meaning source objects are deleted after successful ingestion, grant `s3:DeleteObject` on the target object path.

low

Avoid passive voice ("are deleted") by rewriting to active voice.

Suggested change
- If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, meaning source objects are deleted after successful ingestion, grant `s3:DeleteObject` on the target object path.
- If a future SQS (S3) integration task enables **PURGE** or **Clean Up Original Files**, which deletes source objects after successful ingestion, grant `s3:DeleteObject` on the target object path.
References
  1. Avoid passive voice overuse. (link)
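
The scoping rules in the bullets under review can be sketched as a permissions policy generator. This is an illustration under stated assumptions, not the policy from the PR: the helper name `build_permissions_policy`, the queue ARN, the bucket name, and the prefix are hypothetical, and the specific SQS and S3 read actions are assumed. Only `s3:DeleteObject` is named in the text above.

```python
import json


def build_permissions_policy(queue_arn, bucket, prefix="", allow_delete=False):
    """Return an IAM permissions policy scoped to one queue, one bucket, and one object path."""
    # Read-only by default; add delete only when the task cleans up source objects.
    object_actions = ["s3:GetObject"]
    if allow_delete:
        object_actions.append("s3:DeleteObject")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed SQS actions, scoped to the target queue ARN.
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                # Assumed bucket-level action, scoped to the target bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions, scoped to the target object path.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


# Hypothetical placeholder values, used only for illustration.
policy = build_permissions_policy(
    "arn:aws:sqs:us-east-1:123456789012:example-queue",
    "example-bucket",
    prefix="incoming/",
)
print(json.dumps(policy, indent=2))
```

Calling the helper with `allow_delete=True` adds `s3:DeleteObject` to the object-level statement only, which matches the cleanup case described in the last bullet.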


Both policies are transparent to applications: no code changes, no extra views, no data duplication.

## Choosing the Right Policy

low

Use sentence case for headings.

Suggested change
## Choosing the Right Policy
## Choosing the right policy
References
  1. Use sentence case for headings. (link)

Comment on lines +119 to +121
> **Note:**
>
> When changing storage configuration, existing history tables will be dropped and recreated.

low

Avoid passive voice ("will be dropped and recreated") by rewriting to active voice.

Suggested change
> **Note:**
>
> When changing storage configuration, existing history tables will be dropped and recreated.
> **Note:**
>
> When you change the storage configuration, {{{ .lake }}} drops and recreates existing history tables.
References
  1. Avoid passive voice overuse. (link)

## Runtime Behavior by Task Type

- S3 tasks can run once or continuously poll for new files.
- SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until manually stopped.

low

Avoid passive voice ("manually stopped") and write in the second person ("you").

Suggested change
- SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until manually stopped.
- SQS (S3) tasks continuously poll the SQS queue, consume S3 object creation events, and write data into the target table until you stop them manually.
References
  1. Avoid passive voice and write in second person. (link)


- Amazon S3 tasks are designed for file import scenarios and mainly focus on file path patterns, file formats, and ingestion behavior.
- S3 tasks are designed for file import scenarios and mainly focus on file path patterns, file formats, and ingestion behavior.
- SQS (S3) tasks are designed for S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table.

low

Avoid passive voice ("are designed for") by using active voice ("enable").

Suggested change
- SQS (S3) tasks are designed for S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table.
- SQS (S3) tasks enable S3 event-driven data ingestion and mainly focus on the SQS queue, S3 event filters, IAM Role, and target table.
References
  1. Avoid passive voice overuse. (link)

| Privilege | Description |
|:----------|:------------|
| CREATE MASKING POLICY | Required to create or replace a masking policy. Typically granted on `*.*`. |
| CREATE MASKING POLICY | Required to create a masking policy. Typically granted on `*.*`. |

low

Avoid passive voice ("Typically granted on") and write in the second person ("you").

Suggested change
| CREATE MASKING POLICY | Required to create a masking policy. Typically granted on `*.*`. |
| CREATE MASKING POLICY | Required to create a masking policy. You typically grant this privilege on `*.*`. |
References
  1. Avoid passive voice and write in second person. (link)

| [FeiShuBot](/tidb-cloud-lake/guides/feishubot.md) | Stores a FeiShu bot webhook and message template for task failure notifications and similar scenarios. |

Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `AWS - Credentials` and `MySQL - Credentials` are referenced by actual data import or synchronization tasks.
Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials` are referenced by actual import, synchronization, or event-consuming tasks.

low

Avoid passive voice ("is used for", "are referenced by") and write in the second person ("you").

Suggested change
Not every data source corresponds to an integration task. For example, `FeiShuBot` is used for notification configuration, while `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials` are referenced by actual import, synchronization, or event-consuming tasks.
Not every data source corresponds to an integration task. For example, you use `FeiShuBot` for notification configuration, while actual import, synchronization, or event-consuming tasks reference `Amazon S3 - Credentials`, `Amazon SQS (S3) - IAM Role`, `MySQL - Credentials`, and `PostgreSQL - Credentials`.
References
  1. Avoid passive voice and write in second person. (link)

Replace instances of 'AWS - Credentials' with 'Amazon S3 - Credentials' in tidb-cloud-lake/guides/integrate-with-amazon-s3.md to clarify the credential type; updates link text and data source field descriptions.

ti-chi-bot Bot commented May 29, 2026

@lilin90: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|:----------|:-------|:--------|:---------|:--------------|
| pull-verify | 6f972d6 | link | true | `/test pull-verify` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • area/tidb-cloud: This PR relates to the area of TiDB Cloud.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
  • translation/no-need: No need to translate this PR.
