Skip to content

Text attributes data truncation#893

Open
fogelito wants to merge 20 commits into
mainfrom
upsert-text-truncation
Open

Text attributes data truncation#893
fogelito wants to merge 20 commits into
mainfrom
upsert-text-truncation

Conversation

@fogelito

@fogelito fogelito commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

Release Notes

  • New Features

    • TEXT, MEDIUMTEXT, and LONGTEXT columns now enforce byte-length limits for UTF-8 content, preventing silent data truncation and providing clear error messages when capacity is exceeded.
  • Tests

    • Added comprehensive test coverage for byte-length validation across document creation, validation, and updates.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@fogelito, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 29 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1244f4e9-ea49-4bf5-9912-07b0df30eb05

📥 Commits

Reviewing files that changed from the base of the PR and between cf959b1 and 8c7b131.

📒 Files selected for processing (4)
  • src/Database/Adapter/SQLite.php
  • src/Database/Validator/Structure.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/unit/Validator/StructureTest.php
📝 Walkthrough

Walkthrough

Three new Database constants define TEXT-family byte capacities. A new ByteLength validator enforces strlen-based limits. Structure::checkForInvalidAttributeValues() adds dedicated cases for VAR_TEXT, VAR_MEDIUMTEXT, and VAR_LONGTEXT, each pairing a Text validator with ByteLength. Attribute::checkType(), MariaDB::getSQLType(), and SQLite byte-ceiling constants replace hardcoded literals with the new constants. Unit and E2E tests cover rejection of over-capacity multibyte strings and acceptance of in-limit values.

Changes

Byte-safe TEXT validation

Layer / File(s) Summary
Byte-size constants and ByteLength validator
src/Database/Database.php, src/Database/Validator/ByteLength.php
Database gains MAX_TEXT_BYTES, MAX_MEDIUMTEXT_BYTES, and MAX_LONGTEXT_BYTES constants. New ByteLength validator stores a max byte cap and enforces it via strlen, returning invalid for non-strings or strings exceeding the configured limit.
Validator and adapter integration
src/Database/Validator/Structure.php, src/Database/Validator/Attribute.php, src/Database/Adapter/MariaDB.php, src/Database/Adapter/SQLite.php
Structure::checkForInvalidAttributeValues() splits VAR_TEXT/VAR_MEDIUMTEXT/VAR_LONGTEXT into dedicated cases, each applying a Text validator and a ByteLength validator capped at the matching MAX_*_BYTES constant. Attribute::checkType() replaces hardcoded byte thresholds with the same constants. MariaDB::getSQLType() and SQLite byte-ceiling constants reference the constants instead of literals.
Unit tests
tests/unit/Validator/StructureTest.php
Adds testTextByteSafeValidationTooBig, testTextByteSafeValidationMultibyte, and testTextByteSafeValidationValid asserting byte-length enforcement for a VAR_TEXT attribute whose declared size exceeds the effective 65,535-byte capacity. Annotates the existing testTextValidation boundary value.
E2E tests
tests/e2e/Adapter/Scopes/DocumentTests.php
Adds testTextByteTruncationCreate, testTextByteTruncationValid, and testTextByteTruncationUpdate confirming that over-capacity multibyte emoji strings throw StructureException on createDocument() and updateDocument(), while a 65,535-byte ASCII value round-trips unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • utopia-php/database#788: Adds VAR_TEXT/VAR_MEDIUMTEXT/VAR_LONGTEXT mappings in MariaDB::getSQLType(), which this PR further modifies to use the new Database::MAX_*TEXT_BYTES thresholds.

Suggested reviewers

  • abnegate
  • ArnabChatterjee20k

Poem

🐰 Hop, hop, bytes in a row,
No emoji too large shall overflow!
ByteLength guards the TEXT with care,
65535 bytes — not one more dare.
The rabbit checks strlen and grins,
utf8mb4 safety wins! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Text attributes data truncation' clearly describes the main change: the PR adds byte-length validation and prevention of truncation for TEXT-family column types, directly addressing the core objective of enforcing text truncation boundaries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch upsert-text-truncation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unit/Validator/StructureTest.php (2)

412-413: 💤 Low value

Minor: Multibyte test lacks error message assertion.

For consistency with the ASCII overflow test on line 409, consider asserting the error message for the multibyte test as well.

🔍 Add assertion for consistency
         // Multi-byte content over the limit is rejected the same way.
         $multibyte = \str_repeat('📝', 20000);
         $this->assertEquals(false, $validator->isValid(new Document($base + ['blocks_json' => $multibyte])));
+        $this->assertEquals('Invalid document structure: Attribute "blocks_json" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 412 - 413, The multibyte
overflow test starting at the line with str_repeat and '📝' is missing an
assertion for the error message, unlike the ASCII overflow test on line 409.
After the assertEquals call that validates isValid returns false for the
multibyte Document, add an additional assertion to check the error message
returned by the validator, following the same pattern as the ASCII overflow test
for consistency.

995-1047: 💤 Low value

Consider adding test case for byte-safe limit edge.

While testTextByteSafeValidation comprehensively tests the 16,383-character byte-safe limit for legacy attributes, testTextValidation could benefit from a similar edge-case test. Currently it tests 16,383 chars (pass) and 65,536 chars (fail due to declared size), but doesn't verify that 16,384 chars would fail due to the byte-safe limit.

🧪 Optional test case to add

Add this test case between the existing assertions to verify byte-safe enforcement:

        $this->assertEquals(false, $validator->isValid(new Document([
            '$collection' => ID::custom('posts'),
            'title' => 'Demo Title',
            'description' => 'Demo description',
            'rating' => 5,
            'price' => 1.99,
            'published' => true,
            'tags' => ['dog', 'cat', 'mouse'],
            'feedback' => 'team@appwrite.io',
            'text_field' => \str_repeat('a', 16384),
            '$createdAt' => '2000-04-01T12:00:00.000+00:00',
            '$updatedAt' => '2000-04-01T12:00:00.000+00:00'
        ])));

        $this->assertEquals('Invalid document structure: Attribute "text_field" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 995 - 1047, The
testTextValidation method should include an additional test case to verify the
byte-safe character limit edge case. Add a new assertion between the existing
test cases that validates a Document with text_field set to \str_repeat('a',
16384) should fail (assertEquals false), and verify the corresponding error
message from getDescription indicates the byte-safe limit of 16383 characters,
similar to the pattern already established in the method with the 65536
character test case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Database/Database.php`:
- Around line 118-121: The MAX_LONGTEXT_BYTES constant in the Database class is
set to a value that exceeds the 32-bit signed integer maximum, causing it to be
converted to a float in 32-bit PHP environments. This breaks the intdiv call in
downstream code that expects integer arguments. Fix this by explicitly casting
MAX_LONGTEXT_BYTES to int using the (int) type cast operator to ensure it
remains an integer on all PHP platforms, including 32-bit environments.

---

Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 412-413: The multibyte overflow test starting at the line with
str_repeat and '📝' is missing an assertion for the error message, unlike the
ASCII overflow test on line 409. After the assertEquals call that validates
isValid returns false for the multibyte Document, add an additional assertion to
check the error message returned by the validator, following the same pattern as
the ASCII overflow test for consistency.
- Around line 995-1047: The testTextValidation method should include an
additional test case to verify the byte-safe character limit edge case. Add a
new assertion between the existing test cases that validates a Document with
text_field set to \str_repeat('a', 16384) should fail (assertEquals false), and
verify the corresponding error message from getDescription indicates the
byte-safe limit of 16383 characters, similar to the pattern already established
in the method with the 65536 character test case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 338e4613-aa3d-423d-af38-75776ac9f9df

📥 Commits

Reviewing files that changed from the base of the PR and between cfba533 and cbdc3cf.

📒 Files selected for processing (4)
  • src/Database/Database.php
  • src/Database/Validator/Structure.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/unit/Validator/StructureTest.php

Comment thread src/Database/Database.php Outdated
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds byte-length validation for VAR_TEXT, VAR_MEDIUMTEXT, and VAR_LONGTEXT attributes to prevent silent data truncation when multi-byte (utf8mb4) content exceeds a MySQL/MariaDB column's physical byte capacity. It also centralises the three TEXT-family byte ceilings as named constants on Database and replaces all previous magic-number comparisons throughout the adapters and validators.

  • A new ByteLength validator is introduced and applied in Structure as two guards per text attribute: one for the user-declared size and one for the physical column cap, so legacy attributes whose $size exceeds the column ceiling are correctly capped at the column limit rather than the (potentially larger) declared value.
  • MariaDB::getSQLType() now uses Database::MAX_*_BYTES when selecting the TEXT-family column type for VAR_STRING, and SQLite's MySQL-emulation introspection output is updated to match.
  • Unit and E2E test coverage is added for over-capacity emoji rejection, full-capacity ASCII round-trips, and update-path validation.

Confidence Score: 4/5

The core byte-length validation logic is sound and well-tested, but the implementation warrants a careful second look before merging given its cross-adapter reach.

The ByteLength validator correctly measures bytes via strlen and the dual-validator pattern in Structure handles the legacy over-declared-size scenario cleanly. The concern is that Structure is adapter-agnostic while the new byte caps are MySQL/MariaDB-specific constraints — adapters whose storage does not use TEXT-family byte limits (Postgres, Mongo, Redis) now enforce the same 65,535-byte ceiling even though their backing store would happily accept larger values. Similarly, array TEXT attributes are stored as JSON by the SQL adapters but still have the TEXT byte limit applied to each element. These cross-cutting issues affect correctness on non-MySQL adapters and could silently reject data that was previously accepted.

src/Database/Validator/Structure.php — the adapter-agnostic placement of MySQL-specific byte caps deserves the most scrutiny before merge.

Important Files Changed

Filename Overview
src/Database/Validator/ByteLength.php New validator measuring actual byte length via strlen(); isValid(0) treats max=0 as unlimited but getDescription() still prints "no longer than 0 bytes", which is misleading for non-string values on attributes with a missing size.
src/Database/Validator/Structure.php Replaces character-based Text validator with ByteLength for VAR_TEXT/MEDIUMTEXT/LONGTEXT; both a declared-size and a column-type-max ByteLength are added per attribute, correctly capping legacy over-declared sizes at the column physical limit.
src/Database/Database.php Added three public constants (MAX_TEXT_BYTES, MAX_MEDIUMTEXT_BYTES, MAX_LONGTEXT_BYTES) that centralise the MySQL/MariaDB TEXT-family byte ceilings; straightforward constant extraction.
src/Database/Adapter/MariaDB.php VAR_STRING column-type thresholds now reference the new Database constants instead of magic numbers; behaviour is unchanged, only the source of the values changes.
src/Database/Adapter/SQLite.php Private constants removed; PRAGMA-based characterMaximumLength emulation now uses Database::MAX_*_BYTES constants; minor extra whitespace in string concatenations but no logic change.
src/Database/Validator/Attribute.php Size-cap checks for VAR_TEXT, VAR_MEDIUMTEXT, and VAR_LONGTEXT now reference the new constants; pure refactor with no behaviour change.
tests/e2e/Adapter/Scopes/DocumentTests.php Three new E2E tests cover over-capacity emoji rejection on create/update and full-capacity ASCII round-trip; skipped for schema-less adapters; collections created under FUNCTION names and not explicitly deleted.
tests/unit/Validator/StructureTest.php Adds unit tests for legacy over-declared size, multibyte rejection, valid exact-fit, null optional pass, and sized mediumtext; updates error message assertions to match new ByteLength wording.

Reviews (11): Last reviewed commit: "tests" | Re-trigger Greptile

Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread tests/unit/Validator/StructureTest.php Outdated
Comment thread src/Database/Adapter/MariaDB.php
Comment thread src/Database/Validator/Structure.php
Comment thread tests/e2e/Adapter/Base.php Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/Validator/StructureTest.php (1)

441-445: ⚡ Quick win

Assert the exact failure reason in the multibyte overflow test.

This test currently validates only false; asserting the description too will lock the byte-limit contract and catch regressions in failure source.

Suggested test hardening
 $multibyte = \str_repeat('📝', 20000);
 $this->assertEquals(false, $validator->isValid(new Document($base + ['text' => $multibyte])));
+$this->assertEquals(
+    'Invalid document structure: Attribute "text" has invalid type. Value must be a valid string no longer than 65535 bytes',
+    $validator->getDescription()
+);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 441 - 445, The multibyte
overflow test in the test method currently only asserts that the validator
returns false but does not verify the specific failure reason. Strengthen this
test by adding an additional assertion that checks the failure description or
error message from the validator after the isValid call with the multibyte
document. This will ensure that the specific byte-limit contract is properly
validated and catch any regressions where the failure source changes. Reference
the validator's method to retrieve the error description (commonly methods like
getError, getFailureDescription, or similar) to assert the expected error
message alongside the false result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@phpunit.xml`:
- Line 10: The stopOnFailure attribute in the phpunit.xml configuration is set
to "true", which causes the test suite to halt immediately upon encountering the
first failure. This masks subsequent failures and reduces diagnostic value in CI
environments. Change stopOnFailure from "true" to "false" to allow the full test
suite to execute and report all failures in a single run, providing better
visibility into the overall test health.

In `@src/Database/Validator/ByteLength.php`:
- Line 53: The file is missing a trailing newline at the end of the file, which
violates PSR-12 standards that require a single blank line at the end of every
file. Add a single newline character after the closing brace at the end of the
ByteLength.php file to satisfy the Pint code style checker and resolve the
single_blank_line_at_eof CI failure.

In `@tests/e2e/Adapter/Base.php`:
- Around line 26-27: Uncomment the two trait usage statements in the Base class
by removing the leading // from the lines containing CollectionTests and
CustomDocumentTypeTests. These commented-out trait usages need to be re-enabled
so that their test methods are included in the test discovery and execution,
restoring the full E2E coverage that these traits provide to the adapter tests.

---

Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 441-445: The multibyte overflow test in the test method currently
only asserts that the validator returns false but does not verify the specific
failure reason. Strengthen this test by adding an additional assertion that
checks the failure description or error message from the validator after the
isValid call with the multibyte document. This will ensure that the specific
byte-limit contract is properly validated and catch any regressions where the
failure source changes. Reference the validator's method to retrieve the error
description (commonly methods like getError, getFailureDescription, or similar)
to assert the expected error message alongside the false result.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d8392e1-0183-4343-9aef-8defb270cf4f

📥 Commits

Reviewing files that changed from the base of the PR and between 24cb977 and 3dc94f9.

📒 Files selected for processing (6)
  • phpunit.xml
  • src/Database/Validator/ByteLength.php
  • src/Database/Validator/Structure.php
  • tests/e2e/Adapter/Base.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/unit/Validator/StructureTest.php

Comment thread phpunit.xml Outdated
Comment thread src/Database/Validator/ByteLength.php Outdated
Comment thread tests/e2e/Adapter/Base.php Outdated
@fogelito fogelito changed the title Upsert text truncation Text attributes data truncation Jun 18, 2026
Comment thread src/Database/Adapter/SQLite.php Outdated
Comment on lines +44 to +46
private const MARIADB_TEXT_BYTES = '' . Database::MAX_TEXT_BYTES;
private const MARIADB_MEDIUMTEXT_BYTES = '' . Database::MAX_MEDIUMTEXT_BYTES;
private const MARIADB_LONGTEXT_BYTES = '' . Database::MAX_LONGTEXT_BYTES;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a string?

case Database::VAR_LONGTEXT:
$validators[] = new Text($size, min: 0);
$validators[] = new ByteLength(Database::MAX_LONGTEXT_BYTES);
break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants