feat(ingestion): implementar fundação do módulo de ingestão com TimescaleDB e dual-write Discord#318
Conversation
…caleDB e dual-write Discord
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughA new Laravel ingestion module extracts Discord ETL into a TimescaleDB-backed app. MessageReceivedEvent now dispatches Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@app-modules/ingestion/database/migrations/2026_06_09_000000_create_timescaledb_tables.php`:
- Around line 18-20: The migration's up() currently calls
Schema::connection('timescaledb')->dropIfExists('voice_messages'),
->dropIfExists('messages') and ->dropIfExists('raw_payloads'), which will
destroy existing data when re-run; remove these dropIfExists calls from the up()
method (or move them to down()) so the migration is idempotent, or wrap them
with an explicit environment guard (e.g., check app environment like
app()->environment('local') or a config flag) so drops only occur in local/dev;
update the migration file to reference the same
Schema::connection('timescaledb') block but without unguarded dropIfExists calls
and ensure down() performs safe teardown if needed.
In `@app-modules/ingestion/src/Actions/TransformDiscordMessage.php`:
- Line 32: The current TransformDiscordMessage sets a new random UUID for the
'id' field which breaks cross-database correlation; change the assignment in
TransformDiscordMessage so 'id' is derived deterministically from the Discord
provider ID (provider_message_id) instead of Str::uuid(): either set 'id' =>
$this->provider_message_id (if that value is globally unique) or generate a
name-based UUID (e.g., UUID v5) from provider_message_id using your UUID library
(e.g., Ramsey\Uuid::uuid5) so the same input yields the same id across Postgres
and TimescaleDB; update the code that builds the array with the 'id' key and
keep the provider_message_id field intact.
- Line 20: The conditional in TransformDiscordMessage that directly accesses
$data['id'] and $data['author']['id'] can trigger "undefined array key"
warnings; update the check to use safe access (e.g., null coalescing or
Laravel's data_get/Arr::get) so you read $data['id'] and $data['author']['id']
safely before passing them to blank(), and keep the same logical behavior (treat
missing keys as blank).
In
`@app-modules/ingestion/src/Console/Commands/BackfillPostgresToTimescaleCommand.php`:
- Around line 20-25: The code reads the CLI option into $chunkSize and passes it
to DB::chunk without validating it; ensure the
BackfillPostgresToTimescaleCommand validates the chunk size after computing
$chunkSize = (int) $this->option('chunk') and before calling
DB::connection(...)->table('messages')->orderBy('id')->chunk(...): if $chunkSize
< 1, fail fast by emitting a clear error (e.g., $this->error('Invalid --chunk
value; must be >= 1')) and return a non-zero exit code (or throw an
InvalidArgumentException) so the chunk() call never receives 0/negative values.
- Around line 23-58: The current use of
DB::connection('pgsql')->table('messages')->orderBy('id')->chunk(...) can skip
rows if the source table is being written during backfill; replace the chunk()
pattern with a stable window + cursor loop: capture a fixed upper bound (e.g.,
$maxId = DB::connection('pgsql')->table('messages')->max('id')) before paging,
then loop using a deterministic cursor (e.g., where('id', '>',
$lastId)->where('id', '<=', $maxId)->orderBy('id')->limit($chunkSize)), build
$payloads and upsert into timescaledb, and set $lastId to the last processed
message id before the next iteration; update the code around the chunk usage in
BackfillPostgresToTimescaleCommand to implement this cursor-based pagination and
remove chunk().
In `@app-modules/ingestion/src/Listeners/ProcessRawDiscordMessage.php`:
- Line 31: ProcessRawDiscordMessage calls new
TransformDiscordMessage()->execute($record) without idempotency or error
handling, so if execute() throws after creating RawPayload a retry will
duplicate records; update ProcessRawDiscordMessage to either perform an
idempotency check (e.g., look for existing RawPayload by unique message id or a
deterministic hash before calling TransformDiscordMessage->execute) or wrap the
call in a try-catch that logs the error and prevents duplicate creation (for
example, ensure TransformDiscordMessage::execute returns a status/throws only on
unrecoverable errors and that RawPayload creation is gated by a uniqueness
check), and add logging to record failures for debugging.
In `@app-modules/ingestion/src/Models/Message.php`:
- Line 18: The Message model currently uses a permissive protected $guarded = []
which allows mass assignment of all attributes; replace this with an explicit
protected $fillable = [...] on the Message class listing only the columns that
should be mass-assignable (e.g. the specific attributes for a message like
sender_id, recipient_id, body/content, etc. as applicable in your schema),
remove or comment out $guarded, and ensure any fields that must never be
mass-assigned (ids, timestamps, foreign keys you control) are omitted from the
$fillable array so mass assignment is restricted to safe attributes.
- Around line 28-30: The WHERE clause omits sent_at when it's null because the
code uses isset($this->sent_at); change the check to detect presence of the
property (e.g., property_exists($this, 'sent_at') or array_key_exists('sent_at',
get_object_vars($this))) and always add a sent_at predicate to the query in the
Message model: if $this->sent_at === null call $query->whereNull('sent_at') else
call $query->where('sent_at', '=', $this->sent_at); update the code that
currently uses isset($this->sent_at) and the $query->where(...) lines
accordingly so TimescaleDB always receives a time-column condition.
In `@app-modules/ingestion/src/Models/RawPayload.php`:
- Line 18: The model RawPayload currently sets protected $guarded = [] which
permits mass assignment of all attributes; replace this with a protected
$fillable = ['provider', 'event_type', 'payload'] declaration so only those
fields can be mass-assigned and sensitive fields like id, created_at, updated_at
remain protected. Locate the RawPayload class and remove or replace the $guarded
property with the $fillable array as specified.
In `@composer.json`:
- Line 26: The dependency constraint for he4rt/ingestion in composer.json is
unbounded (">=1"); update the version requirement to a bounded,
semver-compatible range such as "^1.0" (which equals ">=1.0.0 <2.0.0") to
prevent pulling future major releases—replace the existing ">=1" constraint for
the "he4rt/ingestion" package with "^1.0" (or another strict range like "~1.0"
if preferred).
In `@config/database.php`:
- Line 116: Replace the hardcoded 'sslmode' => 'prefer' entry so TLS behavior is
driven by an environment variable (e.g. DB_SSLMODE) and defaults to a secure
mode for non-local environments; update the database config in
config/database.php to read env('DB_SSLMODE', (env('APP_ENV') === 'local' ?
'prefer' : 'require')) (or similar) and use that value for the 'sslmode' key so
local dev can still fall back but staging/production default to 'require' or
stricter.
In `@docker-compose.yml`:
- Line 30: The timescaledb service using image timescale/timescaledb-ha:pg16 is
mounting the named volume he4rtbot-timescaledb to /var/lib/postgresql which does
not match this image’s default PGDATA; update the docker-compose service that
references image timescale/timescaledb-ha:pg16 to either (a) mount
he4rtbot-timescaledb to /home/postgres/pgdata/data, or (b) set the PGDATA
environment variable to /var/lib/postgresql and ensure the named volume mounts
to that exact path—adjust the volume mount target and/or PGDATA in the service
definition accordingly so data persists across container recreation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 025e8413-81e0-4a4c-821d-806212054a4c
⛔ Files ignored due to path filters (2)
composer.lockis excluded by!**/*.lockpackage-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (20)
app-modules/bot-discord/src/Events/MessageReceivedEvent.phpapp-modules/ingestion/composer.jsonapp-modules/ingestion/database/factories/.gitkeepapp-modules/ingestion/database/migrations/.gitkeepapp-modules/ingestion/database/migrations/2026_06_09_000000_create_timescaledb_tables.phpapp-modules/ingestion/database/seeders/.gitkeepapp-modules/ingestion/phpstan.ignore.neonapp-modules/ingestion/phpstan.neonapp-modules/ingestion/src/Actions/TransformDiscordMessage.phpapp-modules/ingestion/src/Console/Commands/BackfillPostgresToTimescaleCommand.phpapp-modules/ingestion/src/Listeners/ProcessRawDiscordMessage.phpapp-modules/ingestion/src/Models/Message.phpapp-modules/ingestion/src/Models/RawPayload.phpapp-modules/ingestion/src/Providers/IngestionServiceProvider.phpapp-modules/ingestion/tests/Feature/.gitkeepapp-modules/ingestion/tests/Feature/DualWriteAndBackfillTest.phpapp-modules/ingestion/tests/Unit/.gitkeepcomposer.jsonconfig/database.phpdocker-compose.yml
| Schema::connection('timescaledb')->dropIfExists('voice_messages'); | ||
| Schema::connection('timescaledb')->dropIfExists('messages'); | ||
| Schema::connection('timescaledb')->dropIfExists('raw_payloads'); |
There was a problem hiding this comment.
Data loss risk: migration drops existing tables.
dropIfExists in up() destroys all data if the migration is re-run. Migrations should be idempotent. Remove these drops or guard them with environment checks.
🛡️ Proposed fix
DB::connection('timescaledb')->statement('CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;');
- Schema::connection('timescaledb')->dropIfExists('voice_messages');
- Schema::connection('timescaledb')->dropIfExists('messages');
- Schema::connection('timescaledb')->dropIfExists('raw_payloads');
-
Schema::connection('timescaledb')->create('raw_payloads', function (Blueprint $table): void {Alternatively, if these drops are intentional for local development only, add a guard:
DB::connection('timescaledb')->statement('CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;');
+ if (app()->environment('local', 'testing')) {
Schema::connection('timescaledb')->dropIfExists('voice_messages');
Schema::connection('timescaledb')->dropIfExists('messages');
Schema::connection('timescaledb')->dropIfExists('raw_payloads');
+ }
Schema::connection('timescaledb')->create('raw_payloads', function (Blueprint $table): void {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Schema::connection('timescaledb')->dropIfExists('voice_messages'); | |
| Schema::connection('timescaledb')->dropIfExists('messages'); | |
| Schema::connection('timescaledb')->dropIfExists('raw_payloads'); | |
| DB::connection('timescaledb')->statement('CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;'); | |
| Schema::connection('timescaledb')->create('raw_payloads', function (Blueprint $table): void { | |
| // raw_payloads table definition... | |
| }); | |
| // ... rest of migration code |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@app-modules/ingestion/database/migrations/2026_06_09_000000_create_timescaledb_tables.php`
around lines 18 - 20, The migration's up() currently calls
Schema::connection('timescaledb')->dropIfExists('voice_messages'),
->dropIfExists('messages') and ->dropIfExists('raw_payloads'), which will
destroy existing data when re-run; remove these dropIfExists calls from the up()
method (or move them to down()) so the migration is idempotent, or wrap them
with an explicit environment guard (e.g., check app environment like
app()->environment('local') or a config flag) so drops only occur in local/dev;
update the migration file to reference the same
Schema::connection('timescaledb') block but without unguarded dropIfExists calls
and ensure down() performs safe teardown if needed.
| } | ||
|
|
||
| $extraColumns = [ | ||
| 'id' => Str::uuid()->toString(), |
There was a problem hiding this comment.
New UUID breaks cross-database correlation.
Generating a fresh UUID means the same Discord message has different IDs in Postgres (legacy) and TimescaleDB. This prevents joining or correlating records during dual-write. Use provider_message_id as the stable identifier or derive a deterministic UUID.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app-modules/ingestion/src/Actions/TransformDiscordMessage.php` at line 32,
The current TransformDiscordMessage sets a new random UUID for the 'id' field
which breaks cross-database correlation; change the assignment in
TransformDiscordMessage so 'id' is derived deterministically from the Discord
provider ID (provider_message_id) instead of Str::uuid(): either set 'id' =>
$this->provider_message_id (if that value is globally unique) or generate a
name-based UUID (e.g., UUID v5) from provider_message_id using your UUID library
(e.g., Ramsey\Uuid::uuid5) so the same input yields the same id across Postgres
and TimescaleDB; update the code that builds the array with the 'id' key and
keep the provider_message_id field intact.
| $chunkSize = (int) $this->option('chunk'); | ||
| $processed = 0; | ||
|
|
||
| DB::connection('pgsql')->table('messages') | ||
| ->orderBy('id') | ||
| ->chunk($chunkSize, function ($messages) use (&$processed): void { |
There was a problem hiding this comment.
Validate --chunk before calling chunk().
Line 20 allows 0/negative values, which can break execution at Line 25. Enforce >= 1 and fail fast with a clear error.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@app-modules/ingestion/src/Console/Commands/BackfillPostgresToTimescaleCommand.php`
around lines 20 - 25, The code reads the CLI option into $chunkSize and passes
it to DB::chunk without validating it; ensure the
BackfillPostgresToTimescaleCommand validates the chunk size after computing
$chunkSize = (int) $this->option('chunk') and before calling
DB::connection(...)->table('messages')->orderBy('id')->chunk(...): if $chunkSize
< 1, fail fast by emitting a clear error (e.g., $this->error('Invalid --chunk
value; must be >= 1')) and return a non-zero exit code (or throw an
InvalidArgumentException) so the chunk() call never receives 0/negative values.
| DB::connection('pgsql')->table('messages') | ||
| ->orderBy('id') | ||
| ->chunk($chunkSize, function ($messages) use (&$processed): void { | ||
| $payloads = []; | ||
|
|
||
| foreach ($messages as $msg) { | ||
| $payloads[] = [ | ||
| 'id' => $msg->id, | ||
| 'tenant_id' => $msg->tenant_id, | ||
| 'external_identity_id' => $msg->external_identity_id, | ||
| 'provider_message_id' => $msg->provider_message_id, | ||
| 'channel_id' => $msg->channel_id, | ||
| 'content' => $msg->content, | ||
| 'obtained_experience' => $msg->obtained_experience, | ||
| 'metadata' => $msg->metadata, | ||
| 'reactions_count' => $msg->reactions_count, | ||
| 'reactions_total' => $msg->reactions_total, | ||
| 'kind' => $msg->kind, | ||
| 'raw_message_type' => $msg->raw_message_type, | ||
| 'source_kind' => $msg->source_kind, | ||
| 'is_pinned' => $msg->is_pinned, | ||
| 'mentions_everyone' => $msg->mentions_everyone, | ||
| 'mention_role_count' => $msg->mention_role_count, | ||
| 'edited_at' => $msg->edited_at, | ||
| 'reply_to_provider_message_id' => $msg->reply_to_provider_message_id, | ||
| 'sent_at' => $msg->sent_at, | ||
| 'created_at' => $msg->created_at, | ||
| 'updated_at' => $msg->updated_at, | ||
| ]; | ||
| } | ||
|
|
||
| DB::connection('timescaledb')->table('messages')->upsert($payloads, ['id', 'sent_at']); | ||
|
|
||
| $processed += count($messages); | ||
| $this->info(sprintf('Processed %d messages...', $processed)); | ||
| }); |
There was a problem hiding this comment.
chunk() on a mutating source can lose rows during backfill.
Lines 23-58 use offset pagination (chunk) while the legacy table may still receive writes during dual-write. This can skip records, causing incomplete migration even with upsert. Use a stable window (captured upper bound) plus deterministic cursor strategy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@app-modules/ingestion/src/Console/Commands/BackfillPostgresToTimescaleCommand.php`
around lines 23 - 58, The current use of
DB::connection('pgsql')->table('messages')->orderBy('id')->chunk(...) can skip
rows if the source table is being written during backfill; replace the chunk()
pattern with a stable window + cursor loop: capture a fixed upper bound (e.g.,
$maxId = DB::connection('pgsql')->table('messages')->max('id')) before paging,
then loop using a deterministic cursor (e.g., where('id', '>',
$lastId)->where('id', '<=', $maxId)->orderBy('id')->limit($chunkSize)), build
$payloads and upsert into timescaledb, and set $lastId to the last processed
message id before the next iteration; update the code around the chunk usage in
BackfillPostgresToTimescaleCommand to implement this cursor-based pagination and
remove chunk().
| if (isset($this->sent_at)) { | ||
| $query->where('sent_at', '=', $this->sent_at); | ||
| } |
There was a problem hiding this comment.
Incomplete WHERE clause for TimescaleDB composite key updates.
isset($this->sent_at) returns false when sent_at is null, omitting it from the update's WHERE clause. TimescaleDB hypertables require the time column in every WHERE clause for updates. If sent_at is missing or null, the update will fail or target the wrong partition.
🛡️ Proposed fix
- if (isset($this->sent_at)) {
+ if (!is_null($this->sent_at)) {
$query->where('sent_at', '=', $this->sent_at);
+ } else {
+ throw new \RuntimeException('sent_at is required for TimescaleDB updates');
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (isset($this->sent_at)) { | |
| $query->where('sent_at', '=', $this->sent_at); | |
| } | |
| if (!is_null($this->sent_at)) { | |
| $query->where('sent_at', '=', $this->sent_at); | |
| } else { | |
| throw new \RuntimeException('sent_at is required for TimescaleDB updates'); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app-modules/ingestion/src/Models/Message.php` around lines 28 - 30, The WHERE
clause omits sent_at when it's null because the code uses isset($this->sent_at);
change the check to detect presence of the property (e.g.,
property_exists($this, 'sent_at') or array_key_exists('sent_at',
get_object_vars($this))) and always add a sent_at predicate to the query in the
Message model: if $this->sent_at === null call $query->whereNull('sent_at') else
call $query->where('sent_at', '=', $this->sent_at); update the code that
currently uses isset($this->sent_at) and the $query->where(...) lines
accordingly so TimescaleDB always receives a time-column condition.
| { | ||
| use HasUuids; | ||
|
|
||
| protected $guarded = []; |
There was a problem hiding this comment.
Mass assignment vulnerability.
$guarded = [] allows assignment of any attribute, including id, created_at, and updated_at. Define $fillable = ['provider', 'event_type', 'payload'] instead.
🔒 Proposed fix
- protected $guarded = [];
+ protected $fillable = ['provider', 'event_type', 'payload'];📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| protected $guarded = []; | |
| protected $fillable = ['provider', 'event_type', 'payload']; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app-modules/ingestion/src/Models/RawPayload.php` at line 18, The model
RawPayload currently sets protected $guarded = [] which permits mass assignment
of all attributes; replace this with a protected $fillable = ['provider',
'event_type', 'payload'] declaration so only those fields can be mass-assigned
and sensitive fields like id, created_at, updated_at remain protected. Locate
the RawPayload class and remove or replace the $guarded property with the
$fillable array as specified.
| "he4rt/gamification": ">=1", | ||
| "he4rt/he4rt-core": ">=1", | ||
| "he4rt/identity": ">=1", | ||
| "he4rt/ingestion": ">=1", |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Composer version constraints, what are the upgrade risks of using >=1versus^1.0 for internal Laravel packages?
💡 Result:
When defining version constraints in Composer, using >=1 versus ^1.0 for internal Laravel packages introduces significant differences in risk, stability, and predictability. The primary upgrade risk of using >=1 is that it is an unbound constraint [1]. By failing to specify an upper bound, you allow Composer to install any future version of the package, including major releases that contain breaking changes [1]. If an internal package undergoes a major architectural change or removes functionality, your application or other dependent packages may break immediately upon the next composer update [1]. This lack of an upper limit violates the principles of Semantic Versioning (SemVer), where major version increments are explicitly intended to signal breaking changes [1][2]. In contrast, the ^1.0 constraint (the caret operator) is widely considered the best practice for libraries and internal packages because it respects SemVer [3][4][5]. The ^1.0 constraint is equivalent to >=1.0.0 <2.0.0 [3][6]. It allows Composer to automatically install bug fixes and new, backward-compatible features while preventing the installation of major versions that could break your application [3][1][7]. Key Risks and Comparisons: - Unbound Constraints (>=1): Extremely risky. Automatically accepting future major versions leaves you vulnerable to unexpected breaking changes that you have not vetted [1]. - Caret Operator (^1.0): Recommended. It provides a balance between receiving necessary updates (patches/minors) and maintaining stability by locking to the current major version [3][7]. - Pre-1.0 Behavior: Be aware that for versions below 1.0.0, the ^ operator behaves more strictly to protect against instability, as SemVer often allows breaking changes in early, non-stable versions [3][2][8]. For example, ^0.3.0 is equivalent to >=0.3.0 <0.4.0 [3][8]. For internal Laravel packages, you should always define an upper bound to ensure that your application dependencies remain predictable and stable [1]. Relying on ^1.0 allows you to safely manage updates while maintaining control over when you are ready to migrate to a new major version [1].
Citations:
- 1: http://getcomposer.org/doc/faqs/why-are-unbound-version-constraints-a-bad-idea.md
- 2: https://developer.happyr.com/always-use-caret-instead-of-tilde
- 3: https://getcomposer.org/doc/articles/versions.md
- 4: https://www.iwader.co.uk/posts/2016/02/how-composer-version-constraints-work/
- 5: https://naderman.de/slippy/slides/2017-07-13-T3DD17-Composer-Best-Practices.pdf
- 6: https://github.com/composer/composer/blob/main/doc/articles/versions.md
- 7: https://dev.to/timoschinkel/good-practices-when-working-with-composer-5a1c
- 8: https://blog.jawira.com/2025/05/27/pre-release-composer.html
Pin he4rt/ingestion with a bounded constraint
In composer.json line 26, ">=1" is unbounded and can pull future major releases; change it to a compatible range like ^1.0 (equivalent to >=1.0.0 <2.0.0).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@composer.json` at line 26, The dependency constraint for he4rt/ingestion in
composer.json is unbounded (">=1"); update the version requirement to a bounded,
semver-compatible range such as "^1.0" (which equals ">=1.0.0 <2.0.0") to
prevent pulling future major releases—replace the existing ">=1" constraint for
the "he4rt/ingestion" package with "^1.0" (or another strict range like "~1.0"
if preferred).
| 'prefix' => '', | ||
| 'prefix_indexes' => true, | ||
| 'search_path' => 'public', | ||
| 'sslmode' => 'prefer', |
There was a problem hiding this comment.
Do not hardcode TLS downgrade (sslmode=prefer) on ingestion DB connection.
Line 116 permits plaintext fallback. Make SSL mode environment-driven and default to require (or stricter) for non-local environments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/database.php` at line 116, Replace the hardcoded 'sslmode' => 'prefer'
entry so TLS behavior is driven by an environment variable (e.g. DB_SSLMODE) and
defaults to a secure mode for non-local environments; update the database config
in config/database.php to read env('DB_SSLMODE', (env('APP_ENV') === 'local' ?
'prefer' : 'require')) (or similar) and use that value for the 'sslmode' key so
local dev can still fall back but staging/production default to 'require' or
stricter.
…o, pgdata path e try-catch)
feat(ingestion): implementar fundação do módulo de ingestão com TimescaleDB e dual-write Discord
Resolve: #299 (Fase 1 — somente Discord)
Contexto
Hoje cada módulo
integration-*escreve diretamente no Postgres transacional usado por login, painéis, gamification e identity. Com ~3.4 GB de dados de atividade acumulados e novos provedores planejados (Instagram, WhatsApp), essa coabitação é insustentável.Este PR inicializa o módulo
ingestioncomo futuro ponto único de entrada para dados de todos os provedores, apoiado por uma instância dedicada de TimescaleDB. O escopo é deliberadamente limitado ao Discord — Twitch, GitHub e Dev.to virão em PRs separados após o time alinhar onde os DTOs de cada provedor devem morar (integration-*ouingestion).O que mudou
Infraestrutura
docker-compose.ymlhe4rtbot-timescaledb(TimescaleDB HA na porta5436) com volume e healthcheckconfig/database.phptimescaledbapontando para a nova instânciacomposer.jsonhe4rt/ingestioncomo dependência do móduloNovo módulo:
app-modules/ingestion/database/migrations/…_create_timescaledb_tables.phpraw_payloads(cofre append-only), hypertablemessages(particionada porsent_at) e hypertablevoice_messages(particionada poroccurred_at)Providers/IngestionServiceProvider.phpdiscord.message.receivede registra o comando de backfillListeners/ProcessRawDiscordMessage.phpingestion— salva o JSON original emraw_payloads, depois roda o TransformActions/TransformDiscordMessage.phpDiscordMessageDTO::fromDump()+toDatabase()existente para garantir paridade campo a campo com o insert legado no PostgresModels/RawPayload.phptimescaledbModels/Message.phpid+sent_at) exigida pelo TimescaleDBConsole/Commands/BackfillPostgresToTimescaleCommand.phpupsert) do histórico de mensagens do Postgres → TimescaleDBtests/Feature/DualWriteAndBackfillTest.phpArquivos modificados
MessageReceivedEvent.phpevent('discord.message.received', ['raw_payload' => …])no topo do handler — persistência legada intacta (dual-write)Arquitetura
Ambos os caminhos rodam de forma independente. O Postgres legado continua sendo a fonte da verdade até o backfill atingir paridade (Decisão 10).
O que NÃO está neste PR
voice_messagesfoi criada para adiantar a estrutura, mas o listener e o ETL de voz virão na próxima fase).external_identity_id→external_account_id)Como testar
Notas de deploy
Warning
Produção requer uma instância de TimescaleDB e workers escutando a fila
ingestion. O dashboard e o sistema de XP continuam lendo do Postgres legado — nenhuma mudança de comportamento para o usuário final.