Address CodeRabbit review on PR #3476:

1. Engine — split fromTimestamp from lastScheduleTime

`RegisterScheduleInstanceParams.fromTimestamp` was doing double duty as both the next-cron-slot anchor and the "previous fire time" embedded in the next worker job's payload. On skipped ticks (inactive schedule, dev env disconnected, etc.) the engine still re-registered with `fromTimestamp = scheduleTimestamp`, so a long pause/disconnect would quietly overwrite the real last-fire time with a stream of skipped slots — defeating the workerCatalog accuracy benefit for the very case it was meant to handle. `RegisterScheduleInstanceParams` now has a separate optional `lastScheduleTime` field. `fromTimestamp` advances on every tick; `lastScheduleTime` only advances on real fires. After a skip the re-registration carries forward the existing `params.lastScheduleTime` (or the legacy `instance.lastScheduledTimestamp` fallback).

2. ScheduleListPresenter — guard cron back-calculation

`previousScheduledTimestamp(...)` calls cron-parser, which throws on malformed expressions. A single bad row would have failed the whole schedules-list response. Wrapped per-row in try/catch so a degraded row falls back to `lastRun: undefined` instead of taking down the page.

3. scheduleEngine integration test — anchor on observed values

The test compared against a precomputed "next minute boundary" Date, which is flaky if the test setup straddles a minute boundary. Switched to deriving expectations from the first observed `exactScheduleTime` and asserting relative invariants (60s gap, second lastTimestamp equals first timestamp).
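The fire/skip split described above can be sketched as two pure functions. This is a minimal illustration using the field names from the commit message; the shapes are simplified and the function bodies are not the engine's actual implementation.

```typescript
// Simplified shape of the params from the commit message.
type RegisterScheduleInstanceParams = {
  instanceId: string;
  fromTimestamp?: Date; // next-cron-slot anchor: advances on every tick
  lastScheduleTime?: Date; // previous real fire: advances only on fires
};

// After a real fire, both fields advance to the slot that just fired.
function afterFire(
  instanceId: string,
  scheduleTimestamp: Date
): RegisterScheduleInstanceParams {
  return {
    instanceId,
    fromTimestamp: scheduleTimestamp,
    lastScheduleTime: scheduleTimestamp,
  };
}

// After a skipped tick, the anchor advances past the skipped slot, but the
// last real fire time is carried forward unchanged instead of being
// overwritten with the skipped slot.
function afterSkip(
  params: RegisterScheduleInstanceParams,
  scheduleTimestamp: Date
): RegisterScheduleInstanceParams {
  return {
    instanceId: params.instanceId,
    fromTimestamp: scheduleTimestamp,
    lastScheduleTime: params.lastScheduleTime,
  };
}
```

Under this model a long run of skips leaves `lastScheduleTime` pinned at the last real fire, which is exactly the invariant the fix restores.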
ScheduleListPresenter was deriving "Last run" via cron's previous slot even for deactivated schedules, causing the dashboard to show "1 minute ago" for a schedule that was actually deactivated months ago. Gate the cron back-calc on `schedule.active`. Inactive schedules now show "–" in the Last run cell, matching the semantic that they aren't firing. Per Devin review on PR #3476.
Recovery (Redis job missing for an instance) was re-registering with no lastScheduleTime, so the post-recovery first fire fell through the fallback chain to instance.lastScheduledTimestamp — which is frozen at the value last written before this PR stopped writing the column. For schedules that fired heavily after deploy then suffered a Redis loss, that meant payload.lastTimestamp would report a stale pre-deploy timestamp on the first fire post-recovery. Compute lastScheduleTime as the cron expression's previous slot (pure cron math, no DB read — recovery fan-outs must not add load to hot tables). Guarded against the cron-prev predating the instance's createdAt and against cron-parser throwing on malformed expressions. For continuously-running schedules this equals the actual last fire time; for long-paused or recently-edited schedules it's the same approximation the dashboard "Last run" cell accepts. Per Devin and CodeRabbit review on PR #3476.
ScheduleListPresenter was guarding the cron-derived lastRun against schedule.createdAt, which only proves the row exists. If a schedule was edited (cron changed, timezone changed) or deactivated and then reactivated, the cron's previous slot might predate the most recent config change but still pass the createdAt guard — surfacing a "Last run" the schedule never actually fired at under the current configuration. Switch the guard to schedule.updatedAt. Prisma's @updatedAt bumps on every row update (cron change, timezone change, activate toggle, etc.), so the cron-derived slot is only shown once it's strictly after the most recent config change — meaning the current config has been in effect long enough to have actually fired at that slot. Per CodeRabbit review on PR #3476.
Each scheduled-task tick previously issued 3 Prisma UPDATEs against TaskSchedule.lastRunTriggeredAt, TaskScheduleInstance.lastScheduledTimestamp, and TaskScheduleInstance.nextScheduledTimestamp. All three were pure denormalization — every value can be derived without persisting.

Engine
- Drop the three per-tick prisma.update calls.
- Refactor registerNextTaskScheduleInstance to take a fromTimestamp arg instead of reading instance.lastScheduledTimestamp from the DB.
- Add optional lastScheduleTime to the schedule worker payload so the previous fire time travels forward via Redis. payload.lastTimestamp is now sourced from the worker payload, not a DB column. First-ever fires still report undefined so customer "first-run" sentinel patterns keep working.
- For in-flight Redis jobs enqueued before this change (which lack lastScheduleTime), fall back to instance.lastScheduledTimestamp once. After those drain, the column is never read again.

Schema
- Mark the three columns @deprecated via triple-slash Prisma docstrings. No migration — columns remain in place so revert is code-only. They can be dropped in a follow-up once the rollout is stable.

Webapp
- ScheduleListPresenter derives the dashboard "Last run" cell from the cron expression's previous slot, gated on schedule.createdAt so brand-new schedules show "–". UI is best-effort; the runs page is the source of truth.
- API responses (api.v1.schedules.*) already compute nextRun from cron; no public API change. lastTimestamp on the SDK payload retains Date | undefined semantics — no SDK change either.

Tests
- scheduleEngine integration test asserts first-fire lastTimestamp is undefined and the second fire carries the previous fire's timestamp exactly.
- scheduleRecovery tests no longer assert against the deprecated nextScheduledTimestamp column; presence of the worker job is the source of truth.

References
- New references/scheduled-tasks project with declarative schedules at multiple cadences plus three validators (first-fire-detector, interval-validator, upcoming-validator) that throw on FAIL — used for E2E-verifying the worker-payload flow.

Refs TRI-8891
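The worker-payload change above amounts to one optional field on the wire. A dependency-free stand-in for the schema change (the real catalog uses zod's `lastScheduleTime: z.coerce.date().optional()`) shows the key property: legacy jobs enqueued before the change simply omit the field and still parse. The raw shape and function here are illustrative, not the engine's code.

```typescript
type TriggerScheduledTaskPayload = {
  instanceId: string;
  exactScheduleTime: Date;
  lastScheduleTime?: Date; // previous fire time, traveling forward via Redis
};

// Coerce an incoming Redis payload, tolerating jobs enqueued by the old
// engine, whose payloads predate the lastScheduleTime field.
function parsePayload(raw: {
  instanceId: string;
  exactScheduleTime: string;
  lastScheduleTime?: string; // absent on legacy jobs
}): TriggerScheduledTaskPayload {
  return {
    instanceId: raw.instanceId,
    exactScheduleTime: new Date(raw.exactScheduleTime),
    lastScheduleTime: raw.lastScheduleTime
      ? new Date(raw.lastScheduleTime)
      : undefined,
  };
}
```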
Audit the schedule engine's logger.info calls and demote anything that
fires per-tick or per-instance to logger.debug. The previous mix would
emit ~3 info lines per fire ("Calculated next schedule timestamp",
"Triggering scheduled task", "Successfully triggered scheduled task")
which scales linearly with schedule volume.
Demoted to debug:
- "Calculated next schedule timestamp" — every tick (re-register after
every fire)
- "Triggering scheduled task" — every fire
- "Successfully triggered scheduled task" — every fire
- "Recovering schedule" — per-instance in the recovery loop, fan-out
potential during recovery storms
- "Job already exists for instance" — per-instance recovery
- "No job found for instance, registering next run" — per-instance
recovery
Kept at info (lifecycle / per-event, fires once):
- Worker startup / disabled / shutdown
- "Recovering schedules in environment" (per recovery call, not per
instance)
- "No instances found for environment" (empty recovery summary)
Add an integration test for the path where an in-flight Redis job enqueued by the old engine (no `lastScheduleTime` in its payload) is dequeued by the new engine. The new engine must report the value persisted at `instance.lastScheduledTimestamp` as `payload.lastTimestamp` rather than reporting `undefined`, so customers don't see a one-fire gap in their lastTimestamp during the rollout. The existing integration test exercises the fresh-schedule path and the worker payload flow on subsequent fires. This new test specifically exercises the DB-column fallback in the `params.lastScheduleTime ?? instance.lastScheduledTimestamp ?? undefined` chain — the bridge that handles the legacy queue at deploy time.
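The bridge the test exercises can be written as a pure function: what `payload.lastTimestamp` should report when a job is dequeued. The parameter names below are illustrative; the fallback chain itself is quoted from the commit message.

```typescript
function resolveLastTimestamp(
  payloadLastScheduleTime: Date | undefined, // set by the new engine's jobs
  instanceLastScheduledTimestamp: Date | null // deprecated column, old engine
): Date | undefined {
  // params.lastScheduleTime ?? instance.lastScheduledTimestamp ?? undefined
  return payloadLastScheduleTime ?? instanceLastScheduledTimestamp ?? undefined;
}
```

A legacy job (first argument undefined) falls through to the frozen DB column exactly once; a new-engine job never touches it.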
External callers of registerNextTaskScheduleInstance — the deploy-time
declarative schedule sync, schedule upsert (cron change / activate),
and recovery — all called with just `{ instanceId }`, no
lastScheduleTime. That replaced the in-flight Redis job's payload with
one having lastScheduleTime: undefined, so the next fire fell back to
instance.lastScheduledTimestamp (a column this PR stops writing). On
every subsequent app deploy, customers would have seen one stale fire
per schedule, in perpetuity — the staleness compounding as the
unmaintained DB column drifted further from reality.
Move the cron-prev derivation inside registerNextTaskScheduleInstance
itself: when the caller doesn't pass lastScheduleTime, derive from the
cron expression's previous slot (guarded against the slot predating
the instance's createdAt — preserves first-fire `undefined` semantics
for brand-new schedules). Internal callers (after-fire, after-skip)
keep passing explicit values so their carry-forward semantics are
unchanged.
Also drops the duplicate cron-prev block from `#recoverTaskScheduleInstance`
— recovery now relies on the centralized fallback inside register.
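The centralized fallback can be sketched as follows. `cronPrev` is the cron expression's previous slot (computed with cron-parser in the real engine); `null` here models a parse failure. Names and shapes are illustrative, not the engine's code.

```typescript
function deriveLastScheduleTime(
  explicit: Date | undefined, // internal callers (after-fire/after-skip)
  cronPrev: Date | null, // previous cron slot, or null on a parse failure
  instanceCreatedAt: Date
): Date | undefined {
  if (explicit) return explicit; // carry-forward semantics unchanged
  if (!cronPrev) return undefined; // malformed cron: degrade, don't throw
  // A slot predating the instance means it has never fired under this
  // instance: preserve the first-fire `undefined` sentinel for brand-new
  // schedules.
  return cronPrev > instanceCreatedAt ? cronPrev : undefined;
}
```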
Two new tests:
- "should derive lastScheduleTime from cron when external callers omit
it" — exercises the deploy/upsert pattern on a long-running instance.
- "should leave lastScheduleTime undefined for brand-new schedules" —
preserves the first-run sentinel (`if (!payload.lastTimestamp)`)
customers rely on.
Per Devin review on PR #3476.
Summary

Each scheduled-task tick previously issued 3 Prisma UPDATEs against TaskSchedule.lastRunTriggeredAt, TaskScheduleInstance.lastScheduledTimestamp, and TaskScheduleInstance.nextScheduledTimestamp. All three were pure denormalization — every value can be derived without persisting.

After this PR

TaskSchedule and TaskScheduleInstance become near read-only: writes happen only on schedule create / update / delete (rare admin actions), so the per-tick autovacuum churn on these hot tables disappears.
Design

The previous fire time travels forward through the schedule worker payload, not through the database. Concretely:

- The schedule.triggerScheduledTask worker payload gains an optional lastScheduleTime: z.coerce.date().optional() field.
- After each fire, the next job is registered with lastScheduleTime = scheduleTimestamp (the just-fired time).
- payload.lastTimestamp is sourced from params.lastScheduleTime directly. No DB round-trip, no cron-derivation drift across DST boundaries, no caveats around recently-edited cron expressions.
- payload.lastTimestamp keeps its Date | undefined SDK shape. First-ever fires still report undefined, so customer if (!payload.lastTimestamp) first-run patterns keep working.
- For Redis jobs that were enqueued before this change (which lack lastScheduleTime in their payload), the engine falls back to instance.lastScheduledTimestamp once. Once those drain, the column is never read again. Revert is code-only; the columns stay in place and can be dropped in a follow-up once the rollout is stable.
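The preserved first-run sentinel can be illustrated with a small consumer-side sketch. The payload shape mirrors the Date | undefined SDK semantics described above; the function itself is an invented example of the customer pattern, not SDK code.

```typescript
type SchedulePayload = {
  timestamp: Date; // the slot that just fired
  lastTimestamp?: Date; // undefined on the very first fire
};

function describeFire(payload: SchedulePayload): string {
  if (!payload.lastTimestamp) {
    return "first run"; // sentinel pattern the PR keeps working
  }
  const gapSeconds =
    (payload.timestamp.getTime() - payload.lastTimestamp.getTime()) / 1000;
  return `fired ${gapSeconds}s after previous run`;
}
```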
Files

- internal-packages/schedule-engine/* — engine refactor, workerCatalog schema field, TriggerScheduleParams extension, tests updated to assert on the worker-payload flow rather than DB readbacks.
- internal-packages/database/prisma/schema.prisma — /// @deprecated triple-slash docstrings on the three columns. No migration.
- apps/webapp/app/presenters/v3/ScheduleListPresenter.server.ts — drops the lastRunTriggeredAt Prisma select; the "Last run" cell is approximated from the cron expression's previous slot, gated on schedule.createdAt so brand-new schedules show "–". UI is best-effort; the runs page is the source of truth.
- apps/webapp/app/v3/utils/calculateNextSchedule.server.ts — adds a previousScheduledTimestamp helper for the UI cell above. Public API responses (api.v1.schedules.*) already compute nextRun from cron and don't expose lastTimestamp — no public API change.
- references/scheduled-tasks/ — new reference project with declarative schedules at multiple cadences and three throw-on-fail validators (first-fire-detector, interval-validator, upcoming-validator) for E2E-verifying the worker-payload flow.

Refs TRI-8891
Test plan

- pnpm run typecheck --filter @internal/schedule-engine --filter webapp
- pnpm run build --filter @trigger.dev/core
- pnpm run test --filter @internal/schedule-engine — integration test asserts first-fire lastTimestamp === undefined; the second fire carries the previous fire's timestamp exactly.
- references/scheduled-tasks: the deprecated columns stay NULL after multiple fires, and the Redis job payload carries "lastScheduleTime": "<previous fire timestamp>" on a non-first fire. TaskRun.payload and the every-minute task's returned output both confirm lastTimestamp = null on first fire and lastTimestamp = <prev fire> on second fire, exactly 60s apart.
- Schedules API (POST/GET/PUT/activate/deactivate/DELETE) — nextRun recomputed live from cron + tz on every response, no reads of deprecated columns.