Feat: Add anonymous telemetry, UI enhancements, and remove DLQ#24
Conversation
- Deleted DLQ API routes and associated types from the client and server. - Removed DLQ message handling from the JobsPage component. - Cleaned up server-side job processing logic to eliminate DLQ dependencies. - Introduced a new ReviewWorkflow to manage job phases and execution. - Updated job schema to include workflow instance ID. - Refactored review job handling to support new workflow structure. - Adjusted tests to reflect changes in job processing and workflow management.
…ng and job management
…ror handling, and improve UI components
There was a problem hiding this comment.
Codra Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa0dd10a4d
ℹ️ About Codra in GitHub
Your team has set up Codra to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codra-app review"
If Codra has suggestions, it will comment; otherwise it will react with 👍.
Codra can also answer questions or update the PR. Try commenting "@codra-app address that feedback".
Note
34 comments were omitted from this review to reduce noise and respect the configured max_comments limit (10). Showing the most critical issues.
| installationSynced.push(res); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Potential data loss during stale repository cleanup
The call to deleteStaleRepoConfigs on line 94 is executed regardless of whether results contained any data. If the GitHub API returns an empty list (due to a transient error, API failure, or empty account), installationSynced will be an empty array. This will likely cause deleteStaleRepoConfigs to delete all existing repository configurations for that installation, leading to catastrophic data loss. A guard clause should be implemented to ensure the sync was successful and the empty state is intentional before deleting records.
| if (results && results.length > 0) { | |
| await deleteStaleRepoConfigs(c.env, String(inst.id), installationSynced); | |
| } else { | |
| // Handle the case where no repos were found - either skip deletion or verify account state | |
| } |
| import type { AppBindings } from '@server/env'; | ||
| import { logger } from './logger'; | ||
|
|
||
| const TELEMETRY_SECRET = 'codra-telemetry-v1-secret-8f9a2b5c'; |
There was a problem hiding this comment.
The 'TELEMETRY_SECRET' is hardcoded in the source code. This is a significant security vulnerability as any user with access to the codebase can see the secret and potentially spoof telemetry data or gain unauthorized access to the telemetry endpoint.
| const TELEMETRY_SECRET = 'codra-telemetry-v1-secret-8f9a2b5c'; | |
| const TELEMETRY_SECRET = env.TELEMETRY_SECRET; |
| setActiveId(childKey); | ||
| }, | ||
| }, | ||
| <> |
There was a problem hiding this comment.
Recursive element nesting in cloneElement
In line 110, the original element el is passed as a child to cloneElement(el, ...). The third argument of cloneElement replaces the children of the cloned element. By passing el itself, you are nesting the element inside itself, creating a recursive structure (e.g., if el is a div, the output becomes <div>...<div>...</div>...</div>). This causes incorrect DOM nesting and unexpected rendering behavior.
| <> | |
| cloneElement( | |
| el, | |
| { | |
| key: childKey, | |
| className: cn("relative z-10", el.props.className), | |
| onMouseEnter: (e: React.MouseEvent) => { | |
| el.props.onMouseEnter?.(e); | |
| setActiveId(childKey); | |
| }, | |
| }, | |
| <> | |
| <div className="pointer-events-none absolute inset-0 z-0"> | |
| <AnimatePresence custom={activeId !== null}> | |
| {activeId !== null ? ( | |
| <motion.div | |
| variants={reduce ? reducedVariants : variants} | |
| initial="initial" | |
| animate="animate" | |
| exit="exit" | |
| custom={activeId !== null} | |
| className="absolute inset-0" | |
| style={{ left: -inset, right: -inset, top: 0, bottom: 0 }} | |
| > | |
| {activeId === childKey ? ( | |
| <motion.div | |
| layoutId={`shared-bg-${uid}`} | |
| transition={reduce ? { duration: 0 } : SPRING_LAYOUT} | |
| className={cn( | |
| "pointer-events-none h-full w-full rounded-lg", | |
| pillClassName, | |
| )} | |
| /> | |
| ) : null} | |
| </motion.div> | |
| ) : null} | |
| </AnimatePresence> | |
| </div> | |
| {el.props.children} | |
| </> | |
| ); |
|
|
||
| .dashboard-sidebar-action:hover, | ||
| .dashboard-sidebar-action:focus-visible { | ||
| background-color: color-mix(in oklch, oklch(100% 0 0) 10%, transparent) !important; |
There was a problem hiding this comment.
Excessive use of !important declarations
The patch introduces numerous '!important' flags across almost every new style rule (lines 729, 730, 736, 740, 765, 766, 770, 777, 783). This is a major anti-pattern that breaks the natural CSS cascade, creates specificity wars, and makes the codebase extremely difficult to maintain or override in the future. Specificity should be managed through a consistent selector strategy rather than forcing overrides.
| background-color: color-mix(in oklch, oklch(100% 0 0) 10%, transparent) !important; | |
| Remove !important and increase selector specificity or reorganize the CSS load order to ensure these styles take precedence naturally. |
| <input | ||
| type="text" | ||
| placeholder="Title or #number…" | ||
| id="pr-search" |
There was a problem hiding this comment.
The search input updates the filters state on every keystroke via onChange. Since the load function and the usePolling hook both depend on filters, every single character typed triggers a recreation of the load callback and resets the polling timer. If the load function is called immediately upon filter changes (which is standard for this pattern), this will result in an excessive number of API requests to the server, potentially leading to rate limiting or performance degradation.
| id="pr-search" | |
| // Use a debounced value for the search filter | |
| const [searchTerm, setSearchTerm] = useState(filters.search); | |
| useEffect(() => { | |
| const handler = setTimeout(() => { | |
| setFilters(f => ({ ...f, search: searchTerm, page: 1 })); | |
| }, 300); | |
| return () => clearTimeout(handler); | |
| }, [searchTerm]); | |
| // In JSX: | |
| <input | |
| value={searchTerm} | |
| onChange={(e) => setSearchTerm(e.target.value)} | |
| ... | |
| /> |
| throw retryableError; | ||
| } | ||
|
|
||
| throw rejected.length === 1 |
There was a problem hiding this comment.
Potential crash in reviewAndPersistFile after hard limit
In reviewAndPersistFile, when isHardLimit is true (lines 683-686), the code now logs a warning and falls through instead of throwing. However, it immediately proceeds to call upsertFileReview (line 689). Since the model failed with a hard limit, the review result is likely undefined or invalid. This will likely cause a runtime error or database constraint violation when attempting to persist a non-existent review result.
| throw rejected.length === 1 | |
| if (isHardLimit) { | |
| logger.warn(`File review hit hard limit...`); | |
| await upsertFileReview(env, job.id, { status: 'failed', error: errorMessage }); | |
| return; // Exit early to avoid using undefined review results | |
| } |
| env: AppBindings, | ||
| jobId: string, | ||
| phase: 'prepare' | 'review' | 'finalize', | ||
| delaySeconds = 0, |
There was a problem hiding this comment.
Risky architectural change to enqueueJobPhase
The function enqueueJobPhase has been changed from a side-effecting async function (sending to a queue) to one that throws a NextPhaseError (line 908). This assumes every single caller of this function is wrapped in a try-catch block that specifically handles NextPhaseError and propagates the 'next_phase' action to the orchestrator. If called in any other context, this will result in uncaught exceptions and jobs being marked as 'queued' in the DB without an actual message being sent to the queue, leading to 'stuck' jobs.
| logger.error('Failed to create workflow', error instanceof Error ? error : new Error(String(error))); | ||
| if (message.attempts >= 3) { | ||
| const id = parseResult.data.jobId ?? parseResult.data.deliveryId; | ||
| if (id) { |
There was a problem hiding this comment.
State inconsistency on database failure
In the error handling block (lines 66-72), if failJob fails to update the database, the code catches the error, logs it, and then proceeds to call message.ack(). This results in the message being permanently removed from the queue while the job remains in a 'pending' or 'processing' state in the database, creating a 'ghost job' that will never be completed or retried.
| if (id) { | |
| try { | |
| await failJob(env, id, '...'); | |
| message.ack(); | |
| } catch (failError) { | |
| logger.error('Critical: Failed to mark job as failed in DB', failError); | |
| message.retry(); // Retry the queue message so the DB update can be attempted again | |
| } |
| }, | ||
| ); | ||
|
|
||
| const installationSynced: string[] = []; |
There was a problem hiding this comment.
Potential runtime crash on non-iterable results
The code iterates over results using a for...of loop on line 87. If results is null or undefined (e.g., due to a failed upstream promise or API error), the application will throw a TypeError: results is not iterable, causing the request to fail and potentially crashing the worker. The code should verify that results is an array before attempting to iterate.
| const installationSynced: string[] = []; | |
| if (!Array.isArray(results)) { | |
| // handle error or skip | |
| continue; | |
| } | |
| for (const res of results) { ... } |
| const result = await runReviewJob(env, currentMessage); | ||
| if (result.action === 'next_phase') { | ||
| currentMessage = { ...currentMessage, phase: result.phase }; | ||
| retries = 0; |
There was a problem hiding this comment.
In the runAndDrain function, the MAX_RETRIES logic is logically dead. When result.action === 'retry' is encountered, the code increments the retries counter and immediately executes a break statement. This terminates the while (currentMessage) loop regardless of the retry count. Consequently, the condition ++retries > MAX_RETRIES can never be true within a single call to runAndDrain, and the 'Max retries exceeded' error will never be thrown.
| retries = 0; | |
| if (result.action === 'retry') { | |
| if (++retries > MAX_RETRIES) throw new Error('Max retries exceeded'); | |
| // Remove break to allow the loop to actually retry the current message | |
| } else { |
Description
This PR introduces anonymous telemetry tracking to help us better understand product usage and improve the platform, while strictly respecting user privacy. Alongside this core feature, we have implemented several UI/UX improvements, including fluid motion components for the sidebar navigation, updated typography, and improved skeleton loaders for better perceived performance.
Recent updates to this PR include:
roleproperty tosystemInstructionin the AI review function (reviewWithGoogle).onMouseEntertype check issue, and further improved UI components.Additionally, this PR includes some important refactoring and housekeeping:
Fixes #19 #20
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Checklist: