feat(drivers): support dataplane custom driver management by ets · Pull Request #42 · Qualytics/qualytics-cli

ets · 2026-04-05T13:04:38Z

Summary

This PR introduces a complete custom dataplane driver management workflow to the Qualytics CLI, enabling operators to package and deploy third-party JDBC drivers into the Qualytics platform without manual YAML authoring.

New Commands

| Command | Description |
| --- | --- |
| `qualytics drivers generate` | Introspects a JDBC driver JAR and emits a complete, platform-aligned `DriverDefinition` YAML |
| `qualytics drivers package` | Bundles all generated driver YAMLs into a `custom-drivers.jar` loadable by the Qualytics dataplane |

How It Works

drivers generate

Accepts a JDBC JAR path, a JDBC URL, optional credentials, and extra connection properties
Compiles and runs a Java probe at runtime to interrogate the driver: metadata, supported features, connection capabilities, and SQL dialect hints
Auto-detects the dialectClass via JAR ServiceLoader inspection and a catalog of known Spark built-ins — no manual lookup required
Derives the correct dataSizeLimit for known 32-bit JDBC drivers (Redshift, SQL Server, DB2 → INT_MAX; all others → LONG_MAX)
Emits a fully-populated DriverDefinition YAML to dist/META-INF/jdbc-drivers/<prefix>.yaml and maintains an index file
Optionally invokes an AI assistant to resolve any remaining # TODO fields in the YAML before writing
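
The ServiceLoader inspection step can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the helper name `list_service_providers` is hypothetical, and it relies only on the standard JAR convention that a `META-INF/services/<interface>` entry lists implementation class names one per line, with `#` starting a comment. Which service interface the generator actually inspects is not stated here, so the interface name is left as a parameter.

```python
import zipfile


def list_service_providers(jar_path: str, service: str) -> list[str]:
    """Return the provider class names a JAR registers for `service`.

    Hypothetical helper: reads META-INF/services/<service>, the entry that
    java.util.ServiceLoader consults, and drops comments and blank lines.
    """
    entry = f"META-INF/services/{service}"
    with zipfile.ZipFile(jar_path) as jar:
        if entry not in jar.namelist():
            return []
        text = jar.read(entry).decode("utf-8")

    providers = []
    for line in text.splitlines():
        # Everything after '#' is a comment in a provider-configuration file.
        name = line.split("#", 1)[0].strip()
        if name:
            providers.append(name)
    return providers
```

For example, `list_service_providers("driver.jar", "java.sql.Driver")` would list the JDBC driver classes a JAR advertises; an empty list would mean the generator has to fall back to some other lookup, such as the catalog of known built-ins.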

drivers package

Reads the index of generated driver YAMLs from the dist tree
Zips the entire dist/ structure into a single custom-drivers.jar
The resulting JAR can be deployed alongside the corresponding JDBC JARs for the Qualytics platform to pick up at startup
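
Because a JAR is a ZIP archive, the packaging step needs nothing beyond Python's `zipfile` module. The sketch below illustrates the idea under stated assumptions: the function name `package_dist` is hypothetical, no `MANIFEST.MF` is written, and the real command may differ in details such as how it consults the index.

```python
import zipfile
from pathlib import Path


def package_dist(dist_dir: str, jar_path: str) -> list[str]:
    """Zip every file under dist_dir into jar_path and return the archive names.

    Hypothetical helper: entries are stored relative to dist_dir, so
    dist/META-INF/... becomes META-INF/... inside the JAR.
    """
    root = Path(dist_dir)
    out = Path(jar_path).resolve()
    names = []
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the output JAR itself if it lives under dist_dir.
            if not path.is_file() or path.resolve() == out:
                continue
            arcname = path.relative_to(root).as_posix()
            jar.write(path, arcname)
            names.append(arcname)
    return names
```

Calling `package_dist("dist", "custom-drivers.jar")` would produce an archive whose entries start at `META-INF/`, which is the layout the test plan checks for.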

CLI Integration

A new drivers command group is registered under the top-level qualytics command
The group is surfaced as a single top-line entry in the CLI help — keeping the main help clean while grouping related subcommands logically

Changes

| File | Change |
| --- | --- |
| `qualytics/cli/generate_driver.py` | New module: Java probe, YAML builder, dialect detection, LLM field resolver, `generate` and `package` commands (~1,600 lines) |
| `qualytics/qualytics.py` | Register `drivers` command group |

Test plan

Run qualytics drivers generate --jar <path>.jar --url jdbc:postgresql://... --user ... --password ... — confirm YAML written to dist/META-INF/jdbc-drivers/postgresql.yaml with correct dialectClass and dataSizeLimit
Run qualytics drivers generate against a Redshift JAR — confirm dataSizeLimit: INT_MAX with correct comment
Run qualytics drivers package after generating — confirm custom-drivers.jar created containing META-INF/jdbc-drivers/ tree
Run qualytics drivers --help — confirm both generate and package subcommands are listed
Run qualytics --help — confirm drivers appears as a single top-level entry
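
The JAR check in the test plan can be scripted rather than inspected by hand. The following is a small sketch, assuming only that the archive is a regular ZIP file; the helper name `jar_driver_entries` is hypothetical and not part of the CLI.

```python
import zipfile


def jar_driver_entries(jar_path: str, prefix: str = "META-INF/jdbc-drivers/") -> list[str]:
    """List the file entries of a JAR that live under `prefix` (hypothetical helper)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            # Directory entries end with '/'; only files are of interest here.
            if name.startswith(prefix) and not name.endswith("/")
        )
```

After running the generate and package steps for PostgreSQL, one would expect `"META-INF/jdbc-drivers/postgresql.yaml"` to appear in `jar_driver_entries("custom-drivers.jar")`.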

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…vers command

- Rewrote Java JDBC probe to auto-detect all 14 DriverDefinition fields: NDV(1) added to approxCountDistinctFunction probe; new rowCountQueryStyle probe (INFORMATION_SCHEMA_ROW_COUNT, INFORMATION_SCHEMA_TABLES_WITH_SIZE, ALL_TABLES); all probes emit to JSON for Python consumption.
- Rewrote _build_yaml() with canonical key ordering (Identity → SQL dialect → Performance → Schema/catalog → Style selectors → Date arithmetic templates → Connectivity → Spark JdbcDialect → URL construction → Connection spec). Applies "non-default keys only" rule: fields equal to DriverDefinition defaults are omitted. Only insertBatchSize is excluded (write-only); maxPartitionParallelism restored as a TODO field.
- Added _derive_url_metadata() returning (port, template, url_components). connectionSpec only marks fields required when actually present in the probe URL, so portless drivers (SQLite, MongoDB) no longer get a spurious required port field.
- Default output path changed to dist/META-INF/jdbc-drivers/<prefix>.yaml. Directory is auto-created. Index file created/updated after every write (idempotent: no duplicates on re-run).
- Added package-drivers top-level command: bundles dist/ into custom-drivers.jar using Python zipfile (no jar tool required).
- Removed Source URL from generated YAML header so probe URL does not confuse the LLM when suggesting jdbcUrlTemplate values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
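
The idempotent index update mentioned in this commit message could look something like the sketch below. The index file's name and format are not given here, so this assumes a plain-text file with one entry per line; `update_index` is a hypothetical name.

```python
from pathlib import Path


def update_index(index_path: str, entry: str) -> bool:
    """Add `entry` to a one-name-per-line index; return True if the file changed.

    Hypothetical helper: re-running with the same entry is a no-op, which is
    what keeps the index free of duplicates.
    """
    path = Path(index_path)
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    entries = [line.strip() for line in existing if line.strip()]
    if entry in entries:
        return False
    # Create the parent directory on first write, mirroring the auto-created output tree.
    path.parent.mkdir(parents=True, exist_ok=True)
    entries.append(entry)
    path.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return True
```

Calling it twice with the same entry writes the file once and leaves it untouched the second time.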

…r and Spark built-ins

…alue with comment

Redshift uses a 32-bit JDBC driver and must use INT_MAX, not LONG_MAX. The generator now auto-selects INT_MAX for redshift/sqlserver/db2 prefixes with a matching comment so value and comment are always in sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps · 2026-04-05T13:08:21Z

Greptile Summary

This PR fixes a value/comment mismatch in generate_driver.py where the YAML template was emitting dataSizeLimit: LONG_MAX for Redshift, SQL Server, and DB2 even though those drivers use 32-bit JDBC interfaces that require INT_MAX. The generator now auto-selects the correct constant and matching comment for those driver prefixes at generation time.

The core intent of the fix is correct and well-motivated.
The prefix-detection predicate on line 871 uses substring matching (p in prefix.lower()) rather than exact set-membership (prefix.lower() in _int_max_prefixes), which could produce false positives for custom or future driver prefixes that embed "db2", "redshift", or "sqlserver" as substrings.
dataSizeLimit is appended to todo_fields unconditionally (line 879) even when the value is auto-selected as INT_MAX; all other auto-detected fields use detected_fields, making this inconsistent.
The LICENSE copyright year change (2023→2026) is unrelated to the stated fix.
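
To make the substring concern concrete, here is a minimal sketch contrasting the two predicates. It is not the code at line 871: the function names are hypothetical, the set mirrors the three prefixes named above, and `mydb2proxy` is an invented prefix used only to show the false positive.

```python
# Prefixes of drivers described in this PR as 32-bit (assumed set contents).
INT_MAX_PREFIXES = frozenset({"redshift", "sqlserver", "db2"})


def matches_by_substring(prefix: str) -> bool:
    """Substring test: true if any known prefix occurs anywhere in `prefix`."""
    return any(p in prefix.lower() for p in INT_MAX_PREFIXES)


def matches_exactly(prefix: str) -> bool:
    """Set-membership test: true only if `prefix` is itself a known prefix."""
    return prefix.lower() in INT_MAX_PREFIXES


def data_size_limit(prefix: str) -> str:
    """Pick the limit constant using the stricter exact-match predicate."""
    return "INT_MAX" if matches_exactly(prefix) else "LONG_MAX"
```

With these definitions, `matches_by_substring("mydb2proxy")` is true while `matches_exactly("mydb2proxy")` is false, so only the exact-match version leaves an unrelated custom driver on `LONG_MAX`.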

Confidence Score: 4/5

Safe to merge; the core fix is correct and both issues identified are low-risk

No real-world JDBC prefix would trigger the substring false-positive today, and the todo_fields inconsistency has no user-visible impact since the display layer uses its own always_todo list. The fix correctly resolves the stated mismatch between emitted value and comment for 32-bit drivers.

qualytics/cli/generate_driver.py — specifically the prefix-detection predicate on line 871

Important Files Changed

| Filename | Overview |
| --- | --- |
| qualytics/cli/generate_driver.py | New generate-driver CLI file; INT_MAX/LONG_MAX auto-selection fix is correct but uses substring matching that could produce false positives for non-standard JDBC prefixes |
| LICENSE | Copyright year updated 2023→2026; unrelated to the stated fix but correct |
| qualytics/qualytics.py | Wires drivers_app into the main CLI app; no issues |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[generate-driver invoked] --> B[_extract_prefix\nextract jdbc: scheme token]
    B --> C{prefix in 32-bit set?\nredshift / sqlserver / db2}
    C -- Yes --> D[dataSizeLimit = INT_MAX\ncomment: older 32-bit driver]
    C -- No --> E[dataSizeLimit = LONG_MAX\ncomment: TODO review]
    D --> F[_build_yaml assembles YAML]
    E --> F
    F --> G[Write .yaml file to disk]
    G --> H[_collect_todo_fields\nscan YAML for TODO: markers]
    H --> I{LLM-assist\navailable?}
    I -- Yes --> J[LLM fills remaining TODO fields]
    I -- No --> K[Prompt user to review manually]
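
The TODO-scanning node in the flowchart can be sketched as a line-by-line scan. This is an illustration under assumptions, not the module's `_collect_todo_fields`: it assumes markers appear as a trailing `# TODO` comment on the same line as the key, and the function name below is hypothetical.

```python
import re

# Matches "key: value  # TODO ..." and captures the key (assumed marker format).
_TODO_LINE = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*:.*#\s*TODO\b")


def collect_todo_fields(yaml_text: str) -> list[str]:
    """Return keys whose line carries a trailing TODO comment, in file order."""
    fields = []
    for line in yaml_text.splitlines():
        match = _TODO_LINE.match(line)
        if match and match.group(1) not in fields:
            fields.append(match.group(1))
    return fields
```

A regex scan like this avoids needing a YAML parser that preserves comments, at the cost of missing markers placed on a separate line from their key.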

Reviews (1): Last reviewed commit: "fix(generate-driver): use INT_MAX for Re..."

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ets and others added 5 commits March 31, 2026 18:18

WIP: pluggable drivers work in progress

9e64185

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(generate-driver): auto-detect dialectClass from JAR ServiceLoade…

36546ea

…r and Spark built-ins

Only add one top-line argument for driver management

83a7382

greptile-apps Bot reviewed Apr 5, 2026

Comment thread qualytics/cli/generate_driver.py Outdated

Comment thread qualytics/cli/generate_driver.py

ets changed the title ~~fix(generate-driver): use INT_MAX for Redshift dataSizeLimit, align value with comment~~ feat(drivers): support dataplane custom driver management Apr 5, 2026

ets and others added 5 commits April 5, 2026 09:50

Update qualytics/cli/generate_driver.py

ecc019e

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Update qualytics/cli/generate_driver.py

a807a64

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

linting

6390db0

linting

e10cb36

linting

1a68abc

ets wants to merge 10 commits into main from ets/pluggable-drivers
