feat(drivers): support dataplane custom driver management #42

Open
ets wants to merge 10 commits into main from ets/pluggable-drivers

ets commented Apr 5, 2026

Summary

This PR introduces a complete custom dataplane driver management workflow to the Qualytics CLI, enabling operators to package and deploy third-party JDBC drivers into the Qualytics platform without manual YAML authoring.

New Commands

| Command | Description |
| --- | --- |
| qualytics drivers generate | Introspects a JDBC driver JAR and emits a complete, platform-aligned DriverDefinition YAML |
| qualytics drivers package | Bundles all generated driver YAMLs into a custom-drivers.jar loadable by the Qualytics dataplane |

How It Works

drivers generate

  • Accepts a JDBC JAR path, a JDBC URL, optional credentials, and extra connection properties
  • Compiles and runs a Java probe at runtime to interrogate the driver: metadata, supported features, connection capabilities, and SQL dialect hints
  • Auto-detects the dialectClass via JAR ServiceLoader inspection and a catalog of known Spark built-ins — no manual lookup required
  • Derives the correct dataSizeLimit for known 32-bit JDBC drivers (Redshift, SQL Server, DB2 → INT_MAX; all others → LONG_MAX)
  • Emits a fully-populated DriverDefinition YAML to dist/META-INF/jdbc-drivers/<prefix>.yaml and maintains an index file
  • Optionally invokes an AI assistant to resolve any remaining # TODO fields in the YAML before writing
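
The index maintenance mentioned in the list above can be sketched roughly as follows. The PR does not show the index file's name or format, so the file name index, the one-entry-per-line layout, and the helper name update_index are all assumptions made for illustration. The commit history only states that the update is idempotent.

```python
from pathlib import Path

# Assumption: the index is a plain-text file with one YAML filename per line.
# The real file name and format are not shown in this PR.
INDEX_NAME = "index"


def update_index(drivers_dir: Path, yaml_name: str) -> list[str]:
    """Add yaml_name to the index if it is missing; return the resulting entries."""
    drivers_dir.mkdir(parents=True, exist_ok=True)
    index_path = drivers_dir / INDEX_NAME

    entries: list[str] = []
    if index_path.exists():
        # Ignore blank lines so a trailing newline does not create an empty entry.
        entries = [
            line.strip()
            for line in index_path.read_text().splitlines()
            if line.strip()
        ]

    # Re-running for the same driver must not add a duplicate entry.
    if yaml_name not in entries:
        entries.append(yaml_name)

    index_path.write_text("\n".join(entries) + "\n")
    return entries
```

Calling update_index twice with postgresql.yaml leaves a single entry, which is the behaviour a re-run of drivers generate would need.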

drivers package

  • Reads the index of generated driver YAMLs from the dist tree
  • Zips the entire dist/ structure into a single custom-drivers.jar
  • The resulting JAR can be deployed alongside the corresponding JDBC JARs for the Qualytics platform to pick up at startup
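
Because a JAR is a ZIP archive, the packaging step can be approximated with Python's standard zipfile module, which the commit history says the command uses. The sketch below is not the PR's implementation. The function name package_drivers and its arguments are hypothetical, and it zips every file under the given directory without consulting the index.

```python
import zipfile
from pathlib import Path


def package_drivers(dist_dir: Path, jar_path: Path) -> list[str]:
    """Zip every file under dist_dir into jar_path; return the archived entry names."""
    names: list[str] = []
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for path in sorted(dist_dir.rglob("*")):
            # Skip directories, and skip the JAR itself if it lives inside dist_dir.
            if not path.is_file() or path.resolve() == jar_path.resolve():
                continue
            # Entry names are relative to dist_dir and use forward slashes.
            arcname = path.relative_to(dist_dir).as_posix()
            jar.write(path, arcname)
            names.append(arcname)
    return names
```

Listing the archive afterwards with zipfile.ZipFile(jar_path).namelist() is one way to confirm that the META-INF/jdbc-drivers/ tree is present.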

CLI Integration

  • A new drivers command group is registered under the top-level qualytics command
  • The group appears as a single top-level entry in the CLI help, which keeps the main help concise while grouping the related subcommands

Changes

| File | Change |
| --- | --- |
| qualytics/cli/generate_driver.py | New module: Java probe, YAML builder, dialect detection, LLM field resolver, generate and package commands (~1,600 lines) |
| qualytics/qualytics.py | Register drivers command group |

Test plan

  • Run qualytics drivers generate --jar <path>.jar --url jdbc:postgresql://... --user ... --password ... — confirm YAML written to dist/META-INF/jdbc-drivers/postgresql.yaml with correct dialectClass and dataSizeLimit
  • Run qualytics drivers generate against a Redshift JAR — confirm dataSizeLimit: INT_MAX with correct comment
  • Run qualytics drivers package after generating — confirm custom-drivers.jar created containing META-INF/jdbc-drivers/ tree
  • Run qualytics drivers --help — confirm both generate and package subcommands are listed
  • Run qualytics --help — confirm drivers appears as a single top-level entry

🤖 Generated with Claude Code

ets and others added 5 commits March 31, 2026 18:18
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vers command

- Rewrote Java JDBC probe to auto-detect all 14 DriverDefinition fields:
  NDV(1) added to approxCountDistinctFunction probe; new rowCountQueryStyle
  probe (INFORMATION_SCHEMA_ROW_COUNT, INFORMATION_SCHEMA_TABLES_WITH_SIZE,
  ALL_TABLES); all probes emit to JSON for Python consumption.

- Rewrote _build_yaml() with canonical key ordering (Identity → SQL dialect →
  Performance → Schema/catalog → Style selectors → Date arithmetic templates →
  Connectivity → Spark JdbcDialect → URL construction → Connection spec).
  Applies "non-default keys only" rule: fields equal to DriverDefinition
  defaults are omitted. Only insertBatchSize is excluded (write-only);
  maxPartitionParallelism restored as a TODO field.

- Added _derive_url_metadata() returning (port, template, url_components).
  connectionSpec only marks fields required when actually present in the
  probe URL — portless drivers (SQLite, MongoDB) no longer get a spurious
  required port field.

- Default output path changed to dist/META-INF/jdbc-drivers/<prefix>.yaml.
  Directory is auto-created. Index file created/updated after every write
  (idempotent — no duplicates on re-run).

- Added package-drivers top-level command: bundles dist/ into
  custom-drivers.jar using Python zipfile (no jar tool required).

- Removed Source URL from generated YAML header so probe URL does not
  confuse the LLM when suggesting jdbcUrlTemplate values.
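
The "non-default keys only" rule from the _build_yaml() item above could look roughly like this. The default values shown are placeholders, since the real DriverDefinition defaults are not listed in this PR, and the function name non_default_fields is hypothetical.

```python
# Placeholder defaults for illustration; the real DriverDefinition defaults
# are not listed in this PR.
DEFAULTS = {
    "approxCountDistinctFunction": "APPROX_COUNT_DISTINCT",
    "rowCountQueryStyle": "INFORMATION_SCHEMA_ROW_COUNT",
}

# The commit message says insertBatchSize is excluded from the generated YAML.
EXCLUDED = {"insertBatchSize"}


def non_default_fields(detected: dict, key_order: list[str]) -> dict:
    """Keep detected fields that differ from the defaults, in canonical key order."""
    result = {}
    for key in key_order:
        if key in EXCLUDED or key not in detected:
            continue
        # Omit any field whose detected value equals the default.
        if key in DEFAULTS and detected[key] == DEFAULTS[key]:
            continue
        result[key] = detected[key]
    return result
```

Iterating over key_order rather than over the detected dictionary is what gives the output its canonical ordering regardless of the order in which the probes report.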

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
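
To illustrate the connectionSpec change in the commit above, here is a minimal sketch of marking fields as required only when they appear in the probe URL. It is not _derive_url_metadata() itself. The regular expression handles only URLs shaped like jdbc:<prefix>://<host>[:<port>][/<database>] or jdbc:<prefix>:/<path>, and the function name required_fields is hypothetical. Real JDBC URL grammars vary by vendor.

```python
import re

# Simplified, illustrative pattern; vendor-specific JDBC URL forms are not covered.
_URL_RE = re.compile(
    r"^jdbc:(?P<prefix>[A-Za-z0-9_]+):"
    r"(?://(?P<host>[^/:;?]+)(?::(?P<port>\d+))?)?"
    r"(?:/(?P<database>[^;?]+))?"
)


def required_fields(jdbc_url: str) -> list[str]:
    """Return the connection fields that are actually present in the probe URL."""
    match = _URL_RE.match(jdbc_url)
    if match is None:
        raise ValueError(f"not a recognised JDBC URL: {jdbc_url!r}")
    # A field is required only if the URL supplied a value for it.
    return [name for name in ("host", "port", "database") if match.group(name)]
```

With this rule, a URL without a port, such as a SQLite file URL, yields no port entry at all instead of a required port field with no value.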
…alue with comment

Redshift uses a 32-bit JDBC driver and must use INT_MAX, not LONG_MAX.
The generator now auto-selects INT_MAX for redshift/sqlserver/db2 prefixes
with a matching comment so value and comment are always in sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greptile-apps Bot commented Apr 5, 2026

Greptile Summary

This PR fixes a value/comment mismatch in generate_driver.py where the YAML template was emitting dataSizeLimit: LONG_MAX for Redshift, SQL Server, and DB2 even though those drivers use 32-bit JDBC interfaces that require INT_MAX. The generator now auto-selects the correct constant and matching comment for those driver prefixes at generation time.

  • The core intent of the fix is correct and well-motivated.
  • The prefix-detection predicate on line 871 uses substring matching (p in prefix.lower()) rather than exact set-membership (prefix.lower() in _int_max_prefixes), which could produce false positives for custom or future driver prefixes that embed "db2", "redshift", or "sqlserver" as substrings.
  • dataSizeLimit is appended to todo_fields unconditionally (line 879) even when the value is auto-selected as INT_MAX; all other auto-detected fields use detected_fields, making this inconsistent.
  • The LICENSE copyright year change (2023→2026) is unrelated to the stated fix.
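
The difference between the two predicates discussed in the second item can be seen in a small sketch. The prefix set comes from the PR. The helper names and the example prefix mydb2proxy are hypothetical, and extract_prefix is only a stand-in for the module's _extract_prefix.

```python
# Prefixes the PR treats as 32-bit JDBC drivers.
INT_MAX_PREFIXES = frozenset({"redshift", "sqlserver", "db2"})


def extract_prefix(jdbc_url: str) -> str:
    """Return the token after 'jdbc:' (hypothetical stand-in for _extract_prefix)."""
    scheme, _, rest = jdbc_url.partition(":")
    if scheme.lower() != "jdbc" or not rest:
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    return rest.split(":", 1)[0].lower()


def matches_by_substring(prefix: str) -> bool:
    """Substring form flagged in the review: any known name inside the prefix."""
    return any(p in prefix.lower() for p in INT_MAX_PREFIXES)


def matches_exactly(prefix: str) -> bool:
    """Exact set membership, as the review suggests."""
    return prefix.lower() in INT_MAX_PREFIXES


def data_size_limit(prefix: str) -> str:
    """Pick the constant by exact prefix match."""
    return "INT_MAX" if matches_exactly(prefix) else "LONG_MAX"
```

For the made-up prefix mydb2proxy, matches_by_substring returns True while matches_exactly returns False, which is the false positive the review describes.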

Confidence Score: 4/5

Safe to merge; the core fix is correct and both issues identified are low-risk.

No real-world JDBC prefix would trigger the substring false-positive today, and the todo_fields inconsistency has no user-visible impact since the display layer uses its own always_todo list. The fix correctly resolves the stated mismatch between emitted value and comment for 32-bit drivers.

qualytics/cli/generate_driver.py — specifically the prefix-detection predicate on line 871

Important Files Changed

| Filename | Overview |
| --- | --- |
| qualytics/cli/generate_driver.py | New generate-driver CLI file; INT_MAX/LONG_MAX auto-selection fix is correct but uses substring matching that could produce false positives for non-standard JDBC prefixes |
| LICENSE | Copyright year updated 2023→2026; unrelated to the stated fix but correct |
| qualytics/qualytics.py | Wires drivers_app into the main CLI app; no issues |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[generate-driver invoked] --> B[_extract_prefix\nextract jdbc: scheme token]
    B --> C{prefix in 32-bit set?\nredshift / sqlserver / db2}
    C -- Yes --> D[dataSizeLimit = INT_MAX\ncomment: older 32-bit driver]
    C -- No --> E[dataSizeLimit = LONG_MAX\ncomment: TODO review]
    D --> F[_build_yaml assembles YAML]
    E --> F
    F --> G[Write .yaml file to disk]
    G --> H[_collect_todo_fields\nscan YAML for TODO: markers]
    H --> I{LLM-assist\navailable?}
    I -- Yes --> J[LLM fills remaining TODO fields]
    I -- No --> K[Prompt user to review manually]

Reviews (1): Last reviewed commit: "fix(generate-driver): use INT_MAX for Re..."

Comment thread qualytics/cli/generate_driver.py Outdated
Comment thread qualytics/cli/generate_driver.py
ets changed the title from "fix(generate-driver): use INT_MAX for Redshift dataSizeLimit, align value with comment" to "feat(drivers): support dataplane custom driver management" on Apr 5, 2026
ets and others added 5 commits April 5, 2026 09:50
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>