Add upgrade-drift test: detect schema drift between fresh install and sequential updates

## Overview

This issue tracks the implementation of a Layer 3 upgrade-drift CI test. The goal is to catch cases where `ALTER EXTENSION cat_tools UPDATE` (from 0.2.2 to 0.3.0) produces a different schema than a fresh `CREATE EXTENSION cat_tools`. Any such difference is a bug in the upgrade script.

## Implementation Spec

### Procedure

1. `make install PGUSER=postgres` (installs all SQL files)
2. Create two databases:
   - **fresh**: `CREATE EXTENSION cat_tools` (gets 0.3.0 directly)
   - **upgraded**: `CREATE EXTENSION cat_tools VERSION '0.2.2'` then `ALTER EXTENSION cat_tools UPDATE` (arrives at 0.3.0 via upgrade)
3. For each database: run `unmark-extension.sql` to remove all objects from extension membership (so pg_dump includes them as regular objects)
4. `pg_dump --schema-only --no-owner --no-privileges` on each database
5. Normalize both dumps (strip noise, sort object blocks)
6. Diff — any difference is an upgrade drift bug
7. CI job passes if diff is empty; fails with the diff shown

### Unmarking Extension Objects

```sql
/*
 * Generate ALTER EXTENSION cat_tools DROP ... statements for every
 * object owned by the extension, so pg_dump includes them as regular objects.
 */
SELECT format(
    'ALTER EXTENSION cat_tools DROP %s %s;',
    (pg_identify_object(classid, objid, 0)).type,
    (pg_identify_object(classid, objid, 0)).identity
)
FROM pg_depend
WHERE refobjid = (SELECT oid FROM pg_extension WHERE extname = 'cat_tools')
  AND deptype = 'e'
  AND classid != 'pg_extension'::regclass;
```

`pg_identify_object` returns `(type, schema, name, identity)`; the `identity` field is suitable for use in `ALTER EXTENSION DROP`. Available PG 9.3+. Some object types may need special handling — test on PG 11, PG 12, and PG 18.

### Dump Normalization Script

The normalization script should:

1. Strip pg_dump header boilerplate (lines before first `SET` or `--` section)
2. Strip `SET` statements (search_path etc.)
3. Strip `-- Name: ...; Type: ...; Schema: ...` section comment lines
4. Split remaining content into blocks on blank-line boundaries
5. Within each block, normalize whitespace (collapse runs, trim line ends)
6. Sort blocks lexicographically
7. Rejoin and diff the two outputs

### File Layout

```
test/upgrade-drift/
  PLAN.md                    <- Design doc (write first)
  unmark-extension.sql       <- SQL to generate DROP statements
  run-drift-test.sh          <- Orchestrator: creates DBs, runs full test
  normalize-dump.pl          <- Normalization script (or .sh)
```

### CI Job

```yaml
upgrade-drift-test:
  strategy:
    matrix:
      pg: [11, 12, 18]
  name: Upgrade drift test on PostgreSQL ${{ matrix.pg }}
  runs-on: ubuntu-latest
  container: pgxn/pgxn-tools
  steps:
    - name: Start PostgreSQL ${{ matrix.pg }}
      run: pg-start ${{ matrix.pg }}
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Install rsync
      run: apt-get install -y rsync
    - name: Install cat_tools
      run: make install PGUSER=postgres
    - name: Run upgrade drift test
      run: test/upgrade-drift/run-drift-test.sh
```

### Known Edge Cases to Address

- Objects intentionally different between fresh/upgraded (allowlist approach — start with empty `test/upgrade-drift/expected-diffs.txt`)
- PG version differences in pg_dump output format
- The `_cat_tools` private schema (included or excluded?)
- Table data (schema-only dump ignores it — note this limitation)
- `prosrc` field in pg_dump output (function bodies) will be byte-for-byte identical if the upgrade script copies them correctly — whitespace differences are signal, not noise

### Prior Analysis Findings

1. `pg_identify_object(classid, objid, 0).identity` gives the right string for `ALTER EXTENSION DROP` without needing to handle each object type separately — but test this assumption for edge cases.
2. Paragraph mode (splitting on blank lines) aligns well with pg_dump's output format.
3. An allowlist of known acceptable diffs is the right pattern for intentional differences.

## Language Choice for Normalization Script (Open Question)

The language for `normalize-dump.pl` (or equivalent) is still open. The leading candidate is **Perl**, but the tradeoffs are:

### Perl (leading candidate)
- **Pro**: Ships with every Debian system (it's a dependency of dpkg itself); available in the `pgxn/pgxn-tools` CI image with zero extra install steps; paragraph mode (`local $/ = ""`) is perfect for block-splitting; strong regex with `/xsm` modifiers
- **Con**: Less readable to developers who aren't Perl users; idiomatic Perl can be cryptic

### Python
- **Pro**: More readable to cold readers; `re.DOTALL` + `re.VERBOSE` cover the same ground as Perl's regex modifiers
- **Con**: Not guaranteed in the CI image (needs `apt-get install -y python3`); potential version and venv headaches on dev machines

### Shell/awk
- **Pro**: Already used in the Makefile; zero new dependencies; `awk RS=""` paragraph mode exists
- **Con**: Set arithmetic and multi-file logic are awkward; limited for the full requirements of both scripts

### Go
- **Pro**: Fast and statically typed
- **Con**: Requires an install and build step; RE2 engine (no lookahead/lookbehind); overkill for these scripts

**Recommendation**: Use Perl unless there is a strong team preference otherwise. If Perl is chosen, keep idioms straightforward — avoid write-only constructions, prefer named captures over positional, and include comments explaining any non-obvious regex.

## Success Criteria

- `test/upgrade-drift/run-drift-test.sh` runs locally (with PG available) and PASSes when fresh install and upgrade produce identical schema, FAILs with a readable diff when they differ
- CI job defined in `ci.yml` and syntactically valid
- `PLAN.md` is complete enough that a new developer understands the full design

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add upgrade-drift test: detect schema drift between fresh install and sequential updates #20

Overview

Implementation Spec

Procedure

Unmarking Extension Objects

Dump Normalization Script

File Layout

CI Job

Known Edge Cases to Address

Prior Analysis Findings

Language Choice for Normalization Script (Open Question)

Perl (leading candidate)

Python

Shell/awk

Go

Success Criteria

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add upgrade-drift test: detect schema drift between fresh install and sequential updates #20

Description

Overview

Implementation Spec

Procedure

Unmarking Extension Objects

Dump Normalization Script

File Layout

CI Job

Known Edge Cases to Address

Prior Analysis Findings

Language Choice for Normalization Script (Open Question)

Perl (leading candidate)

Python

Shell/awk

Go

Success Criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions