Release firestore-bigquery-export 0.2.5 #2430

Merged 3 commits on May 22, 2025
8 changes: 7 additions & 1 deletion firestore-bigquery-export/CHANGELOG.md
@@ -1,3 +1,9 @@
## Version 0.2.5

fix: keep partition value on delete using old data

docs: improve "Remove stale data" query in guide

## Version 0.2.4

feat: Add bigquery dataset locations and remove duplicates
@@ -10,7 +16,7 @@ fix: pass full document resource name to bigquery

fix: remove default value on DATABASE_REGION

-## Versions 0.2.1
+## Version 0.2.1

fix: correct database region params and make mutable

2 changes: 1 addition & 1 deletion firestore-bigquery-export/extension.yaml
@@ -13,7 +13,7 @@
# limitations under the License.

name: firestore-bigquery-export
-version: 0.2.4
+version: 0.2.5
specVersion: v1beta

displayName: Stream Firestore to BigQuery
@@ -5,7 +5,7 @@
"url": "github.com/firebase/extensions.git",
"directory": "firestore-bigquery-export/firestore-bigquery-change-tracker"
},
-"version": "1.1.41",
+"version": "1.1.42",
"description": "Core change-tracker library for Cloud Firestore Collection BigQuery Exports",
"main": "./lib/index.js",
"scripts": {
@@ -1,13 +1,12 @@
import { FirestoreBigQueryEventHistoryTrackerConfig } from ".";
-import { FirestoreDocumentChangeEvent } from "..";
+import { ChangeType, FirestoreDocumentChangeEvent } from "..";
import * as firebase from "firebase-admin";

import * as logs from "../logs";
import * as bigquery from "@google-cloud/bigquery";
import * as functions from "firebase-functions";
import { getNewPartitionField } from "./schema";
import { BigQuery, TableMetadata } from "@google-cloud/bigquery";

import { PartitionFieldType } from "../types";

export class Partitioning {
@@ -195,11 +194,16 @@ export class Partitioning {
     Delete changes events have no data, return early as cannot partition on empty data.
   **/
   getPartitionValue(event: FirestoreDocumentChangeEvent) {
-    if (!event.data) return {};
+    // When old data is disabled and the operation is delete
+    // the data and old data will be null
+    if (event.data == null && event.oldData == null) return {};

     const firestoreFieldName = this.config.timePartitioningFirestoreField;
     const fieldName = this.config.timePartitioningField;
-    const fieldValue = event.data[firestoreFieldName];
+    const fieldValue =
+      event.operation === ChangeType.DELETE
+        ? event.oldData[firestoreFieldName]
+        : event.data[firestoreFieldName];

     if (!fieldName || !fieldValue) {
       return {};
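The selection logic in this hunk can be tried in isolation with a minimal sketch. The `Operation` union, the `ChangeEventSketch` interface, and `pickPartitionFieldValue` are hypothetical stand-ins written for illustration, not the change-tracker's real types or API. The sketch also uses optional chaining, which is an assumption on top of the diff: it covers the case where the chosen snapshot is missing while the other one is present.

```typescript
// Hypothetical, simplified stand-ins for the change-tracker types.
type Operation = "CREATE" | "UPDATE" | "DELETE" | "IMPORT";

interface ChangeEventSketch {
  operation: Operation;
  data?: Record<string, unknown> | null;
  oldData?: Record<string, unknown> | null;
}

// Picks the Firestore field value that the partition column is derived from.
function pickPartitionFieldValue(
  event: ChangeEventSketch,
  firestoreFieldName: string
): unknown {
  // Deletes carry no new data, so read from the pre-delete snapshot instead.
  const source = event.operation === "DELETE" ? event.oldData : event.data;
  // Optional chaining yields undefined when the chosen snapshot is missing.
  return source?.[firestoreFieldName];
}

const deleted: ChangeEventSketch = {
  operation: "DELETE",
  data: null,
  oldData: { created_at: "2025-04-01T00:00:00Z" },
};
console.log(pickPartitionFieldValue(deleted, "created_at"));
// prints 2025-04-01T00:00:00Z
```

With old data disabled, a delete event has neither snapshot, so the sketch returns `undefined`, which corresponds to the early `return {}` in the diff.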
8 changes: 4 additions & 4 deletions firestore-bigquery-export/functions/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion firestore-bigquery-export/functions/package.json
@@ -13,7 +13,7 @@
"author": "Jan Wyszynski <wyszynski@google.com>",
"license": "Apache-2.0",
"dependencies": {
-"@firebaseextensions/firestore-bigquery-change-tracker": "^1.1.41",
+"@firebaseextensions/firestore-bigquery-change-tracker": "^1.1.42",
"@google-cloud/bigquery": "^7.6.0",
"@types/chai": "^4.1.6",
"@types/express-serve-static-core": "4.17.30",
42 changes: 42 additions & 0 deletions firestore-bigquery-export/guides/EXAMPLE_QUERIES.md
@@ -115,6 +115,10 @@ If you want to clean up data from your `changelog` table, use the following
`DELETE` query to delete all rows that fall within a certain time period,
e.g. greater than 1 month old.

#### Option 1: Remove stale changelog records but keep latest change per document (default)

This query removes entries that are over one month old but always keeps the most recent change for each document, even when that change (e.g., a DELETE operation) is itself more than one month old:

```sql
/* Deletes rows that are over one month old, except the latest row for each document. */
DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
@@ -132,3 +136,41 @@ WHERE (document_name, timestamp) IN
AND DATETIME(t.timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
)
```

⚠️ Note: This query never removes the most recent record for a document, so documents whose last change (e.g., a DELETE) happened more than a month ago keep that one record. Use Option 2 or Option 3 if you also need to remove those records.

#### Option 2: Remove all changelog records older than one month — including latest DELETE operations

If you want to remove all entries that are over one month old, regardless of whether they are the latest change for a document (e.g., including DELETE operations), use the following query:

```sql
/* Deletes all changelog records older than one month, including latest DELETEs */
DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
WHERE DATETIME(timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
```

#### Option 3: Remove stale changelog records and keep the latest change per document, unless that change is a DELETE

This option removes records older than one month that are not the latest change for their document, and it also removes DELETE operations older than one month even when they are the latest change for a document.

Use this if you want to aggressively clean up deleted documents from your changelog, even if that means latest views will no longer reflect that those documents were deleted.

```sql
/* Deletes any changelog records over one month old,
including DELETEs that are the latest entry for a document */
DELETE FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
WHERE (document_name, timestamp) IN (
WITH latest AS (
SELECT MAX(timestamp) AS timestamp, document_name
FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]`
GROUP BY document_name
)
SELECT (t.document_name, t.timestamp)
FROM `[PROJECT ID].[DATASET ID].[CHANGELOG TABLE ID]` AS t
JOIN latest ON t.document_name = latest.document_name
WHERE (t.timestamp != latest.timestamp OR t.operation = 'DELETE')
AND DATETIME(t.timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)
)
```

⚠️ Note: This will remove DELETE records that are older than one month even if they are the most recent change. As a result, your \_latest view will no longer show that those documents were deleted — they may appear as if they never existed. Use this option only if that behavior is acceptable for your use case.
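
To compare the three options side by side, the sketch below replays their row-selection rules on a small in-memory changelog. It is an illustration only: `Row`, `rowsToDelete`, and the sample data are hypothetical, timestamps are plain numbers, and Option 1 is assumed to be the Option 3 query without the `OR t.operation = 'DELETE'` clause, since part of that query is collapsed in this diff.

```typescript
// Hypothetical in-memory stand-in for a changelog row.
interface Row {
  document_name: string;
  timestamp: number; // simplified: any increasing number
  operation: "CREATE" | "UPDATE" | "DELETE" | "IMPORT";
}

type CleanupOption = 1 | 2 | 3;

// Returns the rows that the given cleanup option would delete.
function rowsToDelete(rows: Row[], cutoff: number, option: CleanupOption): Row[] {
  // Latest timestamp per document, mirroring the `latest` CTE.
  const latest = new Map<string, number>();
  for (const r of rows) {
    const seen = latest.get(r.document_name) ?? -Infinity;
    latest.set(r.document_name, Math.max(seen, r.timestamp));
  }
  return rows.filter((r) => {
    if (r.timestamp >= cutoff) return false; // only rows older than the cutoff
    const isLatest = r.timestamp === latest.get(r.document_name);
    if (option === 1) return !isLatest; // keep the latest row per document
    if (option === 2) return true; // drop every old row
    return !isLatest || r.operation === "DELETE"; // also drop latest DELETEs
  });
}

const sample: Row[] = [
  { document_name: "a", timestamp: 1, operation: "CREATE" },
  { document_name: "a", timestamp: 2, operation: "UPDATE" },
  { document_name: "b", timestamp: 1, operation: "CREATE" },
  { document_name: "b", timestamp: 3, operation: "DELETE" },
  { document_name: "c", timestamp: 1, operation: "CREATE" },
  { document_name: "c", timestamp: 10, operation: "UPDATE" },
];

console.log(
  rowsToDelete(sample, 5, 1).length,
  rowsToDelete(sample, 5, 2).length,
  rowsToDelete(sample, 5, 3).length
);
// prints 3 5 4
```

With a cutoff of 5, Option 1 keeps the old latest rows for `a` and `b`, Option 2 removes both, and Option 3 removes only the latest row for `b` because it is a DELETE.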