data-sanitization: protect credentials and personal data from accidental exposure

Sensitive data (credentials, PII, PHI, and other private information) ends up in logs more often than it should.

data-sanitization masks or removes sensitive field values before they leave your application.

Use it in log pipelines, request handlers, and error reporters to catch what might otherwise slip through.

It matches field names across objects, arrays, and strings, and lets you extend the built-in defaults with your own patterns for PII, PHI, or any domain-specific fields.

Before / After

const input = {
  username: 'john',
  password: 'super-secret',
  api_key: 'sk_live_abc123',
};

sanitizeData(input);
// => { username: 'john', password: '**********', api_key: '**********' }

Highlights

Zero runtime dependencies, with compiled JS and full TypeScript declarations
Sanitizes nested structures at any depth, preserving types and class instances
Matches sensitive field names across any data shape without requiring exact path declarations
Detects circular references and throws without leaking input; never silently returns partial data
Sanitization errors never expose the original input payload
Drop-in adapters for pino and winston via data-sanitization-log-providers
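
The circular-reference guard listed above can be pictured with a small sketch. This is not the library's code: `assertNoCycles` is a hypothetical helper, and it assumes a cycle is detected by tracking the objects on the current traversal path, so that an object shared by two siblings is not mistaken for a cycle.

```typescript
// Illustrative sketch only; assertNoCycles is a hypothetical name.
// Tracks the ancestors on the current path so shared (non-circular) references are allowed.
function assertNoCycles(value: unknown, ancestors: WeakSet<object> = new WeakSet()): void {
  if (value === null || typeof value !== 'object') return;
  if (ancestors.has(value)) {
    // The message is a fixed string and carries no part of the input.
    throw new Error('Circular reference detected');
  }
  ancestors.add(value);
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    assertNoCycles(record[key], ancestors);
  }
  // Leaving this branch: siblings may legitimately point at the same object.
  ancestors.delete(value);
}
```

Throwing a fixed message, rather than embedding the offending value, is what keeps the error itself from becoming a leak.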

Why not fast-redact or pino-redact?

Tools like fast-redact and pino's built-in redaction are excellent choices when you control your data shape. They require you to declare the exact paths to redact upfront (user.password, req.headers.authorization) and compile those paths into a specialized function at initialization, achieving near-zero overhead.

The tradeoff is that you must know the shape of your data ahead of time. That works well for application-level logging where you own the data models, but falls short when sanitizing third-party library payloads, error objects with arbitrary attached metadata, or log entries assembled from sources you don't control.

data-sanitization takes a pattern-based approach instead. A single 'password' entry matches password, db_password, resetPasswordToken, and any other key containing that substring, at any depth, in any structure, without path declarations. The cost is a small per-call overhead versus path-based tools; the benefit is that it works on data whose shape you don't fully know.
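
To make the contrast with path-based redaction concrete, here is a minimal sketch of substring key matching over data of unknown shape. It is not how data-sanitization is implemented; `maskBySubstring`, `Json`, and `MASK` are hypothetical names, and the sketch handles only plain JSON-like values.

```typescript
// Illustrative only: a tiny substring-based key matcher, not data-sanitization's code.
const MASK = '**********';

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function maskBySubstring(value: Json, patterns: string[]): Json {
  if (Array.isArray(value)) {
    return value.map((item) => maskBySubstring(item, patterns));
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const lower = key.toLowerCase();
      // A key is sensitive if any pattern occurs anywhere inside it.
      const sensitive = patterns.some((p) => lower.indexOf(p.toLowerCase()) !== -1);
      out[key] = sensitive ? MASK : maskBySubstring(value[key], patterns);
    }
    return out;
  }
  return value;
}

const masked = maskBySubstring(
  { user: { db_password: 'a', resetPasswordToken: 'b', name: 'john' } },
  ['password'],
);
console.log(JSON.stringify(masked));
// => {"user":{"db_password":"**********","resetPasswordToken":"**********","name":"john"}}
```

No path such as `user.db_password` is declared anywhere; the walk visits every key, which is where the per-call overhead comes from.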

If you control your data shape exactly and need maximum throughput, reach for fast-redact. If you need to sanitize data you don't fully control, data-sanitization is the right tool.

Scope and limitations

data-sanitization is a best-effort defensive layer, not a security boundary or compliance proof.

It will miss sensitive data when:

A field name is not covered by the configured patterns
Values appear in unsupported serialization formats (binary blobs, protocol buffers, custom encoding)
Sensitive content is embedded inside values in ways the configured matchers cannot recognize
The input arrives in a form sanitizeData cannot introspect (encrypted payloads, opaque strings)

Masking also leaks that a field is present and sensitive. If minimizing that signal matters, use removeMatches: true rather than the default mask.

Use data-sanitization to catch accidental leakage in logs and request payloads, not as a substitute for access controls, network security, or data-handling policies.

Log provider integrations

data-sanitization-log-providers is a companion package with pre-built adapters that wire data-sanitization directly into your logging pipeline:

Adapter	Import path	How it works
Pino hook	`data-sanitization-log-providers/pino-hook`	Registers a `pino.hooks.logMethod` hook that sanitizes arguments before they reach pino
Pino transport	`data-sanitization-log-providers/pino-transport`	A `pino-abstract-transport` stream you can pass to `pino({ transport: ... })`
Winston transport	`data-sanitization-log-providers/winston`	A `winston-transport` subclass that sanitizes each log entry before forwarding it

Install the companion package alongside your logger:

npm install data-sanitization-log-providers

See the data-sanitization-log-providers README for usage examples and configuration options.

Installation

npm install data-sanitization

yarn add data-sanitization

pnpm add data-sanitization

bun add data-sanitization

Importing

import { sanitizeData, DataSanitizationError } from 'data-sanitization';

import sanitizeData from 'data-sanitization';

const { sanitizeData } = require('data-sanitization');

Utility helpers for log middleware are available on a separate subpath; see docs/utils.md.

import {
  diffSanitizedFields,
  buildSanitizedWarning,
} from 'data-sanitization/utils';

Usage

Quick start

import { sanitizeData } from 'data-sanitization';

const input = {
  username: 'john',
  password: 'super-secret',
  api_key: 'sk_live_abc123',
};

const result = sanitizeData(input);
// => { username: 'john', password: '**********', api_key: '**********' }

Sanitize a string

Pass a string directly and sanitizeData returns a sanitized copy. This is useful for sanitizing serialized data before logging, for example a raw request body, a form-encoded payload, or a JSON string you have not yet parsed:

sanitizeData('{"password":"secret","username":"john"}');
// => '{"password":"**********","username":"john"}'

sanitizeData('password=secret&username=john');
// => 'password=**********&username=john'

Parse JSON strings

By default, valid JSON object and array strings are parsed first and sanitized the same way an object would be. This correctly handles all value types, including numeric-valued sensitive fields:

sanitizeData('{"password":12345,"username":"john"}');
// => '{"password":9999999999,"username":"john"}'

Non-JSON strings fall back to text-based pattern matching automatically.
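
The parse-first, fall-back-to-text decision described above can be sketched as follows. This is an assumption about the general shape of the logic, not the library's source; `classifyInput` and the `Parsed` type are hypothetical.

```typescript
// Illustrative sketch of "try JSON first, otherwise fall back to text matching".
type Parsed = { kind: 'json'; value: object } | { kind: 'text'; value: string };

function classifyInput(input: string): Parsed {
  try {
    const value: unknown = JSON.parse(input);
    // Only objects and arrays take the structured path; '"abc"' or '42' stay on the text path.
    if (value !== null && typeof value === 'object') {
      return { kind: 'json', value };
    }
  } catch {
    // Not valid JSON: fall through to the text path.
  }
  return { kind: 'text', value: input };
}

console.log(classifyInput('{"password":12345}').kind); // => json
console.log(classifyInput('password=secret&username=john').kind); // => text
```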

Note

Output is re-serialized with JSON.stringify, which does not preserve original whitespace or formatting. Set parseJsonStrings: false to use text-based matching instead when formatting fidelity is required or when the input is never JSON:

sanitizeData('{"password":12345,"username":"john"}', {
  parseJsonStrings: false,
});
// => '{"password":12345,"username":"john"}' (numeric value not masked on regex path)

If the string cannot be parsed as JSON, sanitizeData silently falls back to text-based pattern matching. If you need strict behavior (fail or redact on parse failure), open an issue. This is tracked for a future release.

Remove fields instead of masking

sanitizeData(
  { password: 'secret', token: 'abc', username: 'john' },
  { removeMatches: true },
);
// => { username: 'john' }

Sanitize PII and PHI with custom patterns

Use the exported piiPatterns and phiPatterns constants, or build your own list, and pass them via customPatterns.

import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';

const patient = {
  accountId: 'acct_123',
  full_name: 'Avery Example',
  email: 'avery@example.com',
  phone: '+1-555-0100',
  date_of_birth: '1989-04-12',
  health_card: 'HC-1234-5678',
  medications: ['example-medication'],
};

sanitizeData(patient, {
  customPatterns: [...piiPatterns, ...phiPatterns],
  useDefaultPatterns: false,
});
// => {
//   accountId: 'acct_123',
//   full_name: '**********',
//   email: '**********',
//   phone: '**********',
//   date_of_birth: '**********',
//   health_card: '**********',
//   medications: '**********',
// }

Use removeMatches with the same patterns to remove those fields instead of masking them.

sanitizeData(patient, {
  customPatterns: [...piiPatterns, ...phiPatterns],
  removeMatches: true,
  useDefaultPatterns: false,
});
// => { accountId: 'acct_123' }

Common configurations

Credentials and auth headers only (the default)

No configuration needed. Out of the box, sanitizeData covers common credential fields and HTTP authentication headers:

sanitizeData({ password: 'secret', token: 'abc', username: 'john' });
// => { password: '**********', token: '**********', username: 'john' }

Add PII patterns

import { sanitizeData, piiPatterns } from 'data-sanitization';

sanitizeData(data, { customPatterns: piiPatterns });

Add PHI patterns

import { sanitizeData, phiPatterns } from 'data-sanitization';

sanitizeData(data, { customPatterns: phiPatterns });

Strict privacy removal

Masking reveals that a field is sensitive. Removal is more appropriate when field presence itself must not appear in logs:

import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';

sanitizeData(data, {
  customPatterns: [...piiPatterns, ...phiPatterns],
  removeMatches: true,
});

Sanitize Maps and Sets

Enable sanitizeCollections: true to traverse Map and Set instances. Each collection is sanitized and returned as a new instance; the original is never mutated.

const session = new Map([
  ['token', 'abc123'],
  ['username', 'john'],
]);

sanitizeData({ session }, { sanitizeCollections: true });
// => { session: Map { 'token' => '**********', 'username' => 'john' } }

const tags = new Set(['api_key=hunter2', 'env=production']);

sanitizeData({ tags }, { sanitizeCollections: true });
// => { tags: Set { 'api_key=**********', 'env=production' } }

Tip

Map and Set are not JSON-serializable by default; JSON.stringify serializes both as {}. To include them in structured logs, convert them first:

// Map with string keys → plain object
JSON.stringify(Object.fromEntries(sanitizedMap));

// Map with mixed or object keys → entries array
JSON.stringify([...sanitizedMap.entries()]);

// Set → array
JSON.stringify([...sanitizedSet]);

Options

Option	Type	Default	Description
`patternMask`	`string`	`**********`	String used to replace matched string field values
`numericMask`	`number`	`9999999999`	Number used to replace matched number field values
`removeMatches`	`boolean`	`false`	Remove matched fields entirely instead of masking
`sanitizeCollections`	`boolean`	`false`	Sanitize `Map` and `Set` instances by traversing their entries and returning a new sanitized copy. When false, these pass through unchanged like other non-plain object instances.
`scanStringValues`	`boolean`	`true`	Scan string values on non-sensitive keys for embedded patterns. Applies to object input and to string input parsed via `parseJsonStrings`; has no effect on raw string input.
`parseJsonStrings`	`boolean`	`true`	Parse valid JSON string inputs as structured data and sanitize by field name. Re-serializes with `JSON.stringify`, discarding whitespace. Set to `false` to use the regex path.
`customPatterns`	`PatternEntry[]`	`[]`	Additional field name patterns to match. Each entry is a pattern string (substring match) or `{ match: string; strict?: boolean }` for an exact match.
`customMatchers`	`DataSanitizationMatcher[]`	`[]`	Additional regex matchers for custom string formats
`useDefaultPatterns`	`boolean`	`true`	Set to `false` to use only your custom patterns, ignoring the built-in defaults.
`useDefaultMatchers`	`boolean`	`true`	Set to `false` to use only your custom matchers, ignoring the built-in defaults.
`ignorePatterns`	`string[]`	`[]`	Patterns to exclude from the active set. Applied after defaults and `customPatterns` are merged. Use to prevent false positives from built-in substring matching.

Default patterns

The following field name patterns are matched by default. All use case-insensitive substring matching unless noted as exact.

Credentials (credentialPatterns):

apikey
api_key
password
secret
token

HTTP authentication headers (headerPatterns):

authorization
api-key

A field named db_password or x-authorization would also match because these patterns match as substrings.

Two additional pattern groups are exported but not included by default:

piiPatterns: Personally Identifiable Information: names, contact details, government IDs, and digital identifiers. Ambiguous single-word terms such as address, city, state, and zip use exact matching to avoid false positives (e.g. email_address is not masked when only address is in piiPatterns).
phiPatterns: Protected Health Information under HIPAA: medical record identifiers, healthcare dates, clinical data, and biometrics.

Use them via customPatterns:

import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';

sanitizeData(patient, {
  customPatterns: [...piiPatterns, ...phiPatterns],
});

Exact vs. substring matching

Each pattern in customPatterns is a PatternEntry: either a plain string (substring match) or an object with strict: true for an exact field-name match.

// Substring: matches 'token', 'access_token', 'session_token', ...
sanitizeData(data, { customPatterns: ['token'] });

// Exact: matches only 'token', not 'access_token'
sanitizeData(data, { customPatterns: [{ match: 'token', strict: true }] });

Use exact matching when a pattern is a common English word that would produce false positives as a substring; for example, state would otherwise mask statement or stateCode.

ignorePatterns and exact matching: ignorePatterns is a string[] matched against the match string of each active pattern. To suppress an exact-match entry such as { match: 'state', strict: true }, pass ignorePatterns: ['state'].
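
The matching rules in this section can be summarized in a short sketch. The `PatternEntry` shape is taken from the Options table; everything else (`entryText`, `isSensitiveKey`, and the choice to compare `ignorePatterns` case-insensitively) is an assumption made for illustration, not the library's implementation.

```typescript
// PatternEntry mirrors the shape documented in the Options table.
type PatternEntry = string | { match: string; strict?: boolean };

// Hypothetical helper: the text a pattern entry matches on.
function entryText(entry: PatternEntry): string {
  return typeof entry === 'string' ? entry : entry.match;
}

// Hypothetical helper: decide whether a key is sensitive under the active patterns.
function isSensitiveKey(key: string, patterns: PatternEntry[], ignore: string[] = []): boolean {
  const lowerKey = key.toLowerCase();
  const ignored = ignore.map((text) => text.toLowerCase());
  return patterns
    // ignorePatterns removes entries by their match string, strict or not.
    .filter((entry) => ignored.indexOf(entryText(entry).toLowerCase()) === -1)
    .some((entry) => {
      const text = entryText(entry).toLowerCase();
      const strict = typeof entry !== 'string' && entry.strict === true;
      return strict ? lowerKey === text : lowerKey.indexOf(text) !== -1;
    });
}

console.log(isSensitiveKey('access_token', ['token'])); // => true
console.log(isSensitiveKey('access_token', [{ match: 'token', strict: true }])); // => false
console.log(isSensitiveKey('state', [{ match: 'state', strict: true }], ['state'])); // => false
```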

Default matchers

Three matchers are included by default:

JSON matcher: matches "fieldName":"value" patterns in JSON and JSON-like strings
Escaped JSON matcher: matches \"fieldName\":\"value\" patterns in JSON embedded inside JSON string values
Cookie and form-encoded matcher: matches fieldName=value and fieldName:value patterns in URL form-encoded strings and HTTP Cookie headers. Values stop at &, ;, \r, or \n so neither format's separator is consumed as part of a value.
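
The built-in regexes are not reproduced in this README, so the following is only a sketch of what a cookie and form-encoded matcher could look like under the matcher contract described later (capture groups $1 and $2 around the value). `formMatcher` and `applyMatcher` are hypothetical names, and the pattern is inserted without regex escaping, so it assumes plain field names.

```typescript
// A matcher takes a pattern string and returns a global, case-insensitive RegExp.
type Matcher = (pattern: string) => RegExp;

// Assumed shape: $1 = field name plus '=' or ':', $2 = the separator that ends the value.
const formMatcher: Matcher = (pattern) =>
  new RegExp(`(${pattern}[=:])[^&;\\r\\n]*(&|;|\\r|\\n|$)`, 'gi');

// Replace only the value, keeping the field name and the trailing separator.
function applyMatcher(input: string, pattern: string, matcher: Matcher, mask = '**********'): string {
  return input.replace(matcher(pattern), `$1${mask}$2`);
}

console.log(applyMatcher('password=secret&username=john', 'password', formMatcher));
// => password=**********&username=john
console.log(applyMatcher('session=abc; token=xyz; theme=dark', 'token', formMatcher));
// => session=abc; token=**********; theme=dark
```

Stopping the value at `&`, `;`, `\r`, or `\n` is what lets one regex serve both form bodies and Cookie headers without swallowing the next field.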

Custom patterns and matchers

Use customPatterns to add field names on top of the defaults, or use useDefaultPatterns: false to replace the defaults entirely:

import { sanitizeData } from 'data-sanitization';

const data = {
  username: 'john',
  ssn: '123-45-6789',
  credit_card: '4111111111111111',
};

// Add to the built-in defaults
sanitizeData(data, {
  customPatterns: ['ssn', 'credit_card'],
});
// => { username: 'john', ssn: '**********', credit_card: '**********' }

// Use only specific patterns, ignoring the defaults
sanitizeData(data, {
  customPatterns: ['ssn'],
  useDefaultPatterns: false,
});
// => { username: 'john', ssn: '**********', credit_card: '4111111111111111' }

// Use a different mask string
sanitizeData(data, {
  customPatterns: ['ssn', 'credit_card'],
  patternMask: '[REDACTED]',
});
// => { username: 'john', ssn: '[REDACTED]', credit_card: '[REDACTED]' }

Use ignorePatterns to prevent a built-in pattern from matching field names that are not sensitive in your application. The default token pattern, for example, would also match tokenizer_config:

const data = {
  tokenizer_config: 'bert-base-uncased',
  api_key: 'sk-abc123',
  username: 'john',
};

// Without ignorePatterns: tokenizer_config is incorrectly masked
sanitizeData(data);
// => { tokenizer_config: '**********', api_key: '**********', username: 'john' }

// With ignorePatterns: token pattern suppressed, other patterns still active
sanitizeData(data, { ignorePatterns: ['token'] });
// => { tokenizer_config: 'bert-base-uncased', api_key: '**********', username: 'john' }

Note that ignorePatterns suppresses the entire substring pattern; any field whose name matches the pattern will pass through unmasked. If you have a field named token alongside tokenizer_config, both will be unmasked when token is ignored. Use useDefaultPatterns: false with explicit customPatterns for fine-grained per-field control.
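
As a sketch of that fine-grained alternative, the options below replace the defaults with an explicit list in which token is an exact match. The pattern names are copied from the Default patterns section; the `fineGrainedOptions` name and the local `PatternEntry` alias are only for illustration. Based on the matching rules documented above, tokenizer_config should pass through while a field named exactly token is still masked; the tradeoff is that access_token would no longer match either.

```typescript
// Local alias matching the PatternEntry shape documented in the Options table.
type PatternEntry = string | { match: string; strict?: boolean };

// Options intended to be passed as the second argument to sanitizeData.
const fineGrainedOptions: { useDefaultPatterns: boolean; customPatterns: PatternEntry[] } = {
  useDefaultPatterns: false,
  customPatterns: [
    // Exact match: 'token' only, so 'tokenizer_config' is not matched.
    { match: 'token', strict: true },
    // The remaining built-in defaults, re-listed as substring patterns.
    'apikey',
    'api_key',
    'password',
    'secret',
    'authorization',
    'api-key',
  ],
};
```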

Number-typed sensitive values are masked with numericMask to preserve the field's type:

sanitizeData({ password: 12345, username: 'john' });
// => { password: 9999999999, username: 'john' }

sanitizeData({ password: 12345, username: 'john' }, { numericMask: 0 });
// => { password: 0, username: 'john' }

For custom data formats, provide a DataSanitizationMatcher, a function that takes a pattern string and returns a global, case-insensitive RegExp. The regex must use capture groups $1 and $2 to preserve the field name and trailing delimiter while replacing the value.

import type { DataSanitizationMatcher } from 'data-sanitization';

const headerMatcher: DataSanitizationMatcher = (pattern) =>
  new RegExp(`(${pattern}:\\s*).+?(\\n|$)`, 'gi');

sanitizeData('authorization: Bearer abc123\nuser: john', {
  customMatchers: [headerMatcher],
  customPatterns: ['authorization'],
  useDefaultMatchers: false,
});
// => 'authorization: **********\nuser: john'

Error handling

sanitizeData throws a DataSanitizationError when:

The input is not a string, object, or null.
An unexpected error occurs during sanitization.

import { sanitizeData, DataSanitizationError } from 'data-sanitization';

try {
  sanitizeData(123 as any);
} catch (error) {
  if (error instanceof DataSanitizationError) {
    console.error(error.message); // 'Invalid data type'
    console.error(error.details); // { inputType: 'number' }
  }
}

Error details are limited to safe diagnostic metadata and do not include the original input payload.
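
In a logging path it is usually safer to fail closed when sanitization throws. The wrapper below is one possible pattern, not part of the library: `sanitizeOrPlaceholder` and the `Sanitizer` type are hypothetical, and in real use you would pass `sanitizeData` (optionally bound to your options) as the first argument.

```typescript
// Any function that sanitizes a payload, e.g. (input) => sanitizeData(input, options).
type Sanitizer = (input: unknown) => unknown;

// Hypothetical helper: return the sanitized payload, or a placeholder if sanitization throws.
function sanitizeOrPlaceholder(sanitize: Sanitizer, payload: unknown): unknown {
  try {
    return sanitize(payload);
  } catch (error) {
    // Report only the error's name; never fall back to logging the raw payload.
    const reason = error instanceof Error ? error.name : 'UnknownError';
    return { sanitizationFailed: true, reason };
  }
}
```

The placeholder keeps the log line present, so the failure is visible, without ever emitting unsanitized input.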

How it works

sanitizeData dispatches on the input type and applies the configured patterns and matchers accordingly:

String input: by default, valid JSON object and array strings are parsed and sanitized the same way as object input (described in the next item), then re-serialized with JSON.stringify. Non-JSON strings, and strings when parseJsonStrings: false is set, are sanitized directly via regex replacement with the configured matchers.
Object input is sanitized recursively by key name without JSON serialization. Sensitive keys are masked or removed regardless of whether their values are strings, numbers, arrays, objects, or other primitives.
Plain nested objects and arrays are cloned as they are sanitized. Non-plain object instances are preserved without modification to avoid corrupting their prototypes. Enable sanitizeCollections: true to instead traverse Map and Set instances, producing a new sanitized copy.
Object property names and Map string keys are used for pattern matching but are not themselves sanitized. If a property name or string Map key happens to contain sensitive data it will appear unsanitized in the output. Map keys that are objects are recursed into and sanitized like any other nested object.
Null input is accepted and returns null.
For object input, each pattern is matched case-insensitively against key names. By default (scanStringValues: true), string values on non-sensitive keys are also scanned, which catches credentials embedded in log messages or other free-text fields.
For string input, each pattern is tested against each matcher to find and replace sensitive values in the raw string directly.
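
The object path described above can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the library's implementation: `sketchSanitize`, `isPlainObject`, and `SketchOptions` are hypothetical, only substring patterns are supported, and string-value scanning, collections, and circular-reference handling are left out.

```typescript
interface SketchOptions {
  patterns: string[];
  patternMask: string;
  numericMask: number;
  removeMatches: boolean;
}

// Plain objects have Object.prototype (or null) as their prototype; class instances do not.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== 'object') return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function sketchSanitize(value: unknown, options: SketchOptions): unknown {
  if (Array.isArray(value)) return value.map((item) => sketchSanitize(item, options));
  // Primitives and non-plain instances (Date, class instances, ...) pass through untouched.
  if (!isPlainObject(value)) return value;
  const out: Record<string, unknown> = {}; // clone; the input is never mutated
  for (const key of Object.keys(value)) {
    const lower = key.toLowerCase();
    const sensitive = options.patterns.some((p) => lower.indexOf(p.toLowerCase()) !== -1);
    if (!sensitive) {
      out[key] = sketchSanitize(value[key], options);
    } else if (!options.removeMatches) {
      // Numbers keep their type; every other value gets the string mask.
      out[key] = typeof value[key] === 'number' ? options.numericMask : options.patternMask;
    }
  }
  return out;
}
```

Matching on the key before looking at the value is what lets a sensitive key be masked regardless of whether its value is a string, number, array, or object.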

Performance

sanitizeData is designed for in-process sanitization of log payloads, request/response objects, and similar data before they leave your application. It is not designed for streaming pipelines or bulk batch processing of large files.

String-value scanning (scanStringValues: true, the default) adds overhead on object workloads. The cost depends on how many non-sensitive string fields the input has and how long they are. Rough throughput on a modern laptop (Apple M-series, Node.js 22):

Workload	ops/s	ms/call	scan overhead
Shallow object (1 sensitive key)	~464,000	~0.002	~18%
Log object, stack trace with credentials	~46,000	~0.022	~88%
Log object, clean stack trace	~318,000	~0.003	~18%
Object with 10KB non-sensitive string	~200,000	~0.005	~68%
Large flat object (50 fields, 1 sensitive key)	~82,000	~0.012	~10%
Array (1,000 items, 1 sensitive key each)	~2,161	~0.46	~5%
Array (1,000,000 items, 1 sensitive key each)	~1.7	~574	~4%

Array workloads pay ~3–5% overhead regardless of size. The per-item pre-filter cost is negligible. The cost is most visible on individual objects with long non-sensitive string values such as stack traces or large text fields; a single 10KB non-sensitive string value incurs ~68% overhead.

Tip

Set scanStringValues: false when you control your data structure and know sensitive values only appear on sensitive-named keys. This recovers full pre-scanning throughput.

JSON string inputs are parsed and sanitized via objectReplacer by default, which is 3–4× faster than the regex path and correctly masks numeric-valued sensitive fields. Set parseJsonStrings: false to use the regex path instead.

On first call with a given set of options, sanitizeData compiles its regex set and caches the result by option fingerprint. Subsequent calls with the same options reuse the cache at no extra cost. This applies whether options are passed inline or as a variable, as long as the content is the same.
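
A content-based cache of this kind can be sketched as below. The fingerprinting scheme shown (JSON.stringify over normalized options) is an assumption, as are the names `getKeyRegex`, `fingerprint`, and `compileCount`; the `DEFAULTS` list is copied from the credential patterns in this README.

```typescript
interface CacheableOptions {
  customPatterns?: string[];
  useDefaultPatterns?: boolean;
}

const DEFAULTS = ['apikey', 'api_key', 'password', 'secret', 'token'];
const compiledCache = new Map<string, RegExp>();
let compileCount = 0; // demonstration counter: how many times compilation actually ran

// Two option objects with the same content produce the same fingerprint.
function fingerprint(options: CacheableOptions): string {
  return JSON.stringify({
    customPatterns: options.customPatterns ?? [],
    useDefaultPatterns: options.useDefaultPatterns ?? true,
  });
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function getKeyRegex(options: CacheableOptions = {}): RegExp {
  const id = fingerprint(options);
  const cached = compiledCache.get(id);
  if (cached) return cached;
  compileCount += 1;
  const names = [
    ...(options.useDefaultPatterns === false ? [] : DEFAULTS),
    ...(options.customPatterns ?? []),
  ];
  // With no patterns at all, use a regex that never matches.
  const compiled = names.length > 0 ? new RegExp(names.map(escapeRegExp).join('|'), 'i') : /(?!)/;
  compiledCache.set(id, compiled);
  return compiled;
}

getKeyRegex({ customPatterns: ['ssn'] });
getKeyRegex({ customPatterns: ['ssn'] }); // new object, same content: cache hit
console.log(compileCount); // => 1
```

Because the key is derived from content rather than object identity, inline option literals hit the cache just as a shared constant would; only a change in the pattern list forces recompilation.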

Warning

Building customPatterns dynamically per call from variable data causes a cache miss on every call, so compilation runs on each request instead of being reused.

// Anti-pattern: patterns differ on every call, cache never hits
app.post('/log', (req) => {
  sanitizeData(req.body, {
    customPatterns: [...basePatterns, ...req.user.sensitiveFields],
  });
});

// Correct: build options once at startup (or per stable configuration)
const sanitizerOptions = {
  customPatterns: [...basePatterns, ...knownSensitiveFields],
};

app.post('/log', (req) => {
  sanitizeData(req.body, sanitizerOptions);
});

If dynamic options are unavoidable, set scanStringValues: false. This skips the string-scanning cache and avoids the fingerprinting overhead on every call.

When options must genuinely vary per call, each call pays the first-call compilation cost (~32× slower than a cached call).

For full benchmark tables, charts, and scaling analysis see docs/performance.md. To run the benchmarks:

yarn bench

Contributing

Bug reports and pull requests are welcome. See CONTRIBUTING.md to get started.

License

MIT
