- Stable releases check out `main`; alpha/beta check out `dev`
- Old alphas pruned to 7, betas to 5
- `[skip ci]` in version bump commits to prevent loops

### Pre-commit Hooks

- **isort**, **black**, **flake8**, **ruff**: Code formatting and linting
- **prettier**: Markdown, JSON, YAML formatting
- **gitleaks**: Secret scanning
- **pre-commit-hooks**: Large file checks, merge conflict detection, YAML validation

## Environment Variables

```bash
# Testing configuration
export PYTEST_DONUT=yes # Enable real OCR testing
# ...
export PYTHONPATH=$(pwd) # Local development imports
```
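
As an illustration of how `PYTEST_DONUT` might be consumed, the sketch below gates real OCR tests on the variable. The helper name `real_ocr_enabled` is hypothetical, and the exact check the DataFog test suite performs is an assumption; only the variable name and the `yes` value come from this guide.

```python
import os
from typing import Mapping, Optional


def real_ocr_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PYTEST_DONUT is set to "yes" (case-insensitive).

    Hypothetical helper: the project's tests may read the variable differently.
    """
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return env.get("PYTEST_DONUT", "").strip().lower() == "yes"
```

A test module could then skip expensive cases with `pytest.mark.skipif(not real_ocr_enabled(), reason="real OCR disabled")`.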

## Performance Requirements

- **Core Package**: <2MB (from ~8MB in v4.0.x)
- **Regex Engine**: 150x+ faster than spaCy (currently 190x)
- **GLiNER Engine**: 25x+ faster than spaCy (currently 32x)
- **Memory Usage**: Graceful handling of large texts (1MB+ chunks)
- **Model Loading**: Cache GLiNER models to avoid repeated downloads
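
One way to approach the memory requirement above is to process large inputs in bounded chunks. The following is a minimal sketch, not DataFog's actual implementation: `iter_chunks` is a hypothetical name, and measuring the limit in characters rather than bytes is an assumption made for simplicity.

```python
from typing import Iterator


def iter_chunks(text: str, max_chars: int = 1_000_000) -> Iterator[str]:
    """Yield consecutive pieces of `text`, each at most `max_chars` long.

    Cuts at the last space or newline inside the window when one exists,
    so tokens are not split; otherwise falls back to a hard cut.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    start, n = 0, len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Find the last whitespace inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        yield text[start:end]
        start = end
```

Because the chunks are contiguous slices, joining them reproduces the input exactly, which keeps character offsets easy to map back to the original text.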

## Agent skills

### Issue tracker

Issues and PRDs are tracked in Linear under the DFPY team. See `docs/agents/issue-tracker.md`.

### Triage labels

Use the default five-label triage vocabulary. See `docs/agents/triage-labels.md`.

### Domain docs

Single-context repo: use root `CONTEXT.md` and root `docs/adr/` when present. See `docs/agents/domain.md`.

## Best Practices for Agents

Before beginning any task, check out a new branch from `dev`, and open your pull request against `dev`.

### Code Quality

- Follow existing patterns before implementing new approaches
- Add comprehensive tests for all new functionality
- Update documentation immediately with code changes
- Run benchmarks for any text processing modifications
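
The benchmarking guideline above can be sketched with the standard library alone. This is an illustrative harness, not the project's benchmark suite: `best_time`, `speedup`, and `meets_target` are hypothetical names, and the targets passed in would come from the performance requirements (for example 150 for the regex engine).

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock duration over `repeats` calls of `fn`."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


def meets_target(baseline_seconds: float, candidate_seconds: float, target: float) -> bool:
    """True when the measured speedup reaches the required multiple."""
    return speedup(baseline_seconds, candidate_seconds) >= target
```

Taking the minimum over several runs reduces noise from other processes; a change that drops `meets_target` to `False` on realistic input is a signal to investigate before merging.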

### GLiNER Development

- Use PII-specialized models when available (`urchade/gliner_multi_pii-v1`)
- Test graceful degradation when GLiNER dependencies are missing
- Validate smart cascading thresholds with real data
- Consider model download time and caching strategies
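
The degradation and caching points above might look like the sketch below. It assumes the optional dependency is importable as `gliner` and exposes `GLiNER.from_pretrained`; the function names and the `"gliner"`/`"regex"` engine labels are hypothetical and are not taken from DataFog's code.

```python
import importlib.util
from functools import lru_cache


def gliner_available(module_name: str = "gliner") -> bool:
    """Check whether the optional module is installed, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_engine(module_name: str = "gliner") -> str:
    """Pick the engine label, falling back to regex when the module is missing."""
    return "gliner" if gliner_available(module_name) else "regex"


@lru_cache(maxsize=None)
def load_model(model_name: str = "urchade/gliner_multi_pii-v1"):
    """Load a GLiNER model once per process and reuse it on later calls."""
    # Imported lazily so this module still imports when gliner is absent.
    from gliner import GLiNER

    return GLiNER.from_pretrained(model_name)
```

A degradation test can call `choose_engine` with a module name that is known not to exist and assert that the regex fallback is selected, without needing to uninstall anything.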

### Release Preparation

- Alpha/beta releases are automated via the `release.yml` schedule
- Stable releases: merge `dev` → `main`, then trigger `release.yml` with `stable` type
- Use `dry_run: true` to validate before actual publish
- Validate performance on realistic data sets
- In release notes or comments, do not mention that the code was authored by an AI agent (all code is anonymously authored)
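
The release rules stated in this guide (stable checks out `main`, alpha/beta check out `dev`, old alphas pruned to 7 and betas to 5) can be expressed as a small sketch. The function names and the tag-list representation are hypothetical; the real logic lives in `release.yml` and may differ.

```python
from typing import Dict, List

# Retention limits as stated in this guide.
KEEP: Dict[str, int] = {"alpha": 7, "beta": 5}


def branch_for(release_type: str) -> str:
    """Return the branch a release of the given type is built from."""
    if release_type == "stable":
        return "main"
    if release_type in KEEP:
        return "dev"
    raise ValueError(f"unknown release type: {release_type!r}")


def tags_to_prune(tags_newest_first: List[str], release_type: str) -> List[str]:
    """Return the pre-release tags that fall outside the retention limit."""
    return tags_newest_first[KEEP[release_type]:]
```

Keeping the pruning decision in a pure function like this makes it easy to check under a dry run before anything is deleted.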

This guide provides the essential information for DataFog development while maintaining focus on current priorities and recent GLiNER integration work.