4 changes: 1 addition & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -99,7 +99,7 @@ The `@nx/js:library` generator’s output diverges from the conventions in this
- `name` → `@lde/<new-name>`
- `description` — write something useful
- `repository.directory` → `packages/<new-name>`
- `version` → `0.0.0` from the get-go (do NOT keep the sibling’s version). nx release bumps from there via conventional commits, so the introducing `feat:` commit lands the first release at `0.1.0`; any higher starting version overshoots it. This must be in place before the PR merges — see [Releasing a new package](#releasing-a-new-package).
- `version` → `0.1.0` from the get-go (do NOT keep the sibling’s version). nx release bumps from there via conventional commits, so the introducing `feat:` commit lands the first release at `0.1.0`; any higher starting version overshoots it. This must be in place before the PR merges — see [Releasing a new package](#releasing-a-new-package).
- `dependencies` and `peerDependencies` — replace with what the new package actually needs
4. **Replace the source.** Empty out `src/` and `test/`, write the new code.
5. **Update `tsconfig.lib.json` `references`** to match the new package’s actual `@lde/*` peers.
@@ -129,8 +129,6 @@ For releasing the new package’s first version, see [Releasing a new package](#

`.github/workflows/release.yml` publishes existing packages on every push to main, but the CI workflow alone cannot bring up a brand-new `@lde/<name>` package: npm’s Trusted Publisher configuration can only be added to a package that already exists on the registry. The first version has to be published manually by a maintainer; CI takes over from the second version onwards.

The package’s `version` must already be `0.0.0` before the PR merges (set in [Creating New Packages](#creating-new-packages) step 3) so the introducing `feat:` commit publishes `0.1.0`.

One-time bootstrap for a new package (do this once it has been merged to main):

1. **Publish the first version manually.** From a maintainer’s machine:
6 changes: 6 additions & 0 deletions README.md
@@ -127,6 +127,11 @@ await pipeline.run();
<td><a href="https://www.npmjs.com/package/@lde/sparql-importer"><img src="https://img.shields.io/npm/v/@lde/sparql-importer" alt="npm"></a></td>
<td>Import data dumps to a local SPARQL endpoint for querying</td>
</tr>
<tr>
<td><a href="packages/sparql-anything">@lde/sparql-anything</a></td>
<td><a href="https://www.npmjs.com/package/@lde/sparql-anything"><img src="https://img.shields.io/npm/v/@lde/sparql-anything" alt="npm"></a></td>
<td>Convert tabular and other non-RDF sources to RDF with the SPARQL Anything CLI</td>
</tr>
<tr><th colspan="3" align="left">Publication – Serve and document your data</th></tr>
<tr>
<td><a href="packages/fastify-rdf">@lde/fastify-rdf</a></td>
@@ -222,6 +227,7 @@ graph TD
distribution-probe --> dataset
pipeline --> distribution-probe
sparql-importer --> dataset
sparql-anything --> task-runner
distribution-health --> distribution-probe
distribution-health --> sparql-importer
end
61 changes: 37 additions & 24 deletions package-lock.json

Some generated files are not rendered by default.

45 changes: 45 additions & 0 deletions packages/sparql-anything/README.md
@@ -0,0 +1,45 @@
# SPARQL Anything

Convert tabular and other non-RDF sources to RDF with the [SPARQL Anything](https://sparql-anything.cc) CLI.

The `SparqlAnythingConverter` runs the SPARQL Anything jar **once per input chunk** to bound memory use, then concatenates the per-chunk N-Triples outputs into a single file. Processes are spawned through an [`@lde/task-runner`](../task-runner), so the same converter works on the host, in Docker, or anywhere else a `TaskRunner` is implemented.

## Usage

```typescript
import { SparqlAnythingConverter } from '@lde/sparql-anything';
import { NativeTaskRunner } from '@lde/task-runner-native';

const converter = new SparqlAnythingConverter({
  queryFile: 'config/places.rq', // CONSTRUCT query; `{SOURCE}` is replaced per chunk
  jarPath: 'bin/sparql-anything.jar',
  adminCodesFile: 'data/admin-codes.ttl', // loaded into the default graph via --load
  taskRunner: new NativeTaskRunner(),
});

await converter.convert(
  ['data/geonames_aa.csv', 'data/geonames_ab.csv'],
  'output/geonames.nt',
);
```

For each chunk, the converter:

1. Replaces the literal `{SOURCE}` in the query file with the chunk's path and writes the result to a temporary `.rq` file.
2. Runs `java -jar <jar> -q <query> --load <adminCodesFile> --format NT --output <chunk>.nt`.
3. Waits for the process; a non-zero exit **aborts the whole conversion** so a crashed chunk can never be silently dropped from the output.
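
Steps 1 and 2 above can be sketched as two small pure functions. This is a minimal illustration, not the package's implementation: `renderQuery` and `buildJavaArgs` are hypothetical names, and replacing every occurrence of `{SOURCE}` (rather than only the first) is an assumption. Spawning the process through the `TaskRunner` is deliberately left out.

```typescript
// Hypothetical helpers for illustration; these names are not part of @lde/sparql-anything.

/** Substitute the literal `{SOURCE}` in the query template with the chunk's path. */
function renderQuery(template: string, chunkPath: string): string {
  // split/join treats both strings literally, so a `$` in the path is not
  // interpreted as a replacement pattern the way String.prototype.replace would.
  // Assumption: every occurrence of `{SOURCE}` is replaced.
  return template.split('{SOURCE}').join(chunkPath);
}

/** Build the arguments for the `java` command shown in step 2. */
function buildJavaArgs(options: {
  jarPath: string;
  renderedQueryFile: string; // the temporary `.rq` file written in step 1
  adminCodesFile: string;
  chunkOutputFile: string; // the per-chunk `.nt` file
}): string[] {
  return [
    '-jar', options.jarPath,
    '-q', options.renderedQueryFile,
    '--load', options.adminCodesFile,
    '--format', 'NT',
    '--output', options.chunkOutputFile,
  ];
}
```

Passing the arguments as an array instead of a single shell string avoids quoting problems when chunk paths contain spaces.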

Finally, the per-chunk `.nt` files are concatenated, in the order the chunks were given, into the output path. The concatenation streams, so multi-gigabyte outputs do not have to fit in memory. N-Triples has no prefixes or document structure, so concatenating per-chunk files always yields a single valid document.
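
A streaming, order-preserving concatenation could look roughly like the sketch below. It uses only Node.js built-ins; `readInOrder` and `concatenateFiles` are hypothetical names, and the converter's actual implementation may differ.

```typescript
import { createReadStream, createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helpers for illustration; not the package's API.

/** Yield the bytes of each file in the given order, one file at a time. */
async function* readInOrder(paths: readonly string[]): AsyncGenerator<Buffer> {
  for (const path of paths) {
    // A file is only opened once the previous one has been fully consumed.
    yield* createReadStream(path);
  }
}

/** Concatenate `inputs` into `output` without buffering whole files in memory. */
async function concatenateFiles(
  inputs: readonly string[],
  output: string,
): Promise<void> {
  // pipeline() applies backpressure and rejects if a read or the write fails.
  await pipeline(Readable.from(readInOrder(inputs)), createWriteStream(output));
}
```

The sketch assumes each per-chunk file ends with a newline; otherwise the last triple of one chunk and the first triple of the next would end up on the same line.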

## Options

| Option | Type | Description |
| ---------------- | ------------------ | -------------------------------------------------------------------------------- |
| `queryFile` | `string` | Path to the SPARQL CONSTRUCT query. The literal `{SOURCE}` is replaced per chunk |
| `jarPath` | `string` | Path to the SPARQL Anything CLI jar |
| `adminCodesFile` | `string` | Path to the Turtle file loaded into the default graph (`--load`) |
| `taskRunner` | `TaskRunner<Task>` | Runs the SPARQL Anything process for each chunk |

## Chunking

The converter consumes pre-split chunk files; it does not split sources itself. Producing the chunks (and the `adminCodesFile`) is the caller's responsibility.
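
One way a caller might produce chunk files is to split a line-oriented source every N lines. The sketch below is an assumption-laden illustration, not part of this package: `splitByLines` and its `chunkPath` callback are hypothetical, it treats every line as one record (so CSV fields containing quoted newlines would be split incorrectly), and it does not repeat a header row in each chunk.

```typescript
import { createReadStream } from 'node:fs';
import { writeFile } from 'node:fs/promises';
import { createInterface } from 'node:readline';

// Hypothetical helper for illustration; not part of @lde/sparql-anything.

/**
 * Split `source` into files of at most `linesPerChunk` lines each.
 * Returns the chunk paths in order. At most one chunk is held in memory.
 */
async function splitByLines(
  source: string,
  linesPerChunk: number,
  chunkPath: (index: number) => string,
): Promise<string[]> {
  if (!Number.isInteger(linesPerChunk) || linesPerChunk < 1) {
    throw new RangeError('linesPerChunk must be a positive integer');
  }
  const paths: string[] = [];
  let buffer: string[] = [];

  // Write the buffered lines to the next chunk file, if there are any.
  const flush = async (): Promise<void> => {
    if (buffer.length === 0) return;
    const path = chunkPath(paths.length);
    await writeFile(path, buffer.join('\n') + '\n');
    paths.push(path);
    buffer = [];
  };

  const lines = createInterface({
    input: createReadStream(source),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of lines) {
    buffer.push(line);
    if (buffer.length >= linesPerChunk) await flush();
  }
  await flush(); // remaining lines form the last, possibly shorter, chunk
  return paths;
}
```

The returned paths can be passed directly as the first argument of `converter.convert(...)`.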
22 changes: 22 additions & 0 deletions packages/sparql-anything/eslint.config.mjs
@@ -0,0 +1,22 @@
import baseConfig from '../../eslint.config.mjs';

export default [
  ...baseConfig,
  {
    files: ['**/*.json'],
    rules: {
      '@nx/dependency-checks': [
        'error',
        {
          ignoredFiles: [
            '{projectRoot}/eslint.config.{js,cjs,mjs}',
            '{projectRoot}/vite.config.{js,ts,mjs,mts}',
          ],
        },
      ],
    },
    languageOptions: {
      parser: await import('jsonc-eslint-parser'),
    },
  },
];
31 changes: 31 additions & 0 deletions packages/sparql-anything/package.json
@@ -0,0 +1,31 @@
{
  "name": "@lde/sparql-anything",
  "version": "0.1.0",
  "description": "Convert tabular and other non-RDF sources to RDF with the SPARQL Anything CLI, running one process per input chunk via an @lde/task-runner.",
  "repository": {
    "url": "git+https://github.com/ldelements/lde.git",
    "directory": "packages/sparql-anything"
  },
  "license": "MIT",
  "type": "module",
  "exports": {
    "./package.json": "./package.json",
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "development": "./src/index.ts",
      "default": "./dist/index.js"
    }
  },
  "main": "./dist/index.js",
  "module": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "files": [
    "dist",
    "!**/*.tsbuildinfo"
  ],
  "dependencies": {
    "@lde/task-runner": "0.2.11",
    "tslib": "^2.3.0"
  }
}
4 changes: 4 additions & 0 deletions packages/sparql-anything/src/index.ts
@@ -0,0 +1,4 @@
export {
  SparqlAnythingConverter,
  type SparqlAnythingConverterOptions,
} from './sparql-anything-converter.js';