Merged
45 changes: 45 additions & 0 deletions .github/workflows/ci_conda_publish.yml
@@ -0,0 +1,45 @@
name: CI Conda Publish

on: workflow_dispatch

jobs:
  publish-to-conda:
    name: Publish to Conda
    # if: startsWith(github.ref, 'refs/tags/')
    # needs:
    #   - publish-to-pypi
    runs-on: ubuntu-latest
    container: conda/miniconda3:latest
    environment: conda
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install conda requirements
        run: |
          conda install -y anaconda-client conda-build

      - name: Install Grayskull
        run: |
          conda install -y -c conda-forge grayskull=3.1.0

      - name: Build conda metadata
        run: |
          conda grayskull pypi data-validation-engine

      - name: Replace incorrect values
        run: |
          sed -i 's/- AddYourGitHubIdHere/- georgeRobertson\n - stevenhsd/' ./data-validation-engine/meta.yaml
          sed -i 's/- data_validation_engine/- dve/' ./data-validation-engine/meta.yaml

      - name: Build Conda packages
        run: |
          conda config --set anaconda_upload no
          make cdist

      - name: Upload dist
        env:
          ANACONDA_TOKEN: ${{ secrets.ANACONDA_TOKEN }}
          ANACONDA_USER: ${{ secrets.ANACONDA_USER }}
        run: |
          anaconda --token "$ANACONDA_TOKEN" upload --user "$ANACONDA_USER" conda_dist/noarch/*.conda

      - name: Cleanup
        run: conda build purge
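The "Replace incorrect values" step patches the Grayskull recipe with two `sed` substitutions: it fills in the recipe maintainers and renames the `- data_validation_engine` list entry to `- dve`. A minimal Python sketch of the same two passes, applied to a hypothetical recipe excerpt (not actual Grayskull output), makes it easier to see what ends up in `meta.yaml`; `patch_recipe` and `SAMPLE` are illustrative names only.

```python
import re

# The two substitutions from the "Replace incorrect values" step, in the same
# order as the two `sed -i` calls. Without the `g` flag, sed replaces only the
# first match on each line, so count=1 is applied per line below.
SUBSTITUTIONS = [
    (re.compile(r"- AddYourGitHubIdHere"), "- georgeRobertson\n - stevenhsd"),
    (re.compile(r"- data_validation_engine"), "- dve"),
]


def patch_recipe(text: str) -> str:
    """Mirror the two sed passes: each one rewrites the whole file line by line."""
    for pattern, replacement in SUBSTITUTIONS:
        lines = text.split("\n")
        text = "\n".join(pattern.sub(replacement, line, count=1) for line in lines)
    return text


# Hypothetical excerpt of a Grayskull-style meta.yaml (made up for illustration).
SAMPLE = """\
test:
  imports:
    - data_validation_engine
extra:
  recipe-maintainers:
    - AddYourGitHubIdHere
"""

print(patch_recipe(SAMPLE))
```

As written, the replacement puts the second maintainer on a new line with a single leading space, so the patched `recipe-maintainers` block is worth checking against the indentation of the first entry before the recipe is built.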
@@ -1,4 +1,4 @@
name: CI Publish
name: CI PyPi Publish

on: workflow_dispatch

4 changes: 3 additions & 1 deletion .gitignore
@@ -17,6 +17,8 @@ spark-warehouse/*
build/
develop-eggs/
dist/
conda_dist/
data-validation-engine/
downloads/
eggs/
.eggs/
@@ -109,7 +111,7 @@ celerybeat.pid

# Environments
.env
.venv
.venv*
env/
venv/
ENV/
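Changing `.venv` to `.venv*` widens the ignore rule so that differently suffixed local environments (for example, one per Python version) are also skipped. For names without a slash, Python's `fnmatch.fnmatchcase` treats `*` the same way gitignore does, so it offers a quick way to see which names each version of the rule catches. This is an approximation rather than a gitignore implementation, and the directory names below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns taken from the .gitignore hunk above. fnmatchcase only approximates
# gitignore matching: it handles `*` the same way for a slash-free name, but it
# knows nothing about negation, anchoring or the trailing `/` directory rule.
OLD_PATTERN = ".venv"
NEW_PATTERN = ".venv*"


def ignored(name: str, pattern: str) -> bool:
    """Return True if a single path component matches a slash-free pattern."""
    return fnmatchcase(name, pattern)


for name in [".venv", ".venv-py312", ".venv311", "venv", "my.venv"]:
    print(f"{name:12} old={ignored(name, OLD_PATTERN)!s:5} new={ignored(name, NEW_PATTERN)}")
```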
2 changes: 1 addition & 1 deletion .mise.toml
@@ -1,4 +1,4 @@
[tools]
python="3.11"
poetry="2.2.1"
poetry="2.3.3"
java="liberica-1.8.0"
2 changes: 1 addition & 1 deletion .tool-versions
@@ -1,3 +1,3 @@
python 3.11.14
poetry 2.2.1
poetry 2.3.3
java liberica-1.8.0
3 changes: 3 additions & 0 deletions Makefile
@@ -12,6 +12,9 @@ wheel:

dist: wheel

cdist:
	conda build . -c conda-forge --output-folder ./conda_dist/

# testing
behave:
	${activate} behave
27 changes: 15 additions & 12 deletions README.md
@@ -10,7 +10,7 @@

The Data Validation Engine (DVE) is a configuration-driven data validation library built and utilised by NHS England. The package has currently been reverted from the v1.0.0 release to 0.x, as we feel it is not yet mature enough to be considered a 1.0.0 release. Please bear this in mind if you come across commits or references to a v1+ release while using v0.x.

As mentioned above, the DVE is "configuration driven", which means most of your development work as a user will involve building a JSON document that describes how the data will be validated. The JSON document is known as a `dischema` file, and example files can be accessed [here](./tests/testdata/). If you'd like to learn more about the JSON document and how to build one from scratch, please read the documentation [here](./docs/).
As mentioned above, the DVE is "configuration driven", which means most of your development work as a user will involve building a JSON document that describes how the data will be validated. The JSON document is known as a `dischema` file, and example files can be accessed [here](https://github.com/NHSDigital/data-validation-engine/tree/main/tests/testdata). If you'd like to learn more about the JSON document and how to build one from scratch, please read the documentation [here](https://nhsdigital.github.io/data-validation-engine/).

Once a dischema file has been defined, you are ready to use the DVE. The DVE is typically orchestrated based on four key "services". These are...

@@ -21,7 +21,7 @@ Once a dischema file has been defined, you are ready to use the DVE. The DVE is
| 3. | Business Rules | The business rules service will perform more complex validations, such as comparisons between fields and tables, aggregations, filters, etc., to generate new entities. |
| 4. | Error Reports | The error reports service will take all the errors raised in previous services and surface them in a readable format for downstream users/services. Currently, this is implemented as an Excel spreadsheet, but it could be reconfigured to meet other requirements/use cases. |

If you'd like more detailed documentation on these services, please read the extended documentation [here](./docs/).
If you'd like more detailed documentation on these services, please read the extended documentation [here](https://nhsdigital.github.io/data-validation-engine/).

The DVE has been designed to be modular and can support users who just want to utilise specific "services" from the DVE (e.g. just the file transformation + data contract). Additionally, the DVE is designed to support different backend implementations. The base installation of the DVE includes backend support for `Spark` and `DuckDB`; if you need another backend, such as `MySQL`, you can implement it yourself. Given our organisation's requirements, it is unlikely that we will add any more backend implementations to the base package beyond Spark and DuckDB. So, if you are unable to implement this yourself, we would recommend reading the guidance on [requesting new features and raising bug reports here](#requesting-new-features-and-raising-bug-reports).

@@ -43,7 +43,7 @@ pip install data-validation-engine

*Note - Only versions >=0.6.2 are available on PyPI. For older versions, please install directly from the git repo or build from source.*

Once you have installed the DVE you are ready to use it. For guidance on how to create your dischema JSON document (configuration), please read the [documentation](./docs/).
Once you have installed the DVE you are ready to use it. For guidance on how to create your dischema JSON document (configuration), please read the [documentation](https://nhsdigital.github.io/data-validation-engine/).

Version 0.0.1 does support a working Python 3.7 installation. However, we will not provide support for any issues with that version of the DVE if you choose to use it. __Use at your own risk__.

@@ -56,17 +56,20 @@ If you have feature request then please follow the same process whilst using the

## Upcoming features
Below is a list of features that we would like to implement or that have been requested.
| Feature | Release Version | Released? |
| ------- | --------------- | --------- |
| Open source release | 0.1.0 | Yes |
| Uplift to Python 3.11 | 0.2.0 | Yes |
| Upgrade to Pydantic 2.0 | Before 1.0 release | No |
| Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |

Beyond the Python and Pydantic upgrade, we cannot confirm the other features will be made available anytime soon. Therefore, if you have the interest and desire to make these features available, then please read the [Contributing](#contributing) section and get involved.
| Feature | Release Version | Released? |
| ------------------------------------------------------------------------------- | ----------------- | --------- |
| Open source release | 0.1.0 | Yes |
| Uplift to Python 3.11 | 0.2.0 | Yes |
| Uplift PySpark to 3.5                                                           | TBA               | No        |
| Allow DVE to run on Python 3.12+ | TBA | No |
| Upgrade to Pydantic 2.0 | TBA | No |
| Uplift PySpark to 4.0+                                                          | TBA               | No        |
| Create a more user friendly interface for building and modifying dischema files | Not yet confirmed | No |

Beyond the Python, PySpark and Pydantic upgrades, we cannot confirm that the other features will be made available any time soon. Therefore, if you are interested in making these features available, please read the [Contributing](#contributing) section and get involved.

## Contributing
Please see guidance [here](./CONTRIBUTE.md).
Please see guidance [here](https://github.com/NHSDigital/data-validation-engine/blob/main/CONTRIBUTE.md).

## Legal
This codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.