This program is written in pure Rust, so a Rust toolchain is all you need to build chunkIQ from source. If you do not have Rust installed, see the official getting started guide. The project is not yet packaged for any package manager, but I may consider this once it is feature complete.
chunkIQ works in two modes: trace and parse.
chunkIQ splits the provided files into chunks of a specified size and calculates a digest for each chunk, which serves as a unique identifier for that chunk's data. It then compares every new digest against a database of existing chunks to find duplicates.
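The chunk-and-compare loop described above can be sketched in a few lines of Rust. This is a minimal, dependency-free illustration, not chunkIQ's actual code: the names digest and dedup_stats are hypothetical, an in-memory HashSet stands in for the chunk database, and the standard library's DefaultHasher (SipHash, which is not a cryptographic hash) stands in for whatever digest chunkIQ really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for chunkIQ's real digest function.
/// DefaultHasher (SipHash) is NOT cryptographic; deduplication tools
/// typically use a cryptographic hash such as SHA-256 instead.
fn digest(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Splits `data` into fixed-size chunks (the last one may be shorter),
/// digests each chunk and checks it against a database of known digests.
/// Returns (unique chunks, duplicate chunks).
fn dedup_stats(data: &[u8], chunk_size: usize) -> (usize, usize) {
    // slice::chunks panics on a size of 0, so reject it explicitly.
    assert!(chunk_size > 0, "chunk size must be positive");
    // In-memory stand-in for the database of existing chunks.
    let mut database: HashSet<u64> = HashSet::new();
    let mut duplicates = 0;
    for chunk in data.chunks(chunk_size) {
        // insert() returns false if the digest was already present,
        // i.e. this chunk duplicates one seen earlier.
        if !database.insert(digest(chunk)) {
            duplicates += 1;
        }
    }
    (database.len(), duplicates)
}

fn main() {
    // Four 4-byte chunks: "abcd", "abcd", "efgh", "abcd".
    let (unique, duplicates) = dedup_stats(b"abcdabcdefghabcd", 4);
    println!("unique chunks: {unique}, duplicate chunks: {duplicates}");
}
```

With a chunk size of 4 bytes, the input abcdabcdefghabcd splits into abcd, abcd, efgh and abcd, so the sketch reports 2 unique chunks and 2 duplicates. Fixed-size chunking is the simplest scheme to illustrate; it is an assumption here that it matches what chunkIQ does for a given chunk size.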
This mode splits the file system data into chunks and compares them to find spatially redundant data, i.e. identical chunks within the same dataset.
To run chunkIQ in this mode, use
cargo run trace [OPTIONS] <filenames>
Important
Not all of the options listed by cargo run trace --help are implemented yet.
This mode uses the trace files generated in trace mode to find redundancy between the analyzed datasets. It can detect both spatial and temporal redundancy.
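As a rough illustration of the difference between the two kinds of redundancy, the sketch below treats each trace as nothing more than the list of chunk digests recorded for one dataset. That representation and the names spatial_duplicates and temporal_duplicates are assumptions made for this example; the actual trace-file format is not described here.

```rust
use std::collections::HashSet;

/// Spatial redundancy: chunks within one trace whose digest already
/// occurred earlier in that same trace.
/// Assumption: a trace is modelled as a slice of chunk digests.
fn spatial_duplicates(trace: &[u64]) -> usize {
    let mut seen = HashSet::new();
    // insert() returns false for digests that were seen before.
    trace.iter().filter(|&&d| !seen.insert(d)).count()
}

/// Temporal redundancy: chunks in a newer trace whose digest already
/// appears in an older trace of the same data.
fn temporal_duplicates(older: &[u64], newer: &[u64]) -> usize {
    let known: HashSet<u64> = older.iter().copied().collect();
    newer.iter().filter(|&&d| known.contains(&d)).count()
}

fn main() {
    // Hypothetical digests from two runs of trace mode on the same data.
    let monday: Vec<u64> = vec![1, 2, 3, 3];
    let tuesday: Vec<u64> = vec![1, 2, 4, 5];
    println!("spatial duplicates (monday): {}", spatial_duplicates(&monday));
    println!(
        "temporal duplicates (monday -> tuesday): {}",
        temporal_duplicates(&monday, &tuesday)
    );
}
```

Here the Monday trace contains one spatial duplicate (digest 3 appears twice), and two of Tuesday's chunks (digests 1 and 2) were already present on Monday, which counts as temporal redundancy.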
To run chunkIQ in this mode, use
cargo run parse [OPTIONS] <trace-filenames>
To automate duplicate discovery, you can register trace mode as a cron job on your system.
Warning
This mode is not yet implemented.
If you want to contribute to this project, take a look at the TODOs I've left in the source files. You can list them all with
grep -rn --include "*.rs" "TODO:" . | awk '{$1=$1};1'
Feel free to open a PR for any of these enhancements. For anything else, please open a feature request first so we can discuss whether your idea fits the scope of this project.
Note
Contributions must meet the following standards to be merged:
- All submitted code must be properly formatted with the rustfmt defaults.
- All submitted code must include a reasonable number of unit tests inside the source file (sometimes less is more, but nothing is still nothing).
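For reference, the usual Rust convention for in-file unit tests is a #[cfg(test)] module at the bottom of the source file, which cargo test compiles and runs. The chunk_count function below is a hypothetical helper used only to show that layout; it is not part of chunkIQ. It relies on usize::div_ceil, which requires Rust 1.73 or newer.

```rust
/// Hypothetical helper, used only to illustrate the test layout:
/// how many chunks of `chunk_size` bytes a file of `len` bytes splits into.
fn chunk_count(len: usize, chunk_size: usize) -> usize {
    // A trailing partial chunk still counts as one chunk.
    len.div_ceil(chunk_size)
}

// The main function exists only so this snippet also builds as a binary.
fn main() {
    println!("{}", chunk_count(10_000, 4096));
}

// Compiled only for `cargo test`, not for regular builds.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn exact_multiple_has_no_partial_chunk() {
        assert_eq!(chunk_count(8192, 4096), 2);
    }

    #[test]
    fn remainder_gets_its_own_chunk() {
        assert_eq!(chunk_count(8193, 4096), 3);
    }

    #[test]
    fn empty_input_has_no_chunks() {
        assert_eq!(chunk_count(0, 4096), 0);
    }
}
```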
Note
This project is based on a Scala 2 implementation that is no longer maintained and relies on deprecated dependencies. This Rust rewrite aims to improve usability and performance while avoiding those deprecated dependencies.