inverted-tree/chunkIQ


chunkIQ

Analyze file system data for internal and temporal redundancy.

Installation

Requirements

This program is written in pure Rust, so only a Rust toolchain is necessary to build chunkIQ from source. If you do not have Rust installed, check out the official getting started guide. This project is not yet packaged for any package manager, but I might consider this once it is feature complete.

Modus Operandi

This program works in two modes, trace and parse.

It splits the provided files into chunks of a specified size and calculates a digest for each chunk that serves as a unique identifier for the data in this chunk. It then compares every new digest with a database of existing chunks to find duplicates.
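The chunk-and-compare idea described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not chunkIQ's actual code: the function names `chunk_digest` and `count_duplicates` are hypothetical, the in-memory `HashSet` stands in for the chunk database, and the standard library's `DefaultHasher` stands in for whatever digest the tool really uses (a real deduplication analysis would likely want a cryptographic hash to make collisions negligible).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: computes a 64-bit digest for one chunk.
/// DefaultHasher is a non-cryptographic stand-in chosen to keep the
/// sketch dependency-free; it is an assumption, not chunkIQ's digest.
fn chunk_digest(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: splits `data` into fixed-size chunks and checks
/// each digest against `seen`, which plays the role of the chunk database.
/// Returns (total chunks, chunks whose digest was already known).
/// `chunk_size` must be non-zero, otherwise `chunks` panics.
fn count_duplicates(data: &[u8], chunk_size: usize, seen: &mut HashSet<u64>) -> (usize, usize) {
    let mut total = 0;
    let mut duplicates = 0;
    for chunk in data.chunks(chunk_size) {
        total += 1;
        // `insert` returns false when the digest is already in the set.
        if !seen.insert(chunk_digest(chunk)) {
            duplicates += 1;
        }
    }
    (total, duplicates)
}

fn main() {
    let mut seen = HashSet::new();
    // Four chunks of 4 bytes each; the third repeats the first.
    let data = b"aaaabbbbaaaacccc";
    let (total, duplicates) = count_duplicates(data, 4, &mut seen);
    println!("{duplicates} of {total} chunks are duplicates");
}
```

Passing the same `seen` set to a second call with another dataset would also count chunks that repeat across datasets, which is roughly the kind of cross-dataset comparison the two modes below are concerned with.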

File System Tracing

This mode splits the file system data into chunks and compares them to find spatially redundant data.

To run chunkIQ in this mode, use

cargo run trace [OPTIONS] <filenames>

Important

Not all the options listed under cargo run trace --help are implemented yet.

Log File Parsing

This mode uses the trace files generated by the trace mode to find redundancy between the analyzed datasets. It can be used to find both spatial and temporal redundancies.

To run chunkIQ in this mode, use

cargo run parse [OPTIONS] <trace-filenames>

To automate duplicate discovery, you may register the trace mode as a cron job on your system.

Warning

This mode is not yet implemented.

Contributing

If you want to contribute to this project, take a look at the TODOs which I've left inside the source files. You can list them all with

grep -rn --include "*.rs" "TODO:" . | awk '{$1=$1};1'

Feel free to open a PR for any of these enhancements. For anything else, please open a feature request first, so we can discuss whether your idea makes sense in the context of this project.

Note

The following standards are required for your contributions to be merged:

  • All submitted code must be properly formatted with the rustfmt defaults.
  • All submitted code must include a reasonable number of unit tests inside the source file (sometimes less is more but nothing is still nothing).

Note

This project is based on a Scala 2 implementation that is no longer maintained and relies on deprecated dependencies. This Rust implementation aims to offer improved usability and performance without depending on deprecated libraries.
