Description
When I run the API on Docker I get this warning in the logs:
INFO: 172.17.0.1:57538 - "POST /text/match HTTP/1.1" 422 Unprocessable Entity
INFO: 172.17.0.1:57554 - "POST /text/match HTTP/1.1" 200 OK
INFO: 172.17.0.1:57562 - "POST /text/match HTTP/1.1" 200 OK
INFO: 172.17.0.1:54830 - "POST /text/match HTTP/1.1" 200 OK
INFO: 172.17.0.1:54834 - "POST /text/match HTTP/1.1" 200 OK
INFO: 172.17.0.1:54844 - "POST /text/match?is_negate=false HTTP/1.1" 200 OK
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2025-01-25 16:31:09,257 [AnyIO worker] [WARNI] Failed to see startup log message; retrying...
Preparing data for Tika
Maybe it's nothing to worry about. I guess we can silence it by setting `TOKENIZERS_PARALLELISM`, but I want to confirm first that the fork is not doing any harm. Needs some investigation.
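One way to apply the environment-variable option from the warning is to set it at the very top of the API's entry point, before anything imports `tokenizers` or `transformers`. This is a minimal sketch, assuming a Python entry point; the helper name `set_tokenizers_parallelism` is hypothetical and not part of the repo:

```python
import os


def set_tokenizers_parallelism(value: str = "false") -> str:
    """Set TOKENIZERS_PARALLELISM unless the environment already defines it.

    Should run before the first import of `tokenizers` / `transformers`
    in the process. Returns the value that ends up in effect.
    """
    # setdefault keeps any value already passed in, e.g. via `docker run -e`.
    return os.environ.setdefault("TOKENIZERS_PARALLELISM", value)


# Hypothetical placement: the first lines of the API's main module.
set_tokenizers_parallelism("false")
```

The same effect should be possible without a code change by passing `-e TOKENIZERS_PARALLELISM=false` to `docker run`. Either way this only silences the warning and turns off the tokenizer's own parallelism; it does not by itself tell us whether the models are being loaded more than once.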
I suspect we might be instantiating the HuggingFace transformer models more than once. If an LLM exists multiple times in RAM we need to stop that happening, because these models are already memory hungry.
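To check the suspicion about duplicate model copies, one option is to log the process ID at the point where a model is loaded and then count how many distinct PIDs show up in the container logs. A rough sketch follows; `load_model_logged` and its `loader` argument are hypothetical stand-ins for wherever the repo actually constructs its model:

```python
import logging
import os
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("model_load_audit")


def load_model_logged(name: str, loader: Callable[[], T]) -> T:
    """Call `loader` and log which process performed the load."""
    model = loader()
    # WARNING level so the line is emitted even with default logging config.
    logger.warning(
        "loaded model %r in pid=%d (parent pid=%d)",
        name,
        os.getpid(),
        os.getppid(),
    )
    return model
```

If the same model name is logged from more than one PID, each of those processes has run its own load. A single load followed by a fork would not show up this way, since the child inherits the parent's memory pages rather than loading again.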
Environment
Docker
How to Reproduce
- Check out the repo
export COMMIT_ID=$(git show -s --format=%ci_%h | sed 's/[^_a-z0-9]//g' | sed 's/0[012]00_/_/g') && docker build -t harmonyapi -t harmonyapi:$COMMIT_ID -t harmonydata/harmonyapi -t harmonydata/harmonyapi:$COMMIT_ID --build-arg COMMIT_ID=$COMMIT_ID .
docker run -p 8000:80 harmonydata/harmonyapi:$COMMIT_ID
Expected Behaviour
Ideally we would not get this warning.