I'm trying to run the pre-built Phi-3 ONNX optimized models found on Hugging Face here: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx.
The repo includes pre-built ONNX models that are runnable on CPUs, but when I try to load one in Ortex, I get the following error:
iex(1)> Ortex.load("phi-3-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx")
** (RuntimeError) Failed to create ONNX Runtime session: Load model from phi-3-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx failed:Fatal error: com.microsoft:MatMulNBits(-1) is not a registered function/op
(ortex 0.1.9) lib/ortex/model.ex:28: Ortex.Model.load/3
iex:1: (file)
I do see that this specific operator is listed under the ONNX Runtime Contrib Operators: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulNBits.
I'm by no means an expert on these things, so I'm not sure whether this is related to building a new Execution Provider (do I need to build the runtime with all of the contrib operators?), or to the ORT bindings themselves (maybe they need to be updated to a runtime version that registers this op?).
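For reference, my rough understanding is that the Rust `ort` crate that Ortex wraps can be pointed at a system-installed ONNX Runtime at build time instead of the one it downloads. A sketch of what I mean (assuming the `ORT_STRATEGY`/`ORT_LIB_LOCATION` environment variables apply to the `ort` version Ortex 0.1.9 pins, which I haven't verified, and `/opt/onnxruntime/lib` is just a placeholder path):

```shell
# Hypothetical: point the ort bindings at a newer system ONNX Runtime
# whose build includes the com.microsoft contrib ops (e.g. MatMulNBits),
# then force-recompile the Ortex NIF against it.
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=/opt/onnxruntime/lib
mix deps.compile ortex --force
```

I don't know if this is the right lever, or whether the bundled runtime simply predates MatMulNBits entirely.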