decoding speed #280

@housebaby


How can I decode 10 s of audio in 70 ms, as you mention in "The SenseVoice-Small model utilizes a non-autoregressive end-to-end framework, leading to exceptionally low inference latency. It requires only 70ms to process 10 seconds of audio, which is 15 times faster than Whisper-Large."?

It took me 200 ms to decode 5 s of audio on a GPU.
However, I am not using ONNX or quantization. Is that why decoding takes longer for me than the figure you state?
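
To put the two figures on the same scale, here is a minimal sketch that converts them to a real-time factor (processing time divided by audio duration) and shows one way a latency measurement could be made repeatable. The helper names `real_time_factor` and `benchmark` are hypothetical and not part of SenseVoice; the `sync` hook assumes a PyTorch setup, where `torch.cuda.synchronize` would be passed so that asynchronous GPU work has finished before the clock is read.

```python
import statistics
import time


def real_time_factor(latency_s, audio_s):
    """Processing time divided by audio duration (lower is faster)."""
    return latency_s / audio_s


def benchmark(fn, warmup=3, runs=10, sync=None):
    """Return the median wall-clock latency of fn() in seconds.

    Hypothetical helper: warm-up calls are discarded, and `sync`, if given,
    is called before each clock read (for example torch.cuda.synchronize,
    assuming PyTorch) so pending GPU work is included in the timing.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Figures quoted in this issue: 70 ms for 10 s (claimed), 200 ms for 5 s (observed).
claimed = real_time_factor(0.070, 10.0)
observed = real_time_factor(0.200, 5.0)
print(f"claimed RTF:  {claimed:.4f}")
print(f"observed RTF: {observed:.4f}")
print(f"observed / claimed: {observed / claimed:.1f}x")
```

With these numbers the observed real-time factor is 0.04 against a claimed 0.007, so the gap is roughly 5.7x rather than the 2.9x that the raw latencies suggest. Timing a single call without warm-up or synchronization can also inflate the result, which is why the sketch separates those steps.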


Labels: question (Further information is requested)
