Has anyone been successful with quantized vision transformers? #674
Unanswered
alexander-soare asked this question in General
Replies: 1 comment 1 reply
-
@alexander-soare I have not done this, but I'd pay particular attention to what's happening with the GELU activations during quantization: how do they get approximated? Also the LayerNorm mean/std (possible overflow?) and the precision of the accumulator. Despite the annoyances of BatchNorm, it is great for inference/quantization compared to GN, LN, etc., which must always compute activation stats in the forward pass.
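A minimal sketch of that idea, assuming PyTorch's FX graph-mode quantization API (torch.ao.quantization, PyTorch 1.13+), timm, and the "fbgemm" CPU backend: the `set_object_type(..., None)` calls keep GELU, LayerNorm, and the attention Softmax in fp32 while the Linear layers are quantized. The model name follows this thread (newer timm releases expose it as `deit_base_distilled_patch16_384`), and tracing may still need the minor tweaks mentioned below.

```python
# Minimal sketch, not the exact setup from this thread. Assumes PyTorch >= 1.13
# (torch.ao.quantization FX graph-mode API), timm, and the "fbgemm" CPU backend.
import torch
import timm
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Model name as used in this thread; newer timm releases expose it as
# deit_base_distilled_patch16_384.
model = timm.create_model("vit_deit_base_distilled_patch16_384", pretrained=True)
model.eval()

example_inputs = (torch.randn(1, 3, 384, 384),)

# Start from the default static-quant config, then opt precision-sensitive
# modules out (qconfig None = leave in fp32).
qconfig_mapping = (
    get_default_qconfig_mapping("fbgemm")
    .set_object_type(torch.nn.GELU, None)       # avoid int8 GELU approximation
    .set_object_type(torch.nn.LayerNorm, None)  # avoid overflow in mean/var stats
    .set_object_type(torch.nn.Softmax, None)    # keep attention softmax in fp32
)

prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate the observers; use a few hundred real images rather than noise.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 3, 384, 384))

quantized = convert_fx(prepared)
```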
1 reply
-
Today I managed to run FX quantization on vit_deit_base_distilled_patch16_384 with a few minor tweaks. I get a 2.5x speed-up on CPU. Accuracy plummets though :( Wondering if anyone has had experience with doing Quantization Aware Training on a vision transformer. Were you successful?
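On the QAT question, a minimal sketch of how such a run is commonly wired up with the same PyTorch FX API (prepare_qat_fx / convert_fx); this is not the thread author's actual setup, and `train_loader` is a placeholder. Some timm versions return a (class, distillation) logits tuple from distilled DeiT models in train mode, which the loop below averages.

```python
# Minimal QAT sketch, not the thread author's setup. Assumes PyTorch >= 1.13
# (torch.ao.quantization FX API), timm, and a hypothetical `train_loader`
# yielding (images, targets) batches.
import torch
import timm
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

model = timm.create_model("vit_deit_base_distilled_patch16_384", pretrained=True)
model.train()

example_inputs = (torch.randn(1, 3, 384, 384),)
qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")

# Insert fake-quant observers so the fine-tune sees quantization error.
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

optimizer = torch.optim.AdamW(prepared.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()

# A short fine-tune at a low learning rate is typical for QAT.
for images, targets in train_loader:  # train_loader is a placeholder DataLoader
    optimizer.zero_grad()
    outputs = prepared(images)
    if isinstance(outputs, tuple):
        # Distilled DeiT variants may return (cls_logits, dist_logits) in train mode.
        outputs = sum(outputs) / len(outputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

prepared.eval()
quantized = convert_fx(prepared)  # int8 model for CPU inference
```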