
bugfix: unexpected behavior when hidden_dim % group_size != 0 #532

Open · wants to merge 1 commit into master
Conversation

EmreAdabag

This fixes two bugs that cause unexpected behavior when the hidden dim isn't evenly divisible by the quantization group size, as in Stories42M, which has hidden dim 1376 and group size 64.

  1. Matmul uses the wrong scaling factors when invoked as matmul(_, _, _, hidden_dim, _) (see the matmul sketch further below).
  2. When quantizing vectors of length hidden_dim, the trailing hidden_dim % group_size elements are never quantized (see the quantize sketch right after this list).
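
For the second bug, here is a minimal sketch of a quantize() that also processes the short tail group, written in the style of runq.c. The QuantizedTensor layout and the global group size GS are assumed from that file; this is an illustration of the fix, not the PR's exact diff:

```c
#include <stdint.h>
#include <math.h>

// Assumed context from runq.c (not part of this PR's diff):
typedef struct { int8_t* q; float* s; } QuantizedTensor;
int GS = 64; // global quantization group size

// Quantize n floats into int8 groups of GS, including one (possibly
// shorter) tail group when n % GS != 0. The original loop stopped at the
// last full group, leaving the final n % GS elements unquantized.
void quantize(QuantizedTensor *qx, float* x, int n) {
    for (int group_start = 0; group_start < n; group_start += GS) {
        // the last group may have fewer than GS elements
        int group_len = (group_start + GS <= n) ? GS : n - group_start;

        // find the max absolute value in this group
        float wmax = 0.0f;
        for (int i = 0; i < group_len; i++) {
            float v = fabsf(x[group_start + i]);
            if (v > wmax) wmax = v;
        }

        // scale so the max maps to the int8 range [-127, 127]
        float scale = wmax / 127.0f;
        qx->s[group_start / GS] = scale;
        for (int i = 0; i < group_len; i++) {
            float q = (scale != 0.0f) ? x[group_start + i] / scale : 0.0f;
            qx->q[group_start + i] = (int8_t) roundf(q);
        }
    }
}
```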

This fix enables inference with quantized models exported by export.py regardless of hidden_dim % group_size. It has been tested with Stories42M and validated against a Python implementation of quantized inference. There is a negligible performance hit from the smaller group sizes used during the matrix multiplication; otherwise the performance of quantized inference is unchanged.
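
For the first bug, the sketch below shows one way the scaling can be kept correct, under the assumption that export.py quantizes each weight tensor as one flat array in groups of GS: a row offset i * n is then not a multiple of GS when n % GS != 0, so weight groups straddle row boundaries and the usual scale index (i * n + j) / GS no longer lines up with j. Accumulating in segments that end whenever either the activation scale or the weight scale changes keeps both scale indices valid, at the cost of the smaller group sizes mentioned above. Again, this is a hedged illustration in the style of runq.c, not the PR's exact diff:

```c
#include <stdint.h>

// Assumed context from runq.c (not part of this PR's diff):
typedef struct { int8_t* q; float* s; } QuantizedTensor;
extern int GS; // global quantization group size

// W (d,n) @ x (n,) -> xout (d,). Activations are assumed quantized in
// ragged groups of GS starting at 0 (as in the quantize sketch above);
// weights are assumed quantized flat, so the group containing flat index
// f has scale w->s[f / GS].
void matmul(float* xout, QuantizedTensor *x, QuantizedTensor *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        int in = i * n; // offset of row i in the flat weight array
        int j = 0;
        while (j < n) {
            // end of the current activation group and of the current
            // weight group (row-relative); break at whichever comes first
            // so both scales are constant inside the segment
            int x_end = (j / GS + 1) * GS;
            int w_end = ((in + j) / GS + 1) * GS - in;
            int seg_end = x_end < w_end ? x_end : w_end;
            if (seg_end > n) seg_end = n;

            // integer dot product over the segment, then apply both scales
            int32_t ival = 0;
            for (int k = j; k < seg_end; k++) {
                ival += ((int32_t) x->q[k]) * ((int32_t) w->q[in + k]);
            }
            val += ((float) ival) * w->s[(in + j) / GS] * x->s[j / GS];
            j = seg_end;
        }
        xout[i] = val;
    }
}
```

In the worst case each full group splits into two segments, so the loop does at most twice as many segment setups; the number of multiply-accumulates is unchanged, which is consistent with the negligible performance hit noted above.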

Alternatively (or additionally), export.py could simply enforce hidden_dim % group_size == 0.

Both bugs cause unexpected behavior when the model's hidden dim isn't evenly divisible by the group size, like Stories42M, which has hidden dim 1376 and group size 64.
EmreAdabag (Author)

Fixing this in export.py
#533
