bugfix: unexpected behavior when hidden_dim % group_size != 0 #532
This fixes two bugs that cause unexpected behavior when the hidden dim isn't evenly divisible by the quantization group size, as in Stories42M, which has a hidden dim of 1376 and a group size of 64.
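To make the mismatch concrete, here is a small arithmetic sketch using the numbers quoted above. The helper name `leftover_elements` is made up for illustration and does not come from the repository.

```python
def leftover_elements(n: int, group_size: int) -> int:
    """Return how many elements remain after filling whole groups of group_size."""
    return n % group_size


hidden_dim = 1376  # Stories42M hidden dim, as stated in this PR
group_size = 64    # quantization group size, as stated in this PR

full_groups = hidden_dim // group_size                 # whole groups that fit in one row
remainder = leftover_elements(hidden_dim, group_size)  # elements that fit no whole group

print(f"{full_groups} full groups, {remainder} elements left over")
```

A nonzero remainder means a row of length hidden_dim cannot be split into whole groups of 64, which is the condition under which the bugs appear.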
This fix lets inference run with quantized models exported by export.py regardless of hidden_dim % group_size. It has been tested with Stories42M and validated against a Python implementation of quantized inference. Smaller group sizes during the matrix multiplication will cause a negligible performance hit; otherwise, the performance of quantized inference should remain unchanged.
Alternatively or additionally, export.py could simply ensure that hidden_dim % group_size == 0.
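One way the export-side check could look is sketched below. This is an assumption about a possible approach, not the code in export.py: the function `pick_group_size` and its signature are hypothetical, and halving the group size is just one simple strategy for reaching a divisor.

```python
def pick_group_size(dims, group_size=64):
    """Halve group_size until it evenly divides every dimension in dims.

    Hypothetical helper for illustration; not part of export.py.
    Terminates because a group size of 1 divides every positive integer.
    """
    if group_size < 1 or any(d < 1 for d in dims):
        raise ValueError("group_size and all dims must be positive integers")
    while any(d % group_size != 0 for d in dims):
        group_size //= 2
    return group_size


# Hidden dim 1376 from this PR; 512 is assumed here as the model dim.
chosen = pick_group_size([512, 1376], group_size=64)
print(chosen)
```

With these inputs the sketch settles on a group size of 32, since 1376 = 43 * 32. The trade-off is that a smaller group size stores more scale factors per tensor.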