Add llama3.2.c port to README.md #543

Open
wants to merge 1 commit into base: master

Conversation

Dylan-Harden3

A clone of llama2.c, updated to work with Llama 3.2 1B/3B base and instruct models.

Uzair-90 commented Jan 29, 2025

Hey Dylan, I have a question; any assistance will be highly appreciated. I want to convert DeepSeek-R1-Distill-Llama-8B into .bin format. Can I use the same export.py for this?

Dylan-Harden3 (Author)

@Uzair-90 Maybe? I only ever tested with meta-llama/Llama-3.2-1B. For export.py to work, the model needs to be loadable with transformers and share all the same parameter names as Llama. I don't know the specifics of their distillation process, so I'm not sure whether it will work.
You can try:

python3 export.py DeepSeek-R1-Distill-Llama-8B.bin --hf deepseek-ai/DeepSeek-R1-Distill-Llama-8B

If all you want to do is run a model locally, check out lmstudio or ollama; those are more general, established projects that let you run basically any model, whereas this one is hard-coded for Llama.
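
As a quick sanity check before exporting, a minimal sketch along these lines (not part of export.py; the spot-checked parameter names below are just a few common Llama names, not an exhaustive list) can confirm whether a checkpoint exposes Llama-style parameter names:

```python
# Hedged sketch: confirm a HF checkpoint uses Llama-style parameter names,
# which is the main precondition for export.py to work.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed model ID
)
names = {n for n, _ in model.named_parameters()}

# Spot-check a few names a Llama exporter typically reads.
for n in [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
]:
    print(n, "OK" if n in names else "MISSING")
```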

Uzair-90

I already tried this, and it works: you can make a .bin file from DeepSeek-R1-Distill-Llama-8B. But the provided tokenizer.bin file is not compatible, so I guess I need to figure out what format my tokenizer.bin needs.

Dylan-Harden3 (Author)

@Uzair-90 Yeah, so looking at the two tokenizers here and here, they seem to have some small differences, but I think you can get around them if you're determined.

It looks like it's mainly just that the special tokens have different IDs (see the added_tokens key in both files), but the mergeable ranks/BPE part seems to be the same (the vocab key).

I believe you need to edit tokenizer.py so that the exported tokenizer.bin has the special token IDs/ranks for DeepSeek. If you look at the export method in there, all it's doing is writing out the tokens with their scores to the tokenizer.bin file.
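
To pin down where the two tokenizers diverge, a hedged sketch like the following can diff the special-token IDs; the local file paths are assumptions, and the added_tokens/model.vocab keys follow the standard Hugging Face tokenizer.json layout:

```python
# Hedged sketch: diff the special tokens of two HF tokenizer.json files.
import json

with open("llama3_tokenizer.json") as f:     # assumed local copy of the Llama 3 tokenizer.json
    llama = json.load(f)
with open("deepseek_tokenizer.json") as f:   # assumed local copy of the DeepSeek tokenizer.json
    deepseek = json.load(f)

llama_special = {t["content"]: t["id"] for t in llama["added_tokens"]}
ds_special = {t["content"]: t["id"] for t in deepseek["added_tokens"]}

# Print special tokens whose IDs differ or that exist in only one file.
for tok in sorted(set(llama_special) | set(ds_special)):
    a, b = llama_special.get(tok), ds_special.get(tok)
    if a != b:
        print(f"{tok!r}: llama={a} deepseek={b}")

# The BPE vocab should match if the distill reuses the Llama tokenizer.
print("vocab identical:", llama["model"]["vocab"] == deepseek["model"]["vocab"])
```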

Uzair-90

Thank you @Dylan-Harden3, really appreciate it. I will look into it.

Uzair-90 commented Jan 31, 2025

@Dylan-Harden3 can you help me with this problem:

size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method

The same problem arises for all 32 layers. What specific changes do I need? I am using transformers 4.30.0.
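
For reference, those [1024, 4096] shapes are what grouped-query attention produces (8 KV heads with a head dim of 128), and LlamaConfig only gained num_key_value_heads around transformers 4.31, so 4.30.0 rebuilds k_proj/v_proj as full 4096x4096 matrices. A minimal check, assuming the model ID above and a newer transformers:

```python
# Hedged sketch: read the checkpoint config to confirm the GQA head counts
# behind the shape mismatch. Assumes transformers >= 4.31, where LlamaConfig
# understands num_key_value_heads.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
print("query heads:", cfg.num_attention_heads)   # 32 would give q_proj 4096x4096
print("kv heads:   ", cfg.num_key_value_heads)   # 8 would give k/v_proj 1024x4096
```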

Dylan-Harden3 (Author)

@Uzair-90 Is this when you run export.py? Please open a Q&A discussion in my fork for further questions; I would like Andrej to approve this PR one day and don't want a long unrelated thread to get in the way.
