Add llama3.2.c port to README.md #543

Open
wants to merge 1 commit into base: master

Conversation

Dylan-Harden3

A clone of llama2.c, updated to work with Llama 3.2 1B/3B base and instruct models.

Uzair-90 commented Jan 29, 2025

Hey Dylan, I have a question; any assistance will be highly appreciated. I want to convert DeepSeek-R1-Distill-Llama-8B into .bin format. Can I use the same export.py for this?

Dylan-Harden3 (Author)

@Uzair-90 Maybe? I only ever tested with meta-llama/Llama-3.2-1B. For export.py to work, the model needs to be loadable with transformers and share all the same parameter names as Llama. I don't know the specifics of their distillation process, so I'm not sure whether it will work.
You can try:

python3 export.py DeepSeek-R1-Distill-Llama-8B.bin --hf deepseek-ai/DeepSeek-R1-Distill-Llama-8B

If all you want to do is run a model locally, check out lmstudio or ollama; those are more general, established projects that let you run basically any model, whereas this one is hard-coded for Llama.
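
As a quick sanity check before exporting, a minimal sketch along these lines (not part of export.py; the spot-checked parameter names below are just a few common Llama names, not an exhaustive list) can confirm whether a checkpoint exposes Llama-style parameter names:

```python
# Hedged sketch: confirm a HF checkpoint uses Llama-style parameter names,
# which is the main precondition for export.py to work.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed model ID
)
names = {n for n, _ in model.named_parameters()}

# Spot-check a few names a Llama exporter typically reads.
for n in [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
]:
    print(n, "OK" if n in names else "MISSING")
```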

Uzair-90

I already tried this, and it works: you can make a .bin file from DeepSeek-R1-Distill-Llama-8B. But the provided tokenizer.bin file is not compatible, so I guess I need to figure out what format my tokenizer.bin needs.

Dylan-Harden3 (Author)

@Uzair-90 Yeah, so looking at the two tokenizers here and here, they seem to have some small differences, but I think you can get around them if you're determined.

It looks like it's mainly just that the special tokens have different IDs (see the added_tokens key in both files), but the mergeable ranks/BPE part seems to be the same (the vocab key).

I believe you need to edit tokenizer.py so that the exported tokenizer.bin has the special token IDs/ranks for DeepSeek. If you look at the export method in there, all it's doing is writing out the tokens with their scores to the tokenizer.bin file.
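
To pin down where the two tokenizers diverge, a hedged sketch like the following can diff the special-token IDs; the local file paths are assumptions, and the added_tokens/model.vocab keys follow the standard Hugging Face tokenizer.json layout:

```python
# Hedged sketch: diff the special tokens of two HF tokenizer.json files.
import json

with open("llama3_tokenizer.json") as f:     # assumed local copy of the Llama 3 tokenizer.json
    llama = json.load(f)
with open("deepseek_tokenizer.json") as f:   # assumed local copy of the DeepSeek tokenizer.json
    deepseek = json.load(f)

llama_special = {t["content"]: t["id"] for t in llama["added_tokens"]}
ds_special = {t["content"]: t["id"] for t in deepseek["added_tokens"]}

# Print special tokens whose IDs differ or that exist in only one file.
for tok in sorted(set(llama_special) | set(ds_special)):
    a, b = llama_special.get(tok), ds_special.get(tok)
    if a != b:
        print(f"{tok!r}: llama={a} deepseek={b}")

# The BPE vocab should match if the distill reuses the Llama tokenizer.
print("vocab identical:", llama["model"]["vocab"] == deepseek["model"]["vocab"])
```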

Uzair-90

Thank you @Dylan-Harden3, really appreciate it. I will look into it.

Uzair-90 commented Jan 31, 2025

@Dylan-Harden3 can you help me with this problem:

size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method

The same problem arises for all 32 layers. What specific changes do I need? I am using transformers 4.30.0.
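
For reference, those [1024, 4096] shapes are what grouped-query attention produces (8 KV heads with a head dim of 128), and LlamaConfig only gained num_key_value_heads around transformers 4.31, so 4.30.0 rebuilds k_proj/v_proj as full 4096x4096 matrices. A minimal check, assuming the model ID above and a newer transformers:

```python
# Hedged sketch: read the checkpoint config to confirm the GQA head counts
# behind the shape mismatch. Assumes transformers >= 4.31, where LlamaConfig
# understands num_key_value_heads.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
print("query heads:", cfg.num_attention_heads)   # 32 would give q_proj 4096x4096
print("kv heads:   ", cfg.num_key_value_heads)   # 8 would give k/v_proj 1024x4096
```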

Dylan-Harden3 (Author)

@Uzair-90 Is this when you run export.py? Please open a Q&A discussion in my fork for further questions; I would like Andrej to approve this PR one day and don't want a long unrelated thread to get in the way.
