
Loss: nan #9

Open
uiloatoat opened this issue Aug 14, 2024 · 3 comments


@uiloatoat

Hi, @xiazhi1
I didn't change any settings, but after training for about 10 epochs the loss becomes NaN.
Have you encountered this situation during training? Could you suggest some solutions? Thank you.
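For what it's worth, this is the kind of check I can add to the training loop to find the first step where the loss becomes non-finite. It is only a generic PyTorch sketch with a toy model and random data, not this repo's actual code:

```python
import torch
import torch.nn as nn

# Generic sketch (not this repo's code): locate the first training step where
# the loss becomes NaN/Inf so the offending batch can be inspected.
model = nn.Linear(8, 1)                       # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(16, 8)                    # toy stand-in for a real batch
    y = torch.randn(16, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    if not torch.isfinite(loss):
        print(f"non-finite loss at step {step}")
        break                                 # stop and inspect this batch / gradients
    loss.backward()
    optimizer.step()
```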

@xiazhi1
Collaborator

xiazhi1 commented Aug 16, 2024


Hi @uiloatoat
Have you solved this problem yet? Could you provide some more specific information, including your GPU and environment configuration (CUDA, torch, and Python versions)?

@ziwei-cui
Collaborator

Maybe you can change "mixed_precision: true" to "mixed_precision: false" in the "config.yaml" file.
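For context, with mixed precision the forward and backward passes run largely in fp16, where values above roughly 65504 overflow and can turn the loss into NaN; disabling it keeps everything in fp32. A rough illustration of the two paths that flag usually selects (generic torch.cuda.amp usage with a placeholder model and data, not this repo's trainer):

```python
import torch
import torch.nn as nn

# Generic illustration (not this repo's trainer) of the two modes a
# mixed_precision flag typically switches between: fp16 autocast + GradScaler
# versus plain fp32 training.
model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

mixed_precision = False                       # the setting suggested above

x = torch.randn(16, 8, device="cuda")
y = torch.randn(16, 1, device="cuda")
optimizer.zero_grad()

if mixed_precision:
    # fp16 forward/backward; large activations or losses can overflow to inf/NaN
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
else:
    # plain fp32 path: slower and more memory-hungry, but numerically more robust
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Note that the fp32 path uses more GPU memory, so the batch size may need to be reduced.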

@uiloatoat
Author

Thanks to @ziwei-cui's advice, the model can now converge. My GPU is a 3090, and after turning off mixed precision I had to reduce the batch size to 16. However, the trained model only achieved 0.6499 bPQ and 0.4835 mPQ, which differs from the results in the paper. I only changed the batch size and set layer_decay to 0.9999 because it is a required parameter; all other settings were unchanged. I have a few questions:
1. Is the model very sensitive to batch size?
2. Are the settings in config.yaml the best settings from your experiments?
3. How can I reproduce the accuracy reported in the paper?
Thanks again. Looking forward to your guidance.
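If batch size turns out to be the issue, one workaround I'm considering is gradient accumulation to keep the effective batch size at 32 while staying within the 3090's memory. This is only a generic PyTorch sketch with placeholder model and data, not this repo's training code:

```python
import torch
import torch.nn as nn

# Generic gradient-accumulation sketch (not this repo's code): keep an
# effective batch of 32 by accumulating gradients over two micro-batches of 16.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accum_steps = 2                               # 2 x 16 = effective batch of 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(16, 8)                    # micro-batch of 16 (toy data)
    y = torch.randn(16, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average correctly
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```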
