
Loss: nan #9

Open
uiloatoat opened this issue Aug 14, 2024 · 3 comments


@uiloatoat

Hi, @xiazhi1
I didn't change any settings, but after training for about 10 epochs the loss becomes NaN.
Have you encountered this situation during training? Could you suggest some solutions? Thank you.
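For what it's worth, this is the kind of check I can add to the training loop to find the first step where the loss becomes non-finite. It is only a generic PyTorch sketch with a toy model and random data, not this repo's actual code:

```python
import torch
import torch.nn as nn

# Generic sketch (not this repo's code): locate the first training step where
# the loss becomes NaN/Inf so the offending batch can be inspected.
model = nn.Linear(8, 1)                       # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(16, 8)                    # toy stand-in for a real batch
    y = torch.randn(16, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    if not torch.isfinite(loss):
        print(f"non-finite loss at step {step}")
        break                                 # stop and inspect this batch / gradients
    loss.backward()
    optimizer.step()
```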

@xiazhi1
Collaborator

xiazhi1 commented Aug 16, 2024


Hi @uiloatoat
Have you solved this problem yet? Could you provide some more specific information, including your GPU and environment configuration (CUDA, torch, and Python versions)?

@ziwei-cui
Collaborator

Maybe you can change "mixed_precision: true" to "mixed_precision: false" in the "config.yaml" file.
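For context, with mixed precision the forward and backward passes run largely in fp16, where values above roughly 65504 overflow and can turn the loss into NaN; disabling it keeps everything in fp32. A rough illustration of the two paths that flag usually selects (generic torch.cuda.amp usage with a placeholder model and data, not this repo's trainer):

```python
import torch
import torch.nn as nn

# Generic illustration (not this repo's trainer) of the two modes a
# mixed_precision flag typically switches between: fp16 autocast + GradScaler
# versus plain fp32 training.
model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

mixed_precision = False                       # the setting suggested above

x = torch.randn(16, 8, device="cuda")
y = torch.randn(16, 1, device="cuda")
optimizer.zero_grad()

if mixed_precision:
    # fp16 forward/backward; large activations or losses can overflow to inf/NaN
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
else:
    # plain fp32 path: slower and more memory-hungry, but numerically more robust
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Note that the fp32 path uses more GPU memory, so the batch size may need to be reduced.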

@uiloatoat
Author

Thanks to @ziwei-cui's advice, the model can now converge. My GPU is a 3090, and after turning off mixed precision I had to reduce the batch size to 16. However, the trained model only achieved 0.6499 bPQ and 0.4835 mPQ, which differs from the results in the paper. I only changed the batch size and set layer_decay to 0.9999 because it is a required parameter; all other settings were unchanged. I have a few questions:
1. Is the model very sensitive to batch size?
2. Are the settings in config.yaml the best settings from your experiments?
3. How can I reproduce the accuracy reported in the paper?
Thanks again. Looking forward to your guidance.
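If batch size turns out to be the issue, one workaround I'm considering is gradient accumulation to keep the effective batch size at 32 while staying within the 3090's memory. This is only a generic PyTorch sketch with placeholder model and data, not this repo's training code:

```python
import torch
import torch.nn as nn

# Generic gradient-accumulation sketch (not this repo's code): keep an
# effective batch of 32 by accumulating gradients over two micro-batches of 16.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accum_steps = 2                               # 2 x 16 = effective batch of 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(16, 8)                    # micro-batch of 16 (toy data)
    y = torch.randn(16, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average correctly
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```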
