How to reproduce the fid=7.13 in table 1? #22

Open
sp12138 opened this issue Feb 18, 2025 · 1 comment

Comments


sp12138 commented Feb 18, 2025

Thank you for your outstanding work.

I am having trouble reproducing the FID of 7.13 reported for SD-VAE. I replaced VA-VAE with SD-VAE in your code framework, used the training settings from lightningdit_xl_vavae_f16d32_64ep_cfg.yaml, and trained for 100,000 steps. The resulting model only reaches an FID of 8.970012796829849.

I kindly request your assistance in reproducing the FID score of 7.13. Could you please provide the training code or offer some guidance?

Thank you once again for your exceptional work!

JingfengYao (Member) commented

These are the possible reasons I can think of.

  1. Latent normalization. In our SD-VAE experiments, we kept the same normalization as DiT, which directly multiplies the latent by 0.18215. This means you need to set latent_norm to false and latent_multiplier to 0.18215 in the config. Because we trained a large number of tokenizers for the study, we adopted channel-wise normalization as a stable default, but it may not be optimal for SD-VAE.
  2. Sampling details. For this part of the experiment, we sampled with dopri5 rather than Euler with 250 steps, and we did not use CFG interval or timestep shift. Also make sure the FID is computed on 50k samples. Our paper reports some 10k-sample results as well, but those values are higher than the FID at 50k.
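
To make point 1 concrete, here is a minimal sketch of the two latent scaling modes. It is not the repository's code: the function name `scale_latent`, the list-of-channels layout, and the choice to apply `latent_multiplier` after the channel-wise step are assumptions made for illustration. Only the option names `latent_norm` and `latent_multiplier` and the value 0.18215 come from the reply above.

```python
def scale_latent(latent, latent_norm, latent_multiplier,
                 channel_mean=None, channel_std=None):
    """Scale a VAE latent before it is fed to the diffusion model.

    latent: list of channels, each a flat list of floats (hypothetical layout).
    latent_norm: if True, standardize each channel with the given statistics.
    latent_multiplier: scalar applied to every value afterwards (assumed order).
    """
    if latent_norm:
        # Channel-wise normalization: one mean/std per latent channel.
        latent = [
            [(v - m) / s for v in channel]
            for channel, m, s in zip(latent, channel_mean, channel_std)
        ]
    # DiT-style scaling: a single scalar for the whole latent.
    return [[v * latent_multiplier for v in channel] for channel in latent]


# SD-VAE setting suggested in the reply: no channel-wise step, scalar 0.18215.
sd_vae_scaled = scale_latent([[1.0, 2.0], [3.0, 4.0]],
                             latent_norm=False, latent_multiplier=0.18215)

# Channel-wise setting, with made-up statistics purely for illustration.
channelwise_scaled = scale_latent([[1.0, 3.0], [10.0, 30.0]],
                                  latent_norm=True, latent_multiplier=1.0,
                                  channel_mean=[2.0, 20.0],
                                  channel_std=[1.0, 10.0])
```

If the same scaling is applied at training time, it presumably has to be inverted before decoding samples with the VAE, so a mismatch between the two settings would affect both training and evaluation.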

Feel free to provide more details for further discussion.
