I find this paper thought-provoking.
There is a chance that it's easier to "align a t2i model with real physics" in post-training, and to let it learn to generate whatever (physically implausible) combinations in pretraining.
As opposed to trying hard to come up with something that is supposed to learn only physically plausible stuff from the start but might never work (not gonna name names here, but I have something prominent in mind lol)
Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025]
Paper: arxiv.org/abs/2504.13129
Project: jialuo-li.github.io/Science-…
💻 Code: github.com/Jialuo-Li/Science…
🤗 Dataset: huggingface.co/collections/J…
The Problem:
Current text-to-image models (like Stable Diffusion/FLUX) often create scientifically implausible visuals (e.g., tomatoes floating unnaturally in water 💧, copper that burns with yellow flame 🔥). These "science illusions" reveal a critical blind spot in generative AI!
🤔 Why It Matters:
From education to simulation 🔬, AI-generated content must obey real-world physics/chemistry. We're working towards making synthetic visuals not just pretty, but physically accurate!
💡 Our Solution:
✅ Science-T2I Dataset: Over 20K *expert-annotated* adversarial image pairs + 9K prompts across physics, chemistry & biology.
✅ SciScore: A novel reward model that evaluates images like a science teacher.
✅ Two-Stage Training: Fine-tunes models to align with scientific realism, boosting accuracy by 50%+.
Check out more results and the insights we drew from our training in the thread below. #CVPR2025
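
To make the "adversarial image pairs + reward model" idea concrete, here is a minimal sketch of one way a reward model could be checked against such pairs: count how often it scores the scientifically plausible image above the implausible one. This is an illustration under assumptions, not the paper's evaluation code. The names `pairwise_accuracy`, `toy_score`, and `toy_pairs` are hypothetical, and images are stood in for by text descriptions so the example runs without any model.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical record layout: (prompt, plausible_image, implausible_image).
# Images are opaque objects here; a real setup would pass tensors or file paths.
Pair = Tuple[str, object, object]


def pairwise_accuracy(score: Callable[[str, object], float],
                      pairs: Iterable[Pair]) -> float:
    """Fraction of pairs where the plausible image gets the strictly higher score."""
    total = 0
    correct = 0
    for prompt, plausible, implausible in pairs:
        total += 1
        if score(prompt, plausible) > score(prompt, implausible):
            correct += 1
    if total == 0:
        raise ValueError("no pairs given")
    return correct / total


def toy_score(prompt: str, image: object) -> float:
    """Toy stand-in scorer (an assumption, NOT SciScore): rewards the word 'green'."""
    return 1.0 if "green" in str(image).lower() else 0.0


# Toy data: text descriptions stand in for images.
toy_pairs = [
    ("copper burning in a flame", "blue-green flame", "yellow flame"),
    ("ice cube in a glass of water", "ice floating at the surface",
     "ice resting at the bottom"),
]

print(pairwise_accuracy(toy_score, toy_pairs))
```

The toy scorer only gets the copper pair right, which is the point of adversarial pairs: a scorer that keys on surface cues fails as soon as the distinguishing detail changes.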
Apr 22, 2025 · 10:28 PM UTC


