ML for Art and Creativity, working @HuggingFace (apolinario@multimodal.art)

Qwen Image Multiple Angles LoRA is an exquisitely trained LoRA! 📐˚₊‧꒰ა Keeps characters and scenes consistent, and flies the camera around! Open source got there! One of the best LoRAs I've come across lately 🙌
50
256
2,225
121,495
Boring Reality LoRA just dropped for HunyuanVideo 🏙️🏞️ A fine-tune that leads not to cinematic shots, but to something that could've come out of your phone 📱
47
149
1,615
407,681
testing out the Diffusers Image Fill demo capabilities on a random image
40
136
1,164
274,355
Google just announced a DALLE-2-like model: Imagen For now no code, just demo site: gweb-research-imagen.appspot… And paper: gweb-research-imagen.appspot…
16
128
891
Hunyuan-3D-2.1 image-to-3D is now out! ✨ Open weights, permissively licensed 🔓 2.1 improves on 2.0 by a LOT in generating high quality textures for the 3D assets 🔥 This level of detail from a single image 🖤💎
9
104
872
49,901
Apply Texture Qwen Image Edit LoRA by tarn59 works with EVERYTHING! 👉🪵🧶, this model trains so well I've built this demo so you can apply *any* texture to *any* object on @huggingface
13
124
870
67,704
I hacked @huggingface Spaces to build an open source @gradio Dreambooth Training UI that allows you to train a model for less than US$0.80 🐱‍💻 (you can also use it locally for free): huggingface.co/spaces/multim…
28
105
793
My favorite part is that it works really well with out-of-the-distribution garments
Testing out the new virtual try-on pipeline on @huggingface, IDM-VTON ▶️ huggingface.co/spaces/yisol/…
17
80
783
86,310
Editing facial expressions in real time now on @huggingface Spaces 👨‍🎤🔀 A Grog converted Cog image to Gradio running a ComfyUI backend - magic of open source 🤝 ▶️ huggingface.co/spaces/fffilo…
10
129
768
71,702
Releasing my first FLUX LoRA: FLUX Tarot v1! 🌙🧙‍♀️🃏 Based on Rider Waite's 1920 tarot (public domain) Model and demo: huggingface.co/multimodalart… Image & Caption Dataset: huggingface.co/datasets/mult…
41
90
701
70,781
outpainting with the new FLUX-1[dev] Fill model is just on a completely new level 🪼 i've built a Space for you to try it👇
20
79
668
90,730
yes! qwen-image excels at following precise rule breaking instructions very few models can do things like "a fried egg with a blue yolk"
A powerful image model knows when/how to break the rules. > a photo of an eye with three separate pupils qwen-image
172
29
662
3,068,683
Introducing Kontext Relight! 💡 ✨ A FLUX Kontext Relight LoRA + demo trained for state-of-the art relighting for subjects & landscapes
22
83
665
76,065
rejoiced by the rebirth of the skeuomorphic isomorphic 3D icons on @Airbnb I've trained a FLUX LoRA to generate 3D icons in that style
22
42
651
475,877
open source nano banana? bytedance just dropped USO, an open source editing model that... just works
26
56
640
84,405
reminder for flux: prompting is case-sensitive 𝙰𝚊 left: Mark Zuckerberg eating pasta right: mark zuckerberg eating pasta same seed
23
37
616
103,894
1 week of Stable Diffusion A creative explosion is unfolding with Stable Diffusion, showing the power of open source as state of the art! We curated 23+ applications this week: new features, workflow integrations, UIs; run on Win, CPU, AMD, M1 and more! multimodal.art/news/1-week-o…
8
136
595
After some, uh, developments yesterday: - Stable Diffusion v1-5 is out by @runwayml - Fine-tuned image decoder (VAE) out by @StabilityAI Magic of open source🧙 collaboration continues no matter what, here's the Best Available Stable Diffusion™ notebook: colab.research.google.com/dr…
9
95
601
ok, is this it?! I just tested the DreamO demo and it's a framework for FLUX that can kind of do it all 🪄🔧 👨‍🎤 A really good FLUX ID preservation reference 🎨 IP Adapter for composition and style 👗 Virtual Try-on ... and more!
12
82
612
66,701
Very exciting 'breaking' news! CompVis (research group behind VQGAN) have just released a new 1.45B parameter model to its Latent Diffusion model: github.com/CompVis/latent-di… From the released image it seems like it has an unprecedented text-synthesis capacity. More to follow soon
12
115
587
Replying to @multimodalart
Just redo every famous landmark in existence - could watch this all day.
7
43
593
42,441
GANs are so back?! Scientists from Brown and Cornell have published a paper with a ✨ modern architecture GAN ✨ that is 🗿 stable to train 🗿 and competitive with SOTA GANs and even diffusion models Paper and demo 👇
16
73
580
64,162
ok i can't take it anymore: announcing the chatgpt image yellow tint corrector a @huggingface space that runs locally on your browser to fix the yellow tint of the chatgpt generated images
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻
56
24
538
99,487
LLaDA (the first Large Language Diffusion Model) is *just* out 💥 and I've built a demo, try out now 👨‍💻 It's mesmerizing to watch the diffusion process 🌀, and it being a diffusion model gives you superpowers like "the 4th word has to be pineapple" 🦸 Demo and weights 👇
15
94
513
82,626
Thanks @angrypenguinPNG for merging my PR to add high resolution to the Illusion Diffusion Space 📺🌀 It's now as fast, double the resolution and has crispy details - go play ▶️ huggingface.co/spaces/AP123/…
17
90
500
69,280
IC-Light v2 was just released by @lvminzhang 🔦, now runs on FLUX, and it is the best relighting tool in the world 🌐, just like that Try out the official demo ✨📣 huggingface.co/spaces/lllyas…
14
84
511
57,139
Google just announced "Parti" - a text-to-image model co-developed with "Imagen" "Parti" doesn't use diffusion models - rather it scales up Transformer + VQGAN architectures like DALL-E 1 and its open source replicas (dalle-pytorch, ruDALLE, DALL-E Mini) parti.research.google
7
93
489
Excited to introduce LEDITS++, a novel way to edit real images with precision ✏️ - Multiple edits ✂️🔁 - Automagic free masking 🪄🎭 - 🆕 DPM-Solver fast inversion 🔀⚡ 🤗 Try it: huggingface.co/spaces/editin… 🔗 Project: leditsplusplus-project.stati… 📝 Paper huggingface.co/papers/2311.1…
15
107
491
131,684
FLUX.1 ai-toolkit now has an official UI 🖼️ with @Gradio With this open source UI you can 💻, locally or any cloud: - Drag and drop images 🖱️ - Caption them ✏️ (or use AI to caption 🤖) - Start training 🏃 No code/yaml needed 😌 Thanks for merging my PR @ostrisai 🔥
9
89
487
49,794
nano-banana is really good at few-shot learning for example, if I upload a single photo of my face it will always keep the same facial expression and scene but if I upload multiple photos of myself, it kind of "learns" my likeness and can do everything
11
38
495
134,340
this is not a drill 🚨, real-time open source video generation is here 🔥 Self-Forcing - a real-time video distilled model from Wan 2.1 by @Adobe is out, and they open sourced it 🐐 I've built a live real time demo on @huggingface Spaces 📹💨
24
72
482
53,161
IT'S OUT! 🚀 MoDA: Multi-modal Diffusion Architecture for Talking Head Generation finally a talking head: open source 🏋️ fast ⚡ portrait + audio-driven 🧑‍🎨🎧 with emotion control (and yes, i built an inference system + Gradio, generate in < 15s on @huggingface spaces 🤗)
11
78
469
46,856
ControlNet is cool, but what if you could have MORE control? 🤯 With MultiDiffusion Region Control you can 🎛️ draw masks ✏️ and give a specific prompt for each mask 📜 The @gradio demo is just out on @huggingface 🤗 - kudos to the author @omerbartal! huggingface.co/spaces/weizma…
8
97
423
54,479
we hacked Wan 2.2 and discovered that it does first and last frame filling, works out of the box on 🧨 diffusers i've built an app for it on @huggingface Spaces (which is powering our nano banana video mode too 🍌 🎬)
14
58
417
49,418
Less than 1 minute guide on how to train your own LoRA with LoRA Ease 🧞‍♂️⚡ Train high-quality LoRAs on objects 📦, faces 😊, styles 🎨 or characters 🧑‍🎤 effortlessly and super cheap ༄ ▶️ huggingface.co/spaces/multim…
14
92
416
90,954
It's out! 🥳 Browse visually the Stable Diffusion Concepts Library - and use 100+ community taught concepts in your prompt directly on the same UI! Colab with Gradio UI: colab.research.google.com/gi…
7
59
398
You can now finally create your own stock photo smiling while eating salad in seconds 👨‍🎤🥗 IP-Adapter-FaceID Plus was silently released last week - it's the first inference-time technique that really captures my likeness 🥸🦚 ▶️ huggingface.co/spaces/multim…
8
70
408
60,739
How to train your own ControlNet? 🥅 We wrote a guide, ranging from deciding which controls to use 🎛️, how to prepare your dataset, all the way to gpus going brrr 🔥 (with an unexpected trip to the uncanny valley 👀) From me and @pcuenq with ❤️ huggingface.co/blog/train-yo…
9
86
402
72,737
Whoa, I just tested the IC Edit @huggingface demo and it seems to be the new 🐐👑 of image editing. It's an image editing LoRA for FLUX featuring: 👨‍🎤 Identity preservation (beating GPT-4o) ✏️✏️ Does multiple edits 🐎 10s image editing 🌌 style support
16
57
397
45,498
The evals they didn't show you How does GPT 4.5 compare with latest non-thinking models: Sonnet 3.7 (no thinking), Deepseek V3 (not R1!), Grok 3 (no thinking)
15
34
380
77,524
Iterated with @angrypenguinPNG on some enhancements to their Illusion Diffusion Space, @MrUgleh-inspired QR ControlNet patterns 🌀 ▶️ huggingface.co/spaces/AP123/…
12
54
375
72,304
Upgraded the TokenFlow demo to an A100! And defaults changed - the edits should be ~2.5x faster huggingface.co/spaces/weizma…
8
51
364
55,250
early results for the Qwen "Boring Reality" LoRA 📸 by kudzueye the model is still experimental and work in progress 🚧
14
27
379
64,642
This was drawn by GPT-4
15
24
358
54,652
The first large scale open source DALL-E 2 replication is here🧙 Karlo is an unCLIP model trained by #KakaoBrain I'm having fun playing with it on 🤗 @huggingface Spaces: huggingface.co/spaces/kakaob… Model card: huggingface.co/kakaobrain/ka… GitHub: github.com/kakaobrain/karlo
12
73
360
59,903
Introducing LoRA the Explorer 🔎: browse the coolest SDXL LoRAs, play with them online ▶️, use locally 💿 (...and no need to dodge semi-naked waifus 🚫) Join the fun 🕺 huggingface.co/spaces/multim…
6
77
347
57,761
🧨 diffusers 0.5.0 now supports JAX for super fast #stablediffusion inference on TPUs You can generate 8 images in ~8s on Colab Free using TPU 🚀 colab.research.google.com/gi…
2
72
335
The Stable Diffusion Multi Inpainting Spaces is out! On it you can do both: Inpainting by masking the image (with the newest @Gradio masking) or inpainting with words, your choice! huggingface.co/spaces/multim…
7
60
329
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔 It is HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨ In the paper they claim to be SOTA open source! I'm working on a @huggingface demo as you read this so we can all vibe check Model: huggingface.co/Tencent-Hunyu… GitHub: github.com/Tencent/HunyuanDi… Paper: tencent.github.io/HunyuanDiT…
12
73
342
82,697
I'm super thrilled to announce that our assembly of the Latent Diffusion LAION-400M text-to-image model is now available on @huggingface🤗, democratizing even further the access to text-to-image ai art! Thank you for all the help @osanseviero! huggingface.co/spaces/multim…
11
78
336
I'm delighted to announce I've joined @huggingface as a ML Art Engineer 🤗, to help make AI art even more accessible, easy to use and to develop for! This tech is going to empower human expression and creativity in unprecedented ways - and building it openly feels the right way!
28
29
332
FLUX Kontext [dev] is out with open weights 🔥 - Inference & LoRA tuning script (yes!) with diffusers 🧨 - Official demo on @huggingface Spaces 🖼️ - 2 community demos: multi-context & LoRA explorer (ye, some FLUX[dev] loras work out of the box 🪄 )
7
41
337
20,258
Text-to-3D and Image-to-3D in 7 seconds 🤯 💨 That's LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation 🧊 And it's open source ✨ Try it ▶️ huggingface.co/spaces/ashawk…
17
63
285
32,238
ControlNets are cool, but T2I-Adapters are 94% smaller 🤏 , and way faster 💨 Today TencentARC released 6 T2I Adapters for SDXL: depth, canny, lineart, openpose, and... DOODLY! Come play: huggingface.co/spaces/Tencen…
3
59
312
42,384
FINALLY! Generate a full song with lyrics in < 20 seconds! ⚡ 🔥 DiffRhythm is ⟡ just out ⟡ an open weights end-to-end full song generation model that generates 1-2min songs in just a few seconds 🏎️💨 Give it a reference + lyrics and get a song back! Sound on! 🔊 ▶️
13
54
322
75,161
Introducing FLUX LoRA the Explorer 🧭✨ Explore, generate and download FLUX LoRAs! 🖼️ Including the popular flux-realism and the cute Frosting Lane Come over, we're just getting started 🛸 ▶️ huggingface.co/spaces/multim…
16
73
321
48,852
The MarioGPT @huggingface Spaces demo is now playable! 🕹️ Now you can play the levels you generate - hopefully you're better than me 😂 huggingface.co/spaces/multim…
7
56
305
49,695
this is so good! mid-frames are here, multi-frame to video is an easy to use workflow! kudos to @morphic for open sourcing it
Morphic's frames-to-video, with up to 5 frames and time control, is now open-source. GitHub: github.com/morphicfilms/fram… Hugging Face: huggingface.co/morphic/Wan2.… More details in the thread:
3
39
305
30,230
Meta just released a new collection of their open access "Seamless" translation models 🔊 They do speech-to-text, text-to-speech, speech-to-speech, text-to-text 💬🔄📝 The Expressive model keeps speech rate, pauses and style 🗣️ 📁 Models and demos: huggingface.co/collections/f…
6
74
301
51,883
Wow! I wasn't expecting the outpainting of the new FLUX Inpainting Beta Controlnet to be this good 🤯 👇 links to try it
10
26
299
28,640
Whoa, 1000 likes in the Wan 2.1 Fast Space 🎥 💨 still feels surreal that we can generate such high quality videos so fast with open source models, and that's the slowest/worst it's ever gonna be ✨
14
31
294
38,068
The diffusers 🧨 library just did a release incorporating ControlNet, it runs so fast! 🏎️‍💨 Blog: huggingface.co/blog/controln… Colab: colab.research.google.com/gi…
2
59
285
52,526
Diffusers Outpaint now allows for infinite zoom-out with a resize input size + "use as input" button @kingnish24 🤝 @fffiloni ▶️ huggingface.co/spaces/fffilo…
7
40
284
28,493
omg, it seems recraft v3 can perform simple language model tasks 🤯 1: "this page contains the number of letters r that the word strawberry has" 2: "this page contains the result of 2+5" 3: "write 2 adjectives in english" 4: "write the name of the US president"
11
31
280
57,264
Collaborative new concepts on #StableDiffusion🎨 1. Teach Stable Diffusion new concepts 👩‍🏫(add to the public library if you wish): colab.research.google.com/gi… (or browse the library to pick one🧤 huggingface.co/sd-concepts-l…) 2. Run with the learned concepts 🖼️ colab.research.google.com/gi…
3
53
270
New model alert! 🚨 ⋆✴︎˚。FLEX.1 Alpha ˚。✴︎⋆ is an 8B parameter model pruned and further trained by @ostrisai from 12B FLUX.1 [schnell]: 🖼️ High quality, competitive with FLUX[dev] 🎨 Good at styles 🤏 Smol 📜 openly licensed (Apache 2.0) ⚗️ de-distilled, CFG optional
9
41
272
25,850
Stable Diffusion 2 by @StabilityAI is out with 5 new models 👽 You can now try the 768x768 model (the largest one released) on @huggingface Spaces huggingface.co/spaces/stabil…
9
42
265
Happy Public Domain day! 🎉 To celebrate Steamboat Willie finally joining the public domain, I created a @huggingface dataset with all frames of the 1928 short 🐭📜 ▶️ huggingface.co/datasets/mult…
7
45
258
56,891
whoa, @Remade_AI just dropped 8 open source video LoRA effects for Wan 2.1 on @huggingface 🤯 Squish 🥞, Cakefy 🍰, Inflate 🎈, Deflate 📉, Shooting 🔫, Rotate 🔄 and Muscle 💪 all available openly
8
51
263
56,828
Breaking news: OpenAI open sourced their CLIP ViT-L/14@336px! github.com/openai/CLIP/commi… I'll hook it soon to many generation systems, stay tuned!
5
29
254
The official ToonCrafter demo is now available @huggingface Spaces ZeroGPU 🤯 This generative cartoon interpolation model is by far the best coherent generative interpolation model I've seen 🖼️ IMO it will change how animations are made 🪄 ▶️ huggingface.co/spaces/Doubii…
3
60
256
57,707
the gpt-oss model is really easy to tune! get started with customizing/fine-tuning to make gpt-oss your own with the @openai + @huggingface cookbook 🤝 cookbook.openai.com/articles…
3
50
261
26,334
Ok - I just quickly assembled the LAION-400M trained Latent Diffusion CFG TTI model to a Google Colab, you can try it yourself: colab.research.google.com/gi… "A mecha robot holding a sign that reads: 'This is weird'"
35
48
239
🎅 Ho-ho-ho! Today a bunch of ICLR 2023 papers dropped! This is a conference with blind submission, authors are anonymous till review A lot of multimodal AI: text-to-video (yes, another one), text-to-3D, another 'teach-diffusion-new-concepts', text-to-audio... and more! 🧵
4
52
242
Stable Diffusion model card is up, and the weights are available for academic and research purposes first This is the first step ahead of a full public release which should be coming soon! 🤩 #StableDiffusion huggingface.co/CompVis/stabl…
4
49
240
The Dream 7B (diffusion reasoning language model) is OUT! 🚨 I built a demo so you can test it out (and check the diffusion process live) 𖣯🔍
🚀Excited to announce Dream 7B (Diffusion reasoning model): the most powerful open diffusion large language model to date.
8
44
248
35,703
This week's updates were not only made of Dall-E 2! We also got: - Latent Diffusion LAION 400M (an open model!) - KNN Diffusion paper (promising new approach to text-to-image) - 3 new exciting TEXT-to-VIDEO models! and more! Check out our weekly update: multimodal.art/news/this-wee…
4
46
234
OPEN TO EVERYBODY! I optimized the Latent Diffusion LAION-400M Colab RAM usage and now it should run on free non-Pro accounts. And fast! 8 images in 20 seconds on a P4 GPU colab.research.google.com/gi… Google Drive support and VRAM optimizations by @RiversHaveWings were also added
21
41
231
Stable Video Diffusion is an amazing (and chonky 🐼) new model by @StabilityAI - if you can't run it locally, you can now play with it on @huggingface Spaces 🤗 ▶️ huggingface.co/spaces/multim…
3
39
240
56,850
Yesterday OpenCLIP released the first LAION-2B trained perceptor! a ViT-B/32 CLIP that surpasses OpenAI's ViT-B/32 quite significantly: github.com/mlfoundations/ope…
3
34
231
And the Spaces for the Stable Diffusion Concepts Library is out! Navigate 250+ community taught objects and styles with Textual Inversion and use them in your prompts! huggingface.co/spaces/sd-con…
3
42
229
Guess who's back? Back again! 🎵 @StabilityAI is back, tell a friend 🎤 Stable Diffusion 3.5 Large is here 🔥 - 🏋️ 8B parameters - Full 💪 and 🏎️💨 4-step Turbo variant - 🧾 🤝 commercial use (for orgs below 1M year/rev) - 🧨 day-0 LoRA fine-tuning support
9
44
232
31,177
SongGen is joining YuE as an open-source text-to-music (Suno-style) model Feed it a 3s voice sample (optional) → describe your song → write the lyrics 🟰 get a song!
3
32
236
13,166
DALL-E Flow is an awesome new tool by @JinaAI_'s @hxiao Like Centipede Diffusion it is a mix of models: It generates images with DALL-E Mega, refines and creates variations with Latent Diffusion, ranks the best with CLIP and upscales the results github.com/jina-ai/dalle-flo…
11
36
228
Following the full open source release of Stable Diffusion, the @huggingface Spaces for it is out🤗 Stable Diffusion is a state-of-the-art text-to-image model that was released today by @StabilityAI #stablediffusion huggingface.co/spaces/stabil…
4
58
224
InstructPix2Pix by Tim Brooks allows you to write natural language instructions to edit images ✏️🖼️ We are getting closer and closer to "photoshop with words"! 🎨 Play with it now on @huggingface Spaces huggingface.co/spaces/timbro…
5
39
214
27,353
Check it out here, and inference it directly on @huggingface with @fal or @replicate huggingface.co/multimodalart…
9
21
221
12,925
a mysterious new button appeared on the @huggingface Spaces Nano Banana app 👀
15
19
220
40,235
ComfyUI → @huggingface Spaces → serverless ZeroGPU ✨😌 We wrote a tutorial on how to turn any ComfyUI workflow into an easy to use Gradio app and (optionally) host it for free with ZeroGPU 💥 huggingface.co/blog/run-comf…
3
35
218
25,788
Since VQGAN+CLIP times, we've been learning to prompt with @openai CLIP knowledge (incl. SDv1, conditioned on OAI CLIP) Stable Diffusion 2 breaks that 💥 with LAION-trained CLIP, "trending on artstation", "greg rutkowski" don't work; we're all learning to prompt again! 👶
13
22
213
✨ PD12M ✨, a 12.4 million high quality image-caption dataset for AI training 🎛️, featuring: - 🤖✏️ Florence-2 synthetic captions - 🌸 Aesthetic and safety filtered from 34M superset - 🔓 only public domain images superb release by @spawning_ huggingface.co/datasets/Spaw…
6
41
211
21,988
Fuck yes! CogView 4 is out! 🔥🚀 New 6B parameters text-to-image model! 🧠🎨 Native 2048x2048 resolution! 🖼️🔍 Great prompt adherence for very long prompts! ✍️✨ Apache 2.0 license! 📜🔓
5
29
206
12,432
SDXL Flash 📸 is here! While SDXL LCM/Turbo/Lightning/Hyper do a great job in 1-4 steps, SDXL Flash gets uncompromised quality in 10 steps 💥 A new sweet intermediary spot to unlock use-cases 🍬 Model: huggingface.co/sd-community/… Demo: huggingface.co/spaces/KingNi…
8
32
206
23,923
It was missing, so I added @AnthropicAI Opus 4 Thinking and @OpenAI o3 benchmark results to the comparison mix chart 🆚🔎 Vibe check pending, but on benchmarks it seems that we got an open model competitive with Opus 4 / o3 / Gemini 2.5 🤯
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding ✅ Better general skills: instruction following, tool use, alignment ✅ 256K native context for deep, long-form understanding 🧠 Built exclusively for thinking mode, with no need to enable it manually. The model now natively supports extended reasoning chains for maximum depth and accuracy. Hugging Face: huggingface.co/Qwen/Qwen3-23… or huggingface.co/Qwen/Qwen3-23… ModelScope: modelscope.cn/models/Qwen/Qw… or modelscope.cn/models/Qwen/Qw… API Doc: alibabacloud.com/help/en/mod…
11
23
197
59,756
💥 If SDXL was trained with LLM as a text encoder, what would happen? 🧪 Kolors is the answer 🎨 - Kwai trained (from scratch!) an SDXL-arch model with the GLM-4 LLM as the text encoder, and it's fantastic! ▶️ Demo huggingface.co/spaces/gokayg… 📁 Model huggingface.co/Kwai-Kolors/K…
11
42
193
25,415
The Logo in Context Spaces demo + 🧨 diffusers implementation is here! 🖼️🏷️ In-Context LoRA + Image-to-Image + Inpainting → allow you to apply your logos to anything huggingface.co/spaces/multim…
10
42
198
24,914
PAG (Perturbed-Attention Guidance) is not getting nearly the attention it deserves, I've adapted it to work on SDXL with diffusers 🧨 ...and it DELIVERS! 🤯 Try it here ▶️ huggingface.co/spaces/multim… thanks to KU-CVLAB researchers: Donghoon Ahn, Hyoungwon Cho et al. ❤️
Recent studies reveal that the quality of samples from diffusion models relies on techniques like CG and CFG, yet these fall short in unconditional generation and tasks like image restoration. This research paper introduces Perturbed-Attention Guidance (PAG), a novel method enhancing diffusion samples in all scenarios without extra training, offering significant improvements in tasks where traditional guidances falter. Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim Paper: paperswithcode.com/paper/sel… Repo: github.com/KU-CVLAB/Perturbe… #ai #diffusionmodels #artificialintelligence
9
47
187
49,345
Qwen Image Edit works too well with lightx2v LoRA to run with just 8 and 4 steps, wtf? in my experience, 8 steps keeps the quality of the edits at the same level as the original model, at a 12x speedup 💨 (ofc i built a demo for it)
6
9
194
28,594
MindsEye - an open source interface to 'pilot' AI art models without using code - is now available to everyone Check it out, share it around and let me know what you think! Colab: colab.research.google.com/dr… Discord: discord.gg/Np6Ec9DG Guide and FAQ: multimodal.art/mindseye
14
31
182