AK · Oct 11, 2024 · 2:26 PM UTC

AK

Pinned Tweet

@_akhaliq

11 Oct 2024

for daily papers, authors can directly submit them here: huggingface.co/papers/submit

596

560,121

AK · Jun 2, 2023 · 3:08 PM UTC

@_akhaliq

2 Jun 2023

ChatGPT playing rock paper scissors

171

3,593

68,321

3,575,943

AK · May 19, 2023 · 5:04 AM UTC

@_akhaliq

19 May 2023

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold paper page: huggingface.co/papers/2305.1…

310

5,428

20,685

3,407,502

AK · Jun 9, 2023 · 6:22 AM UTC

@_akhaliq

9 Jun 2023

Fixing things with AI

142

2,060

17,449

1,884,029

AK · Jun 1, 2023 · 4:53 AM UTC

@_akhaliq

1 Jun 2023

ChatGPT, Bro just kept going?

206

1,624

15,986

1,647,681

AK · Sep 27, 2023 · 2:49 PM UTC

@_akhaliq

27 Sep 2023

1,997

15,559

777,016

AK · Jun 4, 2023 · 9:26 PM UTC

@_akhaliq

4 Jun 2023

Hey ChatGPT, finish this building...

105

1,219

12,755

1,440,155

AK · May 20, 2023 · 3:01 AM UTC

@_akhaliq

20 May 2023

chatgpt gives a random youtube link

354

12,358

1,221,465

AK · Jun 18, 2023 · 2:43 AM UTC

@_akhaliq

18 Jun 2023

Fake Apple Products, midjourney AI 1. Apple Jetpack

141

1,239

11,898

2,163,478

AK · Jun 4, 2023 · 3:12 PM UTC

@_akhaliq

4 Jun 2023

AI is taking over

161

1,209

11,020

1,608,148

AK · Mar 20, 2025 · 1:31 PM UTC

@_akhaliq

20 Mar 2025

SpatialLM just dropped on Hugging Face Large Language Model for Spatial Understanding

115

1,215

10,375

673,799

AK · Jun 11, 2023 · 4:56 PM UTC

@_akhaliq

11 Jun 2023

AI Generative fill with memes

1,353

9,316

991,540

AK · Jun 15, 2023 · 4:26 AM UTC

@_akhaliq

15 Jun 2023

real life Simpsons, midjourney AI 1. Flanders

343

724

8,364

6,166,341

AK · Jun 16, 2023 · 4:30 PM UTC

@_akhaliq

16 Jun 2023

Dogs being Human, midjourney AI 1. Golden Retriever

127

792

7,025

2,020,032

AK · Jul 4, 2023 · 10:23 PM UTC

@_akhaliq

4 Jul 2023

Harry Potter Anime using stable diffusion by u/Inner-Reflections

1,428

6,352

730,151

AK · Jan 25, 2025 · 2:54 AM UTC

@_akhaliq

25 Jan 2025

DeepSeek-R1 write a script for a bouncing yellow ball within a Rhombicosidodecahedron, make sure to handle collision detection properly. make the Rhombicosidodecahedron slowly rotate. make sure ball stays within the Rhombicosidodecahedron. implement it in p5.js

113

679

6,163

1,478,656

AK · Feb 10, 2025 · 2:40 PM UTC

@_akhaliq

10 Feb 2025

o3-mini prompt: make a app called chatgpt ad maker that takes in a image and does a black and white dotted image effect with sliders to adjust dot size

OpenAI

@OpenAI

10 Feb 2025

What do you want to create next?

263

6,090

1,395,000

AK · Jun 7, 2023 · 6:00 PM UTC

@_akhaliq

7 Jun 2023

What did you do with generative AI?

276

5,430

523,887

AK · Aug 27, 2022 · 5:42 PM UTC

@_akhaliq

27 Aug 2022

stable diffusion img2img web UI + workflow video github: github.com/hlky/stable-diffu… reddit thread: teddit.net/r/StableDiffusion…

927

5,283

AK · Jul 13, 2020 · 12:07 AM UTC

@_akhaliq

13 Jul 2020

stylegan2 finetuning ffhq to metfaces

1,134

5,014

AK · Oct 14, 2021 · 3:24 AM UTC

@_akhaliq

14 Oct 2021

ADOP: Approximate Differentiable One-Pixel Point Rendering abs: arxiv.org/abs/2110.06635

973

4,912

AK · Jun 14, 2023 · 3:24 PM UTC

@_akhaliq

14 Jun 2023

Celebrities if They Worked Normal Jobs, Midjourney AI 1. Tom Cruise

101

281

4,213

2,149,336

AK · Oct 19, 2022 · 8:06 PM UTC

@_akhaliq

19 Oct 2022

Mubert-Text-to-Music 🎵🎵🎵 Colab notebooks demonstrating prompt-based music generation via Mubert API GitHub: github.com/MubertAI/Mubert-T…

1,140

4,318

AK · Jan 31, 2025 · 10:29 PM UTC

@_akhaliq

31 Jan 2025

OpenAI o3-mini just one shotted this prompt: write a script for 100 bouncing yellow balls within a sphere, make sure to handle collision detection properly. make the sphere slowly rotate. make sure balls stays within the sphere. implement it in p5.js

137

399

4,237

814,801

AK · Jun 18, 2023 · 2:43 AM UTC

@_akhaliq

18 Jun 2023

6. Apple Orange

439

4,102

368,330

AK · Feb 4, 2025 · 5:37 PM UTC

@_akhaliq

4 Feb 2025

This is HUGE The AI App store is here Ask anything you want to do with AI With ~400k Apps, this is the best place to find the AI apps you need developers can build apps, users can try them out and find new apps with AI search

151

550

4,170

662,764

AK · Sep 12, 2020 · 7:59 PM UTC

@_akhaliq

12 Sep 2020

Monster Mash: A Single-View Approach to Casual 3D Modeling and Animation pdf: dcgi.fel.cvut.cz/home/sykora… project page: dcgi.fel.cvut.cz/home/sykora…

1,024

3,899

AK · Jul 18, 2023 · 11:22 PM UTC

@_akhaliq

18 Jul 2023

AI generative fill extending scenes, movies shot in portrait format by @Alex_Cerrato

Alex Cerrato

763

3,977

683,527

AK · Jul 23, 2023 · 11:05 PM UTC

@_akhaliq

23 Jul 2023

Text to image with midjourney and image to video with gen2 by @commonstyle

Creative.Edge CL+

758

3,782

668,122

AK · Jun 28, 2023 · 5:44 AM UTC

@_akhaliq

28 Jun 2023

Meme Legends, Photoshop generative fill AI by savvydone

593

3,722

458,049

AK · Mar 24, 2025 · 1:27 PM UTC

@_akhaliq

24 Mar 2025

Fork blender, call it cursor for 3D Raise $100M

3,526

432,729

AK · Apr 14, 2023 · 1:18 AM UTC

@_akhaliq

14 Apr 2023

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields abs: arxiv.org/abs/2304.06706 project page: jonbarron.info/zipnerf/

588

3,391

942,495

AK · Apr 12, 2023 · 3:07 PM UTC

@_akhaliq

12 Apr 2023

BREAKING: OpenAI released an implementation of Consistency Models. Consistency models are a new family of generative models that achieve high sample quality without adversarial training. They support fast one-step generation by design, while still allowing for few-step sampling to trade compute for sample quality. github: github.com/openai/consistenc…

755

3,265

1,379,809

AK · May 10, 2023 · 1:34 PM UTC

@_akhaliq

10 May 2023

Stable Diffusion AI Deepfake De-Aged Harrison Ford SD+ControlNet+EbSynth+Fusion reddit thread: teddit.net/r/StableDiffusion…

619

3,148

551,548

AK · Mar 20, 2025 · 8:44 PM UTC

@_akhaliq

20 Mar 2025

StarVector is out on Hugging Face StarVector is a foundation model for generating Scalable Vector Graphics (SVG) code from images and text. It utilizes a Vision-Language Modeling architecture to understand both visual and textual inputs, enabling high-quality vectorization and text-guided SVG creation.

492

3,287

254,290

AK · Apr 24, 2023 · 1:21 AM UTC

@_akhaliq

24 Apr 2023

Scaling Transformer to 1M tokens and beyond with RMT Recurrent Memory Transformer retains information across up to 2 million tokens. During inference, the model effectively utilized memory for up to 4,096 segments with a total length of 2,048,000 tokens—significantly exceeding the largest input size reported for transformer models (64K tokens for CoLT5 (Ainslie et al., 2023), and 32K tokens for GPT-4 (OpenAI, 2023)). This augmentation maintains the base model’s memory size at 3.6 GB in our experiments abs: arxiv.org/abs/2304.11062 github: github.com/booydar/t5-experi…

748

3,192

1,722,421

AK · Jun 24, 2023 · 9:40 PM UTC

@_akhaliq

24 Jun 2023

midjourney version 5.2 zoom out feature: Unleashing the Potential of A Broader View

443

3,108

502,037

AK · Oct 15, 2023 · 9:55 PM UTC

@_akhaliq

15 Oct 2023

Training AI to Play Pokemon with Reinforcement Learning by @computerender github: github.com/PWhiddy/PokemonRe… youtube: piped.video/watch?v=DcYLT37I…

660

3,097

837,741

AK · Aug 7, 2023 · 3:43 PM UTC

@_akhaliq

7 Aug 2023

Celebrity Mortal Kombat, Midjourney AI + gen2 + ElevenLabs by u/fignewtgingrich

693

2,997

457,289

AK · Jun 18, 2023 · 2:43 AM UTC

@_akhaliq

18 Jun 2023

5. Apple Teleport

159

2,943

411,085

AK · Sep 2, 2021 · 2:09 AM UTC

@_akhaliq

2 Sep 2021

Eyes Tell All: Irregular Pupil Shapes Reveal GAN-generated Faces pdf: arxiv.org/pdf/2109.00162.pdf abs: arxiv.org/abs/2109.00162

703

2,962

AK · Aug 31, 2022 · 5:05 PM UTC

@_akhaliq

31 Aug 2022

DALL·E: Introducing Outpainting Extend creativity and tell a bigger story with DALL-E images of any size blog: openai.com/blog/dall-e-intro…

715

2,961

AK · Feb 17, 2023 · 2:37 AM UTC

@_akhaliq

17 Feb 2023

3D-aware Conditional Image Synthesis abs: arxiv.org/abs/2302.08509 project page: cs.cmu.edu/~pix2pix3D/

632

2,914

301,477

AK · Sep 13, 2024 · 1:50 PM UTC

@_akhaliq

13 Sep 2024

Tencent presents GameGen-O Open-world Video Game Generation We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for the gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassed extensive data from over a hundred of next-generation open-world games, employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on the OGameData via the text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen, and we fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process imparts the model with the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, which can efficiently combine creative generation with interactive capabilities.

557

2,914

366,980

AK · Apr 2, 2025 · 6:35 PM UTC

@_akhaliq

2 Apr 2025

vibe coding AI apps for free has never been easier 100% open source app, DeepSite on Hugging Face

319

2,945

395,071

AK · Jul 4, 2023 · 2:17 PM UTC

@_akhaliq

4 Jul 2023

299

2,684

289,726

AK · Mar 28, 2025 · 5:13 PM UTC

@_akhaliq

28 Mar 2025

DeepSeek-V3-0324 is next level 🤯 Someone made DeepSite, letting you vibe code your own AI app or game and host it for FREE ⬇️ Results are insane, it's like Cursor in the browser

353

2,731

423,945

AK · Jul 6, 2023 · 9:51 PM UTC

@_akhaliq

6 Jul 2023

Midjourney AI recreating the Original 151 Pokémon - Part 1: The Starters by u/OfficialKnockout 1. Bulbasaur #001

266

2,517

917,765

AK · Jan 9, 2025 · 4:17 AM UTC

@_akhaliq

9 Jan 2025

Agent Laboratory Using LLM Agents as Research Assistants

363

2,568

204,538

AK · Sep 30, 2022 · 12:56 AM UTC

@_akhaliq

30 Sep 2022

MDM: Human Motion Diffusion Model abs: arxiv.org/abs/2209.14916 project page: guytevet.github.io/mdm-page/

590

2,500

AK · May 23, 2022 · 9:34 PM UTC

@_akhaliq

23 May 2022

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding project page: gweb-research-imagen.appspot… sota FID(7.27 on COCO), without ever training on COCO, human raters find Imagen samples to be on par with the COCO data itself in image-text alignment

635

2,501

AK · Jun 29, 2023 · 8:52 PM UTC

@_akhaliq

29 Jun 2023

Another Meme Legends, Photoshop generative fill AI by @SavvyDone

433

2,431

373,649

AK · Dec 20, 2023 · 2:33 AM UTC

@_akhaliq

20 Dec 2023

Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory paper page: huggingface.co/papers/2312.1… Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.

448

2,475

698,084

AK · Feb 18, 2025 · 12:11 PM UTC

@_akhaliq

18 Feb 2025

grok 3 prompt: I’d like to make a p5.js simulation of a sphere made up of ASCII numbers, rotating. The closest numbers should be pure white, and the farthest ones should fade to gray, on a black background

Javi Lopez ⛩️

@javilopen

3 Feb 2025

⚡ Simulation with o3-mini high! Ever since I was 9 and messing around with BASIC on my MSX, I’ve loved any kind of visual simulation. Now, with o3, we have the power to create anything that comes to mind with just a couple of prompts. It’s mind blowing 🤯 The prompt was: "I’d like to make a JS simulation of a sphere made up of ASCII numbers, rotating. The closest numbers should be pure white, and the farthest ones should fade to gray, on a black background." After just a few interactions, this is the result. What a time to be alive!

203

2,460

1,325,639

AK · Jan 20, 2025 · 3:32 PM UTC

@_akhaliq

20 Jan 2025

DeepSeek-R1 Coder, it's like Cursor in the browser

190

2,470

285,606

AK · Dec 15, 2022 · 3:26 PM UTC

@_akhaliq

15 Dec 2022

Riffusion, real-time music generation with stable diffusion @huggingface model: huggingface.co/riffusion/rif… project page: riffusion.com/about

577

2,415

AK · Feb 28, 2024 · 6:41 AM UTC

@_akhaliq

28 Feb 2024

Microsoft presents The Era of 1-bit LLMs All Large Language Models are in 1.58 Bits Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. Furthermore, it enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.

578

2,398

433,763
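
The "1.58 bits" in the BitNet b1.58 post above is the information content of one ternary weight, log2(3). The sketch below computes that figure and shows one way float weights could be mapped to {-1, 0, 1}. The absmean-style quantizer (scale by the mean absolute value, round, clip) and the name `absmean_ternarize` are assumptions for illustration; the post itself does not describe the quantizer.

```python
import math

# Information content of one weight that takes a value in {-1, 0, 1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585, hence "1.58-bit"


def absmean_ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1}.

    Assumption (not stated in the post): divide by the mean absolute
    value, round to the nearest integer, then clip to [-1, 1].
    Returns the ternary weights and the scale used.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


ternary, scale = absmean_ternarize([0.9, -0.05, -1.2, 0.4])
print(ternary, round(BITS_PER_TERNARY_WEIGHT, 3))
```

Multiplying the ternary values by `scale` gives a coarse approximation of the original weights, which is the trade-off the post describes against FP16 or BF16 storage.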

AK · Feb 26, 2024 · 3:48 AM UTC

@_akhaliq

26 Feb 2024

Google presents Genie: Generative Interactive Environments. We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

499

2,284

684,303

AK · Apr 25, 2023 · 2:58 AM UTC

@_akhaliq

25 Apr 2023

Track Anything: Segment Anything Meets Videos
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, suitable for:
- Video object tracking and segmentation with shot changes.
- Visualized development and data annotation for video object tracking and segmentation.
- Object-centric downstream video tasks, such as video inpainting and editing.
abs: arxiv.org/abs/2304.11968
github: github.com/gaomingqi/Track-A…

459

2,295

578,613

AK · Apr 10, 2023 · 2:50 AM UTC

@_akhaliq

10 Apr 2023

Generative Agents: Interactive Simulacra of Human Behavior abs: arxiv.org/abs/2304.03442 project page: reverie.herokuapp.com/arXiv_…

474

2,253

902,565

AK · Feb 11, 2022 · 2:06 AM UTC

@_akhaliq

11 Feb 2022

Block-NeRF: Scalable Large Scene Neural View Synthesis abs: arxiv.org/abs/2202.05263 project page: waymo.com/research/block-ner…

511

2,275

AK · Jan 22, 2024 · 4:15 AM UTC

@_akhaliq

22 Jan 2024

TikTok presents Depth Anything Unleashing the Power of Large-Scale Unlabeled Data
paper page: huggingface.co/papers/2401.1…
demo: huggingface.co/spaces/LiheYo…
Depth Anything is trained on 1.5M labeled images and 62M+ unlabeled images jointly, providing the most capable Monocular Depth Estimation (MDE) foundation models with the following features:
- zero-shot relative depth estimation, better than MiDaS v3.1 (BEiTL-512)
- zero-shot metric depth estimation, better than ZoeDepth
- optimal in-domain fine-tuning and evaluation on NYUv2 and KITTI

385

2,265

600,083

AK · Mar 24, 2025 · 10:28 AM UTC

@_akhaliq

24 Mar 2025

Alibaba just dropped TaoAvatar on Hugging Face Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

296

2,252

329,263

AK · Sep 19, 2025 · 4:48 AM UTC

@_akhaliq

19 Sep 2025

Wan2.2-Animate-14B just dropped on Hugging Face Unified Character Animation and Replacement with Holistic Replication

303

2,295

157,398

AK · Apr 9, 2024 · 3:44 AM UTC

@_akhaliq

9 Apr 2024

Apple presents Ferret-UI Grounded Mobile UI Understanding with Multimodal LLMs Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with

375

2,187

680,537

AK · Feb 6, 2022 · 5:52 AM UTC

@_akhaliq

6 Feb 2022

stylegan3-projector Mario github: github.com/ouhenio/stylegan3…

342

2,057

AK · Mar 16, 2023 · 5:18 PM UTC

@_akhaliq

16 Mar 2023

alpaca-lora: Code for reproducing the Stanford Alpaca InstructLLaMA result on consumer hardware github: github.com/tloen/alpaca-lora

447

2,114

1,363,903

AK · Oct 31, 2024 · 5:58 PM UTC

@_akhaliq

31 Oct 2024

chatgpt search vs perplexity

107

2,088

375,405

AK · Feb 14, 2025 · 11:41 PM UTC

@_akhaliq

14 Feb 2025

Microsoft just dropped OmniParser V2, looks incredible Turning Any LLM into a Computer Use Agent

319

2,131

226,353

AK · Aug 28, 2024 · 3:11 AM UTC

@_akhaliq

28 Aug 2024

Google presents Diffusion Models Are Real-Time Game Engines discuss: huggingface.co/papers/2408.1… We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories.

448

2,095

493,090
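
The GameNGen post above reports next-frame quality as PSNR. As a reminder of what that number measures, here is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE). The flat-list image representation and the 8-bit peak value of 255 are assumptions for illustration; the post does not state the pixel range used.

```python
import math


def psnr(reference, prediction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Assumption: pixels are given as flat lists of numbers and max_value
    is the largest possible pixel value (255 for 8-bit images).
    """
    if len(reference) != len(prediction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


# Example: a predicted frame that is off by a few intensity levels per pixel.
print(psnr([10, 200, 35, 90], [12, 195, 35, 96]))
```

Higher values mean the predicted frame is closer to the reference, and each additional 10 dB corresponds to a tenfold reduction in mean squared error.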

AK · Jul 18, 2023 · 4:01 PM UTC

@_akhaliq

18 Jul 2023

Meta releases Llama 2: Open Foundation and Fine-Tuned Chat Models paper: ai.meta.com/research/publica… blog: ai.meta.com/llama/ We develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

544

2,086

637,638

AK · May 1, 2022 · 4:19 AM UTC

@_akhaliq

1 May 2022

everyone on ML twitter right now

160

2,028

AK · Aug 12, 2020 · 12:56 AM UTC

@_akhaliq

12 Aug 2020

Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players pdf: arxiv.org/pdf/2008.04524.pdf abs: arxiv.org/abs/2008.04524 project page: cs.stanford.edu/~haotianz/re…

515

2,017

AK · Feb 3, 2025 · 6:20 PM UTC

@_akhaliq

3 Feb 2025

this looks insane, MatAnyone Stable Video Matting with Consistent Memory Propagation

202

2,068

310,421

AK · Sep 29, 2022 · 1:48 PM UTC

@_akhaliq

29 Sep 2022

make-a-video: text-to-video generation without text-video data paper: makeavideo.studio/Make-A-Vid… project page: makeavideo.studio/

460

2,010

AK · Jun 1, 2023 · 3:36 PM UTC

@_akhaliq

1 Jun 2023

AI will take over the world?

279

1,922

222,557

AK · Oct 11, 2021 · 9:19 PM UTC

@_akhaliq

11 Oct 2021

stylegan3 is out github: github.com/NVlabs/stylegan3

377

1,981

AK · Jun 18, 2023 · 2:43 AM UTC

@_akhaliq

18 Jun 2023

3. Apple Jeans

1,851

251,132

AK · May 10, 2023 · 4:47 AM UTC

@_akhaliq

10 May 2023

One is Midjourney 5.1, the other is real. Which one is which? reddit thread: teddit.net/r/midjourney/comm…

441

204

1,892

1,029,024

AK · Jun 15, 2023 · 4:31 AM UTC

@_akhaliq

15 Jun 2023

10. Marge

196

1,813

5,552,719

AK · Sep 20, 2023 · 1:54 AM UTC

@_akhaliq

20 Sep 2023

Language Modeling Is Compression paper page: huggingface.co/papers/2309.1… It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.

364

1,930

774,788
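
The prediction-compression equivalence mentioned in the post above can be illustrated with a toy model: if a predictor assigns probability p to the next symbol, an ideal entropy coder spends about -log2(p) bits on it. The sketch below sums that cost for a simple adaptive byte-frequency model and compares the result with zlib. The model, the function names, and the sample data are all assumptions for illustration, far simpler than the language models in the paper, and the arithmetic coder that would realise these code lengths is not implemented here.

```python
import math
import zlib


def adaptive_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte) under an adaptive order-0 model with add-one smoothing."""
    counts = [1] * 256  # every byte value starts with a pseudo-count of 1
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)  # cost of coding b with the current prediction
        counts[b] += 1  # update after coding, so a decoder could mirror the same state
        total += 1
    return bits


def compression_ratio(data: bytes, compressed_bits: float) -> float:
    """Compressed size as a fraction of the raw size (lower is better)."""
    return compressed_bits / (8 * len(data))


sample = b"abracadabra " * 50  # hypothetical sample data
model_ratio = compression_ratio(sample, adaptive_code_length_bits(sample))
zlib_ratio = compression_ratio(sample, 8 * len(zlib.compress(sample)))
print(f"order-0 model: {model_ratio:.1%} of raw size, zlib: {zlib_ratio:.1%} of raw size")
```

A better predictor assigns higher probability to what actually comes next, which directly shortens the code length; that is the sense in which the post's percentages measure predictive quality.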

AK · Sep 8, 2023 · 6:18 AM UTC

@_akhaliq

8 Sep 2023

Tracking Anything with Decoupled Video Segmentation paper page: huggingface.co/papers/2309.0… Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation.

398

1,945

305,622

AK · Aug 4, 2023 · 2:50 PM UTC

@_akhaliq

4 Aug 2023

332

1,910

188,849

AK · Nov 6, 2021 · 5:13 PM UTC

@_akhaliq

6 Nov 2021

.@Gradio Demo for AnimeGANv2 Face Portrait v2 now on @huggingface Spaces demo: huggingface.co/spaces/akhali… github: github.com/bryandlee/animega…

298

1,903

AK · Dec 22, 2019 · 1:22 AM UTC

@_akhaliq

22 Dec 2019

#StyleGAN2 interps

364

1,874

AK · Jun 15, 2023 · 4:29 AM UTC

@_akhaliq

15 Jun 2023

6. Apu

1,774

682,725

AK · Apr 12, 2023 · 9:09 PM UTC

@_akhaliq

12 Apr 2023

It's over. Run Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, GALACTICA, gpt4all, and auto-gpt easily in a web UI, free and open source. github: github.com/oobabooga/text-ge…

442

1,927

261,519

AK · Mar 21, 2025 · 4:15 AM UTC

@_akhaliq

21 Mar 2025

ByteDance just announced InfiniteYou available on Hugging Face Flexible Photo Recrafting While Preserving Your Identity

220

1,959

222,303

AK · Feb 3, 2023 · 2:38 AM UTC

@_akhaliq

3 Feb 2023

Dreamix: Video Diffusion Models are General Video Editors abs: arxiv.org/abs/2302.01329 project page: dreamix-video-editing.github… present diffusion-based method that is able to perform text-based motion and appearance editing of general videos

414

1,868

398,162

AK · Aug 23, 2023 · 7:05 PM UTC

@_akhaliq

23 Aug 2023

Got married 💍

221

1,882

138,184

AK · Jan 3, 2024 · 2:16 AM UTC

@_akhaliq

3 Jan 2024

JPMorgan announces DocLLM A layout-aware generative language model for multimodal document understanding paper page: huggingface.co/papers/2401.0… Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focuses exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers to a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents. The pre-trained model is fine-tuned using a large-scale instruction dataset, covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.

341

1,896

352,903

AK · Oct 6, 2022 · 2:54 PM UTC

@_akhaliq

6 Oct 2022

An implementation of text-to-3D dreamfusion, powered by stable diffusion github: github.com/ashawkey/stable-d…

397

1,829

AK · Oct 18, 2022 · 1:04 AM UTC

@_akhaliq

18 Oct 2022

Imagic: Text-Based Real Image Editing with Diffusion Models abs: arxiv.org/abs/2210.09276

396

1,850

AK · Sep 29, 2022 · 5:44 PM UTC

@_akhaliq

29 Sep 2022

DreamFusion: Text-to-3D using 2D Diffusion paper: openreview.net/pdf?id=FjNys5… abs: openreview.net/forum?id=FjNy… project page: dreamfusionpaper.github.io/ DeepDream on a pretrained 2D diffusion model enables text-to-3D synthesis

382

1,808

AK · Feb 21, 2025 · 10:11 PM UTC

@_akhaliq

21 Feb 2025

20 second tutorial on making apps with Grok 3 and deploying on Hugging Face example showing gradio app with halftone effect

310

1,799

4,896,039

AK · Jan 24, 2025 · 9:32 PM UTC

@_akhaliq

24 Jan 2025

ByteDance announces Doubao-1.5-pro
- Includes a "Deep Thinking" mode, surpassing O1-preview and O1 models on the AIME benchmark.
- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.
- Built on a MoE architecture, with activated parameters far fewer than those in the above models.
- Achieves a 7x MoE performance leverage, delivering dense model performance with just 1/7 of the activated parameters (e.g., 20B activated params = 140B dense performance).
- Engineering-wise, features heterogeneous system design for prefill-decode and attn-ffn, maximizing throughput under low-latency requirements.

267

1,831

389,914

AK · Jun 9, 2023 · 2:25 PM UTC

@_akhaliq

9 Jun 2023

Meta just released MusicGen, a simple and controllable model for music generation. MusicGen is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the model can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. try out the @Gradio demo: huggingface.co/spaces/facebo… Models on @huggingface: huggingface.co/models?other=… github: github.com/facebookresearch/…

386

1,772

627,444
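
The codebook delay described in the MusicGen post above can be sketched as follows: codebook k is shifted right by k steps, so one decoding step emits one token for every codebook instead of one token in total. The one-step-per-codebook shift, the `PAD` placeholder, and the function names are assumptions for illustration, not the library's API.

```python
PAD = None  # hypothetical placeholder for positions that hold no token


def apply_delay_pattern(codes):
    """Shift codebook k right by k steps so all codebooks share one time axis.

    codes: list of K token lists, each of length T.
    Returns K lists of length T + K - 1, padded with PAD.
    """
    num_codebooks = len(codes)
    delayed = []
    for k, stream in enumerate(codes):
        delayed.append([PAD] * k + list(stream) + [PAD] * (num_codebooks - 1 - k))
    return delayed


def decoding_steps(seconds, frame_rate_hz=50, num_codebooks=4):
    """Auto-regressive steps with the delay pattern vs. fully flattened codebooks."""
    frames = seconds * frame_rate_hz
    parallel = frames + num_codebooks - 1  # one step per frame plus the delay tail
    flattened = frames * num_codebooks  # one step per token
    return parallel, flattened


for row in apply_delay_pattern([[1, 2, 3], [4, 5, 6]]):
    print(row)
print(decoding_steps(10))
```

The delay adds only a fixed tail of `num_codebooks - 1` steps, so for clips of any useful length the step count stays close to the 50 per second quoted in the post, rather than 200 per second for a flattened sequence.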

AK · Jun 18, 2023 · 2:43 AM UTC

@_akhaliq

18 Jun 2023

7. Apple Ship

1,712

288,774

AK · Mar 21, 2025 · 10:12 PM UTC

@_akhaliq

21 Mar 2025

Alibaba just released LHM on Hugging Face Large Animatable Human Reconstruction Model from a Single Image in Seconds

242

1,754

170,407

AK · Dec 23, 2022 · 2:38 AM UTC

@_akhaliq

23 Dec 2022

GeoCode: Interpretable Shape Programs abs: arxiv.org/abs/2212.11715 project page: threedle.github.io/GeoCode/

267

1,720

1,429,189