Edward Ahn · Feb 19, 2024 · 4:07 PM UTC

Edward Ahn

Pinned Tweet

Edward Ahn

@edwardahn9

19 Feb 2024

If you like 3D graphics, I wrote a high-level post on the difference between Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). It's a ten min read, and aims to educate those not as familiar with the field of neural rendering. Link and some snippets below. 1/5

532

74,466

Edward Ahn · May 29, 2025 · 4:33 PM UTC

Edward Ahn

@edwardahn9

29 May 2025

This is live. Real-time. On a Quest 3. Feels like you're actually talking to a real person. Because you are. Couldn't wait around while Meta burns another $60B on this. 🔥

391

64,892

Edward Ahn · Apr 2, 2025 · 3:22 PM UTC

Edward Ahn

@edwardahn9

2 Apr 2025

Spot the difference? Left: our avatar system. Right: Apple Persona. Both captured on VR headsets, both 3D. Been quietly building something that actually looks like you, not your weird digital cousin 😏 Follow me for updates as we bring truly realistic 3D avatars to life ✨

196

17,213

Edward Ahn · Jun 9, 2025 · 9:14 PM UTC

Edward Ahn

@edwardahn9

9 Jun 2025

again so bullish with this approach
- feedforward gaussian splats -> very fast reconstruction of the environment, no out-of-memory issue
- large scale training -> only one camera needed

so much hype around the recent 4dv approach (see link in comments) but i'm way more excited about this

MrNeRF

@janusch_patas

9 Jun 2025

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We introduced a novel density control strategy in training, which allows our 4DGT to handle longer space-time input while maintaining efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can significantly outperform prior Gaussian-based networks in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos.

101

5,971

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

Wow. Just tried the Google Project Starline demo, amazing work by the team! It’s hard to describe how cool and futuristic the experience felt, and I can’t wait for the public to try it out. My impressions as someone working in 3D and ML:

21,143

Edward Ahn · Jun 12, 2025 · 7:45 PM UTC

Edward Ahn

@edwardahn9

12 Jun 2025

Apple could really run away with the growing XR market if they produce a lighter headset (doesn't even have to be that much cheaper imo, "free isn't cheap enough" a la palmer luckey).

visionOS just works and Apple actually shipped

UploadVR

@UploadVR

12 Jun 2025

Meta shipped the Quest v77 update knowing it had bugs, the company's CTO has acknowledged, despite his pledge to improve quality control. Details here: uploadvr.com/meta-shipped-qu…

11,271

Edward Ahn · Dec 26, 2023 · 5:19 AM UTC

Edward Ahn

@edwardahn9

26 Dec 2023

Replying to @andrewpprice

I do a lot of work in this area. To start off, both methods are used for 3d reconstruction, or the ability to fully reconstruct a scene from any perspective given a few images of the scene.

For people with no graphics background: nerfs predict the color at each individual pixel (point on the screen), whereas splats “splat” a bunch of colored blobs together until the picture is made.

For people with some tech/gaming background: nerfs use ray tracing to create a scene, whereas gaussian splats use rasterization to create a scene. they both use machine learning to “remember” the colors in the scene to rasterize or ray trace.

For people with a tech background: nerfs are neural networks that predict each pixel’s color given a ray direction, while splats create and modify millions of colored/transparent blobs (3d gaussians) using gradient-based optimization until they form a scene. both are trained from a limited number of viewpoints of a scene (the training data), and allow “novel view synthesis” (the ability to view the scene from new viewpoints not included in the training data).

Still oversimplifying but if there’s interest I can expand on this in a blog post.

6,928

Edward Ahn · Jul 30, 2025 · 3:45 PM UTC

Edward Ahn

@edwardahn9

30 Jul 2025

Reminder that headset weights are not trending down (yet). They will, but most improvement over the past decade has been in display resolution.

ie. brace yourselves fellow XR folk, more winters to come. strengthen those neck muscles!

p.s. AVP actually weighs about average

5,285

Edward Ahn · Mar 24, 2025 · 9:47 PM UTC

Edward Ahn

@edwardahn9

24 Mar 2025

Can't wait to launch something much better soon... just a couple more bugs to fix :)

UploadVR

@UploadVR

23 Mar 2025

Quest's Horizon OS v76 PTC lets you use your Meta Avatar as a virtual webcam in video calling apps. Details here: uploadvr.com/quest-v76-ptc-m…

4,542

Edward Ahn · Feb 7, 2024 · 10:05 AM UTC

Edward Ahn

@edwardahn9

7 Feb 2024

Replying to @AlanMCole

Hard disagree. Once you learn some advanced algorithms, you’re able to solve the cube intuitively, at which point each solve is a new puzzle. It never gets old! I still solve it a decade later. For beginners with no intuition though, I can see why it gets boring.

3,552

Edward Ahn · Jun 12, 2025 · 4:27 PM UTC

Edward Ahn

@edwardahn9

12 Jun 2025

Continually surprised that there's not much coverage on the new Personas for the Vision Pro... it's SO GOOD, one of the best updates of WWDC imo The only problem is I have no one to call.. Anyone want to get on a call with me?

3,913

Edward Ahn · Dec 17, 2024 · 8:44 PM UTC

Edward Ahn

@edwardahn9

17 Dec 2024

Replying to @dankuntz

this is very very cool! is the code online or app released? would love to have it on my phone..

3,798

Edward Ahn · Sep 17, 2024 · 9:54 PM UTC

Edward Ahn

@edwardahn9

17 Sep 2024

Replying to @falmenara @bilawalsidhu

yea that alone turned me off. i want to own my devices.

2,531

Edward Ahn · Apr 3, 2023 · 9:58 PM UTC

Edward Ahn

@edwardahn9

3 Apr 2023

Pretty crazy what you can do when you build an AI-native 3D design tool with #ChatGPT. It understood what a Rubik's Cube was and knew to only rotate the 9 cubes on the right (wrt the global x axis) by itself. Also, all on my VR headset in #WebXR 😎

13,121
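A minimal sketch of the move described above, assuming a 3x3x3 cube whose cubelets sit at integer coordinates in {-1, 0, 1} and whose right face is the layer with x = 1. The function names are hypothetical and are not taken from the tool in the post; this only illustrates "rotate the 9 cubes on the right about the global x axis".

```python
from itertools import product


def make_cube():
    """All 27 cubelet positions of a 3x3x3 cube centered at the origin."""
    return list(product((-1, 0, 1), repeat=3))


def rotate_right_layer(cubelets):
    """Rotate only the x == 1 layer by 90 degrees about the global x axis."""
    rotated = []
    for x, y, z in cubelets:
        if x == 1:
            # 90-degree rotation about x maps (y, z) to (-z, y).
            rotated.append((x, -z, y))
        else:
            # Every other cubelet stays where it is.
            rotated.append((x, y, z))
    return rotated


cube = make_cube()
right = [c for c in cube if c[0] == 1]  # the 9 cubelets on the right face
turned = rotate_right_layer(cube)
```

Only positions are tracked here; a real cube model would also track each cubelet's orientation so the stickers turn with it.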

Edward Ahn · Feb 19, 2024 · 4:07 PM UTC

Edward Ahn

@edwardahn9

19 Feb 2024

Link: edwardahn.me/writing/NeRFvs3… I'll be writing these types of posts every couple of weeks, so if you're interested please give me a follow!

1,877

Edward Ahn · Aug 7, 2024 · 11:36 PM UTC

Edward Ahn

@edwardahn9

7 Aug 2024

Replying to @leonsilicon

sad to not see vim or dare i say even emacs

12,946

Edward Ahn · May 30, 2025 · 3:37 PM UTC

Edward Ahn

@edwardahn9

30 May 2025

Lots of people asking how it’s done or asking to try it out! I’m a bit in a time crunch right now, so I’ll address those in follow-up posts in a couple weeks. Follow me if you’re interested ❤️

Edward Ahn

@edwardahn9

29 May 2025

This is live. Real-time. On a Quest 3. Feels like you're actually talking to a real person. Because you are. Couldn't wait around while Meta burns another $60B on this. 🔥

2,349

Edward Ahn · Jul 19, 2024 · 4:21 PM UTC

Edward Ahn

@edwardahn9

19 Jul 2024

Really enjoyed attending and learned a ton of great insights from the @fdotinc panel featuring @Azadux @jmdagdelen @cixliv @JackSouthardVR. highly recommend @SVVRLIVE for anyone looking to do anything in XR and I’ll definitely be back!!

Azad Balabanian @Azadux

19 Jul 2024

Great night at @SVVRLIVE tonight!

1,801

Edward Ahn · May 7, 2025 · 8:46 PM UTC

Edward Ahn

@edwardahn9

7 May 2025

Like I keep saying feed-forward gaussian splatting is the future. Real-time reconstruction + real-time rendering from 3DGS means we can finally have real-time 4D content. The very content we complain about lacking in ar/vr!

MrNeRF

@janusch_patas

7 May 2025

GUAVA: Generalizable Upper Body 3D Gaussian Avatar Contributions: • We propose GUAVA, the first framework for generalizable upper-body 3D Gaussian avatar reconstruction from a single image. Using projection sampling and inverse texture mapping, GUAVA enables fast feed-forward inference to reconstruct Ubody Gaussians from the image. • We introduce an expressive human template model with a corresponding upper-body tracking framework, providing an accurate prior for reconstruction. • Extensive experiments show that GUAVA outperforms existing methods in rendering quality and significantly outperforms 2D diffusion-based methods in speed, offering fast reconstruction and real-time animation.

1,616

Edward Ahn · Jun 3, 2025 · 4:11 PM UTC

Edward Ahn

@edwardahn9

3 Jun 2025

incredible and much needed!!

for those that don’t know, basically all performant splat renderers for the web/javascript were proprietary, meaning it’s surprisingly difficult to render splats on a production level

this unlocks so much for the community

Ben Mildenhall

@BenMildenhall

2 Jun 2025

At @theworldlabs, we built a new Gaussian splatting web renderer with all the bells and whistles we needed to make splats a first-class citizen of the incredible @threejs ecosystem. Today, we're open sourcing Forge under the MIT license.

936

Edward Ahn · Jun 20, 2024 · 4:12 PM UTC

Edward Ahn

@edwardahn9

20 Jun 2024

Replying to @jasonyuan

so cool to see tech helping human problems, problems that aren’t really quantifiable and don’t have direct, one-step solutions. as a founder, also cool to see that dot got you to open up more about founder life, so much so that you’re now posting publicly about it!

663

Edward Ahn · Jan 21, 2025 · 4:31 PM UTC

Edward Ahn

@edwardahn9

21 Jan 2025

There's something beautiful about working so hard on something and finally seeing it come to fruition. Really excited to show everyone what I'm working on - cross-platform 3D avatars that enhance the feeling of presence. Avatars way better than Apple's Personas. Stay tuned!

532

Edward Ahn · Sep 16, 2024 · 3:32 PM UTC

Edward Ahn

@edwardahn9

16 Sep 2024

Playing around with real-time face tracking and fidelity is much better than I expected! Thinking of putting this in VR for fun.. wonder how hard it'd be to show up with just this in VRChat

1,955

Edward Ahn · Dec 26, 2023 · 10:15 PM UTC

Edward Ahn

@edwardahn9

26 Dec 2023

Replying to @andrewpprice

These are really good questions. Let me try my best...

1. The process is different. The only similarity is that for both, you feed it a few input images with different viewpoints of the scene/object you want to reconstruct in 3D.

- For NeRFs, you train a neural network to output a color for a given ray (ie. a 3D position and viewing direction). Put differently, for each input image, you use a neural network to predict the color of each pixel (which is a ray if you think about it) in that image. You then compare this color to the actual color of the pixel in the input image. You continue training over all input images until the network can predict the same colors. The difference between this and normal ray tracing is that in ray tracing, you actually simulate how light bounces for each ray to get a color. Here, you have the neural network do that for you.

- For 3DGS, for every step in training, you're either spawning a new 3D gaussian (a 'blob' or a 'splat') or modifying existing 3D gaussians (such as making it longer or changing the color/transparency). Then, for a given input image, you render an image via rasterization from the same viewpoint as your input image. You compare this generated image with your input image. You keep spawning/modifying gaussians until your generated images look the same as your input images. Trained 3DGS scenes can have millions of gaussians.

2. NeRFs can't directly be converted to 3DGS or vice versa. The NeRF is essentially a neural network (a file with a list of weights), and 3DGS is essentially a file with a list of gaussian parameters (ex. color of the gaussian, size of the gaussian, position of the gaussian). That said, there is work that can convert NeRFs to meshes, meshes to NeRFs, 3DGS to meshes, meshes to 3DGS... so I'm sure you could if you wanted to.

3. No, the comparison with ray tracing doesn't imply that NeRFs are more accurate than 3DGS. To explain a bit better, NeRFs and 3DGS use different rendering techniques to generate images. NeRFs use ray tracing, but instead of doing fancy math to find out which color the ray should return, you're getting the neural network to guess the color. 3DGS has millions of gaussians, and given these gaussians it uses rasterization to render images. In other words, the choice of rendering technique doesn't imply which one is more accurate. For example, if you have enough gaussians in your scene, your scene could be more accurate than NeRFs.

This is slightly different from traditional ray tracing. In NeRFs, we're just talking about how to render an image by predicting each ray to match our input images. In traditional ray tracing, you don't have ground truth input images. You just have a scene with meshes and you need to know how to color it properly to make your image look realistic. Simulating the way light moves does this pretty well.

An eli5 explanation of this would be that we have different goals. In 3D reconstruction, we have examples of what the scene looks like (our input images), so we just have to copy it, and both 3DGS and NeRFs are good at copying despite using different rendering techniques. In traditional rendering, you don't have examples to copy, and so without examples you do want to simulate light as much as possible to get something realistic. Hence, in traditional rendering, ray tracing is more 'accurate' than rasterization.

Hope that makes sense, but having trouble explaining the 3rd question well. Obligatory disclaimer that for all of these answers, there's still some simplification, but the general gist is there.

1,295
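To make the rendering comparison above a little more concrete, here is a rough sketch of front-to-back alpha compositing, the blending step that NeRF-style volume rendering (samples along a ray) and 3DGS rasterization (depth-sorted splats covering a pixel) have in common. It is a simplified illustration under stated assumptions, not the code of either method: the function names are hypothetical, and the samples are assumed to be already sorted near to far.

```python
import math


def density_to_alpha(sigma, delta):
    """NeRF-style opacity of a ray segment with density sigma and length delta."""
    return 1.0 - math.exp(-sigma * delta)


def composite(samples):
    """Blend (rgb, alpha) samples front to back for a single pixel.

    Returns the accumulated color and the remaining transmittance,
    i.e. the fraction of light that still passes through every sample.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for rgb, alpha in samples:
        # Each sample contributes in proportion to how much light reaches it.
        weight = transmittance * alpha
        color = [c + weight * s for c, s in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance


# A half-transparent red sample in front of a half-transparent green one.
pixel, remaining = composite([((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5)])
```

In training, the resulting pixel color would be compared with the corresponding pixel of an input image, and the network weights (NeRF) or gaussian parameters (3DGS) would be adjusted to shrink the difference.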

Edward Ahn · Jan 20, 2024 · 7:49 AM UTC

Edward Ahn

@edwardahn9

20 Jan 2024

Replying to @jmdagdelen

That is pretty annoying, as i’d like to have the ability to show people demos and use the headset without having to show them my personal stuff. I hope that changes.

1,750

Edward Ahn · Jun 7, 2025 · 4:28 PM UTC

Edward Ahn

@edwardahn9

7 Jun 2025

Crazy cool!!! Note all of these scenes had ~20 different cameras filming at once, all time synced.

We need to have some large scale training so that anyone can create these scenes with just 1 camera.

Exciting times we live in, can’t wait for true memory preservation

Will Eastcott

@willeastcott

6 Jun 2025

Oh wow oh wow oh wow! 4D Gaussian Splatting built on top of the @PlayCanvas Engine by Chinese company 4DV. 😍

1,468

Edward Ahn · Jul 10, 2025 · 3:55 PM UTC

Edward Ahn

@edwardahn9

10 Jul 2025

The quality bar at Apple was making calls good enough so you could deliver unfortunate news to someone. Imagine your boss firing you with a cartoon avatar. How does Meta want AR/VR to take off and then ship these avatars?? It isn't that hard:

UploadVR

@UploadVR

9 Jul 2025

Zoom is now freely available on Quest headsets, through an official 2D Android app on the Meta Horizon Store: uploadvr.com/zoom-now-availa…

1,461

Edward Ahn · May 15, 2024 · 3:11 PM UTC

Edward Ahn

@edwardahn9

15 May 2024

There’s something quite magical about using 3D to create a better sense of presence. Here’s an old demo of a 3D avatar system we built that we haven’t rendered on a headset-less 3D display — until now. It’s wayyy cooler in-person when you can see the 3D. Details below: 1/6

1,047

Edward Ahn · Mar 20, 2024 · 7:22 AM UTC

Edward Ahn

@edwardahn9

20 Mar 2024

Surprisingly, the most exciting thing I saw at GTC today wasn’t the robots (maybe because I used to work on robots) or the AR/VR headsets, but the stunning @LeiaInc displays. Even as someone who’s worked on the Vision Pro.. it’s exciting to see a headset-less future

1,232

Edward Ahn · May 29, 2025 · 5:13 PM UTC

Edward Ahn

@edwardahn9

29 May 2025

Replying to @ScienceArt

it’s 3d calling on an xr headset that’s good enough to make you feel like the person you’re calling is actually there!

1,471

Edward Ahn · Jun 14, 2024 · 3:32 PM UTC

Edward Ahn

@edwardahn9

14 Jun 2024

I gathered data on VR headsets released over the past decade and made some interesting discoveries! 🧵 1. Pixel density is increasing dramatically for both consumer and enterprise headsets. Higher PPD means higher resolution. We can't distinguish real vs. virtual from 60 PPD.

738
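As a rough illustration of the PPD figure above: pixels per degree is commonly approximated as the horizontal pixel count per eye divided by the horizontal field of view in degrees. This average ignores lens distortion, which makes real PPD vary across the view. The headset numbers below are hypothetical and are not taken from the dataset in the thread.

```python
def pixels_per_degree(horizontal_pixels, horizontal_fov_deg):
    """Average pixels per degree across the horizontal field of view."""
    return horizontal_pixels / horizontal_fov_deg


def pixels_needed(target_ppd, horizontal_fov_deg):
    """Horizontal pixels per eye needed to reach a target average PPD."""
    return target_ppd * horizontal_fov_deg


# Hypothetical headset: 2000 horizontal pixels per eye over a 100-degree FOV.
example_ppd = pixels_per_degree(2000, 100)
# Pixels per eye needed to hit 60 PPD over that same hypothetical FOV.
example_target = pixels_needed(60, 100)
```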

Edward Ahn · Jan 28, 2025 · 4:37 PM UTC

Edward Ahn

@edwardahn9

28 Jan 2025

It is weirdly validating when I'm desperately trying to debug this problem by scouring the internet, asking Claude, etc... and then DeepSeek R1 says "Okay, this problem seems pretty complex, involving CUDA, OpenGL, OpenXR, and multithreading." Validation from AI is nice

358

Edward Ahn · Jan 30, 2024 · 8:49 PM UTC

Edward Ahn

@edwardahn9

30 Jan 2024

No way in hell. The EyeSight display imo is one of the best UX choices Apple made. It makes it such that users can seamlessly interact with people in the real world without having to take off the headset. Otherwise you’re stuck taking the headset on and off like any other headset to have a genuine conversation. Or people avoid talking to you even though you see them via passthrough. Obviously EyeSight isn’t good enough yet but it will be, like any other technology. This is a bad take.

Quinn Nelson

@SnazzyLabs

30 Jan 2024

The most surprising takeaway from all the Vision Pro reviews/videos is how universally awful the EyeSight display is. Until today, I expected it to be super important to the “I’m still in the real world” experience. Now, I’m 95% sure it’ll be canned by the 2nd gen.

5,917

Edward Ahn · Jul 8, 2024 · 6:39 PM UTC

Edward Ahn

@edwardahn9

8 Jul 2024

Founders overestimate their health. Like other founders, I worked a ton of hours, juggled too much, and neglected my health. Then a health scare hit. (all is well!) It's not worth sleeping little for that extra hour to push that commit. Two steps forward is nothing if health sets you three days back. I've gotten much less sick (and way more productive) by eating well, sleeping well, and exercising regularly. Go for that walk. Get that doctor check-up. Make a salad. Visit your dentist. Integrate health into your daily process and don't make it a "chore". Even a simple cold can set you back a week.

751

Edward Ahn · Jan 14, 2025 · 7:38 PM UTC

Edward Ahn

@edwardahn9

14 Jan 2025

Hell yeah, so bullish on feed-forward 3DGS

MrNeRF

@janusch_patas

14 Jan 2025

F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting Contributions: • We pioneer 3D-aware generation using generalizable feed-forward Gaussian Splatting representation, achieving significant efficiency and favorable rendering quality on monocular datasets. • We significantly advance the capability of pixel-aligned Gaussian Splatting representations by designing a self-supervised cycle training strategy specifically tailored for monocular datasets. • We further mitigate the artifacts of 3D-aware representations caused by large viewpoint shifts by introducing geometry-aware video priors.

498

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

Did I say it just works? This is game-changing technology for tech-illiterate people, which is most people in the world. I don’t trust my grandparents to use a headset. I’m a big believer that there’s room for both this and AR/VR calling (and more) to solve telepresence.

851

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

The 3D effect is stunning. The massive 65-inch display allows parallax to be shown in a wide FOV, and because it’s autostereoscopic you just sit down and it works. No glasses necessary. Also, no discernible issues with frame rate and latency.

1,350

Edward Ahn · Jul 17, 2024 · 8:44 PM UTC

Edward Ahn

@edwardahn9

17 Jul 2024

Coding shaders is so much more unintuitive than I thought. I'm having to wrap my head around coding in an entirely new paradigm when I thought it'd just be learning how to write in a new language. I guess I'm surprised there isn't a higher-order language that takes care of this.

358

Edward Ahn · Jan 15, 2025 · 2:45 AM UTC

Edward Ahn

@edwardahn9

15 Jan 2025

This is the kind of cool side project I miss doing.

2025 will be the year of more side projects

Dexerto

@Dexerto

14 Jan 2025

A 17-year-old jailbroke his smart glasses to automatically show the best moves during his chess games

296

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

It does seem to be slightly jittery on the edges, and has subtle specularity issues that only a trained eye could see. But the feeling of presence removes those issues entirely. ie. I didn't see any technical issues that would cause people to not use it.

1,229

Edward Ahn · Jul 12, 2024 · 3:51 PM UTC

Edward Ahn

@edwardahn9

12 Jul 2024

Used to think as an engineer that tools like this were dumb. But making the implicit details explicit using frameworks like this has saved me so much time as a founder. Also @HBO can you make another season of Silicon Valley starring ChatGPT pls

399

Edward Ahn · Jul 19, 2025 · 11:10 PM UTC

Edward Ahn

@edwardahn9

19 Jul 2025

Replying to @cixliv @rek

the kind of thing where as a past robotics researcher i'd be explaining why you don't do a, b, c, d, etc. and be a debbie downer but the kind of thing where as a startup founder i say hellll yeah keep going 😎

2,731

Edward Ahn · Jun 26, 2024 · 5:31 PM UTC

Edward Ahn

@edwardahn9

26 Jun 2024

Has everyone seen this before except me? Panasonic's "Silky Fine Mist" demo from last year, showing light projection on top of its mist. Super cool, and seems to have that 3D "holographic" effect. Use case is less clear, but damn I'd love to see this in-person.

469

Edward Ahn · Apr 2, 2025 · 3:22 PM UTC

Edward Ahn

@edwardahn9

2 Apr 2025

Oh and here's a real photo of me if you don't know what I look like..

998

Edward Ahn · Jan 24, 2024 · 3:54 PM UTC

Edward Ahn

@edwardahn9

24 Jan 2024

Replying to @hrafntho

Agreed, can’t find myself concentrating on work with these avatars. Luckily, this is what I’m working on - photorealistic 3D avatars! Public launch coming next month.

536

Edward Ahn · May 20, 2025 · 7:22 PM UTC

Edward Ahn

@edwardahn9

20 May 2025

Thoughts on Google I/O's XR-related updates:

- No updates on Project Moohan, but it's coming out this year? Surprised at how few details there are

- Glasses were super impressive -- it's exactly what I want, where AI feels frictionless and a high throughput highway between my mind and the computer is established. But when will it be out? They said dev access will be available later this year, but this almost feels like Meta's Orion; nowhere near shipping.

- No 3D calling. It seems only Apple has this 3D avatar tech. This could be because of a lot of reasons -- not enough users to warrant developing it, performance concerns for 3D calling, avatars still bordering research territory, or even something as simple as Project Moohan not having face tracking cameras. We'll see.

- Is utility enough to spark mass adoption of XR? All of the demos show devices being useful to the user.. but how many times do I really need to translate something, use directions, or remember where my keys are? I think we need some sort of new social experience, as throughout history every new tech seems to have been adopted by the public via some sort of social network effect. Gemini might not be the killer XR app that Google thinks it is (I think telepresence is but I may be biased..)

- Lastly, beating the dead horse but... I can't help but worry about privacy. I feel the most comfortable with Apple having constant access to my environment given their different business model, but obviously Apple's dropping the ball here (for now at least). They didn't talk much (if at all) about privacy measures

The general feeling is this feels like Orion where it's more marketing for the tech. If this is true, it sucks for the industry because it continues this sentiment of "XR is here" when it's still years and years away from being consumer-grade. The hype cycle really hurts startups.

That said, if Android XR is really released this year, I'm (cautiously) excited! Fingers crossed, I'd love to hack on it

608

Edward Ahn · Feb 19, 2024 · 4:07 PM UTC

Edward Ahn

@edwardahn9

19 Feb 2024

Both NeRFs and 3DGS solve novel view synthesis: "given a few images of some 3D scene taken from different camera viewpoints, can we generate an image of this scene from any new camera viewpoint?" This has wide-ranging applications in AR/VR and 3D applications. 2/5

1,939

Edward Ahn · Jan 19, 2024 · 1:14 PM UTC

Edward Ahn

@edwardahn9

19 Jan 2024

Hellllll yeahhhh. I’m one of the few people lucky enough to have worked on and with this prototype and I guarantee you you’ll like the product.

217

Edward Ahn · Jun 8, 2025 · 4:03 PM UTC

Edward Ahn

@edwardahn9

8 Jun 2025

and did you know gaussian splatting engines are y-down and right handed? afaik this isn’t documented anywhere and just assumed, which has caused me so much headache

bonus: pytorch3d/opengl is y-up, right handed, but opengl’s camera typically faces -z

Tim Sweeney

@TimSweeneyEpic

5 Jun 2025

Indeed Unreal Engine is moving to Left-Up-Forward coordinates everywhere, starting with UEFN, and coming to UE5-6 in an incrementally-adoptable way through UI settings and C++ helper functions/macros to ease the transition. This will align Unreal with Y-Up, right handed standards of USD and glTF. Why? Because future 3d tools and ecosystems will be increasingly interoperable and standards-based. There are a lot of missing standards we’ll need to propose, and Team Unreal will be far more successful proposing new things if we adopt and add to existing standards and conventions. The USD-glTF-Maya-Houdini quadrant is the center of mass for complex code-art-pipeline tooling that is highly sensitive to coordinates. (Flipping coordinates when exporting from AutoCAD or Blender is easy enough; changing a movie vfx pipeline is not). Coordinates based on project settings sound like a have-it-your-way compromise but are a combinatorial mess when projects are a mix of code modules and content packages from many independent authors. The best time to make this change would have been 1995, but I believe the second best time is now with the launch of Scene Graph in UEFN.

676
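A small sketch of converting between the two conventions mentioned above. It assumes the y-down, right-handed frame also has +z pointing forward (as in OpenCV/COLMAP-style cameras, which the post does not state explicitly) and that the target is a y-up, right-handed frame whose camera looks down -z (OpenGL-style). Under those assumptions the conversion is a 180-degree rotation about the x axis, which flips y and z and keeps handedness. The function names are hypothetical.

```python
def ydown_to_yup(point):
    """Map (x, y, z) from y-down/+z-forward to y-up/-z-forward."""
    x, y, z = point
    # Flipping two axes is a rotation, so the frame stays right handed.
    return (x, -y, -z)


def cross(a, b):
    """Cross product, used here only to check handedness."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


# Convert the basis vectors of the source frame.
ex, ey, ez = (ydown_to_yup(v) for v in ((1, 0, 0), (0, 1, 0), (0, 0, 1)))
# Right handed before and after: x cross y still equals z.
still_right_handed = cross(ex, ey) == ez
```

Flipping only one axis would instead mirror the scene and change handedness, which is a common source of inside-out or flipped reconstructions.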

Edward Ahn · Jun 11, 2025 · 9:51 PM UTC

Edward Ahn

@edwardahn9

11 Jun 2025

Replying to @benz145

From what I can tell from the research: 1. The camera images are encoded using a neural network into latent codes (list of numbers) 2. The codes are sent across the internet to the client 3. The client uses another neural network to decode the latent codes to produce textures

791

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

The apple demo (reaching out to touch an apple my caller was holding) is magical. Really demonstrates how quickly you forget you’re looking at a screen and not a real person.

1,016

Edward Ahn · Mar 21, 2025 · 3:58 PM UTC

Edward Ahn

@edwardahn9

21 Mar 2025

idea: in XR you always want a high fps to avoid nausea, etc. why not make it dynamic, where you render your environment (ex. skybox) at 120 fps, then render other assets at lower fps, like 50? it saves compute. from personal experience this seems to work. anyone else try it?

704
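A minimal sketch of the mixed-rate idea above, assuming a 120 Hz display loop in which each layer is re-rendered only when its own lower-rate clock advances, and the last rendered result is reused otherwise. The layer names and the scheduling rule are hypothetical; this is not any particular engine's API.

```python
def layers_due(frame_index, display_hz, layer_hz):
    """Return the layers that should be re-rendered on this display frame."""
    due = []
    for name, hz in layer_hz.items():
        # Integer tick counters for this layer at the current and previous frame.
        ticks_now = (frame_index * hz) // display_hz
        ticks_prev = ((frame_index - 1) * hz) // display_hz
        if frame_index == 0 or ticks_now != ticks_prev:
            due.append(name)
    return due


rates = {"skybox": 120, "assets": 50}
counts = {name: 0 for name in rates}
for i in range(120):  # one second of display frames
    for name in layers_due(i, 120, rates):
        counts[name] += 1
```

Over one second the skybox is rendered on every display frame while the other assets are rendered 50 times, which is where the compute saving would come from.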

Edward Ahn · May 20, 2025 · 7:41 PM UTC

Edward Ahn

@edwardahn9

20 May 2025

This is a subtle but huge difference. Most people, including me, are too self-conscious to wear birdbath optics in public. ie. mass market adoption won't happen with birdbaths

David Heaney

@Heaney555

20 May 2025

Aura looks cool, but I already see the misleading headlines calling it a competitor to Meta and Apple's coming AR glasses. Xreal devices use birdbath optics. They're designed to resemble sunglasses from the front, but sit much further out. They're not glasses in the same sense.

1,106

Edward Ahn · Jun 11, 2024 · 3:33 PM UTC

Edward Ahn

@edwardahn9

11 Jun 2024

Can you do this for Gaussian Splatting tho

Andrej Karpathy

@karpathy

9 Jun 2024

📽️ New 4 hour (lol) video lecture on YouTube: "Let’s reproduce GPT-2 (124M)" piped.video/l8pRSuU81PU

The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.

In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step. github.com/karpathy/build-na…

Chapters. On a high level Section 1 is building up the network, a lot of this might be review. Section 2 is making the training fast. Section 3 is setting up the run. Section 4 is the results. In more detail:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperparameters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo

458

Edward Ahn · Jan 20, 2025 · 1:21 AM UTC

Edward Ahn

@edwardahn9

20 Jan 2025

tbh i don’t need LLMs to solve phd level problems. as a coder i want them to be up to date with the latest API changes. if i was a manager id want it to solve people problems. im not out here solving competition math problems at work but openai progress is cool nonetheless!

203

Edward Ahn · Jun 13, 2025 · 3:58 PM UTC

Edward Ahn

@edwardahn9

13 Jun 2025

If you're wondering how apple persona / meta avatars work: this encoder-decoder approach is dominant b/c it's not feasible to transmit high-res stereoscopic video streams at 90 fps, the minimum for XR comfort.

fyi Quest 3 doesn't have avatars b/c it doesn't have face tracking cameras (i.e. cameras facing downwards on the user's face). It's a hardware issue, not a software issue.

It's reasonable to predict that Quest 4 will have face tracking cameras to compete with Apple. That said, for an XR glasses form factor, I don't see how in the immediate future there will be 3D calling (or a need to have 3D calling) because of this limitation. 3D calling will be a headset-exclusive perk.

Edward Ahn

@edwardahn9

11 Jun 2025

Replying to @benz145

571

Edward Ahn · Dec 21, 2024 · 1:04 AM UTC

Edward Ahn

@edwardahn9

21 Dec 2024

As a consumer/programmer just tell me if o3 is better than Claude Sonnet, don't really care about other metrics tbh

244

Edward Ahn · May 29, 2025 · 7:01 PM UTC

Edward Ahn

@edwardahn9

29 May 2025

Replying to @ReedSealFoss

it’s 3d, so there’s depth. it’s real-time, generating accurate facial expressions. combined is an overwhelming sense of presence

you don’t need to go see people, physical distance isn’t a barrier anymore

you and a friend just put on a headset, and you’re together as if IRL

1,351

Edward Ahn · Feb 19, 2024 · 4:07 PM UTC

Edward Ahn

@edwardahn9

19 Feb 2024

"NeRFs are just small neural networks that are tens of megabytes large. 3DGS, with its millions of Gaussians, occupies almost a gigabyte." However, "3DGS renders new viewpoints fast – more than 90 fps for some complex scenes," a whole order of magnitude faster than NeRFs. 4/5

1,975
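
The "almost a gigabyte" figure in the 4/5 snippet can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the per-Gaussian layout of the original 3DGS implementation with degree-3 spherical harmonics, stored as uncompressed 32-bit floats. The Gaussian count of 4 million is a hypothetical stand-in for "millions".

```python
# Floats stored per Gaussian (assumed layout, see the note above).
FLOATS_PER_GAUSSIAN = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_color": 48,  # 16 spherical-harmonic coefficients x 3 color channels
}
BYTES_PER_FLOAT = 4  # float32, no compression or quantization


def splat_size_mb(num_gaussians):
    """Approximate uncompressed size of a 3DGS scene in megabytes."""
    floats = sum(FLOATS_PER_GAUSSIAN.values())
    return num_gaussians * floats * BYTES_PER_FLOAT / 1e6


size_mb = splat_size_mb(4_000_000)  # hypothetical scene with 4M Gaussians
```

Under these assumptions each Gaussian takes 59 floats (236 bytes), so 4 million of them come to 944 MB. Compressed or quantized splat formats would be smaller.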

Edward Ahn · Jan 7, 2025 · 3:10 AM UTC

Edward Ahn

@edwardahn9

7 Jan 2025

In 2024, I realized that when I'm low-energy and tired, the key was not to rest by doing nothing, but to actually spend more energy (assuming enough sleep) by:
- swimming
- biking/hiking/walking
- unrelated side projects

Wasn't intuitive to spend more energy to get more.

218

Edward Ahn · Apr 17, 2025 · 5:51 PM UTC

Edward Ahn

@edwardahn9

17 Apr 2025

Amazing demo! Wish my $4k AVP could do this. The missing piece: 3D personas for co-presence. Otherwise, using these devices is such a solitary experience, esp if you're on a headset (less so for glasses). Once the hardware is launched, I'll make sure co-presence is there :)

Bilawal Sidhu

@bilawalsidhu

17 Apr 2025

The future of AI + XR starts here. Glasses that can see and think alongside you. Here’s what Google’s been cooking — its first on‑camera demo, fresh off the TED stage.

881

Edward Ahn · Jun 6, 2024 · 3:51 PM UTC

Edward Ahn

@edwardahn9

6 Jun 2024

9 game-changing tips for multi camera calibration Do you work in computer vision/graphics? Turns out, camera calibration isn't as simple as running OpenCV's sample code. Don't be like me and save yourself several hours by following these (undocumented) tips. Especially #8…

442

Edward Ahn · Sep 6, 2024 · 3:07 PM UTC

Edward Ahn

@edwardahn9

6 Sep 2024

Maturing is realizing that maybe I'll switch to using Cursor over Vim... :'( Yes I can do all of the stuff in Cursor in Vim, but I just don't have time to get those plugins working

268

Edward Ahn · Jul 3, 2024 · 3:26 PM UTC

Edward Ahn

@edwardahn9

3 Jul 2024

Blake Scholl on building supersonic jets: "You just list all the reasons that this could not work and we systematically structured our development approach such that we were progressively, demonstrably reducing risk on every single one of those." Striving to do this daily 🫡

347

Edward Ahn · Feb 19, 2024 · 4:07 PM UTC

Edward Ahn

@edwardahn9

19 Feb 2024

"A NeRF generates an image of a new viewpoint by outputting a color for each pixel. It uses ML to learn what colors to output." "3DGS generates an image of a new viewpoint by drawing overlapping colored/transparent ‘splats’. 3DGS learns what to splat via ML." 3/5

1,816
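
The "overlapping colored/transparent splats" idea from the 3/5 snippet boils down to alpha compositing per pixel. Here is a minimal sketch for a single pixel, assuming the splats covering it are already sorted front to back and that each one's effective alpha at this pixel is given directly (a real renderer would derive it from the Gaussian's opacity and falloff). The function name and the early-exit threshold are illustrative.

```python
def composite_pixel(splats, min_transmittance=1e-4):
    """Blend (rgb, alpha) splats, ordered front to back, into one color."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for rgb, alpha in splats:
        weight = alpha * transmittance  # contribution of this splat
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # pixel is effectively opaque, later splats are hidden
    return tuple(color)


# A half-transparent red splat in front of an opaque blue one.
pixel = composite_pixel([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)])
```

A NeRF, by contrast, would get the pixel color by querying a network at many sample points along the camera ray, which is a large part of why it renders more slowly.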

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

Obviously the environment is very controlled. The booth has built-in, clear, ambient lighting, and the table between me and the screen forces the spacing to be constrained. This makes sense; these displays don’t work as well in too close or too far distances.

1,146

Edward Ahn · Jan 7, 2025 · 7:16 PM UTC

Edward Ahn

@edwardahn9

7 Jan 2025

Inference costs going down so fast. When I noticed that Project Digits is the size of a Mac Mini...🤯 only a matter of time when we all have these to run local models. Which means as a founder I'll assume inference costs will trend towards 0 and focus on other problems!

unusual_whales

@unusual_whales

7 Jan 2025

BREAKING: Nvidia, $NVDA, announces Project Digits personal computer at $3000, that is approximately 1,000 times more powerful than the average laptop. The device is powered by an Nvidia GB10 Grace Blackwell Superchip, which houses separate, linked components on a single chip to reduce the time it takes to move data between them. The superchip features an Nvidia Blackwell graphics card and an Nvidia Grace processor, packaged with 128 gigabytes of memory and 4 terabytes of SSD storage.

251

Edward Ahn · Jun 20, 2025 · 3:44 PM UTC

Edward Ahn

@edwardahn9

20 Jun 2025

As someone who spends most time just SSHing into a remote Linux machine to do ML I always recommended getting the MacBook Air... But man it really sucks to do any sort of visionOS development on it. The simulator is much too slow

344

Edward Ahn · Mar 15, 2023 · 9:58 PM UTC

Edward Ahn

@edwardahn9

15 Mar 2023

A few weeks ago, I integrated ChatGPT with my design tool! Using natural language to automate tedious design workflows is promising; only possible by designing the entire app to be AI and LLM-native. More results coming soon! #ar #vr #webxr #ai #chatgpt #gpt4

267

Edward Ahn · May 30, 2025 · 5:12 AM UTC

Edward Ahn

@edwardahn9

30 May 2025

Replying to @MarcoKelly_23

not yet

it's currently running on webxr though, so when it is public, it'll just be as simple as going to a website on the headset's browser

623

Edward Ahn · Aug 17, 2022 · 6:06 PM UTC

Edward Ahn

@edwardahn9

17 Aug 2022

A month ago, I became super interested in creating #VR demos with real-time information. Here's something quick I produced over a weekend, where the planes you're seeing are all of the Delta Airlines flights over North America at the time of recording.

Edward Ahn · Feb 3, 2025 · 4:45 PM UTC

Edward Ahn

@edwardahn9

3 Feb 2025

Getting Windows, OpenGL/OpenXR, CUDA and PyTorch to all work together properly for a real-time, distributed application has got to be my best technical accomplishment to date. I never expected this to be so hard. Let me know if you ever run into bugs, I think I can help out

246

Edward Ahn · Mar 13, 2025 · 12:00 AM UTC

Edward Ahn

@edwardahn9

13 Mar 2025

Seeing so many guides on "how to vibe code" is so weird and hypocritical

Just ask Claude, try the code, and repeat -- this isn't that hard, hence it's called vibe coding

273

Edward Ahn · May 30, 2025 · 5:13 AM UTC

Edward Ahn

@edwardahn9

30 May 2025

Replying to @chrisgrayson

very fair - but from a tech standpoint the torso, arms, etc. imo are easy additions but no one's got the head working well enough yet so we have to solve that first

512

Edward Ahn · Sep 9, 2024 · 5:39 PM UTC

Edward Ahn

@edwardahn9

9 Sep 2024

As an SF native I love seeing Apple do their events in the city. The city's so beautiful and honestly pretty underrated. I love it here 🫶

199

Edward Ahn · Jun 19, 2024 · 4:39 PM UTC

Edward Ahn

@edwardahn9

19 Jun 2024

Since I bought this 3rd party strap, I’ve been using my Vision Pro 5+ hours at a time for remote work. It converts the AVP to an open-faced design that doesn't uncomfortably press on my cheeks and lets all of the weight rest on my forehead. Highly recommend it! Link below.

311

Edward Ahn · Jul 29, 2024 · 5:57 PM UTC

Edward Ahn

@edwardahn9

29 Jul 2024

I wish the speakers were better; they were definitely the cheapest component out of the whole system. Having great surround sound could bring telepresence further, but obviously I’m asking for too much.

925

Edward Ahn · Aug 25, 2022 · 10:19 PM UTC

Edward Ahn

@edwardahn9

25 Aug 2022

Slowly getting better at prompt engineering! Crazy to think that I was able to do this on my phone in bed this morning #midjourney #ai #ml

Edward Ahn · Feb 24, 2025 · 5:25 PM UTC

Edward Ahn

@edwardahn9

24 Feb 2025

My annual "please vote for your favorite SF Mission burrito" post! whohasthebestmissionburrito.… (I definitely think the last few are better than the top few, but the votes don't reflect that.. 📷)

261

Edward Ahn · Oct 2, 2024 · 4:27 PM UTC

Edward Ahn

@edwardahn9

2 Oct 2024

Wow the @getviture Pro XR glasses are amazing. In use cases like flying, the Vitures beat the Vision Pro (and any AR/VR headset I've tried) by a mile. Highly recommend! Let me know if you want a referral link ($50 off). Below are reasons why I like them: (not sponsored) 1/

1,493

Edward Ahn · Mar 20, 2025 · 3:36 PM UTC

Edward Ahn

@edwardahn9

20 Mar 2025

So easy to take modern laptop power for granted

Tried running a bg removal alg that runs at 140 fps in the browser on MacBooks, but running it on the Quest 3 makes the whole OS stutter at 5 fps. The lesson is if you're processing Quest passthrough, you'll have to do it server-side

508

Edward Ahn · Jun 23, 2024 · 1:05 AM UTC

Edward Ahn

@edwardahn9

23 Jun 2024

Replying to @TrueGameData @Rainmaker1973

It's not, except for the fact that it's called a "hologram"

The term hologram is so over-used and could mean anything now smh

3,018

Edward Ahn · Jul 20, 2025 · 6:32 AM UTC

Edward Ahn

@edwardahn9

20 Jul 2025

nothing quite like a quiet saturday night with good music and coding

307

Edward Ahn · May 29, 2025 · 5:13 PM UTC

Edward Ahn

@edwardahn9

29 May 2025

Replying to @bnj

iykyk

1,926

Edward Ahn · Oct 31, 2024 · 4:32 AM UTC

Edward Ahn

@edwardahn9

31 Oct 2024

Unintended side effect of Apple's Continuity Camera (using your iPhone as a mac webcam) is that I can access it via OpenCV, which makes it really easy to prototype vision applications via Python on a mobile camera source. No iOS dev necessary and not compute-constrained. Amazing
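
A rough sketch of what that prototyping loop can look like, assuming `opencv-python` is installed and the iPhone shows up as an ordinary capture device. The device index is a guess and varies per machine (the built-in webcam is often 0), and the helper names are made up for this example.

```python
def open_camera(index=1):
    """Open a capture device by index (which index is the iPhone is an assumption)."""
    import cv2  # imported lazily so grab_frames can be used without OpenCV

    cap = cv2.VideoCapture(index)
    if not cap.isOpened():
        raise RuntimeError(f"could not open camera at index {index}")
    return cap


def grab_frames(cap, count):
    """Read up to `count` frames from any object with a VideoCapture-style read()."""
    frames = []
    for _ in range(count):
        ok, frame = cap.read()  # frame is a BGR numpy array when ok is True
        if not ok:
            break  # camera disconnected or stream ended
        frames.append(frame)
    return frames


# Typical use (not run here because it needs a physical camera):
#   cap = open_camera(1)
#   frames = grab_frames(cap, 30)
#   cap.release()
```

If the wrong camera opens, trying neighboring indices is usually the quickest way to find the phone.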

235

Edward Ahn · Jan 19, 2024 · 12:54 PM UTC

Edward Ahn

@edwardahn9

19 Jan 2024

Anyone else up at this ungodly hour to get a vision pro..

155

Edward Ahn · Dec 3, 2024 · 7:08 PM UTC

Edward Ahn

@edwardahn9

3 Dec 2024

Martial law in Korea’s frightening

I’m so thankful that living in the US as a founder unblocks me from doing anything. The only blockers are sickness, family/friends in need of help, etc

Can’t imagine having an unstable gov hanging over my head

So grateful and prayers to Korea

244

Edward Ahn · Dec 16, 2023 · 1:50 AM UTC

Edward Ahn

@edwardahn9

16 Dec 2023

Replying to @benz145

richie’s plank experience

333

Edward Ahn · Nov 10, 2022 · 7:26 PM UTC

Edward Ahn

@edwardahn9

10 Nov 2022

Despite the hate the Bay Area gets, it's truly humbling driving down 101 and seeing the legacy that so many people built in the past few decades in tech. Almost nowhere in the world do you have such a high concentration of high-achieving, risk-taking people. Extremely motivating!

Edward Ahn · Jul 31, 2024 · 9:24 PM UTC

Edward Ahn

@edwardahn9

31 Jul 2024

Is it even a conference without robots? #SIGGRAPH2024

281

Edward Ahn · Mar 19, 2024 · 4:18 PM UTC

Edward Ahn

@edwardahn9

19 Mar 2024

At #GTC24 ! DM me if you want to meet up - would love to talk anything 3D

266

Edward Ahn · Jan 29, 2025 · 7:46 PM UTC

Edward Ahn

@edwardahn9

29 Jan 2025

Replying to @UploadVR

Planning? Should be done already

273

Edward Ahn · Feb 14, 2025 · 11:48 PM UTC

Edward Ahn

@edwardahn9

14 Feb 2025

Replying to @nakul

@ashwinl ! He has a wealth of experience in computer vision, AI, robotics.

772

Edward Ahn · Apr 10, 2025 · 3:25 AM UTC

Edward Ahn

@edwardahn9

10 Apr 2025

When I do webdev it's often just to showcase my work, so I'm always years behind with frameworks. In college I used vanilla HTML/CSS/JS. 5 years ago I was astounded by how nice React/Tailwind was. Now I'm using Next.js and man, it's so easy to deploy. What else am I missing?

450

Edward Ahn · May 21, 2025 · 6:10 PM UTC

Edward Ahn

@edwardahn9

21 May 2025

Google Beam fosters authentic human connection via lifelike 3D presence while Veo 3 creates AI videos indistinguishable from reality. The cognitive dissonance is striking—one preserves our humanity, the other blurs it. We're building tools that both connect and deceive us.. 🤔

839

Edward Ahn · Aug 16, 2025 · 12:15 AM UTC

Edward Ahn

@edwardahn9

16 Aug 2025

Replying to @fewerwrong

i can't stop laughing and i'm not even fluent

1,043

Edward Ahn · Aug 29, 2022 · 6:51 AM UTC

Edward Ahn

@edwardahn9

29 Aug 2022

I spent basically the whole weekend debugging authentication middleware for a webapp I'm building. Shouldn't this be easy to do by now? The bugs I was running into were non-trivial, especially the ones related to preventing CSRF attacks :/

Edward Ahn · Oct 25, 2024 · 11:53 PM UTC

Edward Ahn

@edwardahn9

25 Oct 2024

ah, i've forgotten you my friend

tfw you start sweating because an old program starts crashing because of a dependency install in the recent versions

but that dopamine hit when you figure it out *chef's kiss*

172

Edward Ahn · Mar 30, 2024 · 9:08 PM UTC

Edward Ahn

@edwardahn9

30 Mar 2024

blew a fuse in my house because of my dual-GPU training while the heater was on. guess modern houses weren't built with deep learning in mind??? smh

387

Edward Ahn · Apr 2, 2025 · 8:36 PM UTC

Edward Ahn

@edwardahn9

2 Apr 2025

Replying to @Tropos_AR

That's the plan! I'm extremely bullish on 3D calling as a means to connect with loved ones (some of mine are overseas), and 2D to me doesn't cut it. Honestly I think the Apple Personas aren't bad, but the problem is more that not everyone wants to buy a $3.5k headset

222