working on something new • @carnegiemellon alum

San Francisco, CA
If you like 3D graphics, I wrote a high-level post on the difference between Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). It's a ten min read, and aims to educate those not as familiar with the field of neural rendering. Link and some snippets below. 1/5
9
83
532
74,466
This is live. Real-time. On a Quest 3. Feels like you're actually talking to a real person. Because you are. Couldn't wait around while Meta burns another $60B on this. 🔥
58
41
391
64,892
Spot the difference? Left: our avatar system. Right: Apple Persona. Both captured on VR headsets, both 3D. Been quietly building something that actually looks like you, not your weird digital cousin 😏 Follow me for updates as we bring truly realistic 3D avatars to life ✨
33
12
196
17,213
again so bullish with this approach
- feedforward gaussian splats -> very fast reconstruction of the environment, no lack of memory issue
- large scale training -> only one camera needed
so much hype around the recent 4dv approach (see link in comments) but i'm way more excited about this
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos Abstract: We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the modeling of complex, time-varying environments with varying object lifespans. We introduced a novel density control strategy in training, which allows our 4DGT to handle longer space-time input while maintaining efficient rendering at runtime. Our model processes 64 consecutive posed frames in a rolling-window fashion, predicting consistent 4D Gaussians in the scene. Unlike optimization-based methods, 4DGT performs purely feed-forward inference, reducing reconstruction time from hours to seconds and scaling effectively to long video sequences. Trained only on large-scale monocular posed video datasets, 4DGT can significantly outperform prior Gaussian-based networks in real-world videos and achieve on-par accuracy with optimization-based methods on cross-domain videos.
1
7
101
5,971
Wow. Just tried the Google Project Starline demo, amazing work by the team! It’s hard to describe how cool and futuristic the experience felt, and I can’t wait for the public to try it out. My impressions as someone working in 3D and ML:
6
7
92
21,143
Apple could really run away with the growing XR market if they produce a lighter headset (doesn't even have to be that much cheaper imo, "free is not cheap enough" a la palmer luckey) visionOS just works and Apple actually shipped
Meta shipped the Quest v77 update knowing it had bugs, the company's CTO has acknowledged, despite his pledge to improve quality control. Details here: uploadvr.com/meta-shipped-qu…
12
2
79
11,271
Replying to @andrewpprice
I do a lot of work in this area. To start off, both methods are used for 3d reconstruction, or the ability to fully reconstruct a scene from any perspective given a few images of the scene. For people with no graphics background: nerfs predict the color at each individual pixel (point on the screen), whereas splats “splat” a bunch of colored blobs together until the picture is made. For people with some tech/gaming background: nerfs use ray tracing to create a scene, whereas gaussian splats use rasterization to create a scene. they both use machine learning to “remember” the colors in the scene to rasterize or ray trace. For people with a tech background: nerfs are neural networks that predict each pixel’s color given a ray direction while splats create and modify millions of colored/transparent blobs (3d gaussians) until they form a scene using gradient-based optimization. both are trained from a limited number of viewpoints of a scene (the training data), and allow “novel view synthesis” (the ability to view the scene from new viewpoints not included in the training data). Still oversimplifying but if there’s interest I can expand on this in a blog post.
4
6
60
6,928
Reminder that headset weights are not trending down (yet) They will, but most improvement has been in display resolution the past decade ie. brace yourselves fellow XR folk, more winters to come. strengthen those neck muscles! p.s. AVP actually weighs about average
8
7
59
5,285
Can't wait to launch something much better soon... just a couple more bugs to fix :)
Quest's Horizon OS v76 PTC lets you use your Meta Avatar as a virtual webcam in video calling apps. Details here: uploadvr.com/quest-v76-ptc-m…
4
47
4,542
Replying to @AlanMCole
Hard disagree. Once you learn some advanced algorithms, you’re able to solve the cube intuitively, at which point each solve is a new puzzle. It never gets old! I still solve it a decade later. For beginners with no intuition though, I can see why it gets boring.
5
39
3,552
Continually surprised that there's not much coverage on the new Personas for the Vision Pro... it's SO GOOD, one of the best updates of WWDC imo The only problem is I have no one to call.. Anyone want to get on a call with me?
8
34
3,913
Replying to @dankuntz
this is very very cool is the code online or app released? would love to have it on my phone..
1
34
3,798
yea that alone turned me off. i want to own my devices.
29
2,531
Pretty crazy what you can do when you build an AI-native 3D design tool with #ChatGPT. It understood what a Rubik's Cube was and knew to only rotate the 9 cubes on the right (wrt the global x axis) by itself. Also, all on my VR headset in #WebXR 😎
2
4
26
13,121
Link: edwardahn.me/writing/NeRFvs3… I'll be writing these types of posts every couple of weeks, so if you're interested please give me a follow!
4
2
28
1,877
Replying to @leonsilicon
sad to not see vim or dare i say even emacs
2
22
12,946
Lots of people asking how it’s done or asking to try it out! I’m a bit in a time crunch right now, so I’ll address those in follow-up posts in a couple weeks. Follow me if you’re interested ❤️
2
26
2,349
Really enjoyed attending and learned a ton of great insights from the @fdotinc panel featuring @Azadux @jmdagdelen @cixliv @JackSouthardVR. highly recommend @SVVRLIVE for anyone looking to do anything in XR and I’ll definitely be back!!
Great night at @SVVRLIVE tonight!
2
1
23
1,801
Like I keep saying feed-forward gaussian splatting is the future. Real-time reconstruction + real-time rendering from 3DGS means we can finally have real-time 4D content. The very content we complain about lacking in ar/vr!
GUAVA: Generalizable Upper Body 3D Gaussian Avatar Contributions: • We propose GUAVA, the first framework for generalizable upper-body 3D Gaussian avatar reconstruction from a single image. Using projection sampling and inverse texture mapping, GUAVA enables fast feed-forward inference to reconstruct Ubody Gaussians from the image. • We introduce an expressive human template model with a corresponding upper-body tracking framework, providing an accurate prior for reconstruction. • Extensive experiments show that GUAVA outperforms existing methods in rendering quality and significantly outperforms 2D diffusion-based methods in speed, offering fast reconstruction and real-time animation.
3
21
1,616
incredible and much needed!! for those that don’t know basically all performant splat renderers for the web/javascript were proprietary, meaning it’s surprisingly difficult to render splats on a production level this unlocks so much for the community
At @theworldlabs, we built a new Gaussian splatting web renderer with all the bells and whistles we needed to make splats a first-class citizen of the incredible @threejs ecosystem. Today, we're open sourcing Forge under the MIT license.
20
936
Replying to @jasonyuan
so cool to see tech helping human problems, problems that aren’t really quantifiable and don’t have direct, one-step solutions. as a founder, also cool to see that dot got you to open up more about founder life, so much so that you’re now posting publicly about it!
1
663
There's something beautiful about working so hard on something and finally seeing it come to fruition. Really excited to show everyone what I'm working on - cross-platform 3D avatars that enhance the feeling of presence. Avatars way better than Apple's Personas. Stay tuned!
3
19
532
Playing around with real-time face tracking and fidelity is much better than I expected! Thinking of putting this in VR for fun.. wonder how hard it'd be to show up with just this in VRChat
2
1
17
1,955
Replying to @andrewpprice
These are really good questions. Let me try my best... 1. The process is different. The only similarity is that for both, you feed it a few input images with different viewpoints of the scene/object you want to reconstruct in 3D. - For NeRFs, you train a neural network to output a color for a given ray (ie. a 3D position and viewing direction). Put differently, for each input image, you use a neural network to predict the color of each pixel (which is a ray if you think about it) in that image. You then compare this color to the actual color of the pixel in the input image. You continue training over all input images until the network can predict the same colors. The difference between this and normal ray tracing is that in ray tracing, you actually simulate how light bounces for each ray to get a color. Here, you have the neural network do that for you. - For 3DGS, for every step in training, you're either spawning a new 3D gaussian (a 'blob' or a 'splat') or modifying existing 3D gaussians (such as making it longer or changing the color/transparency). Then, for a given input image, you render an image via rasterization from the same viewpoint as your input image. You compare this generated image with your input image. You keep spawning/modifying gaussians until your generated images look the same as your input images. Trained 3DGS scenes can have millions of gaussians. 2. NeRFs can't directly be converted to 3DGS or vice versa. The NeRF is essentially a neural network (a file with a list of weights), and 3DGS is essentially a file with a list of gaussian parameters (ex. color of the gaussian, size of the gaussian, position of the gaussian). That said, there is work that can convert NeRFs to meshes, meshes to NeRFs, 3DGS to meshes, meshes to 3DGS... so I'm sure you could if you wanted to. 3. No, the comparison with ray tracing doesn't imply that NeRFs are more accurate than 3DGS. 
To explain a bit better, NeRFs and 3DGS use different rendering techniques to generate images. NeRFs use ray tracing, but instead of doing fancy math to find out which color the ray should return, you're getting the neural network to guess the color. 3DGS has millions of gaussians, and given these gaussians it uses rasterization to render images. In other words, the use of techniques doesn't imply which one is more accurate. For example, if you have enough gaussians in your scene, your scene could be more accurate than NeRFs. This is slightly different from traditional ray tracing. In NeRFs, we're just talking about how to render an image by predicting each ray to match our input images. In traditional ray tracing, you don't have ground truth input images. You just have a scene with meshes and you need to know how to color it properly to make your image look realistic. Simulating the way light moves does this pretty well. An eli5 explanation of this would be that we have different goals. In 3D reconstruction, we have examples of what the scene looks like (our input images), so we just have to copy it, and both 3DGS and NeRFs are good at copying despite using different rendering techniques. In traditional rendering, you don't have examples to copy, and so without examples you do want to simulate light as much as possible to get something realistic. Hence, in traditional rendering, ray tracing is more 'accurate' than rasterization. Hope that makes sense, but having trouble explaining the 3rd question well. Obligatory disclaimer that for all of these answers, there's still some simplification, but the general gist is there.
2
17
1,295
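To make the "different rendering techniques, same goal" point in the reply above a bit more concrete: both a NeRF (samples along a ray) and 3DGS (depth-sorted splats covering a pixel) end up blending colored, semi-transparent contributions from near to far. Here is a minimal sketch of that blending step only. It is not either method's actual renderer, and the sample colors and opacities are made up for illustration.

```python
def composite_front_to_back(samples):
    """Blend (color, alpha) samples, sorted near-to-far, into one pixel color."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer samples
    for color, alpha in samples:
        weight = transmittance * alpha  # how much this sample contributes
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha    # what is left for samples behind it
    return pixel, transmittance


# Hypothetical pixel: a half-transparent red blob in front of an opaque blue one.
samples = [((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)]
pixel, remaining = composite_front_to_back(samples)
print(pixel, remaining)
```

In a NeRF the colors and opacities come from querying the network at points along the ray; in 3DGS they come from the gaussians that overlap the pixel. Training adjusts whichever one you have until the blended pixels match the input images.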
Replying to @jmdagdelen
That is pretty annoying, as i’d like to have the ability to show people demos and use the headset without having to show them my personal stuff. I hope that changes.
3
16
1,750
Crazy cool!!! Note all of these scenes had ~20 different cameras filming at once, all time synced We need to have some large scale training so that anyone can create these scenes with just 1 camera Exciting times we live in, can’t wait for true memory preservation
Oh wow oh wow oh wow! 4D Gaussian Splatting built on top of the @PlayCanvas Engine by Chinese company 4DV. 😍
3
18
1,468
The quality bar at Apple was making calls good enough so you could deliver unfortunate news to someone. Imagine your boss firing you with a cartoon avatar. How does Meta want AR/VR to take off and then ship these avatars?? It isn't that hard:
Zoom is now freely available on Quest headsets, through an official 2D Android app on the Meta Horizon Store: uploadvr.com/zoom-now-availa…
2
1
16
1,461
There’s something quite magical about using 3D to create a better sense of presence. Here’s an old demo of a 3D avatar system we built that we haven’t rendered on a headset-less 3D display — until now. It’s wayyy cooler in-person when you can see the 3D. Details below: 1/6
2
6
15
1,047
Surprisingly, the most exciting thing I saw at GTC today wasn’t the robots (maybe because I used to work on robots) or the AR/VR headsets, but the stunning @LeiaInc displays. Even as someone who’s worked on the Vision Pro.. it’s exciting to see a headset-less future
3
2
15
1,232
Replying to @ScienceArt
it’s 3d calling on an xr headset that’s good enough to make you feel like the person you’re calling is actually there!
2
12
1,471
I gathered data on VR headsets released over the past decade and made some interesting discoveries! 🧵 1. Pixel density is increasing dramatically for both consumer and enterprise headsets. Higher PPD means higher resolution. We can't distinguish real vs. virtual from 60 PPD.
2
1
12
738
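For anyone new to PPD (pixels per degree) from the thread above, a rough way to estimate it is pixels across the display divided by degrees of field of view. This is a simplified average; real lenses spread pixels unevenly across the view, and the headset numbers below are hypothetical, not taken from the gathered data.

```python
def pixels_per_degree(horizontal_pixels, horizontal_fov_degrees):
    """Average pixel density across the horizontal field of view."""
    return horizontal_pixels / horizontal_fov_degrees


# Hypothetical headset: 2000 pixels per eye across a 100 degree horizontal FOV.
ppd = pixels_per_degree(2000, 100)

# Pixels per eye needed to reach the 60 PPD mark at that same FOV.
pixels_needed = 60 * 100
print(ppd, pixels_needed)
```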
It is weirdly validating when I'm desperately trying to debug this problem by scouring the internet, asking Claude, etc... and then DeepSeek R1 says "Okay, this problem seems pretty complex, involving CUDA, OpenGL, OpenXR, and multithreading." Validation from AI is nice
3
13
358
No way in hell. The EyeSight display imo is one of the best UX choices Apple made. It makes it such that users can seamlessly interact with people in the real world without having to take off the headset. Otherwise you’re stuck taking the headset on and off like any other headset to have a genuine conversation. Or people avoid talking to you even though you see them via passthrough. Obviously EyeSight isn’t good enough yet but it will be, like any other technology. This is a bad take.
The most surprising takeaway from all the Vision Pro reviews/videos is how universally awful the EyeSight display is. Until today, I expected it to be super important to the “I’m still in the real world” experience. Now, I’m 95% sure it’ll be canned by the 2nd gen.
3
9
5,917
Founders overestimate their health. Like other founders, I worked a ton of hours, juggled too much, and neglected my health. Then a health scare hit. (all is well!) It's not worth sleeping little for that extra hour to push that commit. Two steps forward is nothing if health sets you three days back. I've gotten much less sick (and way more productive) by eating well, sleeping well, and exercising regularly. Go for that walk. Get that doctor check-up. Make a salad. Visit your dentist. Integrate health into your daily process and don't make it a "chore". Even a simple cold can set you back a week.
6
11
751
Hell yeah, so bullish on feed-forward 3DGS
F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting Contributions: • We pioneer 3D-aware generation using generalizable feed-forward Gaussian Splatting representation, achieving significant efficiency and favorable rendering quality on monocular datasets. • We significantly advance the capability of pixel-aligned Gaussian Splatting representations by designing a self-supervised cycle training strategy specifically tailored for monocular datasets. • We further mitigate the artifacts of 3D-aware representations caused by large viewpoint shifts by introducing geometry-aware video priors.
1
11
498
Did I say it just works? This is game-changing technology for tech-illiterate people, which is most people in the world. I don’t trust my grandparents to use a headset. I’m a big believer that there’s room for both this and AR/VR calling (and more) to solve telepresence.
1
11
851
The 3D effect is stunning. The massive 65-inch display allows parallax to be shown in a wide FOV, and because it’s autostereoscopic you just sit down and it works. No glasses necessary. Also, no discernible issues with frame rate and latency.
1
11
1,350
Coding shaders is so much more unintuitive than I thought. I'm having to wrap my head around coding in an entirely new paradigm when I thought it'd just be learning how to write in a new language. I guess I'm surprised there isn't a higher-order language that takes care of this.
3
10
358
This is the kind of cool side project I miss doing 2025 will be the year of more side projects
A 17-year-old jailbroke his smart glasses to automatically show the best moves during his chess games
2
10
296
It does seem to be slightly jittery on the edges, and has subtle specularity issues that only a trained eye could see. But the feeling of presence removes those issues entirely. ie. I didn't see any technical issues that would cause people to not use it.
1
10
1,229
Used to think as an engineer that tools like this were dumb. But making the implicit details explicit using frameworks like this has saved me so much time as a founder. Also @HBO can you make another season of Silicon Valley starring ChatGPT pls
3
9
399
Replying to @cixliv @rek
the kind of thing where as a past robotics researcher i'd be explaining why you don't do a, b, c, d, etc. and be a debbie downer but the kind of thing where as a startup founder i say hellll yeah keep going 😎
1
9
2,731
Has everyone seen this before except me? Panasonic's "Silky Fine Mist" demo from last year, showing light projection on top of its mist. Super cool, and seems to have that 3D "holographic" effect. Use case is less clear, but damn I'd love to see this in-person.
3
9
469
Oh and here's a real photo of me if you don't know what I look like..
9
998
Replying to @hrafntho
Agreed, can’t find myself concentrating on work with these avatars. Luckily, this is what I’m working on - photorealistic 3D avatars! Public launch coming next month.
3
8
536
Thoughts on Google I/O's XR-related updates:
- No updates on Project Moohan, but it's coming out this year? Surprised at how few details there are
- Glasses were super impressive -- it's exactly what I want, where AI feels frictionless and a high throughput highway between my mind and the computer is established. But when will it be out? They said dev access will be available later this year, but this almost feels like Meta's Orion; nowhere near shipping.
- No 3D calling. It seems only Apple has this 3D avatar tech. This could be because of a lot of reasons -- not enough users to warrant developing it, performance concerns for 3D calling, avatars still bordering research territory, or even something as simple as Project Moohan not having face tracking cameras. We'll see.
- Is utility enough to spark mass adoption of XR? All of the demos show devices being useful to the user.. but how many times do I really need to translate something, use directions, or remember where my keys are? I think we need some sort of new social experience, as throughout history every new tech seems to have been adopted by the public via some sort of social network effect. Gemini might not be the killer XR app that Google thinks it is (I think telepresence is but I may be biased..)
- Lastly, beating the dead horse but... I can't help but worry about privacy. I feel the most comfortable with Apple having constant access to my environment given their different business model, but obviously Apple's dropping the ball here (for now at least). They didn't talk much (if at all) about privacy measures
The general feeling is this feels like Orion where it's more marketing for the tech. If this is true, it sucks for the industry because it continues this sentiment of "XR is here" when it's still years and years away from being consumer-grade. The hype cycle really hurts startups. That said, if Android XR is really released this year, I'm (cautiously) excited! Fingers crossed, I'd love to hack on it
1
8
608
Both NeRFs and 3DGS solve novel view synthesis: "given a few images of some 3D scene taken from different camera viewpoints, can we generate an image of this scene from any new camera viewpoint?" This has wide-ranging applications in AR/VR and 3D applications. 2/5
1
9
1,939
Hellllll yeahhhh. I’m one of the few people lucky enough to have worked on and with this prototype and I guarantee you you’ll like the product.
1
7
217
and did you know gaussian splatting engines are y-down and right handed? afaik this isn’t documented anywhere and just assumed which has caused me so much headache bonus: pytorch3d/opengl is y-up, right handed but opengl’s camera typically faces -z
Indeed Unreal Engine is moving to Left-Up-Forward coordinates everywhere, starting with UEFN, and coming to UE5-6 in an incrementally-adoptable way through UI settings and C++ helper functions/macros to ease the transition. This will align Unreal with Y-Up, right handed standards of USD and glTF. Why? Because future 3d tools and ecosystems will be increasingly interoperable and standards-based. There are a lot of missing standards we’ll need to propose, and Team Unreal will be far more successful proposing new things if we adopt and add to existing standards and conventions. The USD-glTF-Maya-Houdini quadrant is the center of mass for complex code-art-pipeline tooling that is highly sensitive to coordinates. (Flipping coordinates when exporting from AutoCAD or Blender is easy enough; changing a movie vfx pipeline is not). Coordinates based on project settings sound like a have-it-your-way compromise but are a combinatorial mess when projects are a mix of code modules and content packages from many independent authors. The best time to make this change would have been 1995, but I believe the second best time is now with the launch of Scene Graph in UEFN.
1
8
676
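On the y-down vs. y-up headache mentioned above: going from a y-down, right-handed frame to a y-up, right-handed frame whose camera faces -z can be done by negating both y and z. That is a 180 degree rotation about the x axis, so handedness is preserved. A minimal sketch follows, with the loud assumption that the y-down frame has its camera looking down +z (the post only says y-down and right-handed); the function name is hypothetical and the example point is made up.

```python
def y_down_to_y_up(point):
    """Map (x, y, z) from a y-down, +z-forward, right-handed frame
    to a y-up, -z-forward, right-handed frame (assumed conventions)."""
    x, y, z = point
    # Negating y and z together is a 180 degree rotation about x,
    # so the result stays right-handed (no mirroring).
    return (x, -y, -z)


# Hypothetical point: 1 right, 2 below the camera, 3 in front of it.
converted = y_down_to_y_up((1.0, 2.0, 3.0))
print(converted)
```

Applying the same function twice gets you back where you started, which is a handy sanity check when debugging a flipped scene.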
Replying to @benz145
From what I can tell from the research: 1. The camera images are encoded using a neural network into latent codes (list of numbers) 2. The codes are sent across the internet to the client 3. The client uses another neural network to decode the latent codes to produce textures
2
9
791
The apple demo (reaching out to touch an apple my caller was holding) is magical. Really demonstrates how quickly you forget you’re looking at a screen and not a real person.
2
9
1,016
idea: in XR you always want a high fps to avoid nausea, etc. why not make it dynamic, where you render your environment (ex. skybox) at 120 fps, then render other assets at lower fps, like 50? it saves compute. from personal experience this seems to work. anyone else try it?
3
9
704
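A rough sketch of the mixed-rate idea above: the skybox is redrawn on every display frame, and the other assets are only redrawn on some frames so that they average out to a lower rate. Since 120 and 50 don't divide evenly, this uses a simple accumulator to spread the asset renders out. It only plans which frames re-render; it is not tied to any engine, the function name is hypothetical, and a real renderer would reuse (and probably reproject) the last asset image on the skipped frames.

```python
def plan_asset_renders(display_fps, asset_fps, num_frames):
    """For each display frame, decide whether the asset layer is re-rendered."""
    plan = []
    accumulator = 0
    for _ in range(num_frames):
        accumulator += asset_fps
        if accumulator >= display_fps:
            accumulator -= display_fps
            plan.append(True)   # re-render assets on this frame
        else:
            plan.append(False)  # reuse the previous asset image
    return plan


# One second of display frames at 120 fps with assets targeted at 50 fps.
plan = plan_asset_renders(120, 50, 120)
asset_renders = sum(plan)
print(asset_renders)
```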
This is a subtle but huge difference. Most people, including me, are too self-conscious to wear birdbath optics in public. ie. mass market adoption won't happen with birdbaths
Aura looks cool, but I already see the misleading headlines calling it a competitor to Meta and Apple's coming AR glasses. Xreal devices use birdbath optics. They're designed to resemble sunglasses from the front, but sit much further out. They're not glasses in the same sense.
3
9
1,106
Can you do this for Gaussian Splatting tho
📽️ New 4 hour (lol) video lecture on YouTube: "Let’s reproduce GPT-2 (124M)" piped.video/l8pRSuU81PU The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model: - first we build the GPT-2 network - then we optimize it to train very fast - then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers - then we bring up model evaluation, and - then cross our fingers and go to sleep. In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar. Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step. github.com/karpathy/build-na… Chapters. On a high level Section 1 is building up the network, a lot of this might be review. Section 2 is making the training fast. Section 3 is setting up the run. Section 4 is the results. In more detail: 00:00:00 intro: Let’s reproduce GPT-2 (124M) 00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint 00:13:47 SECTION 1: implementing the GPT-2 nn.Module 00:28:08 loading the huggingface/GPT-2 parameters 00:31:00 implementing the forward pass to get logits 00:33:31 sampling init, prefix tokens, tokenization 00:37:02 sampling loop 00:41:47 sample, auto-detect the device 00:45:50 let’s train: data batches (B,T) → logits (B,T,C) 00:52:53 cross entropy loss 00:56:42 optimization loop: overfit a single batch 01:02:00 data loader lite 01:06:14 parameter sharing wte and lm_head 01:13:47 model initialization: std 0.02, residual init 01:22:18 SECTION 2: Let’s make it fast. 
GPUs, mixed precision, 1000ms 01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms 01:39:38 float16, gradient scalers, bfloat16, 300ms 01:48:15 torch.compile, Python overhead, kernel fusion, 130ms 02:00:18 flash attention, 96ms 02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms 02:14:55 SECTION 3: hyperpamaters, AdamW, gradient clipping 02:21:06 learning rate scheduler: warmup + cosine decay 02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms 02:34:09 gradient accumulation 02:46:52 distributed data parallel (DDP) 03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU) 03:23:10 validation data split, validation loss, sampling revive 03:28:23 evaluation: HellaSwag, starting the run 03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro 03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA 03:59:39 summary, phew, build-nanogpt github repo
1
8
458
tbh i don’t need LLMs to solve phd level problems. as a coder i want them to be up to date with the latest API changes. if i was a manager id want it to solve people problems. im not out here solving competition math problems at work but openai progress is cool nonetheless!
3
7
203
If you're wondering how apple persona / meta avatars work — This encoder-decoder approach is dominant b/c it's not feasible to transmit high-res stereoscopic video streams at 90 fps, minimum for XR comfort fyi Quest 3 doesn't have avatars b/c it doesn't have face tracking cameras (ie. cameras facing downwards on the user's face). It's a hardware issue, not a software issue It's reasonable to predict that Quest 4 will have face tracking cameras to compete with Apple. That said, for an XR glasses form factor, I don't see how in the immediate future there will be 3D calling (or a need to have 3D calling) because of this limitation. 3D calling will be a headset-exclusive perk
Replying to @benz145
From what I can tell from the research: 1. The camera images are encoded using a neural network into latent codes (list of numbers) 2. The codes are sent across the internet to the client 3. The client uses another neural network to decode the latent codes to produce textures
9
571
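Some back-of-the-envelope arithmetic for why sending latent codes beats sending video, per the encoder-decoder post above. Every number here is a hypothetical placeholder: a 2000x2000 image per eye, 3 bytes per pixel, and a 256-float latent code per frame. It also compares against uncompressed video, and real video codecs would shrink the gap a lot, so treat it as an upper bound on the difference rather than a measurement.

```python
def raw_stereo_bytes_per_second(width, height, fps, bytes_per_pixel=3, eyes=2):
    """Bandwidth of uncompressed stereo video."""
    return eyes * width * height * bytes_per_pixel * fps


def latent_bytes_per_second(latent_floats, fps, bytes_per_float=4):
    """Bandwidth of sending one latent code (a list of floats) per frame."""
    return latent_floats * bytes_per_float * fps


# Hypothetical sizes, both streamed at 90 fps.
raw = raw_stereo_bytes_per_second(2000, 2000, 90)
latent = latent_bytes_per_second(256, 90)
ratio = raw / latent
print(raw, latent, ratio)
```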
As a consumer/programmer just tell me if o3 is better than Claude Sonnet, don't really care about other metrics tbh
1
7
244
Replying to @ReedSealFoss
it’s 3d, so there’s depth. it’s real-time, generating accurate facial expressions. combined is an overwhelming sense of presence you don’t need to go see people, physical distance isn’t a barrier anymore you and a friend just put on a headset, and you’re together as if IRL
2
8
1,351
"NeRFs are just small neural networks that are tens of megabytes large. 3DGS, with its millions of Gaussians, occupies almost a gigabyte." However, "3DGS renders new viewpoints fast – more than 90 fps for some complex scenes," a whole order of magnitude faster than NeRFs. 4/5
2
1
8
1,975
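A quick size estimate to go with the "millions of Gaussians, almost a gigabyte" quote above. It assumes each gaussian stores a position, a scale, a rotation quaternion, an opacity, and degree-3 spherical harmonic color coefficients as 32-bit floats, which is a common layout but an assumption here. The 3 million gaussian count is also a hypothetical example, not a figure from the post.

```python
# Assumed per-gaussian layout (32-bit floats):
# position (3) + scale (3) + rotation quaternion (4) + opacity (1)
# + spherical harmonic color coefficients (16 per channel x 3 channels = 48)
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48


def splat_scene_megabytes(num_gaussians, bytes_per_float=4):
    """Approximate uncompressed size of a splat scene in megabytes."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * bytes_per_float / 1e6


# Hypothetical scene with 3 million gaussians.
size_mb = splat_scene_megabytes(3_000_000)
print(FLOATS_PER_GAUSSIAN, size_mb)
```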
In 2024, I realized that when I'm low-energy and tired, the key was not to rest by doing nothing, but to actually spend more energy (assuming enough sleep) by: - swimming - biking/hiking/walking - unrelated side projects Wasn't intuitive to spend more energy to get more.
1
7
218
Amazing demo! Wish my $4k AVP could do this. The missing piece: 3D personas for co-presence. Otherwise, using these devices is such a solitary experience, esp if you're on a headset (less so for glasses). Once the hardware is launched, I'll make sure co-presence is there :)
The future of AI + XR starts here. Glasses that can see and think alongside you. Here’s what Google’s been cooking — its first on‑camera demo, fresh off the TED stage.
1
6
881
9 game-changing tips for multi camera calibration Do you work in computer vision/graphics? Turns out, camera calibration isn't as simple as running OpenCV's sample code. Don't be like me and save yourself several hours by following these (undocumented) tips. Especially #8…
3
1
8
442
Maturing is realizing that maybe I'll switch to using Cursor over Vim... :'( Yes I can do all of the stuff in Cursor in Vim, but I just don't have time to get those plugins working
1
7
268
Blake Scholl on building supersonic jets: "You just list all the reasons that this could not work and we systematically structured our development approach such that we were progressively, demonstrably reducing risk on every single one of those." Striving to do this daily 🫡
4
6
347
"A NeRF generates an image of a new viewpoint by outputting a color for each pixel. It uses ML to learn what colors to output." "3DGS generates an image of a new viewpoint by drawing overlapping colored/transparent ‘splats’. 3DGS learns what to splat via ML." 3/5
1
7
1,816
Obviously the environment is very controlled. The booth has built-in, clear, ambient lighting, and the table between me and the screen forces the spacing to be constrained. This makes sense; these displays don’t work as well in too close or too far distances.
1
7
1,146
Inference costs going down so fast. When I noticed that Project Digits is the size of a Mac Mini...🤯 only a matter of time until we all have these to run local models. Which means as a founder I'll assume inference costs will trend towards 0 and focus on other problems!
BREAKING: Nvidia, $NVDA, announces Project Digits personal computer at $3000, that is approximately 1,000 times more powerful than the average laptop. The device is powered by an Nvidia GB10 Grace Blackwell Superchip, which houses separate, linked components on a single chip to reduce the time it takes to move data between them. The superchip features an Nvidia Blackwell graphics card and an Nvidia Grace processor, packaged with 128 gigabytes of memory and 4 terabytes of SSD storage.
1
7
251
As someone who spends most time just SSHing into a remote Linux machine to do ML I always recommended getting the MacBook Air... But man it really sucks to do any sort of visionOS development on it. The simulator is much too slow
7
344
A few weeks ago, I integrated ChatGPT with my design tool! Using natural language to automate tedious design workflows is promising; only possible by designing the entire app to be AI and LLM-native. More results coming soon! #ar #vr #webxr #ai #chatgpt #gpt4
2
5
267
Replying to @MarcoKelly_23
not yet. it's currently running on webxr though, so when it is public, it'll just be as simple as going to a website on the headset's browser
3
7
623
A month ago, I became super interested in creating #VR demos with real-time information. Here's something quick I produced over a weekend, where the planes you're seeing are all of the Delta Airlines flights over North America at the time of recording.
2
2
6
Getting Windows, OpenGL/OpenXR, CUDA and PyTorch to all work together properly for a real-time, distributed application has got to be my best technical accomplishment to date. I never expected this to be so hard. Let me know if you ever run into bugs, I think I can help out
1
7
246
Seeing so many guides on "how to vibe code" is so weird and hypocritical. Just ask Claude, try the code, and repeat -- this isn't that hard, hence it's called vibe coding
1
6
273
Replying to @chrisgrayson
very fair - but from a tech standpoint the torso, arms, etc. imo are easy additions but no one's got the head working well enough yet so we have to solve that first
1
6
512
As an SF native I love seeing Apple do their events in the city. The city's so beautiful and honestly pretty underrated. I love it here 🫶
6
199
Since I bought this 3rd party strap, I’ve been using my Vision Pro 5+ hours at a time for remote work. It converts the AVP to an open-faced design that doesn't uncomfortably press on my cheeks and lets all of the weight rest on my forehead. Highly recommend it! Link below.
2
6
311
I wish the speakers were better; they were definitely the cheapest component out of the whole system. Having great surround sound could bring telepresence further, but obviously I’m asking for too much.
1
6
925
Slowly getting better at prompt engineering! Crazy to think that I was able to do this on my phone in bed this morning #midjourney #ai #ml
1
1
6
My annual "please vote for your favorite SF Mission burrito" post! whohasthebestmissionburrito.… (I definitely think the last few are better than the top few, but the votes don't reflect that.. 📷)
1
6
261
Wow the @getviture Pro XR glasses are amazing. In use cases like flying, the Vitures beat the Vision Pro (and any AR/VR headset I've tried) by a mile. Highly recommend! Let me know if you want a referral link ($50 off). Below are reasons why I like them: (not sponsored) 1/
1
1
5
1,493
So easy to take modern laptop power for granted. Tried running a bg removal alg in the browser that runs at 140 fps on MacBooks, but running it on the Quest 3 makes the whole OS stutter at 5 fps. The lesson is if you're processing Quest passthrough, you'll have to do it server-side
2
6
508
It's not, except for the fact that it's called a "hologram". The term hologram is so over-used and could mean anything now smh
6
3,018
nothing quite like a quiet saturday night with good music and coding
1
6
307
Replying to @bnj
iykyk
1
6
1,926
Unintended side effect of Apple's Continuity Camera (using your iPhone as a mac webcam) is that I can access it via OpenCV, which makes it really easy to prototype vision applications via Python on a mobile camera source. No iOS dev necessary and not compute-constrained. Amazing
5
235
Anyone else up at this ungodly hour to get a vision pro..
2
155
Martial law in Korea’s frightening. I’m so thankful that living in the US as a founder doesn’t block me from doing anything. The only blockers are sickness, family/friends in need of help, etc. Can’t imagine having an unstable gov hanging over my head. So grateful, and prayers to Korea
5
244
Replying to @benz145
richie’s plank experience
1
4
333
Despite the hate the Bay Area gets, it's truly humbling driving down 101 and seeing the legacy that so many people built in the past few decades in tech. Almost nowhere in the world do you have such a high concentration of high-achieving, risk-taking people. Extremely motivating!
5
Is it even a conference without robots? #SIGGRAPH2024
5
281
At #GTC24 ! DM me if you want to meet up - would love to talk anything 3D
4
266
Replying to @UploadVR
Planning? Should be done already
1
5
273
Replying to @nakul
@ashwinl ! He has a wealth of experience in computer vision, AI, robotics.
5
772
When I do webdev it's often just to showcase my work, so I'm always years behind with frameworks. In college I used vanilla HTML/CSS/JS. 5 years ago I was astounded by how nice React/Tailwind was. Now I'm using Next.js and man, it's so easy to deploy. What else am I missing?
1
5
450
Google Beam fosters authentic human connection via lifelike 3D presence while Veo 3 creates AI videos indistinguishable from reality. The cognitive dissonance is striking—one preserves our humanity, the other blurs it. We're building tools that both connect and deceive us.. 🤔
1
5
839
Replying to @fewerwrong
i can't stop laughing and i'm not even fluent
1
5
1,043
I spent basically the whole weekend debugging authentication middleware for a webapp I'm building. Shouldn't this be easy to do by now? The bugs I was running into were non-trivial, especially the ones related to preventing CSRF attacks :/
2
4
ah, i've forgotten you my friend. tfw you start sweating because an old program starts crashing because of a dependency install in the recent versions, but that dopamine hit when you figure it out *chef's kiss*
4
172
blew a fuse in my house because of my dual-GPU training while the heater was on. guess modern houses weren't built with deep learning in mind??? smh
3
387
Replying to @Tropos_AR
That's the plan! I'm extremely bullish on 3D calling as a means to connect with loved ones (some of mine are overseas), and 2D to me doesn't cut it. Honestly I think the Apple Personas aren't bad, but the problem is more that not everyone wants to buy a $3.5k headset
2
5
222