Patrick O'Shaughnessy

Wasm, WebGPU, & WebNN: How compute abstractions are enabling client-side AI

Wasm, WebGPU, and WebNN are the foundational compute abstractions that enable developers to build and deploy AI systems with almost limitless autonomy and control. In this talk, Thomas Nattestad, Senior Product Manager for Chrome, discusses the high-level advantages and disadvantages of doing your AI on compute abstractions, shows real-world examples of how developers use them today, and covers some of the latest developments in each that will further enable your AI projects on the web.

Published: Nov 17, 2025
Uploaded: Jun 13, 2026
File type: YouTube

Full transcript

AI-generated transcript with timestamped sections.

1:39-3:22

[01:39] possibly are now possible to not only run on the web, but also interface with from other languages. [01:47] WebAssembly is fundamentally CPU-based execution, so everything you do on WebAssembly gets executed on the CPU. But this still offers really great performance, because it also has access to threading primitives in the form of web workers with shared memory, and SIMD for more efficient computational instructions. [02:05] Now, you might be asking yourself, like, Thomas, why are you talking about CPUs in an AI talk? Like, I'm pretty sure NVIDIA didn't become the world's richest company [02:15] by shipping a bunch of CPUs, and you'd be right. [02:18] But the nice thing about CPUs is that they [02:20] offer a really reliable target, [02:23] in contrast to sometimes the GPU. And this is because CPUs tend to be pretty stable across a lot of hardware, [02:30] whereas GPUs can really fluctuate. You can go all the way from a gaming PC with a super nice graphics card [02:36] all the way down to, like, a mobile portable device where the GPU is just barely struggling to paint pixels onto the screen in the first place. [02:43] And so, with Meet as an example, we'll see later, they actually end up doing a lot of their AI operations on the CPU just because the GPU is so busy doing all the other stuff on the system already. [02:57] Which brings us to the next compute abstraction, WebGPU, which is, of course, the new graphics API that largely mediates access to the GPU. [03:06] It's the successor to WebGL, but unlike WebGL, it has explicit support for compute shaders, which is obviously great if you're trying to run AI operations. And WebGPU, in the form of GPU access, will generally offer you the highest throughput. If you're trying to do very large, complex,

3:23-4:56

[03:23] AI operations, WebGPU is a really great option for that. [03:26] The other really exciting point that I want to mention is that WebGPU is going to be generally available across all browsers except Firefox Mobile in the November time frame. So just this time next month, you should be able to ship reliably on WebGPU to all of [03:43] except Firefox Mobile, which should be coming later. [03:46] WebNN is the last one that you've also been hearing a lot about today. I want to underline that WebNN is still a very early-stage API. [03:54] WebAssembly has been shipping across browsers for many years. WebGPU is now shipping across all browsers. WebNN is something we're hoping to do an origin trial for, kind of like a beta trial, [04:03] around, like, Q4 of this year and going into Q1 of next year. [04:07] Um... [04:08] But the interesting thing about WebNN is that it actually offers access to the CPU, the GPU, and the NPU. And it does this by leveraging the OS-provided [04:19] frameworks that already do a lot of this work. [04:22] And so, for example, if you're trying to run WebNN on a Windows device, it'll actually go through the Windows ML runtime, [04:28] which will then further figure out which CPU, GPU, and NPU to use [04:32] based on the [04:33] knowledge it has about the operating system. [04:36] So, you might ask yourself, how do I actually use compute abstractions for AI? And while there are certainly some experts among you in the audience who will be using these compute abstractions, [04:46] the general answer is, the neat part is that you don't. [04:49] But a lot of these frameworks that you've been hearing about do very much use these compute abstractions.
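
Editor's sketch (not from the talk): one way a page might check which of the three compute abstractions it can reach before loading a model. It assumes the usual entry points (`WebAssembly`, `navigator.gpu` for WebGPU, `navigator.ml` for WebNN) and that shared-memory Wasm threads need `SharedArrayBuffer` on a cross-origin-isolated page. The names `EnvLike`, `Capabilities`, and `detectCapabilities` are hypothetical.

```typescript
// Shape of the globals we inspect. In a page you would pass `globalThis`;
// a plain object works for tests. (EnvLike and Capabilities are hypothetical names.)
interface EnvLike {
  WebAssembly?: unknown;
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
  navigator?: { gpu?: unknown; ml?: unknown };
}

interface Capabilities {
  wasm: boolean;
  wasmThreads: boolean;
  webgpu: boolean;
  webnn: boolean;
}

function detectCapabilities(env: EnvLike): Capabilities {
  // WebAssembly is exposed as a namespace object.
  const wasm = typeof env.WebAssembly === "object" && env.WebAssembly !== null;

  // Shared-memory threads need SharedArrayBuffer, which browsers only
  // expose on cross-origin-isolated pages.
  const wasmThreads =
    wasm &&
    typeof env.SharedArrayBuffer === "function" &&
    env.crossOriginIsolated === true;

  // Presence of the entry point only. A real page should still request a
  // GPU adapter, because that request can come back empty on some devices.
  const nav = env.navigator;
  const webgpu = nav !== undefined && nav.gpu !== undefined;
  const webnn = nav !== undefined && nav.ml !== undefined;

  return { wasm, wasmThreads, webgpu, webnn };
}
```

This only reports that an API surface exists. It says nothing about how fast the underlying hardware is, which is the variability the talk describes for GPUs.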

4:57-6:35

[04:57] And so let's walk through what that actually looks like. [05:00] So you're a developer. You have a model that you want to ship to all of your lovely users. [05:05] So you start with that model, and then you pick some web-supported AI runtime, like all the ones we've been hearing about: MediaPipe, ONNX, LiteRT, [05:13] and Transformers and more. [05:15] And then each of those frameworks in turn targets one of these different compute abstractions, including optionally JS. [05:22] And all of these compute abstractions are browser agnostic, which means that they should work the same across all browsers. And at this point, both the framework and the model don't even have to know which browser they're trying to run on. [05:35] Then we get into the browser-specific pieces. I showed the Chrome one here, because that's what I work on. And here you'll have something like the V8 engine executing JavaScript and WebAssembly, [05:45] the Dawn implementation executing WebGPU. And for WebNN, it'll actually tie into these: Windows ML, Core ML, [05:52] or on Android, LiteRT itself, which gives a sandwich situation that we talk a lot about, [05:57] where LiteRT.js is a [06:00] web-facing developer framework, but there's also a native runtime that is used for doing a lot of these WebNN operations, [06:06] just to make it really confusing. [06:08] So next question is like, OK, but why? Why do all this? And we've already hammered these to death, so I'm actually just going to skip this slide, because these are the things you've heard about like six times already. [06:20] And I'm really going to ask the question of like, OK, so why specifically compute abstractions? You've been hearing a little bit about these built-in AI APIs, like for the barbershop and lots of other great examples. And so I want to really draw the comparison and contrast to inform you on when you should be using one versus the other.
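
Editor's sketch (not from the talk): the layering described above, where a runtime targets one compute abstraction and falls back to another, can be pictured as a small preference-ordered lookup. This is not how any named framework actually chooses its backend. `Backend`, `Available`, `chooseBackend`, and the example preference lists are all hypothetical.

```typescript
type Backend = "webnn" | "webgpu" | "wasm" | "js";

// Which backends the current browser exposes (assumed to come from feature detection).
type Available = Record<Backend, boolean>;

// Walk the caller's preference list and return the first backend that is available.
function chooseBackend(preferred: readonly Backend[], available: Available): Backend {
  for (const backend of preferred) {
    if (available[backend]) return backend;
  }
  return "js"; // plain JavaScript as the last resort
}

// Two hypothetical policies: chase throughput, or stay on the CPU
// because the GPU is already busy with other work.
const throughputFirst: readonly Backend[] = ["webgpu", "wasm", "js"];
const cpuFirst: readonly Backend[] = ["wasm", "js"];

const laptop: Available = { webnn: false, webgpu: true, wasm: true, js: true };
console.log(chooseBackend(throughputFirst, laptop)); // "webgpu"
console.log(chooseBackend(cpuFirst, laptop)); // "wasm"
```

Keeping the preference list separate from the availability check lets the same model ship with a different policy per use case, without the model or the framework needing to know which browser it is running in.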

6:35-8:10

[06:35] So I'm going to be a really bad product manager for my own area and start by telling you why you should not use compute abstractions. And the biggest one to really think heavily about for your use case is going to be bundle size cost. [06:47] And this is something where different use cases are going to have a very different tolerance for how big your bundle size can be. If you're something like a blog site or an e-commerce site, you generally want to be staying in, like, the hundreds of kilobytes in terms of site resources, maybe a very few megabytes. [07:03] A lot of these models get to, like, tens, if not hundreds, of megabytes pretty quickly. [07:07] So for those kinds of use cases, you have to be really careful about what you're shipping to the user and how. If you're something like a Figma or a Photoshop or some other large studio-style application where you're expecting your users [07:20] to spend maybe hours or days or years in your application, handling large downloads and being able to cache them for your specific origin [07:27] could still be a much more viable approach. [07:30] Compute abstractions are also not ready to use out of the box. These are low-level building blocks, and as such you have to actually use them to build up to something more. Luckily, this is where those frameworks that we talked about earlier come in very handy, and they'll handle a lot of this complexity for you, [07:47] but by their very nature, these are low-level building blocks. [07:51] The last one is no automatic updates. With something like built-in AI APIs, the model itself will continue to improve and get better, and Chrome will handle all those updates, which in an ideal case means that your application just gets smarter over time. Because you control your own destiny with compute abstractions, you're then also responsible for those updates.
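
Editor's sketch (not from the talk): the bundle size point is easier to weigh with rough numbers. The helper below estimates first-load download time from payload size and bandwidth. The 300 KB and 100 MB payloads and the 10 Mbit/s link are illustrative assumptions, not figures from the speaker, and `estimateDownloadSeconds` is a hypothetical name.

```typescript
// Seconds to download `bytes` over a link of `megabitsPerSecond` (1 Mbit = 1,000,000 bits).
// Ignores latency, compression, and caching, so treat it as a rough estimate only.
function estimateDownloadSeconds(bytes: number, megabitsPerSecond: number): number {
  if (bytes < 0 || megabitsPerSecond <= 0) {
    throw new RangeError("bytes must be >= 0 and megabitsPerSecond must be > 0");
  }
  return (bytes * 8) / (megabitsPerSecond * 1_000_000);
}

const KB = 1_000;
const MB = 1_000_000;

// Hypothetical payloads: a 300 KB content site versus a 100 MB model.
const siteSeconds = estimateDownloadSeconds(300 * KB, 10);
const modelSeconds = estimateDownloadSeconds(100 * MB, 10);
console.log(siteSeconds.toFixed(2)); // "0.24"
console.log(modelSeconds.toFixed(0)); // "80"
```

Under these assumptions the model takes over a minute on a first visit. That is easier to accept for a studio-style application that caches the download for its origin than for a blog or shop page.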

8:11-9:41

[08:11] All right, now for the much more fun part, and the part they pay me for, which is telling you why you should be using this stuff. [08:16] And the first one, and maybe also the biggest philosophical argument, is just agency and autonomy. When you're using built-in AI APIs, the model is chosen for you for the most part, [08:26] planned to be, and you really kind of take what you get. You take a large dependency on someone like Google or someone like Chrome [08:34] to handle a large part of that. This is part of what makes it really easy, but you also then take on that dependency. And with compute abstractions, you're in control. You can decide what to ship. You can decide what model to use, when to update, all these kinds of things. [08:48] The other really big advantage is immediate cross-browser support. Like we just covered, Wasm and WebGPU are shipping and will be shipping across all major browsers. And that's something you don't have today, at least, with built-in AI APIs, though, of course, we do hope to ship those across browsers as well. [09:04] There's also differentiation on model choice and quality. If you feel like your model and your AI system is a differentiating aspect, then shipping that on compute abstractions is really your only choice, as with built-in AI APIs, [09:17] everyone gets the same model. [09:20] The last one I'll mention is that compute abstractions can also enable very niche [09:25] models for the right job. There was a great example here where we were talking to a messaging application that wanted to do spam detection on incoming calls. They could have used an LLM, but, like, the memory and battery consumption of such a large system was kind of an issue, and they found that they could ship some really small, simple text-based model

9:42-11:17

[09:42] that was very specific to that task and used far fewer resources. [09:45] Um... [09:47] And so to keep beating this metaphor that we keep coming back to, the built-in AI space is really these fast paths, easy journeys, [09:53] where compute abstraction AI is much more of an adventure, going down the untrodden path. It gives you a lot more leeway in where to go, but you've also got to put in the work to do that. [10:04] All right, in the last five minutes I want to blast through a few different use cases. You all have actually seen a lot of these examples already, so I'll try to go somewhat quickly through them, and also talk a little bit about how well-suited compute abstractions are for each of these use cases. And the first one I'm most excited to highlight is media stream manipulation, because this is really an area where doing it on compute abstractions [10:24] and client-side is just fundamentally better, in most cases, than doing it on the server side, because you have such a high latency requirement, or requirement for low latency, if you will, [10:36] and because you're handling such large volumes of data in these videos being [10:41] transferred between calls. Being able to do these operations client-side is a huge latency win and a huge cost-saving win if you're trying to operate these at scale. [10:52] Next up, and again, you've already seen a lot of these, but, like, Whisper on WebGPU works really well. Speech to and from text is something that [10:58] is ready for you to try shipping today on compute abstractions, and there are reasonably sized models for you to go and try this out today. [11:06] Then there's image recognition, classification, optical character recognition. This is another really well-supported, strong area. This is something you can do and integrate into your applications today.
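
Editor's sketch (not from the talk): the low-latency argument for media streams can be made concrete with a per-frame time budget. This is a deliberately simplified model that treats every cost as blocking the frame and ignores pipelining. The 30 fps rate and the millisecond costs are illustrative assumptions, and the function names are hypothetical.

```typescript
// Time available per frame, in milliseconds, at a given frame rate.
function frameBudgetMs(framesPerSecond: number): number {
  if (framesPerSecond <= 0) throw new RangeError("framesPerSecond must be > 0");
  return 1000 / framesPerSecond;
}

// True when all per-frame costs together fit inside the frame budget.
function fitsFrameBudget(framesPerSecond: number, costsMs: readonly number[]): boolean {
  const total = costsMs.reduce((sum, cost) => sum + cost, 0);
  return total <= frameBudgetMs(framesPerSecond);
}

// Hypothetical costs: 12 ms of on-device inference versus
// an 80 ms network round trip plus 5 ms of server inference.
console.log(frameBudgetMs(30).toFixed(1)); // "33.3"
console.log(fitsFrameBudget(30, [12])); // true
console.log(fitsFrameBudget(30, [80, 5])); // false
```

With these numbers the network round trip alone is more than twice the frame budget, before counting the bandwidth cost of uploading every frame.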

11:17-12:48

[11:17] And an area where I have this great example of how good SIMD operations are for improving the performance of Wasm. This is where, like, [11:24] using these modern-day compute abstractions can offer you so much better performance than something like JavaScript a few years ago. [11:31] There's also photo and video editing. Here you see Photoshop doing not only smart object selection, but also smart object recolorization operations. [11:39] And these are handled client-side when the hardware is capable of doing so. And this, again, works really well. It's just, like, ready. [11:45] Another one, to just keep blasting through, is text classification. It's really hard to find a good visual for text classification, so I'm just showing you some code here. [11:52] But I swear, it works really well, and these are small models, and [11:56] it's also something that's ready for you to try. [11:58] Then we get into photo and video generation, and this is where, like, if you're an e-commerce site or, like, a blog, it's probably not going to be [12:06] something you can very easily ship. The models do start to get larger. The resource requirements start to get bigger. [12:11] But it's very possible, and you've seen these examples earlier today where you can just generate these images on the fly. [12:17] We talked to one partner who was doing image manipulation, and he said, [12:22] well, if I go to a wedding and I take a thousand photos of something and the, [12:27] you know, clients want, like, a specific object, like the bride or groom, removed from all the photos, [12:32] then that's going to be much easier to do at scale [12:35] if you're able to just crank through a thousand of these photos without thinking about the [12:40] requirements. [12:42] This one you've also seen before: writing, summarization, Q&A text, like text manipulation,

12:48-13:32

[12:48] LLM-style functionality. This works, but here is where the models start to get really large, and you have to be really careful [12:55] that this is actually something that makes sense for your use case. [12:57] And the last one I just decided to throw in here is coding. [13:00] I've never seen this be successful. Coding is such a complex thing with such large context windows, where [13:06] the quality really, really matters, to an extent that, like, [13:09] it'll be a while before we see this stuff running on compute abstractions. [13:13] But with that, thank you so much for letting me talk at you. Please do come find me. I'm very reachable at this email, [13:19] very approachable if you have any even non-AI-related topics in the space of Wasm and WebGPU. [13:25] Please don't hesitate to reach out. I see people taking photos of my email. I love that. So please keep going. Thank you so much. See you all out there.
