How Anthropic Uses Claude Fable 5 With Mike Krieger
Mike Krieger built one of the most consequential consumer apps of the last two decades as the cofounder of Instagram. He is now at the frontier of AI-native product development as head of Anthropic Labs, the team responsible for figuring out what the most capable AI models can do in the hands of real builders. When Krieger first got access to Fable 5 months before its public release, it was exciting and disorienting. “I feel like a total newbie again,” he remembers telling his team. The way he’d been thinking about productivity, strategy, and time management was out of date. The model had outpaced his workflows. Dan Shipper talked with Krieger for AI & I about what it looks like to build with a model as capable as Fable 5, including the new rhythms, challenges, and possibilities it reveals.
If you found this episode interesting, please like, subscribe, comment, and share!
To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper
Get started with Braintrust at https://www.braintrust.dev/
Timestamps:
- 0:03 Introduction
- 1:48 How Fable completely reshaped Mike's workflow
- 4:48 When to use Sonnet versus Fable
- 10:06 What the media tracker Mike built over a weekend reveals about agent-native architecture
- 15:00 The cost to build has collapsed
- 19:03 Is software engineering over?
- 21:48 How Anthropic's engineering teams work today
- 38:39 The mechanics of verification
- 44:39 What people should use the model to build
- 47:24 Dynamic workflows
Links to resources mentioned in the episode:
- Mike Krieger on X: https://x.com/mikeyk
- Anthropic Labs: https://www.anthropic.com
- Claude Code: https://claude.ai/code
- Every: https://every.to
- Published: Jun 10, 2026
- Uploaded: Jun 12, 2026
- File type: Podcast
- Source: share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:04] Mike, welcome to the show. Great to be here, Dan. Good to see you. So for people who don't know you, you're the head of Anthropic Labs and you're the co-founder of Instagram. And today what I want to talk to you about is Fable 5. So Fable 5 is dropping tomorrow. We're recording this the day before, and it's going to come out after it drops. But what I really wanted to do is bring you on the show to tell me about what it's like to use this model beyond the first day. I think when a model this powerful drops, it's so useful to have someone who's using it day in and day out to tell you this is where it's powerful. This is what it actually changes. This is what it doesn't change [00:34] you kind of don't get the same AI psychosis type thing. You can actually think about, okay, this is how it fits into my life. [00:43] Yeah, absolutely. It's also just been interesting. We've had [00:48] some, you know, [00:49] models in this mythos class leading up to the Fable release for a couple of months now. I think it's very exciting to see how people will build with this externally. But I think you're also right that day one impressions, [01:01] I think it really comes from getting to use this over a couple of weeks. I think we've seen that even with previous models, like the December [01:07] into January usage of Opus 4.5 or Opus 4.6 was really important because people spent extended time on the model and then figured out, oh, actually, I wasn't pushing hard enough. I've got to go further. I've got to rethink what's even possible with this generation. Totally. I mean, I don't know. I feel like there are people internally at Every who have been using it, who have been like, oh my God, I think I kind of need a new set of skills to use this model. And I think you can especially see this with people who are maybe more non-technical internally and who are more on the knowledge work side of things where they're like, I don't even know what
[01:37] use this for. And the people who are orchestrating agents are like, holy shit, I feel like there's so many new things I need to learn. So I'm curious for you, tell us about the difference between your impression when you first tried it and now. Yeah, I think that your point on adapting workflows is a really good one. Quite literally workflows, I'll talk about that in a second, but also just in terms of, how do I think about usage of the model? Because, you know, at first, the timing was interesting, because it kind of coincided with me transitioning from CPO [02:07] to builder mode. And I think it was about a month and a half or two months into that, that we first had, you know, one of these models available internally. And I sat there and I was like, I feel like a total newbie again, because I feel like the way that I am prompting or even thinking about decomposing a task is really out of date now with this model. And even thinking about the time horizon or the sort of interactivity model, I think that has to evolve as well. Like going from, I think early on it would be like, I have [02:37] an idea for this feature. Can we start by like, absolutely not right to great. Like, let me express more of the intent. And then just being, you know, I remember, like, [02:47] you know, [02:48] March, April, being like, wow, on the one shot, it's already incredibly impressive. But then it also understands the intent around how we're going to evolve this and understands the global context as well. So I think that's been a really interesting evolution till now where, you know, it was funny, I was talking to somebody this morning where, you know,
[03:09] I think about doing work on a flight and I was like, okay, I can do most of this work remotely. And I don't even worry that the Wi-Fi is going to drop out because I know that if I set up the right, you know, context, instructions, like, [03:21] slash loop, you know, it'll see it through. And I think my last two months have been full of a lot of times where I will, you know, wish Claude a good night, set it up on a pretty complex task with something of this model class and wake up to, you know, actually, it's usually done by like two in the morning. And I guess it just twiddles its thumbs for the next four hours. But like, uh, [03:45] really impressive ability to complete the swing, get itself out of the situation where it's [03:50] All right, well, Mike asked me to do this complex task overnight. I got stuck because this remote service went down. I'm going to write a scaffolded back end for it for now. So, you know, I'll document that, you know, go all the way through. I have, like, [04:05] a good mental model of how far that's going to get me. And then when it comes back online, I'll fix it. I'll keep track of that fact. It's just, [04:11] like, [04:12] I think the most impressive thing for me is just being able to [04:17] delegate that kind of level of task and just trust that the right thing will happen by the end. And of course you'll review the result, and there's still a whole verification thing that we can and should talk about, because I think that's an important part of still completing the swing there. But it's really forced me to rethink what being productive with one of these models looks like. We've talked for a while about, you know, what is it like when these models are more of a companion or a coworker? And it really feels like now it's a teammate that I can delegate a lot of work to.
[04:47] What is your day-to-day flow like right now? Because one of the things I notice is if you just give it a big task and you monologue into it and you just let it go for a few hours or overnight, it's like, [04:56] the most impressive model that I've ever tried. But you know, it's so slow and it's so expensive that I feel like I don't want to use it for day-to-day tasks. So what is your actual flow like in terms of how you use it day to day and where does it slot in versus other models? [05:10] Yeah, I've ended up having a lot more... [05:13] um, [05:14] architectural planning conversations up front with it as well. So that's been another interesting change, where I think there's an area that I think all models need to continue to improve. And I'm really grateful for the Instagram experience of having to start, you know, from our initial version that was like duct taped on a server in LA to being able to scale it and eventually integrate it with all of the Facebook infrastructure, because you kind of develop a sense of [05:39] what infra abstractions and complexity are appropriate for each stage of it. And I still do go back and forth with Fable where it'll be like, oh, this is a good, you know, implementation. And I'm like, well, I do plan on shipping this fairly soon. Like, I think we should probably think about more than one server. And that kind of back and forth is important. But a lot of that sort of planning, and often I actually ask it... It's kind of a thing I've realized is Fable can be so, um, [06:07] Ah, that's really... [06:09] sort of complete in its thinking in terms of how much you are sort of planning with it. And often just saying, can you just make an HTML page that represents what we just talked about so I can share it with the team, is actually valuable, or even just a markdown document, but I like having diagrams.
So that's been an interesting use of, like, let's plan with it, let's think it through. And then let's have some sort of document that we can align the team on, because this is a dynamic I've seen in Labs and just teams beyond Anthropic, which is:
[06:36] You can build a lot very quickly and it just, [06:39] forcing more of that early alignment, even if you [06:42] through an initial prototype and then back it out into more of a sort of plan architecture, that works too, I think is really, really, really key. And it ends up being the place where the human to human interaction still stays very much part of the process. And then from then on, I think, you know, either overnight or during the day, having it execute on those chunks of tasks is really important. And it just means having a lot more concurrent sessions than I did before, [07:12] I go back and forth between liking having one very long-running Claude Code session and really asking it to do everything in background, sort of forked subagents, so the main thread stays responsive. And then other times just embracing, like, I'm just gonna, [07:24] it's one of those days, we're going to have like five or six tabs tackle long, comprehensive work. But I do think that there's something to this, like, [07:31] long horizon and like, uh, don't, [07:35] you know, don't worry, I'm on it, it's gonna take me a while, and more of this back and forth. And that modality, I think, is something that we'll have to figure out in our products as well. I think you want to preserve both and they interact with each other in interesting ways. And my preference is usually, I always like having at least one Claude that is high context, but also very, very fast response. And its instinct is, right, I'm going to answer you and I'll kick something off if I need to. And if not, I'm just gonna, you know, hang tight and wait for
[08:05] fix this, you know, uh, [08:07] interaction question or something that's like very fine detailed, um, like Fable will go off and think very hard about those things. And I think, um, Fable is the first model where I've actually played more with the effort levels for that reason, where I've been like, okay, this is, I just needed to like, [08:21] tweak some UI. I'm not actually going to fall, but like, no, [08:24] put it to medium or something and see how that plays out. Didn't find myself doing that as much with Opus, maybe because the range felt less wide, where it really can feel quite wide with Fable. What about like a quick question? Like you're on the go. Are you asking Fable, you know, random questions as they come to you? Because it feels like you're using a rocket launcher to kill a mosquito or something. Or are you flipping back and forth? [08:47] It's so funny you asked that because I had been, and you know, you're like, it's thinking and thinking really hard about it. Then, um, since last week, [08:55] I was like, no. I was asking it something that, truly, I felt embarrassed actually asking Fable about. It was probably something NBA Finals related and I was like, okay. [09:05] I switched my iOS app to Sonnet and I was like, oh yeah, I use this all the time for fast questions. It's like an order of magnitude in feeling, and it's actually not even the sort of tokens per second. It's actually probably more around how much thinking goes into the answer, and sometimes the answer does not need to be fully thought through. So yeah, I am thinking that through myself and I think this is a good product question for us too, which is, [09:30] you know, in general, you don't want people to have to be thinking so much about these choices. 
So ideally what we can sort of coalesce around in the longer run is sort of, you know, maybe like some more bucketable use cases that are really grokkable to people, or maybe it varies by surface where it's actually probably unlikely that most of the time with the iOS stuff, I'm doing Fable type tasks and, you know, having a sticky model selection per surface might be the way to do that. And we'll have to sort of explore what that means from a product perspective, but I've for sure,
[10:00] masks on it. Can you show us something that you've built with it? [10:05] Yeah. So one of the things that we did this go-around is we encouraged personal account usage for us, especially on the weekends, which is really fun because we have, you know, you can imagine, a lot of Anthropic-specific tooling, et cetera, but it was really good to step back and just, you know, pure Claude Code. [10:26] Let's work on something over the weekend. And you're in the terminal app or you're in the desktop app? [10:30] That's a great question. I'm mostly still in the terminal app. It's interesting watching my wife, who's not a professional engineer and more of a UX designer PM, really fall in love with Claude Code via the desktop app. And I think it's sort of simplified some of the abstractions for her in that way. But for this one, I was still in, is it Ghostty or Ghosty, Ghostty and the terminal app. But let me show you. I am. [10:55] This is one of those like [10:58] everybody has some bespoke need around this. Like I wanted a good sort of media tracker experience. And I was like, you know, I'm playing games, I'm watching TV shows, I get all these recommendations. I just wanted to build something that was personal to me and sort of fit some of the use cases that I had. And the two biggest criteria that I started with were, one, really easy to add things. And so you can talk to Claude, Claude does the agentic search for everything and then puts the right things in. And then also proactively, like, you know, there's a new season or a new sequel to a game, [11:28] that it could go off and research those things. Most of the UI was, you know, Fable one-shot, which was already impressive. 
But then the thread I've been pulling on a lot in Labs this year is, how do you sort of bring the software team, which is Claude these days, closer to the software itself? And so this was like maybe, you know, Saturday morning, I had a full weekend with kids stuff. So a lot of this was sort of kick off work, go do, you know, go for a hike with the kids, come back, you know, continue to do the work.
[11:59] sometimes check in on the work on the hike. I probably shouldn't. But you know, it was nice to pop into remote mode and see what was going on there. You know, I try not to do that too much. But I had this idea around, hey, could we do a spike (I say spike a lot with these models), like, can we do a spike on, what if you could actually modify the software from within itself? And I built both: there was a React Native version, and then this version is just the web version. So I already had like a chat [12:29] to, you know, [12:30] add things by URL, which is like, [12:32] you know, I want every software to have this, where I should never have to navigate a menu to do anything ever again. And this is, in many ways, Dan, I was trying to distill the agent-native architecture to its fullest degree, which is, also have the agent be able to modify the app. So maybe phase one of agent-native architecture is, every single thing in this product is, you know, accessible from the agent via tool calls, etc. That's, you know, hopefully becoming table stakes, it was sadly not in a lot [13:02] And then there's a Brazilian, there's like a show about radioactive stuff in Goiânia. And yeah, I did not remember what it was called and Claude was able to figure it out. It was so much better than trying to figure that out intuitively. But then the next step I was interested in is, what would it mean to actually be able to modify the software from itself on the go? And so if you long press this little chat thing, what I built, [13:21] what Claude built, was a way where it uses our managed agents to basically take on edit requests, and then you can preview them. And I used the Vercel live preview thing here. This whole feature was also one-shot, which was really cool. 
And I just added to it over time. But you know, it was like, it actually does like a little diff view if you wanted to, you can go into the managed agent conversation and see like what it did. Although I almost never do because again, it's like,
[13:50] especially don't particularly care on like the code quality or the long-term maintainability of this software, you can see that I had a session in here too. But it's been really fun. So I'll be using it on the go and say, you know, I had a feature request the other day, like, oh, the floating action button was too low on native iOS, but it was okay on there. Like, can you go off and do it? It did it. It was really fun. With some of the Expo tooling now, it actually live reloaded on my phone, which was also a really cool kind of feeling. But it's [14:19] Uh-huh. [14:21] you know, [14:21] Does this thing need to be like a [14:24] you know, production-level thing that's going to go to a million users? No, but it felt really good to have something where I felt like it didn't have to stop at just the weekend. And I could keep [14:30] working on it just by using it and having this like [14:34] kind of end-to-end closed thing. So I felt like this was a good manifestation of both Fable's building ability, but also, I think both you and I have been thinking about, how does Claude embed itself into software beyond just even the usage side of things. This is really cool. And I want people to understand like, [14:52] So, [14:53] you could have built something like this, maybe not the self-modifying part, but you could have built something like this for like 10 years or 20 years or something like that. But the cost to build has gotten dramatically lower. So think about how much it would have cost to do this in the Instagram days [15:10] versus now, like, can you help us understand how that has changed? 
[15:15] Yeah, I think, and I think about this a lot when I think back to that, that time as well, because you know, I, I thought of myself as a very productive programmer in the early Instagram days, you know, I was like really into mobile development and we had like a good clarity of, of things. And I think the, the gap from.
[15:32] idea to fully realized version of some complete product, like you were still looking at, you know, [15:38] four-ish days of kind of my all-nighters, which was like, my natural state is up till four, you know, sleep until noon, which is not conducive to family life. So I've had to shift, but that was my building thing. But yeah, I call it, you know, Instagram V1, which, you know, [15:51] had more features than this thing did, but not by an order of magnitude, was like five days of all-nighters. Me working on the sort of front end and back end and Kevin working on initial filters to get that out. And this was also, you know, built on already many years that I'd been working on iOS pieces as well. And then the iteration, you know, I think a lot about what we were gated on after that launch, when things went well, was we had all these ideas for where to take it. [16:21] up or we were just trying to add the one incremental feature and, you know, hashtags take a week to build, but then there's all the things that you want to continue doing on it as well. And so I think it's both that shortening of time, like there's still the time required for the idea and the concept and the iteration. And then the other piece, [16:40] you can then iterate on what you have. And I think in a really fun, but also very, you know, sort of in-the-flow kind of way. And then, you know, now this is me as a professional [16:54] software engineer, startup founder. [16:57] Beyond that, if you had that idea, you know, and I saw [17:00] multiple people go through this. It was like, well, I'll try to find maybe a consultancy that will take this on. But it's a really lossy process of what I wanted. You know, yeah, they're gonna go raise money for it. And I think that, um,
[17:16] the thing that I think is the most exciting part about these models getting not just more autonomous, but again, closing that gap between intent and execution is what I've seen it do to people's ability to build who are not builders and, you know, [17:33] the trajectory of these models has been, you know, if something like [17:36] Fable, you know, of this general mythos class is in that class of models and eventually, you know, [17:42] models, you know, that are [17:44] cheaper and more accessible to other folks become available too. And as that process happens, I just think it is opening up so many... like, I got a ping the other day (I get very excited about this stuff, you can tell) from somebody internally. And we had built them an internal tool that kind of combined Fable and access to some internal MCPs. And she said, it is the first time in my life, and she works in recruiting. She's like, the first time [18:14] is now like they're right next to each other. Like I can just do it. And, uh, it was a very meaningful moment to her because prior to that, like, [18:22] I remember these days, these days were five years ago or four years ago, where that person, if they wanted a tool, would have to either make do or try to get an internal tools engineer that probably was overloaded with 50 other requirements. But instead now they are just having the time of their lives building. And I think that's cause for a lot of hope. I think that human capacity for creativity and what's possible is enormous. And I think at our best, we are basically expanding the number of people who can then see that through to something that feels real.
[18:52] I totally agree, but I do think that there's a question in the back of my mind, and I think it's probably going to be in the back of the minds of some of the people listening. So I want to ask you. [19:01] Given everything you just said, [19:02] Is software engineering over? Yeah, I think software engineering is different. It has dramatically changed and, [19:10] as I probably would have defined it if you had asked me around the Instagram time, like, what is software engineering? I'd probably say, all right, thinking through the hard problems and thinking about an architecture and then spending a lot of time in, you know, like TextMate. I don't know why that came to me, but like, you know, the text editor you're going to edit this in, or Xcode, you know, and watching Railscasts, you know. Yeah, exactly. Right, exactly. And understanding the intricacies of Django's ORM layer and then like 15 bugs after you deploy it. [19:40] So much of that is radically different and collapsing into other parts of, like, product management. I think that sort of PM and eng split, I think I see it even in our teams, has become much more diffuse. [19:54] That's radically changed. But I think the overall, like... [19:58] like maybe zoom out from software engineering and think about software production or, you know, software development, but not in just a pure developer case. I think that is alive and well and essential still. So I think that that is the moment that I feel like we are in. I think Fable is another step in the direction of [20:18] And I'm not going to call it the final step. Of course, a lot will still happen. But I think a pretty significant step in terms of like,
[20:25] the trust, at least, I end up placing in the model in terms of its capacity to see things through and even, you know, architect things reasonably is quite high. So that part feels like it is, [20:34] it is not ever going to be done, but it is [20:36] pretty, pretty done, right? Like it's gone really far. [20:40] But I think that the overall sort of [20:43] craft of... [20:45] What needs do you have? Like, what are you putting out? Like, is it actually good? I think that's still a very human endeavor, but I also sort of can see that that is not a transition that is sort of, I mean, [20:58] pain free in a way. Like, I think there are plenty of people who love the craft of, like, actually, I used to love stuff like, I solved that problem so elegantly. You would dream about code. And if you've had the experience of, like, you dream about the thing that you're working on, then wake up in the morning, like, I figured out how to solve this thing really elegantly. And that for sure has passed. And I think that there is a feeling of loss, I think, in some of the better engineers that I talked to, [21:28] So we're holding both ideas in our heads at once, I guess. Which I think is the most important part of this. Like it's normal to feel sadness for that kind of thing and excitement. But I'm curious, let's just take the thesis of software engineering is alive and well. [21:43] What does that actually look like inside of Anthropic? [21:47] Yeah, I think there's a few cases. I think there's still the crafting of [21:52] Well, I've got to take it off from the full software development cycle or maybe what I see on a day to day, maybe I'll do a little bit of both. But I think there's still a lot of, you know, we all got together. We talked about the next way we want to, you know, evolve Cowork. And now we've kind of broken it down into areas of ownership. 
I think that ends up still being quite important because there is still context that you hold.
[22:14] as a person that is sort of beyond Claude, right? Like, what is the actual intent of this product? How's it going? What do we need to know about the sort of other products that are coming down the pipeline that are going to be integrated in some interesting way? So I think that [22:29] aspect is really important still. And so, you know, though we have many Claudes to each human, each human, at least the way we've been working at Anthropic, still kind of has, you know, we call them DRIs, like directly responsible individuals, still has a DRI-ship over some part of the product or some area. I think that'll be the case for a while because I think there is value in not just this distributed, like, we should all make Cowork better, but instead, all right, I'm thinking through how Cowork does this particular task. And there's still a lot of, you know, we try to keep meetings minimal, but they still emerge and you still have these [22:59] conversations. [23:01] then a lot of that sort of asynchronous delegation. I think what many engineers here have now found is they've all built, and I think we'll solve this at some point at a broader product level, but they've all built some version of, all right, I'm going to now [23:16] create a dashboard of what all my Claudes are doing and what's waiting for me and which pull requests need my attention because, you know, either a human or a Claude code reviewer got back to me. So there's a lot of that sort of, um, [23:27] uh, meta maintenance of the work that I think, uh, again, I think we'll standardize some, but I think some of it will always be a little bit bespoke to the way each individual likes to work. Just in the way that people organize their windows, now they organize, um, their work. And then there is, I think, also the, um,
[23:45] understanding how things work in production. And I think that is another, like, there's a few next frontiers, I think, for the models. And I think one of them that Fable does, you know, make significant strides in, but I think there's more work needed here, is understanding what happens to code after it gets deployed, you know, because there's incidents, there's, you know, this was all working well, but this network link got cut, which is not in your usual failure mode. And like it manifested like so much of Instagram, like 2012 to 2016, it was like dealing [24:15] engineer still remains really key. And I think getting the reps in around incident response and understanding how to stay calm, gather data, remediate what's immediate, but then go off and work on longer-term fixes is still a necessary part of it. [24:32] And I'm trying to think if there's any other pieces that are notable as well. I think maybe the last thing to say is, [24:39] I really like the role that the engineering prototype now plays. You have to be clear when it's a prototype versus not. But, you know, the old phrase was like code wins arguments. And I never loved that because like kind of sort of been big. [24:56] the person that could code could go do it, but actually, why should they necessarily win an argument by default? But actually it's been really cool now where sometimes we will have some disagreement or some sort of debate about where to take a product. And often it's the PM that will say, all right, I just tried it and it's janky in like these eight ways, but look, it actually shows how this could work, and that can open up some interesting pieces of conversation. So
[25:20] Almost all of that is quite different than it was six months ago. I think especially at the level of parallelism and the level of need for these kind of higher order abstractions of work. [25:33] But I think what hasn't changed is that ownership. Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust parameters, and everything looks fine in testing, so you merge. And then three days later, or even sooner, the support tickets start rolling in. [25:49] The AI is giving your customers unexpected answers, and you have no idea when it happened or why. Braintrust is the AI observability platform that fixes this. It connects evals and observability in one workflow. That way you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path, evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets. [26:17] Every failure becomes a test case. You catch regressions in CI before they reach users. [26:22] And teams at Notion, Stripe, Zapier, Vercel, and Ramp use it to ship quality AI at scale. [26:28] Braintrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack. They have SDKs for Python, TypeScript, Go, Ruby, C#. There's no framework lock-in or vendor dependencies. It's SOC 2, Type 2 certified, and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev. And now, back to the episode.
[26:51] Fable is also very expensive, and because of that, when I was testing it, I felt kind of like a kid in a candy shop. I was just like, I'll do this and I'll do this and I'll do that. [27:00] But now that there's going to be a bill... [27:04] I'm going to be thinking about it, because I have to pause before I do it to be like, is this going to cost me 100 bucks or whatever? And I do think that's going to limit who gets to use it and for what. So how do you think about that? [27:15] Yeah, I think it's most clear-cut on the sort of professional software, you know, sort of [27:21] classic company doing work. [27:24] It'll be really interesting. There's a lot of process that goes into pricing as well. It's both more expensive than Opus, and then also I'm like, [27:34] in many ways, it's really cheap, if you think about how much incredible work it's doing. But of course everybody has their own economics around what they're working with. So anyway, most clear-cut, I think, for most software teams. And I think as an industry, if phase one was [27:51] companies even struggling to get some of their employees to adopt AI coding, when the models were early and maybe the tooling wasn't there, and phase two was, great, we'll create leaderboards and see who can use the most, which, [28:01] as you can imagine, also creates some not-ideal incentives, then phase three is where people are like, okay, now we're just trying to figure out who's using it effectively, letting them spend as much as possible, having a clear process for that, but making sure we're not doing things wastefully. Which to me in general makes sense, although I think you could also over-rotate that way too. I think something of Fable class should hopefully fit well into that, where if you're demonstrating results and you're getting use out of the model,
[28:31] and perpetuates that. On the personal use side, that's a really good question. I think where I've seen it, even in my personal testing, because our personal accounts. [28:43] Okay, which is funny, paying the company I work at. But you do become more thoughtful about it. Something that was interesting was that the app I built over the weekend actually fit in with only a bit of extra usage. So it wasn't, you know, thousands of dollars to build this thing that is a personal thing for myself, but it was also spaced out a little bit more. Probably the in-between of that, what we'll have to do the most thinking about, is the hobbyist or independent, [29:13] not within the larger company, but also thoughtful about the pricing as well. [29:19] I think my overall advice is: just give it a try and see how much it can do without you having to do a lot of follow-ups. I think measuring cost has gotten so multifaceted now, because there's the per-turn cost, and then there's what it cost you not just to [29:35] do the task, but to complete the task to your satisfaction. And I think that's where Fable has really [29:40] shone for me, which is that it actually just does it right. So then I don't have to go spend the [29:46] nine, ten subsequent turns being like, no, that was not quite what I meant, can you also do this piece? It's been really impressive for me, because you ask it to go do something and then it just does the thing, and you're like, wow, you thought through all the little details of this in a way that I've never seen another model do. I don't know how much you can reveal about the training process, but what makes the model different? I mean, I think
[30:07] in many ways, a continuation of a lot of the work that the team has done. And I bow down in total awe of our teams, both on the pre-training and on the RL side. I think the piece where it has [30:20] evolved, or at least the one I noticed the most, is kind of adjacent to that as well, which is [30:26] a sense of the system more than just the individual piece of the work. I will often be very positively surprised when it will write something and say, all right, but [30:37] I know that in production, this needs to be different. And then it will keep bugging you: have you turned on that [30:43] feature flag yet? It's not going to work until you do. And sometimes, in sessions that have gone on for days, it'll be like, look, you still haven't done that thing. And I was like, you're right, I didn't turn on that feature flag, I should go off and do that. Or: [30:58] if we change this, the contract will change over there. We're watching it. Actually, one of my favorite times of seeing it in action, where I think it demonstrates some of the training, is watching it respond to code review feedback, either from people or from other Claude reviewers, where it doesn't just say, [31:15] oh yeah, that's an issue, I'm going to go fix it. It's actually [31:18] really thoughtful around, hey, for this level of [31:22] sort of [31:22] fidelity of what we're building, [31:24] I'm going to accept this risk. Or, [31:27] I see what you mean, other code reviewer, which is often just another Fable model talking to you. Like, I see what you mean, but I'm actually going to push back. I think that's actually not right. I think getting the model to have that
[31:39] judgment is really important. And if I had to pinpoint an area where I feel like it's really progressed, it is that: not just the immediate knee-jerk "yeah, yeah, that's right, I've got to go fix it," and more "oh, I'll think about that for a minute. [31:53] No, I thought about it and I still disagree." And I think that's a very useful [31:59] sort of ability. It's so valuable to have products like Claude Code out there because [32:05] you have now a living, breathing thing where people are like, this is where the model is doing well. And we have people who tested it, and I [32:14] count the Every folks as very, very high on the list, where we really trust the feedback because it is being put through its paces on [32:21] repeated, multi-day hard tasks. And that also very much feeds into how we think about what we need to improve on next, like what are the tasks that we need to specifically think about the model being better at. Is chat the right interface for this model? Because it's not very turn by turn. It's very much "I'm delegating something to you." So how does that change how you should use it or how you think about the interface? I don't think the fundamental idea, that you are sending messages and it is giving you messages back, is [32:50] totally wrong. I think there are ways we need to evolve, [32:54] maybe [32:55] three that come to mind. One is: [32:58] is your laptop the right place for it? So that's number one, where I mentioned with the side project I was working on how useful it was to have the mobile side. Boris, who created Claude Code, is always ahead of the curve on how these models get used.
[33:12] Almost a year ago, maybe nine months, he was talking to me. He's like, yeah, I've moved a lot of my Claude Code work to mobile. I was like, no way. And it took me a while to get there, but especially with the Fable class, there are oftentimes where you [33:24] can keep the session going. And we use kind of remote dev boxes at Anthropic. So I'll have a thought and be like, okay, I need, [33:31] can you keep going on that? So maybe number one is decoupling where the work is happening from where I'm talking to it about the work. The second one touches a little bit on what I was mentioning earlier: [33:42] how do you take everything that Fable has discussed or decided or proposed about something and make it comprehensible? And that's an area that we're thinking a lot about. There are some skills out there that we've used around, all right, can you diagram this? Can you do that? So that's a place where the current chat UI, [34:01] I think, is insufficient. You'll experience this with people too: it will give you a lot of text, and you're like, [34:07] I need to go take a walk before I'm ready to fully understand this. And I think that is a piece of it. Something I'll do with Fable is, okay, [34:16] you have a lot more context on this than I do. Can you back it up? Let's do more progressive disclosure of the complexity here. So I think that piece is interesting. The last one, which I think we're still early in pulling on, is thinking through multiplayer, where at some level, with these [34:33] abstraction levels, and because we have this sort of DRI and ownership area, usually a chunk of significant work, a human and a couple of Claudes, like that
[34:44] is still flowing together. But in other cases that is less the case, right? Maybe it's an incident response where multiple people are thinking about it. Maybe it's a project where there are multiple competing, or not competing, but [34:58] conjoining areas that are coming together. And we have chat sharing, which gets you a little bit of the way there. But I think there is going to be a need for more like, all right, you've got an independent Claude that's doing a lot of work that was kicked off by somebody, but can it be keeping up with all the other work happening on the team? I think that is an interesting and underexplored next frontier about how this work ends up happening. But I think it's really exciting because [35:28] it's the... [35:29] it's the level of teammate, [35:31] collaborator, that the models are now capable of, and we're almost holding them back by not having the right abstractions around them for that to happen. Yeah, it makes me think. I've mostly been using this for my own vibe-coded stuff, so I haven't really had to think about this. But there's a problem when you're using this inside of an organization, which is: do I really understand every part of this, and therefore how do I transfer the context of what the model just did into my brain? That's one of the big bottlenecks. How do you, how [36:01] drawing the line, especially with a model like this, around how much you actually need to understand, and how to make sure that you have enough context on what it's done to feel comfortable. [36:09] I think there are two big pieces here. The first is verification, where I became fully verification-pilled earlier this year. And it actually connects to what I used to do when I was sort of
[36:24] typing code more full time, which is try to find the tightest dev loop that you can around the idea that you're trying to develop. Sometimes with Instagram that meant [36:33] actually [36:34] making a new build target in Xcode that was just that screen with some sort of synthetic data, and just doing that dev loop. And I would mentor newer engineers: if there's one thing that I can impart to you, it is try to get that for any project you're working on, and things will go much more quickly. I think [36:49] that is no longer exactly the case here. But what is the case now is, anytime I set something up, I ask: how do I get it so that for every pull request Claude is putting out, there is an attached photo or video, whether that's an iOS PR or something in the UI? [37:06] I think that helps you gain a lot of confidence, because even now you might have Fable go off and do work for a couple of hours and be like, [37:15] I'm done. And it's really useful for it to say, and here's the full screenshot gallery. Because you might say, oh, you know what? On screenshot eight, that error state, I've never actually seen it, but I can see how a person might hit it. Let's actually make that different. And so getting that comprehensive verification is something we've been working on a lot internally, and [37:33] Sure. [37:33] publishing more and more skills and knowledge about, and I think it is really a key piece there. 
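As a rough illustration of the "every pull request has an attached photo or video" habit described above, the check below scans a PR description for embedded images or links to media files. It is a minimal sketch under stated assumptions: the function names, the Markdown image convention, and the list of file extensions are all hypothetical, and nothing here reflects Anthropic's internal tooling.

```python
import re
from typing import List

# Assumption: PR descriptions are Markdown, and "visual evidence" means either an
# embedded image (![alt](path-or-url)) or a direct link to an image/video file.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
MEDIA_URL = re.compile(
    r"https?://\S+\.(?:png|jpe?g|gif|mp4|mov|webm)\b", re.IGNORECASE
)


def visual_evidence(pr_body: str) -> List[str]:
    """Return the media references found in a PR description, without duplicates."""
    found = IMAGE_MD.findall(pr_body)
    for match in MEDIA_URL.finditer(pr_body):
        url = match.group(0)
        if url not in found:
            found.append(url)
    return found


def has_visual_evidence(pr_body: str) -> bool:
    """True if the PR description attaches at least one screenshot or video."""
    return len(visual_evidence(pr_body)) > 0


if __name__ == "__main__":
    body = (
        "Fixes the login error state.\n\n"
        "![error state](https://example.com/shots/8.png)\n"
        "Demo: https://example.com/demo.mp4"
    )
    print(visual_evidence(body))
    print(has_visual_evidence("Refactor only, no UI change."))
```

A check like this only confirms that something visual is attached; whether the screenshots actually cover the changed screens still needs a reviewer, human or model.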
And then the second one is, I think you ultimately as a person still need to stand behind the work that you are doing, especially if you're putting it into a production system. A lot of people use Claude every day, and there's still the accountability of, oh, that's still [37:49] Claude might have written it, but you need to understand at least the general decisions that were made on these pieces as well. And so I have seen a fair number of engineers adopt this practice where Claude will have done the work, but then there is the follow-up conversation around, well, can I make sure I deeply understand all the trade-offs that you made? And that, and whatever
[38:12] lowercase-a artifacts need to be produced in order to make that [38:16] comprehensible, is important. It is really interesting, though, to be in meetings where somebody will say, oh yeah, I have this PR ready, and somebody else asks, oh, that's interesting, [38:26] did you do X or Y? And they have that moment of pause. They're like, you know what? I'm not entirely sure. I will find out before we merge this PR. And I think adapting to that norm and figuring out how to work with that is something we'll have to do. [38:38] Tell me more about the verification loop. It's such a hot topic right now. It sounds like one way that you do that is with screenshots and screen shares, but what are the other ways that you think about that? [38:47] I think part of it starts with: can you get to a place where you are exercising [38:55] real flows that aren't just a static injected piece? And as this thing gets more complex, that gets more and more complicated. So we've invested a bunch in even just [39:04] getting it so that the iOS app can log in to staging on a real account and have real data. But you don't want it to then go through an eight-stage onboarding process every time when you're just trying to test the second part of the screen. So there's a lot of work around, you know, [39:22] is there a special affordance? Is there some shared secret, whatever that is, around getting the app to really feel as much like a human using the product as possible? So that's one aspect of it. The second is this mix of well-known paths versus the things you're exercising in the exact moment, the former being really useful for regression testing. So we've defined the places where we've expressed ideal workflows in
[39:47] text, basically, and Claude can repeatedly check that. And then there's also, and Claude does a really good job of this, expressing the intent of the current change at hand, so that gets really deeply exercised. So I think the combination of those two things is important. The visual verification I mentioned as well. Video has been really cool to see, actually. Video is a very underexplored tool to give Claude. Something I've been prototyping is just giving Claude video captures of the thing that it has built, and then giving it just basically an [40:17] scrub through and say, oh, this animation has some jank in it, I'm going to go fix that. And it never would be able to do that with a screenshot sort of capture, because it will have missed the moment. So I think that's another piece that's really important. And for the pieces that aren't [40:32] easily testable because there is some more complex system, getting Claude to go and build as robust a mock backend as possible, or use ones off the shelf, has also been really interesting. When I think about Artifact, we [40:48] had really comprehensive tests. This is kind of pre-LLM. And one of the ways that we were able to do that really robustly was that basically every piece of infra we had, whether it was Postgres, Redis, you know, all the AWS things, had a really good in-memory implementation that you could just run really quickly in unit tests. And kind of extending that to Claude-land: I was working on something where it had a pretty robust backend, and for [41:12] kind of complicated reasons it was hard to spin that up on my dev server, but it [41:16] was able to get, in one shot, a really good proxy for that, like a substitute for that. And that was so valuable. 
And over time, it's been interesting how that substitute has evolved as the rest of the code has evolved, which is the thing that, if you had pitched that idea to me before, I'd be like, well, that's going to be really hard, because the upstream is going to change. How are you going to keep it in sync? And I don't think about that anymore. I'm like, yeah, Claude will read the changes and it'll adapt the thing and it'll keep the two in sync. And that's fine.
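The in-memory substitute idea described above can be sketched as follows: application code depends on a small store interface, and unit tests pass in a dictionary-backed fake instead of a real Redis-style server. This is an illustrative sketch only. `KeyValueStore`, `InMemoryStore`, and `record_view` are hypothetical names, not code from Artifact or Anthropic, and a real fake would need to mirror far more of the backend's behavior.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The narrow interface the application code depends on (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...
    def incr(self, key: str) -> int: ...


class InMemoryStore:
    """Dictionary-backed stand-in for a networked key-value store, for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def incr(self, key: str) -> int:
        # Missing keys start from zero, values are stored as strings.
        value = int(self._data.get(key, "0")) + 1
        self._data[key] = str(value)
        return value


def record_view(store: KeyValueStore, article_id: str) -> int:
    """Example application code: bump and return a per-article view counter."""
    return store.incr(f"views:{article_id}")


if __name__ == "__main__":
    store = InMemoryStore()
    record_view(store, "a1")
    print(record_view(store, "a1"), store.get("views:a1"))
```

Because `record_view` only sees the interface, the same function can run against the fake in unit tests and against the real backend in production; the ongoing cost is keeping the fake's behavior in step with the real system as it changes.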
[41:43] there are some really interesting architectures around when you get a bug: [41:46] it just automatically goes out and closes it. You know, the agent just gets kicked off, it closes it, and then it sends a message to the customer being like, it's fixed. Are you noticing with Fable any change in how that process works? [41:57] Yeah, I think there are a couple of things. [42:00] On a very [42:01] human-to-human or human-to-Claude level, one of the things that I've seen it do (other models are capable of it, but I've seen this one do it really consistently) is this: say the bug report came from somebody mentioning something in our feedback channel in Slack, and the thing that got fed into the Claude Code session is, oh, there's this. Because of the Slack MCP, it can actually pull the thread and then actually post back, you know, as me. It'll be like, hey, [42:28] I fixed it, here's the pull request. But then what I think it does really well, compared with the previous Claudes, is then say: but hold tight, it's not in production yet, I'll follow up when it actually is. And then maybe a few hours later: oh, this deploy went out, you should go test it. Is it fixed now? That level of follow-through, I think, is new on the closing-the-loop piece. And it's fine, I think; these long-running Claude Code sessions are basically interacting as me, I guess, but with some disclaimer in there too. And the second goes
[42:58] It's one thing to say, there was a bug report, therefore I must go fix this thing. And it's another one to say, you know what? I hit this over the weekend: one of our internal systems basically had been running without restarting for a while. There was a memory leak. And it showed good discernment in saying, [43:15] all right, Mike, it's the weekend, just rebounce the server. It's going to solve it for now. And I'll asynchronously get the PR going to fix this more long term. So I think if you're going to have Claude in the loop in this kind of [43:27] Sorry. [43:28] close-the-loop, bug-report or system-issue-to-change flow, you really want it to understand, as any good SRE or engineer in the loop would: great, let's solve the problem at hand. Let's defer the question of whether we need to re-architect on top of a completely different language. Understanding that balance is really important. [43:48] One of the things that's really exciting, mostly exciting to me about new models is it raises the floor so that everyone can kind of go build apps in one shot. But it also raises the ceiling for experts. So if you're a software engineer or founder, you can just go do things you never would have been able to before because you have access to this really powerful model. So for me, I built this one-shot version of Borges' Infinite Library, like a 3D game version of the library. It's wild. It runs right in the browser. It's so good. I can find any Every essay inside of it. I'll send you the link. It's sick. [44:18] But I think there's going to be this flowering of people doing things like, oh, I made a game, or maybe I trained a new model, or whatever, that they couldn't do before.
[44:30] to give people some inspiration, some examples of things that they might be able to do that they might not be thinking to do with this model. What are some ideas that come to you? Yeah, I think a few. Maybe I'll start with the fun side, riffing off the game piece. I think people have a lot of creative ideas for how to express the complexity of their world. Everybody has the thing that they know really, really well. And there's probably some level of, how do I then explain that to somebody else? Or how do I apply techniques [45:00] go off and do. My wife is studying environmental engineering, studying geothermal, very complex math and simulations. And I've seen that as the models have gotten better, she has been able to apply even more complex techniques from outside of that domain to that work. And I think people should be able to do, you know, full-on PyTorch end-to-end simulations of that work in a way that wouldn't have been possible. So maybe one is: bring the beautiful complexity of what you have and either show it to other [45:30] or maybe making a visualization, which I've seen her do as well, or at least bring other techniques to bear. And the second piece is [45:38] its ability to compose software that solves a problem really unique to you. I've seen that internally. A lot of the work that we've been doing is how do we get [45:47] as many of our internal systems as possible MCP-ified, with the right permission structure and the right deployment setup. Although externally, you have good options around some of these platform-as-a-service pieces, and you can just ask Claude about them and it'll help you set things up. But
[46:01] I love that feeling of that thing that you always wished you had. And then what has blown my mind: there is a person who works in our go-to-market organization who has been building this really deeply thought-out integration of Claude into every part of her whole process, and [46:17] you don't have to stop at that one shot. She's been working on it for months now, and she can keep going. And I think one of the things that is maybe underappreciated about the models is that in previous generations, they would eventually get [46:27] to a complexity level where it was hard to iterate without feeling like you would then break the thing that they had, you know, under- or over-abstracted. Whereas with this, you know, she's had access to something Fable or Fable-like for a couple of months, and you've just seen it keep growing and growing and growing. And now she's deploying it to the whole GTM org. And I think that is really cool. The ceiling of complexity that a person who does not start out as technical can now build for solving problems within their domain is unprecedented. [46:57] My benchmark that I have is called the senior engineer benchmark. I just have it see if it can rewrite a code base from first principles. And the nearest model, the previous top, was like a 62 or 63 out of 100. And this model got a 90 on the benchmark, or 91, which is human senior engineer level. You can just keep going with this thing in a way that's really fantastic. I'm curious, though, one other thing that's really powerful that you mentioned is dynamic workflows. Tell us about that.
[47:27] I simply bugged the engineer who built it, like, when are we shipping this publicly? Because I think people are going to really like it. Sometimes there's a good reason why something was built internally, but we try to ship as many of these as possible. And dynamic workflows was definitely that to me. The person who built this is an engineer named Sid, who's awesome. And I was like, Sid, I want to get this out into the world because it's so good. But I think it's especially good with a model like Fable for two really big reasons. One, it helps create the scaffold for deep, meaningful work. [47:57] I did and used Fable for was I had an internal project that we had written in Python, but we needed it in TypeScript for a really specific deployment reason. And having been internal to Instagram when we were like, should we rewrite the whole thing into Hack and port it to the PHP engine that Facebook, I was like, you never would have done that. Maybe they can now with the model, but at the time it seemed impossible. But here I had a pretty complex code base, and I was like, I'm just going to set up a dynamic workflow and just let it run over [48:27] And it did. And the workflow was so cool. It was like: all right, I'm going to do a deep understanding of the work. I'm going to create almost like a spec of how everything works. I'm going to go module by module. I'm going to translate these pieces. I'm going to have it tested incrementally. I'm going to do another adversarial test. I'm going to go check for anything that I missed. And it was just a really cool series of steps that the workflow was able to orchestrate. And I came back and it was like, yeah, this thing is a [48:53] TypeScript and Bun port of that thing. And it's actually better in these ways. 
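The sequence of steps described above (write a spec, translate module by module, test incrementally, run an adversarial pass, check for anything missed) can be sketched as an ordinary function that orchestrates pluggable steps. This is a hypothetical illustration, not the dynamic workflows feature itself: `run_port_workflow`, `PortReport`, and the four callables are invented for the example, and in a real setup each callable would presumably be backed by a model call or a test runner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PortReport:
    ported: List[str] = field(default_factory=list)
    not_ported: Dict[str, str] = field(default_factory=dict)  # module -> reason
    steps_run: List[str] = field(default_factory=list)


def run_port_workflow(
    modules: List[str],
    write_spec: Callable[[str], str],
    translate: Callable[[str, str], str],
    passes_tests: Callable[[str, str], bool],
    passes_adversarial: Callable[[str, str], bool],
) -> PortReport:
    """Run a port as a fixed series of steps and document what could not be ported."""
    report = PortReport()

    # Step 1: understand each module and capture that understanding as a spec.
    specs = {m: write_spec(m) for m in modules}
    report.steps_run.append("spec")

    # Step 2: translate module by module, testing each translation as it lands.
    candidates: Dict[str, str] = {}
    for m in modules:
        output = translate(m, specs[m])
        if passes_tests(m, output):
            candidates[m] = output
        else:
            report.not_ported[m] = "failed incremental tests"
    report.steps_run.append("translate+test")

    # Step 3: a second, adversarial round of checks on everything that passed.
    for m, output in candidates.items():
        if passes_adversarial(m, output):
            report.ported.append(m)
        else:
            report.not_ported[m] = "failed adversarial tests"
    report.steps_run.append("adversarial")

    # Step 4: confirm nothing was silently dropped. This holds by construction
    # here; the check guards against later edits that skip a module.
    missed = [m for m in modules if m not in report.ported and m not in report.not_ported]
    for m in missed:
        report.not_ported[m] = "missed by earlier steps"
    report.steps_run.append("missed-check")
    return report


if __name__ == "__main__":
    report = run_port_workflow(
        ["auth", "billing"],
        write_spec=lambda m: f"spec for {m}",
        translate=lambda m, spec: f"// TypeScript port of {m}",
        passes_tests=lambda m, out: True,
        passes_adversarial=lambda m, out: m != "billing",
    )
    print(report.ported, report.not_ported)
```

Expressing the plan as code makes it possible to read what is about to happen before it runs, and the `not_ported` map plays the role of the "things I couldn't port" notes.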
And it was very, you know, sort of documented, like these are the things I couldn't port, but most of these were like
[49:02] very specific to the specific implementation. It wasn't worth porting. And I do not think you could have done that, A, with previous models [49:10] at that level of success, and B, without the kind of scaffolding that workflows provides. So I think that is an extremely exciting combination of model capabilities and our own ability to orchestrate them over longer and longer time horizons, with that feeling of, [49:29] you had a goal, you broke it down effectively, and then you were able to make it work. The other piece is, I think over time we'll also be able to have the model tuned to the level of complexity of each of those subtasks. So you can imagine that some parts of the dynamic workflow don't need extra-high thinking. They could use medium thinking to get it done, or even a smaller model. And I think that's really the future of where these things are going. So, yeah, I'm a huge workflows fan here. [49:59] Tell me about how you got that workflow made. How did you design it? How did you make sure it was good? [50:04] Yeah, it was [50:06] pretty iterative, but it sort of just started with Claude Code: hey, I have this complex task, let's design a workflow to go and do it. It kind of showed me the plan. I was like, oh, this is close to what I want. I want to make sure that you do these three or four levels of additional verification for missed features. And then it's, here's what you have, you're ready to go. And it expresses the workflows in code, which I think is really valuable for seeing what it was about to do. And then what was interesting is it did the full port. And then I had a couple of follow-up questions, or
[50:36] little tweaks. And I did those as sort of mini workflows that built off the previous one as well. But I think that's, [50:42] you know, we talked a little bit about whether chat was the right interface. We've had that conversation over the last year, and I think workflows are a good middle ground: you can compose them using chat, but they're expressed using code, and then they're executed with, I think, a nice clean UI around what's happening at every stage. And I think we'll start bridging longer-horizon work with chat in ways like that over time. Mike, this is such a great conversation. Thank you so much for joining and telling us all about this new model. I'm really excited to get to [51:12] and really, really looking forward to what people think outside too. [51:15] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT. [51:45] edge of your seat, craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. [51:54] So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. [51:59] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.