
Brian Chambers: Scaling Chick-fil-A's App, Kitchen, and Edge
Beyond the Noise
About the episode
In this episode of Beyond the Noise, Matt sits down with Brian Chambers, Chief Architect at Chick-fil-A, to unpack what it actually takes to run modern digital ordering and high-volume restaurant operations at scale. Brian walks through the company's evolution from dial-up connections and shift-triggered "daily transmissions" to today's cloud-era systems, where millions of customers, real-time operational signals, and kitchen workflows all collide.
Then we head behind the counter, into Chick-fil-A's edge computing strategy: why certain workloads have to live inside restaurants, how they design for reality when connectivity flakes out, and the deliberate trade-offs they've made to keep complexity from turning into a full-on kitchen fire drill. They also get into the spicy side of observability at the edge (including the time too much telemetry broke credit card processing). Finally, Brian shares his current focus: AI tools as productivity multipliers, and the often-underrated discipline of "radical simplification" in modern architectures.
[00:00:00]
Matt Klein: All right, everyone. Welcome to another episode of Beyond the Noise: Signals, Stories, and Spicy Takes, the show where we dig into the stories of the people shaping the future of app-based computing with a special focus on mobile. I'm your host, Matt Klein, co-founder and CTO of bitdrift, as well as the founder of Envoy Proxy.
Each episode we'll talk with engineers, founders, and technical leaders who transformed the way their companies build and understand what's happening inside their systems. We'll dig into the challenges, the breakthroughs, the lessons learned, and we'll wrap it all up with their hottest takes. So let's dive in.
Today we are thrilled to have with us Brian Chambers, who in his own words, "is a guy who does stuff and sometimes it works." He is the Chief Architect at Chick-fil-A. He's the co-founder of Edge Monsters and the Chief [00:01:00] Architect Network, and he also moonlights as the CIO and CTO for a nonprofit doing community economic development, using business to help alleviate poverty in Latin America. Brian, welcome. Thank you for coming.
Brian Chambers: Hey, Matt. Pleasure to be here.
Matt Klein: I'm excited for this episode because I think, a lot of people out there may not know about all of the cool technology that a company like Chick-fil-A does.
And I, I think it'll be exciting to dig into that. So thank you again. How I typically start is I would love to learn a little more, um, about you and, and how you got to where you are today. And you're a, you're an interesting case, I think in the modern world of tech where people tend to move around every couple of years.
You've been at Chick-fil-A, I think for almost 20 years.
Brian Chambers: Yeah.
Matt Klein: Um, and you're now the Chief Architect. So I, I'd just love to learn a little bit about your journey and how you, uh, got to where you are today.
Brian Chambers: Yeah. [00:02:00] For sure. Well, happy to share. So... I've been at Chick-fil-A for, uh, it's coming up on 22 years actually.
Matt Klein: Okay.
Brian Chambers: Which is crazy.
Matt Klein: Yeah.
Brian Chambers: And yeah, I know that's abnormal. I started straight outta college and... I was- I've been thinking about this, reflecting back a little bit lately. I think I kind of came outta college like... Not really being good at anything. And, I could write some SQL queries, maybe, maybe do like a little, uh, little programming.
And I think the early part of my career was sort of just like figuring out who I was and how to do stuff and, um, and, and build a lot of confidence. And, and so a lot of my early roles I think were just like getting exposed to the real world of technology. And like in hindsight, I would say I, I think I had enough...
humility because I felt some of those things I mentioned, like I didn't, I was, I didn't think I was really great at anything. Um, I thought maybe even like, I wasn't good enough to do this as a job and I should do something else, early on, but. I think that let me learn from a lot of experiences, early in my career.
And I [00:03:00] think I, like we could go into the details if you wanna know more about what I did early, but I think for me, like the pivotal time was, probably, you know, 10 years in, um, that's about when kind of our modern cloud era started and I- I was really lucky, I would say, to be in a position where I got the opportunity to be part of leading the way that we, embrace cloud and, and a lot of the other things that come with that, at Chick-fil-A and helped shape the way that we approached it.
You know, it was conveniently timed in that we had to solve some new problems at Chick-fil-A, that everybody's familiar with, like... going into digital, order channels and things like that, having the Chick-fil-A one application. So we had some new, new problems that made cloud make sense and we got to think about like, what does it look like to do things differently in the company?
And so I was kind of like, well placed to, to be part of that. And, I think the- I was part of the architecture team that we had, kind of like a centralized architecture approach at Chick-fil-A. It always has been since it's existed. I got to be a part of leading that and I think it positioned me well.
I, I got to develop as a person, I got to learn a lot. I got to do a lot of hands-on things that I had never had the chance to [00:04:00] do in my career before. And maybe just like, wasn't really into... coming up. I don't have the, like, computer tinkerer, backstory, like I was more of an outdoorsy, sportsy kid. But, I think I became a nerd in, in that time range really.
And, and got like really passionate about how the tech worked, you know, bottom to top. And, invested a lot of time myself, in figuring things out both inside and outside of work. And, anyway, all that I think positioned me to be in a good spot, when there was an opportunity to, to be, become the leader of architecture for the organization, to be a pretty good fit.
And, and so certainly feel still lucky and, fortunate to have had that opportunity. But I think I'll... if, if I could say a couple things I did well, I think it was just like investing in myself and, and spending time learning things I didn't really have to learn because of curiosity and like ultimately that paid off to a large extent.
So, feel free to dig in anywhere you want.
Matt Klein: Yeah, I mean, I think, that's a, that's a great story. I think you know where I'd, I'd love to start is I think people out there, I think many of them have just never thought quite honestly about how much technology, [00:05:00] at least in the modern era, is involved in actually running a restaurant chain like Chick-fil-A, I mean
Brian Chambers: mm-hmm.
Matt Klein: There's a huge amount of technology and if you fast forward to today, just from an end user perspective, my, my wife is actually always complimenting Chick-fil-A. She's like, 'wow, you know, it's like you got the app and it goes in and you go through the drive through line and you know, everything is all magical' and all of that.
And I think given that you've been there for so long, maybe on, on the way of taking us through some of the cool technology that has been built. I think you keyed on the first thing that I'd love to learn about is you were there, I think through a transition, right? Where things went to being app driven, like people now expect there to be an app and to do online ordering and all of those things.
And that obviously was not the case when you started, right? So, even just to rewind a, a bit like from maybe some of the stuff that you worked on early on and I guess, [00:06:00] you know, like you were there for that transition, right? From that transition to, I would say, you know, like typical restaurant style operations where there's still certainly technology involved, but to the modern era of like, as you said, like digital ordering and apps and all of those things.
And, you know, I, I would love to learn a little bit. Even going back in time, even 10 years where you were saying that you got the opportunity to actually, you know, start working on some of the cloud focused digital initiatives, I guess, like what, what were the problems back then, you know, that people were actually attempting to solve, you know, and then maybe that's a good way of starting to dig into some of the technology that's actually been built.
Brian Chambers: Yeah... a lot of problems. I mean, so if I go back to the beginning of my career at Chick-fil-A, these are fun stories. Well, back then we, we didn't take credit cards, in stores. It was all cash and, uh, checks, if anybody remembers what those are.
Matt Klein: Really?
Brian Chambers: So 22 years ago,
Matt Klein: Even in the, even in the timeframe that we're talking about, yeah.
Brian Chambers: 22 years ago, no credit [00:07:00] cards. Um,
Matt Klein: wow. Okay.
Brian Chambers: Dial up internet at all restaurants. Um,
Matt Klein: amazing.
Brian Chambers: So if you wanted, uh, to get data from a restaurant about something that they entered, which we have to, for a bunch of reasons, always have... that, that required, like them double clicking on an app on their Windows desktop, and then you know, the modem noise.
All that fun. Uh, and then like, it would send things across using, believe it or not, I don't know if you ever worked with Sybase databases, back in the day, but there were Sybase databases in the store, and then a consolidated version actually in our office building. And so we would like replicate, you know, do... outputs from the transaction logs from the local databases, ship that stuff up and apply it to a consolidated database.
And then like ETL, downstream from there. So that, that's like the lay of the land. From a restaurant's pers- uh, systems perspective when I arrived was a lot of like client servery type stuff, but the, the way that data moved around the organization was like over dial up internet connections, you know, to the data center on the second floor of our office building, um, which is crazy.[00:08:00]
Matt Klein: And, and would people, like, would they do the dial up every night or were they instructed to like every hour they would do a sync job or something like that?
Brian Chambers: They did it, uh, they did it certain times of the day that were usually triggered by like the end of a shift. So, they would like, they'd run the, they, we called it the daily transmission, though.
So they would run the daily transmission, and, and send all that stuff up. And then, uh, one of my early roles at Chick-fil-A was actually... I mentioned I was pretty good with like SQL and databases and stuff. So that was a big part of our restaurant architecture. So that worked out well. But I also did a lot of like, escalated support stuff, in my first couple years.
And so if you needed to do something like on the computer, like... check something out. There was no real observability, right? There was no real, uh, easy way to get a view. So you actually had to call the operator and ask them to run the daily transmission while you ran a ping against their IP. And when it showed up and started responding, you're like, boom, you know, remote desktop.
So... pretty, pretty fun, old school. But yeah, that, that's kind of the world that we came from. And so kind of, you know, moving towards [00:09:00] what sorts of problems? I mean, it was like... We wanted to be able to take credit cards 'cause obviously that's where, where the world has gone. And then like as we've stepped into, you know, the, the kind of 10 year ago, 10 or 12 year ago, timeline.
I think the big thing is... as mobile became a thing, was you wanted to be able to engage customers in the way that, that they're thinking and, and what's normal for them. And we, I wouldn't say we were on the leading edge of that at all. Like we came a little bit later, but we wanted that... app experience, I guess, to be like a good representation of, Chick-fil-A's brand and to be a, to be a great experience, to be an easy place, to order, to be a, a place for restaurant operators to be able to connect with their customers and, and things like that.
And so, a big, you know, driver of that was just like that, that was a new kind of scale like we dealt with... thousands of users typically, at most. And when you get into the, the world of customers, you know, the, the scale need goes up dramatically into the millions, of users that want to use your app.
And so we had many times that we did things that I'm sure you can go back and find articles, but we, we tried to do things [00:10:00] with old paradigms and they broke. And embracing cloud, was where we were able to successfully scale. And, you know, everybody has their issues still, but, been a much smoother, uh, journey,
ever since that. So, those are some of the big things I would say that that kind of stand out.
Matt Klein: But I would imagine too that, you know, there's, there's two sides, right? There's obviously the, the customer app side of things, like the people that are doing their digital ordering and all of those things.
But I would imagine too that over time, you have another side of it, which is you're investing in the technology within the restaurants themselves. And I, I don't, I don't know much about this, but, you know, you, you started out by talking about how you don't take credit cards and people are uploading data on a dial up modem and all of those things.
I would imagine that there are... there are things that can be understood and gleaned from having more real time access and understanding what's going on in like the back of the house situation, like what's happening in the kitchens or in the operations and all of those things. [00:11:00] So I think, you know, did, did like, do these efforts happen in parallel or when you're talking about kind of like modernizing the entire infrastructure, does it all go together in terms of how, you know both customers are interacting with the system as well as how you know, the employees are actually interacting with the system?
Brian Chambers: Yeah, I mean, I think it's been a journey for us. So it's kind of like lots of small steps forward over time, that have, have changed the system. So it wasn't like there was one dramatic moment where everything changed all at once and forever. But I do feel like... you know, we, we had a period where we talked about like mobile first, just meaning like, we're gonna contemplate should an experience be mobile over, you know, a traditional web app that, you know, this is obviously after we moved away from a lot of the client server stuff, but in kind of that modern era, we had a lot of web apps at Chick-fil-A, but we started to ask questions like mobile first.
You know, for- in restaurant systems as well as staff facing things, you know, for our corporate staff. And then we kind of got into this chapter that was asking ourselves like, sort of like cloud [00:12:00] first, you know, paradigm. So I think we've kind of gone through some different periods where we've, we've kind of changed fundamentally how we're thinking, but a lot of these systems have just sort of... I'm, I'm thankful Chick-fil-A is a company that, embraces
like technology opportunities, as they come. And again, maybe not always the, the first to do everything. And I don't think that's even necessary, but I think we've embraced a lot of the paradigm changes, not with like, we're holding onto, you know, the mainframe world and we're scared to touch it, but I feel like we've modernized, pretty intentionally, over the years.
And, and, and like it's happened periodically, like step by step, not like all at once or we, we change, you know, completely the way that we operate or anything like that.
Matt Klein: Yeah I mean, you keep talking about that you, you weren't the first to do a lot of these things, but again, I'm speaking from a customer perspective.
Brian Chambers: Yeah,
Matt Klein: I, I mean, I obviously don't know what happens in the back of the house of the restaurants, but from a customer perspective, your technology is very good. I mean, at least compared to other competitors that I might go to. And I think what, what I wanna [00:13:00] understand from that is... do- like, do you think that there's some benefit in being a little bit later to the adoption curve and like, I don't know, like understanding how other people have done it and, and like how they failed?
Or do you think that you all have just executed very well? I guess what I'm asking is like, how, how do you think that, that you have wound up with a system that that works well, I guess.
Brian Chambers: yeah. I, yeah, I think it's a number of factors, Matt. Like I, I think there's an element where you, you can, get excited about the technology because it lets you do something, but like
organizationally, or customer adoption wise or different things like that. Like the timing may not be right. So I've never felt like we have been quick to rush into things that we could do and that might be awesome, but that we weren't fully ready to do. So that means like other people have, you know, been ahead of us in, in some of these digital areas.
But that's why I keep saying I think it's okay. I think what [00:14:00] we, we did well is we, we've acted at the time when we had the right organizational support for whatever the thing was that we were doing. Like people were really bought in and were working together. So you didn't have a lot of, I don't think we had a lot of like, obstacles.
I mean, there's always disagreements on how to get things done, but I don't think we had a lot of the obstacles, you know, that could exist. And, and we did get to learn some things about, you know, the industry, or you know, what- what made sense to do? It could have been customer, you know, learning from other people's customer experiences, or it could have just been, internal facing things for our operators,
having more time to get feedback, having more time for things to develop and be able to tell stories about what could be, and kind of aim at the right stuff. So, I think some of it is just like a little bit of patience. It's, it's still sometimes risk taking too. Like it may or may not work, but I think the patient side of it, you know, reflecting on it, I, I think that can be really valuable.
Just like, wait till it feels like that time is right, as opposed to needing to be on the front because somebody else, you know, did a thing. And, and I see that sometimes right now, I guess with, some of the AI stories that are out there or like, you know, huge, computer [00:15:00] vision deployments or things like that where, you know, maybe, maybe those things would be really great.
But, I don't know that like, rushing into some of them, especially when they're really big investments is always like the right thing for us. So I think we've been good about are we gonna get the value out of what we're doing? And then execution wise, like I think we've done a pretty good job at at that, I would say just as an observer.
Matt Klein: Yeah, I, I mean, I don't know personally much about the restaurant industry, but you know, I do recall reading an article recently about, I, I forget who it was, you know, someone was trying some AI ordering system and like it didn't work and they had to roll it back or something like that.
And I don't actually wanna get into that, but I think it's more... I think from a technology perspective, there's always a balance that we have to strike about working on shiny cool things like versus doing what's actually right for the customers. Um,
Brian Chambers: yeah,
Matt Klein: and it's, and it sounds like you all have, have struck a pretty good balance there.
Brian Chambers: I feel like that's true and, and we have been like maybe on the front end at times. And, it's always interesting to see what kind of [00:16:00] feedback you get when you do that. Like, an example would be one of the things I'm, very, was very close to, which is like our edge computing, you know, Kube at the edge, in the stores architecture.
And like, there weren't a lot of people talking about doing that sort of thing at the time. And so, we were kind of on the front end and a, a lot of people have said they did things that were inspired by, you know, what Chick-fil-A did, in that architecture. That's cool. We did that because I think our business had some challenges, you know, operationally that we thought
that could help us solve by getting more connected things, by getting more real-time information to really busy operators and team members. And we have some, uh, you can find all this stuff publicly. Like we have a bunch of, volume challenges per restaurant at times because we do a lot more, you know, sales per physical
restaurant, you know, site than anybody in the industry by quite a bit. And we do that in six days a week instead of seven. So like, we're really busy and that creates certain challenges that make sense for us to think about addressing with technology where, I don't know, I could imagine maybe there's others where [00:17:00] they don't have volume problems and this would be crazy for them to bother investing in.
Like it just wouldn't be necessary. But, for us, those kind of things made sense. So that's why I say I think we're willing to be on the front end. With technology when it makes sense and when it's solving a problem that we feel very strongly that we have and we have an idea about how to solve it. But, but then other times we may be, you know, we may trail a little bit and I think both of those are perfectly okay.
Just gotta, kind of manage the situation and, and see what makes sense.
Matt Klein: Yeah, for sure. I mean, I, I would love to dig into that more, you know, for, for what you can share mostly because I, I think many folks that are out there listening, they are used to, I'll use air quotes, modern architectures where, you know, you've got some central system running in your cloud somewhere, and then you've got some apps, right?
Like it's a pretty typical architecture. It's like a
Brian Chambers: mm-hmm.
Matt Klein: Hub and spoke with your giant fast networking that's running in the middle, and then you've got some problems out. I think, you know, what you all are building is different than what a lot of people are used to, right? Like you obviously I'm sure have some services that [00:18:00] run in a central location, but you also have, what you just said is you have complicated systems that are running in each store.
So your architecture is actually substantially more complicated than what a lot of people are dealing with in terms of like a central technology hub. Then you have like distributed technology centers that might not have good connectivity. Then you have people on apps. Um, you know, for, for what you can share, just to tell people out there, I, I'd love to learn more just about the technical challenges.
So like when you, when you started to talk about, you know, you did this architecture where you're putting the, the, the Kubes in the restaurants, would love to learn more about like what were the operational problems that you were trying to solve and, and, and like, what was that geared towards?
Brian Chambers: So for like from a business side, what were we trying to solve or
Matt Klein: Yeah, sure.
Or I, I mean, yeah, sure. I mean, like, we're all obviously, or at least we should be solving business problems. Um, but like you were applying technology to solve those [00:19:00] problems, and I'm just trying to learn more about, like what, what were, what were the technology issues that were blocking the business, you know, that caused you to invest in that.
And I think just to help educate the people out there. More on, on, you know, basically what I was saying, which is that your architecture is different, I think, than what a lot of other people are used to, right? So I, I think just for people to learn a bit more, like what are the technical challenges, you know, and the business challenges of trying to operate these things, I think would be very interesting.
Brian Chambers: Yeah, sure. So maybe like I can just quickly say like, some of the things are the same, so, we'll, we'll talk about the edge computing footprint and what we try to solve there. But, you know, like, I think it's really important for people to know, like we don't run everything out of that restaurant because it exists.
We have tons of things that are, you know, cloud deployed applications, SaaS applications, like a whole bunch of stuff that helps our restaurant operators and team members and things like that. So, you know, we- we intentionally built this edge [00:20:00] footprint, kind of small and fairly scrappy on purpose because the goal was we're trying to deal with a really small subset of challenges that exist inside of a very busy restaurant environment, and especially inside a very busy restaurant kitchen environment.
So most of the energy was really about, I mean, we have like front of house we would call it, like where we're serving customers. We have things that, you know, need to interface between front and back, back being the kitchen. But like a lot of the energy was really about, building more restaurant capacity, by trying to optimize the existing restaurant footprint that we had.
So you got a certain amount of square footage, you got a certain amount of equipment, you can fit a certain amount of humans back there doing work. And then after that, like those physical constraints, the only way you can really improve is if you bring different systems to bear or you juggle the way that those people perform actions, you know, change the workflows, things like that, right?
So, that was really the goal of the technology, was to help make that kind of change happen in a really busy environment, in a really busy [00:21:00] kitchen environment. So some examples, uh, that we've talked about, you know, publicly are things like, you- food quality is really important, and food safety is really important to us.
So, taking some of the, the cognitive load off team members when it came to, uh, hold times, you know, for product. So, that's a solution, like a business solution for a real problem that we rolled out, eight or nine years ago, to the chain that sat on top of this edge computing solution. So it dealt with like, chicken hold times specifically for our different products.
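The hold-time idea described here can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the product names, the limits in `HOLD_LIMITS`, and the `Batch` and `expired_batches` names are hypothetical, and nothing here reflects how Chick-fil-A's system is actually built. The point is only that software tracks when each batch was cooked so that a team member does not have to hold that in their head.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical hold limits per product; the episode does not give real values.
HOLD_LIMITS = {
    "filet": timedelta(minutes=20),
    "nuggets": timedelta(minutes=15),
}


@dataclass
class Batch:
    product: str
    cooked_at: datetime

    def expires_at(self) -> datetime:
        # A batch expires once its product's hold limit has elapsed.
        return self.cooked_at + HOLD_LIMITS[self.product]


def expired_batches(batches, now):
    """Return the batches whose hold time has elapsed as of `now`."""
    return [b for b in batches if now >= b.expires_at()]
```

A kitchen display could call `expired_batches` on a timer and flag the results, leaving the decision to discard or override with the person working the station.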
There's other things, that I can't share as much detail about, but other things that are really about optimizing the way that we, make product. 'cause we try and make things, as quickly or as close to the customer, ordering them as possible. Without ending up with a really bad speed of service, right?
You don't wanna wait five minutes, 10 minutes, 15 minutes, you know, for your food. You wanna be quick. So there's this balance, delicate balance, right? Of product quality, food safety is like a way downstream concern, but product quality and food safety with, um, speed of service. And so I think, those are the kind of things [00:22:00] where if you can help humans who are working at capacity as hard as they can, as quick as they can, but having to like mentally juggle a lot of stuff like.
You know, they don't see how many people are coming into the dining room and how many cars are in the drive through. So none of those like signals you might be able to get if you were kind of floating above a store, so to speak. Those people don't know that they're really, going off of a screen that's telling 'em, here's what you need right now, and then off of their history, and then what people are yelling to them that they need to do, you know, because we're, we're holding on fries or whatever.
So really a lot of it is about making a better... human experience for a worker in the back by taking away some of the things that they would need to be thinking about and sort of like storing in memory in their brain real time as they work and trying to take those and put 'em in a place where it's like, you just think about this.
Here's what you should do right now. Here's what you should do next. They can always override it. They can always respond to other things, but really trying to help, us improve our operational processes by bringing technology as a helper, helper alongside people. So that's kind of the business challenge side of it.
Happy to talk about [00:23:00] some of the tech challenges that exist when you do that, but, um,
Matt Klein: yeah, I mean, would, yeah, tha- thanks for the context. Would, would, would love to dig into the tech talent, tech challenges, and I think one thing that comes to mind first, which would love to learn more about is, as you said already, you know, you, you, you have, services and stuff that runs in the cloud.
I would assume, and you can correct me if I'm wrong, that you know, you want some code to run in the restaurants because you're worried about connectivity issues or reliability issues or those kinds of things, and, and that's actually what I'd love to learn more about.
Brian Chambers: Yeah.
Matt Klein: Um, like, like how do you decide, right? Like which portions of the experience run in the cloud, which things run locally?
Because again, like, and I'm just speaking from an architecture perspective, I would imagine... the, the more stuff that you run in an autonomous basis in the restaurants, it adds a lot of complexity because now you have to sync stuff back and forth and do all this stuff, right? [00:24:00] So I mean, it seems like you'd have to be really intentional about like what code you run there, versus what you run up in the cloud.
I'm just trying to learn more about
Brian Chambers: Yeah, absolutely.
Matt Klein: How do you make those decisions? Like how do you think about it?
Brian Chambers: Yeah, I mean, I, I think the easy first thing is, we would say default to cloud if you can, and, and if you can, is gonna be a function of ultimately user experience, right? Like, there's some things that, that we always, are just gonna do in the cloud, like the, like, web application stuff that people need to do, like payroll related things, and all these kinds of services, right?
It just makes sense just default to the cloud on those. Then you've got things that are like, you really need the cloud to do them, like processing payments. You know, you can't process a payment, fully locally, because it typically goes back to some, you know, credit processor, Visa, American Express, whatever.
But you can do things like store and forward, you know, if you're offline and process later and you know, and you decide if you wanna take the risk of it could be declined and you gave something away. Like, that's just reality. So we've got, you know, situations like that and you got [00:25:00] things that like you just don't ever want them to not work.
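As a rough sketch of the store-and-forward pattern mentioned here: payments are authorized immediately when the processor is reachable, and queued locally when it is not, then forwarded later. The class and callback names (`StoreAndForwardQueue`, `authorize`, `is_online`) are hypothetical, and a real system would persist the queue to disk and handle far more failure modes. The sketch also surfaces the risk called out above: a deferred payment can still be declined after the food has been handed over.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Payment:
    order_id: str
    amount_cents: int


class StoreAndForwardQueue:
    def __init__(self, authorize, is_online):
        self._authorize = authorize  # callable(Payment) -> bool, True if approved
        self._is_online = is_online  # callable() -> bool
        self._pending = deque()      # payments accepted while offline

    def submit(self, payment):
        """Authorize now if online; otherwise store for later."""
        if self._is_online():
            return "approved" if self._authorize(payment) else "declined"
        self._pending.append(payment)
        return "deferred"

    def flush(self):
        """Forward stored payments while online; return the ones declined."""
        declined = []
        while self._pending and self._is_online():
            payment = self._pending.popleft()
            if not self._authorize(payment):
                declined.append(payment)  # loss the business chose to accept
        return declined
```

Calling `flush` when connectivity returns drains the queue in order; whatever it returns is the set of orders that were given away.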
Like point of sale. We've always been very intentional that we don't have, cloud dependencies or really out of restaurant dependencies on point of sale. There's been some, but they always have like payment has, you know, stored- store and forward. If you have something that's a real time dependency outside of the restaurant, there's typically like a
graceful degradation state, like something stored locally that may not be quite as good, but good enough. Those kinds of things. So, there's sort of this spectrum. Our thought on a lot of the things that we did in the kitchen was we're gonna be dealing with a lot of data and a lot of signals coming from different places.
So that's like computer vision, small computer vision deployment, not like we're looking at everything everywhere, but small computer vision deployment. A ton of IoT connected devices. So like pretty much all of our kitchen equipment is connected and sending out, uh, data in real time, pretty much always over MQTT.
So like our edge is a hub for that. So like our thesis with the edge computing stack that we built was, if there's things that [00:26:00] are truly going to move the needle on our restaurant's capacity and we- and if we're gonna do them, they should be important enough to the people that if they're gone, that's a problem.
We probably don't wanna depend on, you know, connectivity being available, you know, round trip connectivity being available. And we wanna be able to run those things like very quickly with low latency in any condition, you know, where the restaurant has power basically, because they're like food production, you know, running the core business related.
So I think that's kind of like the spectrum in terms of how we think about it is... can it run somewhere else and get tolerable user experience? If so, then that's, that's probably pretty good. But some of these things that are core to the operations, taking orders, you know, taking payment, you know, running kitchen operations, doing things in the drive-through, like those are things that we wanna make sure those always work because that's like, not only is it user experience for the person behind the app, but like it's customer experience stuff, right?
Like if our stuff breaks there, you come to Chick-fil-A and you don't get a good experience. We, we have to turn you away or, you know, we have to wait a long time or something. And that's what we're trying to [00:27:00] avoid.
Matt Klein: Yeah.
Brian Chambers: So I think that's how we think about it from a logical perspective.
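
An illustrative aside, not from the conversation: the store-and-forward idea Brian describes for payments can be sketched roughly as below. Everything here is an assumption made for illustration, including the names `Payment` and `StoreAndForwardQueue`, the `offline_limit_cents` cap, and the `processor` callable standing in for a card processor. It is not Chick-fil-A's implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Payment:
    order_id: str
    amount_cents: int


class StoreAndForwardQueue:
    """Hypothetical sketch: accept payments while offline, settle them later."""

    def __init__(self, processor, offline_limit_cents):
        # processor: callable(Payment) -> bool, True when the payment is approved.
        self.processor = processor
        # Assumed risk cap: the largest amount accepted without authorization.
        self.offline_limit_cents = offline_limit_cents
        self.pending = deque()

    def take_payment(self, payment, online):
        if online:
            return "approved" if self.processor(payment) else "declined"
        # Offline: the business chooses how much decline risk it will carry.
        if payment.amount_cents > self.offline_limit_cents:
            return "refused_offline"
        self.pending.append(payment)
        return "stored"

    def forward(self):
        """Replay stored payments once connectivity returns.

        Returns the payments that were declined after the fact, which is
        the "you gave something away" case.
        """
        declined = []
        while self.pending:
            payment = self.pending.popleft()
            if not self.processor(payment):
                declined.append(payment)
        return declined
```

The point of the sketch is the trade-off: the offline path never blocks the sale, and the cost of that choice only shows up later as the list returned by `forward()`.
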
Matt Klein: I mean, lots of people love to talk about their disaster, you know, different mechanisms that you know, that they use.
And I, I, I think a lot of people, talk about them, but don't actually test them. And one thing that I was thinking about when you were talking about all of that is like, do you regularly cut the internet to a, to a restaurant to see if it works? Like how do you, how do you know if it's actually going to work if you lose connectivity?
Brian Chambers: Yeah, we don't do it to real stores 'cause it's disruptive, especially with like digital orders. And now, like connectivity has become increasingly important I think in the, the quick service restaurant business compared to historically. And so we've, we've got like backup connections now too, but, you know, they're slower, usually cell or something like that.
But yeah, we've, we have, you know, tested those types of conditions, plenty of times and they happen to us in the real world, like all the time,
Matt Klein: right.
Brian Chambers: Um, you know, stores are, are offline at any given time because a fiber line, you know, to the store got cut by construction or whatever. So, we get [00:28:00] the, uh, kind of the chaos engineering, done by the world instead of by us on a pretty regular basis, across, you know, 3,500 plus, locations.
But, I, I did wanna go back to something you said, which is... the word complexity. And, I do think, when you think about what happens at the edge, whatever edge is, you know, for, for a listener, it could be our kind, where it's sort of like on-prem to a restaurant, or it could be like on mobile device or whatever.
I do think the further away you get from like the central point of control, you know, the, the, the more... your complexity can kind of compound in cost and like difficulty to support, potential for things to go wrong. So like while people might say, well, you know, you obviously aren't thinking about complexity if you're running Kubernetes clusters in 3,500 locations.
I would say we definitely are, like our approach to that was really, like very strongly opinionated about keeping the footprint minimal. Like the reason for kube being there was great open [00:29:00] source ecosystem around it and we needed to schedule containerized workloads to run across multiple nodes. Like we didn't need to do a lot of really cute stuff.
There's some great things built in, you know, that can help with secrets management and other stuff, but like, we didn't want to make something super complex. We don't do much with like, like data persistence at the edge. Like, we don't make a lot of guarantees. We do a lot of caching and like, hydrate, you know, like rehydrate from the cloud.
Send things out when you can. Like we made trade-offs. I think this is like really hard for people, but like, we made a bunch of trade-offs that I could talk about, like no service levels on durability of storage at the edge. We don't do, we don't do, like a distributed file system, at the store. No Ceph, no Rook, no Longhorn.
There's a lot of things that we intentionally decided not to do. To minimize the complexity of operating a solution like that. And we were able to find nice trade-off spots that we could land by paying attention to what the business really needed and then thinking really hard about like, yeah, it'd be really cool to have a story about running 3,500 Ceph clusters.
Somebody would probably wanna hear about that, but like, we don't wanna do that because [00:30:00] that actually probably creates more of a risk than a reward. And so, we- I feel like we made a lot of smart trade-offs in our solution that maybe they'd be embarrassing to talk about in the cloud world, but like for us at the Edge, I think they were really smart things to, to settle on that,
that's a good example is the persistence thing.
Matt Klein: Yeah, I, I mean, I'm a, I'm a big fan of keeping it simple. And, at least from my own perspective, I, you know, I think it's kind of interesting. I think, I dunno, people love to complain about Kubernetes. But I, I, I think it's matured a lot. I don't, I don't think it's actually particularly difficult to use, at least from my perspective.
And if you're doing what you're saying, you know, and you're keeping it simple and just using some stock stuff as a container scheduling system, it seems it, it seems pretty straightforward.
Brian Chambers: Yeah, it really is.
Matt Klein: Yeah.
Brian Chambers: It's not that much to talk about. It's pretty lightweight.
Matt Klein: Yeah. You know, I, I mean. One thing that I, I think has surprised me and I wanted to know if it's still true 'cause I think I was talking to someone who is working at like banking, and I'm sure it [00:31:00] applies, applies to your situation as well, is that you would think in 2026 that the connectivity in
the type of stores that you're dealing with would be very good, but my understanding is that that's not, not actually true. And like why, why is that? Is it just, that like getting commercial grade connectivity in these random locations is just not possible still and that's why you have to do all this backup
stuff with different types of connections? I, I don't know, just, just, I actually would love to learn more about kind of the realities of like, getting connectivity to these places. And then I, I, I think part of what really interests me is, you know, as you go in this hub and spoke model, obviously from a cloud perspective, you have like a central operations mechanism.
Like you're almost operating effectively like three to 4,000 little mini data centers, right?
Brian Chambers: Mm-hmm.
Matt Klein: You know, so it's like you have to monitor those, hardware breaks, you've gotta [00:32:00] send someone there and swap out the thing. And then, I don't know. So just with, with what you can share, would love to even learn from an edge computing perspective, like how you deal with the connectivity aspect.
I would imagine, based on what you said that if you have a major hardware failure in one of these restaurants, it could be like a local restaurant ending event until it's fixed. You know, it's like how do you deal with those aspects? Just because, like, you can keep the software simple as possible, but if you have a catastrophic hardware failure, you're also in a bad situation.
So, I would imagine that you have some level of redundancy across connectivity and compute and all of those things. Just would love to learn more about that.
Brian Chambers: Yeah, great question. Um, so the why connectivity thing I, I'm not an expert on, but a, a couple, postulations and a few things I've obs- observed. One, I think like new places where, development is happening, like, new suburbs, things like that, you tend to see like
good connectivity [00:33:00] come a little bit later. Like sometimes if there's not already a big footprint, you may not have fiber to the area. So, that's one thing. Like early in lifecycle, I think it's still very difficult. It's become a lot easier, I think for us to get good quality connections, you know, across the vast majority of restaurants.
But you still have like those events that happen, like we already mentioned, like the local event where somebody cuts the fiber line or maybe it's that your in-restaurant, you know, fiber infrastructure part, like the, the thing it connects to that can break. You know, you could have the sort of cascading failures where something happens at a regional or like more of a local, hub, you know, with one of the providers
and so you have an outage because of that. So like, it still, it still happens and it happens all the time. And I think it's just a function, you know, just like, I think you and I have talked about with mobile devices before, it's like you think it's all going well, but like the actual edge experience of a user, may not be good for a, a slew of reasons.
You'd think like cell connections pretty much always work, but sometimes, you know, something goes wrong and, and they don't. So I think we just deal with that, to some extent as well. [00:34:00] So being able to be resilient to that stuff and just sort of like... keep on going. Obviously there's degradations, that do happen, but that's really important.
But then you hit on another thing, which is, is big, which is, software complexity is one thing, but when you deal with, you know, physical things in the physical world, like hardware, that is easy to forget about if you only work in the cloud... that stuff breaks sometimes. And, and you've gotta, you've gotta be resilient to it.
I mean, we, we run, you know, three devices per restaurant, from a compute perspective and already mentioned, you know, kube basically is responsible for scheduling workloads across those, using 'em all when they're all available and moving things around, you know, if something fails, that's the primary reason for it.
And then you've got, similar at the network level for us. So like, LAN connection is like, super important. So we actually have, you know, HA from like a routing and switching perspective. So like our three devices are actually across three different switches, that are, that are balanced across two different routers with two different, connections,
Matt Klein: interesting
Brian Chambers: So pretty, like you wouldn't expect that, I guess for like a, a fast food restaurant, right? But like
Matt Klein: No, it [00:35:00] makes sense, makes sense.
Brian Chambers: Pretty big investment in, in that tier. I mean, it's, it's not free, but, that's good. So we've got stuff like that. So more resilience there and then on the connection side.
Yeah, multiple connections. But um, yeah, you do, you do still have failures or you know, you can have a power surge that breaks a bunch of stuff. So we still try and, you know, if you think about user experience from whatever it is that the user is looking at or touching and work your way back, we build as much like graceful degradation in
close to the user as possible and work our way back and just, what can you not handle? You know, push it back a step. Can you handle it there? No. Push it back a step and sort of work from that user, I would say, back up towards wherever the the thing needs to go. So,
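
An illustrative aside, not from the conversation: the "handle it as close to the user as possible, otherwise push it back a step" approach can be sketched as a walk over tiers ordered from the user outward. The tier names, the cached values, and the `handle` function are all hypothetical, and the sketch assumes the normal real-time path has already failed.

```python
def handle(request, tiers):
    """Walk tiers from nearest the user outward.

    Each handler returns a result, or None meaning "cannot handle it here,
    push it back a step". Returns (tier_name, result) for the first tier
    that copes, or (None, None) if no tier can.
    """
    for name, handler in tiers:
        result = handler(request)
        if result is not None:
            return name, result
    return None, None


# Hypothetical degraded-mode data held at each tier.
device_cache = {"menu": "menu v41 (cached on device)"}
edge_cache = {"menu": "menu v42 (cached on edge node)", "prices": "prices v42"}


def cloud(request):
    # Stand-in for a cloud call while the restaurant is offline.
    return None


tiers = [
    ("device", device_cache.get),
    ("edge node", edge_cache.get),
    ("cloud", cloud),
]
```

With these assumed caches, `handle("menu", tiers)` is answered on the device with a slightly stale but good-enough copy, `handle("prices", tiers)` is pushed back one step to the edge node, and a request nothing can serve falls all the way through.
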
Matt Klein: yeah, no. Yeah. Makes sense. Um, I guess. Tell us a bit more than, you know, since we started to talk about it.
Like how, how do you know if something is broken? Because as we said, you, you have a lot going on, right? I mean, you have the, you have the customer apps, you have the apps that the employees are using. You have, I'm sure point of sale [00:36:00] machines, like, I mean, you have a lot going on within the kitchen. Would, would love to learn more about...
yeah, I mean, it's like how do you monitor the system? Like how do you understand if things are actually working well? Because I think, again, many people out there who are listening, you know, they have a good understanding of, oh, you've got my cloud service. I'm gonna put some observability monitoring on that.
And, I think when you get into the edge and the mobile world, as we've talked about a lot on this podcast, things get a lot more complicated. Um,
Brian Chambers: yeah.
Matt Klein: And I, and I think your situation is more complicated than most, so would love to learn about that.
Brian Chambers: Yeah, I mean, it's, it's honestly still a challenge. Like, I'm, I'm gonna hand wave at all things like cloud, because I think that, like you said, that's pretty well known, but when it comes to like the, the physical restaurant edge site, things are still complicated and there's still a lot of challenges.
So, you know, like, I think it's very common for the, the business data coming out of any given application to be, you know, [00:37:00] 2, 3, 5, 10x eclipsed by the operational telemetry necessary to support the same thing. And you know, typically in the cloud it's like whatever, you're paying some money to store that stuff, but maybe not a big deal.
Maybe it's worth it. But when you're dealing with like, you know, 3,500 locations and the connection and bandwidth challenges we talked about, like our bandwidth is pretty, pretty decent, but a lot of it is carved off for very specific purposes already, like credit cards and inbound mobile orders. So there's only so much that's really available, you know, for, for general use.
And, and sending huge amounts of telemetry, can break things like, I've told this story numerous times before, but in the early days of our Edge platform, one of the first things that we found was we actually broke credit card processing one time, hence the current segmentation, with too much telemetry, flooding, uh, the pipe...
Matt Klein: love that, love that story. I mean, and, and that, that story is not uncommon. I mean, that has happened to me. It's a, it's a very common issue. Yeah,
Brian Chambers: it's hard.
Matt Klein: Sorry. Keep going.
Brian Chambers: It's so, [00:38:00] um, I mean, we invented some stuff that worked for us, like using logging pipelines, because we, we have a whole like, fan in, in, in the restaurant, right?
Like we've got, you mentioned tablets and devices running local applications and I mentioned like IoT stuff earlier, fryers, grills, et cetera. Like things can happen to those, or the software on those as well. And we care about a lot of that stuff because it can be useful to, deal with an issue before it gets, you know, before a fryer's completely broken or something and impacts operations.
So there's a bunch of data that we care about, so we have to fan in not just what comes from applications running on those edge computing nodes, but actually everything else in the restaurant, or most of the other things in the restaurant. So we've got a fan in, so we have a whole bunch of data that we have to deal with and we can't just send it all out and we don't know what matters.
So we typically say we're gonna send out errors. And, so logging error events typically, or you know predefined metrics that we try to get people to like scope down. So we're trying to do like behavior change, on people to get them to deal with more first class metrics instead of just shipping a bunch of logs.
[00:39:00] And then we can basically say, oh, based on the errors that we're seeing, from this particular location or set of locations, let's like turn on more logging and bring back more specific stuff from specific apps. But we had to, like we, we use Vector. Done a talk about this before.
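
An illustrative aside, not from the conversation: the "ship errors by default, then turn on more logging for specific locations and apps" pattern can be sketched as a small filter. This is not Vector configuration and not Chick-fil-A's pipeline. The class name `TelemetryFilter`, the event fields (`location`, `app`, `level`), and the level names are assumptions for illustration.

```python
LEVELS = {"debug": 10, "info": 20, "warning": 30, "error": 40}


class TelemetryFilter:
    """Hypothetical sketch: errors leave the restaurant by default;
    verbosity is raised only for targeted (location, app) pairs."""

    def __init__(self, default_level="error"):
        self.default_level = default_level
        self.overrides = {}  # (location_id, app) -> minimum level to ship

    def enable_verbose(self, location_id, app, level="debug"):
        # Called after errors from this location make more detail worth the bandwidth.
        self.overrides[(location_id, app)] = level

    def disable_verbose(self, location_id, app):
        self.overrides.pop((location_id, app), None)

    def should_ship(self, event):
        key = (event["location"], event["app"])
        threshold = self.overrides.get(key, self.default_level)
        return LEVELS[event["level"]] >= LEVELS[threshold]
```

The design choice worth noting is that the default is restrictive and the override is narrow, so a noisy app at one restaurant cannot crowd out traffic the link is reserved for.
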
Matt Klein: Yep, sure.
Brian Chambers: But we had to roll that stuff, that whole solution ourselves, and it just feels like, you know, 2025, end of 2025, like that would be... we wouldn't have to do that, but, but we did it.
So it's still challenging, like the reality of... you really just wanna know what you need to know when you're dealing with the edge, but you don't know what it is until something's happened. I think that that still remains like a challenge. I know one that, that you're,
Matt Klein: do I have a product to sell you?
It's no, no, no. I mean, I, I was, you were talking about that. And to be honest, if we ever did an ad for bitdrift, I would just have you record what you just said. Um, I, I just, you know, it, it is a very real problem and I, I think it's useful for folks, even [00:40:00] apart from the bitdrift side. I think it's really interesting just to hear about the challenges that people are still facing, because I think a lot of people really don't understand.
Sorry. They, they, they come from the, they come from the server world, which is what you said, right? Where effectively these days, the way most developers think is bandwidth is unlimited.
Brian Chambers: Yep.
Matt Klein: Storage is, is unlimited.
Brian Chambers: Yep.
Matt Klein: And for all intents and purposes, it just doesn't fail, right? And still in this day and age, when you're in the mobile edge world, none of, none of that is true.
I, I mean, it's, it fails all the time. Bandwidth is limited. Um, so, I think it's super interesting, just to learn how people have kind of, dealt with these challenges. Um,
Brian Chambers: yeah, I mean, I think we came up with a decent solution, but yeah, like, it, it just, it, it is, it's still hard and, and it sounds simple.
It's like, well, you have good connections. Why don't you just, oh, it's not pay more to up your bandwidth or whatever. But it's like you remember, [00:41:00] every number you add is multiplied by 3,500.
Matt Klein: Yep.
Brian Chambers: Um, it could get real, like really expensive to operate certain types of solutions across that kind of footprint.
And ours is still growing. Like, we're still expanding and opening new restaurants every year. So that like, number in front of whatever we're talking about, adding from a dollar perspective, that multiplier just keeps getting bigger too. So, I, I think, you know, just infinite bandwidth is just not tenable, in this kind, kind of distributed environment.
You know, just doing it all locally and putting a giant storage array or something there, like building S3 in every store ain't happening. Not there, not happening. So just like there, there's operational realities you just have to accept when you're gonna work at this type of edge, that you, you have to navigate.
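
An illustrative aside, not from the conversation: the "every number you add is multiplied by 3,500" point is simple arithmetic, sketched below. The 3,500 figure comes from the episode. The $50 per month upgrade price and the 3,600-location growth case are made-up numbers used only to show the scaling.

```python
def fleet_annual_cost_usd(per_location_monthly_usd, locations=3500):
    """Annual cost of adding a recurring per-restaurant charge across the fleet."""
    return per_location_monthly_usd * locations * 12


# Hypothetical: a $50/month bandwidth upgrade at every restaurant.
today = fleet_annual_cost_usd(50)                      # 2_100_000
after_growth = fleet_annual_cost_usd(50, locations=3600)  # 2_160_000
```

Under those assumed numbers, a line item that looks trivial for one site costs about $2.1 million a year fleet-wide, and the total rises as new restaurants open.
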
So, yeah, obviously I'm, I'm a fan of what you guys are.
Matt Klein: Yeah, no, thanks.
Brian Chambers: Doing, thinking about,
Matt Klein: um, would, would love to pivot a bit just just to learn about, like what is on your radar currently? Just, you know, from a technology perspective, I would also imagine that as the [00:42:00] Chief Architect you're dealing with internal productivity, things of the engineers that work at Chick-fil-A.
So just would love to learn more, more about, what are the big challenges, you know, that you're working on right now, or what are some of the things that you're looking towards in the, in the next few years?
Brian Chambers: Well, yeah, I feel like you tried to set me up to say AI, Matt.
Matt Klein: No, no, no, no, no, no. We, we like definitely don't have to talk about AI
I'm just
Brian Chambers: No, I'm, I'm happy to.
Matt Klein: Yeah. Yeah.
Brian Chambers: Um, like, I, I think that's a reality, right? Like there's a lot of, there's a lot of business interest in it, everywhere. It's in the news all the time. It's like practically all people can talk about. I can tell you, like personally, what I'm excited about with it is actually
Matt Klein: Yeah, sure.
Brian Chambers: On the software engineering, side of things, like, and not that it's magic or anything like that, but I feel like, if- if you can get good at kind of wielding the tools and get really good at, asking them to do discrete tasks, there's lots of ways to do this. We could talk about it, but, if you can get good at asking them to do discrete tasks, then I think you can actually get a lot of, like a ton of benefit, out of the AI like coding tools in [00:43:00] particular.
Like, I'll, I'll transition over for a second to the nonprofit moonlighting stuff I do. Like, I, I probably get, you know, I probably get 30 hours a week, behind, Claude Code at the moment for that, and I get like, tons of great stuff done. But I have to put like, the, the energy for me isn't into thinking about how every line of code is gonna get written, but it, it, it instead is
effort that goes in on the front end into thinking about, not like the giant spec of everything that's ever gonna exist, but this one thing I'm trying to work on right now, like, how should it work? So bringing an architecture mindset to it and then letting it write some code and some tests and uh, and then, you know, putting eyes on it and evaluating it and then like iterating.
But I, I mean, same for Chick-fil-A. I can't go into as much about that, but like, I feel like that's gonna be just a huge opportunity in one of those places where, I don't think it's, I don't think AI is actually overhyped. I think they're, maybe it's on par. Like I, I think in the software engineering space, it's gonna be really helpful if people can figure out how to, like, how to do that discrete, discrete [00:44:00] task creation part effectively with a good architecture mindset and enough context in their head to give context to the, the AI, opportunity.
So that, I'd say that's one thing.
Matt Klein: Yeah, I mean, I, I have, just from my own personal use, I feel like I have crossed the, I have crossed the barrier, where I, I am- I haven't used Claude Code, um, but I'm now using Sonnet 4.5 with VS Code and some of the results I've gotten recently in agent mode... very good. I mean, it's like, and it, it's definitely what I've learned is that you have,
just like you said, you have to learn how to use the tool. It is a tool, and it can produce very bad results, but if guided well, it can do very well. And I, I've been actually very impressed recently in, in terms of some of the output that I've been getting.
Brian Chambers: Yeah, and I feel like it's, some of it's picking the right problems too.
Like if, if it's this, like,
Matt Klein: I agree.
Brian Chambers: I mean, I imagine if you were working on, you know, some really like core element of Envoy, which I'm not familiar with the code base, [00:45:00] but like you would, you'd probably want to like massage things to perfection because this is something that just sees like incredible amounts of load and things like that.
So like optimization matters.
Matt Klein: Yeah.
Brian Chambers: I'm not saying it can't do that at all, but I feel like in those cases you pick to... you, you choose to be really involved. But like when you're doing web apps and stuff like that, I feel like if you can give it a good task, it's like, maybe it's not a hundred percent optimal, but in a lot of cases when you have a few thousand users or something or a hundred users or whatever, does that even matter?
So I sort of feel like there's also this, like, do you know the problem you're working on like architecturally, but do you also know how to think about it as like how much attention to detail actually matters for it to get to a good working final state?
Matt Klein: Yeah.
Brian Chambers: And I feel like you can kind of make your choices a little bit based on that as well.
Matt Klein: Yeah, I mean, what, what we found at bitdrift is that it, it is actually really good at working on low level code. It just has to be guided a lot. It's like basically what you said, [00:46:00] which is, and I think this is where people go wrong or probably are going to go very wrong within the industry, is just to trust what it outputs.
But, I think if you, I, I think even in a, in a high reliability production environment, as long as you're willing to review the code, like you would review any other code and then you can guide it along the way. At least what we've seen is that, you know, when tasks are very targeted and you can work on a self-contained piece.
The, the results with the new models are good. Yeah. I mean, it is, it is good, but it's a, it's a tool like any other, um,
Brian Chambers: Yeah. I agree. Uh, I'm doing, uh, a lot of stuff on like an, an edge computing system for my nonprofit and it's low level system stuff, and I am having success with it, and I agree. You just have to guide it more.
But I think there's like little certain parts where it's like, this is, this is so important that it's really, really good, that I'm gonna like get a little bit more hands on with it. So it's kind of like a little bit of a spectrum is, is more what I'm trying to say.
Matt Klein: Yeah, for sure. So, so is [00:47:00] AI the only thing that you are thinking of right now or, or are there other things from an organizational perspective or a productivity perspective or anything else that you'd like to share?
Brian Chambers: I think it's a hot topic, but, I, I think a lot of the same stuff like, you know, that that is always true, is true. Like still, you know, helping people, focus on simplicity and not just add new layers and new abstractions and new things. Like that's a real, I think it's a real industry challenge.
I don't remember the exact name of the talk, but there was a, a KubeCon talk this year about like taming the complexity beast or something that I really resonated with. And like, there's, there's a lot of approaches that can be really great architecture, but that exact same architecture applied in a different organization or to a different problem is actually really bad architecture.
Like take, you know, Netflix microservices and apply it to your thing that has 50 users... um, not great. And so I, I think there's a lot of like, there's a lot of that. Like this is the cloud paradigm is doing all of this stuff. Like not just using cloud, but like bringing all of the things that anyone ever [00:48:00] thought about doing in the cloud to every single app that we're like starting to... we've, you know... we're, we're guilty of that, we'll say in, in places.
So this like, push to intentionally not add to that and be really thoughtful about what incremental things do become part of, of the stack. And then also like opportunities for simplification, like radical simplification, um, in places as well. So I think there's some of that, like undoing a bit of the, like cloud native thinking is great, but I think part of
cloud native thinking was like, you need to do all this stuff that you don't actually often need to do. And so it's like untangling that a bit and getting people to think a little bit differently. Especially with a lot of our like solution architect folks and, and stuff like that. I mean, they're feeling the same thing.
So, I think that like drive to simplicity is, is really big. And then, yeah, it's, I, I guess the other stuff is, is like a lot of, a lot of the same, it's like trying to build, you know, solid applications, uh, deal with the consequences of global, [00:49:00] you know, AWS region outages or Cloudflare outages or all the other things
everybody else is behind that take the whole internet down, and, and you know, all the security stuff that's always going on everywhere. So, I think it's a lot of that stuff that's maybe just like what we all deal with, but, trying to minimize how much of it we have to care about...
Matt Klein: Yeah. Well, I mean, even on that one, you know, it, it's like we have these giant outages, and I think recently we've had three, we had the giant Cloudflare outage.
There was an Azure one, and there was an AWS one. But these things happen like every five or six years at this point.
Brian Chambers: Mm-hmm.
Matt Klein: And it, it is to me a really interesting question as to... just from a business perspective, it's like, obviously you all I'm sure have to ask yourself, you know, how much effort would go into making yourself resilient to one of these outages versus how often they actually happen?
And, to me personally, it's not a slam dunk that you should drop everything and go and, and, and, and like fix this thing for what is, uh, an event that is [00:50:00] very rarely gonna happen, so...
Brian Chambers: yeah, it's like, planning for Black Swan events. And I'm not saying they're black swans 'cause they do happen, like you said, periodically, but, that's more my worry is that people seem to think like
now that we've had this great experience in the cloud over time where everything just kind of works until suddenly it doesn't. I think that's become so expected. Like every outage is like, 'what? An outage? That never, ever happens. This is terrible. We must do everything to make sure that never happens again.'
And so I think like as an industry, we're sort of like getting ping ponged around by each outage and then making potentially architecture decisions
Matt Klein: I agree
Brian Chambers: off of those events, and I think that's a really unwise thing to do. Maybe there are things that make sense to consider having in not the main regions that have issues sometimes, like, you know, or use a second one as a standby or something.
But I guess what I'd say is like some of the old school patterns of just like a hot standby or something for the worst case emergency might still be totally fine. You may not need to be multi-region, [00:51:00] multi-cloud HA all the time and deal with all of the complexity and then uh, subsequent outages that come from
Matt Klein: yeah,
Brian Chambers: that need to use that architecture.
Matt Klein: Right. I mean, it might be fine or you might not even want to have that cost. I mean, it's really up to the business. But what I always find really humorous about this conversation is that I think for almost every organization that is not a giant one that has, you know, that, that is willing to incur the complexity, you're gonna incur more
outages and downtime, trying to build like an HA multi region situation, than you will from the outages that will occur every, every five to six years.
Brian Chambers: Yes.
Matt Klein: And I think people just don't, they, they don't, they don't think about that. And I think, you know, on this topic, uh, I'll just say briefly because I was, I was laughing while you were talking about like simplification, is that.
I think people always think, because I did Envoy, it's like, bitdrift must use service meshes and microservice architectures, like, no, we are a, we are a monolith, zero service mesh. [00:52:00] Keep it simple. So anyway, but,
Brian Chambers: and like I'm, I'm finding myself when I'm building things right now, building, monolith first, and if there's a need to split off something like special, then so be it.
But like, you know, I like Go, but I build a lot of the things in Go and then I'm like, oh, this, this should just be a shell script. And it's like, I think there's... not, yeah, not everything needs to be a really highly decomposed, highly scalable, super elegant looking solution. It's like sometimes the really scrappy, simple stuff that we've been doing for 30 years, like actually works better and is because it's simpler.
Um, it works better because it's simpler. And I think that just like that, that's a mindset that I- I'm hoping to propagate, you know, inside of our organization. And I, I think a lot of others are on that journey right now too.
Matt Klein: Perfect. I, I could not think of a better way to end the episode. So, um, thank, thank you, Brian.
This was a fantastic chat. I think people are really gonna enjoy it. So that's a wrap for this episode of Beyond the [00:53:00] Noise Signals, Stories and Spicy Takes. Huge thanks to Brian for joining and sharing his story. You can find this episode and all past ones on the bitdrift YouTube channel. And if you had fun, drop us a review, tell your friends or yell your favorite hot take into the void.
Just make sure to tag us. I'm Matt Klein, and I'll see you next time.

