
Microservices & Distributed Monoliths

“Microservices should contain no more than 100 lines of code” … “Microservices are REST APIs that communicate with JSON” … Where did all these microservice recommendations come from? Without a microservices game plan, you may end up with a distributed monolith, with all the worst attributes of a monolithic system and all the complexity of distributed microservices as well. You might even wonder if you should be using microservices at all.

In this session I’ll provide a roadmap to microservices that avoids the coupling pitfalls that slide toward the pit of distributed monoliths, and introduce the basic tenets of ViewModel composition that let you stitch together data from properly segregated microservices in your UI.

Or maybe you’ll just be convinced you don’t need microservices after all.

Transcription

00:00 David Boike
All right, welcome everybody. Good morning. Welcome to the first day, the first full day of talks. My name is David Boike, and I'll be presenting Microservices and Distributed Monoliths. So, before I get started, how many people are like, "Yeah, microservice all the things," in this crowd? Yeah. Applause, I've never gotten applause before. How many people are like, "Oh my God, I went to a tech conference and it's another freaking talk about microservices"? A few. How many people just don't want to put up their hands because it's the first talk of the morning? Everyone else. Okay. I think microservices and monoliths are really important to talk about, especially at a DDD conference, because I think the DDD community has a lot to say and can do a lot to improve the state of microservices. We all know that monoliths can turn into spaghetti code really easily, but what I've also found in looking at a lot of customers is that when they try to do microservices without a lot of forethought, they take that spaghetti and they don't make it make more sense. They just kind of turn it into more of a mess.
01:13 David Boike
And it all seems to start out pretty simple. They'll go, "Oh! We just have these simple units of deployment, so we'll just have a little microservice, then we'll have another microservice." And then a few more microservices after that. And this is all, "Okay, this is okay, but wait a second. Now that I'm not in a monolith anymore, all these things are going to have to communicate somehow, so how am I going to pull that off?" And so, of course, the person that just heard about microservices yesterday goes, "Well, of course, I'm going to use HTTP/JSON/REST, because it's the microservices way. That's just the way we do things, because microservices means REST and it means Docker, and it means all these things." None of which I think are true. This is why I say it's the microservices way. Sarcasm, registered trademark, copyright 2018, me, all rights reserved. I totally used a font I hate to drive the point home here.
02:06 David Boike
So let's step back a little bit, because in a monolith, each microservice is a module, each link would be a function or a method call, and the compiler would link all those modules together. We wouldn't have to worry about routing from method to method, we wouldn't have to worry about method discovery, we wouldn't have to worry about reliability. I mean, one method calling another method is going to be pretty reliable; there's not a lot of ways that can fail. But now that we have this distributed all over the place, we have to worry about all these API calls and a network. We have to worry about routing, we have to worry about load balancing, we have to worry about monitoring the health of each microservice, we have to worry about timeouts and all this other stuff. So you have to design for failure too, and it gets a lot more complex.
02:53 David Boike
And so Aaron Patterson said in a tweet, "Microservices are great for turning method calls into distributed programming problems." And done improperly, I completely agree with that. So let's get back to our system of microservices, and they all start communicating over REST and JSON and RPC, and all this kind of stuff. That kind of looks okay, I guess, kind of organized, whatever, but you keep adding features and features and... Oh! Well, in order to do this, I need to talk to this microservice, and now that needs to get data from that microservice, which needs to get data from that microservice, and then it needs to go over here, and eventually it starts looking like this. This is not better. This is a distributed monolith, still a monolith, just all over the place instead of in one spot. And as the developer of this distributed monolith, that makes you the ape from 2001, just with your bone, bashing stuff, not knowing what to do. It can lead to disaster, and we don't want that.
03:53 David Boike
But back to the picture. Is this really so bad? Let's say we can just set the messiness to the side. Let's talk about performance. How might the performance be? So let's do some math. I know it's nine o'clock in the morning, but you don't have to get out your calculators; I did all the math ahead. Let's pretend that we can give every single one of our microservices five nines of reliability, 99.999% uptime. And so if you have one microservice that you're talking to, that's one hop at five nines of uptime. But when you calculate that down to downtime per year, on a yearly basis that's already 5.3 minutes of downtime. So now if a microservice needs to go talk to another microservice to get data, that's two hops.
04:41 David Boike
And you're up to 10.5 minutes of downtime. Three hops is 15 minutes. Four hops is 21 minutes in a year. And if you go all the way out to 10, and I would hope we would stop at 10, 10 hops would be essentially four nines, and then there's a bunch of zeros and then some more numbers; it's not quite that nice in math, but that's 52 minutes of downtime per year. I guess that's kind of acceptable, I could maybe get away with that in some projects, but it's pretty hard to guarantee that 99.999% uptime. So just for fun, let's make it worse. Let's go four nines. You start off with 52 minutes of downtime per year, and as you escalate toward 10 hops, your downtime is now eight hours and 45 minutes per year. That's a whole workday.
05:31 David Boike
I like going further, so let's try what you get with an Azure multi-instance VM. Microsoft says, "For all virtual machines that have two or more instances deployed in the same availability set, we guarantee you will have virtual machine connectivity to at least one instance at least 99.95% of the time." So with one hop, that's four hours and 22 minutes of downtime already, and it starts to escalate from there: two hops, three hops, four hops, 10 hops. You're down for almost an entire two days. It's like a whole weekend, or two business days. If those two days are Black Friday or something, you're really kind of screwed.
06:15 David Boike
Let's keep going. Azure single-instance VM reliability guarantees. These are their SLAs, where they will give you some money back if they breach them. So you start with eight hours and 45 minutes, and as you progress it gets worse and worse until, at 10 hops, it's three and a half days' worth of downtime. And I promise this is the last one I'll do, but Azure will actually give you a 25% refund if they go below 99%. And so with one hop you're already at 3.65 days, two hops is a week, three hops is 10 days, four hops is 14 days, and at 10 hops you're out of a job.
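The pattern behind all of these tables is the same bit of arithmetic: end-to-end availability is the per-hop SLA raised to the number of hops, and whatever is left over is downtime. Here's a minimal C# sketch (illustrative only, not from the talk) that reproduces the numbers, assuming every hop independently meets the stated SLA:

```csharp
using System;

class AvailabilityMath
{
    // Yearly downtime for a call chain `hops` deep, assuming every hop
    // independently delivers `perHopAvailability` uptime.
    static double DowntimeMinutesPerYear(double perHopAvailability, int hops)
    {
        double endToEnd = Math.Pow(perHopAvailability, hops);
        const double minutesPerYear = 365 * 24 * 60;   // 525,600
        return (1 - endToEnd) * minutesPerYear;
    }

    static void Main()
    {
        // Five nines per hop: ~5.3 min/year at 1 hop up to ~52.6 min/year at 10 hops.
        // 99.95% per hop: ~263 min (4h 23m) at 1 hop, ~43.7 hours at 10 hops.
        foreach (double sla in new[] { 0.99999, 0.9999, 0.9995, 0.999, 0.99 })
            for (int hops = 1; hops <= 10; hops++)
                Console.WriteLine($"{sla:P3} per hop, {hops,2} hop(s): {DowntimeMinutesPerYear(sla, hops),10:F1} min/year");
    }
}
```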
07:05 David Boike
So you might be saying, well, okay, I'm not going to be stupid enough to actually have 10 microservices all calling each other in series; they'll be in parallel. It will be fine because we won't be going that deep, it'll just be wider; I'll have a bunch of calls. Let's say they go max three hops deep, so calling back to that earlier slide, that's 99.97% reliability, and we'll say we just have a bunch of those. So it's like 30 parallel calls, and then they all need to go on average two hops beyond that. The math on that is four nines to the third power, to the 30th power, which is four nines to the 90th power, which is .99104, which is already 3.27 days of downtime.
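A quick spot check of that fan-out math, under the same illustrative assumptions as the sketch above:

```csharp
using System;

class FanOutCheck
{
    static void Main()
    {
        // 30 parallel calls, each 3 hops deep, every hop at four nines:
        double endToEnd = Math.Pow(Math.Pow(0.9999, 3), 30);   // = 0.9999^90 ≈ 0.99104
        Console.WriteLine($"{endToEnd:F5} available, {(1 - endToEnd) * 365:F2} days down per year");   // ≈ 3.27 days
    }
}
```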
07:47 David Boike
And if you take that 99.95% reliability, that's 16 days of downtime. So building systems like this, with HTTP calls that call other HTTP calls, which I... HTTP is hard to say a whole bunch of times. If you try to do that too many times, A, you can't say it, and B, you get a whole bunch of downtime, and you can't build a reliable system that way. Personally, I don't like to get woken up on weekends, in the middle of the night, so I would prefer to do it a better way. But let's talk about latency. Uptime isn't the only consideration; what about latency? If we have one hop, let's just figure 100 milliseconds of latency per hop, because when you're not doing a method call, but you're instead calling over HTTP, there's going to be some latency in going from one server to another.
08:36 David Boike
So this math is a lot easier because it's just linear. You get to 10 hops and that's one full second just waiting on latency. But remember, that doesn't count any processing time, that doesn't count waiting for any databases, that doesn't count whatever. And imagine if one of your microservices gets its data from SharePoint. Latency matters. Amazon found every 100 milliseconds of latency cost them 1% in sales. Google found an extra 0.5 seconds in search page generation time dropped traffic by 20%. A financial broker could lose $4 million in revenue per millisecond if their electronic trading platform is 5 milliseconds behind the competition. Latency matters, and so that's not a way we really want to build our system. So now we have to rewind a little bit and go, if all of that is so problematic, then just what was so bad about monoliths in the first place? We built monoliths for a whole bunch of years. They weren't horrible.
09:39 David Boike
Well, the answer is coupling. I'm sure this couple is going to be very happy together, but computer systems don't always work that way. So let's think about coupling and how it works in systems. When I started programming, a senior architect told me that I had to build three-tier systems because that's the way it was done, and so that's what I did. And you still see that a whole bunch of times: you have your UI layer, and you have your business logic layer, and you have your data access layer, and it talks to a database, and all of these are decoupled from each other because, the architect told me, we should be able to take that data access layer, just take it out and replace it with a new one, and that's how we'll just magically switch from SQL Server to MySQL if we feel the need to one day. Has anyone ever done that? Two people, three people.
10:28 David Boike
It doesn't come up often, so designing for it in the first place is probably not the smartest idea. It kind of makes sense if you're WordPress and your whole goal is to be able to be installed practically anywhere, so you need to make kind of a provider model so that you can talk to any database however you want to do it. But then you are already taking what you can do with the database and boiling that down to the lowest common denominator of the feature set that those databases will offer. You can never really take advantage of the database you're on, which means performance is going to suffer as well. It's all about trade-offs.
11:07 David Boike
So because of all this coupling, what people sometimes do is say, I know, I have a great idea to get rid of the coupling: we're going to move that UI layer up and put an API layer in the middle. Problem solved. Except it doesn't solve the problem. There's still coupling. There's still all that coupling in the UI layer on the horizontal axis, and on the vertical axis they're really coupled to each other, because you're going to have methods in your business logic layer that only exist to take something from the UI layer and forward it to the data access layer. There's going to be all that stuff; you can't really separate those things. And adding the API layer doesn't really fix anything, because as we talked about, adding HTTP just adds temporal coupling. Temporal coupling is the coupling where you need to go get something and you have to wait in time for that thing to be done. And if that thing is slow, you are stuck.
12:06 David Boike
So let's think about this again. What if we were to flip the tables on this? What if we decided we don't need to do all of this horizontal coupling? What if we take a look at our UI and say that there are things that go from the UI all the way down to the database that are cohesive anyway? If we are going to support MasterCard as a payment, then we are going to need something in the UI so the user can select MasterCard. You're going to need something in the business logic that tells you what to do with MasterCards, and MasterCard numbers all follow a certain pattern, so how to validate them. You're going to need something in the data access layer to say that this is a MasterCard and not a Visa, and you're going to need to store that data in the database. There is already coupling there on the vertical axis.
12:54 David Boike
So what if we just embraced that, and we said we are going to divide our system into these vertical slices, and then we'll have some sort of agent, let's just call it a service bus for lack of a better name, and those vertical slices will be able to communicate in a lightly coupled way between them? Now it turns out that this is pretty magical and allows you to do some pretty amazing things, as long as you beware of database coupling. And this is one great thing that microservices has actually given us: it's no longer taboo to say I'm going to have a whole bunch of different databases. If you have all of your vertical slices and they're all in one database, it's only a matter of time before some developer goes, you know what? I could follow the rules, but it sure would be simpler to just go get that value I need from right over there in the database. And then you reintroduce your coupling in the database and you're right back where you started.
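To make "lightly coupled" concrete, here's a minimal sketch of the shape this takes, using a hypothetical OrderAccepted event and IServiceBus abstraction (illustrative only, not the talk's code or any real bus API): one slice publishes a fact about its own domain, another slice subscribes and reacts against its own database, and neither one ever calls the other directly.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical event contract and bus abstraction, purely illustrative.
public record OrderAccepted(Guid OrderId, decimal Total);

public interface IServiceBus
{
    Task Publish<TEvent>(TEvent @event);
    void Subscribe<TEvent>(Func<TEvent, Task> handler);
}

// Sales slice: owns order data and announces what happened in its own domain.
public class SalesEndpoint
{
    private readonly IServiceBus bus;
    public SalesEndpoint(IServiceBus bus) => this.bus = bus;

    public Task AcceptOrder(Guid orderId, decimal total) =>
        bus.Publish(new OrderAccepted(orderId, total));
}

// Billing slice: subscribes to the event and works only against its own store.
public class BillingEndpoint
{
    public BillingEndpoint(IServiceBus bus) =>
        bus.Subscribe<OrderAccepted>(e => ChargeCardAsync(e.OrderId, e.Total));

    private Task ChargeCardAsync(Guid orderId, decimal total) =>
        Task.CompletedTask;   // charge the card using Billing's own data
}
```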
13:50 David Boike
But if you follow this model and you do vertical slices, then you have completely autonomous services, and if all of these autonomous services own their own data, then you don't ever need to share. There's no reason for a microservice to have to call another microservice, to call another microservice, to get what it needs, which is great, because messaging will actually perform much better than RPC. The problem with RPC is everything is kind of queued, but it's in memory. So when your request comes in, memory has to be allocated to take that request, get the specific information out of it, and figure out what is in that request so that it can respond. That memory is stored on the heap. Then you need to go call another microservice or a database or whatever, and what happens to that memory? Well, the garbage collector comes around and goes, "Hey, are you busy?" And it goes, "Oh, sorry. I still need that memory." And the garbage collector goes, "No problem. Got you. I'll take your memory and put it down in gen one." Good to go.
14:54 David Boike
And so that request continues to go, and eventually the garbage collector comes around and goes, "Hey, are you still using that memory? It's in gen one." "Oh, I'm sorry. Yeah, we're pretty busy. Things are slowing down a little bit. So yeah, I still need that memory." Okay, it goes into gen two, and now it's in gen two, which means it's kind of pinned there, and the only way to get it out of gen two is to do a full GC collect, which is essentially a stop-the-world kind of thing. And so that freezes all the threads while the garbage collector goes through and cleans up all the memory that's in gen two. And while everything's locked up, the situation is just getting worse and worse. So the performance that you'll see with RPC is you'll have more throughput, more throughput, more throughput as load increases, until eventually you're just going to hit a peak, and then you just fall off the cliff, because there's just an absolute limit, and after that you can't handle any more.
15:46 David Boike
Whereas with messaging, if you send asynchronous messages between your vertical slices, it might start off a little bit slower, because there is that overhead of, I have to write the message to disk before I can process it, I have to send it over, whatever. But after the inflection point, where you see the lines cross, messaging actually gets faster because it's not hindered by the memory and load characteristics that RPC is, and eventually, when RPC falls off the cliff, messaging has reached a nice plateau where it just continues to process messages reliably and continues to go.
16:25 David Boike
And so I've heard people say, but async/await fixes that, because now we're not going to use all this memory, the thread is going to get out of the way because we're doing async/await, and that fixes everything. No, it doesn't. That just means the thread gets out of the way so you can take on another request, which actually just means you're going to fill up faster, start using more memory quicker, and fail. Well, maybe fail quicker. Maybe it'll raise the ceiling a little bit, but it's not a solution. So where does this leave us?
16:57 David Boike
So layers and APIs are not the answer. Vertical slices could be better, so that we don't get into this spaghetti distributed monolith, which is a monolith whether it's distributed or not. But how do we do it? Well, here's one novel idea. What the software industry needs is some middle ground between monoliths and microservices. Maybe we could just call them... services. There could be patterns and practices around them, a service-oriented architecture, if you will. See, the term SOA came around, but then it was kind of co-opted by all the SOAPs and the WSDLs, and it was subsumed and became synonymous with HTTP, but that's not really true.
17:41 David Boike
Service-oriented architecture can mean exchanging asynchronous messages and lightly coupling things. The problem you get into is what to do when you hit your UI. When you think about your UI, your UI is at least conceptually tightly coupled: there's all that stuff in one view that you have to get from all these other services. That's why our microservices were going from one to the other, to the other, to the other. And so you have to ask yourself, who owns this page? What service owns this page? How do you decide what service owns putting this page together from all of its component pieces? For a good service-oriented architecture, the answer is: none. Instead, what we need to do is divide it up.
18:31 David Boike
The title of the book and the cover, those are product catalog information. We can put those in a product catalog service. We're not going to change the price and have that affect the title of the book; they don't belong together. Product ratings can go into a separate service, we could have pricing be its own service, inventory controls that chunk of the page, and shipping controls the part of the page that's, how am I going to get it shipped to you? And then there's that bar thing at the bottom, who else bought this book? And that's a cross-sell, but that also presents a challenge, because that's a grid. So within the cross-sell, you have one source of data that says, if I bought Patterns of Enterprise Application Architecture, maybe I also bought the DDD blue book, but that's just providing IDs.
19:21 David Boike
When you go down to it, you need to create that grid. And within each one of the items in that grid, you've got items from the product catalog and the product ratings and the pricing, but this is not an insurmountable challenge, and we can do this without creating coupling in the UI. We can have page composition, and at Particular we've written some code that provides infrastructure to do this. It allows a page to be composed from all of its different component sources by having components of each service or microservice present in the view layer, not coupled together, but just deployed together, all being responsible for going to the server and getting their own data. And so I need to challenge another commonly held tenet of microservices, I guess, that a microservice is a unit of deployment. What if it doesn't have to be? What if a service is not a unit of deployment, but a logical grouping of things, a logical grouping of data and business rules, all the way from the top of the UI to the bottom in the database?
20:28 David Boike
So if we keep that in mind when we're building a view, we can have microservices' components deployed in the web tier. I'm not talking about a web application that takes dependencies on a whole bunch of microservices. I'm talking about making these components available and deploying them together with a DevOps process like Octopus Deploy or Grunt or whatever. The microservices don't need orchestration to figure it out. They just say, hey, on that product page, that route, I have data I can provide for that route. The infrastructure just asks all of the components, what can you do for me? And then they all respond with, I can supply information for these routes.
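A minimal sketch of what that kind of composition infrastructure could look like, with a hypothetical IViewModelAppender interface and CompositionGateway (illustrative only, not the actual Particular infrastructure): the gateway asks every deployed component whether it matches the incoming route, and each matching component appends its own data.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical composition infrastructure. Each service deploys a small
// component that declares which routes it can contribute data to.
public interface IViewModelAppender
{
    bool Matches(string route);   // "can you do anything for /product/42?"
    Task Append(string route, IDictionary<string, object> viewModel);
}

public class CompositionGateway
{
    private readonly IReadOnlyList<IViewModelAppender> appenders;
    public CompositionGateway(IReadOnlyList<IViewModelAppender> appenders) => this.appenders = appenders;

    // Ask every deployed component whether it owns data for this route,
    // then let each matching one append its own slice of the view model.
    public async Task<IDictionary<string, object>> Compose(string route)
    {
        var viewModel = new Dictionary<string, object>();
        foreach (var appender in appenders.Where(a => a.Matches(route)))
            await appender.Append(route, viewModel);
        return viewModel;
    }
}
```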
21:17 David Boike
The infrastructure can then call each of these microservices' view model appenders, which will then fetch data based on the routes, and they can populate a dynamic view model. And then once you have your view model, your HTML and CSS is owned by a branding service and is basically concerned with just putting the data on the page. So the composed view model looks something like this, where you've got a component from product catalog, a component from finance, and a component from marketing, all adding information to products, and you get a composed view model that has a products list, where each list item contains the product name from one service, the product price from another, and the product rating from somewhere else.
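Continuing the sketch above (same hypothetical IViewModelAppender, all names and values made up), the individual appenders might look something like this, each one reading only from its own service's store and contributing its own keys to the shared, effectively dynamic view model:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Deployed by the product catalog service.
public class ProductCatalogAppender : IViewModelAppender
{
    public bool Matches(string route) => route.StartsWith("/product/");
    public Task Append(string route, IDictionary<string, object> vm)
    {
        vm["productName"] = "Domain-Driven Design";   // from the catalog's own store
        return Task.CompletedTask;
    }
}

// Deployed by the finance service.
public class FinanceAppender : IViewModelAppender
{
    public bool Matches(string route) => route.StartsWith("/product/");
    public Task Append(string route, IDictionary<string, object> vm)
    {
        vm["productPrice"] = 54.99m;                  // from finance's own store
        return Task.CompletedTask;
    }
}

// Deployed by the marketing service.
public class MarketingAppender : IViewModelAppender
{
    public bool Matches(string route) => route.StartsWith("/product/");
    public Task Append(string route, IDictionary<string, object> vm)
    {
        vm["productRating"] = 4.7;                    // from marketing's own store
        return Task.CompletedTask;
    }
}
```

The branding service's view then only needs to know the key names, not which service supplied each one.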
22:08 David Boike
So then the only question is, how do you make that grid? This is especially troublesome, because if you just go and get the IDs for the related items and then do that same thing for each one, you could get into a microservices SELECT N+1. So you need your components to be smart, and to be able to say, I'm going to go fetch the titles and the stuff for a whole array of product IDs. And the way you can do that is with a little more infrastructure. Your request comes in, and the component from product catalog gets it first, because it's registered that it is interested in that route. It can go to its data store and find the related items. And when it has those, just a list of IDs, it can publish a related products found event. Now this is not an asynchronous message on a service bus. This is an in-memory .NET multicast delegate event.
23:09 David Boike
So it's synchronous, essentially. And the client-side message broker can allow other components to subscribe to that in-memory event, to say, hey, when you have products loaded, then I can help you out filling that in. So now that the finance and marketing services have subscribed to that event, they can go get their data from their data stores. And then each of these view model appenders, starting with the product catalog filling in the product name and the cover image, then finance filling in the price, and then marketing filling in the ratings, can complete that grid.
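Here's a minimal sketch of that in-memory handoff (hypothetical types, not the demo's actual infrastructure): the product catalog component raises a RelatedProductsFound event carrying all the related IDs at once, and each subscribing component batch-loads its data for the whole grid in a single query instead of one query per row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory broker: a plain .NET multicast delegate, synchronous,
// created per composed request. This is not a message on the service bus.
public class RelatedProductsFound
{
    public IReadOnlyList<int> ProductIds { get; init; }
    public List<Dictionary<string, object>> Rows { get; } = new();
}

public class RequestBroker
{
    public event Action<RelatedProductsFound> OnRelatedProductsFound;
    public void Publish(RelatedProductsFound e) => OnRelatedProductsFound?.Invoke(e);
}

// Product catalog component: finds the related IDs, seeds one row per product,
// then publishes so the other components can fill in their columns.
public class CrossSellAppender
{
    public void Append(int productId, RequestBroker broker)
    {
        int[] relatedIds = { 2, 3, 5 };   // from the catalog's own store
        var found = new RelatedProductsFound { ProductIds = relatedIds };
        found.Rows.AddRange(relatedIds.Select(id => new Dictionary<string, object> { ["id"] = id }));
        broker.Publish(found);
    }
}

// Finance component: subscribes once, then prices the whole batch of IDs with
// a single query against its own database, avoiding a SELECT N+1.
public class FinanceCrossSellAppender
{
    public FinanceCrossSellAppender(RequestBroker broker) => broker.OnRelatedProductsFound += Handle;

    private void Handle(RelatedProductsFound found)
    {
        IDictionary<int, decimal> prices = LoadPrices(found.ProductIds);   // one batched query
        foreach (var row in found.Rows)
            row["price"] = prices[(int)row["id"]];
    }

    private IDictionary<int, decimal> LoadPrices(IReadOnlyList<int> ids) =>
        ids.ToDictionary(id => id, id => 9.99m);   // stand-in for the real query
}
```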
23:55 David Boike
Now, if that doesn't sound super easy, and maybe a little complex, well, that might be true. It's not quite as easy as a monolith, but we chose the monolith because it's easiest to understand at the beginning, and then it grows impossible to understand over time. It turns out that evented service architectures are somewhat more complex at the start and never get harder to understand after that. The key to all this is to make effective use of your domain modeling skills to correctly define your service boundaries. It can be challenging at the start, like trying to complete a 10,000-piece puzzle. I mean, how are you supposed to put this puzzle together? All these pieces are blue. Where does that red one even go? But that's where being DDD practitioners can really help. In order to design effective services or microservices, I believe domain-driven design is critical, because correctly analyzing and defining the domain is the key to making this style of architecture work.
25:01 David Boike
Now, this talk is a little light on how to exactly go about that, because we could talk for... well, actually, we did talk for two days about that the past two days, and it's probably not enough. But if you want a glimpse into it, my colleague Mauro Servienti is doing a talk tomorrow at this very same time, in this very same room, called All Our Aggregates Are Wrong. And he goes into analyzing a shopping cart and how to break that apart into different services. So just remember when you're trying to do this that it's not necessarily about speed, especially at the beginning. In an engineering discipline, the important part is not to optimize how quickly we can build the first screen, but how quickly we can build the 100th screen, the 101st screen, the 102nd screen, taking into account all the coupling we've created in the first hundred screens. So you need to pace yourself.
25:58 David Boike
What we want is a sustainable pace, meaning the ability to develop and deploy features at a rate that we will be able to maintain for many, many years, and not get into that situation where a year and a half or two years down the line we go, there's a lot of coupling, we don't know what to do, we need to do a rewrite. So the only time you should optimize for the early stages is if you're building a quick and dirty throwaway prototype, like a first version for a startup or something like that. Then you shouldn't be doing any of this microservices stuff; build it in Ruby on Rails or whatever, figure out what's stable, and then do all this analysis and figure out how to divide it up.
26:37 David Boike
And so whether it's singular or distributed, we want to avoid that monolith by creating vertical slices, not horizontal layers, dividing up the UI into different services, and composing a view model that can power that UI, so that we can avoid the monolith and build more successful systems instead. Now, if you want to see that in practice, we have a demo on GitHub at Particular labs, the .NET Core demo. It works on .NET Core, and even in Visual Studio Code, and the URL to get to it is go.particular.net/netcoredemo. So it does .NET Core and all that stuff; if you don't care about that, not a big deal, but it includes all of these view model composition concepts, and you can see how all the different services interact to build this phone shopping grid, where the phone image comes from a different service than the price, which comes from a different service than the shipping stuff.
27:42 David Boike
Thank you very much. If you have any questions... If you don't have questions now, I'll be available out there all day too. Thank you very much. Oh! Go for it.
28:08 Speaker 2
What's your experience, or what's the weak point in that view model?
28:11 David Boike
The weak point in that view model, I think, is probably that you're almost forced to use a dynamic, because if you try to strongly type it, you spend a whole bunch of time managing that strongly typed stuff, and it becomes a question of where do you put that, who owns it, and how do you avoid creating coupling based on that. And so that means that when you are actually putting stuff into the view model, you have no compiler telling you that, yes, that is correct, you put it in the right property. So that all has to be a kind of implicitly understood contract that you fulfill. So you need to have unit tests around it, making sure that all the things are filled in as the view model would expect.
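For instance, a test like this (hypothetical, reusing the FinanceAppender shape sketched earlier and written with xUnit) pins down one piece of that implicit contract by asserting that the key the page expects actually gets populated:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Xunit;

public class FinanceAppenderTests
{
    [Fact]
    public async Task Fills_in_the_price_key_the_product_page_expects()
    {
        var vm = new Dictionary<string, object>();

        await new FinanceAppender().Append("/product/42", vm);

        // The key name is the implicit contract; the test keeps it honest.
        Assert.True(vm.ContainsKey("productPrice"));
    }
}
```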
28:57 Speaker 3
[inaudible]
29:05 David Boike
In this style, the server-side view model composition, yes, you are building the page MVC-style, so all of those things have to happen basically in memory before it can come back. There's another style where you have an AngularJS app or something like that, and then you can go direct to your services. So you still need kind of that client-side pub/sub, so that you can get information from one service, publish an event, and have other services go get their data, but it takes out having to compose the whole view model and allows you to go to each individual service individually. All right. Have a great conference, everybody. Thank you.