MLA 019 Cloud, DevOps & Architecture

Jan 13, 2022

Deploying machine learning models for real-world use involves a sequence of cloud services and architectural choices, and machine learning expertise must be complemented by DevOps and architecture skills, often through collaboration with specialists in those fields. Key concepts discussed include infrastructure as code, cloud container orchestration, and the distinction between DevOps and architecture, as well as practical advice for machine learning engineers who want to deploy products securely and efficiently.

Resources
Designing Machine Learning Systems
Machine Learning Engineering for Production Specialization
Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines
Show Notes

Translating Machine Learning Models to Production

  • After developing and training a machine learning model locally or using cloud tools like AWS SageMaker, it must be deployed to reach end users.
  • A typical deployment stack involves the trained model exposed via a SageMaker endpoint, a backend server (e.g., Python FastAPI on AWS ECS with Fargate), a managed database (such as AWS RDS Postgres), an application load balancer (ALB), and a public-facing frontend (e.g., React app hosted on S3 with CloudFront and Route 53).
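To make the hand-off between the model and the backend concrete, here is a minimal Python sketch of the piece the backend server owns: building the request that a boto3 `sagemaker-runtime` client's `invoke_endpoint` call takes. The endpoint name and payload shape here are hypothetical; a real model container may expect a different body format.

```python
import json

def build_invoke_args(endpoint_name: str, text: str) -> dict:
    """Build the kwargs for a boto3 `sagemaker-runtime` client's
    invoke_endpoint() call. Kept as a pure function so it can be
    unit-tested without touching AWS."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": text}),
    }

# In a real FastAPI handler you would then do (not run here):
#   client = boto3.client("sagemaker-runtime")
#   response = client.invoke_endpoint(**build_invoke_args(...))
#   result = json.loads(response["Body"].read())

args = build_invoke_args("summarizer-prod", "Today I went for a long walk...")
print(args["EndpointName"])
```

Separating request construction from the network call keeps the ML-specific logic testable even before the DevOps pieces (IAM roles, endpoints) exist.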

Infrastructure as Code and Automation Tools

  • Infrastructure as code (IaC) manages deployment and maintenance of cloud resources using tools like Terraform, allowing environments to be version-controlled and reproducible.
  • Terraform is favored for its structured approach and cross-cloud compatibility, while other tools like CloudFormation (AWS-specific) and Pulumi offer alternative paradigms.
  • Configuration management tools such as Ansible, Chef, and Puppet automate setup and software installation on compute instances but are increasingly replaced by containerization and Dockerfiles.
  • Continuous Integration and Continuous Deployment (CI/CD) pipelines (with tools like AWS CodePipeline or CircleCI) automate builds, testing, and code deployment to infrastructure.
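The core idea behind IaC can be illustrated with a toy "plan" step in Python: declare the desired state as data, and let a tool compute the create/update/delete actions by diffing it against what exists. This is a simplification of what `terraform plan` does conceptually, not its actual algorithm or data model.

```python
def plan(current: dict, desired: dict) -> dict:
    """Toy model of an IaC tool's plan step: compare the resources that
    exist with the resources declared in code, and emit actions."""
    return {
        "create": sorted(set(desired) - set(current)),
        "delete": sorted(set(current) - set(desired)),
        "update": sorted(k for k in desired.keys() & current.keys()
                         if desired[k] != current[k]),
    }

# Hypothetical resource names, for illustration only.
current = {"ecs_service": {"cpu": 256}, "old_bucket": {}}
desired = {"ecs_service": {"cpu": 512}, "rds_instance": {"engine": "postgres"}}
print(plan(current, desired))
# create rds_instance, delete old_bucket, update ecs_service
```

Because the desired state lives in code, it can be reviewed, version-controlled, and replayed, which is exactly the reproducibility argument in the bullets above.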

Containers, Orchestration, and Cloud Choices

  • Containers, enabled by Docker, allow developers to encapsulate applications and dependencies, facilitating consistency across environments from local development to production.
  • Deployment options include AWS ECS/Fargate for managed orchestration, Kubernetes for large-scale or multi-cloud scenarios, and simpler services like AWS App Runner and Elastic Beanstalk for small-scale applications.
  • Kubernetes provides robust flexibility and cross-provider support but brings high complexity, making it best suited for organizations with substantial infrastructure needs and experienced staff.
  • Use of cloud services versus open-source alternatives on Kubernetes (e.g., RDS vs. Postgres containers) affects manageability, vendor lock-in, and required expertise.

DevOps and Architecture: Roles and Collaboration

  • DevOps unites development and operations through common processes and tooling to accelerate safe production deployments and improve coordination.
  • Architecture focuses on the holistic design of systems, establishing how different technical components fit together and serve overall business or product goals.
  • There is significant overlap, but architecture plans and outlines systems, while DevOps engineers implement, automate, and monitor deployment and operations.
  • Cross-functional collaboration is essential, as machine learning engineers, DevOps, and architects must communicate requirements, constraints, and changes, especially regarding production-readiness and security.

Security, Scale, and When to Seek Help

  • Security is a primary concern when moving to production, especially if handling sensitive data or personally identifiable information (PII); professional DevOps involvement is strongly advised in such cases.
  • Common cloud security pitfalls include publicly accessible networks, insecure S3 buckets, and improper handling of secrets and credentials.
  • For experimentation or small-scale safe projects, machine learning engineers can use tools like Terraform, Docker, and AWS managed services, but should employ cloud cost monitoring to avoid unexpected bills.
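As a sketch of the cost-monitoring idea in the last bullet, here is a prorated budget check in Python, roughly the logic a managed billing alert (for example, an AWS Budgets threshold) applies for you. The numbers are made up.

```python
def cost_alarm(daily_costs: list[float], monthly_budget: float,
               days_in_month: int = 30) -> bool:
    """Return True if month-to-date spend is running ahead of a
    prorated monthly budget, i.e. on pace to blow past it."""
    spent = sum(daily_costs)
    allowed_so_far = monthly_budget * len(daily_costs) / days_in_month
    return spent > allowed_so_far

# Three days in: spent 21.5 against a prorated allowance of 9.0
print(cost_alarm([4.0, 5.5, 12.0], monthly_budget=90.0))  # True
```

In practice you would let the cloud provider's billing alerts do this, but the logic shows why catching runaway spend early matters for experiments.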

Cloud Providers and Service Considerations

  • AWS dominates the cloud market, followed by Azure (strong in enterprise/Microsoft-integrated environments) and Google Cloud Platform (GCP), which offers a strong user interface but has a record of sunsetting products.
  • Managed cloud machine learning services, such as AWS SageMaker and GCP Vertex AI, streamline model training, deployment, and monitoring.
  • Vendor-specific tools simplify management but limit portability, while Kubernetes and its ML pipelines (e.g., Kubeflow, Apache Airflow) provide open-source, cross-cloud options with greater complexity.

Recommended Learning Paths and Community Resources

  • Learning and prototyping with Terraform, Docker, and basic cloud services is encouraged to understand deployment pipelines, but professional security review is critical before handling production-sensitive data.
  • For those entering DevOps, structured learning with platforms like A Cloud Guru or AWS’s own curricula can provide certification-ready paths.
  • Continual learning is necessary, as tooling and best practices evolve rapidly.



Transcript
Welcome back to Machine Learning Applied. In this episode, I'm gonna be interviewing coworkers from Dept. Matt is an expert in architecture, and JT is an expert in DevOps, or developer operations. These two skills combine in the deploying of a full-fledged product that can be used by consumers: a web app, a mobile app. Most of what we've talked about in this podcast series is machine learning: how to develop and train your model inside of a Docker container, or even developing and training your models on the cloud by way of AWS SageMaker Studio notebooks. Now, our skill sets are on the machine learning side, but eventually you want to get that model into the hands of a customer. After you have trained your model, whether on localhost or on SageMaker, you'll deploy your model through SageMaker to a SageMaker REST endpoint, or to a model registry that can be called as SageMaker batch transform jobs or SageMaker serverless inference jobs. So now you have your deployed machine learning model ready to be used, but who's gonna use it? That's where all the stuff from this episode comes in. Now, I know I have talked about a lot of the tooling and concepts from this episode in the past, and I promise I'm not gonna turn this podcast series into a full-stack slash architecture slash DevOps podcast, and that's actually why I am creating this episode. I wanted to talk to my colleagues who are experts in the field, stop bumbling my way through these episodes and wasting your guys' time, and just say: hey, Matt and JT, let's nail this down. How do you do this as a machine learning engineer? How do you productize and deploy your machine learning model as a full product in the cloud? Should you do that? Now, to get you thinking along that track, I want to talk about my journey with Gnothi. Gnothi was inspired by the publication and accessibility of these Transformers NLP models.
The Hugging Face repository had all these models, summarization, question answering, and I thought, you know, these would be really great for use in a journal app to give people recommendations and resources. Let me see what I could pull up. Crack my knuckles and I get typing away. I deployed these models previously as Docker containers running on AWS Batch. Now I'm moving everything over to AWS SageMaker. On SageMaker, you can deploy models that you train, or you can deploy pre-trained models, as REST endpoints or for use in serverless inference jobs. Great. Now that I had a machine learning model, I had to figure out how to get that to users so that they could start journaling. Well, the next step was to deploy a server using Python and FastAPI to AWS ECS, or Elastic Container Service, and a wrapper on ECS that makes it easier to work with is called Fargate. Fargate will manage the scaling up and down of your Docker-based servers based on load on the servers. Great. Now I have a server, but I need a database. So I created an AWS RDS, or Relational Database Service, Postgres database, and I had to create an IAM role and attach that to my Fargate container so that it has access to my RDS database. Then I had to make Fargate accessible from some other service, namely the front end, which we'll talk about in a bit. So I spun up an application load balancer, or ALB, that has an IP address. The IP address is registered as a domain name on Route 53 in AWS. Tied that ALB to my Fargate service; now the whole thing is accessible by an IP address. Finally, I needed a front end so that the users could write their journal entries and communicate with the server, and the server's gonna kick off the SageMaker model inference jobs. So that front end was written in React, that is, JavaScript, HTML, and CSS. And then I dropped the build artifacts of that React code base into an S3 bucket. S3 is AWS's service for storing files.
Within S3, you can checkbox "make this bucket publicly accessible as a website." Now, if you want your S3 bucket to be a public website, the buck doesn't stop there. You actually have to do another step, and that is to put what's called a CloudFront distribution over your bucket. That CloudFront distribution's goal is to cache those files at various locations around the world so that they're faster to access for users in various locations. That's CloudFront's purpose, but it's actually a required step if you want to add a domain name, again through Route 53. So I register a domain name, gnothi.ai, and point its A record to that CloudFront distribution. The CloudFront distribution points to the S3 bucket. The S3 bucket's files are React (JavaScript, CSS, HTML) that are making API calls to my Fargate container through the application load balancer. The Fargate container is sitting on top of ECS, which is managing the scaling up and down of that service. The Fargate container has attached to it an IAM policy that gives it access to the database on RDS, and the Fargate container will make calls to the SageMaker model endpoint in order to run inference jobs using machine learning. Oh, and did I mention: you don't wanna string all this stuff up yourself by pointing and clicking your way through the AWS console. The reason for that is it's hard to track changes of your architecture over time, or to know what services you have running and which ones are affiliated with each other. Or let's say you make a change and you want to keep track of that change. Well, you won't be able to keep track of that change unless all that stuff is stored in code. And so there's a project I'm using called Terraform. Terraform is what's called infrastructure as code. It lets you manage all of the deploying of your cloud services in a code file. You can then check in that file to GitHub.
Then you can use that file to track changes to your architecture over time, or so that other users can repeat that architecture on their end, or so that you can make subtle changes in, let's say, your prod, dev, and staging environments just by way of code. So I just mentioned a slew of cloud hosting services on AWS and how you would do that in code, and is your head spinning? Because mine is. And so that was the impetus for this episode: the last year's worth of bumbling my way through architecture and DevOps when my specialty is machine learning and data science, as I'm sure is the case for most of my listeners. I wanted to get some experts on the line and say: what are the popular tools out there, and which tools should be being used? Are all of these skills valuable to a machine learning engineer, or expected of a machine learning engineer in the job market? Should we be bothering with this stuff, or is it too big a pill to swallow and we should be leaning instead on our DevOps colleagues or hiring somebody to do this for us? And so that's the kicking-off point for this episode. I hope you enjoy. Welcome back to the show. Today we're gonna take a diversion and we're gonna talk about DevOps, or developer operations, and architecture. And we have on the show two coworkers from Dept. Matt Merrill, go ahead and introduce yourself. Hey, I'm Matt Merrill. I'm a director of engineering at Dept US. I come from a background of Java development and Node.js development. I've dabbled in DevOps. It's something that I love. I'm usually the guy on the app team who is reaching across the aisle to the ops folks and things like that. I've been doing this for over 15 years and I'm happy to be here. Hi, my name is JT. I am the DevOps Practice Lead at Dept. I started out my career as a Java and Python programmer, and I was doing DevOps before the term DevOps came out. Because of my varied interests, I would always talk to the ops, 'cause I like Unix administration.
I would always do the CI/CD, the Jenkins, because no one in dev wanted to do it. I would always maintain the dev servers, again because no one wanted to do it. And I started doing automation of our application because I didn't wanna do it by hand. And I learned: wow, Terraform, Ansible, I love this. I pivoted my career from Java development to DevOps. So for my listeners, I promise I'm not going to turn the show into a DevOps slash architecture slash full-stack show. I've been taking kind of a direction with that in the last few episodes; I talked about SageMaker, AWS. But this might be the last episode where I talk about DevOps. I've kind of been bumbling my way through DevOps and how it relates to machine learning, for anybody in machine learning who actually wants to deploy their models to the cloud for a client, where they wanna get their model online. But in this episode, we're gonna really try to drill down on DevOps: the tools you'd use, what is it, how does it play into the machine learning ecosystem, whether you've been developing models on SageMaker or localhost and now you wanna get that thing online. So we're gonna talk to the experts here. JT's gonna take us through DevOps, Matt's gonna talk about architecture, and we're gonna see how does that fit into the picture from a machine learning perspective. So, JT, what is DevOps? So DevOps is a philosophy, and the tools, so that the dev and the ops can work together to help deploy code and applications safer and quicker into production. So it may be good to highlight some of the problems that DevOps is trying to solve, originally and still happening today. The devs and the ops, even though they both work in technology and they both work in the same company, if they don't talk to each other, it is siloed. And we see that numerous times, even now when Dept does an engagement. So DevOps is a way to have the common tools and common communication and process so that these two silos can work together well.
And as we go on with this podcast, we'll talk about some specific tools and some specific processes that make DevOps work. Yeah, so a lot of my listeners, or at least a lot of machine learning coworkers or colleagues I've known in the past, what they end up doing is either they write a model on localhost, using TensorFlow or Keras or PyTorch, scikit-learn. Ideally, often they'll be using a Docker container. So if they're on a Mac, they'll have a Docker container that runs machine learning; they're not gonna be able to use a GPU because of the Mac's limitations. If they're using Windows, they might be passing through to WSL 2 by way of Docker and writing machine learning models; they might be using a Docker container that inherits from Nvidia containers or Hugging Face containers. More often the case, what I see is people are writing their models in Jupyter Notebooks in the cloud. So they'll create a GCP or an AWS account, and they'll go to the machine learning toolkit. So in AWS, it's called SageMaker. They open up SageMaker, it creates for them a Jupyter Notebook. Right off the bat, they open up that notebook and they start typing Python code. They write a model; they're using Keras. They train it, and when they deploy it, SageMaker has all this tooling for deploying a model. You just say model.deploy, and it will do all the stuff for you in the background. It will spin up, who knows, EC2 instances, ECS instances, we don't know exactly. Sometimes using Docker, if you're inheriting from a SageMaker Docker container; sometimes it does all the containerization for you itself, and all you do is write some Python code. They call that "bring your own script." But what happens is, machine learning engineers, they know Jupyter, they write some Python, they deploy a model. Then the rest, it's like: where do we go from here? How do we make that available to the internet, to our client? And that's kind of where we stop in the podcast.
And you begin, as somebody who's an expert in the orchestration land. Yeah, so that's a great question, 'cause I actually experienced this firsthand. So when I was working with Cambridge Mobile Telematics, CMT, the machine learning team, they were playing around with SageMaker in the sandbox, in the dev environment. And what began as an experiment became really popular to analyze data, process data, display data. So one day they came to me and said: hey, we got this SageMaker in dev, and we want to deploy to production. And I didn't know anything about SageMaker. I barely knew anything about Jupyter Notebooks. The only thing I knew was: oh, cool UI for Python. That's basically all I knew about it. So, you know, one of the DevOps philosophies is collaboration, right? The devs and the ops working together to achieve a common goal and deploy it. So it was a two-way street. They taught me about Jupyter Notebooks, SageMaker, things like that. And on my end, I took a look at their dev implementation and I saw a lot of things that had to be modified before we could deploy to production, which is fine, right? This topic is so complex that we're not expecting machine learning devs to understand it. So, to just give some examples: I had to encrypt a disk because it had user data on it. There were hardcoded database usernames and passwords in there, so we had to use this thing called IAM execution roles to secure and get access to the database, and Secrets Manager to retrieve the username and password. And we had to make backups in case things crash. Things like that: things that you have to do before you go to production, and that's why you wanna work with your DevOps team to do that. We're not expecting the devs to think about all the stuff that has to happen before going to production. Actually, now that you mention it: you talked about IAM, you talked about Secrets Manager.
What is architecture? And Matt can chime in on this. What is architecture, and how is that different from DevOps? I imagine there's a lot of crossover. So Matt deals primarily in architecture, JT primarily in DevOps. In what ways are those roles similar or different? Where does one begin and the other end? Yeah, that's an interesting question. I've actually never heard DevOps and architecture compared like that. But let's start with what I think architecture is. And I'll step back by saying, you know, there's this perception, I think, in software engineering that an architect is, and I mean this tongue in cheek, some kind of god-like figure who sits in an ivory tower and sends down edicts for how a team needs to implement something. And there certainly are those types of architects, but I think that's probably the worst type of technical architect you could possibly have. I see architecture as systems engineering: how different systems interact with each other, and how can you make a healthy whole system that is as simple as you possibly can make it. Maybe each individual piece is complicated within itself, but how are you going to make all of those different things in your business talk to each other, whether it be the customer relationship management system feeding data into your machine learning model, or whatever it might be. How are those things going to play nicely together? How are you going to make sure that there are teams of people with the right skill sets to support those things? Thinking that far ahead, thinking about how these applications are going to get deployed, and making sure that you do have a DevOps team or a cloud engineering team or whatever you want to call it that can support that type of stuff. I always like to think of it as a cross-cutting technical person who is at the service of different teams, who can help kind of roadmap these things.
I really like the discussion that we're having, that JT mentioned: we can't expect machine learning engineers to know all this stuff. You can't, right? It's unfair. It's its own topic. It's spot on. You need at least one person, if not more people, to do these types of things. And I like to think that the role of architecture is kind of the person, or people, who can take a step back and think about how all this blends together before it goes out to users. And I would say that in most small companies, you wouldn't start with an architect, but once you start getting bigger, that's when you need to start thinking about having somebody in that role. And ideally you have somebody grow into that role organically who has all the context and history and things like that. Probably commonly comes from maybe full stack: knows how the server ties to the database, ties to the front end, how to get at it from mobile. That makes sense to me. So am I right to say that an architect blueprints the whole thing and then DevOps makes it happen, at least in the cloud? That's where my confusion was, as I thought of those two roles as similar, in the way that a DevOps engineer would string these services together actually in implementation. So I think it just depends on the architect and their expertise. I've dealt with numerous different architects from numerous different clients and companies. Some architects are very high level, right? They'll draw a diagram and say: oh, okay, we have a container, these are the requirements, we need five nines uptime, we need a cluster, we need to handle this load, this security here. Here's a diagram. DevOps, you figure out the implementation to make it work. Other architects, especially folks who were promoted from the trenches: oh, okay, well, I want all this, and this is how I want it, right?
I want, you know, Fargate, and I want Kubernetes, or whatever tools they like. Here's the recipe, you just follow it. So it just depends on the architect and their skill sets and their personalities. I like to think that the more effective architects are the ones who know when to do each of those things at the right time. You don't always want the blueprint, because you might have an effective team. I like to think it's as much art as science, and as much human interaction as technical work. Sounds like the conclusion, then, is that taking on DevOps as a role, as a machine learning engineer, is too big of a pill to swallow. If you're gonna one-man-band a project as a machine learning engineer, maybe for a client, let's say it's one machine learning engineer who just landed a client, who may or may not have other people working on the project, taking on a bunch of DevOps technical know-how would be a lot to ask, because in a way it's a role in its own right. It's not just a suite of skills that's valuable for any technical person to have under their belt. So, JT, that being said, can you speak to the level of complexity involved in doing DevOps, and why maybe a machine learning engineer wouldn't want to take on all this burden themselves? Right. So security is the one that's scariest, right? Especially with machine learning: you're probably gonna be touching sensitive data, analyzing sensitive data. So if this data is exposed, well, it's bad times for you, bad times for your client, the company you're working for. So at that point, you probably want to hire a professional to take a look at your implementation. And, you know, here at Dept we specialize in that: making sure that the application that you wrote on your laptop can be deployed quickly, safely, repeatably to production. And security is probably the most important thing.
We've done that with plenty of clients. And it's not just the application that needs to be secured, it's the actual entire infrastructure too, right? There's the network: we see plenty of times where the network is public, IP wide open to the world, easily hackable. You know, there's these things called S3 buckets, where you usually store your data, that are left open. So security is the one thing where it pays to have a professional come in, take a look, and tie it down. Cool. JT, I'm curious what you think of this, but I would say everything you said is spot on. That is not something I would ever advise an ML engineer to try to dive into half-heartedly. But I think that there are certain things that an ML engineer can do, and certain skill sets that they can start to brush up on, that will position an ML project well to be taken over by a DevOps team. And I think we're probably gonna talk about some of those. So I'm curious if that rings true with you as well. Since I'm not a DevOps expert, it's just kind of on my periphery. So usually what I've seen is, Terraform is pretty easy to pick up, right? So whether it's ML or dev, they say: oh, okay, hey look, I wrote this Terraform to spin up, like, an EC2 and a VPC. And you get this Terraform and you think: oh, that's cool, I'm glad you took the initiative to learn it. But then you have to take that Terraform, turn it into modules, put on the right settings, things like that. So I think it's great that anyone learns Docker, Terraform, Ansible, whatever it is, but it still requires collaboration to make it professional, fit the organization, fit the current code base. Yeah, that's a good point. And back to your original description of DevOps, too: ML engineers shouldn't think that they're just gonna take this and kick it over the fence to a DevOps person and be done with it, and the DevOps person is gonna magically, poof, make this appear in production.
There's gonna be changes and conversations going both ways in order to make that happen. And collaboration is the key to DevOps: working together to achieve a common goal. That's definitely something we're going to highlight over and over in this conversation: the architects collaborating with the ops, and the DevOps and the devs, everyone working together. So, you mentioned repeatability, and in the land of repeatability we have this thing called infrastructure as code. You know what, do we wanna just dive into the tooling? I have JT's talking-point notes here, and he's got a really good, what looks to be, journey through the history of and what inspired the creation and then use of these tools. So, JT, yeah, go ahead and take it away. So we'll get into the landscape of all these DevOps tools, but I think it's really important to remember what we just talked about: this is a full-time job, right? And we're gonna go through all these different types of tools so that you have a lay of the land. You can hear these and be like: oh, that's the configuration management tool, da da da da da. We're not suggesting you go into a deep dive on any of these. This is so you can talk to your DevOps people, communicate with them better, or just be able to be a little bit more intelligent and ask good questions when these topics come up. So listen to JT; he's an expert in a lot of these areas, and he's got a great overview of these. Then kind of let most of the details fall outta your head and just remember the high-level concepts, and you're gonna be in really, really good shape. Alright. So we're gonna go back a little bit to the old days. And the old days weren't that old. And describe the problem that DevOps is trying to solve, or has solved, for the most part.
So in the old days, people would go into the UI and just go clickety-clickety and create servers and create virtual environments, or do whatever they needed to do. They would do it manually, that is, on the AWS console website. Well, even before AWS, right? You had VMware, you had command lines. Even with physical servers, people would go in, log onto the server, and just type in commands manually. So it was almost like a dark secret of how things would be done, how things were configured. And then when that engineer left, or that ops person left, right? Then: oh, no documentation, no idea what that person did. It's hard to manage a professional business that way. It's like managing a house that's a mess: oh god, what's going on in this server? There's trash all over the floor. And then, you know, a classic example we see all the time is: okay, one dev or one admin goes into one server, does it one way; another admin goes into another server, does it another way, even though they're supposed to be running the same app. Different, right? Different servers, different boxes. So the manual part is how ops used to operate. What DevOps did was take some of the philosophies of dev, right? You have code that you write, and you have code that you can review, and then ultimately check into a code repository like Git, so that there's a record of the changes, there's a record of what happened, and other people can run it. So it seems like a pretty basic concept now, but when it was first introduced, it was a game changer, and there was a lot of resistance too, right? Almost like: uh oh, you're trying to take over my job. So what infrastructure as code does is, there's tools like CloudFormation, Terraform, Pulumi. Well, instead of you going into the AWS console and going clickity-clickity to create a server, to create SageMaker, you can write code to do what the clickity-clickity did, and then review it and check it in.
And if you have to make changes, then you can make that change in code, review it with your team, say: oh, we're gonna change the instance from, you know, ml.m5.large to ml.m5.xlarge. Check it in, run it, and you can see the history of what happened. So you have a code file, and it deploys all of the AWS infrastructure for you, with zero clicks; you just run some script and it will spin up all of your EC2, your database, et cetera. Is that what this infrastructure as code does? Yes. And the other game changer was what I mentioned before: it used to be that different ops would do things differently. Now that you have code, you can rerun this code in another environment, right? By just passing a variable. Let's say you ran this code in dev, you tested it out, there were some bugs with the infrastructure, that happens, and then, you know, after a while you go: okay, this works. So you could run that code, pass the variable "qa" or "prod," and the exact same thing will happen in QA and prod. You can ensure what you test in dev is in QA, and what you test in QA is in prod. In the old days, that was not guaranteed at all, right? There may have been like 50 steps you had to run to deploy manually. I guarantee you, you'll skip like three of them, or you'll do three of them wrong. Recently I deployed a service using Terraform: a server, a database, and everything in a VPC. And the VPC block of code was, let's say, you know, 25 lines of code. It created a bunch of stuff I didn't know I needed to create: route table, internet gateway, network ACLs, all these things I wouldn't have known to have done myself in order to expose the service to the internet. In AWS, it seems like infrastructure as code, with its sane defaults, also helps with the simplicity of deploying architecture. Oh, definitely. That's another great concept from dev that came into ops: modularity, reusability. Oh, you have this module, we put all the complexity behind this box.
All you have to do is pass two or three variables and we'll do the right thing. Alright, you mentioned CloudFormation, Terraform, Pulumi. How do these compare to each other? So CloudFormation is AWS-specific, and Terraform is more generic; you can do multi-cloud. And Terraform is the most popular infrastructure-as-code tool. It's the one that, if you're looking for a job, you should learn. Then Pulumi comes in. Terraform is very structured in the way it wants to do things; it's almost my-way-or-the-highway. Pulumi takes this concept and adds real code to it. So if you want to do a for loop or an if statement and be more generic, you can. It's a trade-off: you can be more dynamic, but it leads to a lot of spaghetti code with Pulumi, because it's real code. But if you need that flexibility, it's good. At Dept, we recommend Terraform because it's the most popular and it's the most structured. It's easy to teach, it's easy to not mess up too much, and you get a lot of online support. It has a very vibrant community from what I've found, and I'm fairly new to it, but there are a lot of support examples. And independently, before I even talked to these guys, I came to the same conclusion: I think Terraform is the way to go. Also, HashiCorp just had a $1.4 billion IPO, so it's a real company, not some unmaintained open source thing. If you need professional support, you can reach out to HashiCorp, or to Dept, and say, hey, we have this Terraform, can you help us? And one cool thing about Terraform, as mentioned, is its multi-cloud support. If you don't like AWS, which is what we're going to be harping on in this episode, you can use Terraform with GCP and Azure and others. But in the AWS world there's CloudFormation, and now CDK. If we're so AWS-heavy, why are we saying Terraform over CDK or CloudFormation?
So within the DevOps community there's a very intense debate on how to do infrastructure as code. There's one camp, and I'll just lay it on the table, I'm in it, where I believe that infrastructure as code should be structured, that you shouldn't do a lot of weird dynamic stuff with it. If you're doing that, you're probably doing something wrong. That's why I gravitate toward Terraform, and CloudFormation is the same way: just pass me a bunch of configuration and I'll figure out how to do it. With Pulumi and with CDK, they take the complaints about that structure and say, well, I want to do this fancy thing, or I want to say: if dev, do this; if prod, do that. I want it more like how I code. Both CDK and Pulumi support Python and TypeScript; I think Java's in there too. Oh, okay, this is procedural versus declarative code. Right, exactly. So Terraform is like an HTML file, and CDK is like a JavaScript file? Yep, exactly, that's a great analogy. Terraform says, this is what must be done. CDK says, do it this way. Right. Okay, so that's infrastructure as code, that's Terraform and friends. I've heard a lot about Ansible. Where does that fall into place? So one of the confusing things about the DevOps tools is this: you have infrastructure as code, and then configuration management, and they're two different specialties. The tools that do infrastructure as code well and the tools that do configuration management well are different tools, different companies. So let me describe what configuration management does with a real-world example, and we see this pattern a lot. You would use Terraform to create an EC2 server in AWS, and once that generic EC2 server is in the cloud, now you've got to put your special business sauce into it, right? You've got to configure it: install security patches, add Unix users, actually install your business app.
Terraform is not very good at those tasks. So there's another tool called Ansible, which does that, where again through configuration you can say: okay, install this patch, you know, yum install patch-123; useradd xyz; then run install.sh. That's what Ansible does, and that's what configuration management does. Once you have a generic server, the configuration management tool installs your business logic onto it so it can actually work. Now, I thought we would do that with Docker, right? So a lot of these configuration management tools are getting deprecated, because they were designed for the old days where you would actually have either a physical server or a virtual server, a Unix server that's a blank slate, and you would have to go in and configure it. Containers are another game changer in DevOps. There's the old joke: the dev says, well, it works on my laptop, and the ops says, then let's just deploy the laptop. One of the great things about containers is that you can do everything on your laptop: you can run a Unix server, you can test and deploy and do development on that Docker container. And once you're satisfied, hey, this works on my laptop, you bundle it into a Docker image with Unix, the code, everything, and give it to the DevOps team and say, deploy it. It works on my laptop, I tested it, take this entire thing and go. The other huge benefit is that containers start up almost instantaneously, whereas these other methods can take a long time to start up. The big game changer is that what you do in dev is what you'd do in prod. You wouldn't run an EC2 server on your laptop, right? You can't. There are tools that can kind of mimic it, but Docker and containers are the way to go. It was a major game changer when it was first introduced.
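The Ansible tasks described above might look roughly like this playbook. This is a hedged sketch: the host group, package name, user, and script path are all hypothetical:

```yaml
# Hypothetical Ansible playbook: patch the box, add a user, run an install script.
- hosts: app_servers
  become: true
  tasks:
    - name: Install a security patch package
      ansible.builtin.yum:
        name: patch-123          # placeholder package name
        state: present

    - name: Add the application's Unix user
      ansible.builtin.user:
        name: xyz
        state: present

    - name: Run the app's install script
      ansible.builtin.command: /opt/app/install.sh   # placeholder path
```

You'd run it with something like `ansible-playbook site.yml`, pushing the configuration out to the servers in the `app_servers` group.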
Okay, so we have infrastructure as code, which is writing a code file that will create a whole stack on AWS or a whole stack on GCP. In the past, you would log into EC2 instances, which you may have deployed, with configuration management tools to install the patches and the software. You mentioned Ansible; what are the other ones? Sure. There's Ansible, Chef, Puppet, Salt. Those are the four main ones, and they all have pretty much equal mind share. The big difference is that Ansible is push: you run a command from a server and you push out the configuration. With Chef and Puppet, you have an agent on the server and it pulls the configuration. I prefer the push model, just to show my bias, because I like that control. They all pretty much do the same thing. And now these configuration management tools are slowly being phased out in favor of Docker containers. You can deploy Docker containers through Terraform to, like, ECS, right? Fargate. And then there's an alternative, Kubernetes. So we have Docker containers now. Talk to us about ECS, Fargate, Kubernetes: when you'd use one, when you'd use the other, what Kubernetes is, how much you can do with Terraform, et cetera. Yep. So you made a very good point: things like Ansible and Chef are being superseded by a Dockerfile. The patches you would do with Ansible and Chef, and the app installs you'd do in Ansible and Chef, now you would just do in a Dockerfile. And the good thing about a Dockerfile, again, is you can check it in; it's a real file that you can pass around. So, Kubernetes. Once you have a container on your laptop and you give it to DevOps and say, deploy this, that's where the real fun begins.
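As a sketch of that last point, the same patch/user/install steps a configuration management tool once ran against a live server collapse into a Dockerfile that's checked in next to the code. The base image, paths, and script names here are hypothetical:

```dockerfile
# Hypothetical sketch: what Ansible/Chef once did on a server, baked into an image.
FROM amazonlinux:2

# Patches and dependencies that config management used to install at deploy time
RUN yum -y update && yum -y install python3 && yum clean all

# The "business sauce": copy the app in and run its install script
COPY . /opt/app
RUN /opt/app/install.sh        # placeholder install script

# What the container does when it starts
CMD ["python3", "/opt/app/main.py"]
```

`docker build` produces an image you can run identically on your laptop and in prod, which is exactly the "works on my laptop, so deploy the laptop" win.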
And we can provide a link to this infamous cartoon, a diagram of how to deploy a container to AWS, and there are literally 15 ways to do it, which leads to a lot of confusion. Even among DevOps professionals there's a lot of confusion about how best to deploy a Docker container, and everyone has their own personal favorites. At Dept we recommend ECS, Elastic Compute... I forgot the acronym. Elastic Container Service. That's what we recommend in a professional environment, because out of all the options it's the easiest, and it has good dev support and prod support. But that's if you have a good DevOps team. If you're just a one-man show, things like App Runner are good; that's simple. There's Elastic Beanstalk; that's nice and simple. It depends on what you're trying to do. And then the big granddaddy of them all is Kubernetes, and that has a lot of hype around it. The main thing I want to stress about Kubernetes is that it's a very complex ecosystem for deploying a container. People have careers dedicated to Kubernetes alone, without any other skillset, and they make great money, because it's very complex to get right. What Kubernetes does is take a container and add all the production values to it: clustering, where you can deploy multiple copies of the container so you can manage load. If there are a hundred users, you could have two containers; if you have a thousand users, three containers, four containers. There's a lot of security around Kubernetes to have it safely talk to other containers or to external services, and that's just scratching the surface of what Kubernetes provides for you. I almost get the impression it competes with AWS; it's its own entire server-farm orchestration service. Yep.
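The clustering idea described above, multiple copies of one container behind Kubernetes, looks roughly like this Deployment manifest. A hedged sketch: the app name, image, and port are placeholders:

```yaml
# Hypothetical Kubernetes Deployment: run 3 identical copies of one container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                 # bump this number as load grows
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:1.0   # placeholder image
          ports:
            - containerPort: 8080
```

You'd apply it with `kubectl apply -f deployment.yaml`, and Kubernetes keeps three copies running, replacing any that die.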
So all the major clouds have their own implementation of Kubernetes. In AWS it's called EKS, Elastic Kubernetes Service. Kubernetes was developed by Google, so in Google Cloud it's the Kubernetes Engine, and Azure has its own implementation too. And the good thing is these are called managed services. You could build Kubernetes on your own from scratch; there's a famous guide called Kubernetes the Hard Way that gives you step-by-step instructions if you want to build it by hand. But what all the cloud providers do is say: okay, if you want Kubernetes, run this Terraform, click this button, and we'll create a Kubernetes cluster for you that works with all the AWS services. So it's not so much competition; it's just another way of deploying containers into the cloud. And it's probably the most complex way. The most complex. Why would we ever want to use it if it's the most complex? So, especially for a production load: if you have a lot of intense production requirements, like you have a lot of teams, and these teams do a lot of microservices, say you have 18 containers, 18 different images you have to maintain, Kubernetes is great for that. Automatically scaling the load up and down, it's great for that. And to be honest, there's a lot of hype around it. Tech people love hype, the new shiny, right? Oh, it's a new toy, I like to play with this new toy. That's a huge driver in a lot of architecture. Here at Dept we subscribe to keep-it-simple. The simpler the better: the simpler it is, the easier it is to maintain, easier to manage, easier to teach other people. Kubernetes is the opposite of simple. And I mentioned the pay: a Kubernetes engineer can command a pretty high salary. I've seen this so many times: well, I have an opportunity to use Kubernetes at work, I'm going to do it because I want that skillset.
The other thing, just full disclosure: I'm by no means a Kubernetes expert, by any stretch. When I think of this from an architectural perspective, I do think about staffing, and I think what we're expressing here is actually a little bit of a hot take in the DevOps community. There's probably going to be a certain portion of people who are like, whoa, what do you mean not Kubernetes? What do you mean start with ECS? And I can get that, because one of the advantages of using Kubernetes is that even though those people do command a premium, that skillset is quite available. It's also open source. The YAML files that you specify for Kubernetes will largely work between different managed service providers for Kubernetes, and Ger might correct me on this. Those are nice benefits too. But I completely agree with what Ger is saying: unless you have a team that's ready to do that, ready to deal with the complexities of managing Kubernetes services, it's much better to start simple. I can talk about my own personal journey with Kubernetes. At my previous job, Kubernetes was already there, so it gave me a good opportunity to learn it. And I made so many mistakes. Production-level mistakes, things like persistent volumes, security, networking. It's such a complex topic that it's not just, oh, okay, I did this tutorial, it works, cool. Once you actually deploy to production with real users and real load, you'll learn soon enough that things can go drastically wrong, and if you don't know what you're doing, it's hard to debug. And I'll actually send you an infamous flow chart: if things go wrong in Kubernetes, here are literally 20 commands you have to run to figure out what's going on. So you said Kubernetes is overly complex.
And now I realize you're not just saying that using Kubernetes is overly complex; you're also saying the complexity of the environment you might be trying to orchestrate corresponds with the complexity Kubernetes is built for. So if you're a startup, let's say a hundred employees, you can probably get by using AWS's services. If your project is wildly complex, then you're going to want the tool fit to the task, which is probably going to be Kubernetes. And definitely, before you go to Kubernetes, make sure it's an informed decision, and make sure you talk to people who've done it, both pro and con. You'll find people who are passionate about Kubernetes in production, so talk to them too and get their viewpoint. But it's a huge investment: hard to hire for, expensive to hire for. It's almost like a Lamborghini, right? We all love Lamborghinis, but they're hard to drive, easy to crash, and there are a lot of quirks. Like, if you Google how to back up a Lamborghini: you can't use the rear-view mirror, you actually have to open your door and physically look behind you. We all love Lamborghinis, but make sure you know how to use it. Another thing about Kubernetes: I always like to remember that Kubernetes was created to help run Google's infrastructure of containers. Google, right? It's inherently complicated. It's going to be really good, but it's going to be really complicated, and it's meant for Google-scale problems. If you're successful and you need that sort of complexity, amazing, my hat's off to you. But if you're just starting out? I don't know. And if you start simple, you're not locked in, right? Your logic is in that container. If you started with ECS, ECS is going to carry you pretty far, to Matt's point. If you ever get to the situation where, wow, we have Google-scale load: one, hats off to you, you've achieved success.
You could definitely migrate to Kubernetes. Alright, what is ECS? Okay, actually, I'm going to read this flow chart. This may be weird in audio format; I'm going to drop all of the images Ger's talking about in the show notes for everybody to look up afterwards. But I'm going to read this flow chart and see how it goes. It says: which AWS container service should I use? Where do you want to run this container? On premise? Then: Kubernetes? Yes: EKS. No: ECS. And then OpenShift, which I've never heard of: ROSA. Then it says: where do you want to run this container? In the cloud? What will the container do? Run build jobs: CodeBuild. Run batch jobs: Batch. Run apps: serverless? For real though? Yes: Lambda. No: Fargate. Serverless? No: who will manage this? Kubernetes: it'll be EKS. AWS will manage it... I'm going to skip that section. I'm going to manage it, and you're going to be doing it a lot, if you know: Lightsail. Yes: EC2. And then it says, if you want sort of both, container on premises or in the cloud: Greengrass. I would've actually thought we were going to say, like, a mix of Kubernetes and ECS, but we have a lot of container services that can be used here: EKS, ECS, Lambda, Fargate, Beanstalk, App Runner, Lightsail, EC2, CodeBuild, and Batch. Jeez, save us. Well, so even among people who are doing this day in and day out, picking the right way to deploy containers is a complex issue in itself. That's why, if you're just doing machine learning and you want to concentrate on that and not worry about this complexity, App Runner is a great way. You just say, hey, here's my container, and AWS does everything for you: creates the load balancer, creates the network. All you have to do is provide the container. App Runner is a very good thing. That's my first time hearing of App Runner. Goddammit. It was just introduced about a year ago, just before the pandemic I think, or maybe right after the pandemic.
So again, that's the other thing about DevOps: what was new five years ago may not be relevant today. App Runner is their latest toy for deploying containers, and they put all their knowledge of what went right and wrong with other products into App Runner. And then you were going to mention Elastic Beanstalk. Elastic Beanstalk is probably one of the oldest ways to deploy a container, and it does a lot for you too. So it's a little more mature, a little more battle-tested. It requires a couple more clicks, but in the grand scheme of things it's still very simple. The clickety-clickety guides you through what you want to do. It says, oh hey, do you want Python or do you want Java? Do you want machine learning, or do you want to do web apps? It's almost like a questionnaire you can click through and say, okay, got my stuff. So, coming into this discussion I thought it really boils down to EKS/Kubernetes or ECS, but then we also have Lambda and Fargate. So that's a whole other topic: there's this thing called serverless, and, you can't see this on the podcast, but I'm air-quoting "serverless." There are still servers behind the scenes, but basically you don't have to manage them. With Lambda you just say, here's my code, and it can either be code that you upload or it can actually be a Docker container. And Lambda says, okay, I'm just going to charge you for actual usage. If five people hit your Lambda, I'm just going to charge you for those five users. So it's a bit of a cost savings. But Lambda and serverless is a complex topic, so I wouldn't recommend it for a beginner; it's more for cost savings than anything else. Got it.
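To make "just upload your code" concrete, here's a minimal sketch of a Python Lambda-style handler. The function body and event shape are illustrative assumptions; the part AWS defines is that it invokes your `handler(event, context)` once per request and bills you for those invocations:

```python
import json

def handler(event, context):
    """Minimal Lambda-style handler: build a JSON response from the request.

    AWS calls this once per invocation; you pay only for those calls.
    """
    name = (event or {}).get("name", "world")  # hypothetical event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local smoke test; in AWS, Lambda supplies a real context object.
if __name__ == "__main__":
    print(handler({"name": "ml-engineer"}, None))
```

The same file can be zipped and uploaded, or baked into a Docker image, and either way there's no server for you to patch or manage.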
And I want to stress that even among people who have been doing this for a long time, I've seen so many bad implementations of Lambdas, so many bad implementations of ECS, tons of bad implementations of Kubernetes. It's like, oh, well, that's not right; it's so complex. And to toss it back on me: I'm sure if you looked at my implementation of Kubernetes from a year ago, you'd say, mm, that's bad, what you're doing there. One take, correct me if I'm wrong on any of this. The way I've thought about Kubernetes versus the hosted, managed services in the cloud, like AWS, Azure, and GCP, is as almost two different approaches. I've always thought of Kubernetes as a very general approach, almost the way Terraform is general cross-cloud infrastructure as code. I thought of Kubernetes as a way to take your Dockerfiles and, no matter where you're running them, and it can be on-premise too, in your company's data center, it'll run the same: GCP, AWS, Azure, or on-premise. And it's going to be using open source tooling instead of the cloud provider's equivalent service. So if you're going to use Kubernetes, you'll be using a Postgres Docker container, as opposed to, if you're going to use AWS, you'll be using RDS, their hosted database service. With Kubernetes, you have the whole world of open source tooling at your fingertips. You know how those things work; you're not locked in. In cloud land, using, let's say, Terraform on AWS, you're using their services. RDS is their implementation of relational databases; it's not open source, and you don't know how their Postgres stuff happens. Kubernetes will have its own secrets manager; AWS has its own secrets manager. There are pros and cons. In the case of AWS, by using their services you get all the patches, the updates; everything's a black box, so it's more managed and secure.
By using Kubernetes, you have more visibility, more control; everything's open source and you can deploy it anywhere. Is that all right so far? So you touch on some very good selling points of Kubernetes: it's supposed to be generic. There's this command called kubectl, "cube cuddle," "cube control," "cube C-T-L," different people pronounce it differently. But with that command and a YAML file, you can deploy your app whether it's hosted on AWS or GCP or Azure. At least, that's the promise. In reality, what gets you is the security model. If you deploy Kubernetes on AWS in a production-level environment, you're going to have to tie into AWS IAM. Ah, so you'll still be using these services even with Kubernetes, if you're deployed in the cloud? Right, you just can't help it, especially the security model; it's pervasive across all the services. So let's say you take your Kubernetes cluster and your app, and you try to deploy it onto the GCP Kubernetes engine. I almost guarantee that if your code has to access resources, you're going to hit that security model. And even if you don't, let's use a perfect example: a real-world app is going to write somewhere, right? On AWS you'll probably write to an S3 bucket. If you deploy this on GCP, you're probably going to have to write to the GCP equivalent of the S3 bucket. So yes, Kubernetes promises to make it generic and common, but in practice it's not. And that leads to this other hot take of DevOps: multi-cloud. You'll hear this a lot: oh, our application, our product, you can deploy it on AWS or GCP or Azure, it's easy. It's not easy. Anyone who tells you it's easy is trying to sell you something. That's the idea of deploying a Kubernetes cluster on different cloud providers.
How about Terraform, with the way it orchestrates different cloud providers? Right, so the good thing about Terraform is that it's just a configuration language. The skillset you learn coding Terraform on AWS is transferable to GCP. One of the links we'll show is a cloud comparison page: it compares, okay, if you do this in AWS, here's the equivalent in GCP, equivalent services. So Terraform is not trying to accomplish universality; it's just trying to accomplish universal accessibility to different services. Right, it's a common language. A perfect example: say you know how to write the Terraform to create an S3 bucket in AWS, and now you have to create the equivalent in GCP. You say, okay, well, in GCP here's the equivalent object storage: same style of Terraform code, just the GCP flavor. Alright, so speaking of multi-cloud, tell us about Azure and GCP. Yep. If you're going to learn one, AWS: 90% of companies are going to use it, and if you interview for jobs, you're probably going to hit AWS. Azure is the second most popular in the business world, because Microsoft doesn't just want to beat AWS, they want to beat Google. If your C-suite can negotiate well, you can almost get Azure for free; they'll practically give it away just to get you on the platform. The other thing, and this is the joke, is that if Microsoft acquires your company and you're on AWS, you won't be on AWS anymore; you're going to be on Azure, they're going to force you onto it. I have some friends who got acquired by Microsoft, and that was a big project. GCP? From a UI level, GCP probably has the best UI of the three.
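A sketch of that "same style, different flavor" point. Both snippets are hypothetical (the bucket names are placeholders), but they show how the AWS and GCP resources read as the same kind of Terraform:

```hcl
# AWS: an S3 bucket
resource "aws_s3_bucket" "data" {
  bucket = "my-example-data-bucket"    # placeholder name
}

# GCP: the equivalent object storage, same Terraform style
resource "google_storage_bucket" "data" {
  name     = "my-example-data-bucket"  # placeholder name
  location = "US"
}
```

The provider and the attribute names change, but the skill of reading, reviewing, and applying these blocks carries over directly.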
The bad thing about GCP, and Google in general, is that Google loves to kill products, and you're not guaranteed backwards compatibility in GCP. AWS will bend over backwards to accommodate things you did 10, 15 years ago. That's why you still have ECS classic; the way you configured S3 buckets back whenever S3 came out, you can still do it today, for the most part. Google hates backwards compatibility. That's why it's the least favorite of the three. Have you seen that, then? Have you seen services go non-backwards-compatible or entirely defunct? Yep. There's actually a great list of GCP things, and we can find the link and post it: hey, these are the things Google breaks, these are the things Google elected to kill. From a UI perspective, though, it's excellent, like the best UI I've used; AWS is very clunky UI-wise. The real conversation for our listeners is going to be around the machine learning services offered by these providers, and I think we should talk about the Kubernetes machine learning services too. I personally use SageMaker on AWS; I've talked about it in the last few episodes, but just to brush up: it's a suite of machine learning tools on AWS, not one machine learning tool. It offers training a model, including parallelized, distributed training; monitoring the model training process; and, when you're done, deploying it to an endpoint. That endpoint can be a REST endpoint; or one that you call as a batch endpoint, if you're just going to do one-off inferences on large amounts of data, so that you're not charged for a running REST endpoint; or, recently, serverless model calls, which is absolutely amazing. And then a whole bunch of monitoring tools on top of your model: as new data comes in from new users, does that new data distribution, over time,
drift away from the type of data you trained on? And it will email you if so, which is really powerful. Is there bias? It does all this stuff for you. And I'm completely unfamiliar with GCP's or Azure's machine learning offerings in that domain. Are you familiar with them at all? Nope. I think that's where the collaboration comes in, because even on AWS, everything you describe is new to me. I see. And this gets at two points of this podcast. One: you can't know everything. It's such a complex field, and especially if you're trying to learn both DevOps and machine learning, that's a big apple to bite into; you almost want to pick one or the other. The other is collaboration: oh wow, look, SageMaker can do all these things. If you go to a DevOps team and explain that, it's probably going to be new to 99.9% of them. So it requires a collaboration: oh hey, here's all the stuff you can do in SageMaker, let me show you what I did in dev. And then the DevOps team can take a look and say, okay, well, we need to secure this; we need to do this with a container, install patches, things like that; change your Dockerfile to install this agent, a patch, whatnot. So it's a collaboration and a learning experience for both sides. But yeah, from a machine learning perspective I'm very ignorant, and I'm hoping to get better by listening to your podcast. Hey, go back to the archives and learn about it. You might like the last two episodes, on SageMaker; everything prior to that is sort of machine learning basics. There's not a lot of cloud anything until the very end there. And I do want to mention Kubernetes: in a future episode I'll talk about the competing open source machine learning pipelining tooling, because I just mentioned SageMaker.
It has a whole suite of pipelining tooling, but it's all hosted in the cloud by AWS for you. Kubernetes has its own; it's called Kubeflow, and it can orchestrate a pipeline of machine learning tooling for you. That includes the data ingestion, engineering, and transformation, and then various forks in the road like analytics and such. So if you enjoy SageMaker and you want to do it open source, or run it anywhere, you might look into Kubernetes and Kubeflow. There are other projects out there too, like Apache Airflow and some others. Oh, the one thing I do want to mention, because it comes up a lot, is cost. If you deploy to production, definitely your CIO and your CTO and your CFO are going to say, hey, what's this expense? But it's really dangerous when you deploy to your personal account. I've gotten hit with this a lot: oh, okay, Kubernetes, this is cool, I deploy it to my personal account, leave it on for a day or two because I forget to tear it down, and next thing you know, you get a bill for 80 bucks, a hundred bucks. That can be a surprise. So the one thing I definitely recommend, if you're trying to play with this on your personal account: every cloud provider has a billing monitor that says, oh, you've spent more than 20 bucks, I'm going to email you; you've spent more than 80 bucks, almost like a credit card, I'm going to email you and inform you. So definitely, before you do anything, turn that on, or else you may get a surprise at the end of the month. So we've talked about a lot of tooling thus far, a lot of the history of DevOps, and we alluded to the idea that DevOps might not be a role you want to take on as a machine learning engineer, that it would be too big a pill to swallow: stick to your guns in the machine learning space, learn the SageMaker tooling through and through.
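That billing alert can itself be infrastructure as code. Here's a hedged sketch using the Terraform AWS provider's budgets resource; the budget name, the $20 amount, the 80% threshold, and the email address are all placeholders you'd choose yourself:

```hcl
# Hypothetical sketch: email me when monthly spend passes 80% of a $20 budget.
resource "aws_budgets_budget" "monthly_cap" {
  name         = "personal-account-cap"
  budget_type  = "COST"
  limit_amount = "20.0"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["me@example.com"] # placeholder address
  }
}
```

Applying something like this before you experiment means a forgotten cluster shows up in your inbox instead of only on the end-of-month bill.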
So for my listeners, the machine learning engineers: let's say they create an AWS account. They spin up a SageMaker project, they open SageMaker Studio, a Jupyter notebook. They start typing away, training a Keras model. They really learn SageMaker nuts and bolts, and maybe they can even deploy a SageMaker model as a REST endpoint using model deploy. Now, that machine learning engineer may be a one-man band, but they still want to get this model out into the wild. Maybe they want a web front end, a UI: a React front end on an S3 bucket, with a CloudFront distribution and a Route 53 URL. And that front end is going to have to communicate with some backend, let's say a Fargate Python FastAPI server, that then kicks off the machine learning model inference endpoint. Everything I just said I would consider totally doable by my listeners; maybe they'd orchestrate that entire stack I just mentioned using Terraform. So where does the wall start, effectively? What would be a good suite of named tools in AWS, presumably using Terraform, that my listeners could get started with to actually deploy a product? And where does the wall begin, where you really want to consider having a DevOps professional help out with the rest of the process? Yeah, that's a great question. So I don't want to dissuade anyone from learning Terraform or Docker or Kubernetes; I just want to highlight that some of these tools may have a steeper learning curve than others. And actually, that's how I learned too. I started my career as a Java dev, and I thought, oh wow, Ansible looks pretty cool, I want to play with that; Terraform looks cool, I want to play with that. Eventually it became my career. So I don't want to dissuade anyone from learning it. For me, the demarcation point is security.
How valuable is your data? If your data got hacked and leaked into the wild, is it okay? "Oh, it's just public data, no big deal." Or is it sensitive PII, medical data, consumer data, identifiable data? If the answer is "if this data got leaked, it's bad for me and my company," that's when you want to hire a professional to take a look and make sure the proper security precautions are met. Yeah, and I'll just add on top of that: if you're going to play around with a development instance with some fake data, that's great, you should do that. And if you're going to release it to some friends or friendly people, you should do that. We're not trying to scare you, but like Ger said, if you have anything identifiable there, you definitely want somebody's eyes on it from a security perspective. There are a lot of very easy mistakes you can make, and you've seen this in the news where different corporations have leaks; you don't want to be one of them. Even for the people who are really, really good at this stuff, things slip through the cracks, so it's prudent even for the best people to have a professional set of eyes look at the security of things. The other thing to consider: if you're just using EC2 instances, you want to make sure those things are locked down so people can't just go use them for something like crypto mining. I know that sounds a little paranoid, but it's a real thing. If you leak your keys or open a port that you shouldn't, people can hijack that machine and run up your bill pretty intensely. So you've got to watch out for that type of stuff too. That's more the exception; it shouldn't scare you. It's just something to be aware of.
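"Open a port that you shouldn't" is exactly the kind of mistake infrastructure as code makes visible in review. A minimal Terraform sketch (resource names, the `vpc_id` variable, and the referenced ALB security group are hypothetical) of a security group that only admits HTTPS traffic from a load balancer, never `0.0.0.0/0` on every port:

```hcl
# Hypothetical sketch: only the load balancer may reach the backend, on 443.
resource "aws_security_group" "api" {
  name   = "api-sg"
  vpc_id = var.vpc_id

  ingress {
    description     = "HTTPS from the ALB only, not the open internet"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]  # assumed to exist elsewhere
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Because this is checked into a repository, a wide-open ingress rule shows up in the diff, which is where that "professional set of eyes" earns its keep.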
And a lot of these DevOps tools, with infrastructure as code and configuration management checked into a repository, are about working with other people, with other teams. If you're a one-person shop, then you're a one-person show: you don't have to document, you don't have to communicate. So experiment, try it, learn it. We don't want to say "you can't do this." Just be aware of the complexity up front, and like me, you may learn it and think, "Well, I like this more than Java development, so I'm going to change my career a bit." Your career is a journey. All right, so security is a demarcator, or even the complexity, if you're in over your head, and you'll know it when you see it. But as far as tinkering and deploying your project, as long as you're not dealing with secure data, PII. What does that stand for? Personally identifiable information. Yep. Okay. So if you're dealing with PII, if you're dealing with security, get a professional: get these guys, Matt and J. And if you're just tinkering around and you want to have some fun: you built your machine learning model on SageMaker, you deployed it as a REST endpoint. Matt, can you guide us through an architecture? Can you talk us through what services our machine learning engineer would want to use on AWS to expose this model to the internet? Yeah. So you already said SageMaker. If you asked me to come up with a back-of-the-napkin template, I'd say throw your static site on S3, fronted by CloudFront. You mentioned it's a Python service: put that in a Docker container running on ECS. Hook it up to the Relational Database Service; I'm going to presume you need some kind of state stored there. We always use Postgres, we really love Postgres. And then, yeah, Terraform that guy up.
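That back-of-the-napkin stack can be sketched in Terraform. All names here are hypothetical, and the networking, IAM, and CloudFront blocks that a real deployment requires are elided; this is the shape of the stack, not a deployable module:

```hcl
# Hypothetical outline of the stack: S3 static site, ECS/Fargate backend,
# RDS Postgres. CloudFront, Route 53, ALB, IAM, and VPC wiring elided.

resource "aws_s3_bucket" "site" {
  bucket = "my-ml-app-site"           # React build artifacts go here
}

resource "aws_ecs_cluster" "main" {
  name = "app-cluster"
}

resource "aws_ecs_service" "api" {
  name            = "fastapi-backend" # Dockerized FastAPI container
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn  # assumed defined elsewhere
  desired_count   = 1
  launch_type     = "FARGATE"
  # network_configuration and load_balancer (ALB) blocks elided
}

resource "aws_db_instance" "app" {
  identifier          = "app-db"
  engine              = "postgres"    # "we really love Postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password  # never hardcode credentials
  skip_final_snapshot = true             # fine for tinkering, not production
}
```

The backend would call the SageMaker inference endpoint from inside the FastAPI container; everything above it is plain web architecture.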
I'm curious if J would change any of that, though? Yeah, those are all great recommendations. My advice, especially if it's not secure, sensitive data: experiment. That's how we all learn, right? Try it out and see what happens. One of the good things about AWS is that there's a lot of documentation, sometimes too much, but pick one, try it out, and see. But the advice Matt gave is excellent, spot on. And one more thing to add that we totally forgot: you should really put an automated build-and-deploy pipeline in place too, using something like CodePipeline or CircleCI, choose your flavor. That is something that will pay off in spades. It's not a hundred percent necessary, but you'll find it helps you quite a bit: take your code from source control and push it automatically out to your infrastructure. It's extra work, though. Talk a little bit about that. So this is just CI/CD? Yes. Continuous integration, continuous deployment. Yep. Speed-run the process for us: what happens when they commit? Well, generally there are a million different flavors of it, but in general it will either watch your source code repository, build periodically, or let you trigger manually. It will assemble your code, run tests, run security scans, whatever you need done, and package it up into what's usually generically referred to as an artifact, which is basically a package of files. That's usually the continuous integration piece. The continuous deployment piece is taking that package, pushing it out to your infrastructure, turning it on, and releasing it to users. It's really important to do that, to make sure the process is repeatable and not prone to human error, and it also just saves you time.
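The CodePipeline flavor of that source-build-deploy flow can itself be Terraformed. A hedged sketch: the role, bucket, CodeBuild project, connection ARN, and repository names are all hypothetical and assumed to be defined elsewhere:

```hcl
# Hypothetical three-stage pipeline: checkout, build/test into an artifact,
# deploy the artifact to the ECS service.
resource "aws_codepipeline" "app" {
  name     = "app-pipeline"
  role_arn = aws_iam_role.pipeline.arn   # assumed IAM role

  artifact_store {
    location = aws_s3_bucket.artifacts.bucket  # assumed artifact bucket
    type     = "S3"
  }

  stage {
    name = "Source"
    action {
      name             = "Checkout"
      category         = "Source"
      owner            = "AWS"
      provider         = "CodeStarSourceConnection"
      version          = "1"
      output_artifacts = ["source"]
      configuration = {
        ConnectionArn    = var.github_connection_arn  # assumed
        FullRepositoryId = "me/my-ml-app"             # hypothetical repo
        BranchName       = "main"
      }
    }
  }

  stage {
    name = "Build" # CI: compile, run tests, package into an artifact
    action {
      name             = "BuildAndTest"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["source"]
      output_artifacts = ["build"]
      configuration    = { ProjectName = aws_codebuild_project.app.name }
    }
  }

  stage {
    name = "Deploy" # CD: push the new container image to ECS
    action {
      name            = "DeployToEcs"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "ECS"
      version         = "1"
      input_artifacts = ["build"]
      configuration = {
        ClusterName = "app-cluster"
        ServiceName = "fastapi-backend"
      }
    }
  }
}
```

CircleCI, GitHub Actions, and the rest express the same three stages in YAML instead; the structure (source, artifact, deploy) is the portable idea.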
I've seen a lot of projects in the past that skimp on that step in the beginning, and you just end up paying for it as you go along. So my number one piece of advice is usually: don't skimp on your CI/CD. When you start up, it is a bigger investment, and it's not something you want to learn along with everything else. But as you think about deploying a piece of code out into the wild, if you're going to do it a bunch and this is going to become a real thing, it's definitely something you want to invest in. And it's probably something that, from a selfish perspective, you want to do too. This is actually how I learned automation: write some Java code, hey, it works locally, now I want to deploy to dev so the QA folks and other devs can take a look at it. I do it manually once, great. I do it manually twice, okay. After ten times, well, I'm not being paid by the hour here, so let me just automate this so I can check it in, a machine behind the scenes does it for me, and I can go get some coffee. Awesome, thanks guys. All right, the Ship It podcast, another show on the Changelog podcast network (link in the show notes), likes to do picks at the end of their episodes, and I like that. Matt, what is a pick? A pick is just, it could be anything, anything that you're currently into, that you're digging lately. It doesn't have to be technology related; I tend to pick things that aren't. I can go first if you want. All right. My pick is the show Succession on HBO Max. I'm late to the game on this, and some people are probably going to laugh at me, but that show is fantastic. Jeremy Strong, the guy who plays Kendall Roy on that show, is amazing. So I will pass to Tyler. All right, mine will be apropos of this episode, actually.
I created a Terraform script that spins up a cloud game streaming service: DIY cloud gaming. Go to my GitHub, lefnire (L-E-F-N-I-R-E) slash diy-cloud-gaming, a little Terraform script. Or you could just Google "Quest 2 AWS"; there's a blog post on how this whole thing works. What you do is spin up a Windows Server 2019 instance on AWS that has NVIDIA graphics drivers installed, ready for VR gaming. You install Virtual Desktop on the server, and you connect to that instance using the Oculus Quest 2's Virtual Desktop app. Now you can do AAA VR gaming without a PC, so you can play Half-Life: Alyx, Asgard's Wrath, Lone Echo, all the really impressive VR titles that could previously only be played with a gaming PC. I'm keeping my eye on NVIDIA with their GeForce Now service; it looks like they've got their eye on cloud VR too, they're calling it CloudXR. They have a server and client app actually in the works, so keep an eye on them. But you can start gaming without a PC and without any of these other services, all DIY. Maybe it's your first chance to get your hands dirty with Terraform. All right. Well, mine's going to be a little boring, but there's this language called Go, which is actually what Terraform is written in behind the scenes. It's a very C-like language. I mentioned on the podcast, hey, I used to be a Java developer, then I did some Terraform and Ansible, thought it looked pretty cool, and pivoted my career toward that. After doing DevOps for a while, I thought, I kind of miss programming, like real programming. So learning Go, which is a newer programming language, is my current thing. It ties into DevOps too, because a lot of DevOps tools, especially from HashiCorp, are written in Go, so I'd need it to hack on Terraform. It's a fun language. There's something very freeing about creating a native executable and not needing anything else to run it.
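The core of a DIY cloud-gaming script like that is just a GPU instance. A hypothetical sketch (the AMI filter, key name, and tags are assumptions, not the actual repository's contents), assuming an AMI with Windows Server 2019 and NVIDIA drivers preinstalled:

```hcl
# Hypothetical sketch: find a Windows Server 2019 GPU-ready AMI and launch
# a g4dn instance (NVIDIA T4) to stream VR from. Remember to tear it down!
data "aws_ami" "windows_gpu" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["Windows_Server-2019-English-Full-*"]  # assumed name pattern
  }
}

resource "aws_instance" "vr_rig" {
  ami           = data.aws_ami.windows_gpu.id
  instance_type = "g4dn.xlarge"   # GPU instances bill by the hour; see the
  key_name      = "my-key"        # billing-alert advice earlier

  tags = { Name = "diy-cloud-gaming" }
}
```

A `terraform destroy` when you're done playing is what keeps this from becoming the surprise bill discussed earlier.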
Isn't Docker, or Docker Compose, Go-based under the hood? So Docker Compose is interesting, because it used to be written in Python, and I think the latest version is Go-based now. I believe you're correct; it went from Python to Go. And the cool thing is, the reason Docker Compose used to not work very well with Docker itself is that they were written in two different languages and couldn't share libraries and things like that. Now they can. I have a simple answer when friends ask how to get into dev, where they should start. I say: what kind of dev do you want to do? Web or apps? JavaScript. Want to do data stuff? Python. Want to do server stuff? Go. That's my quick answer. Did I ruffle any feathers? Any alternative statement? I just would have said Python and JavaScript: Python for your backend, and then JavaScript; you'll never go wrong with JavaScript. You didn't ruffle my feathers, though; that's just my answer. It's funny, because I had this exact same conversation. My son is in college, about to graduate, so he's trying to enter the job market. He has a computer science degree, and they taught him C++ as the main language, a very traditional computer science education. I told him, well, with C++ your job market's a little limited. So I actually steered him to Python, because you can use Python as a real developer language, and if you want to go into DevOps, Python is a very good systems language too. So I recommended Python to him. Then, talking about learning, I said: well, actually, if you want a job, let me just teach you Terraform and AWS. I could teach you enough to pass an intro-level interview, and then, you know, fake it till you make it, buddy.
Oh, for anybody who wants to learn this stuff: A Cloud Guru. Oh, yeah. They have lessons on the clouds, so Azure, GCP, and AWS, and their lessons are pretty formal and dialed in because they're intended for you to pass specific certification tests, well, most of them anyway, right? Yep. They have learning tracks with these really dialed-in, professional video courses. I'll post a link to their learning-track flow chart, one of which takes you down the path of data science and machine learning specifically, by way of AWS. Obviously a lot of SageMaker stuff, but the whole path is really professionally done. It's entirely how I've learned AWS. And I'll second A Cloud Guru, because my son is going through a similar path of learning. Initially I was trying to teach him, but that didn't turn out the greatest. Then he went to YouTube, official AWS videos, O'Reilly Learning; eventually A Cloud Guru is the one he liked best. That's how I aided him in the right direction.