[00:00:00] Welcome back to Machine Learning Applied. Today we're going to be talking about developing within the AWS environment; in other words, using AWS as your local development environment.
[00:00:13] Now, before we get started, let's reflect back on a prior episode about Docker. I did an episode where I said you can package up an environment, its dependencies, and the project's source code into a Docker container, and deploy that container to the cloud. What we do is write a Dockerfile (literally called Dockerfile, with a capital D). At the top of that Dockerfile, you specify the operating system you're going to be using.
[00:00:41] And then within the Dockerfile, you specify any number of operating-system-level packages you want to install, like ffmpeg, CUDA, or cuDNN, and you might install some pip packages. You can either write those directly inline in the Dockerfile, say pip install X, Y, and Z,
[00:00:58] or you can have a requirements.txt file that gets copied into the Docker container, which then gets kicked off with a pip install of that requirements.txt.
[00:01:07] And then what you'll see in Dockerfiles is COPY some host directory to a container directory: all capitals C-O-P-Y, a space, then the location of the source code on your computer as a relative path. So say you're working within your project root, and at your project root there are a handful of miscellaneous files,
[00:01:31] and within the source directory is the actual source code for your project, all your Python files. And then within the Docker container, everything is expected to run out of the /app directory. What you'll see is COPY /src /app, and what that does is take your Python files from the source directory and copy them into the app directory of the Docker container.
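To make that concrete, here's a minimal sketch of the kind of Dockerfile being described; the base image, package names, and paths are illustrative placeholders:

```dockerfile
# Illustrative Dockerfile along the lines described above.
FROM ubuntu:20.04

# Operating-system-level packages (CUDA/cuDNN installs would follow a similar pattern)
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y ffmpeg python3-pip

# Python dependencies: copy requirements.txt in, then pip install it
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt

# Snapshot the project's source into the container's /app directory
COPY ./src /app
WORKDIR /app
```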
[00:01:58] And then you'll typically have a docker-compose.yml file, and that file will spin up a bunch of different Docker containers. It handles the orchestration of multiple Docker containers at a small scale. If you really want to deploy lots of Docker containers at scale and handle the networking between them all, you'll be using Kubernetes, not Docker Compose; Docker Compose is for very simplistic environments, like on localhost.
[00:02:26] Kubernetes is for large-scale deployment of Docker containers to the cloud. But we're not talking about Kubernetes here; we'll talk about that in a later episode. It's sort of mutually exclusive to the type of DevOps I'm talking about in this episode and the last two episodes. On localhost, you have a docker-compose.yml file: it spins up a bunch of Docker containers and handles the networking of those containers to each other. Now, as you're developing on localhost against your Docker containers, what you'll see in those Docker Compose files is volume mounts.
[00:02:59] A volume mount takes your local source directory and mirrors it into the running Docker container. The COPY command copies the files into the Docker container one time, a snapshot, fire and forget, but the volumes directive will mount your local directory onto a destination inside your Docker container. [00:03:23] That way you can actually develop within your Docker container on localhost: as you're programming in PyCharm, your changes to your source files get mirrored to those files' location in the Docker container. It makes editing your files on localhost while you're developing a breeze.
[00:03:46] But that COPY directive is the thing that's actually used when you build your Docker container: you run docker build, and then you push that up to the cloud. The COPY directive is what takes a snapshot of your files in the source directory and copies them over to that location in the Docker container.
[00:04:06] It builds the Docker container one time and then pushes that to the cloud. So the COPY command is a one-time deal, the finalization of copying your files over into the Docker container, but the volumes directive lets you mirror your development environment into the Docker container, so you can edit within your Docker container from your local environment.
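A hypothetical docker-compose.yml fragment showing the volumes directive doing exactly that mirroring:

```yaml
# The volumes entry mirrors the local ./src directory into /app inside the
# running container, so edits on localhost appear there immediately.
services:
  server:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./src:/app
```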
[00:04:27] Now, this seems like a lot of weird Docker stuff before getting into AWS, but you'll see why there's an analogy between this Docker and Docker Compose stuff and developing on AWS from your local environment.
[00:04:40] The other reason I'm bringing up Docker again in this episode is that Docker is still going to be used quite extensively in AWS. When I talk about using AWS as your local environment, Docker is used very extensively all over the cloud, all over the internet, all over AWS, even when you're using their managed services, like AWS Lambda, AWS SageMaker, and ECS Fargate.
[00:05:06] These are all things you can use Docker within, and it's recommended that if you have that option, you do. Take AWS Lambda, which lets you deploy single Python functions to the cloud, either as a REST endpoint or as some Python function that you'll call as a one-off within your tech stack.
[00:05:27] Well, you can write your Python function, provide a requirements.txt file along with it, and deploy that as a Lambda. But if you have a lot of requirements in your requirements.txt file, it's preferred that you instead package this all up as a Docker container and deploy your Lambda function as a Docker container. And that's for various technical reasons. One is that deploying Lambda functions the traditional way means building those pip requirements on localhost, zipping them, putting them into an S3 bucket as a zip file, and then deploying your Lambda function from there, which is pretty sloppy in my opinion.
[00:06:11] There's also a file size cap, and if you have a lot of requirements, you'll exceed that cap very easily. And oftentimes the default Lambda environment, say the Python 3.8 Lambda environment, doesn't have the operating system setup needed for a lot of these pip installs, NumPy being a huge one: NumPy doesn't work out of the box in the default Lambda environment.
[00:06:37] So getting it working in the default Lambda environment is kind of a pain in the neck. Instead, just dockerize your app and use that Docker container as your Lambda function, because installing NumPy within a Docker container is all done for you. That's the magic of Docker containers: [00:06:52] they're siloed environments at the operating-system level.
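As a sketch of what that looks like, here's a container-image Lambda, assuming a module app.py exposing a handler() function; AWS publishes base images with the Lambda runtime baked in:

```dockerfile
# Container-image Lambda sketch: the base image provides the Lambda runtime.
FROM public.ecr.aws/lambda/python:3.8

# pip installs (NumPy included) run against the correct OS setup here
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the function code into the task root and point Lambda at the handler
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
```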
[00:06:55] Okay, so a bunch of Docker buildup. Now let's talk about AWS. Here's a pain point I suffer from when I want to deploy Gnothi to the cloud. I have a docker-compose.yml file on localhost.
[00:07:08] It has a Postgres Docker container, a FastAPI Docker container, and a client Docker container. The server is able to communicate with the database just fine because they're sharing the same network bridge; it just accesses port 5432 on localhost. And the client is able to access the server because, again, the Compose setup handles all the proxying of network requests and exposes the ports to localhost on my behalf.
[00:07:40] And so the client hits the server, the server hits the database, and then the database goes back to the server and back to the client. Fine and dandy on localhost. But when it comes time to deploy this to the cloud, that's not how AWS operates; AWS is much more complex than that. In order to get my server's Docker container into the cloud, I first have to push the Docker container to ECR, Elastic Container Registry, on AWS.
[00:08:09] Then I need to set up AWS Fargate, which is a sub-service of ECS, Elastic Container Service. ECS is for Docker stuff: for deploying your Docker containers in an AWS-managed environment. I have to pull from ECR to Fargate and then set up my whole Fargate stack. Okay, so step one was ECR, step two is setting up Fargate, and step three is getting network requests to my Fargate container. How do I do that on AWS? You'll be using an Application Load Balancer (ALB), or you'll be using API Gateway. Both of those services need to be set up for handling network requests, proxying HTTP requests on down to your Docker container, and those services have to be tied to a domain name.
[00:09:04] That domain name is going to come from AWS Route 53. And if you want SSL, if you want HTTPS, then you're also going to need ACM, [00:09:17] the AWS Certificate Manager.
[00:09:19] My database is going to be deployed to RDS, the Relational Database Service, and exposing that database to my Docker container on Fargate is no small ordeal. I have to set everything up within an AWS VPC, a virtual private cloud.
[00:09:40] I need a private subnet and a public subnet with an internet gateway. The server needs to be placed within the private subnet, and in order to have outbound traffic, it also needs a NAT gateway.
[00:09:54] The database needs to be within the private subnet, and it also needs database subnets associated with it. All the parts within this stack need security groups set up so that they can communicate with each other. And then the part that's exposed to the internet, either the Application Load Balancer or API Gateway, is going to be within the public subnet.
[00:10:19] So that was a whirlwind. Did that confuse you? Good, that's the point I'm trying to make: everything you do on localhost is almost an entirely different language from how you're going to be putting this into the cloud, onto AWS. So what I offer you instead is: stop developing on localhost in a Docker Compose file, unprepared for AWS when that time comes, and that time will come. Instead, develop everything directly within AWS; develop your entire environment from the get-go in an AWS tech stack.
[00:11:03] Now how the heck would you do this? You have your local computer, and then AWS is the cloud, because that's for hosting things in a production or staging environment. What do you mean, develop in AWS? I'm developing on localhost and AWS is the internet; that doesn't make sense. Well, it kind of doesn't make sense, but there are ways to do this, and that's what we're going to be discussing here.
[00:11:26] There are a number of ways to do this, and I'm going to tackle them bit by bit. The first is the most obvious way to handle this, and that is to set up your environment in the cloud, in AWS, first. What you'll do is go to AWS, create an account, and go into the console. The first step is to go into AWS VPC. A VPC, or virtual private cloud, is sort of the entry point to all your AWS stuff. It's what encapsulates your services together into a single unit, allows them to communicate with each other through networking, and keeps them private from the internet, safe and secure. Then you'll decide how you're going to expose that VPC to the internet, as the case may be.
[00:12:13] So for example, if you want a REST server or a WebSockets server or GraphQL, you'll set up your VPC, by way of security groups and subnets and internet gateways and NAT gateways and all these things, such that one port is exposed to the internet and strung up to some service. That service may be EC2 or ECS or Fargate or Lambda.
[00:12:37] So you go and create your VPC, you set up your subnets and your security groups, then you go over to the IAM console and set up your user and some permission stuff. Then you head over to AWS Lambda.
[00:12:53] And from within Lambda, you can write your Python code as one-off Python functions. Then you tie that up to API Gateway and expose API Gateway to the internet. What's cool about Lambda is that there's actually a code environment on the Lambda console; you can write the code for your Lambda functions right there.
[00:13:16] You do it in the web browser, using their Python IDE or their JavaScript IDE. So this is option number one. Option number one is not having a local environment whatsoever: you write all your code in the cloud in these Lambda function handlers, because there's an IDE on Lambda that lets you edit your Python code, deploy these Lambda functions, test them, et cetera.
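For a sense of scale, a Lambda handler you'd type into that console editor can be as small as this (the event shape below assumes an API Gateway proxy integration):

```python
# Minimal Lambda handler returning an API Gateway-style proxy response.
import json

def handler(event, context):
    # Query parameters arrive on the API Gateway proxy event; default if absent.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```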
[00:13:41] There's also an AWS service called Cloud9. It's an entire IDE; it sort of replaces Atom or PyCharm on localhost. Now, Cloud9 does not hold a candle to PyCharm, let's be clear. Cloud9 is meant for simplistic editing of code in the cloud, on AWS.
[00:14:03] But it's another way of handling this option one I'm discussing, which is not having a localhost whatsoever and instead doing everything in the browser on your AWS console. In Cloud9, you can git clone a repository from GitHub and edit your files, and it keeps track, of course, of changed files in git.
[00:14:27] So after you make those changes, you can git commit and push them. And Cloud9 is strung up to other AWS services like ECR and Lambda, so it lets you run your services for testing your Python functions. So if you were serious about editing your code in the cloud, you would not be using the Lambda code editor as your bread and butter.
[00:14:52] You'd be using Cloud9 to edit your Python files and string those up to Lambda, using Cloud9's functionality for connecting your code to Lambda. The Lambda code editor is more for quick edits to your Lambda functions. And then of course, per the last two episodes, SageMaker Studio has IPython notebooks in the Studio environment.
[00:15:19] So you can edit, run, test, and deploy your machine learning code in SageMaker Studio. It's using a different IDE than the Cloud9 IDE; it's using IPython, these Jupyter notebooks. Amazon didn't write the Jupyter stack;
[00:15:36] it's just what's commonly used by data scientists and machine learning engineers, so they set up an environment on SageMaker so you can use the tools you're familiar with, namely Jupyter notebooks, IPython notebooks. So if you're writing, testing, running, and deploying your machine learning code, you can do that all in the cloud in SageMaker Studio, in IPython notebooks.
[00:16:01] And if you're doing the same for web app or server application code, you can do that in Cloud9. Now, that's a cool option, developing everything in the cloud. There's a handful of benefits there. One is that you're not tied to your computer. You're not stuck with a specific operating system supporting some functionality you need. Or let's say you have a work computer and a home computer, and you want to be able to pop back into your development environment. Well, it's in the cloud, so it's propagated from your desktop at work to your desktop at home,
[00:16:36] because it's just running in a web browser. Something I faced recently, actually: my computer died, my Mac died, and I had to transfer my environment over to my PC, and that was kind of a pain in the butt. If I had set this all up in Cloud9 and SageMaker Studio, I wouldn't have had to deal with that.
[00:16:51] The downside of this setup is that Cloud9 and SageMaker Studio are not enterprise-grade, super powerful IDEs, and a lot of times we really like our localhost environment; we like the tooling we have on our computer. I love PyCharm and I love DataGrip. DataGrip is a JetBrains IDE for database management: I can browse the tables, look at table metadata, run SQL queries, sort and edit inline, et cetera. And PyCharm is my preferred IDE for Python development.
[00:17:30] It is just wonderfully powerful. Cloud9 and SageMaker Studio don't hold a candle to PyCharm when it comes to raw Python editing capabilities. So I'd still rather use these tools, and therefore I don't use option one that I just discussed. Instead, let's move on to option two. Option two is that you set up everything in the cloud, like I just mentioned: you have your VPC and your IAM roles and your IAM user, you have your RDS database, your Lambda functions, your ECR Docker repository, API Gateway, blah, blah, blah.
[00:18:06] But when you want to edit your code on localhost and run some unit tests against services in your tech stack on AWS, you have to take one step, and that is to connect your local computer to your AWS tech stack's VPC by way of something called a client VPN.
[00:18:37] What that does is allow you to connect your local computer over an internet tunnel to your AWS VPC. And now when you want to connect to a database in your VPC on AWS, your RDS database, you can do so, because you're operating within the VPC. You wouldn't otherwise be able to, because your database wants to be contained within a private VPC.
[00:19:05] It doesn't want to be exposed to the internet. You only want your database accessible to services running as servers, whether as Lambda functions or on Fargate or ECS, and those servers also want to be within the private subnet of your VPC, accessible to the internet only by way of the internet gateway on the public subnet, which is tied into either API Gateway or an Application Load Balancer. So you don't have access to these resources, these services, on localhost, but you can get access to them if you connect to your VPC by way of a client VPN.
[00:19:48] So what would you do? Well, you'd write your Lambda functions as Python functions, and you'd test those functions. Now, these functions may be making requests to a database on RDS, or they may be grabbing secret access keys using AWS Secrets Manager. That's one way to store private data, like your database username and password.
[00:20:12] These are not things you want to store in a config.json file or as environment variables. These are things you want to store in a secrets manager, which handles secret rotation and encryption and all these things automatically for you. Another thing you don't really think about until you start pushing your stuff to AWS.
[00:20:32] So your Python function can access your RDS database and your Secrets Manager secrets, and maybe SQS and SNS and these other AWS services. And you can write your Python function and run a unit test against it, and it will work, because you're operating within the VPC of the deployed AWS environment running in the cloud.
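Here's a rough sketch of that workflow; the secret name is hypothetical, but the boto3 calls are the standard Secrets Manager API:

```python
# A function destined for Lambda, fetching DB credentials from Secrets Manager.
import json
import boto3

secrets = boto3.client("secretsmanager")

def get_db_credentials():
    resp = secrets.get_secret_value(SecretId="gnothi/db")  # hypothetical secret name
    return json.loads(resp["SecretString"])

# Because the client VPN puts you inside the deployed VPC, a plain pytest
# test on localhost can exercise this against the real stack:
def test_credentials_present():
    creds = get_db_credentials()
    assert "username" in creds and "password" in creds
```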
[00:20:52] And then when it comes time to actually deploy those functions, you do the step of packaging those Python functions up into zip files or Docker containers and pushing them up to AWS Lambda.
[00:21:07] And then finally, the third option, the last option: there's a service called LocalStack. It's an open source project that replicates the AWS tech stack, all running in Docker containers on localhost. You have a docker-compose.yml file that spins up a whole bunch of LocalStack services.
[00:21:35] And those services are running on localhost, not on AWS, and they are fake AWS services. So you have a LocalStack S3 service. What you'd do is go into your docker-compose.yml file, enable the S3 service, and say docker compose up, and LocalStack will spin up a Docker container whose endpoints, the ones your project accesses, replicate everything AWS's S3 service offers. The same goes for ECS and ECR and CodeDeploy and CodePipeline and RDS, and almost every service that AWS offers.
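An illustrative docker-compose.yml for LocalStack; the single edge port and the SERVICES list follow LocalStack's conventions, but check your version's docs:

```yaml
services:
  localstack:
    image: localstack/localstack
    ports:
      - "4566:4566"   # one edge port fronts all emulated services
    environment:
      - SERVICES=s3,lambda,apigateway,sqs
```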
[00:22:20] That's actually quite an undertaking. It's a really powerful project, and I'm surprised it works as well as it does: they're accounting for all of the endpoints you'd call on the AWS stack, whether by REST calls, CLI calls with the AWS CLI, or boto3 calls.
[00:22:46] This LocalStack project tries to replicate all AWS services, all the calls that can be made to those services, and how those services run, all on localhost using Docker containers. It's pretty impressive, and it's a daunting and overwhelming undertaking; there have got to be some loose ends within the LocalStack project, I imagine.
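From your code's side, the only change versus real AWS is pointing the client at the LocalStack edge port; dummy credentials are fine because LocalStack doesn't validate them:

```python
import boto3

# boto3 aimed at LocalStack instead of the real AWS endpoints.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # LocalStack edge port
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)
s3.create_bucket(Bucket="my-test-bucket")
print(s3.list_buckets()["Buckets"])
```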
[00:23:14] I haven't used it extensively, but the amount I have used it has worked surprisingly well, so I'm going to keep using it for the time being. One thing to note is that it has a free tier and a paid tier. The free tier covers all the typical serverless stuff, like Lambda and S3 and DynamoDB, and the paid tier covers the rest, like RDS and ACM and Route 53 and all these things.
[00:23:45] It's 15 bucks a month, if I recall correctly. So if you're going to be using LocalStack extensively for setting up your AWS stack, you'll definitely want their pro version. One benefit of LocalStack over an actually deployed AWS stack is cost savings, because if you deploy an AWS stack and you're getting into that stack from localhost by way of a client VPN, running those services in the cloud may cost you quite a lot.
[00:24:19] For example, RDS, the database service, is kind of an expensive service. Previously I was running my Postgres database as a local Docker container; now I'm running it as a hosted RDS database on AWS, the main reason being that I want to develop against what I'll actually be interfacing with in the real world. When I'm developing against a local Docker container hosting PostgreSQL, well, that's really easy to work with;
[00:24:54] it bypasses all the stuff I'd need to set up in the cloud. So instead, I deploy an RDS instance to AWS, I client-VPN into that VPC, and I actually have to make sure my IAM policies and my subnets and my security groups are all set up correctly so that I can connect to the database.
[00:25:16] So it's making sure I do things right the first time: measure twice, cut once. And running that RDS database is a little bit expensive. It's not terrible, but it's a little bit expensive. If instead I ran all that on localhost using LocalStack, which uses Docker containers, the environment is set up such that it acts the way RDS would act in the cloud.
[00:25:38] I still need to set up those VPCs and subnets and security groups in order to connect to my local RDS database, but I'm saving the cost of running that database in the cloud by paying 15 bucks a month for the pro version, and that's still cheaper than what I'd pay for spinning up an RDS instance multiple times throughout the week in the cloud.
[00:25:59] And it all runs a lot faster, because it's running on localhost. There's low latency in making these network requests to localhost as opposed to the cloud; connecting to and interfacing with these services will be a lot faster on localhost than over a client VPN into a VPC.
[00:26:18] And if I make changes to my AWS stack by way of Terraform, for example, which I'll talk about in a bit here, then deploying those changes is a lot faster on localhost using LocalStack than it is on AWS. When you actually kick off infrastructure changes on AWS, it's like bringing hardware online and moving things from one physical location to another so that your architecture changes get reflected in the cloud, before it's available for you to VPN back into and test your changes. Doing all that on localhost through LocalStack just brings the Docker containers down and brings them back up the way it needs to.
[00:27:03] And it's really fast. It's really easy to deploy your infrastructure changes using Terraform on localhost with LocalStack.
[00:27:10] Now, I brought up before the idea of mounting your source directory into your Docker container, as opposed to copying the source into the Docker container for a build. There's essentially an equivalent of that in LocalStack.
[00:27:25] There's a way in LocalStack to mount your localhost source files into the Lambda functions that are deployed as Lambda REST endpoints in your LocalStack infrastructure. And that is the key here to treating AWS as a local development environment, because otherwise, without being able to mount your code into Lambda functions that are quote-unquote deployed, you're going to have to basically write some code and then deploy it to Lambda.
[00:27:59] Did it work? No. Okay, tweak that code, deploy it to Lambda, did it work? Man, that's going to be a really painful process. So another huge benefit of LocalStack is that it lets you mount your local Python functions into your quote-unquote deployed Lambda functions, so you can continue to edit your Lambda functions inline in PyCharm while developing, running, and testing your code.
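As a rough illustration, and treating the exact mechanism as an assumption since it varies by LocalStack version, recent releases expose a special hot-reload bucket name whose key is the local path to your code:

```python
# Sketch of LocalStack's Lambda hot-reloading: the code is mounted from the
# host path rather than uploaded, so edits in PyCharm show up immediately
# in the "deployed" function. The magic "hot-reload" bucket is a LocalStack
# convention; confirm against your version's docs.
import boto3

lam = boto3.client(
    "lambda",
    endpoint_url="http://localhost:4566",
    aws_access_key_id="test",
    aws_secret_access_key="test",
    region_name="us-east-1",
)
lam.create_function(
    FunctionName="my-func",
    Runtime="python3.8",
    Role="arn:aws:iam::000000000000:role/lambda-role",  # dummy role; LocalStack doesn't enforce it
    Handler="app.handler",
    Code={"S3Bucket": "hot-reload", "S3Key": "/home/me/project/src"},  # hypothetical local path
)
```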
[00:28:23] So those are your three options. Option one: develop everything in the cloud. You don't even use a local IDE; you use Cloud9 and SageMaker Studio.
[00:28:34] Option two: spin everything up in the cloud but develop on localhost. While developing on localhost, you're still connecting to services in the cloud. You're not connecting to Docker instances running on localhost; no, you're connecting to running AWS services, or Docker containers running on AWS services,
[00:28:57] and you're connecting to those by way of a client VPN. And then the third option is running everything on localhost in LocalStack, which replicates the AWS tech stack. Now you can access it all on localhost, and you can presumably connect to it with a client VPN as well, if you want to set up your client VPN against LocalStack on localhost so it's ready to go in your cloud environment when the time comes and you want to run a few quick tests against your deployed environment before kicking everything off. One other major benefit of developing against AWS first is that you become acquainted with the offerings of AWS, and you start to realize there are a lot more offerings on AWS that can replace things you would otherwise use in your project as pip packages, for example,
[00:29:50] offerings that will get the job done better, more securely, and hosted, and that, if you had developed everything on localhost in a Docker container, you might not have known about; you might have forgone them. And then when it comes time to deploy to the cloud, you wish you'd used that AWS service instead.
[00:30:09] For example, in the SageMaker episodes, I talked about using Data Wrangler to transform your features and impute missing data and all these things. Well, if I had known about Data Wrangler when I was writing a lot of my machine learning code in the past, I would not have done that custom, by hand, in Pandas, because Data Wrangler's handling of it is going to be more robust, simpler to set up, and, importantly, scalable, with data coming in from some stream of the data lake I'm going to be using.
[00:30:44] Eventually, deploying Gnothi, I'm going to need that data pipeline to scale, and the way it's written now is not scalable. So I will be transferring my custom feature transformation code to a data pipeline so that it can scale, and so there are various points in the pipeline I can tap into,
[00:31:03] for example, getting the journal embeddings at different entry points of my machine learning architecture. I wish I'd known about that first, because I wouldn't have written it the way I did; I assumed this needed to be done in Python. Another component:
[00:31:20] SageMaker Experiments, their hyperparameter optimization capability. Had I known about it, I would rather have used Experiments, so it could distribute and scale my hyperparameter optimization jobs, rather than the way I handle them at present, all on one computer, in my GPU-based Docker container.
[00:31:41] Well, it doesn't stop at SageMaker. There's a whole bunch of services on AWS that can replace components of what would be, for example, your Python server. Another Docker container in Gnothi is the entire web server. The whole thing runs on FastAPI, and FastAPI handles a whole lot of things for you.
[00:32:02] And there are third-party plugins for FastAPI that handle other things. One example is the proxying of requests. Well, API Gateway or an Application Load Balancer will do that automatically for you. You don't need to be aware of, like, nginx and how we're moving HTTP requests from the internet to your container.
[00:32:24] Another thing is load balancing. Or, for example, let's say you want some REST endpoint of your server to be throttled, so that any one user on the internet can't hit that endpoint a million times a minute. Well, API Gateway has that built in. You can just click a checkbox and say: throttle this route;
[00:32:45] we only want one user to be able to hit this endpoint X times per second. Previously, I was using a FastAPI plugin for throttling specific REST routes. The plugin is called SlowApi; get it? FastAPI, let's slow them down. Well, I'm going to gut that and use API Gateway's built-in handling of throttling.
[00:33:07] And then finally, another example is the storing of user accounts. This one's really important. Currently, I'm using a FastAPI plugin that handles storing user accounts in the database, their username, hashed password, and email, invalidating them as necessary,
[00:33:25] sending forgot-password emails if necessary, sending an activation code. And I have to string all that up myself. I have to mark those auth routes as throttled using SlowApi, and I have to send up a JWT, a "jot" we call it, which is sort of the authentication token commonly used in the web development space.
[00:33:52] When you log into a user account on the server, a JWT gets sent down to the client, and that gets stored somewhere like an HTTP-only cookie in the web browser. Then you send that JWT up to your server to authenticate REST requests for that user. You also have to handle invalidating that JWT if the user changes their password, expiring it every five minutes or so, requesting a refresh token, and all this stuff.
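Here's a sketch of that round trip using the PyJWT library; the signing key is a placeholder, and the five-minute expiry mirrors the flow just described:

```python
import datetime
import jwt  # pip install PyJWT

SECRET = "server-side-signing-key"  # placeholder; store this in a secrets manager

def issue_token(user_id: str) -> str:
    # Sent down to the client at login; stored in an HTTP-only cookie.
    exp = datetime.datetime.utcnow() + datetime.timedelta(minutes=5)
    return jwt.encode({"sub": user_id, "exp": exp}, SECRET, algorithm="HS256")

def authenticate(token: str) -> str:
    # Raises jwt.ExpiredSignatureError once the five-minute window passes,
    # at which point the client falls back to its refresh token.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    return claims["sub"]
```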
[00:34:24] All of that stuff around authentication took me weeks, even with the plugins and tooling available to me in the FastAPI ecosystem. And I'm just not comfortable handling authentication on my own, even with it being managed by open source projects. I would rather there be a service dedicated to user account storage and authentication, and all of the security and the emails, new account, forgot password, activation,
[00:34:55] invalidation, and all these things. I want a service that does this automatically for me. And lo and behold, AWS has a service called Cognito. I did not know that; I didn't know it until I started putting a lot of my stuff on AWS. If I had known in advance, I would have developed the user account system in Cognito.
[00:35:15] Now I'm migrating the user account system from FastAPI to Cognito. So one benefit, had I been developing against AWS first instead of a Docker-based local environment, is that I probably would have discovered this service early on, and then I would've measured twice and cut once.
[00:35:39] In fact, as I'm going through my server, module by module, endpoint by endpoint, what I find myself doing is gutting the entire server stack, the entire server stack, and instead moving everything into individual AWS Lambda functions. So previously I had one big old Docker container with a bajillion Python functions, all as FastAPI REST endpoints.
[00:36:10] Well, now, with all the tooling available to me by way of API Gateway, API Gateway's WebSockets support, its authentication and authorization handling by way of Cognito, and the Authorization header hitting API Gateway's JWT validator, I can just write these functions as single Python functions, without any tooling like database connection pool management and stuff like this,
[00:36:40] and instead leave it to AWS to handle all the meta-tooling I would otherwise be handling myself, by deploying each function as a single Lambda REST endpoint.
[00:36:54] In a way, a lot of AWS's offerings replace not only your infrastructure stack, which is obvious, because that's what AWS is, cloud hosting, thereby infrastructure, but also a lot of the modules and plugins and frameworks you'd otherwise be using at the server or hosting layer.
[00:37:17] Now, setting up your environment the traditional way is to go to the AWS console in the web browser, and then you click around: you set up a VPC, an IAM user, an RDS database, some Lambda functions; you string everything together and put it all behind API Gateway, clicking around and typing things in. This way is not manageable, and it becomes more and more unmanageable over time.
[00:37:44] You want to be able to track these change sets, these changes you make to your stack, in some way you can replicate in the future. So for example, say you're spinning up a tech stack, infrastructure we call it, on AWS, setting up a database and API Gateway and Lambda functions,
[00:38:06] and you're doing this to get away from your local docker-compose.yml file, and you previously had that docker-compose.yml file in git, on GitHub. Well, then future users, and future you, will miss the tracking of infrastructure changes in git, and you'd have to somehow describe to a future user how to set up your tech stack with clicks. Keeping track of and managing your infrastructure in the web console is a non-starter.
[00:38:36] It is not the way to manage your infrastructure on AWS. Instead, what we use is a concept called infrastructure as code. There are a number of projects out there for this, Chef and Ansible among them, but a very popular one is called Terraform, T-E-R-R-A-F-O-R-M, Terraform.
[00:39:00] That's the one I use. Terraform lets you write code that sets up your tech stack on AWS; it's infrastructure as code. That way you can put your Terraform files into git, into your GitHub repository, so that when you move from one computer to the next, or another user comes on board and wants to set up the same infrastructure for testing the code and running the project, they can just run terraform init and then terraform apply, and it will take those Terraform files and set up the infrastructure on AWS.
[00:39:39] So what do you do with Terraform? The first step is to make sure you have your AWS IAM credentials set up on localhost. Terraform will use those IAM credentials to spin up services in the cloud, making sure it has the right permissions to create managed services on your behalf on AWS.
[00:40:03] And then you write these Terraform files. One might set up your IAM roles, another your Lambda functions, another your RDS database, another your VPC, your security groups, your subnets, your client VPN, your certificates on ACM for SSL/HTTPS, and your Route 53
[00:40:27] domain name, and all these things. You take some off-the-shelf sample Terraform files, modify them to your liking, and then run terraform init and terraform apply, and it sets up your entire infrastructure, your entire tech stack, on AWS in the cloud.
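A minimal, illustrative Terraform file with placeholder names, just to show the shape of the thing; terraform init and terraform apply against this would build the resources in your account:

```hcl
provider "aws" {
  region = "us-east-1"
}

# A VPC and a private subnet, the kind of building blocks described above.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}
```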
[00:40:45] And then you can connect to that tech stack with your client VPN. At the end of the day, when you're done with your development, when you're done coding against this stack, you can run terraform destroy, and it will bring all of that infrastructure offline.
[00:41:02] So if it weren't for Terraform or CDK or the Serverless Framework, or one of these other infrastructure-as-code projects I'll be discussing in a bit, I would not be doing this episode, because it would not be worth managing the setup of your infrastructure all through the web UI. If that's what you had to do anyway, if you were stuck stringing together your entire infrastructure on AWS with clicks and web forms, then the pain point I mentioned at the beginning of this episode, your local environment being different from your cloud environment, well, that would be a moot point: you'd have to set up your cloud environment the hard way anyway. But because we have these infrastructure-as-code projects like Terraform, we can orchestrate the development of our cloud environment in code, and then we can put ourselves into that environment.
[00:42:02] And that way, like I said, we measure twice, cut once. We set up our environment the way it's intended, so we can develop against it. [00:42:11] We do the hard work first, and then we can replicate that environment to a staging environment, a development environment, a testing environment, [00:42:20] and finally to production.
[00:42:22] And Terraform is not the only option out there. Like I said, Ansible and Puppet and Chef are all popular frameworks as well. Among other frameworks, a little more popular and modern, that I see in use today, there's Amazon CDK.
[00:42:38] That's the Amazon Cloud Development Kit. It's essentially like Terraform: an infrastructure-as-code [00:42:48] framework, written by AWS for AWS, for deploying your services to the cloud and taking them offline if you need to in the future.
[00:42:57] Now, why the hell would I not use CDK? I'm championing everything AWS in this and the last two episodes, and CDK is written by AWS for AWS. Why would I not be using CDK, and instead using Terraform? Well, there are a couple of reasons, and they're a little personal to my projects. One is that at present, CDK is a little newer than Terraform, and one of the limitations I find with CDK is that it doesn't support some of the cloud-native services. One sticking point I got hung up on when I was working on a project at Dept was that I wanted to deploy a chatbot, and I did this by way of AWS Lex. AWS Lex is a chatbot service. It lets you set up a bot with a handful of conversation flows, and any number of responses to given utterances,
[00:44:03] and then the slots, the named entities, that we're going to pull out of the user's responses. So if you had a customer service bot and it said, "Hi, how can I help you today?" and the person said "credit card help," it would pull out "credit card" as a slot. Well, setting all this up: this is not a service you manage, like Fargate, in the traditional sense on AWS. This is a cloud-native offering; cloud-native means you make REST calls or boto3 calls, and you don't have to manage the running and hosting of servers.
[00:44:38] CDK does not support. Currently November, 2021 does not support setting up a Lex bot, but there's a bunch of stuff you need to do to set up a Lex bot sure. It's not a managed service, but there is still stuff you need to do to set it up. You've got to set up these example conversations and the types of slots we want to pull out and stuff.
[00:44:59] Terraform supports that, but CDK does not. Terraform supports a lot of services, a lot more than most of the other infrastructure-as-code projects out there that I've found, including most of the cloud-native and serverless capabilities on AWS and Azure and GCP. So that's the main reason I chose Terraform over CDK.
[00:45:25] Another reason lots of people choose Terraform over CDK is that Terraform is universal: it works on GCP and Microsoft Azure and other cloud hosting providers, whereas CDK is just for AWS. The benefit of CDK is that it's easier to use, and it's Amazon-first, a first-class citizen.
[00:45:50] So CDK will continue to improve at a rapid clip, supporting more and more AWS services and tying into them very effectively, maybe in a way where Terraform, because it covers such a wide territory, would give any one AWS service a little second-class support, just by the nature of it being such a big project and undertaking, and CDK would be a bit more fine-tuned and dialed in, with first-class support for these AWS services.
[00:46:24] I don't know that that's really something you need to worry about. I have not hit any shortcomings on AWS with Terraform at present; it supports everything I've ever needed it to, in a very efficient and bug-free capacity. But the biggest difference I see is that Terraform is very verbose.
[00:46:47] It is complex; there's a huge learning curve. CDK is really dialed in and sleek, very efficient and streamlined. CDK is less code than Terraform, but Terraform covers more services. Take your pick; it's up to you. I'm not going to champion one over the other, but I personally use Terraform.
[00:47:06] And then there's this thing called the Serverless Framework, which is a really popular infrastructure-as-code framework, but it mostly supports serverless technologies. It's really dialed in for AWS Lambda and DynamoDB and API Gateway, and less so for RDS, for example. RDS is not a serverless service; [00:47:28] it's a managed, hosted database in the cloud.
[00:47:32] So the Serverless Framework is less likely to be compatible with these non-serverless services and more compatible with serverless services like AWS Lambda and DynamoDB. That's another reason I choose Terraform over Serverless. Now, finally, we talked about LocalStack, hosting your entire AWS tech stack on localhost. Terraform and CDK are compatible with LocalStack.
[00:48:01] You write code in your Terraform files that tells Terraform we're actually pointing at localhost, not at the cloud. You set up a whole bunch of endpoints inside your Terraform file: you say S3 goes to localhost, DynamoDB goes to localhost, API Gateway goes to localhost, and then the next time you run terraform apply, it runs against your LocalStack environment.
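A sketch of what that provider block looks like, following LocalStack's documented pattern of routing every service through the one edge port; adjust for the services you actually use:

```hcl
provider "aws" {
  region                      = "us-east-1"
  access_key                  = "test"   # dummy credentials for LocalStack
  secret_key                  = "test"
  skip_credentials_validation = true
  skip_requesting_account_id  = true

  endpoints {
    s3         = "http://localhost:4566"
    dynamodb   = "http://localhost:4566"
    apigateway = "http://localhost:4566"
    lambda     = "http://localhost:4566"
  }
}
```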
[00:48:28] And it works. It actually spins up a localhost environment using Terraform, infrastructure as code, the way it would against the actual AWS cloud. And again, a huge benefit of Terraform, of infrastructure as code generally, is that if you want to make a modification to your tech stack, you run terraform apply, and it applies that edit, just that edit.
[00:48:53] It applies just that modification, and it will propagate modifications through your tech stack as the case may be, which is super powerful. So for example, say you make a modification to your VPC, or to a subnet or a security group within it. Well, if you were to do that in the browser, you would then have to go to all your services, every single service that needs to be made aware of that modification,
[00:49:25] click into each service, and adjust for that modification as needed. And some things are going to slip through the cracks; you're going to forget about some services that need to be made aware of those modifications. Terraform will not. If you make a modification to some deployed service in your infrastructure, and Terraform is aware that that's a dependency of other services in your infrastructure, it will first make the modification to that service,
[00:49:58] let's say the VPC, and then propagate the necessary modifications through your entire infrastructure to those other services. It knows how to make only the modifications necessary, without doing anything too destructive.
[00:50:15] Another example of the nondestructive modifications Terraform is good at: writing multiple Lambda functions on localhost. Let's just talk about Lambda here; this is really powerful, and it'll showcase how efficient Terraform is.
[00:50:29] Let's say you have five Lambda functions, five Python functions, on localhost. Maybe a handful of them are web server functions, a handful are one-off functions you want to call from some other area of your stack, and let's say one function is actually a cron job. You can connect your localhost to your AWS VPC using the client VPN, so you can run these Python functions, which are eventually to become Lambda functions, on localhost. Maybe you're running them in a pytest suite, a unit test suite, which calls these various functions and is able to run within your VPC, so it has access to the services it needs within your AWS infrastructure. And then after you've run your tests against your functions, you're content with them, and now you want to deploy them to Lambda.
[00:51:31] Terraform will have these Lambda module chunks within your Terraform file, where you point a block at a file, and when you run terraform apply, it handles packaging that up as a zip file, putting it on S3 if it has to, and deploying a Lambda function,
[00:51:52] and then, if you've specified it, tying that up to API Gateway and setting up the necessary permissions. So that's cool: you're going to bypass a whole lot of the steps you'd otherwise have had to take, like packaging it up as a zip file, putting it on S3, and then deploying to Lambda from S3. That's a three-step process you're able to skip just by using Terraform.
[00:52:14] But then the next step: let's say you modify only one of those functions and leave the rest alone. You run some unit tests. Cool, we're back in business. Terraform detects that only that file has changed, based on a hash of that file on disk,
[00:52:35] so it only deploys the necessary changes to that single Lambda function, not to the others, and if necessary, it may propagate some additional changes, say to API Gateway, where it's tied to that specific Lambda function.
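Here's roughly what one of those Lambda blocks looks like; the paths are hypothetical, and the source_code_hash line is what lets Terraform redeploy only the functions whose code actually changed:

```hcl
# Terraform zips the source file itself...
data "archive_file" "fn" {
  type        = "zip"
  source_file = "${path.module}/src/my_function.py"   # hypothetical path
  output_path = "${path.module}/build/my_function.zip"
}

# ...and the hash of that zip drives change detection on apply.
resource "aws_lambda_function" "fn" {
  function_name    = "my-function"
  runtime          = "python3.8"
  handler          = "my_function.handler"
  role             = aws_iam_role.lambda.arn           # defined elsewhere
  filename         = data.archive_file.fn.output_path
  source_code_hash = data.archive_file.fn.output_base64sha256
}
```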
[00:52:53] So everything I'm saying about developing against AWS on localhost, as opposed to a docker-compose.yml file, is only really feasible if you're using infrastructure as code like Terraform, and it's made especially feasible, at least a lot quicker and cheaper, if you're using the LocalStack framework. But I haven't tested LocalStack super thoroughly,
[00:53:19] so as far as how much of AWS's capabilities it covers, your mileage may vary, but it's worth trying out first.
[00:53:29] And of course, this is the Machine Learning Applied podcast. So within your VPC, connected through the client VPN as you are, you have access to SageMaker; you can kick off SageMaker training jobs. In a previous episode we talked about using SageMaker Studio's
[00:53:46] IPython notebooks to write your machine learning code, so you can kick off machine learning jobs within the SageMaker environment. Well, you can opt to create a SageMaker Studio project and set that all up either in the web browser, in AWS's console using SageMaker Studio, or by orchestrating the infrastructure setup of SageMaker through Terraform instead, and then connect to your SageMaker environment over your client VPN so you can kick off training jobs
[00:54:23] and all this stuff from localhost, rather than being in SageMaker Studio. Now, why would you want to do that? Well, say you're doing the rest of your web and server development the way I've described in this episode. One, maybe you don't want to mix it up too much, so you keep your machine learning development in the same kind of environment, client VPN connected to your VPC, so you can kick off SageMaker jobs. But two, so you can use PyCharm.
[00:54:50] I don't like IPython notebooks personally. I don't like Jupyter nearly as much as I like JetBrains PyCharm; it's such a powerful IDE. So if I'm able to keep using my IDE for developing my machine learning code on localhost, but I'm able to connect to the SageMaker environment so I can still run
[00:55:08] SageMaker training jobs and inference jobs and Experiments with hyperparameter optimization and all that stuff from localhost, that's a win-win. And of course, you still get all the bells and whistles of SageMaker. You can still go into SageMaker Studio on the web and look at the feature importances that get spit out from SHAP in Autopilot, or in your Experiments, or in your training job, the Model Monitor, Clarify, all these things.
[00:55:36] So you still get all the output from the features SageMaker provides, but you can simply do your development on localhost.
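A hedged sketch of kicking off a training job from localhost with the sagemaker Python SDK, over the client VPN; the image URI, role ARN, and S3 paths are placeholders:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<your-training-image>",                    # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",                  # placeholder
    sagemaker_session=session,
)
# The job itself runs on SageMaker's instances in the cloud; only the
# kickoff happens from your editor on localhost.
estimator.fit({"train": "s3://my-bucket/train"})
```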
[00:55:44] So that's how you can develop against AWS on localhost. Measure twice, cut once. Start building your project on AWS as if these were, like, test databases or test servers, and connect to them with a client VPN so that you're within the environment to run your test code.
[00:56:06] That way, when it comes time to deploy your stack, you're not going to get sideswiped by all the setup you'd normally have to do. Do this all in Terraform, infrastructure as code, or CDK or the Serverless Framework, because managing your infrastructure in the web interface of
[00:56:22] AWS or Azure or GCP is not going to happen, simply not going to happen, for complex infrastructures. And give LocalStack a try, so you might be able to do all this stuff on localhost.
[00:56:34] Now, like I said, this is all a bit mutually exclusive to an alternative type of infrastructure orchestration system, one of which is Kubernetes. I'm actually going to be talking to somebody at Dept here soon
[00:56:51] to give me a rundown on how Kubernetes, and Kubeflow especially, for example for machine learning pipeline orchestration, compares to cloud-hosted managed services the way I'm describing in this episode. They're entirely different ways of handling infrastructure.
[00:57:10] Kubernetes. What Kubernetes does is: you give it a bunch of Dockerfiles, so all of your services are written as Docker containers. Rather than relying on the managed services offered on AWS or Azure or GCP, you rely instead on open source projects contained in Docker containers.
[00:57:35] So for example, on AWS, if you want a message queue, you'd use SQS. Well, the open source equivalents of SQS are things like ZeroMQ and RabbitMQ, so you could use a RabbitMQ Docker container. And where for a server on AWS you'd use either Lambda or Fargate, a hosted server, [00:57:58] in Kubernetes you'd actually host your server yourself, maybe as a dockerized FastAPI container. Kubernetes takes all these containers, strings them together, handles all the networking between them,
[00:58:12] puts them all into its own version of a VPC, and then deploys all of these services to EC2 instances, if you're running this on AWS, or the equivalent instances on GCP or Microsoft Azure. So Kubernetes is intended for really relying on Docker and open source projects to string all of your services together.
[00:58:38] You're not going to be using AWS as your backbone, really; you're not going to be tapping into the various services on AWS. You're going to be hosting it on AWS with Kubernetes, or GCP or Azure, wherever you want to host it, but the services themselves, the way they communicate with each other, and the way they scale up and down, that's all managed by Kubernetes, not by AWS. It's all open source stuff that handles itself.
[00:59:05] And then you deploy the Kubernetes stack to AWS. Kubernetes has what's called a master server on the control plane that does all the orchestration of your services with respect to each other, and it will do so smartly and as cheaply as it can. So for example, if you have multiple services that all request some amount of RAM and CPU, and three of them are running on one EC2 instance that has extra RAM and CPU available, it will spin up the next Docker container on that EC2 instance rather than spinning up a new EC2 instance.
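For illustration, a Kubernetes Deployment whose resources.requests block is exactly what the control plane uses to pack containers onto nodes with spare RAM and CPU; the image name is a placeholder:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-server
spec:
  replicas: 3
  selector:
    matchLabels: { app: fastapi-server }
  template:
    metadata:
      labels: { app: fastapi-server }
    spec:
      containers:
        - name: server
          image: myrepo/fastapi-server:latest   # placeholder image
          resources:
            requests:   # the scheduler bin-packs pods onto nodes using these
              cpu: "500m"
              memory: "512Mi"
```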
[00:59:45] So anyway, all that's to say that Kubernetes is an entirely different, mutually exclusive fashion of deploying your architecture to the cloud, one which relies heavily on Dockerfiles and open source projects.
[01:00:02] It has this master server that keeps tabs on who's who within the stack, in order to decide whether to scale certain containers within the stack up or down, as opposed to Terraform or CDK or the Serverless Framework, which are infrastructure as code, actually managing services offered by AWS.
[01:00:29] Those are not necessarily open source offerings, and they're not necessarily Docker containers, though they can be; Terraform manages your stack of AWS's own cloud offerings. So they're mutually exclusive. Kubernetes actually comes with a cost: there's a fee to run your Kubernetes control plane in the cloud.
[01:00:52] So that's one downside of Kubernetes. One upside is that it's sort of a universal solution: you can move your Kubernetes stack off of AWS and onto GCP or Azure. AWS, GCP, and Azure all support Kubernetes; they each have a service that supports it. It's not just that it's supported out of the box:
[01:01:13] AWS's service that supports Kubernetes is called EKS, and it's in EKS, by running your Kubernetes cluster there, that you actually incur that fee for your cluster's control plane. And the reason I'm bringing up Kubernetes here is that it's another alternative to infrastructure as code: Terraform is a project for infrastructure as code,
[01:01:39] and so, in a sense, is Kubernetes, but they handle deploying your stack to the cloud in totally different ways. Another reason I'm bringing it up is because, as I mentioned, I'm going to be interviewing somebody from Dept on how they use Kubernetes to deploy a machine learning pipeline to the cloud by way of Kubeflow.
[01:01:59] So rather than using SageMaker, this person at Dept is actually using Kubernetes to orchestrate the entire data stack, everything I discussed in the last two episodes: ingesting your data, transforming the data, imputing nulls, the feature store, everything at scale, microservices, data pipelines, training, monitoring, debugging, and then deploying. Rather than using SageMaker's
[01:02:28] hosted offerings on AWS, they're using Kubeflow's open source, containerized offerings to string together the entire end-to-end data pipeline in a universally deployable, open source fashion, which has its perks, no doubt. So I will be reporting back here soon on how things like Kubeflow on Kubernetes compare to managed services like SageMaker on AWS, all orchestrated through Terraform.
[01:03:03] So I hope to get you that episode in the near future, and I'll see you then.