MLG 007 Logistic Regression

Feb 19, 2017

The logistic regression algorithm is used for classification tasks in supervised machine learning, distinguishing items by class (such as "expensive" or "not expensive") rather than predicting continuous numerical values. Logistic regression applies a sigmoid or logistic function to a linear regression model to generate probabilities, which are then used to assign class labels through a process involving hypothesis prediction, error evaluation with a log likelihood function, and parameter optimization using gradient descent.

Resources
Resources best viewed here
Andrew Ng - Machine Learning Specialization
An Introduction to Statistical Learning (ISLR) (2nd Edition)
StatQuest - Machine Learning
Show Notes

Classification versus Regression in Supervised Learning

  • Supervised learning consists of two main tasks: regression and classification.
  • Regression algorithms predict continuous values, while classification algorithms assign classes or categories to data points.

The Role and Nature of Logistic Regression

  • Logistic regression is a classification algorithm, despite its historically confusing name.
  • The algorithm determines the probability that an input belongs to a specific class, using outputs between zero and one.

How Logistic Regression Works

  • The process starts by passing inputs through a linear regression function, then applying a logistic (sigmoid) function to produce a probability.
  • For binary classification, results above 0.5 usually indicate a positive class (for example, “expensive”), and results below 0.5 indicate a negative class (“not expensive”).
  • Multiclass problems assign probabilities to each class, selecting the class with the highest probability using the arg max function.
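The prediction flow described above can be sketched in a few lines of Python with NumPy. The feature values, theta parameters, and the 0.5 threshold below are illustrative assumptions, not numbers from the episode.

```python
import numpy as np

def sigmoid(z):
    """Squash a linear-regression output into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(theta, x, threshold=0.5):
    """Linear regression (theta^T x) piped through the logistic function."""
    probability = sigmoid(np.dot(theta, x))
    return (1 if probability >= threshold else 0), probability

def predict_class(probabilities_by_class):
    """Multiclass case: pick the class with the highest probability (arg max)."""
    classes, scores = zip(*probabilities_by_class.items())
    return classes[int(np.argmax(scores))]

# Hypothetical example: a bias term plus two features (sq. footage in 1000s, bedrooms).
theta = np.array([-4.0, 1.5, 0.8])   # made-up "learned" parameters
x = np.array([1.0, 2.1, 3.0])        # bias = 1, 2,100 sq ft, 3 bedrooms
print(predict_binary(theta, x))      # (1, ~0.82) -> "expensive"
print(predict_class({"house": 0.7, "tree": 0.5, "dog": 0.3}))  # -> "house"
```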

Example Application: Housing Spreadsheet

  • An example uses a spreadsheet of houses with features like square footage and number of bedrooms, labeling each as "expensive" (1) or "not expensive" (0).
  • Logistic regression uses the spreadsheet data to learn the pattern that separates expensive houses from those labeled not expensive.
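A minimal sketch of that relabeling step, assuming a pandas DataFrame with hypothetical column names and the $300,000 cutoff mentioned in the episode:

```python
import pandas as pd

# Hypothetical housing spreadsheet; column names and values are made up.
houses = pd.DataFrame({
    "sqft":     [1200, 2400, 3100, 1800],
    "bedrooms": [2, 3, 4, 3],
    "price":    [250_000, 340_000, 520_000, 290_000],
})

# Replace the continuous price label with a binary class label:
# 1 = "expensive" (over $300,000), 0 = "not expensive".
houses["expensive"] = (houses["price"] > 300_000).astype(int)

X = houses[["sqft", "bedrooms"]].to_numpy()  # features
y = houses["expensive"].to_numpy()           # labels (0/1)
```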

Steps in Logistic Regression

  • The algorithm follows three steps: predict (infer a class), evaluate error (calculate how inaccurate the guesses were), and train (refine the underlying parameters).
  • Predictions are compared to actual data, and the difference (error) is calculated via a log likelihood function, which accounts for how confident the prediction was compared to the true value.
  • Model parameters (theta values) are updated using gradient descent, which iteratively reduces the error by adjusting these values based on the derivative of the error function.
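Putting the three steps together, here is a minimal batch-gradient-descent training loop, assuming NumPy, a feature matrix X whose first column is all 1s (the bias term), 0/1 labels y, and arbitrary choices for the learning rate and iteration count:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, alpha=0.1, iterations=5000):
    """X: (m, n) feature matrix with a leading column of 1s; y: (m,) array of 0/1 labels."""
    m, n = X.shape
    theta = np.zeros(n)                          # initial parameters: a shot in the dark
    for _ in range(iterations):
        h = sigmoid(X @ theta)                   # step 1: predict probabilities
        h = np.clip(h, 1e-12, 1 - 1e-12)         # avoid log(0) in the cost
        cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))  # step 2: log-likelihood error
        gradient = X.T @ (h - y) / m             # step 3: derivative of the error...
        theta -= alpha * gradient                # ...used to nudge the theta values downhill
    return theta
```

Each pass re-predicts with the current theta values, measures how wrong those predictions were, and steps the parameters in the direction that reduces that error.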

The Mathematical Foundation

  • The hypothesis function is the sigmoid or logistic function, with the formula: 1 / (1 + e^(-theta^T x)), where theta represents the parameters and x the input features.
  • The error function (cost function) for logistic regression is the negative log likelihood, averaging the per-example errors over all data points to guide model learning.
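Written out in standard notation (a rendering of what the episode describes verbally), the hypothesis, the cost over m training examples, and the gradient-descent update for each parameter θ_j are:

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right]

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
```

Here α is the learning rate and m the number of rows in the spreadsheet; all three are covered in detail in the Andrew Ng course.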

Practical Considerations

  • Logistic regression fits an S-shaped (sigmoid) curve to the data and derives a "decision boundary" where that curve crosses 0.5, which best separates classes such as "expensive" versus "not expensive."
  • When the architecture requires a proper probability distribution (sum of probabilities equals one), a softmax function is applied to the outputs, but softmax is not covered in this episode.
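Although softmax is deferred to a later episode, a minimal sketch (assuming NumPy) shows how it turns raw per-class scores into probabilities that sum to one:

```python
import numpy as np

def softmax(scores):
    """Convert raw per-class scores (logits) into a proper probability distribution."""
    exp = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return exp / exp.sum()

print(softmax(np.array([0.7, 0.5, 0.3])))  # ~[0.40, 0.33, 0.27] -- sums to 1
```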

Composability in Machine Learning

  • Machine learning architectures are highly compositional, with functions nested within other functions - logistic regression itself is a function of linear regression.
  • This composability underpins more complex systems like neural networks, where each “neuron” can be seen as a logistic regression unit powered by linear regression.
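As an illustration of that composition (a hypothetical sketch, not code from the episode), a single "neuron" is just a logistic regression unit, and a layer is many such units applied to the same inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(weights, x):
    """One 'neuron': linear regression piped through the logistic function."""
    return sigmoid(np.dot(weights, x))

def layer(weight_matrix, x):
    """A layer: several logistic-regression units sharing the same inputs."""
    return sigmoid(weight_matrix @ x)

x = np.array([1.0, 2.1, 3.0])                 # bias plus two made-up features
W = np.array([[-4.0, 1.5, 0.8],
              [ 0.5, -1.0, 0.2]])             # two neurons' hypothetical weights
print(layer(W, x))                            # outputs of two nested logistic regressions
```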

Building Toward Advanced Topics

  • Understanding logistic and linear regression forms the foundation for approaching advanced areas of machine learning such as deep learning and neural networks.
  • The concepts of prediction, error measurement, and iterative training recur in more sophisticated models.

Resource Recommendations

  • The episode recommends Andrew Ng's Coursera course for deeper study of these concepts, especially multivariate regression and the error functions.

Transcript
This is episode seven, logistic Regression. In this episode we're gonna talk about classifiers, namely the logistic regression classifier algorithm. So remember where we are in the artificial intelligence tree, we've gone down to machine learning, down to supervised learning, and the supervised learning is broken down into two subfields classification and regression. We studied. Regression last episode with linear regression, which will give you a continuous variable output. A number, so the cost of a house in Portland, Oregon. And now we're gonna talk about classification, which will give you the class of a thing. Am I looking at a cat, a dog, a tree, a house, or in the case of binary classification, is this a dog? Yes or no? Zero or one. Now, you may have noticed right off the bat the word logistic regression. That is really confusing. Wait, I said that supervised learning's broken down into regression and classification. So those are two separate categories, and now we're talking about logistic and linear regression. Those both sound like regression to me. How is logistic regression. Classification. Actually, the term logistic regression is historical. It was a mistake, I believe, from what I've heard. Try to ignore it if you can. The fact that it has regression in the word logistic regression. I think what's going on there is that, as you'll see in a bit, we pipe our linear regression algorithm into our logistic function. So the classification step is a function of our linear regression algorithm. So I think that's why regression is in the title. It's like we're doing the logistic. Thingy to linear regression, logistic regression. And what does logistic mean? Well, what comes out of our classification function is what's called a logic, L-O-G-I-T. So I like to imagine it this way. Linear regression is for guessing numbers, like the, the cost of a house in Oregon and logistic regression is for guessing classes. This, that, or the other thing. And what you do, imagine logistic regression is like this machine, like this cartoon machine with conveyor belt going into it and a conveyor belt coming out of it. And so in comes our linear regression algorithm and it goes down the conveyor belt. It goes inside of of our logistic regression machine. And it kind of does that cartoon like, blah, blah, bing, bang, boom. It looks like there's a fight going inside of the logistic regression machine. And then out comes a number, which tells us how confident the algorithm is that the thing we're looking at is a house or how confident it is. The, the thing we're looking at is a tree. So outcome, these numbers associated with the class, and these numbers are called logics. So 0.7% probability of this being a house, 0.5%, it being a tree 0.3%, it being a dog. And then we pick the thing with the highest logic, with the highest probability. And we take that class and that function of picking the class associated with the highest logic is called ARG max. You'll see this in various machine learning libraries. ARG max, A-R-G-M-A-X. It says, find the thing with the highest number. Take that class. Now a random aside, you'll notice that I said 0.7%. Likelihood of house, 0.5%, tree 0.3%, dog, whatever. Those don't add up to one. Right. They don't add up to a hundred percent. So that's not a real probability distribution. That's not a proper probability. If your architecture needs a proper probability distribution, then you pipe those logics. 
So you go into another machine, you pipe those logics into something called soft max, soft max, S-O-F-T-M-A-X. It takes your logics and it transforms them into a proper probability distribution. Where all of your logics add up to one. We won't cover Soft max in this episode. We'll cover that in a later episode. So logistic regression takes into it linear regression. And then it does bing, bang, boom in the machinery, and then out comes lodges. 1, 2, 3, 4 lodges maybe. We have a four class system, either this picture can be of a house, a tree, a dog, or a cat. And each lodge associated with each class comes out of the machinery. And we pick the one with the highest value in this case house with 0.7 by way of a function called arg max. Now, like I did in the last episode where I made the episode simpler to visualize by working with uni variate, linear regression rather than multivariate linear regression. And I'm just assuming that you're going to take that. Andrew e Coursera course where you will learn the details of multivariate linear regression. I'm gonna make this episode simpler by working with binary classification. So is this picture a picture of a house or not? Yes or no? Zero or one. So in comes a picture, bing, bang, boom. Out comes one logic and it's gonna be a value between zero and one. Where zero represents no and one represents yes. So it might be 0.7, which is the logistic regression algorithm telling us that it is 70% confident that this is a picture of a house. It's not a hundred percent confident, 70% confident. And what we're gonna do with logistic regression is say anything over 0.5 is yes, and anything under 0.5 is no. So this is a yes. This is a picture of a house. We're just gonna guess that it's a picture of a house. Okay. The example that we're gonna be using actually for this episode is the same example for the last episode. We're piping in a spreadsheet of houses in Portland, Oregon. The rows are the houses themselves. Each column is a feature. So square footage, number of bedrooms, number of bathrooms, distance to downtown, et cetera. The last label from the previous episode was the cost of the house, $200,000, $300,000. That's Y or labels. So that last column, remember is called the labels or the Y values, the actual cost of the House. And we're gonna be using this spreadsheet. To train our model, we're gonna use this spreadsheet to train the pattern that we recognize so that in the future we can make predictions. Well, again, logistic regression is not linear regression. We're not guessing a number, we're guessing a class. And so in this example, what we're gonna do is instead of saying the cost of a house, which is a continuous variable, that's not what we wanna work with. Let's say, do we consider this house expensive or not? Yes or no? Expensive. Or not expensive. So zero will be not expensive and one will be expensive. And so we'll go through this spreadsheet ourselves Manually, anything. Let's just say over $300,000 will consider that expensive, and anything under $300,000 will consider it not expensive. So we're gonna modify our spreadsheet. We're gonna open it up in Microsoft Excel one row at a time. We're gonna say. Zero. 1, 1, 1, 0, 0, 0. 1, 1, 1, 0, 0, 1, 1, 1 0. Just replacing all these actual numbers with whether or not we consider it expensive, so we're working with classes here, in this case, binary classification. It could be one of two things. Now remember how the machine learning system works? 
We have a three, three-step process, predict or infer that step one. Step two is our error or loss function, and step three is train or learn. So we're gonna pipe in our spreadsheet into a logistic regression function, and it's gonna go through all the rows row by row by row, and it's gonna make a whole bunch of predictions, a bunch of random shots in the dark. That's step one, the predict phase. And then step two, remember we're gonna use an error or loss function, an error function in order to determine how bad we did, how off were we? And then we're gonna do step three, which is to train our hypothesis function. We're going to train these theta parameters, the coefficients in our function. We're gonna update their values until we have a function that fits our data accurately. Align on a graph. That fits our data accurately. Now it's not gonna be a line in the case of logistic regression, so let's dive in. Let's open up that machine, that cartoon machine, and zoom in. And let's look at these three steps in detail. So the hypothesis function. In linear regression, we remember we had kind of a scatterplot cloud of dots looking like a football pointing northeast, and we wanted to shoot a line straight through the center of that football. That's called your regression line in linear regression. I. Well, we're not gonna have numeric values in our case, in logistic regression. We're not gonna have numeric values. We're gonna have ones and zeros. So on one side are things that are expensive based on some combination of the features of the houses, and on the other side are things that are not expensive. So we need a function that. Somehow gives us zeros or ones or somewhere in between, and our linear function that that's a line going down the football cloud northeast that does not give us one or zero, that gives us a number, 200,000, 300,000. So the function we're going to use is a mathematical function in statistics. It's called a logistic function. Logistic regression, a logistic function or a sigmoid function. And the reason it's called a sigmoid function as an alternative to logistic function is that it looks like an S. Imagine if you take an S, you draw an S, and you with your fingertips, you grab the top right. End of the S and the lower left end, and you stretch it out. You stretch it out so that on the X equals zero on the X axis coming from negative affinity coming from the left. You come from the left, from the left, and then once you get towards the Y axis, you start curving up. Really fast. You cross over the Y axis at X equals 0.5 at one half, and then when X is positive, you start leveling out and then you get to X equals one and you go to the right towards infinity. So it's an S on a graph. The bottom of the S is on X equals zero. The top is on x equals one. Shoots off to the right towards infinity, towards the left, towards infinity. Crosses over the Y axis at 0.5. So we wanna fit this S curve, this sigmoid function or logistic function. We want to fit our data in the graph somehow to that function. What we want to do is create what's called a decision boundary that puts all the data on one side. If it's yes, and all the data on an on the other side, if it's no, we wanna learn what that decision boundary is, where we cross over the yes no axis, and we're gonna train our theta parameters. Remember that's from the linear regression episode. We have these theta parameters, there numbers inside the function that we're gonna learn. 
We wanna train these theta parameters so that we get this good decision boundary. So that's our hypothesis function or our objective function. It is the sigmoid or logistic function. So remember, hypothesis or objective function is the name for the function that we're using in the predict step. Step one, and depending on the machine learning algorithm you are using for the task at hand, that function will be a specific function in math. So in this case, in logistic regression, it is the sigmoid or logistic function In linear regression. I don't, I, I guess it's just a linear function, I guess that's all you call it. What you call it is just a linear function. Now let me just give you the formula for this function. The formula for the sigmoid function is one over. One plus E to the negative linear regression. That's kind of weird, right? So one over one plus E to the negative. And then we say Z, where Z is your linear regression function, or specifically theta transpose X, where theta is the vector of parameters that we're gonna learn or weights. And X is the matrix of examples. Your spreadsheet, and if that transpose word threw you off, that's a technical detail of the multi-variate linear regression step that I skipped in the last episode. But you're gonna learn that in the Andrew ing Coursera course. So you'll learn this whole stuff with vectorization and matrix algebra and all that stuff in the Andrew Ian course. So don't worry about that right now. But one more time, logistic regression function that gives you that S-curve on a graph. Is one over one plus E to the negative theta transpose X. So linear regression is inside of that logistic regression function. Okay, so step one is we have our hypothesis function, and we're gonna pipe in our spreadsheet and we're gonna map it all on our graph. And we're going to make a bunch of random guesses. Remember that Step one is to predict, predict randomly. So we're gonna be like, yes, no, no, yes, no, no, no, no. Yes, yes, yes, yes, yes. When the actual values are, no, no. Yes, yes, yes. No, no, no. Yes, yes, yes. And what we're gonna do is now we're gonna go to step two. Which is figure out how off we were, how bad we we were. That's our error or loss function. And just like in step one where our hypothesis or objective function is gonna be a specific function, depending on the machine learning algorithm you're using, in our case, it's logistic or sigmoid function. In this step, our error function will be a specific function as well. Ours is called the log likelihood function because it uses a log rhythm in the function and here's how it works. We can't use our linear regression error function because we're not working with numbers. We're working with binary classifications. Zero or one when the actual value was one, but my guess was zero. How bad did I do? Or vice versa if my guess was zero, but the actual value was one, how bad did I do? Or if the actual value was one? And the guess is one. How bad did I do? If that's the case, the error should be zero. If I guessed correctly, the error should be zero. Now remember that we're using logics a scale from zero to one, where anything below 0.5 is no, and anything above 0.5 is yes. And we may have guessed in our predict step 0.2, as in I am 20% confident that the answer to this particular case is no. Where the actual answer was yes. In that case, we're less wrong than if I would've guessed zero. And the actual answer is one. So our error function is what it is. It's this log function. 
It starts at zero and it goes towards infinity. Y equals infinity at X equals one. So it goes up and up and up and up and then up into infinity. So what we're looking at in our error function is a graph that starts at zero. It goes right towards one and often to infinity, often to y equals infinity before it ever hits. X equals one. The closer, my guess is to y equals zero, which is the correct value. The closer, my guess, is to zero. The closer to zero is the error and the closer, my guess is to one, even though the actual answer is zero, the closer to infinity. Is my error. Okay? This is very confusing and don't dwell on the details. You're gonna learn this all in the Coursera course. I'm just describing it to you now for thoroughness. Now, let's take the other example. Flip the graph. In the cases where the house is considered expensive, then here's how the error function works. The closer my guest goes towards one. In other words, I guessed that the house is expensive. I guessed. Correctly. The closer I go towards one, the graph becomes zero. In other words, the error is zero, and the closer, my guess goes towards zero, where I'm guessing that the house is not expensive, even though it is the closer. My error on the Y axis goes to infinity. So this one's like a sloping graph in the other direction. So it's like you're coming down from a ramp. On the Y axis, and you hit zero where X equals one. Again, I know this doesn't come out well in audio format, so just dive into the details in the Andrew ing course, but I just wanna step you through the process. Okay. So that's the visual representation of the cost function that we're constructing for our objective function, our sigmoid function. The cost function looks like two separate cases of a logarithm. We're gonna combine those two separate cases into one function, and what happens is that one of these gets canceled out depending on whether we're dealing with a yes or a no. It's hard to describe what the function looks like. Is why times the logarithm of your guess? Plus one minus y times the logarithm of one minus your guess. Okay. That's the error for one row. Of your spreadsheet. One guess. How bad did you do guessing for one particular row? Sum all those up and divide them by the number of examples. So it's the average of errors, and in this case it's a little bit complicated. We we're working with logarithms, but just go to the Andrew Ian course notes for week three. Okay. Whew. That was crazy. Step two was to figure out how bad we did with all of our yes, no. Yes, yes. No, no, no guesses. How bad were we off? Now remember, the point of our cost function is to tell us how bad we did so that we can train our hypothesis function. We can train the theta parameters to get a better graph. More accurately depicts the way things are with all of our data, and that's step three. Step three is to train our hypothesis function using our error function. Okay, so our hypothesis function goes. Into our error function. Our error is a function of our hypothesis, and then our error function goes into our train function, namely gradient descent. Remember, gradient descent, the function of gradient descent is to take the derivative of your loss function. The derivative tells you which direction you need to step with all of your theta parameters, which direction each theta parameter needs to change, maybe some negative value or some positive value up, down, left or right, and by how much so your derivative says. 
How much each of your theta parameters in your hypothesis function needs to change in order to reduce your error function. And we're going to keep doing that one gradient step at a time. Keep taking the derivative and changing your theta parameters until our error function is at a minimum, at the smallest point that it can be. So the hypothesis function goes. Into your error function and your error function goes into the derivative function. Remember that the derivative itself is a function and you repeat the derivative step one step at a time until your error function gives you a small value, the smallest value that it can give you, which means. Your hypothesis function. Going back one step now is ideal. In our case, it means that our sigmoid function has a good decision boundary that can separate all the yeses on one side and all the nos on another side. And then in the future, when you make a guess with a new house you've never seen and you don't have the label, is this house considered expensive? By our relative definition of expensive, it will throw it on that graph. And if our function gives us. Anything greater than 0.5, then the answer is yes. And if it gives us anything less than 0.5, the answer is no. Okay, so gradient descent trains your theta parameters by taking the derivative of your loss function, which tells you how big of a step to take in which direction. Over and over and over until your error is small. And the gradient descent formula is for each of your theta parameters. You have your theta parameter, what it was before, minus alpha over M times the sum of all of your guess minus. The actual value times that feature in that position. So theta J equals theta, J minus alpha over M times a sum from I equals one to M of your hypothesis for that row minus the actual value for that row times feature J for that row. Again, you'll learn this in the Andrew ing Coursera course. Oh man. That was wild, huh? So let's run through this one more time. Remember, we have supervised learning broken down into classification where we're trying to guess the class of a thing. Is it a cat, dog, tree And regression, which is where we're trying to guess the value of a thing, the continuous variable, or numeric value of a thing. And then inside of classification, you have any number of algorithms such as as a decision tree or a Bayesian classifier. And we're focusing in this episode on the 1 0 1 classifier, which is called logistic regression. Logistic regression takes a spreadsheet of data. Whose values or labels, the why column is yeses and nos. 0 1 0 0 0 1 1, 1 1 0 0 0 1. In the case of binary classification, in the case of multi-class classification, it will be any number of classes, but we're not gonna talk about that in this episode. We pipe that spreadsheet into our logistic regression algorithm. Our logistic regression algorithm goes over that spreadsheet and makes a whole bunch of guesses that step one is predict. Step two is determine how bad you did with those guesses. And step three is to take your error function from step two and apply repeated applications of the derivative of that function to tell you how much to change your hypothesis function. Theta parameters so that you can get more accurate and more accurate over time until your error function finally reaches a minimum value. Now you have a hypothesis function that is trained on your data, on your spreadsheet or your matrix, and now when you get new samples in the future. 
You can pipe it into your hypothesis function and it will give you a guess, and that guess will be more accurate. The details of each step is that the hypothesis function in our logistic regression algorithm is called a sigmoid function or a logistic function. Inside of that sigmoid function is actually linear regression, so logistic regression is a function of. Linear regression on a graph. Our logistic function looks like an SA stretched out s, and we're trying to find a decision boundary that puts all the yeses on one side and all the nos on one side. Our error function in step two is called a log likelihood function, and it tells us how off we were with our guests. From the actual value and the function is actually quite complex. So I will just refer you to the Andrew in course to look at the equation and to watch his videos to understand the equation. But in summary, it just tells you how bad you did. And then of course, the training step just applies repeated applications of the derivative of the loss function. And that loop doing repeated applications is called gradient descent. We're descending the error graph to the bottom of the graph where the error is the lowest. That's why it's called dissent. You're descending. Now, let's sort of take a very big step back and remember what we're trying to accomplish. I mean, artificial intelligence in the very general sense of the term. Remember that artificial intelligence is being able to simulate. Any mental task. Now we dove down the details. Rabbit hole of linear and logistic regression, talking about mathematical equations and graphs and charts. And the training process or the learning process was like taking these lines or these S-curves and. Altering them in some way that just, you probably feel like you're very far removed from artificial intelligence by now. So let's take a big step back and let's remember the goal simulating any mental task. Remember that artificial intelligence is broken down into multiple subfields, one of which is machine learning, and that I said machine learning. Is sort of the, the most interesting and essential, in my opinion, subfield of artificial intelligence, in that it affects all the other fields. It's almost like any mental task could be boiled down to learning, boiled down to storing a pattern about how the world works so that you can make a prediction in the future, an inference. Now, in our examples, we're storing a pattern or a model. Of the costs of houses in Portland, Oregon, that doesn't feel a lot like artificial intelligence yet, or whether a house is expensive or not. Yes or no logistic or linear regression. That's a pattern. And then we can make a prediction with that pattern in the future. But if you step back a bit and think about other more high level sorts of. Machine learning tasks such as, let's say you're on the African Savannah and you're looking, you're looking in front of you, sort of like taking a picture, visual picture of what's in front of you. Oh, there so happens to be a lion. Now you use classification in order to determine what class of objects is in front of you. Is this a lion, a tree, a house, or food? If it's food, I want to eat it. If it's a lion, I want to run. Okay, so my classification algorithm has determined by way of my stored model that this is indeed a lion. Now we go to another learning algorithm. What action should I take given the circumstances you may have learned? You know, in machine learning, you may have learned. 
That Lions will eat you either verbally from your parents or maybe one took a bite outta your shoulder one day when you were on the hunt. So you have learned that lions will eat you and that the predicted course of action now, given that there is a lion in front of you, is to run away. So here we have vision. Turning into action, and if we want to translate this into a machine learning situation, we might use a convolutional neural network in the case of classifying what you're looking at, okay. With vision, and we might use a Deep Q network in order to determine what course of action or policy or plan to take given our determination. So everything in machine learning sort of boils down to this learn and predict cycle, but we have to start at the very bottom with linear and logistic regression. The building blocks the Legos in order to work our way up to the more advanced high level topics of things like how to take actions in a, in an environment given your state or advanced algorithms in vision and classification. Now I wanna go on a little detour. I said that linear and logistic regression are like Legos or building blocks in the grand scheme and that you're learning the Legos are building blocks right now and that's why it's important. Machine learning you will find is a very composable branch of engineering composable if you come from a software engineering. Background or maybe web development or even mathematics. You might be familiar with this thing called functional programming. Functional programming. It's a style of programming and it's used in languages like Haskell or Lisp, where you have a function, function A, and it takes as its arguments, other functions, functions, B and C, and let's say that function B takes as its arguments. D and e. Functional programming is like Russian dolls, where you nest all these functions inside of each other and then eventually at the very bottom you have to sort of give it a number or a string or some constant, okay? And then you can like start the process and it's like opening these Russian dolls one at a time. You open the Russian doll and what's inside another Russian doll. You open that and what's inside another, and they open that. And what's inside, this is called Composability composability. Your functions or your equations are composed of other functions or equations, which are composed of other functions and so on, so everything's nested inside of each other. You already saw this in machine learning by way of logistic regression being composed of linear regression. It is a function of linear regression. So we took our linear regression algorithm and we put it inside of logistic regression. We also saw this in the steps one, two, and three process of machine learning. We have our hypothesis function and we put that into our error function. So our error function is a function of our hypothesis function. Our error function is composed of our hypothesis function, and then we put our error function, we put that. Into a derivative function. That's the gradient descent step. That's step three training. So in the case of logistic regression, here's how it all unwraps. We have our Russian dolls. The very outer Russian doll is our derivative. Pop. That open inside is our loss, pop that open. Inside is our logistic function. Pop that open, and inside is linear regression. And you will find that everything in machine learning is this way. Now that's kind of a thing in mathematic. It's kind of the mathematical nature of machine learning. 
Remember that machine learning is kind of like applied statistics really, and calculus machine learning is highly mathematical. Mathematics is highly composable, so it's like this by nature, but this is also a very useful and necessary attribute in order to scale machine learning. Once you are actually deploying these architectures in code, putting it on Amazon Web Services AWS, and scaling them horizontally. If you know anything about functional programming as a software engineer or architect, you know that a proper, horizontally scalable system needs to be functional by nature, and you'll find that machine learning needs to scale. Indeed. A lot of these algorithms, especially once we get into deep learning, are very, very computationally expensive, very heavy algorithms, and in order to deploy a service that will be used by. Any number of people, you're going to need to be able to scale horizontally, and in order to do that, the nature of the architecture must be functional. Okay, that was a long-winded digression. One of the reasons I wanted to point out this composability aspect of machine learning is the following. You're probably chomping at the bit to learn about. Deep learning. That's all the rage in machine learning. And if you came to this podcast because you're excited, you've seen all these articles and discussions on Hacker News about artificial neural networks and deep learning and all the stuff that's happening in that space. Well, patience, my friend, because we will get there and we'll get there sooner than you think. We'll get to deep learning. But in order to understand deep learning, you have to understand logistic regression and linear regression because logistic regression. Is a neuron, a neuron in a neural network. So that composability paradigm is at play here. A neural network in deep learning is a function of logistic regression, which itself is a function of linear regression. So everything's. Composed and nested inside of each other. So before we can get to deep learning and neural networks, we're gonna need to learn all these little basics, these linear units and logistic units, because they're gonna become neurons inside of our neural network. So that's kind of cool. Deep learning is a function of shallow learning. We call it shallow learning. These. Simple algorithms like linear and logistic regression. Okay, so that was a very technical, long-winded episode I believe. Don't quote me on this, but I believe that the next few episodes won't be nearly as technical. The next episode specifically, we're gonna be talking about mathematics. I. We're not gonna go into math. We're gonna talk about the branches of math that you need to know in order to succeed in machine learning, and how much of these types of math that you need to know. What are the resources that you can learn these things, et cetera. Because that's a common question that comes up. What type of math do I need to know? How much of it do I need to know? Can I go into machine learning without knowing any math, et cetera? So we're gonna do an episode on that sometime soon. I'm gonna do an episode on languages and. Frameworks. So Python versus R versus Matlab, TensorFlow versus Theano versus Torch. And then we'll do a high level overview of deep learning and all these things before we finally get back into the technical details. So do not fear my entire podcast series will not be like this and that linear regression episode, which are super, super technical. Okay, what are the resources for this episode? 
No new resources. I'm going to point you once again to the Andrew Ng Coursera course. So like I said in the linear regression episode, that course is not optional. It is required. You need to start on it. I'm gonna keep recommending it until we start getting into new territory, but I want you to start working on that course. That's it for this episode, and I'll see you guys next time.