Explains the fundamental differences between tensor dimensions, size, and shape, clarifying frequent misconceptions - such as the distinction between the number of features (“columns”) and true data dimensions - while also demystifying reshaping operations like expand_dims, squeeze, and transpose in NumPy. Through practical examples from images and natural language processing, listeners learn how to manipulate tensors to match model requirements, including scenarios like adding dummy dimensions for grayscale images or reordering axes for sequence data.
You're listening to Machine Learning Applied. In this episode, we're gonna talk about shapes and sizes of ndarrays and tensors. I personally found shaping to be a very confusing concept when I first started doing machine learning. It's something you definitely don't deal with outside of machine learning and data science, in regular computer programming and web development and the like.
So it took me a while to get comfortable with it, and it just takes practice. So you'll eventually get it when working in machine learning, but I figured I'd do an episode and give you a lay of the land. So as a recap on something I've mentioned multiple times in MLG, when you're dealing with arrays of multiple dimensions, we call that a tensor.
So an array like you're used to, a standard array, we call that a 1D tensor, a one dimensional tensor, or sometimes a vector. A two dimensional array or 2D tensor is called a matrix. Incidentally, a zero D tensor is called a scalar, which is just a number like the number five or the number 10. That's a zero D tensor.
So the general term we have for any dimensional array is a tensor in mathematics; in NumPy, they call it an ndarray, any dimensional array. We have names for 0D, 1D, and 2D tensors. That is scalar, vector, and matrix respectively. And there might be names for 3D and above; I'm unfamiliar with them.
Instead, in the industry, we just say 3D tensor, 4D tensor, and beyond. So a tensor is any dimensional array. In NumPy, we call it an ndarray. So let's talk about the structure of these tensors. What we have is number of dimensions, size, and shape. Dimensions, size, and shape. So dimensions, like I just mentioned, are basically the number of nestings of these arrays.
So if we have a list of numbers, that's a vector. If we have a spreadsheet of housing data, that's a matrix, that's a 2D tensor. So the number of dimensions is two. If we have an image, say a 256 by 256 pixel image, width by height, it's actually a 3D tensor because it also has something called channels, which is gonna be red, green, and blue pixel colors.
So it's width by height by channels deep, or depth. And so an image is a 3D tensor; its number of dimensions is three. In NumPy, you might have an ndarray on hand called arr, for example, a variable named arr which is assigned to an ndarray. And if you wanted to retrieve the number of dimensions associated with that ndarray, you would type arr.ndim.
So that'll get the number of dimensions. And you might see that performed from time to time, just as a sanity check, maybe in an assert statement, making sure that the input that's getting passed into some machine learning model has some specific number of dimensions, because that's what it expects. For example, an off the shelf image classifier, namely a convolutional neural network, will typically expect to be working with a 3D tensor, and so it will assert that.
The dimensions of the images that you're sending to the model are three, so it'll look like assert input.ndim == 3. So that's the number of dimensions. Then there's size. Size is the number of elements in your tensor. So if you have a zero D tensor, a.k.a. a scalar, the size is going to be one.
There's one element in this scalar, namely the scalar itself: the number five is one number. If you have a vector or an array of numbers, a 1D tensor, the size is just gonna be the number of elements in that tensor. So if you have a list of numbers, 1, 2, 3, 4, 5, the size of that vector is five. Five elements in the vector.
If you have a matrix and it's a 10 by 10 matrix, well then the size is gonna be 10 times 10, which is a hundred. It's gonna be the number of rows times the number of columns. The size is the number of elements in your tensor. So those two are pretty clear, number of dimensions and size of your tensor. Those are pretty obvious.
The tricky stuff is when we get into shapes. So the shape of your tensor is the number of rows or columns or channels, et cetera, per dimension. So if you have a 3D tensor, three dimensions, then the shape of that tensor will be three elements, and each element will be the number of rows or columns or whatever in that dimension.
So if you have a 256 by 256 by 3 image, namely width by height by channels, a picture on disk, then its number of dimensions will be three, its size will be 256 times 256 times 3, and its shape will be (256, 256, 3). So the shape is the number of elements per dimension. So some common shapes that you might be working with.
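Here's a minimal sketch of inspecting all three properties in NumPy, using a randomly generated array as a stand-in for a real 256 by 256 RGB image loaded from disk:

```python
import numpy as np

# Stand-in for a 256 x 256 RGB image (width x height x channels).
img = np.random.rand(256, 256, 3)

print(img.ndim)   # 3 -> number of dimensions
print(img.shape)  # (256, 256, 3) -> elements per dimension
print(img.size)   # 196608, i.e. 256 * 256 * 3 -> total number of elements

# The kind of sanity check an image model might make on its input.
assert img.ndim == 3
```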
It's very common to receive your data from the client in the form of a spreadsheet, a CSV file. That's a 2D tensor, a.k.a. a matrix, and its shape will be rows by columns, number of rows, number of columns. So say you have a housing prices spreadsheet, and you have a million rows, a million housing examples, and 50 features per house, whether they be distance to downtown, number of bedrooms, number of bathrooms, 50 features.
The shape will be (1,000,000, 50), and the size will be 1 million times 50, or 50 million. Now real quick, a point of potential confusion is that you'll sometimes see people referring to the number of columns in a matrix as the number of dimensions. So for example, if you have a housing spreadsheet, a million by 50, you'll see that 50 referred to sometimes as the number of dimensions, and they'll call it the curse of dimensionality when you have way too many features in your dataset and you need to pare it down with something like principal component analysis or an autoencoder. The curse of dimensionality: what specifically they're referring to there is the number of columns. Well, that doesn't stack up to what we were just talking about. Dimensions is the number of nestings of arrays in your tensor.
And so in our case, there's only two dimensions. There's rows and there's columns in our spreadsheet. So using the word dimensions to refer to the number of features is incorrect. What they mean here, actually, when they're talking about the curse of dimensionality or when they're referring to dimensions with respect to the columns in a spreadsheet or a dataset, they're talking about these columns after they've been put into the machine learning model.
So when you feed your data into a machine learning model, like a neural network, for example, what your neural network's going to do is take every feature per row and combine it every which way. One feature combines with every other feature. So it's a cross product of the features. So it would be 50 times 50, or 2,500, various ways that we can combine these features with each other.
And since we're combining the features in a cross product, Cartesian sort of fashion, we do indeed end up in that scenario with 2,500 dimensions. So over here in the machine learning model, we're dealing with a newly shaped tensor. The dataset you received is two dimensions and the shape is 1 million by 50.
That's all here in your left hand, and you pipe it into your neural network over here in your right hand. And the neural network does a bunch of combinations of these features, and it recreates, sort of internally in its model, a new tensor with different dimensions and different shape. And so when we talk about the curse of dimensionality, or the number of columns sort of representing the number of dimensions, they're kind of taking a conceptual shortcut between your dataset and how it ends up in the model.
So don't let that trip you up. The number of features is not the number of dimensions. It becomes a new number of dimensions downstream in the machine learning model, but that's only after the dataset has been input into your machine learning model. So dimensions, size, and shape. And the tricky part with shapes comes when you're reshaping, when you're changing the shape of a data set.
Now, normally you receive a dataset from your client, a spreadsheet of housing information, for example, a million by 50, and it's got its two dimensions and its 1 million by 50 shape, and you pipe that into your machine learning model and everything's hunky dory. But sometimes the shape that you received your data in is not the shape that your machine learning model wants, or somewhere downstream in the machine learning pipeline you have to change the shape of your data as it moves along through the steps. This is called reshaping, and it can be a little confusing. So as an example, let's talk about images. If you took your off the shelf convolutional neural network, CNN, sort of boilerplate copy paste code from some TensorFlow example on GitHub, you just copy and paste some convolutional neural network code in TensorFlow and Python.
You copy and paste it into a new file and you feed it some images, okay? And it expects those images to be 3D: width by height by channels. So it wants a 3D tensor. Actually, it wants a 4D tensor. The first dimension is gonna be the batch size, the number of images that you're feeding into the model at any given time.
So you're gonna have batch size by width by height by channels. So that's four dimensions there. Now, let's say that you don't have 3D images. Okay, you have grayscale images. You have images in black and white. In other words, that third dimension, which is typically red, green, blue, three elements in the third dimension representing the color of each pixel, goes away.
And all you have is width by height, by nothing. You have a picture where it's width by height, and in each cell of that matrix, of that 2D tensor, is the magnitude of grayscale for that pixel. So a black and white image is a 2D tensor, not a 3D tensor, but your convolutional neural network here expects a 3D tensor.
What do you do? That's where reshaping comes into play. Reshaping allows you to change the shape of your tensor to add additional dimensions, remove dimensions, et cetera. In this particular case, what you could do is take those grayscale magnitudes, the entries of the row by column pixels of this 2D matrix, and force each cell to be an array, because that's what you have in a 3D image.
You have, at each cell in width by height, at each pixel cell, a list, a three element list of color values. So each cell is an array. Well, what you could do in the case of your grayscale images is wrap each number in each cell in an array. So now it's a 256 by 256 by 1 tensor. We didn't add any new information.
We didn't add any new numbers into these pictures. All we did was wrap the numbers in arrays. We call this expanding the dimensions, and the NumPy function for this is np.expand_dims. And it will take your original tensor and it will give it one extra dimension where the shape of that dimension is one.
Namely, it will wrap the last dimension of your tensor in brackets, in square brackets. It'll just wrap them into an array. So the way you do that, like I said, is np.expand_dims, but in NumPy, there's many, many ways to skin a cat when dealing with reshaping tensors or transposing or swapping axes. And so we'll talk about a number of these different ways.
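Before we get to those, here's a quick sketch of expand_dims itself, assuming a randomly generated 256 by 256 array as a stand-in for a grayscale image:

```python
import numpy as np

# Stand-in for a 256 x 256 grayscale image: one intensity value per pixel.
gray = np.random.rand(256, 256)            # shape (256, 256), ndim 2

gray_3d = np.expand_dims(gray, axis=-1)    # wrap the last dimension in an array

print(gray_3d.shape)                       # (256, 256, 1) -> now a 3D tensor
print(gray_3d.size == gray.size)           # True -> no new information added
```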
expand_dims is the way you simply add a shape-1 dimension to your tensor. Well, the general form of this is reshape, np.reshape. That's a general function in NumPy that allows you to reshape your tensors any which way. So a common fashion you'll see reshape used in NumPy tutorials is maybe they want to show you how to work with a 2D tensor.
And so what they'll do is np.arange(100), so they're creating a 1D vector of a hundred elements, and then .reshape(10, 10). And so it'll take your 1D array of a hundred elements and it'll turn it into a 2D array of 10 by 10 elements, same size. So with reshape you get a new shape and a new number of dimensions.
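A quick sketch of that arange-then-reshape pattern:

```python
import numpy as np

vec = np.arange(100)         # 1D tensor, shape (100,)
mat = vec.reshape(10, 10)    # 2D tensor, shape (10, 10)

print(vec.shape, vec.size)   # (100,) 100
print(mat.shape, mat.size)   # (10, 10) 100 -> same size, new shape
```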
So that's a common example you'll see of reshape in practice in tutorials. But you could also use reshape to add additional dimensions, remove unneeded dimensions, and things like this. So for example, in our case, when we want to turn our image from a grayscale image into a 3D tensor, where the last dimension is the pixel value wrapped in an array.
The way you do that is you take your tensor, arr, which is an image, and you'd say arr.reshape(256, 256, 1), and with that 1 on the end, NumPy will know to just wrap each element. I'm not adding new information, and so the reshaping is allowed. Now, let's say we have too many dimensions. For example, sometimes when I'm using a machine learning model that makes predictions, regression predictions, it'll return to me a 2D tensor.
The first dimension will be all the predictions, and the second dimension, for some reason, will just be all those predictions wrapped in an array. So the shape will be, let's say, a million predictions by one. So it's an unnecessary second dimension. So the way you can get your predictions out of that, unpack them, remove them from that unnecessary array wrapping, is with np.squeeze. Squeeze that last dimension out. So squeeze removes shape-1 dimensions. It's the reverse of expand_dims, and again, the general form for all of this is just np.reshape. The reshape function is the general purpose function that allows you to perform a lot of these functions that have dedicated names.
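Here's a small sketch of squeeze on a fake predictions array; the reshape equivalent is spelled out just below:

```python
import numpy as np

# Fake regression output: a million predictions, each wrapped in a length-1 array.
preds = np.random.rand(1_000_000, 1)   # shape (1000000, 1)

flat = np.squeeze(preds)               # shape (1000000,) -> shape-1 dimension removed

print(preds.shape, flat.shape)
```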
So if we had a predictions array, arr, which has shape 1 million by 1, the equivalent to np.squeeze would be arr.reshape(1000000), no second one, and it'll just remove that shape-1 dimension. Now, a little tip: when you're dealing with reshaping NumPy arrays, you may sometimes see the number negative one.
That means wildcard, a wildcard. So if I were passing a batch of grayscale images into a convolutional neural network that wanted 3D images, okay, just like we were doing before, we pass in a batch, let's say 32 images of 256 by 256, so we've got (32, 256, 256). That's the shape of our input data.
The number of dimensions is three, and we need to convert it to a four dimensional tensor for the ConvNet: 32 by 256 by 256 by 1, that last dimension just being this contrived wrapping around the grayscale magnitude. So we want to go from 3D to 4D. We want to go from (32, 256, 256) to (32, 256, 256, 1). Well, we would just do that expand_dims trick, or reshape where the last argument is that added dimension.
But let's say we didn't know that the batch size we're dealing with is 32. Let's say we know the width and height, 256, we know the size of our images, but for whatever reason, maybe they're not stored in a variable locally, or we're doing hyperparameter optimization on the batch size, for example.
At the point in our code where NumPy is dealing with this tensor, it has no way of knowing that the batch size is 32, that that first dimension is 32. What you can do is use negative one as a wildcard placeholder, and what NumPy will do then is say, okay, I'm gonna hold off on this dimension. It'll go through the other dimensions, 256, 256, then add a single dimension, and realize at that point that it has covered all of the size requirements, converting the original tensor to the new tensor. It has enough information now to know, based on the conversion of these numbers to those numbers, what that wildcard should be replaced with, and it inserts 32 there. So you'll see negative one used as a wildcard placeholder in some reshaping operations where you may not know that one dimension, but you know all the other dimensions.
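Here's a minimal sketch of that wildcard on a fake batch of grayscale images, pretending we don't know the batch size:

```python
import numpy as np

# Fake batch of 32 grayscale images, 256 x 256 each.
batch = np.random.rand(32, 256, 256)        # shape (32, 256, 256)

# -1 tells NumPy to infer that dimension from the overall size.
batch_4d = batch.reshape(-1, 256, 256, 1)

print(batch_4d.shape)                       # (32, 256, 256, 1)
```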
And that's enough information for NumPy to infer the wildcard dimension. We talked about expand_dims and squeeze. Those are inverses of each other. One adds a single dimension, and the other removes it. We talked about reshaping a 100-element vector into a 10 by 10 matrix. And by the way, the inverse of that is ravel, R-A-V-E-L.
That will take your 10 by 10 and flatten it back into a 100-element vector. And as I mentioned, all of these operations can be generalized into the single function reshape. So reshape is sort of the only function you have to know in this broad category. Get familiar with reshape and it'll allow you to perform multiple functions.
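And a quick sketch of ravel next to its reshape equivalent, using the wildcard we just covered:

```python
import numpy as np

mat = np.arange(100).reshape(10, 10)   # shape (10, 10)

flat_a = mat.ravel()                   # shape (100,)
flat_b = mat.reshape(-1)               # same thing via reshape with a wildcard

print(flat_a.shape, flat_b.shape)      # (100,) (100,)
```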
So those are shapes of tensors. Now, sometimes you have a different problem: you have the right shape, but some axes, some dimensions, are in the wrong spot. This is a little bit rarer of a circumstance. I think you'll see this situation less commonly, but I've encountered it professionally. And the way you solve this problem is by transposing axes, and there's also a function in NumPy called swapaxes.
And we'll compare these two in a bit. Let's frame this problem first. When you're dealing with natural language processing and recurrent neural networks and word2vec, you're gonna wanna listen to my deep natural language processing episodes to properly understand this problem setup. But in NLP, when you're doing sequence to sequence modeling or machine translation and stuff like this, your recurrent neural network will take a 3D tensor. Batch size first, let's say 32 or 64 phrases. And then the second dimension is phrase length. Okay? Now what this is, is the maximum number of words that can occur in a sentence. So let's say that we're training this model on Wikipedia. We're trying to classify something in these sentences.
Well, this dimension, the second dimension is gonna be the maximum number of words that can ever appear in a sentence. So the largest sentence possible, and the reason for this is so that this dimension can accommodate the longest sentence that we encounter. And smaller sentences, we'll just pad the end of the sentences with these pad tokens so the sentence length is fixed.
So that's our second dimension: sentence length. And the third dimension is the word encodings, the number of columns it takes to encode a word using the word2vec model that I discussed in the NLP episodes. We call this the embedding dimension. So remember, what we do here is we convert words like cat and dog and human into a vector, into a dense list of numbers, which seems like an odd thing to do, but there's this magic behind it that makes it work.
Well, the number of elements in that vector is this embedding dimension. So let's say it's 64. So we have a 32 batch size, by 100 sentence length, by 64 embedding dimension, and we pipe that into our recurrent neural network. Bing bang boom, we have a language modeling process. Oftentimes what you'll find is these last two dimensions, the sentence length, what's often called the time steps.
Actually, a lot of times you'll see the sentence length referred to as time steps, the reason being that we're not always dealing with natural language processing. Sometimes we're using this same model on stock market predictions and stuff like this. Time steps and embedding dimension: sometimes you'll see these two dimensions flipped. You'll have embedding dimension as your second dimension and time steps as your last dimension. Maybe you got an off the shelf recurrent neural network built for working with natural language processing, and it expects time steps first and embedding dimension second. And then you take this model and instead you use it for stock market prediction.
And in that case, what you get as your dataset has candlestick data first and time steps second. So these two axes are flipped. You go one, two, and three in case one, and you go one, three, two in case two. Well, that's when transpose comes into play. np.transpose will take your array and allow you to swap axes, reorder specific axes.
So in this case you would swap two and three to being three and two, and it's just that easy: np.transpose. Your tensor is your first argument, and then your second argument is the new list of dimensions in whatever order you want them. Now, another way you can do this is with the function swapaxes, but swapaxes allows you to swap any two axes, which is fine in our current case.
Sometimes you'll want to swap multiple axes at once, and so it's a good habit to get into the practice of using the more general transpose function, which has more functionality beyond swapping two axes, over the swapaxes function. Just like we saw in the reshaping stuff, you can use reshape as a general function to handle expand_dims, squeeze, ravel, et cetera.
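Here's a sketch of np.transpose and np.swapaxes on a fake NLP batch, swapping the time-step and embedding axes while keeping the batch axis in place (NumPy numbers axes from zero, so those are axes 1 and 2 here):

```python
import numpy as np

# Fake batch: 32 sentences, 100 time steps, 64-dimensional word embeddings.
batch = np.random.rand(32, 100, 64)            # shape (32, 100, 64)

swapped_a = np.transpose(batch, (0, 2, 1))     # new axis order: batch, embedding, time
swapped_b = np.swapaxes(batch, 1, 2)           # swap just those two axes

print(swapped_a.shape, swapped_b.shape)        # (32, 64, 100) (32, 64, 100)
```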
So in summary, the two core functions in NumPy that allow you to manipulate dimensions and shapes are reshape and transpose. There's obviously plenty more to swapping axes and reshaping and re-dimensioning your tensors in NumPy, but you'll get a lot of that stuff from experience on the job. I just wanted to give you a crash course on some of the heavy hitting functions you'll see.
Dimensions, size and shape, and the way you manipulate any of these is via reshape or transpose. Reshape to insert dimensions or remove dimensions or change the shape and transpose to swap axes with each other.