- NumPy is a core Python library every data science professional should be well acquainted with
- This comprehensive NumPy tutorial covers NumPy from scratch, from basic mathematical operations to how NumPy works with image data
- Plenty of NumPy concepts and Python code in this article

I am a huge fan of the NumPy library in Python. I have relied on it countless times during my data science journey to perform all sorts of tasks, from basic mathematical operations to using it for image classification!

In short – NumPy is one of the most fundamental libraries in Python and perhaps the most useful of them all. NumPy handles large datasets effectively and efficiently. I can see your eyes glinting at the prospect of mastering NumPy already. 🙂 As data scientists or aspiring data science professionals, we need to have a solid grasp of NumPy and how it works in Python.

In this article, I am going to start off by describing what the NumPy library is and why you should prefer it over the ubiquitous but cumbersome Python lists. Then, we will cover some of the most basic NumPy operations that will get you hooked on this awesome library!

If you’re new to Python, don’t worry! You can take the comprehensive (and free) Python course to learn everything you need to get started with data science programming!

- What is the NumPy Library in Python?
- Python list vs NumPy arrays – What’s the Difference?
- Creating a NumPy Array
  - Basic ndarray
  - Array of zeros
  - Array of ones
  - Random numbers in ndarray
  - An array of your choice
  - Identity matrix in NumPy
  - Evenly spaced ndarray
- The Shape and Reshaping of NumPy Array
  - Dimensions of NumPy array
  - Shape of NumPy array
  - Size of NumPy array
  - Reshaping a NumPy array
  - Flattening a NumPy array
  - Transpose of a NumPy array
- Expanding and Squeezing a NumPy Array
  - Expanding a NumPy array
  - Squeezing a NumPy array
- Indexing and Slicing of NumPy Array
  - Slicing 1-D NumPy arrays
  - Slicing 2-D NumPy arrays
  - Slicing 3-D NumPy arrays
  - Negative slicing of NumPy arrays
- Stacking and Concatenating NumPy Arrays
  - Stacking ndarrays
  - Concatenating ndarrays
- Broadcasting in NumPy Arrays – A class apart!
- NumPy Ufuncs – The secret of its success!
- Maths with NumPy Arrays
  - Mean, Median and Standard deviation
  - Min-Max values and their indexes
- Sorting in NumPy Arrays
- NumPy Arrays and Images

NumPy stands for Numerical Python and is one of the most useful scientific libraries in Python programming. It provides support for large multidimensional array objects and various tools to work with them. **Various other libraries like Pandas, Matplotlib, and Scikit-learn are built on top of this amazing library.**

An array is a collection of elements/values that can have one or more dimensions. An array with one dimension is called a *Vector*, while an array with two dimensions is called a *Matrix*.

NumPy arrays are called **ndarrays** or **N-dimensional arrays**, and they store elements of the same type and size. NumPy is known for its high performance and provides efficient storage and data operations as arrays grow in size.

NumPy comes pre-installed when you download Anaconda. But if you want to install NumPy separately on your machine, just type the below command on your terminal:

`pip install numpy`

Now you need to import the library:

`import numpy as np`

**np** is the de facto abbreviation for NumPy used by the data science community.

If you’re familiar with Python, you might be wondering why use NumPy arrays when we already have Python lists? After all, these Python lists act as an array that can store elements of various types. This is a perfectly valid question and the answer to this is hidden in the way Python stores an object in memory.

In Python, a variable is really a reference to an object in memory, and that object stores not only the value itself but also extra details about it, such as its type. Although this extra information is what makes Python a dynamically typed language, it also comes at a cost which becomes apparent when storing a large collection of objects, like in an array.

Python lists are essentially an array of pointers, each pointing to a location that contains the information related to the element. This adds a lot of overhead in terms of memory and computation. And most of this information is rendered redundant when all the objects stored in the list are of the same type!

**To overcome this problem, we use NumPy arrays that contain only homogeneous elements**, i.e. elements having the same data type. This makes NumPy more efficient at storing and manipulating the array. The difference becomes apparent when the array has a large number of elements, say thousands or millions. **Also, with NumPy arrays, you can perform element-wise operations directly with arithmetic operators, something Python lists cannot do without an explicit loop!**

This is the reason why NumPy arrays are preferred over Python lists when performing mathematical operations on a large amount of data.
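To make the element-wise point concrete, here is a small sketch comparing the same operators on a list and on an ndarray. The variable names are made up for this illustration.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# On lists, + concatenates and * repeats; neither does arithmetic on the elements.
list_sum = py_list + py_list          # [1, 2, 3, 1, 2, 3]
list_times_two = py_list * 2          # [1, 2, 3, 1, 2, 3]

# On ndarrays, the same operators work element by element.
array_sum = np_array + np_array       # array([2, 4, 6])
array_times_two = np_array * 2        # array([2, 4, 6])

# Element-wise arithmetic on a list needs an explicit loop or comprehension.
list_elementwise = [x + y for x, y in zip(py_list, py_list)]  # [2, 4, 6]

print(list_sum, array_sum, list_elementwise)
```

The list version needs a comprehension to get what the ndarray does with a single `+`.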

NumPy arrays are very easy to create given the complex problems they solve. To create a very basic ndarray, you use the np.array() method. All you have to pass are the values of the array as a list, for example `np.array([1,2,3,4])`.

This array contains integer values. You can specify the type of data in the **dtype** argument:

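`np.array([1,2,3,4],dtype=np.float32)`

**Output:**

`array([1., 2., 3., 4.], dtype=float32)`

Because a NumPy array can hold only one data type, mixing integers and floats in the input upcasts every value to the wider type:

```
np.array([1,2.0,3,4])
```

**Output:**

`array([1., 2., 3., 4.])`

Here, NumPy has upcast integer values to float values.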
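If you want to check which type NumPy actually picked, the `dtype` attribute tells you. The snippet below is a minimal sketch with illustrative variable names; note that the default integer width depends on your platform, so I only check that it is some integer type.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])                        # integer dtype (width depends on the platform)
floats32 = np.array([1, 2, 3, 4], dtype=np.float32)  # dtype requested explicitly
mixed = np.array([1, 2.0, 3, 4])                     # a single float upcasts every element

print(ints.dtype)      # an integer type such as int64
print(floats32.dtype)  # float32
print(mixed.dtype)     # float64
```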

**NumPy arrays can be multi-dimensional too.**

`np.array([[1,2,3,4],[5,6,7,8]])`

Here, we created a 2-dimensional array of values.

*Note: A matrix is just a rectangular array of numbers with shape N x M where N is the number of rows and M is the number of columns in the matrix. The one you just saw above is a 2 x 4 matrix.*

NumPy lets you create an array of all zeros using the **np.zeros()** method. All you have to do is pass the shape of the desired array:

`np.zeros(5)`

The one above is a 1-D array while the one below is a 2-D array:

`np.zeros((2,3))`

Similarly, you can create an array of all 1s using the **np.ones()** method:

`np.ones(5,dtype=np.int32)`
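As a quick sketch of what these calls return (the variable names are mine), note that both functions produce floats unless you pass a `dtype`:

```python
import numpy as np

zeros_2d = np.zeros((2, 3))             # the shape is passed as a tuple; dtype defaults to float64
ones_int = np.ones(5, dtype=np.int32)   # integer ones instead of the float default

print(zeros_2d)
# [[0. 0. 0.]
#  [0. 0. 0.]]
print(zeros_2d.dtype)   # float64
print(ones_int)         # [1 1 1 1 1]
```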

Another very commonly used way to create ndarrays is the **np.random.rand()** method. It creates an array of a given shape with random values drawn uniformly from [0,1):

`# random`

`np.random.rand(2,3)`

**Output:**

`array([[0.95580785, 0.98378873, 0.65133872], [0.38330437, 0.16033608, 0.13826526]])`
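The values will differ every time you run this. If you need repeatable numbers, one option is to seed the generator first. The sketch below assumes the legacy `np.random.seed()` interface, and the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                 # the seed value is arbitrary
first = np.random.rand(2, 3)

np.random.seed(42)                 # resetting the seed replays the same sequence
second = np.random.rand(2, 3)

print(first.shape)                          # (2, 3)
print(np.array_equal(first, second))        # True
print(((first >= 0) & (first < 1)).all())   # True, every value lies in [0, 1)
```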

Or, in fact, you can create an array filled with any given value using the **np.full()** method. Just pass in the shape of the desired array and the value you want:

`np.full((2,2),7)`
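Here is a minimal sketch of what np.full() gives back. The second call is my own addition to show that the fill value also decides the data type.

```python
import numpy as np

sevens = np.full((2, 2), 7)          # an integer fill value gives an integer array
sevens_float = np.full((2, 2), 7.0)  # a float fill value gives a float array

print(sevens)
# [[7 7]
#  [7 7]]
print(sevens_float.dtype)   # float64
```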

Another great method is **np.eye()**, which returns an array with **1s** along its diagonal and **0s** everywhere else.

*An Identity matrix is a square matrix that has 1s along its main diagonal and 0s everywhere else. Below is an Identity matrix of shape 3 x 3.*

*Note: A square matrix has an N x N shape. This means it has the same number of rows and columns.*

```
# identity matrix
np.eye(3)
```

However, NumPy gives you the flexibility to change the diagonal along which the values have to be **1s**. You can either move it above the main diagonal:

`# not an identity matrix`

`np.eye(3,k=1)`

Or move it below the main diagonal:

`np.eye(3,k=-2)`

*Note: A matrix is called the Identity matrix only when the 1s are along the main diagonal and not any other diagonal!*
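To see where the 1s land for each value of `k`, here is a short sketch of the three calls above side by side (the variable names are illustrative):

```python
import numpy as np

identity = np.eye(3)       # 1s on the main diagonal
above = np.eye(3, k=1)     # 1s one step above the main diagonal
below = np.eye(3, k=-2)    # 1s two steps below the main diagonal

print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
print(above)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
print(below)
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [1. 0. 0.]]
```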

You can quickly get an evenly spaced array of numbers using the **np.arange()** method:

`np.arange(5)`

The start, end and step size of the interval of values can be explicitly defined by passing in three numbers as arguments for these values respectively. A point to be noted here is that the interval is defined as [start,end) where the last number will not be included in the array:

`np.arange(2,10,2)`

Another similar function is **np.linspace()**, but instead of a step size, it takes the number of samples to be retrieved from the interval. A point to note here is that the last number is included in the values returned, unlike in the case of np.arange().

`np.linspace(0,1,5)`
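The difference between the two is easiest to see side by side. This is a small sketch using the same calls as above, with variable names of my choosing:

```python
import numpy as np

stepped = np.arange(2, 10, 2)    # start, end, step: the end value 10 is excluded
sampled = np.linspace(0, 1, 5)   # start, end, number of samples: the end value 1 is included

print(stepped)   # [2 4 6 8]
print(sampled)   # [0.   0.25 0.5  0.75 1.  ]
```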

Great! Now you know how to create arrays using NumPy. But it's also important to know the shape of the array.

You can easily determine the number of dimensions or axes of a NumPy array using the **ndim** attribute. For example, `np.array([[1,2,3],[4,5,6]]).ndim` returns 2.

This array has two dimensions: 2 rows and 3 columns.
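As a rough sketch, here is how `ndim` changes with the nesting depth of the array. The three arrays are examples I made up for this purpose.

```python
import numpy as np

vector = np.array([1, 2, 3])                 # 1-D: a single axis
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D: rows and columns
cube = np.zeros((2, 3, 4))                   # 3-D: two stacked 3 x 4 matrices

print(vector.ndim)   # 1
print(matrix.ndim)   # 2
print(cube.ndim)     # 3
```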

The **shape** is an attribute of the NumPy array that shows how many elements there are along each dimension. You can further index the shape returned by the ndarray to get the value along each dimension:

```
a = np.array([[1,2,3],[4,5,6]])
print('Array :','\n',a)
print('Shape :','\n',a.shape)
print('Rows = ',a.shape[0])
print('Columns = ',a.shape[1])
```

You can determine how many values there are in the array using the **size** attribute. It is the product of the lengths along every dimension, so for a 2-D ndarray it is the number of rows multiplied by the number of columns.
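A minimal sketch of the size attribute, reusing the 2 x 3 array from the shape example and adding a 3-D array of my own to show that the rule extends beyond rows and columns:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.zeros((2, 3, 4))

print(a.size)                        # 6
print(a.shape[0] * a.shape[1])       # 6, rows multiplied by columns
print(cube.size)                     # 24, i.e. 2 * 3 * 4
```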

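Reshaping an ndarray changes its shape without changing its data, and can be done using the **np.reshape()** method:

`# reshape`

`a = np.array([3,6,9,12])`

`np.reshape(a,(2,2))`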

Here, I reshaped the ndarray from a 1-D to a 2-D ndarray.
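Keep in mind that the new shape must hold exactly the same number of elements as the original. The sketch below shows the reshape from above and what happens when the sizes do not match. The `reshape_failed` flag is just a helper name for this example.

```python
import numpy as np

a = np.array([3, 6, 9, 12])
b = np.reshape(a, (2, 2))    # 4 elements fit exactly into a 2 x 2 array
print(b)
# [[ 3  6]
#  [ 9 12]]

# A 3 x 2 array needs 6 elements, so NumPy refuses to reshape 4 elements into it.
try:
    np.reshape(a, (3, 2))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('Cannot reshape:', err)
```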

Sometimes when you have a multidimensional array and want to collapse it to a single-dimensional array, you can either use the **flatten()** method or the **ravel()** method:

`a = np.ones((2,2))`

`b = a.flatten()`

`c = a.ravel()`

`print('Original shape :', a.shape)`

`print('Array :','\n', a)`

`print('Shape after flatten :',b.shape)`

`print('Array :','\n', b)`

`print('Shape after ravel :',c.shape)`

`print('Array :','\n', c)`

**Output:**

`Original shape : (2, 2)`

`Array : [[1. 1.] [1. 1.]]`

`Shape after flatten : (4,)`

`Array : [1. 1. 1. 1.]`

`Shape after ravel : (4,)`

`Array : [1. 1. 1. 1.]`

But an important difference between flatten() and ravel() is that the former always returns a copy of the original array, while the latter returns a view of the original array whenever possible. This means any changes made to the array returned from ravel() will also be reflected in the original array, while this will not be the case with flatten().

`b[0] = 0`

`print(a)`

**Output:**

`[[1. 1.] [1. 1.]]`
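The output above shows that changing the flattened copy leaves the original untouched. To see the other half of the story, here is a small sketch that makes the same change through ravel(). I use `np.shares_memory()` only as an extra check that the view and the original point at the same data.

```python
import numpy as np

a = np.ones((2, 2))
b = a.flatten()   # a copy of the data
c = a.ravel()     # a view onto the same data (a is contiguous here)

b[0] = 0          # changes only the copy
unchanged_after_flatten = a.copy()

c[0] = 0          # changes the original as well
print(a)
# [[0. 1.]
#  [1. 1.]]

print(np.shares_memory(a, b))   # False
print(np.shares_memory(a, c))   # True
```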


