Python was conceived with one target in mind: simplicity. This simplicity has made Python one of the most popular languages today. However, Python developers have to sacrifice performance to make their lives easier. This performance sacrifice considerably impacts numerical and scientific computing, though! Luckily, NumPy is there to save the day!
This article is an introduction to NumPy. After reading it, you’ll know how to install and import NumPy and how to process numeric data with one-dimensional NumPy arrays. We also offer a full course on NumPy right here on Python Land, covering many more topics and multi-dimensional arrays!
Why NumPy?
Scientists have been interested in Python since the early days, thanks to its ease of use and extensibility. NumPy emerged from the efforts of the scientific Python community to tackle numerical computing weaknesses in Python. The main issues that needed tackling were:
- Efficient array creation and manipulation
- Mathematical functions and algorithms to operate on these arrays
NumPy was created in 2005 by merging two numerical packages available at the time: Numeric and Numarray. Since Python is not optimized for speed, most of NumPy’s heavy-lifting code is written in C, with some Fortran code doodling around the edges. It’s what makes NumPy blazing fast!
Since its creation, NumPy has become one of the most reputable third-party modules available for Python. There have even been fierce debates about including it as one of Python’s standard modules. Only the very best of projects earn such honor.
Fast forward to the present, where we’ve seen an explosion of data science and machine learning jobs leveraging Python. NumPy is a cornerstone package offering a solid foundation many other projects rely upon. Many data science and machine learning packages use NumPy under the hood, notably:
- Pandas
- scikit-learn
- TensorFlow
Therefore, mastering NumPy is of paramount importance.
Installing NumPy
NumPy is not part of the default Python distribution, so you’ll need to install it. You may have heard of pipx, but it’s meant for installing command-line applications into isolated environments, so it won’t make a library like NumPy importable in your own code. Instead, use the regular pip install, or add NumPy to your project with Poetry / Pipenv:
pip install numpy
If you’re a Conda user, you can use:
conda install numpy
Importing NumPy
Like all packages, you can import parts of NumPy or the entire package. There’s a convention to import the entire package and rename it to np. It’s strongly recommended to use this convention as well. The simple reason for this is that most NumPy users use the package interactively, so they have to type less:
import numpy as np
Throughout this article, we’ll abide by the convention and assume you have NumPy imported as np.
NumPy arrays
At the core of NumPy are arrays. Let’s formally define what an array is:
- array: A data structure with elements of the same type whose position can be accessed by one or more indices. Hence, a programming language implementation of a vector, a matrix, or a tensor is an array.
Given this broad definition of arrays, Python has two built-in types that resemble an array: the list and the tuple. We could use a list of lists or a dictionary of lists to store multiple lists, effectively creating multidimensional arrays. However, neither the list, the tuple, nor the dictionary is optimized for numerical purposes. They don’t even adhere to our definition that well since they can store values of different types. E.g., a list can contain a mixed collection of numbers, strings, and any other object type.
NumPy solves many of Python’s shortcomings in numerical computation through arrays. Array creation and manipulation, in particular, are fast and well optimized in NumPy.
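To see that difference in practice, here’s a small sketch (the values are arbitrary examples) that builds a list and a NumPy array from the same mixed numbers. The list keeps each element’s own type, while NumPy converts everything to one common dtype:

```python
import numpy as np

values = [1, 2.5, 3]  # a list mixing ints and a float

# A list stores each element as its own Python object, keeping its type
print([type(v).__name__ for v in values])  # ['int', 'float', 'int']

# NumPy picks one common dtype for all elements; the ints are upcast to floats
arr = np.array(values)
print(arr.dtype)     # float64
print(arr.tolist())  # [1.0, 2.5, 3.0]
```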
Creating a 1-dimensional array
The easiest way to create an array is to pass a list to np.array, NumPy’s main utility for creating arrays:
a = np.array([1, 2, 3])
The array function accepts any Python sequence, such as a list, a tuple, or even a range. Note that a set is not a sequence, so passing one doesn’t give you a one-dimensional array. The function accepts several optional keyword arguments, and we will discuss two of them here: copy and dtype.
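A quick sketch of these inputs, with illustrative variable names. It also shows the set pitfall: NumPy wraps a set as a single object in a zero-dimensional array, so convert it to a list first:

```python
import numpy as np

from_tuple = np.array((1, 2, 3))  # tuple -> 1-D array
from_range = np.array(range(5))   # range -> 1-D array holding 0..4
print(from_tuple.shape, from_range.tolist())  # (3,) [0, 1, 2, 3, 4]

# A set is not a sequence: NumPy stores the whole set as one object,
# producing a 0-dimensional array of dtype object
from_set = np.array({1, 2, 3})
print(from_set.ndim, from_set.dtype)  # 0 object

# Converting the set to a sorted list first gives a real 1-D array
from_set_fixed = np.array(sorted({3, 1, 2}))
```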
The copy argument
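The copy argument states whether NumPy must copy the input object. When copy is True (the default), the array always gets its own copy of the data, so changes in the resulting array won’t change the input object. When copy is False, NumPy 1.x only copies when it has to, so changes in the array can change the input object. NumPy 2.0 made this stricter: copy=False now means “never copy” and raises a ValueError if a copy can’t be avoided, while copy=None gives the old “copy only if needed” behavior.
A list always has to be copied, because NumPy must convert its elements into a compact block of numbers. For example:
```python
lst = [1, 2, 3]
a = np.array(lst, copy=None)  # NumPy 2.x; on NumPy 1.x, write copy=False
print(a)  # [1 2 3]
```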
If we change the array, the list stays the same, since NumPy copied it:
```python
a[0] = 0
print(lst)  # [1, 2, 3]
```
Now let’s pass an existing NumPy array as input instead of a list:
```python
a_in = np.array([1, 2, 3])
a = np.array(a_in, copy=False)
```
Let’s see what happens if we change the resulting array:
```python
a[0] = 0
print(a)     # [0 2 3]
print(a_in)  # [0 2 3]
```
Both arrays changed. With copy=False, NumPy didn’t copy the input array, so a and a_in share the same data.
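To experiment with this yourself, the sketch below uses np.shares_memory to check whether two arrays use the same data. It uses np.asarray, which copies only when it has to on both NumPy 1.x and 2.x, so it sidesteps the version-dependent meaning of copy=False:

```python
import numpy as np

a_in = np.array([1, 2, 3])

copied = np.array(a_in, copy=True)  # always a fresh copy (the default)
shared = np.asarray(a_in)           # copies only if needed; here it isn't

print(np.shares_memory(a_in, copied))  # False
print(np.shares_memory(a_in, shared))  # True

shared[0] = 100  # writes through to a_in
copied[1] = 200  # leaves a_in alone
print(a_in.tolist())  # [100, 2, 3]
```

Try swapping np.asarray for np.array(a_in, copy=True) and watch a_in stop changing.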
The dtype argument
Another commonly used argument is dtype, which explicitly sets the data type of the array’s elements. In the next section, you will learn about the available data types. One of them, np.int16, takes up just two bytes per element (np.int8 is even smaller, at one byte), way less space than a regular Python integer, which is a full-blown Python object.
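A minimal sketch of the dtype argument, comparing how many bytes each element and each whole array takes with 16-bit versus 64-bit integers:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)  # request 16-bit integers
large = np.array([1, 2, 3], dtype=np.int64)  # request 64-bit integers

print(small.dtype, small.itemsize)  # int16 2  (bytes per element)
print(large.dtype, large.itemsize)  # int64 8
print(small.nbytes, large.nbytes)   # 6 24    (bytes for all elements)
```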
NumPy data types (dtypes)
As mentioned, the dtype keyword argument of np.array specifies the data type of the array. Remember, one of the key properties of an array is that all elements have the same type.
NumPy implements its own data types that are optimized for efficient storage and processing. Each array describes its element type with a dtype object, an instance of the np.dtype class. Let’s take a look at the most common dtypes:
- np.int16
- np.int32
- np.int64
- np.float16
- np.float32
- np.float64
- np.float128
- np.bool_
- np.str_
- np.bytes_
- np.object_
In this article, we’ll focus on numeric types only.
Integers
The integer dtypes np.int16, np.int32, and np.int64 differ only in the range of numbers they can store:
- np.int16: -32,768 to 32,767
- np.int32: -2,147,483,648 to 2,147,483,647
- np.int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
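Rather than memorizing these limits, you can ask NumPy for them with np.iinfo, which reports the minimum and maximum of an integer dtype:

```python
import numpy as np

for int_type in (np.int16, np.int32, np.int64):
    info = np.iinfo(int_type)  # machine limits for this integer dtype
    print(f"{info.dtype}: {info.min} to {info.max}")
```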
Under normal circumstances, using np.int64 is the way to go, since it allows us to store the largest numbers. It’s also the integer dtype NumPy uses by default on most platforms (on Windows only since NumPy 2.0; older versions default to int32 there). There are benefits, however, to using smaller integers:
- Reduced memory usage
- Faster computations
More often than not, memory usage on a modern PC won’t be an issue for relatively small arrays. If you think it will be, try the smaller types. You must ensure that all elements, and the results of future operations on those elements, stay within the range of the chosen type: NumPy arrays don’t raise an error when an integer overflows.
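Here’s a sketch of both sides of this trade-off, using an arbitrary array size of one million elements. The int16 array takes a quarter of the memory of the int64 one, but a value that exceeds the int16 range silently wraps around:

```python
import numpy as np

n = 1_000_000  # arbitrary size for the comparison
print(np.zeros(n, dtype=np.int64).nbytes)  # 8000000
print(np.zeros(n, dtype=np.int16).nbytes)  # 2000000

# Overflow in array arithmetic raises no error: the value wraps around
a = np.array([32767], dtype=np.int16)  # the largest int16 value
b = a + np.int16(1)                    # result stays int16
print(b, b.dtype)  # [-32768] int16
```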
Floats
The different NumPy float types allow us to store floats with different precision, depending on the number of bits each float may use. The more bits, the more precise our array’s elements are. E.g., np.float16 uses 16 bits (2 bytes), while np.float64 takes up 64 bits (8 bytes).
Increased precision comes at the expense of memory and performance. Still, the rule of thumb is to err on the safe side and use np.float64, which is also NumPy’s default float type, unless you have a good reason to use something else. E.g., if you can spare some precision, and performance and memory usage are of the essence, use something smaller.
Let’s explore how the float size affects precision:
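Here’s a sketch of such a comparison. It converts a single value, taken from the float64 line of the output below, to each float type, and looks up np.float128 with getattr because that type doesn’t exist on every platform:

```python
import numpy as np

value = 1.3738729019013636  # a Python float is a 64-bit double

for name in ("float16", "float32", "float64", "float128"):
    float_type = getattr(np, name, None)  # None if this platform lacks the type
    if float_type is None:
        print(f"{name}: not available on this platform")
        continue
    print(f"{name}: {float_type(value)}")
```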
The output nicely demonstrates how the different types influence the amount of precision we can store:
```
float16: 1.374
float32: 1.3738729
float64: 1.3738729019013636
float128: 1.3738729019013635746
```
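There’s even a float128 type on Linux and Intel-based macOS, as can be seen in the output. On x86 machines, it is usually 80-bit extended precision padded to 128 bits of storage rather than true quadruple precision. Windows (and Apple Silicon Macs) don’t provide it, so accessing np.float128 there raises an AttributeError.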
Using NumPy arrays
We’ll now look closely at how to use NumPy arrays, starting with accessing elements using array indexing.
Getting a single element
We can access and modify single elements:
```python
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0])  # 0.0
a[0] = 1.0
print(a)  # [1., 2., 3., 4., 5.]
```
Accessing multiple elements
We can access and modify multiple specific elements in a NumPy array at once. Note that Python lists do not have this feature:
```python
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])

# Get elements at position 0 and 2
print(a[[0, 2]])  # [0., 3.]

# Change the first two elements
a[[0, 1]] = [0, 3.0]
print(a)  # [0., 3., 3., 4., 5.]
```
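One detail worth knowing: reading elements with a list of indices gives you a new array (a copy), while assigning through a list of indices writes into the original. A short sketch:

```python
import numpy as np

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])

picked = a[[0, 2]]  # reading with a list of indices returns a copy
picked[0] = 99.0    # changing the copy...
print(a[0])         # 0.0  ...leaves the original array untouched

a[[0, 2]] = 99.0    # assigning through the index list changes a itself
print(a.tolist())   # [99.0, 2.0, 99.0, 4.0, 5.0]
```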
Negative indexing
Negative indices work the same as with lists; they count backward from the end. For example, to get elements at the end of the array, you can use:
```python
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[-1])  # 5.0
print(a[-2])  # 4.0
```
Slicing
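Slicing works as well, using the same a[start:stop:step] syntax as list slicing. As an example, let’s get the first three elements of an array:
```python
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0:3])  # [0., 2., 3.]
```
Or all the elements except for the last one:
```python
print(a[0:-1])  # [0., 2., 3., 4.]
```
And like lists, we can also reverse the array this way:
```python
print(a[::-1])  # [5., 4., 3., 2., 0.]
```
There is one important difference from lists, though: slicing a NumPy array doesn’t copy the data. The slice is a view on the original array, so changing an element of the slice also changes the original array.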
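The following sketch contrasts list and array slices. Modifying a list slice leaves the list alone, while modifying an array slice changes the array; calling .copy() on a slice gives you an independent array:

```python
import numpy as np

lst = [0.0, 2.0, 3.0, 4.0, 5.0]
arr = np.array(lst)

lst_slice = lst[0:3]  # list slice: a new, independent list
arr_slice = arr[0:3]  # array slice: a view on the same data

lst_slice[0] = 99.0
arr_slice[0] = 99.0

print(lst[0])  # 0.0   the original list is unchanged
print(arr[0])  # 99.0  the original array changed too

independent = arr[0:3].copy()  # an explicit copy shares no data with arr
```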
Append, insert, delete, and sort
NumPy arrays have more in common with lists. Many regular list operations, like sorting, deleting, inserting, and appending data, have NumPy counterparts that behave similarly. Note that, except for the in-place sort, these operations return a new array instead of modifying the given array.
Append to NumPy array
To append means to add elements to the end. We can append single elements to a NumPy array just like we do with lists:
```python
a = np.array([1.0, 2.0])
a = np.append(a, 3.0)
print(a)  # [1., 2., 3.]
```
We’re used to using the extend method to append multiple elements to a list. However, NumPy reuses the same append function to add multiple elements:
```python
a = np.array([1.0, 2.0])
a = np.append(a, [4.0, 5.0])
print(a)  # [1., 2., 4., 5.]
```
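Because np.append returns a new array on every call, it copies all existing elements each time. When you build up values one at a time, a common pattern is to collect them in a regular list and convert once at the end. A sketch, with squares as stand-in data:

```python
import numpy as np

# Collect values in a regular list first; list.append is cheap
squares = []
for i in range(5):
    squares.append(i * i)

# Convert to a NumPy array once, at the end
arr = np.array(squares, dtype=np.float64)
print(arr.tolist())  # [0.0, 1.0, 4.0, 9.0, 16.0]
```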
Insert into NumPy array
We can insert one or more elements at specific index locations using insert:
```python
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Insert one element at position 3
b = np.insert(a, 3, values=3.5)
# b is [1., 2., 3., 3.5, 4., 5.]

# Insert a list of elements at position 3
c = np.insert(a, 3, values=[100, 200])
# c is [1., 2., 3., 100., 200., 4., 5.]

# Insert multiple elements at multiple positions (positions refer to a)
d = np.insert(a, [3, 5], values=[4.5, 5.5])
# d is [1., 2., 3., 4.5, 4., 5., 5.5]
```
Delete elements from NumPy array
We can delete one or more elements at once as well:
```python
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a = np.delete(a, -1)
# a is now [1., 2., 3., 4.]
a = np.delete(a, [0, 1])
# a is now [3., 4.]
```
Sorting NumPy array
There are two ways to sort a NumPy array: an in-place sort, or creating a new, sorted array. Let’s start with the latter:
```python
a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
b = np.sort(a)
# b is now [1., 2., 3., 4., 5.]
```
And to do an in-place sort, do as follows:
```python
a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
a.sort()
# a is now [1., 2., 3., 4., 5.]
```
Notice that append, insert, and delete aren’t methods of the array class but functions in the np namespace: we call them on the np object and pass the array as an argument. These functions don’t work in place; they return a new array. Sorting is the exception, since arrays also have a sort method that sorts them in place.
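A short sketch of the difference between the np.sort function, the in-place sort method, and a function like np.delete:

```python
import numpy as np

a = np.array([1.0, 3.0, 2.0])

b = np.sort(a)     # function: returns a new sorted array
print(a.tolist())  # [1.0, 3.0, 2.0]  a is untouched

result = a.sort()  # method: sorts a in place...
print(result)      # None             ...and returns nothing
print(a.tolist())  # [1.0, 2.0, 3.0]

c = np.delete(a, 0)  # other np functions also leave a as it is
print(a.tolist(), c.tolist())  # [1.0, 2.0, 3.0] [2.0, 3.0]
```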
In the NumPy course (coming soon!), we will go through more functions and array methods that enable us to do much more with arrays.
Mathematical array operations
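We’ll conclude this article with the most common mathematical operations that one might want to perform with arrays: addition, subtraction, multiplication, and division.
Arithmetic on arrays looks just like arithmetic on scalars, but the operations are carried out element-wise. Hence, for the one-dimensional arrays in this article, an array can be added to, subtracted from, multiplied by, or divided by either a scalar or another array of the same size. (NumPy’s broadcasting rules allow a few more combinations, but they’re beyond the scope of this introduction.)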
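To see how this differs from lists, here’s a quick comparison: for lists, + concatenates and * repeats, while for arrays both operators work element by element:

```python
import numpy as np

lst = [1.0, 2.0]
arr = np.array([1.0, 2.0])

# For lists, + concatenates and * repeats
print(lst + lst)  # [1.0, 2.0, 1.0, 2.0]
print(lst * 2)    # [1.0, 2.0, 1.0, 2.0]

# For arrays, the same operators work element by element
print((arr + arr).tolist())  # [2.0, 4.0]
print((arr * 2).tolist())    # [2.0, 4.0]
```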
Let’s define some arrays first. Notice that a and b have the same size of 4, while b_wrong_size has a different size of 3 elements:
```python
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])
```
If we try to operate on arrays of different sizes, a ValueError exception will be raised:
```python
a = np.array([1.0, 2.0, 3.0, 4.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])

# Raises a ValueError exception:
a + b_wrong_size
# ValueError: operands could not be broadcast together with shapes (4,) (3,)
```
Addition and subtraction
We can add arrays together or add a single value to each element of the array:
```python
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
print(a + b)  # [3., 4., 5., 6.]
print(a + 2)  # [3., 4., 5., 6.]
print(a - b)  # [-1., 0., 1., 2.]
print(a - 2)  # [-1., 0., 1., 2.]
```
Multiplication and division
The same is true for multiplication and division: we can either use a single value or two arrays:
```python
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
print(a * b)  # [2., 4., 6., 8.]
print(a * 2)  # [2., 4., 6., 8.]
print(a / b)  # [0.5, 1. , 1.5, 2. ]
print(a / 2)  # [0.5, 1. , 1.5, 2. ]
```
Learn more
This article is part of a free Python Tutorial. Want to learn more? NumPy pairs nicely with Jupyter Notebooks, so you might want to read up about those.
Conclusion
We’ve looked at one-dimensional array creation, accessing elements, array manipulation, and the most important mathematical operations on arrays. There’s a lot more to learn about NumPy. Python Land will soon release a full course on NumPy that covers everything you’d want to know. Until then, I recommend the following resources to learn more:
- The official NumPy manual has a section for absolute beginners
- If you know MATLAB, you will like NumPy for MATLAB users