Python was conceived with one goal in mind: simplicity. This simplicity has made Python one of the most popular languages today. However, Python developers sacrifice performance to make their lives easier, and this sacrifice considerably impacts numerical and scientific computing. Luckily, NumPy is there to save the day!
This article is an introduction to NumPy. After reading it, you’ll know how to install and import NumPy and how to process numeric data with one-dimensional NumPy arrays. There will soon be a full course on NumPy right here on Python Land, covering many more topics and multi-dimensional arrays. If you want to be notified of its release, please subscribe to the low-volume Python Land newsletter.
Scientists have been interested in Python since the early days, thanks to its ease of use and extensibility. NumPy emerged from the efforts of the scientific Python community to tackle numerical computing weaknesses in Python. The main issues that needed tackling were:
- Efficient array creation and manipulation
- Mathematical functions and algorithms to operate on these arrays
NumPy was created in 2005 by merging two numerical packages available at the time: Numeric and Numarray. Since Python is not optimized for speed, most of NumPy’s heavy-lifting code is written in C, with some Fortran code doodling around the edges. It’s what makes NumPy blazing fast!
Since its creation, NumPy has become one of the most reputable third-party modules available for Python. There have even been fierce debates about whether to include it as one of Python’s standard modules. Only the very best of projects earn such honor.
Fast forward to the present, where we’ve seen an explosion of data science and machine learning jobs leveraging Python. NumPy is a cornerstone package that offers a solid foundation required by many other projects, and many data science and machine learning packages use NumPy under the hood.
Therefore, mastering NumPy is of paramount importance.
You can install NumPy with pip:
pip install numpy
If you’re a Conda user, you can use:
conda install numpy
Like all packages, you can either import parts of NumPy or the entire package. There’s a convention to import the entire package and then rename it to np. It’s strongly recommended to use this convention as well. The simple reason for this is that most NumPy users use the package interactively, so they have to type less:
import numpy as np
Throughout this article, we’ll abide by the convention and assume you have NumPy imported as np.
At the core of NumPy are arrays. Let’s formally define what an array is:
- A data structure with elements of the same type whose position can be accessed by one or more indices is an array. Hence, a programming language implementation of a vector, a matrix, or a tensor is an array.
Given this broad definition of arrays, Python has two built-in types that resemble an array: the list and the tuple. We could use a list of lists or a dictionary of lists to store multiple lists, effectively creating multidimensional arrays. However, neither the list, the tuple, nor the dictionary is optimized for numerical purposes. They don’t even adhere to our definition that well since they can store values of different types. E.g., a list can contain a mixed collection of numbers, strings, and any other object type.
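To make the difference concrete, here is a minimal sketch (the variable names are illustrative, not from the article) showing that a list keeps each element’s own type, while np.array converts all elements to one common type:

```python
import numpy as np

# A list happily mixes types; every element keeps its own type.
mixed_list = [1, 2.5, 3]
element_types = [type(x).__name__ for x in mixed_list]
print(element_types)  # ['int', 'float', 'int']

# np.array converts all elements to a single common type.
# Here the integers are upcast so that 2.5 can be represented.
arr = np.array(mixed_list)
print(arr.dtype)  # float64
```

The upcast happens silently, so it can be worth checking the dtype attribute after creating an array from mixed input.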
NumPy solves many of Python’s shortcomings regarding numerical computation through arrays. Array creation and manipulation in particular are fast and well optimized in NumPy.
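A rough way to see the speed difference yourself is to time the same computation in pure Python and in NumPy. The sketch below is only an illustration: the function names and the array size are arbitrary choices, and the timings will vary per machine, so no numbers are shown.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
# An explicit 64-bit dtype avoids overflow when squaring on any platform.
values_arr = np.arange(n, dtype=np.int64)


def python_sum_of_squares():
    # Pure Python: the interpreter handles every element one by one.
    return sum(x * x for x in values_list)


def numpy_sum_of_squares():
    # NumPy: the multiplication and the sum run in compiled code.
    return int(np.sum(values_arr * values_arr))


# Both approaches compute the same result.
assert python_sum_of_squares() == numpy_sum_of_squares()

print("pure Python:", timeit.timeit(python_sum_of_squares, number=20))
print("NumPy:      ", timeit.timeit(numpy_sum_of_squares, number=20))
```

On most machines the NumPy version should finish considerably faster, but treat the output as an indication rather than a benchmark.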
Creating a 1-dimensional array
The easiest way to create an array is to pass a list to NumPy’s main utility for creating arrays, np.array:
a = np.array([1, 2, 3])
The method accepts several optional keyword arguments, and we will discuss three of them:
The copy argument
The copy argument states whether to make a copy of the input object. When copy is True, any changes in the resulting array will not change the input object. However, if it is False, changes in the array can change the input object.
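When using lists to make arrays, NumPy always has to copy the data, whatever the argument’s value, because a list cannot share its memory with an array. Before NumPy 2.0, copy=False meant “copy only if needed”, so the following call works and quietly makes a copy. Since NumPy 2.0, the same call raises a ValueError because the copy cannot be avoided; np.asarray(lst) keeps the copy-only-if-needed behavior in both versions:
lst = [1, 2, 3]
a = np.array(lst, copy=False)  # NumPy < 2.0 only; on 2.0 and later this raises ValueError
print(a)  # [1 2 3]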
If we change the array, the list will stay the same since NumPy copied it:
a[0] = 0
print(lst)  # [1, 2, 3]
If we create the same array but with another NumPy array as input:
a_in = np.array([1, 2, 3])
a = np.array(a_in, copy=False)
a  # array([1, 2, 3])
Let’s see what happens if we change the resulting array:
a[0] = 0
print(a)  # [0 2 3]
print(a_in)  # [0 2 3]
Both arrays changed because we set the copy option to False.
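If you are unsure whether two arrays share their data, np.shares_memory can tell you. The following sketch uses np.asarray for the no-copy case, which is an assumption on my part to keep the example working on both older and newer NumPy versions; the variable names are illustrative:

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)    # copy=True is the default: independent data
shared = np.asarray(source)  # no copy is made for an existing array

shared[0] = 0    # also visible through `source`
copied[1] = 99   # does not affect `source`

print(source)                            # [0 2 3]
print(np.shares_memory(source, shared))  # True
print(np.shares_memory(source, copied))  # False
```

When you need a guaranteed independent array, calling the array’s copy() method makes that intent explicit.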
NumPy data types (dtypes)
Another keyword argument of the function np.array is dtype. This argument specifies the data type in the array. Remember, one of the key properties of an array is that all elements have the same type.
NumPy implements its own data types that are optimized for efficient storage and processing. For this, it uses the base class called dtype. Let’s take a look at the most common dtypes.
In this article, we’ll focus on numeric types only.
The integer dtypes np.int16, np.int32, and np.int64 differ only in the size of the number they can store:
np.int16 -> ±32,767
np.int32 -> ±2,147,483,647
np.int64 -> ±9,223,372,036,854,775,807
Under normal circumstances, using np.int64 is the way to go since it allows us to store the largest numbers. It is also the integer dtype NumPy uses by default on most 64-bit platforms. There are benefits, however, to using smaller integers:
- Reduced memory usage
- Faster computations
More often than not, memory usage won’t be an issue for relatively small arrays. If you think it will be, try the smaller types but ensure that all elements and the results of future operations on those elements will not exceed the maximum size of the chosen type.
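The limits and the memory savings can be checked directly. The sketch below uses np.iinfo to query the range of each integer dtype and the nbytes attribute to compare memory usage; it also shows what happens when a result exceeds the maximum, which is the risk mentioned above. The variable names are illustrative.

```python
import numpy as np

# np.iinfo reports the smallest and largest value of an integer dtype.
for dtype in (np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# Smaller dtypes use less memory per element.
small = np.zeros(1000, dtype=np.int16)
large = np.zeros(1000, dtype=np.int64)
print(small.nbytes, large.nbytes)  # 2000 8000

# Array arithmetic does not check for overflow: the value wraps around.
edge = np.array([np.iinfo(np.int16).max], dtype=np.int16)
wrapped = edge + np.int16(1)
print(wrapped)  # [-32768]
```

Because the wrap-around happens without an error, it is safest to pick a dtype with some headroom for the results of later operations.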
The float types also refer to the size of the number in memory. The larger the size, the more precision our array’s elements will have. However, this precision comes at the expense of memory and performance. The rule of thumb is to use np.float64 by default. If you can spare some precision and the performance and memory usage are paramount, use something smaller.
Let’s explore how the float size affects precision in the REPL:
>>> np.array([1.3738729019013636723763], dtype=np.float16)[0]
1.374
>>> np.array([1.3738729019013636723763], dtype=np.float32)[0]
1.3738729
>>> np.array([1.3738729019013636723763], dtype=np.float64)[0]
1.3738729019013636
>>> np.array([1.3738729019013636723763], dtype=np.float128)[0]
1.3738729019013635746
On some systems, there’s even a float128 type, as can be seen in the last example. Its availability depends on the platform: it will probably give an error if you’re on Windows, while Linux and macOS on x86-64 typically support it.
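A portable way to compare the float types is np.finfo, which reports the approximate number of reliable decimal digits per dtype. The sketch below sticks to the three types that exist on every platform; the variable names are illustrative.

```python
import numpy as np

value = 1.3738729019013636

for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    stored = dtype(value)
    # itemsize: bytes per element; precision: approximate decimal digits
    print(dtype.__name__, np.dtype(dtype).itemsize, info.precision, float(stored))

# The rounding error grows as the type gets smaller.
error16 = abs(float(np.float16(value)) - value)
error32 = abs(float(np.float32(value)) - value)
error64 = abs(float(np.float64(value)) - value)
print(error16 > error32 > error64)  # True
```

This makes the trade-off explicit: halving the storage size roughly halves the number of digits you can rely on.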
Using NumPy arrays
We’ll now look closely at how to use NumPy arrays, starting with accessing elements using array indexing.
Getting a single element
We can access and modify single elements:
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0])  # 0.0
a[0] = 1.0
print(a)  # [1., 2., 3., 4., 5.]
Accessing multiple elements
We can access and modify multiple specific elements in a NumPy array at once. Note that Python lists do not have this feature:
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
# Get elements at position 0 and 2
print(a[[0, 2]])  # [0., 3.]
# Change the first two elements
a[[0, 1]] = [0, 3.0]
print(a)  # [0., 3., 3., 4., 5.]
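For comparison, this small sketch shows what happens when the same index expression is used on a plain list, and how you would get the same elements without NumPy. The variable names are illustrative.

```python
import numpy as np

values = [0.0, 2.0, 3.0, 4.0, 5.0]
arr = np.array(values)

# Indexing an array with a list of positions returns those elements.
picked = arr[[0, 2]]
print(picked)

# The same expression on a plain list raises a TypeError.
try:
    values[[0, 2]]
    list_supports_it = True
except TypeError:
    list_supports_it = False
print(list_supports_it)  # False

# With a list, a comprehension is needed instead.
picked_from_list = [values[i] for i in (0, 2)]
print(picked_from_list)  # [0.0, 3.0]
```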
Negative indices work the same as with lists; they count indices backward. For example, to get elements at the end of the array you can use:
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[-1])  # 5.0
print(a[-2])  # 4.0
Slicing works as well, and it uses the same notation as the regular slicing of lists, i.e., the format is a[start: stop: step]. As an example, let’s get the first three elements of an array:
a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0: 3])  # [0., 2., 3.]
Or all the elements except for the last one:
print(a[0: -1]) # [0., 2., 3., 4.]
And like lists, we can also reverse the array this way:
print(a[:: -1]) # [5., 4., 3., 2., 0.]
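There is one difference from lists that is worth knowing about: a slice of a list is a new list, whereas a basic slice of a NumPy array is a view on the same data. The sketch below illustrates this; the variable names are illustrative.

```python
import numpy as np

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
lst = [0.0, 2.0, 3.0, 4.0, 5.0]

view = a[0:3]          # a view on the first three elements of `a`
list_slice = lst[0:3]  # an independent copy of the first three items

view[0] = 99.0
list_slice[0] = 99.0

print(a[0])    # 99.0, the array changed through the view
print(lst[0])  # 0.0, the list is untouched

# Use copy() when the slice should be independent of the original.
independent = a[0:3].copy()
independent[1] = -1.0
print(a[1])  # 2.0
```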
Append, insert, delete, and sort
NumPy arrays have more in common with lists. Many of the regular operations behave similarly to Python lists, like sorting, deleting, inserting, and appending data. Note that these methods all return a new array instead of modifying the given array.
Append to NumPy array
To append means to add elements to the end. We can append single elements to a NumPy array just like we do with lists:
a = np.array([1.0, 2.0])
a = np.append(a, 3.0)
print(a)  # [1., 2., 3.]
We’re used to using the extend method to append multiple elements to a list. However, NumPy reuses the same append function to add multiple elements. Remember to assign the result, since np.append returns a new array:
a = np.array([1.0, 2.0])
a = np.append(a, [4.0, 5.0])
print(a)  # [1., 2., 4., 5.]
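Because np.append returns a new array, the original stays untouched and every call copies all existing elements. The sketch below shows both points; the pattern of collecting values in a list first is a common suggestion rather than something prescribed by this article, and the names are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.append(a, [4.0, 5.0])

print(a)       # the original array still has two elements
print(b)       # the new array has four elements
print(b is a)  # False

# When adding many values in a loop, collecting them in a list and
# converting once avoids copying the array on every iteration.
collected = []
for i in range(3):
    collected.append(float(i))
result = np.array(collected)
print(result)
```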
Insert into NumPy array
We can insert one or more elements at specific index locations using insert:
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Insert one element at position 3
b = np.insert(a, 3, values=3.5)
# b is now [1., 2., 3., 3.5, 4., 5.]
# Insert a list of elements at position 3
c = np.insert(a, 3, values=[100, 200])
# c is now [1., 2., 3., 100., 200., 4., 5.]
# Insert multiple elements at multiple positions
d = np.insert(a, [3, 5], values=[4.5, 5.5])
# d is now [1., 2., 3., 4.5, 4., 5., 5.5]
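Two details of np.insert are easy to get wrong: values are placed before the element currently at the given index, and when several indices are given, each index refers to a position in the original array. A minimal sketch with illustrative variable names:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Values are placed *before* the element currently at index 3 (the 4.0).
b = np.insert(a, 3, [100.0, 200.0])
print(b.tolist())  # [1.0, 2.0, 3.0, 100.0, 200.0, 4.0, 5.0]

# With several indices, each one refers to a position in the original
# array: 4.5 goes before a[3], and index 5 (the length of `a`) means
# "at the end".
c = np.insert(a, [3, 5], [4.5, 5.5])
print(c.tolist())  # [1.0, 2.0, 3.0, 4.5, 4.0, 5.0, 5.5]

# The original array is not modified.
print(a.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```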
Delete elements from NumPy array
We can delete one or more elements at once as well:
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a = np.delete(a, -1)
# a is now [1., 2., 3., 4.]
a = np.delete(a, [0, 1])
# a is now [3., 4.]
Sorting NumPy array
There are two ways to sort a NumPy array: in-place sort and creating a new, sorted array. To start with that last one:
a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
b = np.sort(a)
# b is now [1., 2., 3., 4., 5.]
To do an in-place sort, do as follows:
a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
a.sort()
# a is now [1., 2., 3., 4., 5.]
To reiterate: notice that most of these operations are not methods of the array class; sort is the exception. Hence, we have to call functions on the np module that accept the array as an argument. These functions do not modify the array in place but return a new array.
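The difference between the two styles is easy to verify. In the sketch below, np.sort leaves the input alone, while the sort method changes the array and returns None. Sorting in descending order by reversing the sorted result is one simple option, shown here as an illustration; the variable names are illustrative.

```python
import numpy as np

a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

# np.sort returns a new, sorted array; `a` keeps its original order.
sorted_copy = np.sort(a)
print(a.tolist())            # [1.0, 3.0, 2.0, 4.0, 5.0]
print(sorted_copy.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]

# The sort method works in place and returns None.
b = a.copy()
returned = b.sort()
print(returned)    # None
print(b.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]

# Reversing the sorted result gives descending order.
descending = np.sort(a)[::-1]
print(descending.tolist())  # [5.0, 4.0, 3.0, 2.0, 1.0]
```

A common mistake is writing a = a.sort(), which replaces the array with None.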
In the NumPy course (coming soon!), we will go through more functions and array methods that enable us to do much more with arrays.
Mathematical array operations
We’ll conclude this article with the most common mathematical operations that one might want to perform with arrays: addition, subtraction, multiplication, and division.
Arrays behave much like scalars: operations are carried out element-wise. Hence, a one-dimensional array can in general only be added to, subtracted from, multiplied by, or divided by a scalar or another array of the same size.
Let’s define some arrays first. Notice that a and b have the same size of 4, while b_wrong_size has a different size of 3 elements:
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])
If we try to operate with arrays of different sizes, a ValueError exception will be raised:
a = np.array([1.0, 2.0, 3.0, 4.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])
# raises ValueError exception
a + b_wrong_size
# ValueError: operands could not be broadcast together with shapes (4,) (3,)
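In a script, you may want to handle this error or check the sizes up front. The sketch below shows both options; the helper function add_same_size is hypothetical and only serves as an illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])

# Option 1: catch the exception NumPy raises.
message = None
try:
    a + b_wrong_size
except ValueError as exc:
    message = str(exc)
print(message)


# Option 2: compare the shapes before operating (hypothetical helper).
def add_same_size(x, y):
    if x.shape != y.shape:
        raise ValueError(f"size mismatch: {x.shape} vs {y.shape}")
    return x + y


print(add_same_size(a, b))
```

The shape attribute is a tuple, which is why the sizes appear as (4,) and (3,) in the error message.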
Addition and subtraction
We can add arrays together or add a single value to each element of the array:
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
print(a + b)  # [3., 4., 5., 6.]
print(a + 2)  # [3., 4., 5., 6.]
print(a - b)  # [-1., 0., 1., 2.]
print(a - 2)  # [-1., 0., 1., 2.]
Multiplication and division
The same is true for multiplication and division: we can either use a single value or two arrays:
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
print(a * b)  # [2., 4., 6., 8.]
print(a * 2)  # [2., 4., 6., 8.]
print(a / b)  # [0.5, 1. , 1.5, 2. ]
print(a / 2)  # [0.5, 1. , 1.5, 2. ]
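These operations can be combined into full formulas that apply to every element at once. As a small illustration (the data and names below are made up for this example), here is a temperature conversion using scalars and a weighted average using two arrays:

```python
import numpy as np

# Scalars combine with every element: convert Celsius to Fahrenheit.
celsius = np.array([0.0, 25.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32
print(fahrenheit)  # 32, 77 and 212 degrees

# Two arrays of the same size combine element by element:
# multiply each score by its weight, then add up the products.
scores = np.array([8.0, 6.0, 9.0])
weights = np.array([0.5, 0.25, 0.25])
weighted_average = float(np.sum(scores * weights))
print(weighted_average)  # 7.75
```

No explicit loop is needed in either case, which keeps the code short and lets NumPy do the work in compiled code.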
We’ve looked at one-dimensional array creation, accessing elements, array manipulation, and the most important mathematical operations on arrays. There’s a lot more to learn about NumPy. Python Land will soon release a full course on NumPy that covers everything you’d want to know. Until then, I recommend the following resources to learn more: