NumPy: The Foundation of Python Data Science

Python was conceived with one goal in mind: simplicity. This simplicity has made Python one of the most popular languages today. However, Python developers sacrifice performance in exchange for that ease of use, and the sacrifice hits numerical and scientific computing especially hard. Luckily, NumPy is there to save the day!

This article is an introduction to NumPy. After reading it, you’ll know how to install and import NumPy and how to process numeric data with one-dimensional NumPy arrays. There will soon be a full course on NumPy right here on Python Land, covering many more topics and multi-dimensional arrays. If you want to be notified of its release, please subscribe to the low-volume Python Land newsletter.

Why NumPy?

Scientists have been interested in Python since the early days, thanks to its ease of use and extensibility. NumPy emerged from the efforts of the scientific Python community to tackle numerical computing weaknesses in Python. The main issues that needed tackling were:

  • Efficient array creation and manipulation
  • Mathematical functions and algorithms to operate on these arrays

NumPy was created in 2005 by merging two numerical packages available at the time: Numeric and Numarray. Since Python is not optimized for speed, most of NumPy’s heavy-lifting code is written in C, with some Fortran code around the edges. It’s what makes NumPy blazing fast!

Since its creation, NumPy has become one of the most reputable third-party modules available for Python. There have even been fierce debates about whether to include it as one of Python’s standard modules. Only the very best of projects earn such honor.

Fast forward to the present, where we’ve seen an explosion of data science and machine learning jobs leveraging Python. NumPy is a cornerstone package that offers a solid foundation required by many other projects. Many data science and machine learning packages use NumPy under the hood, notably:

  • Pandas
  • scikit-learn
  • TensorFlow

Therefore, mastering NumPy is of paramount importance.

Installing NumPy

NumPy is not part of the default Python distribution, so you’ll need to install it with pip install (or Poetry / Pipenv):

pip install numpy

If you’re a Conda user, you can use:

conda install numpy

Importing NumPy

As with any package, you can import either parts of NumPy or the entire package. The convention is to import the entire package under the alias np, and it’s strongly recommended to follow it. The simple reason is that most NumPy users work with the package interactively, so the short alias saves typing:

import numpy as np

Throughout this article, we’ll abide by the convention and assume you have NumPy imported as np.
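A quick way to confirm that the installation and the import both worked is to print the installed version. This is only a minimal sketch; the version string you see depends on your environment, and the variable names are chosen for illustration.

```python
import numpy as np

# The version string depends on what pip or Conda installed
version = np.__version__
print(version)

# The major version as an integer, handy when behavior differs between 1.x and 2.x
major = int(version.split(".")[0])
print(major)
```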

NumPy arrays

At the core of NumPy are arrays. Let’s formally define what an array is:

array
An array is a data structure whose elements all have the same type and can be accessed by one or more indices. Hence, a programming language implementation of a vector, a matrix, or a tensor is an array.

Given this broad definition of arrays, Python has two built-in types that resemble an array: the list and the tuple. We could use a list of lists or a dictionary of lists to store multiple lists, effectively creating multidimensional arrays. However, neither the list, the tuple, nor the dictionary is optimized for numerical purposes. They don’t even adhere to our definition that well since they can store values of different types. E.g., a list can contain a mixed collection of numbers, strings, and any other object type.

NumPy solves many of the Python shortcomings regarding numerical computation through arrays. Array creation and manipulation in particular are blazing fast and well optimized.
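To get a feel for the difference, the following sketch doubles 100,000 numbers once with a list comprehension and once with a NumPy array. The size, the repeat count, and the function names are arbitrary choices for illustration, and the measured times will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n)


def double_list():
    # Plain Python: one multiplication per loop iteration
    return [v * 2 for v in values]


def double_array():
    # NumPy: a single vectorized operation on the whole array
    return arr * 2


list_time = timeit.timeit(double_list, number=20)
array_time = timeit.timeit(double_array, number=20)

print(f"list comprehension: {list_time:.4f}s")
print(f"NumPy array:        {array_time:.4f}s")
```

Both functions produce the same numbers; only the time they need differs.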

Creating a 1-dimensional array

The easiest way to create an array is to pass a list to NumPy’s main utility to create arrays, np.array:

a = np.array([1, 2, 3])

The function accepts several optional keyword arguments, and we will discuss two of them: copy and dtype.

The copy argument

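The copy argument states whether to make a copy of the input object. When copy is True (the default), changes to the resulting array never affect the input object. When it is False, NumPy avoids copying, so changes to the array can change the input object.

A list can never be reused without copying, because NumPy has to convert it to its own memory layout. NumPy 1.x therefore copies the list regardless of the argument’s value. NumPy 2.0 and later are stricter: copy=False raises a ValueError when a copy cannot be avoided, and copy=None requests the old “copy only if needed” behavior. For example:

lst = [1, 2, 3]
# NumPy 1.x shown; on NumPy 2.0 and later, pass copy=None instead
a = np.array(lst, copy=False)
print(a)
# [1 2 3]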

If we change the array, the list will stay the same since NumPy copied it:

a[0] = 0
print(lst)
# [1, 2, 3]

If we create the same array but with another NumPy array as input:

a_in = np.array([1, 2, 3])
a = np.array(a_in, copy=False)
print(a)
# [1 2 3]

Let’s see what happens if we change the resulting array:

a[0] = 0
print(a)
# [0 2 3]
print(a_in)
# [0 2 3]

Both arrays changed because we set the copy option to False.
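If you are unsure whether two arrays share the same data, np.shares_memory can tell you. The sketch below repeats the example above and adds an explicit copy for comparison; the variable names shared and independent are made up for illustration.

```python
import numpy as np

a_in = np.array([1, 2, 3])

shared = np.array(a_in, copy=False)      # reuses the memory of a_in
independent = np.array(a_in, copy=True)  # allocates new memory

print(np.shares_memory(shared, a_in))
print(np.shares_memory(independent, a_in))

# Changing the independent copy leaves the input untouched
independent[0] = 0
print(a_in)
```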

NumPy data types (dtypes)

Another keyword argument of the function np.array is dtype. This argument specifies the data type in the array. Remember, one of the key properties of an array is that all elements have the same type.

NumPy implements its own data types that are optimized for efficient storage and processing. For this, it uses the base class called dtype. Let’s take a look at the most common dtypes:

  • np.int16
  • np.int32
  • np.int64
  • np.float16
  • np.float32
  • np.float64
  • np.float128 (not available on every platform)
  • np.bool_
  • np.str_
  • np.bytes_
  • np.object_

In this article, we’ll focus on numeric types only.
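As a small illustration of the dtype argument, the sketch below creates arrays with an inferred and an explicitly requested dtype and inspects the result through the dtype attribute. The variable names are arbitrary, and the inferred integer type can differ between platforms, so no fixed output is shown.

```python
import numpy as np

ints = np.array([1, 2, 3])                      # dtype inferred from the values
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly
mixed = np.array([1, 2.5, 3])                   # integers are upcast to float

print(ints.dtype)
print(floats.dtype)
print(mixed.dtype)

# All elements share one type, so the integers are stored as floats here
print(floats)
```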

Integers

The integer dtypes np.int16, np.int32, and np.int64 differ only in the range of numbers they can store:

  • np.int16 -> -32,768 to 32,767
  • np.int32 -> -2,147,483,648 to 2,147,483,647
  • np.int64 -> -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Under normal circumstances, using np.int64 is the way to go since it allows us to store the largest numbers. It is also the integer dtype NumPy uses by default on most 64-bit platforms. There are benefits, however, to using smaller integers:

  • Reduced memory usage
  • Faster computations

More often than not, memory usage won’t be an issue for relatively small arrays. If you think it will be, try the smaller types but ensure that all elements and the results of future operations on those elements will not exceed the maximum size of the chosen type.
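The limits and the memory savings can be checked directly. The following sketch uses np.iinfo to look up the range of each type, compares the memory used by 1,000 elements, and shows what happens when an array operation exceeds the maximum: the result silently wraps around. The array sizes and names are chosen for illustration only.

```python
import numpy as np

# Range of each integer dtype, as reported by NumPy itself
for dtype in (np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# Memory used by 1,000 elements of each size, in bytes
small = np.zeros(1000, dtype=np.int16)
large = np.zeros(1000, dtype=np.int64)
print(small.nbytes, large.nbytes)

# Overflow in array arithmetic wraps around without raising an error
a = np.array([32767], dtype=np.int16)
b = np.array([1], dtype=np.int16)
wrapped = a + b
print(wrapped)
```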

Floats

The float types also refer to the size of the number in memory. The larger the size, the more precision our array’s elements will have. However, this precision comes at the expense of memory and performance. The rule of thumb is to use np.float64 by default. If you can spare some precision and the performance and memory usage are paramount, use something smaller.

Let’s explore how the float size affects precision in the REPL:

>>> np.array([1.3738729019013636723763], dtype=np.float16)[0]
1.374
>>> np.array([1.3738729019013636723763], dtype=np.float32)[0]
1.3738729
>>> np.array([1.3738729019013636723763], dtype=np.float64)[0]
1.3738729019013636
>>> np.array([1.3738729019013636723763], dtype=np.float128)[0]
1.3738729019013635746

On some systems, there’s even a float128 type, as can be seen in the last example. Its availability depends on the platform: it will probably give an error on Windows, Linux on x86-64 typically supports it, and on macOS it depends on the hardware.
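Instead of guessing, you can ask NumPy which float types exist on your system and how precise they are. This sketch uses np.finfo and skips any type that is missing; the precision dictionary is just a helper for this example.

```python
import numpy as np

precision = {}
for name in ("float16", "float32", "float64", "float128"):
    # float128 does not exist on every platform, so look it up defensively
    dtype = getattr(np, name, None)
    if dtype is None:
        print(f"{name}: not available on this platform")
        continue
    info = np.finfo(dtype)
    precision[name] = info.precision
    print(f"{name}: {info.bits} bits, about {info.precision} decimal digits")
```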

Using NumPy arrays

We’ll now look closely at how to use NumPy arrays, starting with accessing elements using array indexing.

Getting a single element

We can access and modify single elements:

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0])
# 0.0

a[0] = 1.0
print(a)
# [1., 2., 3., 4., 5.]

Accessing multiple elements

We can access and modify multiple specific elements in a NumPy array at once. Note that Python lists do not have this feature:

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])

# Get elements at position 0 and 2
print(a[[0, 2]])
# [0., 3.]

# Change the first two elements
a[[0, 1]] = [0, 3.0]
print(a)
# [0., 3., 3., 4., 5.]
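To see the difference with lists, the sketch below tries the same index list on a plain Python list, which raises a TypeError, and then shows the list-comprehension workaround next to the NumPy version. The variable names are illustrative.

```python
import numpy as np

values = [0.0, 2.0, 3.0, 4.0, 5.0]
a = np.array(values)

# A plain list rejects a list of positions
try:
    values[[0, 2]]
    list_supports_it = True
except TypeError:
    list_supports_it = False
print(list_supports_it)

# With a list, each position has to be picked individually
picked_from_list = [values[i] for i in (0, 2)]

# The array accepts the list of positions directly
picked_from_array = a[[0, 2]]

print(picked_from_list)
print(picked_from_array)
```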

Negative indexing

Negative indices work the same as with lists; they count backward from the end. For example, to get elements at the end of the array you can use:

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[-1])
# 5.0

print(a[-2])
# 4.0

Slicing

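Slicing works as well, and it uses the same syntax as regular list slicing: a[start:stop:step]. One difference is that slicing a NumPy array returns a view on the original data rather than a copy, so changes to the slice also show up in the original array. As an example, let’s get the first three elements of an array: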

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
print(a[0:3])
# [0., 2., 3.]

Or all the elements except for the last one:

print(a[0:-1])
# [0., 2., 3., 4.]

And like lists, we can also reverse the array this way:

print(a[::-1])
# [5., 4., 3., 2., 0.]
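One thing to keep in mind is that a slice of a NumPy array is a view on the same data, while a slice of a list is a new list. The sketch below modifies both kinds of slices and checks the originals; the names array_slice, list_slice, and safe are made up for this example.

```python
import numpy as np

a = np.array([0.0, 2.0, 3.0, 4.0, 5.0])
lst = [0.0, 2.0, 3.0, 4.0, 5.0]

array_slice = a[0:3]   # a view on the same memory as a
list_slice = lst[0:3]  # a new, independent list

array_slice[0] = 99.0
list_slice[0] = 99.0

print(a[0])    # the original array changed
print(lst[0])  # the original list did not

# An explicit copy keeps the original array untouched
safe = a[0:3].copy()
safe[1] = -1.0
print(a[1])
```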

Append, insert, delete, and sort

NumPy arrays have more in common with lists. Many of the regular operations, like sorting, deleting, inserting, and appending data, behave similarly to their Python list counterparts. Note that, apart from the in-place sort method, these functions all return a new array instead of modifying the given array.

Append to NumPy array

To append means to add elements to the end. We can append single elements to a NumPy array just like we do with lists:

a = np.array([1.0, 2.0])
a = np.append(a, 3.0)
print(a)
# [1., 2., 3.]

We’re used to using the extend method to append multiple elements to a list. However, NumPy arrays reuse the same append function to add multiple elements:

a = np.array([1.0, 2.0])
a = np.append(a, [4.0, 5.0])
print(a)
# [1., 2., 4., 5.]
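Because np.append returns a new array, forgetting to assign the result is an easy mistake to make. The sketch below shows that the original array stays the same, and it uses np.concatenate as one way to join several arrays in a single call. The names result, parts, and joined are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0])

# The result has to be captured; a itself is not modified
result = np.append(a, [4.0, 5.0])
print(a)       # still two elements
print(result)  # four elements

# np.concatenate joins several arrays in one call
parts = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0])]
joined = np.concatenate(parts)
print(joined)
```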

Insert into NumPy array

We can insert one or more elements at specific index locations using insert:

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# np.insert returns a new array; a itself stays unchanged

# Insert one element at position 3
b = np.insert(a, 3, values=3.5)
# b is [1. , 2. , 3. , 3.5, 4. , 5. ]

# Insert a list of elements at position 3
b = np.insert(a, 3, values=[100, 200])
# b is [1., 2., 3., 100., 200., 4., 5.]

# Insert multiple elements at multiple positions
b = np.insert(a, [3, 5], values=[4.5, 5.5])
# b is [1. , 2. , 3. , 4.5, 4. , 5. , 5.5]

Delete elements from NumPy array

We can delete one or more elements at once as well:

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# np.delete returns a new array, so assign the result
a = np.delete(a, -1)
# a is now [1., 2., 3., 4.]

a = np.delete(a, [0, 1])
# a is now [3., 4.]

Sorting NumPy array

There are two ways to sort a NumPy array: in-place sort and creating a new, sorted array. To start with that last one:

a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
b = np.sort(a)
# b is now [1., 2., 3., 4., 5.]

To do an in-place sort, do as follows:

a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
a.sort()
# a is now [1., 2., 3., 4., 5.]

To reiterate: of the operations above, only sort is available as an in-place method on the array itself. append, insert, and delete are functions in the np module that take the array as an argument, so they do not modify the array in place but return a new one.
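The difference between the two sorting styles is easy to verify. In this sketch, np.sort leaves the original untouched, while the sort method changes the array and returns None. The descending variant at the end simply reverses a sorted array with a slice; it is one possible approach, not the only one.

```python
import numpy as np

a = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

b = np.sort(a)       # returns a new, sorted array
print(a)             # the original order is kept
print(b)

returned = a.sort()  # sorts a in place
print(returned)      # None, because nothing new is created
print(a)

# Descending order: sort first, then reverse with a slice
descending = np.sort(a)[::-1]
print(descending)
```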

In the NumPy course (coming soon!), we will go through more functions and array methods that enable us to do much more with arrays.

Mathematical array operations

We’ll conclude this article with the most common mathematical operations that one might want to perform with arrays: addition, subtraction, multiplication, and division.

Arithmetic on arrays is carried out element-wise. In the one-dimensional case, this means an array can be added to, subtracted from, multiplied by, or divided by a scalar or another array of the same size (NumPy’s broadcasting rules also accept an array with a single element).

Let’s define some arrays first. Notice that a and b have the same size of 4, while b_wrong_size has a different size of 3 elements:

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])

If we try to operate with arrays of different sizes, a ValueError exception will be raised:

a = np.array([1.0, 2.0, 3.0, 4.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])

# raises ValueError exception
a + b_wrong_size

ValueError: operands could not be broadcast together with shapes (4,) (3,)
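In a script, you may prefer to handle the size mismatch instead of letting the program stop. The sketch below catches the ValueError, compares the shapes up front, and shows that an array with a single element is accepted. The names message and single are chosen for this example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b_wrong_size = np.array([2.0, 2.0, 2.0])

# Catch the error instead of letting the program stop
try:
    a + b_wrong_size
    message = None
except ValueError as error:
    message = str(error)
print(message)

# Comparing the shapes beforehand avoids the exception altogether
print(a.shape == b_wrong_size.shape)

# An array with a single element is stretched to fit, just like a scalar
single = np.array([2.0])
print(a + single)
```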

Addition and subtraction

We can add arrays together or add a single value to each element of the array:

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])

print(a + b)
# [3., 4., 5., 6.]
print(a + 2)
# [3., 4., 5., 6.]

print(a - b)
# [-1.,  0.,  1.,  2.]
print(a - 2)
# [-1.,  0.,  1.,  2.]

Multiplication and division

The same is true for multiplication and division: we can either use a single value or two arrays:

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 2.0, 2.0, 2.0])

print(a * b)
# [2., 4., 6., 8.]
print(a * 2)
# [2., 4., 6., 8.]

print(a / b)
# [0.5, 1. , 1.5, 2. ]
print(a / 2)
# [0.5, 1. , 1.5, 2. ]
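These operators mean something quite different for plain Python lists, which is a common source of confusion when switching between the two. The following sketch puts both side by side; the variable names are illustrative.

```python
import numpy as np

lst = [1.0, 2.0]
arr = np.array(lst)

list_times_two = lst * 2      # repeats the list
array_times_two = arr * 2     # doubles every element

list_plus_list = lst + lst    # concatenates the lists
array_plus_array = arr + arr  # adds element-wise

print(list_times_two)
print(array_times_two)
print(list_plus_list)
print(array_plus_array)
```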

Conclusion

We’ve looked at one-dimensional array creation, accessing elements, array manipulation, and the most important mathematical operations on arrays. There’s a lot more to learn about NumPy. Python Land will soon release a full course on NumPy that covers everything you’d want to know. Until then, I recommend the following resources to learn more:
