A Python set is a collection of distinct elements. The set has some things in common with lists and tuples, but there are important differences:

- A Python set can only contain **unique values**
- Sets are **unordered**

More formally: sets are *unordered* collections of *distinct* objects. In this article, we’ll take a close look at sets and how to use them. I’ll focus more on the extras that a set has to offer than on the basics that are the same for lists and other sequence types.


## How to create a Python set

There are a couple of ways to create a Python set, depending on the situation. To create a set from scratch and directly add some elements to it, you can use curly braces:

    names = { "Eric", "Ali", "John" }

    # Mixed types are allowed
    mixed_set = { 'a', 1, (1, 2) }

Sets use the same curly braces as Python dictionaries, but they are easy to distinguish because a set always contains a sequence of elements, separated by commas. In contrast, dictionaries contain key-value pairs that are specified with colons.

To create an empty set, you can use the `set()` function (empty curly braces, `{}`, create an empty dictionary instead):

    my_set = set()
    my_set.add(1)
    my_set.add('Erik')
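
The difference between `{}` and `set()` is easy to verify for yourself. The following is a minimal sketch; the variable names are chosen only for illustration:

```python
empty_braces = {}      # this is a dict, not a set
empty_set = set()      # this is an empty set

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Adding the same value twice leaves only one copy in the set
empty_set.add(1)
empty_set.add(1)
print(len(empty_set))  # 1
```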

As you can see, you can add elements to a set using the `add` method on a set object. If you want to add multiple elements at once, you need to use the `update` method and provide an iterable object like a list, range, or tuple:

    my_set = set()
    my_set.update(range(3))
    my_set.update(['a', 'b'])
    print(my_set)
    # {0, 1, 2, 'b', 'a'}

You can also use the `set()` function to convert any iterable object into a set:

    print( set([1, 2, 3]) )
    # {1, 2, 3}

    print( set(range(3)) )
    # {0, 1, 2}

Finally, you can use a set comprehension to create sets. Set comprehensions work exactly like list comprehensions, so I suggest reading the linked article if the concept is new to you.

Here’s an example for the sake of demonstration. Remember that strings are sequence objects too, so they are iterable. Since we can filter a set comprehension, let’s filter punctuation and space characters as well:

    my_set = { x for x in 'Hi, my name is...' if x not in '., ' }
    print(my_set)
    # {'n', 'a', 'e', 'i', 's', 'y', 'H', 'm'}

And with this example, it becomes abundantly clear that sets have no order!
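
If you need a predictable order, for display or comparison, one option is the built-in `sorted()` function, which returns a new list rather than a set. A small sketch, reusing the comprehension from above:

```python
my_set = { x for x in 'Hi, my name is...' if x not in '., ' }

# sorted() returns a list; uppercase letters sort before lowercase ones
ordered = sorted(my_set)
print(ordered)  # ['H', 'a', 'e', 'i', 'm', 'n', 's', 'y']
```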

## Sets and lists

Before we continue, I want to share two commonly used tricks that many people search for.

### Deduplicate a list

Sets only contain unique elements, and we can create a set by giving the `set()` function a list. Those are all the ingredients you need to deduplicate a list. Deduplication is the process of removing duplicates, and converting a list to a set is by far the easiest way to do this in Python:

    my_list = [1, 1, 1, 2, 3, 4, 4, 4, 4, 2, 2, 2]
    my_set = set(my_list)
    print(my_set)
    # {1, 2, 3, 4}
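
Keep in mind that a set does not remember the original order of the list. If the order matters, a commonly used alternative is `dict.fromkeys()`. Note that this is a dictionary trick rather than a set operation; it relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 on. A sketch:

```python
my_list = [3, 1, 3, 2, 1, 4, 2]

# Dictionary keys are unique and keep their insertion order (Python 3.7+)
deduplicated = list(dict.fromkeys(my_list))
print(deduplicated)  # [3, 1, 2, 4]
```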

### Convert set to list

To convert a set to a list, simply create a new list with the set as the argument:

    A = { 1, 2, 3 }
    my_list = list(A)
    print(my_list)
    # [1, 2, 3]

## Why would you need sets?

People use sets for a number of reasons.

- The most common one is to remove duplicates from a sequence (like lists, as demonstrated before)
- Many use them to perform membership testing (is an element present in this set of unique elements)
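
Membership testing is done with the `in` operator. The sketch below uses made-up usernames purely for illustration. A set finds an element by its hash instead of scanning every element, which is why sets tend to be a good fit for this kind of check:

```python
allowed_users = {"eric", "ali", "john"}  # hypothetical example data

print("ali" in allowed_users)          # True
print("mallory" in allowed_users)      # False
print("mallory" not in allowed_users)  # True
```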

But there are more reasons to use sets. If you’re not familiar with set theory, you might want to read up on it on Wikipedia. I’ll do my best to explain the basics too, though. In short: sets can be used to perform mathematical set operations like:

- Finding the **difference** between two sets
- **Union**: combining sets, and only keeping unique elements
- **Intersection**: which elements are present in both sets
- Finding **subsets** and **supersets**

These operations can be visualized with a Venn diagram. Venn diagrams show the logical relation between sets. Chances are you’ve seen the following before in your life:

In the examples that follow, I’ll refer back to this image and I’ll use the names A and B. All the examples here use two sets, but it’s important to know that they work just as well with more than two sets.

## Mathematical Python set operations

This section will demonstrate and explain all the mathematical set operations. Don’t let the math part scare you, it’s not that hard!

### Finding the difference between Python sets

Let’s define two sets, A and B, and find the difference between them. What is the difference between two sets? When looking at the Venn diagram, we want to find the elements that are only present in A. In other words, we want to get rid of any overlapping elements that are also in B. Or, even more specific: we want all elements that are in A but not in A ∩ B.

We can do so by using the minus (subtraction) operator:

    A = { 1, 2, 3, 4, 5 }
    B = { 3, 4, 5, 6, 7 }

    print(A-B)
    # {1, 2}

    # And the reverse
    print(B-A)
    # {6, 7}

A and B have some overlap: the numbers 3, 4, and 5 are in both sets. These numbers fall into the section that is labeled with A ∩ B when looking at the Venn diagram. If we want only the unique numbers that are in A, we ‘subtract’ B from A by using `A - B`. When we only want the unique set of numbers from B, we subtract A from B: `B - A`.

### Find the symmetric difference between Python sets

The symmetric difference between two sets consists of the elements that are either in set A or in set B, but not in both. In other words: all the elements in A plus the elements from B, minus the A ∩ B part. To find the symmetric difference, we can use the ^ operator:

    A = { 1, 2, 3, 4, 5 }
    B = { 3, 4, 5, 6, 7 }

    print(A^B)
    # {1, 2, 6, 7}
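
The description above ("everything in A plus B, minus the overlap") can be checked directly. This sketch uses the `|` (union) and `&` (intersection) operators, which are covered later in this article:

```python
A = { 1, 2, 3, 4, 5 }
B = { 3, 4, 5, 6, 7 }

# Everything in A or B, minus the overlap
manual = (A | B) - (A & B)
print(manual == (A ^ B))  # True

# Equivalently: the two one-sided differences combined
both_sides = (A - B) | (B - A)
print(both_sides == (A ^ B))  # True
```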

### Find the intersection of two Python sets

The intersection is the part of the Venn diagram labeled with A ∩ B. The intersection consists of elements present in both sets. To find the intersection, we can use the & operator:

    A = { 1, 2, 3, 4, 5 }
    B = { 3, 4, 5, 6, 7 }

    print(A & B)
    # {3, 4, 5}

### Subsets and supersets

If A is a subset of B, it means that all elements of A are also present in B. However, subset A can be smaller than set B, meaning some elements in B might not be present in A. So if A overlaps almost completely, but has one element that’s not present in B, it’s not a subset of B. We can check if A is a subset of B with the *less than* operator: `<`. Strictly speaking, `<` tests for a *proper* subset, a distinction covered in the next section.

If B is a superset of A, it means that B has all the elements of A but it may also have extra elements. We can check if B is a superset of A with the *greater than* operator: `>`.

Let’s look at some examples:

    A = { 1, 2, 3 }
    B = { 1, 2, 3, 4, 5 }
    C = { 1, 2, 3, 10 }

    # Is A a subset of B?
    print(A < B)
    # True

    # Is C a subset of B?
    print(C < B)
    # False
    # No, it has a 10 that's not in B

    # Is B a superset of A?
    print(B > A)
    # True

    # B is not a superset of C since C has a 10 in it
    print(B > C)
    # False

    print(A < A)
    # False
    print(A <= A)
    # True
    print(A >= A)
    # True

### Proper subsets and supersets

Python differentiates between a *subset* (in math we write ⊂), which is also called a proper subset, and a *subset or equal to* (in math we write ⊆).

The `<` and `>` operators do the former: they check for a proper subset or superset. If `A < B`, it means A is a subset of B and A is not equal to B (`A != B`). You can check this for yourself with `A < A`, which returns `False`. If you want to check for *subset or equal to*, you can use `<=`. The same holds for supersets: use `>=`.
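
To make the relationship between the two checks concrete, here is a small sketch. The helper `is_proper_subset` is hypothetical and only exists for this illustration; for sets it gives the same answer as the `<` operator:

```python
A = { 1, 2, 3 }
B = { 1, 2, 3, 4, 5 }

def is_proper_subset(x, y):
    """Hypothetical helper: same result as x < y for sets."""
    return x <= y and x != y

print(is_proper_subset(A, B) == (A < B))  # True
print(is_proper_subset(A, A) == (A < A))  # True (both sides are False)
```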

### Union

Finally, we can add two Python sets together and only keep elements that are unique. This is called a union of sets. In mathematics, the notation for a union is A ∪ B, but in Python, we must use the pipe operator (the `|`) to create unions:

    A = { 1, 2, 3 }
    B = { 3, 4, 5 }

    print(A|B)
    # {1, 2, 3, 4, 5}

All elements from A and B are present in the newly created set, but because sets only contain unique values, the overlapping element 3 is present only once.
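
As mentioned earlier, these operations also work with more than two sets. The sketch below adds a third set, C, purely as an example; the results are passed through `sorted()` so the printed order is predictable:

```python
A = { 1, 2, 3 }
B = { 3, 4, 5 }
C = { 3, 5, 7 }  # a third set, added for this example

print(sorted(A | B | C))  # [1, 2, 3, 4, 5, 7]
print(sorted(A & B & C))  # [3]
print(sorted(A - B - C))  # [1, 2]
```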

## Named set methods

Almost all the set operators have named method equivalents, e.g.:

- The `<=` and `>=` operators can also be used by calling the `issubset()` and `issuperset()` methods on a set object.
- The `|` operator can be replaced with a call to the `union()` method
- … and so forth

In the table at the end of this article, I’ll list all operations and their equivalent names.

The big difference between using the operators and using the named methods is that **the named methods take any iterable object as an argument**. So you can calculate the union of set A and list L with `A.union(L)`. This saves us an explicit conversion from a Python list to a Python set and avoids building a temporary set, which makes it more efficient. The operators, in contrast, require a set (or frozenset) on both sides and raise a `TypeError` otherwise.
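
A short sketch of this difference, using a set A and a list L as in the text above:

```python
A = { 1, 2, 3 }
L = [3, 4, 5]

# The named method accepts any iterable
print(sorted(A.union(L)))  # [1, 2, 3, 4, 5]

# The operator requires a set on both sides
try:
    A | L
except TypeError as error:
    print("TypeError:", error)

# Converting explicitly makes the operator work again
print(sorted(A | set(L)))  # [1, 2, 3, 4, 5]
```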

## Python frozenset

Besides the regular, mutable `set`, there’s also the `frozenset`. You can guess how this datatype differs from a regular set: it’s frozen directly after creation, so you can’t add or remove elements from it. However, you can mix the `set` and `frozenset` types: all the regular operations like union and intersection work on combinations of a `set` and `frozenset` too.

The advantage of a `frozenset` is that it’s hashable, meaning you can use it as a dictionary key, or even as an element in another `set`.
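
The sketch below illustrates both uses. The player names and scores are invented for the example:

```python
# Hypothetical example: map pairs of players to a score, ignoring order
scores = {
    frozenset({"Eric", "Ali"}): 3,
    frozenset({"Ali", "John"}): 1,
}
print(scores[frozenset({"Ali", "Eric"})])  # 3

# A frozenset can be an element of a regular set
groups = {frozenset({1, 2}), frozenset({3, 4})}
print(frozenset({2, 1}) in groups)  # True

# Mixing set and frozenset: the result takes the type of the left operand
mixed = frozenset({1, 2}) | {2, 3}
print(type(mixed).__name__)  # frozenset

# A regular set is not hashable, so it can't be nested in another set
try:
    {{1, 2}}
except TypeError as error:
    print("TypeError:", error)
```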

## All Python set operations

The following tables conveniently list all the mathematical set operations, the required operators, their named method equivalent, example code, and what it does:

Name | Operator example | Method example | What it does |
---|---|---|---|
Union | `A \| B` | `A.union(B)` | Create a set that combines A and B |
Intersection | `A & B` | `A.intersection(B)` | Create a set with elements common between A and B |
Difference | `A - B` | `A.difference(B)` | Create a set with elements of A that are not in B |
Symmetric difference | `A ^ B` | `A.symmetric_difference(B)` | Create a set with elements that are in A or B, but not in both |
Is superset? | `A >= B` | `A.issuperset(B)` | Returns `True` if every element of B is in A |
Is subset? | `A <= B` | `A.issubset(B)` | Returns `True` if every element of A is in B |
Is disjoint? | – | `A.isdisjoint(B)` | Returns `True` if A has no elements in common with B |
Is proper superset? | `A > B` | There’s no method | `True` if `A >= B` and `A != B` |
Is proper subset? | `A < B` | There’s no method | `True` if `A <= B` and `A != B` |

Note that you can use the above operations in an assignment style too, e.g.:

- A |= B will update set A, adding elements from set B
- A |= B | C | … will update set A, adding elements from set B, C, etcetera.

The above works for intersection (`&=`), difference (`-=`), and symmetric difference (`^=`) too.
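
A small sketch of the assignment-style operators in action. Each one modifies A in place instead of creating a new set:

```python
A = { 1, 2, 3, 4 }

A |= { 5, 6 }           # add the elements of the other set
print(sorted(A))        # [1, 2, 3, 4, 5, 6]

A &= { 2, 3, 4, 5, 9 }  # keep only the shared elements
print(sorted(A))        # [2, 3, 4, 5]

A -= { 5 }              # remove elements found in the other set
print(sorted(A))        # [2, 3, 4]

A ^= { 4, 7 }           # keep elements that are in exactly one of the two sets
print(sorted(A))        # [2, 3, 7]
```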

In the next table, I listed some extra operations that you can use to manipulate sets:

Name | Example | What it does |
---|---|---|
add | `A.add(elem)` | Add element to set |
remove | `A.remove(elem)` | Remove element from set (raises `KeyError` if the element is not present) |
discard | `A.discard(elem)` | Remove element from the set if it is present (saves you a presence check) |
pop | `A.pop()` | Remove and return an arbitrary element from the set (raises `KeyError` if the set is empty) |
clear | `A.clear()` | Remove all elements from the set |
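
The difference between `remove`, `discard`, and `pop` is easiest to see side by side. A minimal sketch:

```python
A = { 1, 2, 3 }

A.discard(99)        # 99 is not present: nothing happens
try:
    A.remove(99)     # remove() is strict about missing elements
except KeyError as error:
    print("KeyError:", error)  # KeyError: 99

element = A.pop()    # removes and returns an arbitrary element
print(element in A)  # False
print(len(A))        # 2

A.clear()
print(A)             # set()
```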

## Set indexing

If you were hoping to access set elements by using indexing (e.g. with `my_set[1]`) you’re out of luck. Sets have no order, and thus can’t be indexed. If you think you need set indexing, you might be thinking in the wrong direction. Look at other solutions, like using lists or converting your set to a list.
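
The following sketch shows what happens when you try, along with one possible workaround: converting the set to a sorted list first, so that the positions are predictable.

```python
my_set = { 'b', 'c', 'a' }

try:
    my_set[1]
except TypeError as error:
    print("TypeError:", error)

# Converting to a sorted list gives a stable order to index into
as_list = sorted(my_set)
print(as_list[1])  # b
```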

## Conclusion

We’ve looked at Python sets, and how they differ from sequence types like lists and tuples. Besides deduplication of sequences, sets can be used to perform set calculations. We reviewed all the set calculations and looked at example code to see how they work. Using the convenient tables, you can quickly review all the set operations.

Although I believe the article touched on everything you need to know, you might want to look at the official documentation on sets as well.