A Python set is a collection of distinct elements. The set has some things in common with Python lists and tuples, but there are important differences:

- A Python set can only contain **unique values**
- Sets are **unordered**

More formally: sets are *unordered* collections of *distinct* objects. In this article, we’ll closely examine sets and how to use them. I’ll focus more on the extras that a set has to offer than on the basics that are the same for lists and other sequence types.


## How to create a Python set

Depending on the situation, there are a couple of ways to create a Python set. To create a set from scratch and directly add some elements to it, you can use curly braces:

```python
names = { "Eric", "Ali", "John" }

# Mixed types are allowed, as long as every element is hashable
mixed_set = { 'a', 1, (1, 2) }
```

Sets use the same curly braces as Python dictionaries, but they are easy to distinguish because a set always contains a sequence of elements separated by commas. In contrast, dictionaries contain key-value pairs that are specified with colons.
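To make the distinction concrete, here's a small sketch comparing a set literal with a dictionary literal. The variable names `fruits` and `prices` are just for illustration:

```python
# A set literal: comma-separated elements
fruits = {'apple', 'banana'}

# A dict literal: key-value pairs separated by colons
prices = {'apple': 1.25, 'banana': 0.5}

print(type(fruits).__name__)  # set
print(type(prices).__name__)  # dict
```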

To create an empty set, use the `set()` function (an empty pair of curly braces, `{}`, creates an empty dictionary instead):

```python
my_set = set()
my_set.add(1)
my_set.add('Erik')
```

As you can see, you can add elements to a set using the `add` method on a set object. To add multiple elements at once, use the `update` method and pass it an iterable object like a list, range, or tuple:

```python
my_set = set()
my_set.update(range(3))
my_set.update(['a', 'b'])
print(my_set)
# {0, 1, 2, 'b', 'a'} (the order may differ on your machine)
```
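One detail that trips people up: `update` iterates over its argument, so passing a string adds its individual characters, whereas `add` stores the whole string as a single element. The sketch below shows both, plus the fact that `update` accepts several iterables in one call (the variable names are only illustrative):

```python
letters = set()
letters.add('hello')      # adds the whole string as one element
print(letters)            # {'hello'}

chars = set()
chars.update('hello')     # iterates over the string, adding each character once
print(sorted(chars))      # ['e', 'h', 'l', 'o']

# update() also accepts several iterables at once
numbers = set()
numbers.update([1, 2], (2, 3), range(4, 6))
print(sorted(numbers))    # [1, 2, 3, 4, 5]
```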

You can also use the `set()` function to convert any iterable object into a set:

```python
print(set([1, 2, 3]))  # {1, 2, 3}
print(set(range(3)))   # {0, 1, 2}
```

Finally, you can use a set comprehension to create sets. Set comprehensions work exactly like list comprehensions, so if the concept is new to you, I suggest reading up on list comprehensions first.

Here’s an example for the sake of demonstration. Remember that strings are sequence objects, too, so they are iterable. Since set comprehensions support an `if` clause, let’s also filter out punctuation and space characters:

```python
my_set = { x for x in 'Hi, my name is...' if x not in '., ' }
print(my_set)
# {'n', 'a', 'e', 'i', 's', 'y', 'H', 'm'} (your order may differ)
```

And with this example, it becomes abundantly clear that sets have no order!

## Sets and lists

Before continuing, I want to share two commonly used tricks that we can perform using sets.

### Deduplicate a list

Sets only contain unique elements, and we can create a set by passing a list to the `set()` function. Those are all the ingredients you need to deduplicate a list. Deduplication is the process of removing duplicates, and converting a list to a set is by far the easiest way to do this in Python. Keep in mind that the result is unordered, so the original order of the list is not preserved:

```python
my_list = [1, 1, 1, 2, 3, 4, 4, 4, 4, 2, 2, 2]
my_set = set(my_list)
print(my_set)  # {1, 2, 3, 4}
```
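If the order of the original list matters, a set alone won't help you. A common alternative (not a set technique, but worth knowing next to this one) is `dict.fromkeys()`: dictionary keys are unique too, and since Python 3.7 dictionaries preserve insertion order. A minimal sketch:

```python
my_list = [3, 1, 3, 2, 1, 2]

# set() removes duplicates but guarantees no particular order
unique_unordered = list(set(my_list))

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each element, in order
unique_ordered = list(dict.fromkeys(my_list))
print(unique_ordered)  # [3, 1, 2]
```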

### Convert set to list

To convert a set to a list, simply create a new list with the set as the argument:

```python
A = { 1, 2, 3 }
my_list = list(A)
print(my_list)  # [1, 2, 3]
```
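Because sets are unordered, `list(A)` gives you the elements in whatever order the set happens to iterate over them. If you need a predictable order, `sorted()` is a handy alternative: it accepts any iterable, including a set, and always returns a new list. A quick sketch with some example strings:

```python
A = {'banana', 'cherry', 'apple'}

# list() follows the set's internal iteration order, which you can't rely on
as_list = list(A)

# sorted() takes any iterable and returns a new, sorted list
ascending = sorted(A)
descending = sorted(A, reverse=True)
print(ascending)   # ['apple', 'banana', 'cherry']
print(descending)  # ['cherry', 'banana', 'apple']
```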

## Why would you need sets?

People use sets for a number of reasons.

- The most common one is to remove duplicates from a sequence (like lists, as demonstrated before)
- Many use them to perform membership testing (is an element present in this set of unique elements)
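Membership testing uses the `in` operator. Because a set is backed by a hash table, `x in some_set` takes constant time on average, while `x in some_list` has to scan the list element by element. Here's a small sketch; `allowed_users` and `can_log_in` are hypothetical names made up for this example:

```python
allowed_users = {'eric', 'ali', 'john'}

def can_log_in(username):
    # 'in' on a set is a hash lookup: O(1) on average, regardless of set size
    return username.lower() in allowed_users

print(can_log_in('Ali'))      # True
print(can_log_in('mallory'))  # False
```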

But there are more reasons to use sets. If you’re unfamiliar with set theory, you might want to read about it on Wikipedia. I’ll do my best to explain the basics, too, though. In short: sets can be used to perform mathematical set operations like:

- Finding the **difference** between two sets
- **Union**: combining sets and only keeping unique elements
- **Intersection**: which elements are present in both sets
- Finding **subsets** and **supersets**

These operations can be visualized with a Venn diagram, which shows the logical relations between sets. Chances are you’ve seen one before: two overlapping circles, A and B, where the overlapping region is labeled A ∩ B.

In the following examples, I’ll refer back to this diagram and use the names A and B. All the examples here use two sets, but the operations work just as well with more than two sets.

## Mathematical Python set operations

This section demonstrates and explains all the mathematical set operations. Don’t let the math part scare you; it’s not that hard!

### Finding the difference between Python sets

Let’s define two sets, A and B, and find the difference between them. What is the difference between two sets? Looking at the Venn diagram, we want to find the elements that are only present in A. In other words, we want to get rid of any overlapping elements that are also in B. Or, more specifically: we want all elements that are in A but not in A ∩ B.

We can do so by using the minus (subtraction) operator:

```python
A = { 1, 2, 3, 4, 5 }
B = { 3, 4, 5, 6, 7 }
print(A - B)  # {1, 2}

# And the reverse
print(B - A)  # {6, 7}
```

A and B have some overlap: the numbers 3, 4, and 5 are in both sets. These numbers fall into the section labeled A ∩ B in the Venn diagram. If we want only the numbers that are unique to A, we ‘subtract’ B from A using `A - B`. When we only want the numbers unique to B, we subtract A from B: `B - A`.

### Find the symmetric difference between Python sets

The symmetric difference between two sets consists of the elements that are either in set A or in set B, but not in both. In other words: all the elements in A plus the elements from B, minus the A ∩ B part. To find the symmetric difference, we can use the ^ operator:

```python
A = { 1, 2, 3, 4, 5 }
B = { 3, 4, 5, 6, 7 }
print(A ^ B)  # {1, 2, 6, 7}
```
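To check the definition above ("everything in A plus everything in B, minus the A ∩ B part"), you can build the symmetric difference by hand from the other operators and compare it with `^`. This is only a sanity check, not how you'd normally write it:

```python
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

# All elements of A and B, minus the overlap
everything = A | B        # {1, 2, 3, 4, 5, 6, 7}
overlap = A & B           # {3, 4, 5}
by_hand = everything - overlap

# Equivalently: what's only in A, plus what's only in B
also_by_hand = (A - B) | (B - A)

print(by_hand)            # {1, 2, 6, 7}
print(by_hand == A ^ B)   # True
```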

### Find the intersection of two Python sets

The intersection is the part of the Venn diagram labeled with A ∩ B. The intersection consists of elements present in both sets. To find the intersection, we can use the & operator:

```python
A = { 1, 2, 3, 4, 5 }
B = { 3, 4, 5, 6, 7 }
print(A & B)  # {3, 4, 5}
```

### Subsets and supersets

If A is a subset of B, all elements of A are also present in B. However, A can be smaller than B, meaning some elements in B might not be present in A. So if A overlaps almost completely with B but has one element that’s not present in B, it’s not a subset of B. We can check whether A is a (proper) subset of B with the *less than* operator: `<`.

If B is a superset of A, B has all the elements of A, but it may also have extra elements. We can check whether B is a (proper) superset of A with the *greater than* operator: `>`.

Let’s look at some examples:

```python
A = { 1, 2, 3 }
B = { 1, 2, 3, 4, 5 }
C = { 1, 2, 3, 10 }

# Is A a subset of B?
print(A < B)  # True

# Is C a subset of B?
print(C < B)  # False
# No, it has a 10 that's not in B

# Is B a superset of A?
print(B > A)  # True

# B is not a superset of C since C has a 10 in it
print(B > C)  # False

print(A < A)   # False
print(A <= A)  # True
print(A >= A)  # True
```

### Proper subsets and supersets

Python distinguishes between a *subset* (in math, we write ⊂), which is also called a proper subset, and a *subset or equal to* (in math, we write ⊆).

The `<` and `>` operators do the former: they check for a proper subset or superset. If `A < B`, then A is a subset of B and A is not equal to B (`A != B`). You can check this for yourself with `A < A`, which returns `False`. If you want to check for *subset or equal to*, use `<=`. The same holds for supersets: use `>=`.

### Union

Finally, we can combine two Python sets and keep only the unique elements. This is called a union of sets. In mathematics, the notation for a union is A ∪ B; in Python, we use the pipe operator (`|`) to create unions:

```python
A = { 1, 2, 3 }
B = { 3, 4, 5 }
print(A | B)  # {1, 2, 3, 4, 5}
```

All elements from A and B are present in the newly created set, but because sets only contain unique values, the overlapping element 3 is present only once.
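As mentioned earlier, none of these operators is limited to two sets: you can chain them. When you mix different operators, Python applies its usual precedence (`&` binds tighter than `^`, which binds tighter than `|`), so parentheses help make your intent explicit. A short sketch with three example sets:

```python
A = {1, 2, 3}
B = {3, 4, 5}
C = {5, 6}

all_three = A | B | C           # union of three sets
in_all = A & B & C              # no element is in all three
both = (A | B) & (B | C)        # parentheses control the grouping
mixed = A | B & C               # & binds tighter: same as A | (B & C)

print(all_three)  # {1, 2, 3, 4, 5, 6}
print(in_all)     # set()
print(both)       # {3, 4, 5}
print(mixed)      # {1, 2, 3, 5}
```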

## Named set methods

Almost all the set operators have named method equivalents, e.g.:

- The `<=` and `>=` operators can also be used by calling the `issubset()` and `issuperset()` methods on a set object.
- The `|` operator can be replaced with a call to the `union()` method.
- … and so forth

In the table at the end of this article, I’ll list all operations and their equivalent names.

The big difference between the operators and the named methods is that the named methods accept any iterable object as an argument, while the operators require both operands to be sets. So you can calculate the union of set A and list L with `A.union(L)`. This saves you an explicit conversion from a Python list to a Python set.
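Here's a quick sketch of that difference, using an example set `A` and list `L`. The named methods also accept several iterables at once, while the `|` operator raises a `TypeError` as soon as one side isn't a set:

```python
A = {1, 2, 3}
L = [3, 4, 5]

# Named methods accept any iterable...
print(A.union(L))          # {1, 2, 3, 4, 5}
print(A.intersection(L))   # {3}
# ...and even several iterables in one call
print(A.union(L, (6, 7)))  # {1, 2, 3, 4, 5, 6, 7}

# The | operator, on the other hand, needs a set on both sides
try:
    A | L
except TypeError as err:
    print('TypeError:', err)

# Converting explicitly makes the operator work
print(A | set(L))          # {1, 2, 3, 4, 5}
```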

## Python frozenset

Besides the regular, mutable `set`, there’s also the `frozenset`. You can guess how this datatype differs from a regular set: it’s frozen directly after creation, so you can’t add or remove elements from it. However, you can mix the `set` and `frozenset` types: all the regular operations like union and intersection work on combinations of a `set` and `frozenset` too.

The advantage of a `frozenset` is that it’s hashable, meaning you can use it as a dictionary key, or even as an element in another `set`.
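The sketch below shows these properties in action. One detail worth knowing (documented in the official Python docs): when you mix a `set` and a `frozenset` in a binary operation, the result has the type of the first operand. The `distances` dictionary is a made-up example of using unordered pairs as keys:

```python
frozen = frozenset([1, 2, 3])

# frozensets have no mutating methods such as add() or remove()
print(hasattr(frozen, 'add'))          # False

# Mixing works; the result takes the type of the left operand
mixed = {3, 4} | frozen
print(type(mixed).__name__)            # set
print(type(frozen | {3, 4}).__name__)  # frozenset

# Hashable, so usable as a dict key: here, an unordered pair of points
distances = {frozenset({'A', 'B'}): 5}
print(distances[frozenset({'B', 'A'})])  # 5

# ...or as an element of another set
nested = {frozenset({1, 2}), frozenset({3})}
print(len(nested))                     # 2
```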

## All Python set operations

The following table lists all the mathematical set operations, with their operators, their named method equivalents, and what they do:

Name | Operator example | Method example | What it does |
---|---|---|---|
Union | `A \| B` | `A.union(B)` | Create a set that combines A and B |
Intersection | `A & B` | `A.intersection(B)` | Create a set with the elements common to A and B |
Difference | `A - B` | `A.difference(B)` | Create a set with the elements of A that are not in B |
Symmetric difference | `A ^ B` | `A.symmetric_difference(B)` | Create a set with elements that are in A or B, but not in both |
Is superset? | `A >= B` | `A.issuperset(B)` | Returns `True` if every element of B is in A |
Is subset? | `A <= B` | `A.issubset(B)` | Returns `True` if every element of A is in B |
Is disjoint? | No operator | `A.isdisjoint(B)` | Returns `True` if A has no elements in common with B |
Is proper superset? | `A > B` | No method | Returns `True` if `A >= B` and `A != B` |
Is proper subset? | `A < B` | No method | Returns `True` if `A <= B` and `A != B` |
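If you want to convince yourself that the operators and methods in this table agree, a few assertions on example sets will do. `isdisjoint()` has no operator form, but it is equivalent to checking that the intersection is empty:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {10, 11}

# Each operator gives the same result as its named method
assert A | B == A.union(B)
assert A & B == A.intersection(B)
assert A - B == A.difference(B)
assert A ^ B == A.symmetric_difference(B)
assert (A >= B) == A.issuperset(B)
assert (A <= B) == A.issubset(B)

# isdisjoint(): True when the two sets share no elements
print(A.isdisjoint(B))  # False (they share 3 and 4)
print(A.isdisjoint(C))  # True
print(A.isdisjoint(C) == (not (A & C)))  # True
```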

Note that you can also use the operators above in an augmented-assignment style, e.g.:

- `A |= B` updates set A in place, adding the elements from set B
- `A |= B | C | …` updates set A, adding the elements from sets B, C, etcetera.

The same works for intersection (`&=`), difference (`-=`), and symmetric difference (`^=`).
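The sketch below walks through each augmented assignment on a small example set. The `alias` variable is only there to show that `|=` modifies the existing set object instead of creating a new one:

```python
A = {1, 2}
B = {2, 3}
C = {4}

alias = A          # a second name for the same set object
A |= B | C         # add everything from B and C to A, in place
print(A)           # {1, 2, 3, 4}
print(alias is A)  # True: A was modified, not replaced

A &= {1, 2, 3}     # keep only elements also in the right-hand set
print(A)           # {1, 2, 3}
A -= {1}           # remove elements found in the right-hand set
print(A)           # {2, 3}
A ^= {3, 5}        # keep elements that are in exactly one of the two sets
print(A)           # {2, 5}
```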

The next table lists some extra methods you can use to manipulate sets:

Name | Example | What it does |
---|---|---|
add | `A.add(elem)` | Add an element to the set |
remove | `A.remove(elem)` | Remove an element from the set (raises `KeyError` if it is not present) |
discard | `A.discard(elem)` | Remove an element from the set if it is present (saves you a presence check) |
pop | `A.pop()` | Remove and return an arbitrary element from the set (raises `KeyError` if the set is empty) |
clear | `A.clear()` | Remove all elements from the set |
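The practical difference between `remove` and `discard` is how they handle a missing element. This short sketch exercises each method in the table on an example set:

```python
A = {1, 2, 3}

A.add(4)        # A is now {1, 2, 3, 4}
A.discard(10)   # 10 isn't present: nothing happens, no error
A.remove(1)     # 1 is present, so it's removed

try:
    A.remove(10)  # remove() raises KeyError for a missing element
except KeyError:
    print('10 was not in the set')

item = A.pop()  # removes and returns an arbitrary element
print(item in {2, 3, 4})  # True

A.clear()
print(A)        # set()
```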

## Set indexing

If you were hoping to access set elements by using indexing (e.g. with `my_set[1]`), you’re out of luck. Sets have no order, and thus can’t be indexed. If you think you need set indexing, you might be heading in the wrong direction. Look at other solutions, like using a list instead or converting your set to a list.
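Here's what that looks like in practice, with a couple of common workarounds. This is a sketch on an example set; which workaround fits depends on whether you need *any* element or a specific position:

```python
my_set = {'b', 'a', 'c'}

try:
    my_set[1]
except TypeError as err:
    # Sets don't support indexing at all
    print('TypeError:', err)

# If you just need *some* element, take the first one the iterator yields
any_item = next(iter(my_set))

# If you need positions, convert to a list first; sorted() makes them predictable
as_list = sorted(my_set)
print(as_list[1])  # b
```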

## Conclusion

We’ve looked at Python sets and how they differ from sequence types like lists and tuples. Besides deduplication of sequences, sets can be used to perform set calculations. We reviewed all the set calculations and looked at example code to see how they work. Using the convenient table, you can quickly review all the set operations.

Although I believe the article touched on everything you need to know, you might also want to look at the official documentation on sets.
