Python YAML: How to Load, Read, and Write YAML

YAML, a recursive acronym for “YAML Ain’t Markup Language”, is a human-readable data serialization language. It is often used for configuration files, but can also be used for data exchange. The most widely used Python YAML parser is PyYAML, a library that allows you to load, parse, and write YAML, much like Python’s json module helps you work with JSON.

Why use YAML with Python?

YAML is easy for humans to read and write, and at the same time it’s easy to parse, especially with Python! That combination is the biggest advantage YAML has over other formats, like JSON and XML. But there are other advantages.

These are the most prominent features of YAML:

  • You can use comments in YAML files
  • You can store multiple documents in one YAML file, with the --- separator
  • It’s easy to read for humans
  • It’s easy to parse for computers

There are some downsides to using YAML with Python too, though:

  • YAML support is not part of the Python standard library, while XML and JSON support is
  • Its dependence on indentation is frustrating sometimes (however, Python developers are used to that, right?)
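To make the indentation point concrete, here is a minimal sketch of what happens when a key is indented by one space too few. The config text and the try_parse() helper are made up for this illustration; the only PyYAML names used are yaml.safe_load() and the yaml.YAMLError base exception.

```python
import yaml

# Hypothetical config text: 'port' is indented by one space instead of two,
# so it lines up with neither 'rest' nor 'url'.
broken_yaml = "rest:\n  url: 'https://example.org'\n port: 8443\n"


def try_parse(text):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as error:
        # All PyYAML parsing errors derive from yaml.YAMLError
        return None, str(error)


data, problem = try_parse(broken_yaml)
if problem is not None:
    print("Could not parse YAML:")
    print(problem)
```

The error message includes the line and column where the parser gave up, which usually points you close to the misaligned key.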

If you ask me, YAML is perfect for configuration files. That’s exactly how I, and many other developers, use it the most. It has a richer syntax than the often-used alternative, .ini files, but is still easy on the eyes and simple to write and parse.

If you’re looking for a good data format for data exchange and storage, I recommend JSON, XML, or other advanced formats like protocol buffers or Avro.

Installing and importing PyYAML

There are multiple Python packages that can parse YAML data. However, PyYAML is the most prevalent and also the most complete implementation for parsing YAML. PyYAML is not part of the standard Python library, meaning you need to install it with pip. Use the following command to install PyYAML, preferably in a virtual environment:

$ pip install pyyaml

To use PyYAML in your scripts, import the library as follows. Note that you don’t import ‘pyyaml’, but simply ‘yaml’:

import yaml

Reading and parsing a YAML file with Python

Once we have the YAML parser imported, we can load a YAML file and parse it. YAML files usually carry the extension .yaml or .yml. Let’s work with the following example YAML file, called config.yaml:

rest:
  url: "https://example.org/primenumbers/v1"
  port: 8443

prime_numbers: [2, 3, 5, 7, 11, 13, 17, 19]

Now, loading, parsing, and using this configuration file is very similar to loading JSON with Python’s json module. Please note that I changed the output a little to make it more readable for you:

>>> import yaml
>>> with open('config.yaml', 'r') as file:
...     prime_service = yaml.safe_load(file)

>>> prime_service
{'rest': 
  { 'url': 'https://example.org/primenumbers/v1',
    'port': 8443
  },
  'prime_numbers': [2, 3, 5, 7, 11, 13, 17, 19]}

>>> prime_service['rest']['url']
'https://example.org/primenumbers/v1'

The YAML parser returns a regular Python object that best fits the data. In this case, it’s a Python dictionary. This means all the regular dictionary features can be used, like using get() with a default value.
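As a small sketch of that last point, the snippet below parses the same configuration from a string (so it runs without a file on disk) and uses get() to fall back to a default. The timeout key and both default values are assumptions made up for this example; they are not part of the config file above.

```python
import yaml

config_text = """
rest:
  url: "https://example.org/primenumbers/v1"
  port: 8443
"""

prime_service = yaml.safe_load(config_text)

# Fall back to an empty dict if the whole 'rest' section is missing
rest = prime_service.get("rest", {})

# 'port' exists in the config, so the default (443) is ignored
port = rest.get("port", 443)

# 'timeout' is a hypothetical key that is not in the config,
# so get() returns the default of 30
timeout = rest.get("timeout", 30)

print(port, timeout)
```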

Parsing YAML strings with Python

You can use yaml.safe_load() to parse all kinds of valid YAML strings. Here’s an example that parses a simple list of items into a Python list:

>>> import yaml
>>>
>>> names_yaml = """
... - 'eric'
... - 'justin'
... - 'mary-kate'
... """
>>>
>>> names = yaml.safe_load(names_yaml)
>>> names
['eric', 'justin', 'mary-kate']
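Lists are not the only thing safe_load() converts for you. The following sketch, using a made-up document, prints the Python type that PyYAML picks for a few common YAML values:

```python
import yaml

# Hypothetical document with a mix of scalar types and a list
document = """
name: prime-service
port: 8443
ratio: 0.75
enabled: true
owner: null
tags: [math, api]
"""

settings = yaml.safe_load(document)

# Show each value together with the Python type it was converted to
for key, value in settings.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Strings, integers, floats, booleans, null values, and lists come back as str, int, float, bool, None, and list, so you rarely need to convert values yourself.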

Parsing multiple YAML documents at once

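YAML allows you to define multiple documents in one file, separating them with a triple dash (---). PyYAML will happily parse such files too, but you need yaml.safe_load_all() instead of yaml.safe_load(), which raises an error when it finds more than one document. safe_load_all() returns a generator that yields the documents one by one.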
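Here is a minimal sketch of parsing several documents with yaml.safe_load_all(). The three tiny documents are made up for the example, and the generator it returns is wrapped in list() so that all documents can be used at once:

```python
import yaml

# Three hypothetical documents in one string, separated by ---
multi_doc = """
name: first
---
name: second
---
name: third
"""

# safe_load_all() yields one Python object per YAML document
documents = list(yaml.safe_load_all(multi_doc))

names = [doc["name"] for doc in documents]
print(names)
```

For large files, you can also loop over the generator directly instead of building a list, so only one document is held in memory at a time.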

Writing (or dumping) YAML to a file

Although most people will only read YAML configuration files, it can be very handy to write YAML as well. For example, to:

  • Create an initial configuration file with current settings for your user
  • Save the state of your program in an easy-to-read file (instead of using something like pickle)

In the following example, we’ll:

  • Create a list with names as we did before
  • Save the names to a YAML formatted file with yaml.dump
  • Read and print the file, as proof that everything worked like a charm

Here you go:

import yaml

names_yaml = """
- 'eric'
- 'justin'
- 'mary-kate'
"""

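# Parse the YAML string into a Python list
names = yaml.safe_load(names_yaml)

# Write the list to a YAML-formatted file
with open('names.yaml', 'w') as file:
    yaml.dump(names, file)

# Read the file back and print its contents
with open('names.yaml') as file:
    print(file.read())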
- eric
- justin
- mary-kate
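You don’t have to write to a file: when called without a stream, the dump functions return the YAML as a string. The sketch below uses yaml.safe_dump(), the counterpart of safe_load(), on a made-up settings dictionary. It assumes PyYAML 5.1 or newer, because the sort_keys argument does not exist in older versions.

```python
import yaml

# Hypothetical settings, loosely based on the earlier config example
settings = {
    "rest": {"url": "https://example.org/primenumbers/v1", "port": 8443},
    "prime_numbers": [2, 3, 5, 7],
}

# Without a stream argument, safe_dump() returns the YAML text.
# sort_keys=False keeps the keys in insertion order instead of sorting them.
text = yaml.safe_dump(settings, sort_keys=False)
print(text)

# Loading the text again gives back an equal dictionary
round_trip = yaml.safe_load(text)
print(round_trip == settings)
```

Keeping the original key order tends to make generated configuration files easier to compare with hand-written ones.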

PyYAML safe_load() vs load()

You will encounter many examples of PyYAML usage where load() is used instead of safe_load(). I intentionally didn’t tell you about the load() function until now. Since most people have a job to do and tend to quickly copy-paste some example code, I wanted them to use the safest method of parsing YAML with Python.

However, if you’re curious about the difference between these two, here’s the short summary: load() is a very powerful function, just like Python’s pickle module, if you know it. Both are very insecure when used on untrusted input because they allow an attacker to execute arbitrary code. PyYAML’s load() function allows you to serialize and deserialize complete Python objects and even allows you to execute Python code, including calls to the os.system() function, which in turn can execute any command on your system.

Since PyYAML 5.1, calling load() without an explicit Loader argument is deprecated and issues a big fat warning. Since PyYAML 6.0, the Loader argument is required, so you have to choose a loader deliberately.

If you’re parsing regular YAML files, like 99% of us do, you should always use safe_load(), since it supports only a safe subset of what load() can do. All the scary, arbitrary code execution type of stuff is stripped out. The full details are explained on this site if you’re interested.
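To see the difference in practice, here is a small sketch that feeds safe_load() a document containing a Python-specific tag. The document is made up for this illustration and the command in it is a harmless echo; safe_load() never runs it and raises an error instead.

```python
import yaml

# A document that asks the loader to call a Python function.
# The command is a harmless echo, and safe_load() never executes it.
suspicious = "!!python/object/apply:os.system ['echo hello']"

try:
    yaml.safe_load(suspicious)
    rejected = False
except yaml.YAMLError as error:
    # safe_load() has no constructor for python/* tags, so it refuses
    rejected = True
    print("safe_load() refused the document:")
    print(error)

print(rejected)
```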

About the author

Erik is the owner of Python Land and the author of many of the articles and tutorials on this website. He's been working as a professional software developer for 25 years, and he holds a Master of Science degree in computer science. His favorite language of choice: Python!
