CSV is short for comma-separated values, and it’s a common format to store all kinds of data. Many tools offer an option to export data to CSV. Python’s CSV module is a built-in module that we can use to read and write CSV files. In this article, you’ll learn to use the Python CSV module to read and write CSV files. In addition, we’ll look at how to write CSV files with NumPy and Pandas, since many people use these tools as well.
Python’s CSV module
Python has a CSV module built-in, so there's no need to install anything. We'll first look at Python's built-in CSV module before we dive into using alternatives like NumPy and Pandas. In many cases, Python's module will offer everything you need without requiring extra dependencies for your script.
If you plan to use the data with NumPy, it's good to know that you can use NumPy's own CSV functionality instead. Similarly, Pandas has its read_csv function to read CSV directly into a DataFrame. These options are covered later in this article, and, spoiler alert, they offer some advantages over Python's built-in module, especially if you were planning on using NumPy or Pandas anyway.
Import the Python CSV module
For starters, let's import the csv module. This couldn't be simpler:

```python
import csv
```
Read CSV files with Python
Now that we know how to import the CSV module, let's see how we can use Python to open a CSV file and read the data from it. As an example, we'll read a CSV file with names, ages, and countries and use print() to display each parsed line.
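A minimal sketch of such a reader follows. It assumes a hypothetical file named people.csv, which the code creates first so the snippet runs on its own:

```python
import csv

# Create a small sample file so the example is self-contained.
# The file name "people.csv" is hypothetical.
with open("people.csv", "w", newline="") as f:
    f.write("Name,Age,Country\nJohn Doe,30,United States\nJane Doe,28,Canada\n")

# Open the file with newline="" so the csv module handles line endings itself
with open("people.csv", newline="") as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        # Each parsed row is a list of strings
        print(row)
```

Note that every value comes back as a string, including the ages; converting them, for example with int(), is up to you.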
Let’s walk through this step-by-step and see what’s going on.
After importing the CSV module, we open the CSV file with Python's open() function. One peculiarity might catch your eye: the newline='' argument to the open() function. It disables open()'s newline translation, so line endings are returned as-is. The CSV reader then handles the newlines itself, which also ensures that newlines inside quoted fields are parsed correctly. When writing, newline='' prevents an extra \r from being added on platforms, like Windows, that use \r\n line endings.
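To see why newline='' matters, here is a small sketch (the file name notes.csv is hypothetical) that writes a field containing a line break and reads it back:

```python
import csv

# Write a row whose second field contains an embedded newline;
# the writer wraps that field in quotes automatically
with open("notes.csv", "w", newline="") as f:
    csv.writer(f).writerow(["John Doe", "line one\nline two"])

# Read it back, again with newline="" so the reader sees the raw line endings
with open("notes.csv", newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['John Doe', 'line one\nline two']]
```

The quoted field spans two physical lines in the file, yet the reader returns it as a single value.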
Once we have an open file, we use csv.reader() to parse it. It's good to know that this reader doesn't read the entire file at once. It accepts any iterable that yields strings and requests rows from it one at a time. A file object behaves the same way: iterating over it yields one line at a time, read from disk in buffered chunks, instead of loading the whole file into memory. So this CSV reader can process large files without causing memory issues.
The call to csv.reader() itself returns an iterator as well; hence, we can use a simple for-loop to iterate over the rows of the CSV file from here on.
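Because csv.reader() accepts any iterable of strings, you can try it out without a file at all. This sketch parses a plain Python list of lines (made-up sample data) and pulls rows from the reader one at a time with next():

```python
import csv

# csv.reader accepts any iterable that yields strings, not just file objects
lines = ["Name,Age,Country", "John Doe,30,United States", '"Doe, Jane",28,Canada']
reader = csv.reader(lines)

# The reader is itself an iterator, so rows are parsed lazily, one at a time
first_row = next(reader)
print(first_row)  # ['Name', 'Age', 'Country']

# Consume the remaining rows; the quoted comma stays inside its field
remaining = list(reader)
print(remaining)  # [['John Doe', '30', 'United States'], ['Doe, Jane', '28', 'Canada']]
```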
Write CSV with Python
Now that we know how to read CSV, let's see how to write CSV in Python. As you may have guessed, there's also a csv.writer() function that we can use to write to a file:
```python
import csv

# Open the file in write mode; newline="" lets the csv module control line endings
with open("output.csv", "w", newline="") as csv_file:
    # Create a writer object
    csv_writer = csv.writer(csv_file)
    # Write the data to the file, one row at a time
    csv_writer.writerow(["Name", "Age", "Country"])
    csv_writer.writerow(["John Doe", 30, "United States"])
    csv_writer.writerow(["Jane Doe", 28, "Canada"])
```
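We open the output.csv file in write mode and create a writer object. Next, we use the writerow() method to write new rows. The writerow() method takes a list (or any other iterable) of values and writes them to a single row in the CSV file. Non-string values, such as the ages, are converted to strings before they are written.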
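If you already have all rows in a list, the writer's writerows() method writes them in a single call. The sketch below writes to an in-memory io.StringIO buffer instead of a file, so the resulting text is easy to inspect; the "Doe, Jane" value is made up to show how a field containing the delimiter gets quoted:

```python
import csv
import io

rows = [
    ["Name", "Age", "Country"],
    ["John Doe", 30, "United States"],
    ["Doe, Jane", 28, "Canada"],  # contains a comma, so it must be quoted
]

# An in-memory text buffer stands in for a real file
buffer = io.StringIO()
csv_writer = csv.writer(buffer)

# writerows() writes every row from an iterable of rows in one call
csv_writer.writerows(rows)

print(repr(buffer.getvalue()))
```

With the default settings, only the field that contains a comma is quoted, and each row ends with \r\n.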
Add to CSV
Adding extra data to an existing CSV file is similar to writing a new one. We just need to open the file in a different mode: append mode. You can learn more about file modes in the article on Python files. As an example, we'll append some more lines to the output.csv file from above:
```python
import csv

# Open the file in append mode; newline="" lets the csv module control line endings
with open("output.csv", "a", newline="") as csv_file:
    # Create a writer object
    csv_writer = csv.writer(csv_file)
    # Write the new data to the end of the file
    csv_writer.writerow(["Joe Smith", 35, "United Kingdom"])
    csv_writer.writerow(["Mary Smith", 32, "France"])
```
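One thing to watch out for when appending: if the file doesn't exist yet, you probably want a header row first, but you don't want to repeat it on every append. A possible approach, sketched with a hypothetical helper called append_rows() and a hypothetical file named people_log.csv, is to check whether the file exists and is non-empty before writing:

```python
import csv
import os


def append_rows(path, header, rows):
    """Hypothetical helper: append rows, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        if write_header:
            csv_writer.writerow(header)
        csv_writer.writerows(rows)


header = ["Name", "Age", "Country"]

# Start from a clean slate for this demo
if os.path.exists("people_log.csv"):
    os.remove("people_log.csv")

# The first call creates the file and writes the header; the second only appends
append_rows("people_log.csv", header, [["Joe Smith", 35, "United Kingdom"]])
append_rows("people_log.csv", header, [["Mary Smith", 32, "France"]])
```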
Choosing a CSV dialect
You can choose a CSV dialect when working with the csv module in Python. A CSV dialect is a set of parameters that defines the specific format of a CSV file. This includes the character used to delimit fields, the character used to quote fields, and other formatting details.
The csv module provides three pre-defined dialects that you can use: excel, excel-tab, and unix. You can specify the dialect that you want to use when creating a reader or writer object, like this:
```python
import csv

with open("output.csv", "w", newline="") as csv_file:
    # Create a writer object, using the `excel` dialect
    csv_writer = csv.writer(csv_file, dialect="excel")
    ...
```
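In this example, we create a writer object using the excel dialect. This tells the csv module to use the formatting conventions of the excel dialect when writing the data to the file: commas as delimiters, double quotes around fields that need quoting, and \r\n line endings. It is also the default dialect when you don't specify one.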
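To get a feel for the differences, the following sketch writes the same (made-up) row with each of the three built-in dialects into an in-memory buffer and prints the raw text:

```python
import csv
import io

row = ["John Doe", 30, "United States"]
outputs = {}

for dialect in ["excel", "excel-tab", "unix"]:
    buffer = io.StringIO()
    csv.writer(buffer, dialect=dialect).writerow(row)
    outputs[dialect] = buffer.getvalue()
    # repr() makes tabs and line endings visible
    print(dialect, repr(outputs[dialect]))
```

Note how unix uses plain \n line endings and quotes every field, because it is configured with csv.QUOTE_ALL, while excel-tab only swaps the comma for a tab.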
Custom dialects
You can create your own custom dialect by defining the dialect parameters yourself. This can be useful if you need to write a CSV file in a specific format that is not supported by the pre-defined dialects.
To create a custom dialect, you can use the csv.register_dialect() function, like this:
```python
import csv

# Register the custom dialect under the name "my_dialect"
# (register_dialect() returns None, so there is nothing to assign)
csv.register_dialect(
    "my_dialect",
    delimiter=";",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
)

# Open the file in write mode
with open("output.csv", "w", newline="") as csv_file:
    # Create a writer object, using the custom dialect
    csv_writer = csv.writer(csv_file, dialect="my_dialect")
    # Write the data to the file
    csv_writer.writerow(["Name", "Age", "Country"])
    csv_writer.writerow(["John Doe", 30, "United States"])
    csv_writer.writerow(["Jane Doe", 28, "Canada"])
```
In this example, we register a custom dialect called my_dialect, which uses a semicolon as the delimiter character and a double quote as the quote character. With csv.QUOTE_MINIMAL, fields are only quoted when they contain special characters such as the delimiter or the quote character. We then use this custom dialect when creating the writer object, which tells the csv module to use the formatting conventions of my_dialect when writing the data to the file.
In addition to writing, you can also use a custom dialect when reading a CSV file. Here is an example of how you can use a custom dialect to read the previously created output.csv file:
```python
import csv

# Register the custom dialect (only needed once per Python process)
csv.register_dialect(
    "my_dialect",
    delimiter=";",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
)

# Open the file in read mode
with open("output.csv", "r", newline="") as csv_file:
    # Create a reader object, using the custom dialect
    csv_reader = csv.reader(csv_file, dialect="my_dialect")
    # Read the data from the file
    for row in csv_reader:
        # Process the data in the row
        print(row)
```
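A quick way to check that writing and reading with a custom dialect round-trips correctly is an in-memory buffer. This sketch reuses the my_dialect settings from above with made-up data; it also shows that you can pass formatting parameters such as delimiter directly to csv.reader() or csv.writer() instead of registering a dialect:

```python
import csv
import io

csv.register_dialect("my_dialect", delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL)

# Made-up data: the name contains the delimiter, so it will be quoted
rows = [["Name", "Age", "Country"], ["Doe; Jane", 28, "Canada"]]

buffer = io.StringIO()
csv.writer(buffer, dialect="my_dialect").writerows(rows)
written = buffer.getvalue()
print(repr(written))  # 'Name;Age;Country\r\n"Doe; Jane";28;Canada\r\n'

# Rewind the buffer and parse it again with the same dialect
buffer.seek(0)
parsed = list(csv.reader(buffer, dialect="my_dialect"))
print(parsed)  # [['Name', 'Age', 'Country'], ['Doe; Jane', '28', 'Canada']]

# Alternative: pass formatting parameters directly, without registering a dialect
same = list(csv.reader(io.StringIO(written), delimiter=";"))
```

Registering a dialect pays off when several readers and writers share the same settings; for a one-off file, passing the parameters directly is often simpler.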
Python CSV vs. NumPy or Pandas
In the following sections, we'll look at how to read and write CSV files with NumPy and Pandas. These packages both have great CSV support, but for projects that are not built around these (large) packages, I recommend using the built-in Python CSV module for two reasons:
- Both NumPy and Pandas are extensive tools that can do a lot. The downside is they both need to be installed before you can use them.
- Both packages will add ‘weight’ to your project. Sometimes it’s better to create lightweight scripts without dependencies since they are much easier to share and use.
However, if you are working with one of these libraries anyway, you’re better off using their CSV readers and writers since they tie in nicely with their specific data structures!
Reading and writing CSV with NumPy
Let's start with NumPy. I've written an introduction to NumPy here if you're interested.
To write a CSV file with NumPy, you can use the numpy.savetxt() function, which allows you to save a NumPy array to a CSV file.
Here is an example of how you can use the numpy.savetxt() function to write a CSV file:
```python
import numpy as np

# Create a NumPy array; the mixed values are all stored as strings
data = np.array([
    ["Name", "Age", "Country"],
    ["John Doe", 30, "United States"],
    ["Jane Doe", 28, "Canada"],
])

# Save the array to a CSV file
np.savetxt("output.csv", data, delimiter=",", fmt="%s")
```
In this example, we first create a NumPy array called data that contains the data that we want to write to the CSV file. Since a NumPy array holds elements of a single type, all elements, including the ages, are converted to strings. We then use the numpy.savetxt() function to save the array to a CSV file. In this call, we pass numpy.savetxt() the following arguments:
- The name of the file to save the array to
- The NumPy array to save
- The delimiter character to use in the CSV file (in this case, a comma)
- The format to use when writing the data (in this case, a string format, indicated by the %s format specifier)
The fmt argument is required here because the default format, '%.18e', only works for numbers.
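NumPy can read CSV files too, for example with numpy.loadtxt(). A minimal sketch, assuming purely numeric data and a hypothetical file named numbers.csv; the header and comments arguments of savetxt() write a plain header line, which skiprows=1 skips again when reading:

```python
import numpy as np

data = np.array([[1.5, 2.0], [3.0, 4.25]])

# Write a plain "x,y" header line; comments="" prevents the default "# " prefix
np.savetxt("numbers.csv", data, delimiter=",", fmt="%.2f", header="x,y", comments="")

# Read it back as a float array, skipping the header line
numbers = np.loadtxt("numbers.csv", delimiter=",", skiprows=1)
print(numbers)
```

For files with mixed column types, loadtxt() with dtype=str or numpy.genfromtxt() are options, although at that point Pandas is often the more convenient tool.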
Appending data with NumPy
np.savetxt() overwrites an existing file and thus removes any data already present in that file. If you need to append data to an existing CSV file, you can first open the file in append mode yourself and then pass the file object to savetxt():
```python
import numpy as np

# Open the file in append mode
with open("output.csv", "a") as csv_file:
    # Create a NumPy array with the new data
    new_data = np.array([
        ["Joe Smith", 35, "United Kingdom"],
        ["Mary Smith", 32, "France"],
    ])
    # Append the data to the file
    np.savetxt(csv_file, new_data, delimiter=",", fmt="%s")
```
Reading and writing CSV with Pandas
To write CSV data using Pandas, you can use the pandas.DataFrame.to_csv() method to save a Pandas DataFrame to a CSV file.
Here is an example of how you can use the to_csv() method to write CSV data using Pandas:
```python
import pandas as pd

# Create a Pandas DataFrame
data = pd.DataFrame([
    ["John Doe", 30, "United States"],
    ["Jane Doe", 28, "Canada"],
])

# Save the DataFrame to a CSV file, without the index and the column labels
data.to_csv("output.csv", index=False, header=False)
```
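Reading a CSV file back is the job of pandas.read_csv(). The sketch below is a variation of the example above: it gives the DataFrame column names (added for illustration) and writes them as a header, so read_csv() can pick them up. The file name people_pandas.csv is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [["John Doe", 30, "United States"], ["Jane Doe", 28, "Canada"]],
    columns=["Name", "Age", "Country"],
)

# index=False: don't write the row index as an extra column
df.to_csv("people_pandas.csv", index=False)

# read_csv() uses the first line as column names by default
loaded = pd.read_csv("people_pandas.csv")
print(loaded)
print(loaded.dtypes)
```

read_csv() infers column types, so Age comes back as integers rather than strings. For a file written with header=False, like the one above, pass header=None to read_csv() so the first row isn't mistaken for column names.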
For a complete description of the method, I can recommend the official documentation here.