How to Process Excel Data in Python and Pandas

Spreadsheets are one of the most used file formats for storing tabular data in offices worldwide. The most famous software for creating and editing spreadsheets is Microsoft Excel. Hence, spreadsheets are also commonly called Excel files. There are several alternatives with which you can do similar things, like LibreOffice, which also supports reading and writing files in Excel format.

Because Excel files are so common around the office, you will eventually want to manipulate them programmatically! This tutorial shows how to create, read, update, and delete data in Excel files using Python. Please note that this article is not about using Python in Excel itself.

How to Process Excel Files with Python?

To process Excel data in Python, we will use the highly popular pandas module. Pandas is a Python library for data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series. To get started with Pandas, you can read the documentation and find tutorials on the official Pandas website. It provides a comprehensive guide to all the functionalities provided by Pandas.

The pandas module works together with two other modules, xlsxwriter and openpyxl, to interact with Excel files. You can install the latest version of both of these modules with pip install:

pip install pandas xlsxwriter openpyxl --upgradeCode language: plaintext (plaintext)

After installing the above modules, we can write Python programs to process Excel data in Python in different ways, as discussed in the next sections.

How do you create an Excel file in Python?

Let’s start with the basics. We will use the ExcelWriter() function to open a new Excel file. It has the following syntax:

ExcelWriter(filename,engine=None, mode='w',if_sheet_exists=None)Code language: Python (python)

Here, 

  • The filename parameter takes the name of the Excel file that we want to open.
  • The engine parameter is used to specify the engine or module to process the Excel files. By default, it is set to None. You can set the engine parameter to “xlsxwriter” or “openpyxl” to use any of these modules to process Excel files.
  • The mode parameter is used to specify if we want to open the Excel file in write mode or append mode. By default, the mode parameter is set to “w” and the ExcelWriter() function opens the file in write mode. You can set the mode parameter to “a” to open the spreadsheet in the append mode. Note that the append mode doesn’t work with the “xlsxwriter” engine. You need to use the “openpyxl” module to open the file in append mode.
  • We use the if_sheet_exists parameter while updating an Excel sheet. The program runs into an error if we write a new sheet to the Excel file with the same name as an existing sheet. You can set the if_sheet_exists parameter to “replace” for replacing a sheet. Alternatively, you can set it to “new” to create a new sheet with a number subscript if a sheet with the same name already exists.

ExcelWriter() returns an ExcelWriter object just like the open() function returns a file object. We can use this object to process Excel data in Python.

Create an Empty Excel File in Python

To create an empty Excel file in Python using the pandas module, follow these steps:

  • First, open an Excel file in write mode using the ExcelWriter() function. The ExcelWriter() function takes the name of the Excel file as the first input argument. Here, we will use the “xlsxwriter” engine to open the file.
  • Next, we create an empty pandas dataframe using the DataFrame() function defined in the pandas module.
  • Once we have the empty dataframe, we will use the to_excel() method to convert it to an empty Excel file. The to_excel() method takes the ExcelWriter object returned by the ExcelWriter() function as its first input argument. It also takes the new sheet’s name as input to the sheet_name parameter.
  • Finally, we will close the ExcelWriter object using close() method to ensure the file is saved properly.

Here’s the code to do so:

import pandas as pd

writer = pd.ExcelWriter('empty_file.xlsx', engine='xlsxwriter')
empty_dataframe=pd.DataFrame()
empty_dataframe.to_excel(writer, sheet_name='empty')
writer.close()Code language: Python (python)

After executing the above code, we get the following empty Excel file. You can observe that the sheet in the Excel file is named "empty" as we have set the sheet_name parameter to "empty" in the to_excel() method.

Our next step is to fill the pandas dataframe with some data, as discussed in the following section.

Export Pandas Dataframe to Excel Sheet in Python

We can use different approaches to create an Excel sheet with data in Python using pandas dataframes. We can create a pandas dataframe using a list of lists, a list of dictionaries, or by reading from a CSV file.

Create an Excel File Using a List of Lists in Python

To create an Excel file using a list of lists in Python, we will first create a pandas dataframe using the DataFrame() function. Here, we will pass the list of lists as the first input argument and a list of column names as input to the columns parameter.

After executing the DataFrame() function, we get a pandas dataframe containing the data from the list of lists.  Next, we can use the ExcelWriter() function to open an Excel file in write mode and export the pandas dataframe to an Excel sheet using the to_excel() method.

import pandas as pd

# Define a list of lists
data=[["Aditya",179],
      ["Sameer",181],
      ["Dharwish",170],
      ["Joel",167]]

# Define column names
column_names=["Name", "Height"]

# Create a pandas dataframe using the list of lists
df=pd.DataFrame(data, columns=column_names)
writer = pd.ExcelWriter('excel_with_list.xlsx', engine='xlsxwriter')

# Add the pandas dataframe to the excel file as sheet
df.to_excel(writer, sheet_name='first_sheet')
writer.close()Code language: Python (python)

After execution of the above code, we get the following Excel file.

The above Excel file shows that the sheet contains an extra unnamed column with index counts. To avoid this, you can set the index parameter to False in the to_excel() method. After this, you will get the Excel file with only the desired columns:

...
df.to_excel(writer, sheet_name='first_sheet',index=False)
...Code language: Python (python)

In the above code, we have set the index parameter to False. Hence, we will get an Excel without the unnamed column:

Shifting rows

By default, the data is written into the Excel sheet starting from the sheet’s first row and first column. If you want to shift rows while writing data to the Excel sheet, you can use the startrow parameter in the to_excel() method as shown below:

...
df.to_excel(writer, sheet_name='first_sheet',index=False, startrow=3)
...Code language: Python (python)

The startrow parameter takes the row number starting from where the data should be written in the Excel file. For example, we have set the startrow parameter to 3 in the above code. Hence, the data in the Excel file will be written starting from the fourth row of the sheet (remember, we programmers start counting at 0), leaving the first three rows empty. You can observe this in the following file.

Shifting columns

You can also shift the output columns in the Excel sheet while writing data to the spreadsheet. For this, you can use the startcol parameter in the to_excel() method. The startcol parameter takes the column number after which the data should be written, and writes the data to the Excel sheet starting from the specific column. You can observe this in the following example.

import pandas as pd

# Define a list of list
data=[["Aditya",179],
      ["Sameer",181],
      ["Dharwish",170],
      ["Joel",167]]

# Define column names
column_names=["Name", "Height"]

# Create a pandas dataframe using the list of lists
df=pd.DataFrame(data, columns=column_names)

# Create an ExcelWriter Object
writer = pd.ExcelWriter('excel_with_list_right.xlsx', engine='xlsxwriter')

# Add the pandas dataframe to the excel file as sheet
df.to_excel(writer, sheet_name='first_sheet',index=False, startcol=4)
writer.close()Code language: Python (python)

In the above code, we have set the startcol parameter to 4. Hence, the data will be written to the Excel sheet starting from the fifth column as shown in the following file:

Shifting both columns and rows

Of course, you can also use the startcol and startrow parameters together to write data at a specific location in the Excel sheet, as shown below:

import pandas as pd

data=[["Aditya",179],
      ["Sameer",181],
      ["Dharwish",170],
      ["Joel",167]]

column_names=["Name", "Height"]

df=pd.DataFrame(data, columns=column_names)

writer = pd.ExcelWriter('excel_with_list_displaced.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='first_sheet',index=False, startrow=3, startcol=4)
writer.close()Code language: Python (python)

In this code, we have set the startrow and startcol parameters to 3 and 4, respectively. Hence, the data is written to the Excel sheet starting from the fourth row and fifth column, as demonstrated in the file below:

Create an Excel File Using a List of Dictionaries

Instead of a list of lists, we can use a list of dictionaries to create an Excel file in Python. The DataFrame() function takes the list of dictionaries as input and returns a pandas dataframe. Here, values in each dictionary are put into a row, and the keys are converted into the column names in the dataframe. 

After creating the dataframe, you can write the data into an Excel file using the ExcelWriter() function and the to_excel() method as shown below:

import pandas as pd

# Define a list of dictionaries
data=[{"Name":"Aditya","Height":179},
      {"Name":"Sameer","Height":181},
      {"Name":"Dharwish","Height":170},
      {"Name":"Joel","Height":167}]

# Convert list of dictionaries to dataframe
df=pd.DataFrame(data)


writer = pd.ExcelWriter('excel_with_dict.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='first_sheet',index=False)
writer.close()Code language: Python (python)

In the above code, we have used a dictionary instead of a list of lists to create the dataframe. Still, we get the same Excel file as the output:

Convert JSON to Excel

Since we can easily convert JSON to dictionaries with Python, you can also use the above method to convert (flat) JSON objects into Pandas dataframes, and, thus, into an Excel file.

Convert a CSV File Into an Excel File in Python

To convert a CSV file into an Excel sheet, we will first read the CSV file into a pandas dataframe using the read_csv() function. The read_csv() function takes the filename of the CSV file as its input argument and returns a pandas dataframe.

Then, we can export the pandas dataframe into an Excel file using the ExcelWriter() function and the to_excel() method as shown in the following example.

import pandas as pd

# Read a csv file into pandas dataframe
df=pd.read_csv("height_file.csv")

writer = pd.ExcelWriter('excel_from_csv.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='first_sheet',index=False)
writer.close()Code language: Python (python)

In the above code, we have used the following CSV file.

After executing the code, the output Excel file looks as follows.

Create an Excel File With Multiple Sheets in Python

To create an Excel file with multiple sheets in Python, follow these steps:

  • First, create multiple pandas dataframes using the DataFrame() function and a list of lists or dictionaries.
  • Next, open an Excel file in write mode using the ExcelWriter() function.
  • Once we get the ExcelWriter object, we write all the dataframes into the Excel file as sheets using the to_excel() method. Here, we have to give each sheet a different name.
  • Finally, we will close the ExcelWriter object using the close() method.

After executing the above steps, we can create an Excel file with multiple sheets:

import pandas as pd

# Define list of dictionaries
height_data=[{"Name":"Aditya","Height":179},
      {"Name":"Sameer","Height":181},
      {"Name":"Dharwish","Height":170},
      {"Name":"Joel","Height":167}]

weight_data=[{"Name":"Aditya","Weight":76},
      {"Name":"Sameer","Weight":68},
      {"Name":"Dharwish","Weight":69},
      {"Name":"Joel", "Weight":73}]

marks_data=[{"Name":"Aditya","Marks":79},
      {"Name":"Sameer","Marks":81},
      {"Name":"Dharwish","Marks":70},
      {"Name":"Joel","Marks":67}]

# Convert list of dictionaries to dataframe
height_df=pd.DataFrame(height_data)
weight_df=pd.DataFrame(weight_data)
marks_df=pd.DataFrame(marks_data)

writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx', engine='xlsxwriter')

height_df.to_excel(writer, sheet_name='height',index=False)
weight_df.to_excel(writer, sheet_name='weight',index=False)
marks_df.to_excel(writer, sheet_name='marks',index=False)
writer.close()Code language: Python (python)

In the above code, we first defined three lists of dictionaries that contain height, weight, and marks data. Then, we converted the list of dictionaries to pandas dataframes. Finally, we exported the dataframes to Excel sheets using the to_excel() method. The output Excel file with multiple sheets looks as follows:

Merge Multiple CSV Files Into an Excel File in Python

We can also merge multiple CSV files into an Excel file as sheets. For this, we will first read all the CSV files into pandas dataframes using the read_csv() function. Then, we will write the dataframes as sheets into an Excel file using the ExcelWriter() function and the to_excel() function, as shown in the following example.

import pandas as pd

#read all the csv files into dataframes
df_height=pd.read_csv("height_file.csv")
df_marks=pd.read_csv("marks_file.csv")
df_weight=pd.read_csv("weight_file.csv")

writer = pd.ExcelWriter('excel_from_csv_files.xlsx', engine='xlsxwriter')

df_height.to_excel(writer, sheet_name='height',index=False)
df_marks.to_excel(writer, sheet_name='marks',index=False)
df_weight.to_excel(writer, sheet_name='weight',index=False)

writer.close()Code language: Python (python)

We have used the following CSV files in the above code as input:

After executing the above code, the output Excel sheet looks as follows:

How to Read Excel Data in Python?

To read an Excel file in Python, we can use the read_excel() function with the ExcelFile() function defined in the pandas module. In some cases, we can directly use the read_excel() function without using the ExcelFile() function. Let’s take a look at all the use cases.

Read an Excel File Into a Pandas Dataframe

To read an Excel file into a pandas dataframe in Python, we will use the read_excel() function. The read_excel() function takes the path of the Excel file as its input argument and returns the Excel sheet as a pandas dataframe. You can observe this in the following example.

import pandas as pd
df=pd.read_excel('excel_without_index.xlsx')
print("The dataframe is:")
print(df)Code language: Python (python)

In the above code, we have used the following Excel file as input to the read_excel() function.

The output of the program looks as follows.

The dataframe is:
       Name  Height
0    Aditya     179
1    Sameer     181
2  Dharwish     170
3      Joel     167Code language: plaintext (plaintext)

If there are multiple sheets in the Excel file, the read_excel() function only returns the first sheet in the Excel file as its output.

Read an Excel File With Multiple Sheets in Python

To read sheets other than the first sheet from an Excel file with multiple sheets, we can use the sheet_name parameter in the read_excel() function.

The sheet_name parameter takes the position or the name of the Excel sheet as its input argument. After execution, the read_excel() function returns the sheet at the specified position or with the specified name as a dataframe. 

For example, we can read the third sheet in the Excel file below by setting the sheet_name parameter to 2. 

import pandas as pd
# Read a specific sheet into dataframe
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name=2)
print("The dataframe is:")
print(df)Code language: Python (python)

Output:

The dataframe is:
       Name  Marks
0    Aditya     79
1    Sameer     81
2  Dharwish     70
3      Joel     67Code language: plaintext (plaintext)

If we know the sheet names in the Excel file, we can directly pass the sheet name to the sheet_name parameter in the read_excel() function:

import pandas as pd

df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="marks")
print("The dataframe is:")
print(df)Code language: Python (python)

In the above code, we have read the "marks" sheet from the Excel file. We get the following output:

The dataframe is:
       Name  Marks
0    Aditya     79
1    Sameer     81
2  Dharwish     70
3      Joel     67Code language: plaintext (plaintext)

Read Excel Sheet Names in Python

If we don’t know the sheet names in an Excel file, we can obtain them using the ExcelFile() function. The ExcelFile() function takes the file name of the spreadsheet as its input argument and returns an ExcelFile object. This ExcelFile object contains the data from the spreadsheet and all its attributes. 

We can get the name of all the sheets in the Excel file using the sheet_names attribute of the ExcelFile object as shown below:

import pandas as pd

excel_file=pd.ExcelFile('excel_with_multiple_sheets.xlsx')

print("The sheet names in the excel file are:")
print(excel_file.sheet_names)Code language: Python (python)

Output:

The sheet names in the excel file are:
['height', 'weight', 'marks']Code language: plaintext (plaintext)

You can also use the ExcelFile() function to read data from an Excel file in Python. For this, you can pass the ExcelFile object returned by the ExcelFile() function to the read_excel() function instead of the filename. The rest of the syntax remains the same as shown in the following example.

import pandas as pd

excel_file=pd.ExcelFile('excel_with_multiple_sheets.xlsx')
df=pd.read_excel(excel_file, sheet_name="marks")

print("The dataframe is:")
print(df)Code language: Python (python)

In this code, we have passed the ExcelFile object returned by the ExcelFile() function to the read_excel() function instead of the filename of the Excel file. Still, we get the same output as in the previous examples:

The dataframe is:
       Name  Marks
0    Aditya     79
1    Sameer     81
2  Dharwish     70
3      Joel     67Code language: plaintext (plaintext)

Read Data From a Specific Column of a Sheet in the Excel File

You can use the usecols parameter in the read_excel() function to read data from a specific column of a sheet in an Excel file in Python. The usecols parameter takes a list containing the column names we want to read. After execution, the read_excel() function returns a pandas dataframe containing the specified data from the given sheet. You can observe this in the following example.

import pandas as pd

df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name=1, usecols=["Name"])
print("The dataframe column is:")
print(df)Code language: Python (python)

In the above code, we have passed the list ["Name"] to the usecols parameter. This causes the read_excel() function only to read the Name column from the Excel sheet, as shown in the following output:

The dataframe column is:
       Name
0    Aditya
1    Sameer
2  Dharwish
3      JoelCode language: plaintext (plaintext)

In this example, we have read only one column from the Excel sheet. To read two or more columns, you can put the desired column names in the list passed to the usecols parameter.

Read Data From a Single Row of a Sheet in the Excel File

To read data from a single row of an Excel sheet, we will first read the Excel sheet into a pandas dataframe using the read_excel() function. Then, we will use the iloc attribute of the pandas dataframe to get the row at the specified position, as shown below:

import pandas as pd

# Read a sheet into dataframe directly and extract row
row=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name=1).iloc[2]

print("The dataframe row is:")
print(row)Code language: Python (python)

We used the iloc attribute to read the row at index 2 of the dataframe. Hence, we get the third row of the Excel sheet as output:

The dataframe row is:
Name      Dharwish
Weight          69
Name: 2, dtype: objectCode language: plaintext (plaintext)

Read Data From a Specific Cell in an Excel Sheet

To read the data from a specific cell in a given Excel sheet using the pandas module in Python, we will first read the Excel sheet into a dataframe. Next, we will read the data from the given row and column of the dataframe, as shown below:

import pandas as pd

#read a sheet into dataframe directly and extract the cell
data=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name=1).iloc[2]["Name"]

print("The data is:")
print(data)Code language: Python (python)

In this code, we created the pandas dataframe using the read_excel() function. Then, we used the iloc attribute to read the row at index 2 of the dataframe. Finally, we use the indexing operator to read the value in the "Name" column of the given row. After execution of the program, we get the following output:

The data is:
DharwishCode language: Python (python)

Alternatively, you can select the desired column first and then read the data in a specific row, as shown below:

import pandas as pd

excel_file=pd.ExcelFile('excel_with_multiple_sheets.xlsx')
data=pd.read_excel(excel_file, sheet_name=2)["Name"].iloc[2]

print("The data is:")
print(data)Code language: Python (python)

In the above code, we first obtained the "Name" column from the dataframe using the indexing operator. Then, we used the iloc attribute to get the data in the third row. The output is the same. Observe that I have also used the ExcelFile() function to read the data instead of using the read_excel() function directly.

The data is:
DharwishCode language: plaintext (plaintext)

Update an Excel File in Python

To update an Excel file in Python using the pandas module, we will use the “openpyxl” module by using the engine parameter in the ExcelWriter() function. This will allow us to open the Excel files in append mode. After this, we can perform different update operations, as discussed in the following subsections.   

Add a Sheet to an Existing Excel File

To add an Excel sheet to an existing spreadsheet, we will first open the existing Excel file in append mode. For this, we will set the mode parameter to “a” and the engine parameter to “openpyxl” in the ExcelWriter() function. After this, we can append a dataframe into the spreadsheet using the to_excel() method as shown below.

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',mode='a',engine="openpyxl")

data=[{"Name":"Aditya","Age":25},
      {"Name":"Sameer","Age":26},
      {"Name":"Dharwish","Age":24},
      {"Name":"Joel","Age":27}]

#convert list of dictionaries to dataframe
df=pd.DataFrame(data)

# Write the pandas dataframe to the excel file
df.to_excel(writer, sheet_name='age',index=False)

# Make sure to properly close the file
writer.close()Code language: Python (python)

We have used the following Excel file as input:

In the above example, we create a dataframe from a list of dictionaries. Then, we write it to an existing Excel file using the to_excel() method. After execution of the code, we get the following output file.

In the above example, the sheet name passed to the sheet_name parameter in the to_excel() method must not exist in the input Excel file. Otherwise, the program will run into a ValueError exception.

Add a Row or Column to a Sheet in an Excel File

We will use the following steps to add a row to an existing sheet in an Excel file in Python:

  • First, we open the Excel file in append mode using the ExcelWriter() function. Here, we will set the if_sheet_exists parameter to “replace” so that we can overwrite the existing sheet after adding a new row to it.
  • Next, we read the sheet we want to modify into a pandas dataframe using the read_excel() method.
  • Once we get the data frame, we add a row to it using the concat() function.
  • After adding a new row to the dataframe, we write back the dataframe to the Excel file using the to_excel() method.
  • Finally, we close the ExcelWriter object using the close() method.

After executing the above statements, we can add a row to a sheet in an Excel file in Python as shown below.

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',
           mode='a',engine="openpyxl",if_sheet_exists="replace")
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="weight")
newRow= {"Name":"Elon","Weight":77}
new_row=pd.DataFrame([newRow])
df=pd.concat([df,new_row],ignore_index=True)

# Write the pandas dataframe to the excel file
df.to_excel(writer, sheet_name='weight',index=False)
writer.close()Code language: Python (python)

We have used the following Excel file as input to the above code.

After executing the program, we get the following Excel file as output.

Instead of a row, you can also add a column to a sheet in an Excel file in Python. For example, we have added two columns, namely "Weight_lbs" and "Age", to the Excel sheet named weight as shown below:

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',
           mode='a',engine="openpyxl",if_sheet_exists="replace")
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="weight")
df["Weight_lbs"]=df["Weight"]*2.20462
df["Age"]=[25,22,24,27,49]

# Write the pandas dataframe to the excel file
df.to_excel(writer, sheet_name='weight',index=False)
writer.close()Code language: Python (python)

For the above program, we used the following Excel file as input.

Once we execute the code, we get the following Excel file as the output.

Duplicate an Excel Sheet in Python

We can use the following steps to duplicate a sheet in an Excel file in Python:

  • First, open the existing Excel file in append mode using the ExcelWriter() function. Here, we set the if_sheet_exists parameter to “new” so that we can duplicate the existing sheet.
  • Next, we read the sheet we want to duplicate into a pandas dataframe using the read_excel() method.
  • Once we get the data frame, we write back the dataframe to the Excel file using the to_excel() method.
  • Finally, we close the ExcelWriter object using the close() method.

After executing the above steps, we get the duplicate sheet with the sheet name “sheet_name1” if the name of the existing sheet is “sheet_name”. You can observe this in the following example:

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',
           mode='a',engine="openpyxl",if_sheet_exists="new")
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="weight")

df.to_excel(writer, sheet_name='weight',index=False)
writer.close()Code language: Python (python)

In the above code, we passed the following Excel file as input:

In the program, we have created a duplicate of the "weight" sheet. Hence, we will get a sheet named "weight1" in the output Excel file as shown below:

Rename an Excel Sheet in Python

To rename a sheet in an Excel file in Python, we will use the openpyxl module directly. For this, we will use the following steps.

  • First, we will open the Excel file using the load_workbook() function defined in the openpyxl module. The load_workbook() function takes the filename of the Excel file as its input argument and returns the spreadsheet.
  • Next, we will select the desired sheet from the Excel file using the sheet name and the indexing operator.
  • After selecting the sheet, we will use the title attribute of the Excel sheet to rename it. For renaming, we will assign the new sheet name to the title attribute of the selected sheet.
  • Finally, we will save the modified Excel file using the save() method. 

In code, this looks like:

import openpyxl

spreadsheet = openpyxl.load_workbook('excel_with_multiple_sheets.xlsx')
ss_sheet = spreadsheet['weight1']
ss_sheet.title = 'new_weights'
spreadsheet.save('excel_with_multiple_sheets.xlsx')Code language: Python (python)

In the above code, we have used the following Excel file as input.

After renaming the weight1 sheet to new_weights, we get the following Excel file as output.

Performing Delete Operations on an Excel File

We can also perform delete operations on Excel data using the pandas and openpyxl modules. We can delete a sheet from an Excel file, a column from a sheet, or a row from a sheet in a given Excel file, as discussed in the following subsections.

Delete a Sheet From an Excel File

To delete a sheet from an Excel file in Python, we will use the openpyxl module. For this, we can use the following steps:

  • First, we will open the Excel file using the load_workbook() function. 
  • Then, we will get the sheet we want to delete using the sheet name. 
  • After this, we will invoke the remove() method on the spreadsheet object returned by the load_workbook() function and pass the sheet we want to delete as its input argument. The remove() method will delete the Excel sheet from the spreadsheet.
  • Finally, we will save the modified spreadsheet using the save() method.

After executing the above steps, we can easily delete a sheet from an Excel file in Python as shown below.

import openpyxl

spreadsheet = openpyxl.load_workbook('excel_with_multiple_sheets.xlsx')
sheet_to_delete=spreadsheet['new_weights']
spreadsheet.remove(sheet_to_delete)
spreadsheet.save('excel_with_multiple_sheets.xlsx')Code language: Python (python)

We have given the below Excel file as input to the code.

After deleting the sheet named "new_weights" from the Excel file, we get the following Excel file as output.

Delete a Row or a Column From a Sheet in an Excel File

To delete a row from an Excel sheet in Python, we will use the following steps:

  • We will open the Excel file in append mode using the ExcelWriter() function. We will also set the if_sheet_exists parameter to replace so that we can overwrite an existing sheet after deleting a row.
  • Next, we will read the sheet from which we want to delete a row into a pandas dataframe.
  • Once we get the dataframe, we will delete the row from the dataframe using the drop() method. The drop() method, when invoked on a dataframe, takes a list of row indices to be deleted as its input argument. We will also set the axis parameter to 0 in the drop() method to let the program know that we want to delete a row and not a column. The drop() method will return the modified dataframe after deleting the given row.
  • Next, we will write back the dataframe to the Excel file using to_excel() function.
  • Finally, we will close the ExcelWriter object using the close() method.

After executing the above steps, we can delete a row from a sheet in an Excel file in Python. You can observe this in the following example.

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',
          mode='a',engine="openpyxl",if_sheet_exists="replace")
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="weight")
df=df.drop([0],axis=0)

# Write the pandas dataframe to the excel file
df.to_excel(writer, sheet_name='weight',index=False)
writer.close()Code language: Python (python)

The following is the input Excel file to the above program:

After deleting a row from the weight sheet, we get the following Excel file:

Instead of a row, you can delete a column from a sheet in an Excel file using the pandas module in Python. For this, you can pass the list of column names you want to delete as the first input argument to the drop() method. You also need to set the axis parameter to 1 to let the program know you want to delete a column, as shown below:

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
writer = pd.ExcelWriter('excel_with_multiple_sheets.xlsx',
          mode='a',engine="openpyxl",if_sheet_exists="replace")
df=pd.read_excel('excel_with_multiple_sheets.xlsx', sheet_name="weight")
df=df.drop(["Weight_lbs"],axis=1)

# Write the pandas dataframe to the excel file
df.to_excel(writer, sheet_name='weight',index=False)
writer.close()Code language: Python (python)

We have given the following Excel file as input to the above program.

After deleting the “Weight_lbs” column from the weight sheet, we get the following Excel file:

Merge Excel Sheets Into a CSV File in Python

We will use the following steps to merge all the sheets of an Excel file into a CSV file in Python:

  • First, we will read the Excel file using the ExcelFile() function.
  • Then, we will read all the sheets we want to merge into CSV into pandas dataframes.
  • Once we get the dataframes, we will merge them into a single dataframe using the concat() function.
  • Finally, we will save the merged dataframe into a CSV file using the to_csv() method.

After executing the above steps, we will get the merged data from all the sheets in the Excel into a CSV file. You can observe this in the following example:

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
excel_file = pd.ExcelFile('height_data.xlsx')

df1=pd.read_excel(excel_file, sheet_name="height1")
df2=pd.read_excel(excel_file, sheet_name="height2")
df3=pd.read_excel(excel_file, sheet_name="height3")

output_df=pd.concat([df1,df2,df3], ignore_index=True)

# Write the pandas dataframe to the csv file
output_df.to_csv("height_data_merged.csv",index=False)Code language: Python (python)

In the above code, if identically named columns are in each sheet, we will get a CSV file with one sheet where the columns are merged into one. For example, the following file contains three sheets with the same columns:

If we give the above Excel file as input to the program, we will get a CSV file as shown below:

If we have differently named columns in the sheets in the Excel file, the output CSV file contains the union of the columns in the Excel file. You can observe this in the following example:

import pandas as pd

# Read existing excel file into ExcelWriter in Append Mode
excel_file = pd.ExcelFile('excel_with_multiple_sheets.xlsx')

df1=pd.read_excel(excel_file, sheet_name="height")
df2=pd.read_excel(excel_file, sheet_name="weight")
df3=pd.read_excel(excel_file, sheet_name="marks")
df4=pd.read_excel(excel_file, sheet_name="age")

output_df=pd.concat([df1,df2,df3,df4], ignore_index=True)

# Write the pandas dataframe to the csv file
output_df.to_csv("merged_data.csv",index=False)Code language: Python (python)

We have given the following file as input to the above program:

The output CSV file is as follows:

Conclusion

This tutorial discussed different Excel file operations using Python and Pandas. Using these examples, you can automate tedious work in an office environment. We discussed how to create, read, update, and delete sheets in an Excel file and how to merge Excel sheets to CSV files and vice versa.

Get certified with our courses

Learn Python properly through small, easy-to-digest lessons, progress tracking, quizzes to test your knowledge, and practice sessions. Each course will earn you a downloadable course certificate.

Related articles

Leave a Comment