How to Read Excel Files in Python with IronXL
This guide gives Python developers step-by-step instructions for using the IronXL library to read and edit Microsoft Excel documents.
IronXL is an Excel file processing library available for both .NET and Python. This tutorial covers its use in Python scripts only.
For reading and editing Microsoft Excel documents in .NET applications, refer to the separate IronXL .NET tutorial.
Reading and creating Excel files in Python is easy using the IronXL for Python software library.
Overview
How to Read Excel File in Python
- Download the Python Library to read Excel files
- Load and read an Excel file (workbook)
- Create an Excel workbook in CSV or XLSX
- Edit cell values in a range of cells
- Validate spreadsheet data
- Export data using Entity Framework
Tutorial
Step 1: Add IronXL as a Dependency in Your Python Project
To integrate the IronXL library into your Python project, you must install it as a dependency using the widely used Python package manager, pip. Open the terminal and execute the following command:
pip install IronXL
This installs the latest published version of IronXL, making it available for import in your project.
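Please note: IronXL for Python relies on the IronXL .NET library, so the .NET 6.0 SDK must be installed on your machine.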
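Before running the examples, it can help to confirm that the package is visible to the interpreter you are using. The sketch below is one way to do that with the standard library only. The helper name `is_package_available` is hypothetical, and the module name `ironxl` is assumed to match the pip package name.

```python
import importlib.util


def is_package_available(module_name: str) -> bool:
    """Return True if a top-level module can be located by the current interpreter."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "ironxl" is assumed to be the importable module name
    if is_package_available("ironxl"):
        print("ironxl is installed.")
    else:
        print("ironxl was not found; run `pip install IronXL` first.")
```

The check only locates the module and does not import it, so it is safe to run even if the .NET prerequisite is not yet in place.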
Step 2: Load an Excel Workbook
The WorkBook class represents an Excel workbook. To open an Excel file, we use the WorkBook.Load method, specifying the path of the Excel file.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-1.py
# Import the IronXL classes (the module name `ironxl` is assumed from the pip package)
from ironxl import *

try:
    # Load an existing spreadsheet; WorkBook.Load takes the path to the Excel file
    workbook = WorkBook.Load("Spreadsheets\\GDP.xlsx")
    print("Spreadsheet loaded successfully.")
except Exception as ex:
    # Loading can fail if, for example, the file is missing or is not a valid spreadsheet
    print(f"An error occurred while loading the spreadsheet: {ex}")
Each WorkBook can have multiple WorkSheet objects. Each one represents a single Excel worksheet in the Excel document. Use the WorkBook.get_worksheet method to retrieve a reference to a specific Excel worksheet.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-2.py
# Assumes `workbook` is a WorkBook that has already been loaded.
# Retrieve a reference to the worksheet named "GDPByCountry".
# The .NET-style method name GetWorkSheet is an assumption; depending on the
# installed version, the method may be exposed as get_worksheet instead.
worksheet = workbook.GetWorkSheet("GDPByCountry")
Creating new Excel Documents
To create a new Excel document, construct a new WorkBook object with a valid file type.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-3.py
# Import the IronXL classes (the module name `ironxl` is assumed from the pip package)
from ironxl import *

# Create a new, empty WorkBook in the XLSX format.
# The ExcelFileFormat value selects the file type, for example XLSX or XLS.
workbook = WorkBook.Create(ExcelFileFormat.XLSX)
Note: Use ExcelFileFormat.XLS to support legacy versions of Microsoft Excel (95 and earlier).
Add a Worksheet to an Excel Document
As explained previously, an IronXL for Python WorkBook contains a collection of one or more WorkSheets.
To create a new worksheet, call workbook.create_worksheet with the name of the worksheet.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-4.py
# Assumes `workbook` is an existing WorkBook, for example one returned by WorkBook.Create.
# Add a new worksheet named "GDPByCountry" to the workbook.
# The .NET-style method name CreateWorkSheet is an assumption; depending on the
# installed version, the method may be exposed as create_worksheet instead.
worksheet = workbook.CreateWorkSheet("GDPByCountry")

# Example: label the first two columns with headers in cells A1 and B1
worksheet["A1"].Value = "Country"
worksheet["B1"].Value = "GDP"
Access Cell Values
Read and Edit a Single Cell
To access the value of an individual spreadsheet cell, retrieve the cell from its WorkSheet, as shown below:
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-5.py
# Import the IronXL classes (the module name `ironxl` is assumed from the pip package)
from ironxl import *

# Load an existing spreadsheet named "test.xlsx" into memory
workbook = WorkBook.Load("test.xlsx")

# Access the default worksheet of the workbook
worksheet = workbook.DefaultWorkSheet

# Retrieve the cell located at B1 using the string indexer
cell = worksheet["B1"]
IronXL for Python's Cell class represents an individual cell in an Excel spreadsheet. It contains properties and methods that enable users to access and modify the cell's value directly.
With a reference to a Cell object, we can read and write data to and from a spreadsheet cell.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-6.py
# Assumes `worksheet` is a WorkSheet obtained from a loaded WorkBook.
cell = worksheet["B1"]

# Read the current value of B1 as text.
# The property names StringValue and Value are assumed to mirror the .NET API.
initial_value = cell.StringValue
print(f"Initial value in B1: {initial_value}")

# Write a new numerical value to the cell
cell.Value = 10.3289

# Read the new value back as text
new_value = cell.StringValue
print(f"New value in B1: {new_value}")
Read and Write a Range of Cell Values
The Range class represents a two-dimensional collection of Cell objects. This collection refers to a literal range of Excel cells. Obtain ranges by using the string indexer on a WorkSheet object.
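To make the range notation concrete, the following plain-Python sketch expands an address such as "B2:C3" into the individual cell addresses it covers. It does not use IronXL. The helper names `column_to_index`, `index_to_column`, and `expand_range` are hypothetical and only illustrate how a rectangular range maps to cells.

```python
import re
from typing import List


def column_to_index(letters: str) -> int:
    """Convert column letters to a 1-based index ("A" -> 1, "AA" -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based column index back to letters (27 -> "AA")."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expand_range(address: str) -> List[List[str]]:
    """Expand an address like "B2:C3" into rows of single-cell addresses."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", address)
    if match is None:
        raise ValueError(f"Not a rectangular range address: {address!r}")
    first_col, first_row, last_col, last_row = match.groups()
    cols = range(column_to_index(first_col), column_to_index(last_col) + 1)
    rows = range(int(first_row), int(last_row) + 1)
    # One inner list per spreadsheet row, ordered left to right
    return [[f"{index_to_column(c)}{r}" for c in cols] for r in rows]


print(expand_range("B2:C3"))  # [['B2', 'C2'], ['B3', 'C3']]
```

A single-column address such as "D2:D101" expands to 100 rows of one cell each, which is why a range is described as two-dimensional even when it covers only one column.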
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-7.py
# Assumes `worksheet` is a WorkSheet obtained from a loaded WorkBook.
# Select the cells D2 through D101 using the string indexer.
cell_range = worksheet["D2:D101"]
Add Formula to a Spreadsheet
Set the formula of Cells with the formula property.
The code below iterates through each cell and sets a percentage total in column C.
:path=/static-assets/excel-python/content-code-examples/tutorials/how-to-read-excel-file-csharp-8.py
# Assumes `worksheet` is a WorkSheet obtained from a loaded WorkBook, and that
# `i` holds the number of the row containing the total in column B (with i > 2).

# Iterate from row 2 up to, but not including, the total row `i`
for y in range(2, i):
    # Access the cell in column C of the current row
    cell = worksheet[f"C{y}"]
    # Set the formula to this row's share of the total,
    # for example "=B2/B10" when y is 2 and i is 10
    cell.Formula = f"=B{y}/B{i}"
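To see which formulas the loop produces without opening a spreadsheet, the strings can be generated on their own. This is a plain-Python sketch that does not call IronXL. The function name `percentage_formulas` and its `total_row` parameter are hypothetical, with `total_row` playing the role of `i` in the loop above.

```python
from typing import Dict


def percentage_formulas(total_row: int) -> Dict[str, str]:
    """Map each cell in column C (rows 2 to total_row - 1) to its formula."""
    if total_row <= 2:
        raise ValueError("total_row must be greater than 2")
    return {f"C{y}": f"=B{y}/B{total_row}" for y in range(2, total_row)}


# With the total in row 5, rows 2 to 4 each receive a formula
for address, formula in percentage_formulas(5).items():
    print(address, formula)
# C2 =B2/B5
# C3 =B3/B5
# C4 =B4/B5
```

Every formula divides by the same total cell, so only the numerator row changes from one cell to the next.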
Summary
IronXL for Python is a library for reading a wide variety of spreadsheet formats. It does not require Microsoft Excel to be installed and is not dependent on Interop.
Frequently Asked Questions
What is IronXL for Python?
IronXL for Python is a comprehensive library for reading and editing Excel files in Python without needing Microsoft Excel installed. It supports various spreadsheet formats and is based on the IronXL .NET library.
How do I install IronXL for Python?
You can install IronXL for Python using pip, the Python package manager, by executing `pip install ironxl` in your terminal.
What are the prerequisites for using IronXL for Python?
IronXL for Python requires the .NET 6.0 SDK to be installed on your machine as it relies on the IronXL .NET library.
How do I load an Excel workbook using IronXL?
To load an Excel workbook, use the `WorkBook.Load` method and provide the path to the Excel file. For example: `workbook = ironxl.WorkBook.Load('path/to/workbook.xlsx')`.
Can I create a new Excel document with IronXL?
Yes, you can create a new Excel document by constructing a new `WorkBook` object and adding worksheets using `workbook.create_worksheet('SheetName')`.
How do I read and edit a cell's value in IronXL?
Access a cell by specifying its index, such as `cell = worksheet['A1']`. You can read its value with `cell.value` and edit it by assigning a new value, like `cell.value = 'New Value'`.
Is it possible to work with a range of cells in IronXL?
Yes, the `Range` class allows you to work with a two-dimensional collection of cells. Access a range using the string indexer like `range_of_cells = worksheet['B2:E5']` and iterate over them to perform operations.
How can I add formulas to cells using IronXL?
Set a formula for a cell using the `formula` property. For example, `cell.formula = '=A1+B1'` to add values in cells A1 and B1.
Does IronXL require Microsoft Excel to be installed?
No, IronXL does not require Microsoft Excel to be installed, nor does it depend on Microsoft Office Interop.