openpyxl — use read-only mode for speed

Mar. 31, 2018

Due to various vagaries of fate, I’ve been spending quite a few hours extracting, transforming and loading administrative data for my university.

I’ve been using the Python openpyxl package to import data from Excel files. The process was turning out to be horribly slow, using about a second to process a small file.

Turns out that openpyxl has a special read-only mode which is very much faster that the general interface. All you have to do is to pass an extra argument when opening a file, like

wb = openpyxl.load_workbook("myfile.xlsx",read_only=True)

This brought down the run time of my program from 20 minutes to 2 seconds.

The openpyxl people should put up a notice in huge letters on the front-page of their manual advertising this.