I have an Excel file, foo.xlsx, with about 40 sheets named sh1, sh2, etc. Each sheet has the format:
area cnt name\nparty1 name\nparty2
blah 9 5 5
word 3 7 5
In each sheet I want to rename the variables of the form name\nparty so that only the party remains as the label. Example output:
area cnt party1 party2 sheet
bacon 9 5 5 sh1
spam 3 7 5 sh1
eggs 2 18 4 sh2
I am reading in the file with:
book = pd.ExcelFile(path)
And then I am wondering if I need to do:
for f in filelist:
    df = pd.ExcelFile.parse(book, sheetname=??)
    'more operations here'
    # only change column names 2 and 3
    for i, col in enumerate(df):
        if i >= 2 and i <= 3:
            new_col_name = col.split("\n")[-1]
            df[new_col_name] =
Or something like that?
The read_excel method of pandas lets you read all sheets in at once if you set the keyword parameter sheet_name=None (in some older versions of pandas this was called sheetname). This returns a dictionary: the keys are the sheet names, and the values are the sheets as dataframes.
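To see the shape of what comes back, here is a minimal sketch that builds a tiny two-sheet workbook in memory and reads it with sheet_name=None. The sheet names and values are made up, and it assumes the openpyxl engine is installed; with a real file you would pass its path instead of the buffer.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory so the example needs no file on disk.
# The sheet names and values are made up for illustration.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"area": ["bacon", "spam"], "cnt": [9, 3]}).to_excel(
        writer, sheet_name="sh1", index=False
    )
    pd.DataFrame({"area": ["eggs"], "cnt": [2]}).to_excel(
        writer, sheet_name="sh2", index=False
    )
buffer.seek(0)

# sheet_name=None asks for every sheet; the result maps sheet name -> DataFrame.
sheets_dict = pd.read_excel(buffer, sheet_name=None)
for name, sheet in sheets_dict.items():
    print(name, sheet.shape)
```

Because the keys are the sheet names, there is no need to know them in advance or to loop over a separate list of names.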
Using this, we can simply loop through the dictionary and:
- Add an extra column to the dataframes containing the relevant sheetname
- Use the rename method to rename our columns. By using a lambda, we simply take the final entry of the list obtained by splitting each column name at every newline. If there is no newline, the column name is unchanged.
- Append to a list, to be combined at the end.
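The renaming step on its own can be sketched like this, using a made-up one-row frame with the question's column layout. The helper name last_line is hypothetical, and the str() call is an extra guard (not needed for the data in the question) in case a header is not a string.

```python
import pandas as pd

# Hypothetical sheet mirroring the question's layout; the data is made up.
sheet = pd.DataFrame(
    [["bacon", 9, 5, 5]],
    columns=["area", "cnt", "name\nparty1", "name\nparty2"],
)


def last_line(label):
    """Return the text after the final newline, or the label itself if there is none."""
    # str() guards against non-string headers such as integers.
    return str(label).split("\n")[-1]


# rename returns a new DataFrame; the callable is applied to every column label.
renamed = sheet.rename(columns=last_line)
print(list(renamed.columns))
```

Since the function is applied to every column, there is no need to pick out columns 2 and 3 by position: labels without a newline come back unchanged.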
Once this is done, we combine all the sheets into one with pd.concat. Then we reset the index and all should be well. Note: if you have parties present on one sheet but not others, this will still work, but any column missing from a sheet will be filled with NaN for that sheet's rows.
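The NaN behaviour can be checked with two small made-up frames where party2 appears only on the first sheet and party3 only on the second. This sketch passes ignore_index=True to pd.concat, which renumbers the rows in the same call and should give the same result as resetting the index afterwards.

```python
import pandas as pd

# Two hypothetical sheets with different party columns; the data is made up.
sh1 = pd.DataFrame({"area": ["bacon"], "party1": [5], "party2": [5], "sheet": ["sh1"]})
sh2 = pd.DataFrame({"area": ["eggs"], "party1": [18], "party3": [4], "sheet": ["sh2"]})

# concat takes the union of the columns and fills the gaps with NaN.
combined = pd.concat([sh1, sh2], ignore_index=True)
print(combined)
```

One side effect to be aware of: an integer column that picks up NaN values is stored as floats in the combined table.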
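import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)

all_sheets = []
for name, sheet in sheets_dict.items():
    # record which sheet each row came from
    sheet['sheet'] = name
    # keep only the text after the last newline in each column name
    sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    all_sheets.append(sheet)

# stack the sheets and renumber the rows from 0
full_table = pd.concat(all_sheets)
full_table.reset_index(inplace=True, drop=True)

print(full_table)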
Prints:
    area  cnt  party1  party2   sheet
0  bacon    9       5       5  Sheet1
1   spam    3       7       5  Sheet1
2   eggs    2      18       4  Sheet2