Answer a question

I have an excel file foo.xlsx with about 40 sheets sh1, sh2, etc. Each sheet has the format:

area      cnt   name\nparty1   name\nparty2
blah      9         5               5
word      3         7               5

In each sheet I want to rename the vars with the format name\nparty to only have the party as a label. Example output:

area      cnt    party1    party2     sheet
bacon     9         5         5        sh1
spam      3         7         5        sh1
eggs      2         18        4        sh2

I am reading in the file with:

book = pd.ExcelFile(path) 

And then wondering if I need to do:

for f in filelist:
    df = pd.ExcelFile.parse(book,sheetname=??)
    'more operations here'
    # only change column names 2 and 3
     i, col in enumerate(df):
     if i>=2 and i<=3:
        new_col_name = col.split("\n")[-1]
        df[new_col_name] =

Or something like that?

Answers

The read_excel method of pandas lets you read all sheets in at once if you set the keyword parameter sheet_name=None (in some older versions of pandas this was called sheetname). This returns a dictionary - the keys are the sheet names, and the values are the sheets as dataframes.

Using this, we can simply loop through the dictionary and:

  1. Add an extra column to the dataframes containing the relevant sheetname
  2. Use the rename method to rename our columns - by using a lambda, we simply take the final entry of the list obtained by splitting each column name any time there is a new line. If there is no new line, the column name is unchanged.
  3. Append to a list, to be combined at the end.

Once this is done, we combine all the sheets into one with pd.concat. Then we reset the index and all should be well. Note: if you have parties present on one sheet but not others, this will still work but will fill any missing columns for each sheet with NaN.

import pandas as pd

sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)

all_sheets = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    all_sheets.append(sheet)

full_table = pd.concat(all_sheets)
full_table.reset_index(inplace=True, drop=True)

print(full_table)

Prints:

    area  cnt  party1  party2   sheet
0  bacon    9       5       5  Sheet1
1   spam    3       7       5  Sheet1
2   eggs    2      18       4  Sheet2
Logo

学AI,认准AI Studio!GPU算力,限时免费领,邀请好友解锁更多惊喜福利 >>>

更多推荐