
I am trying to read an MS Excel 2016 file. The file contains several sheets with data. It was downloaded from a database and opens correctly in MS Office. In the example below I changed the file name.

EDIT: the file contains Russian and English words. Most probably it uses the Latin-1 encoding, but encoding='latin-1' does not help.

import pandas as pd
with open('1.xlsx', 'r', encoding='utf8') as f:
    data = pd.read_excel(f)

Result:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

Without encoding='utf8':

'charmap' codec can't decode byte 0x9d in position 622: character maps to <undefined>

P.S. The task is to process 52 files and merge the data in every sheet with the corresponding sheets in the other files. So, please, no advice that involves doing this by hand.
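A minimal sketch of that merge step, assuming every workbook uses the same sheet names and the same column headers within a sheet, and that an Excel engine such as openpyxl is installed. The helper names load_workbook_sheets and merge_by_sheet, and the "data/*.xlsx" folder, are hypothetical. It relies on read_excel(path, sheet_name=None), which returns a dict of {sheet name: DataFrame} for all sheets, and it passes the path itself rather than a file opened in text mode.

```python
import pandas as pd


def load_workbook_sheets(path):
    # sheet_name=None returns {sheet name: DataFrame} for every sheet.
    # The path is passed directly, not a text-mode file handle.
    return pd.read_excel(path, sheet_name=None)


def merge_by_sheet(workbooks):
    """Concatenate same-named sheets across workbooks.

    `workbooks` is an iterable of {sheet name: DataFrame} dicts; rows are
    stacked in the order the workbooks are given, columns aligned by header.
    """
    frames = {}
    for sheets in workbooks:
        for name, df in sheets.items():
            frames.setdefault(name, []).append(df)
    return {name: pd.concat(dfs, ignore_index=True) for name, dfs in frames.items()}


# Hypothetical usage; "data/*.xlsx" is a made-up location for the 52 files:
# import glob
# paths = sorted(glob.glob("data/*.xlsx"))
# merged = merge_by_sheet(load_workbook_sheets(p) for p in paths)
# for name, df in merged.items():
#     print(name, df.shape)
```

Keeping the loading and the merging in separate functions makes it easy to check the merge on small in-memory DataFrames before running it over all 52 files.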

Answers

Most probably the problem is in the Russian characters.

'charmap' is the codec behind the platform's default encoding (for example cp1252 on Windows), which is used when no encoding is specified.
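To check which default encoding a given machine uses, and to reproduce both error messages from the question, here is a small standard-library sketch. The helper names default_text_encoding and decode_error are made up for illustration, and locale.getpreferredencoding(False) is used as an approximation of what open() falls back to without an encoding= argument.

```python
import locale


def default_text_encoding():
    # Approximately what open() uses when no encoding= argument is given,
    # e.g. 'cp1252' on many Windows installations.
    return locale.getpreferredencoding(False)


def decode_error(data, encoding):
    """Return the UnicodeDecodeError raised when decoding `data`, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc
    return None


print(default_text_encoding())
# 0x9d has no mapping in cp1252, so its charmap-based codec rejects it.
print(decode_error(b"\x9d", "cp1252"))
# 0xa8 is a UTF-8 continuation byte, so it cannot start a character.
print(decode_error(b"\xa8", "utf-8"))
```

If the first line prints cp1252, the second message has the same form as the 'charmap' error in the question.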

If neither utf-8 nor latin-1 helps, try reading this file not with

pd.read_excel(f)

but

pd.read_table(f)

or even just

f.readline()

to find out which character raises the exception, and then delete that character (or characters).
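A sketch of that check, assuming the file is read in binary mode so that no implicit decoding happens; the helper names first_undecodable_byte and is_zip_container are hypothetical. UnicodeDecodeError.start gives the offset of the first rejected byte. An .xlsx workbook is a ZIP archive, so the second helper reports whether the raw bytes are such a container. If it returns True, the reported byte is likely part of the compressed data rather than a Russian letter, and pd.read_excel is normally given the path itself (or a file opened with 'rb') instead of a text-mode handle.

```python
import io
import zipfile


def first_undecodable_byte(raw, encoding="utf-8"):
    """Return (offset, byte) for the first byte `encoding` rejects, else None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec rejected
        return exc.start, raw[exc.start:exc.start + 1]
    return None


def is_zip_container(raw):
    # .xlsx workbooks are ZIP archives, i.e. binary data rather than text
    return zipfile.is_zipfile(io.BytesIO(raw))


# Hypothetical usage with the file from the question:
# with open("1.xlsx", "rb") as f:   # binary mode: bytes, no decoding
#     raw = f.read()
# print(is_zip_container(raw), first_undecodable_byte(raw))
```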
