I am scraping table data from google finance through pd.read_html and then saving that data to excel through df.to_excel() as seen below:
dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter(output.xlsx, engine='xlsxwriter')
for i, df in enumerate(dfs):
df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.save()
However, the numbers that are saved to excel are stored as text with the little green triangle in the corner of the cell. When moving over this data to excel, how do I store them as actual values and not text?
Consider converting numeric columns to floats since the pd.read_html reads web data as string types (i.e., objects). But before converting to floats, you need to replace hyphens to NaNs:
import pandas as pd
import numpy as np
dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
'&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
workbook = xlWriter.book
for i, df in enumerate(dfs):
for col in df.columns[1:]: # UPDATE ONLY NUMERIC COLS
df.loc[df[col] == '-', col] = np.nan # REPLACE HYPHEN WITH NaNs
df[col] = df[col].astype(float) # CONVERT TO FLOAT
df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
xlWriter.save()
所有评论(0)