Add new row to existing CSV using Pandas DataFrame
Answer a question
I am trying to add new rows to an existing CSV file. Each row is produced inside a for loop, appended to lists of strings, and stored in a DataFrame. I don't want to hold the results of the entire loop in memory and write the CSV only at the end. Because the loop is long-running, I would rather append the rows to the file as the loop iterates, so I don't have to wait for the whole loop to finish.
The loop runs, but the resulting file contains duplicated rows.
names = []
addresses = []
pages = np.arange(10300, 10400, 1)
for page in pages:
    page = requests.get(
        "https://www.testpage.com/" + str(page), headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find_all('main')
    for container in company:
        name = container.find("b", {"id": "company_name"})
        names.append(name.text.strip())
        address = container.find('div', attrs={'class': 'text location'})
        addresses.append(address.text.strip())
    companies = pd.DataFrame({
        'name': names,
        'address': addresses
    })
    companies.to_csv(r'b_10300_10400.csv', mode='a', header=False)
Any thoughts?
Answers
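You should reset the names and addresses lists on each pass through the outer loop. If they are created once before the loop, they keep growing across pages, so every to_csv call in append mode writes the rows from all previous pages again: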
pages = np.arange(10300, 10400, 1)
for page in pages:
    # Start with empty lists for every page so earlier rows are not re-written.
    names = []
    addresses = []
    page = requests.get(
        "https://www.testpage.com/" + str(page), headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    company = soup.find_all('main')
    for container in company:
        name = container.find("b", {"id": "company_name"})
        names.append(name.text.strip())
        address = container.find('div', attrs={'class': 'text location'})
        addresses.append(address.text.strip())
    # Build a DataFrame from this page only and append it to the CSV.
    companies = pd.DataFrame({
        'name': names,
        'address': addresses
    })
    companies.to_csv(r'b_10300_10400.csv', mode='a', header=False)
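If it helps, the per-page append can be pulled into a small helper. This is only a sketch under a few assumptions: the function name append_rows is made up for illustration, it writes the header row once (when the file is missing or empty), and it passes index=False so the per-page 0, 1, 2, ... index is not written as an extra column. The code above instead uses header=False and keeps the index, so adjust to whichever layout you want.

```python
import os

import pandas as pd


def append_rows(csv_path, names, addresses):
    """Append one page's rows to csv_path (hypothetical helper)."""
    chunk = pd.DataFrame({"name": names, "address": addresses})
    # Write the header only when the file does not exist yet or is empty,
    # so repeated calls do not scatter header lines through the data.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # index=False is an assumption here: it drops the per-chunk row index.
    chunk.to_csv(csv_path, mode="a", header=write_header, index=False)
```

Inside the loop you would call append_rows(r'b_10300_10400.csv', names, addresses) after parsing each page, still with names and addresses reset at the top of each iteration.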