Convert HTML to Pandas DataFrame

Mangs · Posted 2022-08-24 22:23:26

Answer a question

I have a BeautifulSoup object as follows:

<div class="companyProfileHeader">
<div>Industry<a href="/stock-screener/?sp=country::5|sector::a|industry::146|equityType::a&lt;eq_market_cap;1">Life Sciences Tools &amp; Services</a></div>
<div>Sector<a href="/stock-screener/?sp=country::5|sector::18|industry::a|equityType::a&lt;eq_market_cap;1">Healthcare</a></div>
<div>Employees<p class="bold">17000</p></div>
<div>Equity Type<p class="bold">ORD</p></div>
</div>

I want to convert the above into a Pandas DataFrame as follows:

Expected Output

+--------------------------------+------------+-----------+-------------+
|            Industry            |   Sector   | Employees | Equity Type |
+--------------------------------+------------+-----------+-------------+
| Life Sciences Tools & Services | Healthcare |     17000 | ORD         |
+--------------------------------+------------+-----------+-------------+

Suppose the BeautifulSoup object is named divlist. I've extracted its text using divlist.text, but I can't slice it appropriately to get the data frame above.

Answers

I have taken your data as html. You can navigate to the specific class with find and collect the inner divs with find_all. I used a list comprehension to get the text of each div, with the label and the value separated by a ~ symbol:

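from bs4 import BeautifulSoup
# html is assumed to hold the markup from the question as a string
soup = BeautifulSoup(html, "html.parser")
header = soup.find("div", class_="companyProfileHeader")
# join each inner div's text fragments with "~", e.g. "Sector~Healthcare"
lst = [div.get_text(strip=True, separator="~") for div in header.find_all("div")]
final_lst = [item.split("~") for item in lst]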
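For reference, here is a minimal self-contained sketch of the extraction step. The html string is an assumption: it is a shortened copy of the markup from the question, with the href values abbreviated to "#", standing in for whatever page source you actually have. The variable name pairs is also just illustrative.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the question's markup, hrefs replaced by "#".
html = """
<div class="companyProfileHeader">
<div>Industry<a href="#">Life Sciences Tools &amp; Services</a></div>
<div>Sector<a href="#">Healthcare</a></div>
<div>Employees<p class="bold">17000</p></div>
<div>Equity Type<p class="bold">ORD</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
header = soup.find("div", class_="companyProfileHeader")

# Each inner <div> yields "label~value"; splitting on "~" gives [label, value].
pairs = [div.get_text(strip=True, separator="~").split("~")
         for div in header.find_all("div")]
print(pairs)
```

Note that this relies on "~" not occurring in any label or value; if it might, a separator that cannot appear in the page text would be a safer choice.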

Now you can build the DataFrame from final_lst:

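import pandas as pd
df = pd.DataFrame(final_lst)          # one row per [label, value] pair
df = df.transpose()                   # row 0 holds the labels, row 1 the values
df = df.rename(columns=df.iloc[0])    # use the labels in row 0 as column names
df = df.drop(df.index[0])             # drop the label row, leaving only the values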
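As an alternative to transposing and renaming, the label/value pairs can be turned into a dict and passed to pandas as a single row. This is only a sketch, not part of the original answer; the final_lst literal below is an assumption that mirrors what the extraction step produces for the question's markup.

```python
import pandas as pd

# Assumption: label/value pairs shaped like final_lst from the answer.
final_lst = [
    ["Industry", "Life Sciences Tools & Services"],
    ["Sector", "Healthcare"],
    ["Employees", "17000"],
    ["Equity Type", "ORD"],
]

# dict() turns the [label, value] pairs into {label: value};
# wrapping the dict in a list makes pandas treat it as one row.
df = pd.DataFrame([dict(final_lst)])

# Values scraped from HTML are strings; convert Employees to a number.
df["Employees"] = pd.to_numeric(df["Employees"])
print(df)
```

This keeps the column order of the pairs and leaves the single row at index 0, whereas the transpose approach leaves it at index 1.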
