I'm trying to scrape an HTML table with bs4, but my code is not working. I'd like to get the data from the td cells of each row so that I can write it to a CSV file. This is my HTML:

<table class="sc-jAaTju bVEWLO">
    <thead>
        <tr>
            <td width="10%">Rank</td>
            <td>Trending Topic</td>
            <td width="30%">Tweet Volume</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>1</td>
            <td><a href="http:///example.com/search?q=%23One" target="_blank" without="true" rel="noopener noreferrer">#One</a></td>
            <td>1006.4K tweets</td>
        </tr>
        <tr>
            <td>2</td>
            <td><a href="http:///example.com/search?q=%23Two" target="_blank" without="true" rel="noopener noreferrer">#Two</a></td>
            <td>1028.7K tweets</td>
        </tr>
        <tr>
            <td>3</td>
            <td><a href="http:///example.com/search?q=%23Three" target="_blank" without="true" rel="noopener noreferrer">#Three</a></td>
            <td>Less than 10K tweets</td>
        </tr>
    </tbody>
</table>

This is my first try:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.exportdata.io/trends/italy/2020-01-01/0")
soup = BeautifulSoup(response.text, "html.parser")

table = soup.find_all("table", attrs={"class": "sc-jAaTju bVEWLO"})

And my second one:

tables = soup.find_all('table')

for table in tables:
    td = table.td.text.strip()

But neither works. What am I missing? Thank you.

Answers

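The page loads its data dynamically, so the table is not in the HTML that requests downloads. You need to find the request the page makes for the data (for example in the browser's network tab) and substitute the date and hour into it: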

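import requests
import pandas as pd

# JSON endpoint that the page calls; date and hour are query parameters
url = "https://api.exportdata.io/trends/locations/it?date=2020-01-01&hour=0"
response = requests.get(url)
# Missing tweet volumes become NaN; label them the way the site's table does
df = pd.DataFrame(response.json()).fillna('Less than 10K tweets')
print(df.to_string(columns=['name', 'tweet_volume']))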

OUTPUT:

                 name          tweet_volume
0      #lannocheverra  Less than 10K tweets
1      Happy New Year             4948992.0
2           Buon 2020               18359.0
3         #Mattarella               19304.0
4         #skamfrance  Less than 10K tweets
5        Mariah Carey               36853.0
6     #GliAristogatti  Less than 10K tweets
7       Orietta Berti  Less than 10K tweets
8      Gigi D'Alessio  Less than 10K tweets
9          Auguriiiii  Less than 10K tweets
10           #NewYear              163253.0
11       Welcome 2020              101403.0
12       Romina Power  Less than 10K tweets
13      Auguri Matteo  Less than 10K tweets
14            Al Bano  Less than 10K tweets
15      fabrizio moro  Less than 10K tweets
16          Panicucci  Less than 10K tweets
17        John Boyega               78097.0
18             Inizio  Less than 10K tweets
19      Auguri Silvia  Less than 10K tweets
20       Auguri Marco  Less than 10K tweets
21      #Ghostbusters  Less than 10K tweets
22  #thebluesbrothers  Less than 10K tweets
23   #FeliceAnnoNuovo  Less than 10K tweets
24  #bottidicapodanno  Less than 10K tweets
25        #ventiventi  Less than 10K tweets
26         #quirinale  Less than 10K tweets
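
Since the question asks for a CSV file, here is a minimal sketch of how the date and hour could be substituted into the request and the result written out. The helper names `build_url` and `write_trends_csv` are hypothetical, the endpoint and the `name` / `tweet_volume` fields are taken from the code above, and the Rank column assumes the API returns the trends already in rank order, which I have not verified.

```python
import csv
from urllib.parse import urlencode

# Endpoint copied from the answer above; it may have changed since.
API_URL = "https://api.exportdata.io/trends/locations/it"


def build_url(date, hour):
    """Build the request URL for a date (YYYY-MM-DD) and an hour of the day."""
    query = urlencode({"date": date, "hour": hour})
    return f"{API_URL}?{query}"


def write_trends_csv(records, path):
    """Write one CSV row per trend record (a dict with name and tweet_volume)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Rank", "Trending Topic", "Tweet Volume"])
        # Assumption: the records are already sorted by rank.
        for rank, record in enumerate(records, start=1):
            volume = record.get("tweet_volume")
            # Missing volumes get the same label that fillna() uses above.
            if volume is None:
                volume = "Less than 10K tweets"
            writer.writerow([rank, record["name"], volume])


# Usage (needs network access and the requests package):
#   import requests
#   records = requests.get(build_url("2020-01-01", 0), timeout=30).json()
#   write_trends_csv(records, "trends.csv")
```

With pandas, `df.to_csv("trends.csv", columns=["name", "tweet_volume"], index=False)` on the DataFrame from the answer should give a similar file without the Rank column.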