Python BeautifulSoup Loop through divs and multiple elements
I have a website containing film listings, and I've put together a simplified version of its HTML. Please note that on the real website the <ul> tags are not direct children of the film_listing or showtime elements; they are nested under several <div> or <ul> elements.
<li class="film_listing">
  <h3 class="film_title">James Bond</h3>
  <ul class="showtimes">
    <li class="showtime">
      <p class="start_time">15:00</p>
    </li>
    <li class="showtime">
      <p class="start_time">19:00</p>
      <ul class="attributes">
        <li class="audio_desc">
        </li>
        <li class="open_cap">
        </li>
      </ul>
    </li>
  </ul>
</li>
I have written a Python script to scrape the website. It currently lists every film title, but only with the first showtime and first attribute of each, and I am trying to list all showtimes. The final aim is to list only the films with open captions, together with the showtimes of those open-caption performances.
Here is the Python script. The nested for loop doesn't work: it prints all showtimes for all films, rather than the showtimes of each specific film. It is also not yet set up to list only captioned films. I suspect the logic is wrong and would appreciate any advice. Thanks!
for i in soup.findAll('li', {'class':'film_listing'}):
    film_title=i.find('h3', {'class':'film_title'}).text
    print(film_title)
    for j in soup.findAll('li', {'class':'showtime'}):
        print(j['showtime.text'])
        #For the time listings, find ones with Open Captioned
        i=filmlisting.find('li', {'class':'open_cap'})
        print(film_access)
Edit: small correction to the HTML snippet.
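The nested loop prints every showtime because it calls soup.findAll(...), which searches the whole document, instead of i.findAll(...), which searches only the current film listing. Separately, j['showtime.text'] looks up an HTML attribute literally named "showtime.text" (raising KeyError) rather than reading the element's text. A minimal sketch of a corrected loop, assuming the simplified markup above plus some hypothetical wrapper <div> elements (class names "wrapper" and "times" are made up) to mimic the deeper nesting of the real page. It lists every showtime per film and collects the open-caption ones:

```python
from bs4 import BeautifulSoup

# Simplified markup from the question, with hypothetical extra <div> wrappers
# to mimic the real page, where the <ul> elements are not direct children.
html = '''
<li class="film_listing">
  <div class="wrapper">
    <h3 class="film_title">James Bond</h3>
    <div class="times">
      <ul class="showtimes">
        <li class="showtime">
          <p class="start_time">15:00</p>
        </li>
        <li class="showtime">
          <p class="start_time">19:00</p>
          <ul class="attributes">
            <li class="audio_desc"></li>
            <li class="open_cap"></li>
          </ul>
        </li>
      </ul>
    </div>
  </div>
</li>
'''

soup = BeautifulSoup(html, 'html.parser')

captioned = []  # (film title, start time) pairs for open-caption showings
for film in soup.find_all('li', class_='film_listing'):
    title = film.find('h3', class_='film_title').get_text(strip=True)
    print(title)
    # Search inside *this* film listing (film.find_all), not the whole soup.
    # find_all() looks at all descendants, so the extra <div>s don't matter.
    for showtime in film.find_all('li', class_='showtime'):
        start = showtime.find('p', class_='start_time').get_text(strip=True)
        has_open_cap = showtime.find('li', class_='open_cap') is not None
        print('  ' + start + (' (open captions)' if has_open_cap else ''))
        if has_open_cap:
            captioned.append((title, start))

print(captioned)
```

Dropping the print calls and keeping only the captioned list gives the "open-caption films only" output the question is aiming for.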
Answers
There are many ways to extract this information. One way is to "search backwards": search for each <li> with class="open_cap", then find the preceding start time and film title:
from bs4 import BeautifulSoup
txt = '''
<li class="film_listing">
<h3 class="film_title">James Bond</h3>
<ul class="showtimes">
<li class="showtime">
<p class="start_time">15:00</p>
</li>
<li class="showtime">
<p class="start_time">19:00</p>
<ul class="attributes">
<li class="audio_desc">
</li>
<li class="open_cap">
</li>
</ul>
</li>
</ul>
</li>'''
soup = BeautifulSoup(txt, 'html.parser')
for open_cap in soup.select('.open_cap'):
    print('Name :', open_cap.find_previous(class_='film_title').text)
    print('Start time :', open_cap.find_previous(class_='start_time').text)
    print('-' * 80)
Prints:
Name : James Bond
Start time : 19:00
--------------------------------------------------------------------------------
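find_previous() works on document order: it returns the nearest matching element that appears before the open_cap element anywhere in the page, not necessarily one inside the same showtime. If the real markup placed the attributes list before the start time, it would report the previous showtime's time instead. A sketch of a more structure-aware variant walks up the tree with find_parent(); it assumes each open_cap <li> sits somewhere inside its li.showtime, which in turn sits inside li.film_listing, as in the simplified HTML:

```python
from bs4 import BeautifulSoup

html = '''
<li class="film_listing">
  <h3 class="film_title">James Bond</h3>
  <ul class="showtimes">
    <li class="showtime">
      <p class="start_time">15:00</p>
    </li>
    <li class="showtime">
      <p class="start_time">19:00</p>
      <ul class="attributes">
        <li class="audio_desc"></li>
        <li class="open_cap"></li>
      </ul>
    </li>
  </ul>
</li>
'''

soup = BeautifulSoup(html, 'html.parser')

results = []  # (film title, start time) for each open-caption showing
for open_cap in soup.select('li.open_cap'):
    # Walk up to the showtime and film listing that *contain* this caption,
    # instead of relying on whatever precedes it in document order.
    showtime = open_cap.find_parent('li', class_='showtime')
    film = open_cap.find_parent('li', class_='film_listing')
    if showtime is None or film is None:
        continue  # markup doesn't match the assumed structure; skip it
    title = film.select_one('.film_title').get_text(strip=True)
    start = showtime.select_one('.start_time').get_text(strip=True)
    results.append((title, start))

for title, start in results:
    print('Name :', title)
    print('Start time :', start)
    print('-' * 80)
```

Because the parent lookups depend only on containment, extra <div> or <ul> wrappers between these elements do not change the result.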