Answer a question

I have a website containing film listings, and I've put together a simplified version of its HTML. Please note that on the real site the <ul> tags are not direct children of the film_listing or showtime elements; they are nested under several <div> or <ul> elements.

<li class="film_listing">
       <h3 class="film_title">James Bond</h3>
       <ul class="showtimes">
              <li class="showtime">
                     <p class="start_time">15:00</p>
              </li>
              <li class="showtime">
                     <p class="start_time">19:00</p>
                     <ul class="attributes">
                            <li class="audio_desc">
                            </li>
                            <li class="open_cap">
                            </li>
                     </ul>
              </li>
       </ul>
</li>

I have created a Python script to scrape the website, which currently lists every film title with the first showtime and first attribute of each. However, I am trying to list all showtimes. The final aim is to list only the film titles that have open-caption performances, together with the showtimes of those performances.

Here is the Python script with a nested for loop. It doesn't work: it prints all showtimes for all films, rather than the showtimes for a specific film. It is also not yet set up to list only captioned films. I suspect the logic may be wrong and would appreciate any advice. Thanks!

for i in soup.findAll('li', {'class':'film_listing'}):
    film_title=i.find('h3', {'class':'film_title'}).text  
    print(film_title)
 
    for j in soup.findAll('li', {'class':'showtime'}):
            print(j['showtime.text'])   

    #For the time listings, find ones with Open Captioned
    i=filmlisting.find('li', {'class':'open_cap'})
    print(film_access)

Edit: small correction to the HTML.

Answers

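There are many ways to extract the information. One way is to "search backwards": search for each <li> with class="open_cap", then find the previous start time and film title. Because find_previous walks backwards through the document, it does not depend on how deeply the elements are nested: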

from bs4 import BeautifulSoup


txt = '''
<li class="film_listing">
       <h3 class="film_title">James Bond</h3>
       <ul class="showtimes">
              <li class="showtime">
                     <p class="start_time">15:00</p>
              </li>
              <li class="showtime">
                     <p class="start_time">19:00</p>
                     <ul class="attributes">
                            <li class="audio_desc">
                            </li>
                            <li class="open_cap">
                            </li>
                     </ul>
              </li>
       </ul>
</li>'''

soup = BeautifulSoup(txt, 'html.parser')


for open_cap in soup.select('.open_cap'):
    print('Name       :', open_cap.find_previous(class_='film_title').text)
    print('Start time :', open_cap.find_previous(class_='start_time').text)
    print('-' * 80)

Prints:

Name       : James Bond
Start time : 19:00
--------------------------------------------------------------------------------
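The nested loop from the question can also be made to work. It prints every showtime for every film most likely because the inner loop calls soup.findAll, which searches the whole document, instead of searching inside the current film listing. Below is a minimal sketch of that forward approach, not a definitive implementation. The function name open_caption_showtimes, the extra <div> wrapper, and the second film ("Hypothetical Film") are assumptions added for illustration and are not part of the original page.

```python
from bs4 import BeautifulSoup

# Sample markup: the first film mirrors the question's HTML (with an extra
# <div> to mimic deeper nesting); the second film is a made-up example
# with no open-caption showtime.
html = '''
<li class="film_listing">
  <h3 class="film_title">James Bond</h3>
  <div>
    <ul class="showtimes">
      <li class="showtime"><p class="start_time">15:00</p></li>
      <li class="showtime">
        <p class="start_time">19:00</p>
        <ul class="attributes">
          <li class="audio_desc"></li>
          <li class="open_cap"></li>
        </ul>
      </li>
    </ul>
  </div>
</li>
<li class="film_listing">
  <h3 class="film_title">Hypothetical Film</h3>
  <ul class="showtimes">
    <li class="showtime"><p class="start_time">20:30</p></li>
  </ul>
</li>'''


def open_caption_showtimes(markup):
    """Return (film title, start time) pairs for open-caption showtimes."""
    soup = BeautifulSoup(markup, 'html.parser')
    results = []
    for film in soup.find_all('li', class_='film_listing'):
        title = film.find('h3', class_='film_title').get_text(strip=True)
        # Search inside this film only: film.find_all, not soup.find_all.
        for showtime in film.find_all('li', class_='showtime'):
            # Keep the showtime only if it contains an open_cap attribute.
            if showtime.find('li', class_='open_cap') is not None:
                start = showtime.find('p', class_='start_time').get_text(strip=True)
                results.append((title, start))
    return results


for title, start in open_caption_showtimes(html):
    print(title, start)
```

With the sample markup above this prints only "James Bond 19:00". find and find_all search all descendants by default, so the intermediate <div> or <ul> wrappers on the real site should not matter.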