I'd like Beautiful Soup to match any tag from a list of tag names, as in the example below. I know attribute filters accept a regex, but is there anything in Beautiful Soup that lets you do the same for tag names?
soup.findAll("(a|div)")
Output:
<a> ASDFS
<div> asdfasdf
<a> asdfsdf
My goal is to create a scraper that can grab tables from sites. Tags are sometimes named inconsistently, so I'd like to be able to pass in a list of tag names that identify the 'data' part of a table.
find_all() is the most popular method in the Beautiful Soup search API.
You can pass it several kinds of filter. To match multiple tags, pass a list of tag names:
>>> soup.find_all(['a', 'div'])
Example:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div>asdfasdf</div><p><a>foo</a></p></body></html>', 'html.parser')
>>> soup.find_all(['a', 'div'])
[<div>asdfasdf</div>, <a>foo</a>]
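For the table-scraping goal in the question, the list filter can be applied per row. The following is only a minimal sketch: the sample markup, the helper name table_rows, and its cell_tags parameter are made up for illustration, and it assumes the cells are direct children of each tr.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the third row mixes <th> and <td> cells.
html = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><th>pear</th><td>5</td></tr>
</table>
"""

def table_rows(markup, cell_tags=("td", "th")):
    """Return the text of each row's cells (illustrative helper, not part of bs4)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # A list of names matches any of them; recursive=False keeps
        # the search to direct children of this row.
        cells = tr.find_all(list(cell_tags), recursive=False)
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows

rows = table_rows(html)
print(rows)
```

Passing a different cell_tags list (for example only "td") changes which tags count as data cells without touching the rest of the function.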
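Or you can use a regular expression. Beautiful Soup matches it against tag names with search(), so the pattern below finds every tag whose name contains a or div (table, for example); write it as "^(a|div)$" to match only those two tags: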
>>> import re
>>> soup.find_all(re.compile("(a|div)"))
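To see how much anchoring matters, here is a small comparison on made-up markup that includes a table tag. It is a sketch under the assumption that the built-in html.parser is used; the variable names loose and strict are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup containing a tag ("table") whose name includes the letter "a".
soup = BeautifulSoup(
    "<table><tr><td><a>x</a></td></tr></table><div>y</div>",
    "html.parser",
)

# Unanchored: any tag name that contains "a" or "div" somewhere.
loose = [tag.name for tag in soup.find_all(re.compile("(a|div)"))]

# Anchored: only tags named exactly "a" or "div".
strict = [tag.name for tag in soup.find_all(re.compile("^(a|div)$"))]

print(loose)   # ['table', 'a', 'div']
print(strict)  # ['a', 'div']
```

When the set of names is known in advance, the list form soup.find_all(['a', 'div']) gives the same result as the anchored pattern and is easier to read.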