Answer a question

I am stuck on a project that requires me to scrape a table from a website. The problem is that the page contains multiple tables, and none of them seem to have a class attribute when I inspect the elements. The table rows and columns, however, do have classes assigned to them.

The table I need to scrape is the one listing zip code, location, city, population, and average income.

I am new to web-scraping/BeautifulSoup and would appreciate any help I can get.

http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm

from bs4 import BeautifulSoup
import requests

income_url = "http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm"
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(income_url,headers = headers)
response.status_code

soup = BeautifulSoup(response.content,"html.parser")

Answers

If you're after <table> tags, pandas' read_html() is what you'll want to use. By default it parses with lxml and falls back to BeautifulSoup with html5lib, so the parsing work is done for you. It returns a list of DataFrames, one per table it finds. On this page, the table you are after is at index 11. From there it's just a matter of manipulating the DataFrame to get what you want.

import pandas as pd

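income_url = 'http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm'
# read_html() returns one DataFrame per <table> found on the page
dfs = pd.read_html(income_url)

df = dfs[11]
# Promote the first row to column headers, then drop it from the data
df.columns = df.iloc[0,:]
df = df.iloc[1:,:].reset_index(drop=True)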
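
A hard-coded index such as 11 breaks as soon as the page layout changes. As a possible alternative, read_html() accepts a match argument that keeps only tables containing the given text, and header=0 promotes the first row to column names. The sketch below runs on a small made-up HTML snippet (SAMPLE_HTML is an assumption standing in for the real page), so it does not need network access. On a real page with nested layout tables, more than one table may match, so check the length of the returned list.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page: one layout table plus the data table.
SAMPLE_HTML = """
<table><tr><td>Navigation</td><td>Links</td></tr></table>
<table rules="all" frame="box">
  <tr><td>#</td><td>Zip Code</td><td>Population</td></tr>
  <tr><td>1.</td><td>19113</td><td>136</td></tr>
  <tr><td>2.</td><td>19106</td><td>8359</td></tr>
</table>
"""

# match keeps only tables whose text matches the pattern;
# header=0 uses the first row as the column names.
dfs = pd.read_html(StringIO(SAMPLE_HTML), match="Zip Code", header=0)
df = dfs[0]
print(df)
```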

If you want to select that particular table by its tags/attributes, you are correct that it doesn't have a class attribute. But you aren't limited to searching by class. On this site, the table has the attributes rules="all" and frame="box", so you can use either of those:

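from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd
import requests

income_url = "http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm"
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(income_url, headers=headers)
response.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(response.content, "html.parser")
# Select the table by a non-class attribute
table = soup.find('table', {'rules': 'all'})

# Newer pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(table)))[0]
# Promote the first row to column headers, then drop it from the data
df.columns = df.iloc[0,:]
df = df.iloc[1:,:].reset_index(drop=True)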
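
If you would rather stay in BeautifulSoup and skip pandas, the same attribute lookup can feed a plain row-by-row extraction. This is only a minimal sketch: table_to_records and SAMPLE_HTML are hypothetical names, the snippet is a made-up stand-in for the real page, and it assumes the first row of the table holds the column names.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: one layout table plus the data table.
SAMPLE_HTML = """
<table><tr><td>Navigation</td><td>Links</td></tr></table>
<table rules="all" frame="box">
  <tr><td>#</td><td>Zip Code</td><td>Population</td></tr>
  <tr><td>1.</td><td>19113</td><td>136</td></tr>
  <tr><td>2.</td><td>19106</td><td>8359</td></tr>
</table>
"""


def table_to_records(html):
    """Return the rules="all" table as a list of dicts keyed by its first row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"rules": "all"})
    if table is None:
        return []
    # Collect the stripped text of every cell, one list per row.
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]
    rows = [row for row in rows if row]  # skip rows without cells
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


records = table_to_records(SAMPLE_HTML)
print(records)
```

All values come back as strings here, so any numeric conversion is left to the caller.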

Output:

print(df.to_string())
0     # Zip Code               Location                        City Population Avg. Income/H/hold National Rank
0    1.    19113  39.870679, -75.247782  Philadelphia, Pennsylvania        136         $70,833.00        #1,450
1    2.    19106  39.950472, -75.147231  Philadelphia, Pennsylvania       8359         $61,720.00        #2,617
2    3.    19118  40.072443, -75.212415  Philadelphia, Pennsylvania       9608         $60,179.00        #2,986
3    4.    19154  40.095521, -74.981818  Philadelphia, Pennsylvania      35606         $51,949.00        #5,075
4    5.    19119  40.052013, -75.192553  Philadelphia, Pennsylvania      28873         $46,520.00        #7,487
5    6.    19116  40.115569, -75.013276  Philadelphia, Pennsylvania      32560         $44,776.00        #8,502
6    7.    19128  40.049525, -75.230253  Philadelphia, Pennsylvania      36420         $43,629.00        #9,136
7    8.    19127  40.027929, -75.224083  Philadelphia, Pennsylvania       5465         $43,490.00        #9,226
8    9.    19150  40.072482, -75.171735  Philadelphia, Pennsylvania      25274         $42,342.00       #10,007
9   10.    19114  40.069361, -75.000264  Philadelphia, Pennsylvania      31083         $41,592.00       #10,652
10  11.    19115  40.092757, -75.042597  Philadelphia, Pennsylvania      31853         $39,075.00       #12,928
11  12.    19130  39.967905, -75.174735  Philadelphia, Pennsylvania      22874         $38,668.00       #13,307
12  13.    19111  40.063318, -75.077631  Philadelphia, Pennsylvania      58874         $37,996.00       #13,922
13  14.    19103  39.952795, -75.173949  Philadelphia, Pennsylvania      19714         $37,959.00       #13,970
14  15.    19152  40.061595, -75.046385  Philadelphia, Pennsylvania      31379         $37,760.00       #14,143
15  16.    19149  40.037448, -75.065561  Philadelphia, Pennsylvania      48483         $37,210.00       #14,713
16  17.    19153  39.894414, -75.232375  Philadelphia, Pennsylvania      12324         $36,872.00       #15,129
17  18.    19129  40.015462, -75.182928  Philadelphia, Pennsylvania      10748         $36,465.00       #15,573
18  19.    19136  40.040272, -75.020603  Philadelphia, Pennsylvania      40080         $35,650.00       #16,647
19  20.    19102  39.953423, -75.165384  Philadelphia, Pennsylvania       4396         $35,625.00       #16,711
20  21.    19126  40.056119, -75.136564  Philadelphia, Pennsylvania      16484         $34,607.00       #17,924
21  22.    19135  40.022732, -75.049612  Philadelphia, Pennsylvania      30881         $34,584.00       #17,946
22  23.    19147  39.936633, -75.153153  Philadelphia, Pennsylvania      32680         $34,431.00       #18,129
23  24.    19151  39.979740, -75.256726  Philadelphia, Pennsylvania      31255         $33,840.00       #18,751
24  25.    19138  40.056028, -75.159179  Philadelphia, Pennsylvania      34477         $32,248.00       #20,628
25  26.    19137  39.995604, -75.074623  Philadelphia, Pennsylvania       8069         $31,761.00       #21,311
26  27.    19120  40.034147, -75.119198  Philadelphia, Pennsylvania      68831         $31,588.00       #21,529
27  28.    19131  39.986772, -75.219521  Philadelphia, Pennsylvania      47044         $30,099.00       #23,512
28  29.    19141  40.037904, -75.145392  Philadelphia, Pennsylvania      34984         $28,861.00       #24,807
29  30.    19125  39.977245, -75.125222  Philadelphia, Pennsylvania      23646         $28,679.00       #24,999
30  31.    19124  40.017119, -75.092814  Philadelphia, Pennsylvania      63131         $28,574.00       #25,098
31  32.    19144  40.031929, -75.176099  Philadelphia, Pennsylvania      46794         $27,436.00       #26,201
32  33.    19148  39.913130, -75.155421  Philadelphia, Pennsylvania      48573         $27,097.00       #26,529
33  34.    19145  39.913431, -75.191556  Philadelphia, Pennsylvania      45647         $26,655.00       #26,971
34  35.    19142  39.921746, -75.233277  Philadelphia, Pennsylvania      29063         $25,973.00       #27,536
35  36.    19143  39.942892, -75.225460  Philadelphia, Pennsylvania      71169         $25,826.00       #27,668
36  37.    19146  39.939069, -75.182585  Philadelphia, Pennsylvania      35783         $24,803.00       #28,329
37  38.    19107  39.951623, -75.158637  Philadelphia, Pennsylvania      12340         $24,448.00       #28,539
38  39.    19139  39.961529, -75.230259  Philadelphia, Pennsylvania      43866         $21,329.00       #30,102
39  40.    19123  39.964212, -75.147103  Philadelphia, Pennsylvania       9818         $21,096.00       #30,195
40  41.    19134  39.992219, -75.107863  Philadelphia, Pennsylvania      57922         $20,903.00       #30,253
41  42.    19140  40.011789, -75.145282  Philadelphia, Pennsylvania      57125         $20,077.00       #30,509
42  43.    19132  39.996457, -75.170586  Philadelphia, Pennsylvania      41709         $18,777.00       #30,808
43  44.    19122  39.977688, -75.145885  Philadelphia, Pennsylvania      19589         $18,395.00       #30,887
44  45.    19104  39.960323, -75.197883  Philadelphia, Pennsylvania      50125         $16,151.00       #31,267
45  46.    19121  39.981980, -75.179120  Philadelphia, Pennsylvania      34935         $15,888.00       #31,300
46  47.    19133  39.993092, -75.141671  Philadelphia, Pennsylvania      27971         $13,828.00       #31,507
47  48.    19112  39.893156, -75.168944  Philadelphia, Pennsylvania         29              $0.00       #31,963
48  49.    19108  39.959626, -75.160879  Philadelphia, Pennsylvania          0              $0.00       #31,977
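
Because the header row was promoted by hand, every column above still holds strings such as "$70,833.00" and "#1,450". One way to get numeric columns is sketched below. The helper name clean_income_table is hypothetical, the two sample rows are copied from the output above, and the column names are assumed to match that output exactly.

```python
import pandas as pd

# Two rows copied from the output above, standing in for the scraped frame.
df = pd.DataFrame({
    "#": ["1.", "2."],
    "Zip Code": ["19113", "19106"],
    "Location": ["39.870679, -75.247782", "39.950472, -75.147231"],
    "City": ["Philadelphia, Pennsylvania", "Philadelphia, Pennsylvania"],
    "Population": ["136", "8359"],
    "Avg. Income/H/hold": ["$70,833.00", "$61,720.00"],
    "National Rank": ["#1,450", "#2,617"],
})


def clean_income_table(raw):
    """Return a copy with numeric population, income, rank, and coordinates."""
    out = raw.copy()
    out["Population"] = pd.to_numeric(out["Population"])
    # Strip the currency symbol and thousands separators before converting.
    income = out["Avg. Income/H/hold"].astype(str).str.replace(r"[$,]", "", regex=True)
    out["Avg. Income/H/hold"] = pd.to_numeric(income)
    # Strip the leading "#" and thousands separators from the rank.
    rank = out["National Rank"].astype(str).str.replace(r"[#,]", "", regex=True)
    out["National Rank"] = pd.to_numeric(rank)
    # Split "lat, lon" into two numeric columns.
    coords = out["Location"].astype(str).str.split(",", expand=True)
    out["Latitude"] = pd.to_numeric(coords[0].str.strip())
    out["Longitude"] = pd.to_numeric(coords[1].str.strip())
    # Zip codes are left as strings because they are identifiers, not quantities.
    return out.drop(columns=["#", "Location"])


clean = clean_income_table(df)
print(clean.dtypes)
```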