BeautifulSoup - Scrape Table Without A Class
Question
I am stuck on a project that requires me to scrape a table from a website. The problem is that there are multiple tables on the page and none of them seem to have classes when I inspect the elements. The table rows and columns, however, do have classes assigned to them.
The table I need to scrape is the one listing zip code, location, city, population, and average income.
I am new to web-scraping/BeautifulSoup and would appreciate any help I can get.
http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm
from bs4 import BeautifulSoup
import requests
income_url = "http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm"
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(income_url,headers = headers)
response.status_code
soup = BeautifulSoup(response.content,"html.parser")
Answers
If you're after <table> tags, pandas' read_html() is what you'll want to use (it parses the page with lxml or BeautifulSoup under the hood and does the work for you). It returns a list of DataFrames, one per table on the page. The table you are after is at index position 11. Then it's just a matter of manipulating the DataFrame to get what you want.
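import pandas as pd

income_url = 'http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm'
# read_html returns one DataFrame per <table> found on the page
dfs = pd.read_html(income_url)
df = dfs[11]  # the income table was at index 11 when this was written
df.columns = df.iloc[0, :]  # promote the first row to the header
df = df.iloc[1:, :].reset_index(drop=True)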
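A hard-coded index such as 11 breaks as soon as the page layout changes. As a hedged alternative, read_html() accepts a match argument that keeps only tables containing the given text, and header=0 promotes the first row to column names. The sketch below runs against a small made-up HTML string (the html variable is an assumption standing in for the real page, which would be passed as StringIO(response.text)):

```python
from io import StringIO

import pandas as pd

# Hypothetical, simplified stand-in for the page: two tables without classes.
html = """
<table><tr><td>navigation</td></tr></table>
<table rules="all" frame="box">
  <tr><td>#</td><td>Zip Code</td><td>Population</td></tr>
  <tr><td>1.</td><td>19113</td><td>136</td></tr>
  <tr><td>2.</td><td>19106</td><td>8359</td></tr>
</table>
"""

# Keep only tables whose text matches "Zip Code"; use the first row as header.
tables = pd.read_html(StringIO(html), match="Zip Code", header=0)
df = tables[0]
print(df)
```

Matching on a header label keeps the selection tied to the content you care about rather than to the table's position on the page.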
If you want to select that particular table by its tags/attributes, you are correct that it doesn't have a class attribute. But find() isn't limited to class. On this site, the table has the attributes rules="all" and frame="box", so you can match on either of those:
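from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

income_url = "http://zipatlas.com/us/pa/philadelphia/zip-code-comparison/median-household-income.htm"
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(income_url, headers=headers)
response.raise_for_status()  # stop early on a non-2xx response

soup = BeautifulSoup(response.content, "html.parser")
# The table has no class, so match on its rules="all" attribute instead
table = soup.find("table", {"rules": "all"})
# Wrap the markup in StringIO: recent pandas versions deprecate literal HTML strings
df = pd.read_html(StringIO(str(table)))[0]
df.columns = df.iloc[0, :]  # promote the first row to the header
df = df.iloc[1:, :].reset_index(drop=True)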
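If you would rather not involve pandas at all, the attribute lookup can be combined with a plain loop over the rows. This is a minimal sketch against a made-up HTML string (the html variable and its contents are assumptions, not the real page); it also shows the same lookup written as a CSS attribute selector:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page: two tables without classes.
html = """
<table><tr><td>navigation</td></tr></table>
<table rules="all" frame="box">
  <tr><td>#</td><td>Zip Code</td><td>Population</td></tr>
  <tr><td>1.</td><td>19113</td><td>136</td></tr>
  <tr><td>2.</td><td>19106</td><td>8359</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() can match on any attribute, not just class.
table = soup.find("table", attrs={"rules": "all"})

# The same lookup expressed as a CSS attribute selector.
same_table = soup.select_one('table[rules="all"]')

# Collect the text of every cell, one list per table row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all("td")]
    for tr in table.find_all("tr")
]
header, data = rows[0], rows[1:]
print(header)
print(data)
```

Every value comes back as a string here, so numeric columns still need converting afterwards.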
Output:
print (df.to_string())
0 # Zip Code Location City Population Avg. Income/H/hold National Rank
0 1. 19113 39.870679, -75.247782 Philadelphia, Pennsylvania 136 $70,833.00 #1,450
1 2. 19106 39.950472, -75.147231 Philadelphia, Pennsylvania 8359 $61,720.00 #2,617
2 3. 19118 40.072443, -75.212415 Philadelphia, Pennsylvania 9608 $60,179.00 #2,986
3 4. 19154 40.095521, -74.981818 Philadelphia, Pennsylvania 35606 $51,949.00 #5,075
4 5. 19119 40.052013, -75.192553 Philadelphia, Pennsylvania 28873 $46,520.00 #7,487
5 6. 19116 40.115569, -75.013276 Philadelphia, Pennsylvania 32560 $44,776.00 #8,502
6 7. 19128 40.049525, -75.230253 Philadelphia, Pennsylvania 36420 $43,629.00 #9,136
7 8. 19127 40.027929, -75.224083 Philadelphia, Pennsylvania 5465 $43,490.00 #9,226
8 9. 19150 40.072482, -75.171735 Philadelphia, Pennsylvania 25274 $42,342.00 #10,007
9 10. 19114 40.069361, -75.000264 Philadelphia, Pennsylvania 31083 $41,592.00 #10,652
10 11. 19115 40.092757, -75.042597 Philadelphia, Pennsylvania 31853 $39,075.00 #12,928
11 12. 19130 39.967905, -75.174735 Philadelphia, Pennsylvania 22874 $38,668.00 #13,307
12 13. 19111 40.063318, -75.077631 Philadelphia, Pennsylvania 58874 $37,996.00 #13,922
13 14. 19103 39.952795, -75.173949 Philadelphia, Pennsylvania 19714 $37,959.00 #13,970
14 15. 19152 40.061595, -75.046385 Philadelphia, Pennsylvania 31379 $37,760.00 #14,143
15 16. 19149 40.037448, -75.065561 Philadelphia, Pennsylvania 48483 $37,210.00 #14,713
16 17. 19153 39.894414, -75.232375 Philadelphia, Pennsylvania 12324 $36,872.00 #15,129
17 18. 19129 40.015462, -75.182928 Philadelphia, Pennsylvania 10748 $36,465.00 #15,573
18 19. 19136 40.040272, -75.020603 Philadelphia, Pennsylvania 40080 $35,650.00 #16,647
19 20. 19102 39.953423, -75.165384 Philadelphia, Pennsylvania 4396 $35,625.00 #16,711
20 21. 19126 40.056119, -75.136564 Philadelphia, Pennsylvania 16484 $34,607.00 #17,924
21 22. 19135 40.022732, -75.049612 Philadelphia, Pennsylvania 30881 $34,584.00 #17,946
22 23. 19147 39.936633, -75.153153 Philadelphia, Pennsylvania 32680 $34,431.00 #18,129
23 24. 19151 39.979740, -75.256726 Philadelphia, Pennsylvania 31255 $33,840.00 #18,751
24 25. 19138 40.056028, -75.159179 Philadelphia, Pennsylvania 34477 $32,248.00 #20,628
25 26. 19137 39.995604, -75.074623 Philadelphia, Pennsylvania 8069 $31,761.00 #21,311
26 27. 19120 40.034147, -75.119198 Philadelphia, Pennsylvania 68831 $31,588.00 #21,529
27 28. 19131 39.986772, -75.219521 Philadelphia, Pennsylvania 47044 $30,099.00 #23,512
28 29. 19141 40.037904, -75.145392 Philadelphia, Pennsylvania 34984 $28,861.00 #24,807
29 30. 19125 39.977245, -75.125222 Philadelphia, Pennsylvania 23646 $28,679.00 #24,999
30 31. 19124 40.017119, -75.092814 Philadelphia, Pennsylvania 63131 $28,574.00 #25,098
31 32. 19144 40.031929, -75.176099 Philadelphia, Pennsylvania 46794 $27,436.00 #26,201
32 33. 19148 39.913130, -75.155421 Philadelphia, Pennsylvania 48573 $27,097.00 #26,529
33 34. 19145 39.913431, -75.191556 Philadelphia, Pennsylvania 45647 $26,655.00 #26,971
34 35. 19142 39.921746, -75.233277 Philadelphia, Pennsylvania 29063 $25,973.00 #27,536
35 36. 19143 39.942892, -75.225460 Philadelphia, Pennsylvania 71169 $25,826.00 #27,668
36 37. 19146 39.939069, -75.182585 Philadelphia, Pennsylvania 35783 $24,803.00 #28,329
37 38. 19107 39.951623, -75.158637 Philadelphia, Pennsylvania 12340 $24,448.00 #28,539
38 39. 19139 39.961529, -75.230259 Philadelphia, Pennsylvania 43866 $21,329.00 #30,102
39 40. 19123 39.964212, -75.147103 Philadelphia, Pennsylvania 9818 $21,096.00 #30,195
40 41. 19134 39.992219, -75.107863 Philadelphia, Pennsylvania 57922 $20,903.00 #30,253
41 42. 19140 40.011789, -75.145282 Philadelphia, Pennsylvania 57125 $20,077.00 #30,509
42 43. 19132 39.996457, -75.170586 Philadelphia, Pennsylvania 41709 $18,777.00 #30,808
43 44. 19122 39.977688, -75.145885 Philadelphia, Pennsylvania 19589 $18,395.00 #30,887
44 45. 19104 39.960323, -75.197883 Philadelphia, Pennsylvania 50125 $16,151.00 #31,267
45 46. 19121 39.981980, -75.179120 Philadelphia, Pennsylvania 34935 $15,888.00 #31,300
46 47. 19133 39.993092, -75.141671 Philadelphia, Pennsylvania 27971 $13,828.00 #31,507
47 48. 19112 39.893156, -75.168944 Philadelphia, Pennsylvania 29 $0.00 #31,963
48 49. 19108 39.959626, -75.160879 Philadelphia, Pennsylvania 0 $0.00 #31,977
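The scraped columns above are all strings, so "manipulating the dataframe" will usually mean stripping the $, # and comma characters before doing any arithmetic. Here is a minimal sketch of that clean-up, using two rows copied from the output above in a hand-built DataFrame (an assumption made so the example runs without network access):

```python
import pandas as pd

# Two rows copied from the output above; values are strings, as scraped.
df = pd.DataFrame(
    {
        "Zip Code": ["19113", "19106"],
        "Population": ["136", "8359"],
        "Avg. Income/H/hold": ["$70,833.00", "$61,720.00"],
        "National Rank": ["#1,450", "#2,617"],
    }
)

# Drop the currency symbol and thousands separators, then convert to float.
df["Avg. Income/H/hold"] = (
    df["Avg. Income/H/hold"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Drop the leading "#" and thousands separators, then convert to int.
df["National Rank"] = (
    df["National Rank"].str.replace(r"[#,]", "", regex=True).astype(int)
)

df["Population"] = df["Population"].astype(int)
print(df.dtypes)
```

Zip codes are left as strings on purpose, since they are identifiers rather than quantities.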