BeautifulSoup not scraping all HREF links
This is a two-part question. First, when I run the code without the final if statement, I don't get all of the HREF links. I can see many more links in the browser's Inspector that never show up in my results.
I'm looking for a fix, but I'd also like to understand the general principle: is there a reason why some links would be scraped and others would not?
Second, I want to pull only the HREFs that contain "Surf-Report". I've used this code with p.startswith() and it works, but I couldn't find the equivalent call for "contains".
I'm new to all of this. I've been searching, but I don't fully understand either issue.
import requests
from bs4 import BeautifulSoup

profiles = []
urls = [
    'https://magicseaweed.com/New-Jersey-Monmouth-County-Surfing/277/',
    'https://magicseaweed.com/New-Jersey-Ocean-County-Surfing/278/'
]
for url in urls:
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    for profile in soup.find_all('a'):
        profile = profile.get('href')
        profiles.append(profile)
# print(profiles)
for p in profiles:
    if p.contains('Surf-Report'):
        print(p)
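On the "contains" part: Python strings have no .contains() method, so p.contains('Surf-Report') raises AttributeError. A substring test uses the in operator instead ('Surf-Report' in p). Also, find_all('a') can return <a> tags that have no href, and .get('href') gives None for those, so the filter needs to skip them or the in test fails with a TypeError. A minimal sketch; the helper name links_containing is illustrative, and apart from the Newquay report path (taken from the API output in the answer) the sample hrefs are made up:

```python
def links_containing(hrefs, needle="Surf-Report"):
    """Keep hrefs that contain `needle`, skipping None (an <a> with no href)."""
    return [h for h in hrefs if h is not None and needle in h]


# Sample input: one real-looking report path plus made-up non-matching hrefs.
hrefs = ["/Newquay-Fistral-North-Surf-Report/1/", None, "/about/", "#top"]
print(links_containing(hrefs))  # ['/Newquay-Fistral-North-Surf-Report/1/']
```

BeautifulSoup can also do the filtering during the search, for example soup.find_all('a', href=lambda h: h and 'Surf-Report' in h), or with a CSS attribute selector: soup.select('a[href*="Surf-Report"]').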
For context, my overall goal is to go to these county pages and collect all of the links (HREFs) on them. Once I have those, I want to visit each individual link and pull the wave sizes from it.
I'm looking to build a way to monitor all the waves in New Jersey daily. There's no real purpose; it's just a fun practice project on something I find interesting.
Answers
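The URLs on that page appear to be loaded dynamically, via one (or more) XHR calls made by JavaScript after the page loads. requests only downloads the initial HTML and does not run JavaScript, so BeautifulSoup never sees links that are added later; that is why some of the links you see in the Inspector are missing. Looking briefly at the Network tab in the page's Dev Tools, I noticed a call to an API (I stripped its query variables). Scraping that API returns over 8k results: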
import requests
import pandas as pd

# limit=-1 appears to ask the API for every spot in one response
r = requests.get('https://magicseaweed.com/api/mdkey/spot?&limit=-1')
df = pd.DataFrame(r.json())  # one row per surf spot
print(df)
Result (first 10 rows shown):
| | _id | _obj | _path | name | description | lat | lon | dataLat | dataLon | surfAreaId | dataSpotId | url | multiplier | optimumSwellAngle | optimumWindAngle | timezone | offset | modelName | isBigWave | ratingType | timeZoneAbbr | hasAdvancedForecast | proteusDataId | proteusResolution | surflineSpotId | defaultModelId | topLevelNav | tidalPort | isDataSpot | favouriteCount | mapImageUrl | breakingWaveModelId | weatherModel | added | hidden | edited | pointOfInterestId | useSDS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Spot | Spot | Newquay - Fistral North | | 50.4184 | -5.0997 | 50.42 | -5.08 | 7 | nan | /Newquay-Fistral-North-Surf-Report/1/ | 0.7 | 290 | 110 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | UK_4m | 584204214e65fad6a7709cec | 42 | True | | True | 0 | https://chart-1.msw.ms/maps/spot/2576f3cfb35dba07a84590141d54d3a5.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | c10396fc-ed41-4771-8e8e-ab8dbff5c67c | True |
| 1 | 2 | Spot | Spot | Porthtowan | | 50.2891 | -5.2461 | 50.27 | -5.3 | 6 | nan | /Porthtowan-Surf-Report/2/ | 0.8 | 290 | 110 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c98 | 38 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/d278b42dc4a8adc983a24e2c04333665.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 39bca112-f093-4a7b-90eb-b7993920e5c4 | True |
| 2 | 3 | Spot | Spot | Gwithian | | 50.2235 | -5.399 | 50.2 | -5.5 | 6 | nan | /Gwithian-Surf-Report/3/ | 0.5 | 285 | 105 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c95 | 38 | True | Perranporth | True | 0 | https://chart-5.msw.ms/maps/spot/2a4608d0e793ee20f4566ca85f5ba6cd.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 6b0785be-1efb-413d-a5a9-ba2133c6ef68 | True |
| 3 | 4 | Spot | Spot | Sennen | | 50.0802 | -5.6976 | 50.07 | -5.7 | 6 | nan | /Sennen-Surf-Report/4/ | 0.8 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c97 | 38 | True | | True | 0 | https://chart-4.msw.ms/maps/spot/c1be3fe6871d15e4ea5297193b8b81da.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | a641e633-8692-4d4b-b2d6-c4e1d4132c9b | True |
| 4 | 5 | Spot | Spot | Constantine | | 50.5333 | -5.0221 | 50.5759 | -4.92239 | 8 | nan | /Constantine-Surf-Report/5/ | 1 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 584204204e65fad6a77090b3 | 38 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/47b00f609d5e46cda66040d8b811bae6.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 1daacdd5-a92a-4f7c-bc7c-af30a392ef7d | True |
| 5 | 6 | Spot | Spot | Bude - Crooklets | | 50.8358 | -4.5548 | 50.8336 | -4.56057 | 8 | nan | /Bude-Crooklets-Surf-Report/6/ | 1 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708ca5 | 38 | True | | True | 0 | https://chart-1.msw.ms/maps/spot/553d3a850372eee8b10d13d23cbdb78e.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 6cb522d3-a781-45ae-83cd-fcc941fd47cb | True |
| 6 | 7 | Spot | Spot | Croyde Beach | | 51.1302 | -4.2435 | 51.1449 | -4.25995 | 9 | nan | /Croyde-Beach-Surf-Report/7/ | 0.8 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708ca4 | 38 | True | Ilfracombe, England | True | 0 | https://chart-3.msw.ms/maps/spot/0f967e1e6130e9cb1b2623aafe966b58.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 2dca4454-5789-4be3-808e-f512fef45dc3 | True |
| 7 | 8 | Spot | Spot | Praa Sands | | 50.103 | -5.391 | 50 | -3.87 | 5 | nan | /Praa-Sands-Surf-Report/8/ | 0.8 | 210 | 30 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c9a | 38 | True | | True | 0 | https://chart-4.msw.ms/maps/spot/aea8da3ce8bd22228c07c79db8e9b8de.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | a166dde9-2d1c-4a55-bb34-5a6efce93986 | True |
| 8 | 9 | Spot | Spot | Whitsand Bay | | 50.3387 | -4.2434 | 50.3334 | -4.2433 | 5 | nan | /Whitsand-Bay-Surf-Report/9/ | 0.7 | 225 | 45 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | UK_4m | 584204204e65fad6a77090c5 | 42 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/1fe1f342742ba3cf7dd3f8d9943948cc.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | c6b42c46-7db3-4e53-8f38-b28db957b4e7 | True |
| 9 | 10 | Spot | Spot | Bantham | | 50.2787 | -3.8885 | 50 | -3.87 | 5 | nan | /Bantham-Surf-Report/10/ | 0.8 | 230 | 65 | Europe/London | 3600 | glo_30m | False | directional | BST | True | 2 | UK_4m | 584204204e65fad6a77090c9 | 42 | True | River Yealm | True | 0 | https://chart-1.msw.ms/maps/spot/358c02090c0c31888fee4794b39d397c.png | nan | gfs.0p25 | -62169984000 | False | 1646829186 | d3566d34-b58d-4803-8cf2-3e3dc5fc1a48 | True |
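As for why some links come through and others don't: requests returns only the HTML the server sends, and neither requests nor BeautifulSoup executes JavaScript. A link that a script adds later exists in the browser's live DOM (which is what the Inspector shows) but not in req.text. The toy page below is made up to illustrate this. It uses Python's built-in html.parser, the same parser behind BeautifulSoup(..., 'html.parser'); for this page, soup.find_all('a') would likewise find only the first link:

```python
from html.parser import HTMLParser

# A made-up page: one link is in the HTML the server sends; the other is only
# created by JavaScript, which requests/BeautifulSoup never run.
PAGE = """
<html><body>
  <a href="/Static-Surf-Report/1/">In the HTML</a>
  <div id="spots"></div>
  <script>
    var link = document.createElement('a');
    link.href = '/Dynamic-Surf-Report/2/';
    document.getElementById('spots').appendChild(link);
  </script>
</body></html>
"""


class HrefCollector(HTMLParser):
    """Record the href of every <a> start tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; missing href -> None
            self.hrefs.append(dict(attrs).get("href"))


def static_hrefs(html):
    """Return the hrefs a non-JavaScript parser can see in `html`."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


print(static_hrefs(PAGE))  # ['/Static-Surf-Report/1/']
```

A quick check on the real page is to search req.text for a path you can see in the Inspector. If it isn't there, JavaScript is adding it, and the Network tab is the place to look for the request that supplies it.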
Is this what you're after?
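If it is, the next step toward a New Jersey monitor is narrowing those 8k spots down. One possible approach, which I have not verified: the surfAreaId column may correspond to the number at the end of the county page URLs in the question (277 and 278). The sketch below assumes r.json() is a list of spot dicts with surfAreaId and url keys, matching the columns above; the helper name report_urls_for_areas and NJ_AREA_IDS are illustrative. It also turns the relative url values into absolute links with urljoin so each report page can be requested in turn:

```python
from urllib.parse import urljoin

BASE = "https://magicseaweed.com"

# ASSUMPTION (unverified): surfAreaId equals the trailing number of the county
# pages .../New-Jersey-Monmouth-County-Surfing/277/ and .../278/.
NJ_AREA_IDS = {277, 278}


def report_urls_for_areas(spots, area_ids=NJ_AREA_IDS, base=BASE):
    """Return absolute report URLs for spots whose surfAreaId is in area_ids.

    `spots` is assumed to be the list of dicts returned by r.json();
    spots without a url are skipped.
    """
    return [
        urljoin(base, spot["url"])
        for spot in spots
        if spot.get("surfAreaId") in area_ids and spot.get("url")
    ]


# Tiny made-up sample in the same shape as the API rows.
sample = [
    {"surfAreaId": 7, "url": "/Newquay-Fistral-North-Surf-Report/1/"},
    {"surfAreaId": 277, "url": "/Example-NJ-Surf-Report/999/"},
]
print(report_urls_for_areas(sample))
# ['https://magicseaweed.com/Example-NJ-Surf-Report/999/']
```

With the DataFrame from the answer, the equivalent filter is df[df['surfAreaId'].isin([277, 278])]['url'], again assuming those ids really are the New Jersey areas.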