I have been running my webSpider locally without a problem, but when I uploaded my code to PythonAnywhere I started getting errors. I am not going to copy the full source here, but here is a prototype version of it that returns the same error.

My code runs fine locally and also on Google Cloud Shell.

import requests
from bs4 import BeautifulSoup

# Indeed search: "Developer" jobs near Waterbury, CT, posted in the last day
url = "https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Container that holds the job cards on the results page
result = soup.find(id="mosaic-provider-jobcards")
job_elements = result.find_all("div", class_="job_seen_beacon")
print(job_elements)
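A minimal sketch of a more defensive version of the lookup: instead of letting a missing container surface as `'NoneType' object has no attribute 'find_all'`, it checks the result of `soup.find()` and raises an error that includes the page title, which usually reveals what the server actually sent. The names `extract_job_cards` and `PageStructureError` are hypothetical, and the container id and card class are simply copied from the script above; Indeed may change them at any time.

```python
from bs4 import BeautifulSoup


class PageStructureError(Exception):
    """Hypothetical error: the fetched HTML lacks the expected job-card container."""


def extract_job_cards(html):
    """Return the job-card <div>s from a search results page (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    # id and class are taken from the original script, not from any Indeed documentation
    result = soup.find(id="mosaic-provider-jobcards")
    if result is None:
        # soup.title is None when the page has no <title> element
        title = soup.title.get_text(strip=True) if soup.title else "(no <title>)"
        raise PageStructureError(
            f"job-card container not found; page title was {title!r}"
        )
    return result.find_all("div", class_="job_seen_beacon")
```

In the script above this would replace the last three lines with something like `job_elements = extract_job_cards(page.content)`, so a blocked or changed page fails with a readable message.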

Here is the error:

(jobSpider) 14:59 ~/jobSpider $ python testSoup.py
Traceback (most recent call last):
  File "/home/moodkiller2022/jobSpider/testSoup.py", line 12, in <module>
    job_elements = result.find_all("div", class_="job_seen_beacon")
AttributeError: 'NoneType' object has no attribute 'find_all'

It looks like soup.find(id="mosaic-provider-jobcards") is returning None. Perhaps you can print out the page that you're getting for soup and find out what it contains? It's possible that the site you're accessing is blocking requests from PythonAnywhere -- sites often don't like being scraped from cloud computing platforms, and it's entirely possible that they've blocked us but haven't blocked the IP of the Google Cloud shell that you're using yet.
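As a sketch of "print out the page" that stays readable, the helper below summarizes a response as status code, final URL, `<title>` and the start of the visible text, which is often enough to tell a real results page from a block or captcha page. `summarize_page` and its `preview_chars` parameter are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup


def summarize_page(status_code, url, html, preview_chars=300):
    """Build a short, printable summary of a fetched page (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else "(no <title>)"
    # Join text nodes with spaces, then collapse runs of whitespace
    text = " ".join(soup.get_text(" ").split())
    return "\n".join([
        f"status: {status_code}",
        f"url:    {url}",
        f"title:  {title}",
        f"text:   {text[:preview_chars]}",
    ])
```

With `requests`, it would be called as `print(summarize_page(page.status_code, page.url, page.content))`.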

Thank you for getting back to me. So there isn't any way around that?

So I tried printing the soup object to see what it contains, and I noticed this:

import requests
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Parsing disabled for now; print the raw page to see what the server returned
#result = soup.find(id="mosaic-provider-jobcards")
#job_elements = result.find_all("div", class_="job_seen_beacon")
print(soup)

Output:

(jobSpider) 15:45 ~/jobSpider $ python testSoup.py
<title>hCaptcha solve page</title>
<script async="" defer="" src="https://www.hcaptcha.com/1/api.js"></script>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
</head>
<form action="/jobs?q=Developer&amp;l=waterbury,%20CT&amp;fromage=1&amp;redirected=1" method="POST" style="margin: 80px;">
<div class="h-captcha" data-sitekey="eb27f525-f936-43b4-91e2-95a426d4a8bd" data-size="compact"></div>
<input type="submit" value="Submit"/>
</form>
</body>
</html>

It looks like I got this captcha page. The same soup object used to return the actual page HTML, but now it looks like I can't even print the real page anymore.
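A minimal heuristic sketch for recognizing this situation in code, so the script can stop with a clear message instead of failing later on a missing element. It only detects the challenge page and does not try to get past it. The markers it checks are the ones visible in the captcha output above (a `<div class="h-captcha">` widget and a `<title>` mentioning "captcha"); `looks_like_captcha` is a hypothetical name, and other sites may use different markers.

```python
from bs4 import BeautifulSoup


def looks_like_captcha(html):
    """Heuristically decide whether html is a captcha challenge page (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    # hCaptcha widget container, as seen in the page returned above
    if soup.find("div", class_="h-captcha") is not None:
        return True
    # Fallback: a <title> that mentions captcha, compared case-insensitively
    title = soup.title.get_text() if soup.title else ""
    return "captcha" in title.lower()
```

In the test script, a check such as `if looks_like_captcha(page.content): raise SystemExit("Got a captcha page instead of search results")` right after `requests.get` would make the cause explicit.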
