AttributeError: 'NoneType' object has no attribute 'find_all' : Forums : PythonAnywhere

I have been running my web spider locally without a problem, but when I uploaded my code to PythonAnywhere I started getting errors. I am not going to copy the full source here, but I will paste a prototype version of it that also returns the same error.

My code runs fine locally and also on Google Cloud Shell.

import requests
from bs4 import BeautifulSoup
url ="https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
result = soup.find(id="mosaic-provider-jobcards")
job_elements = result.find_all("div", class_="job_seen_beacon")
print(job_elements)
Here is the error:
(jobSpider) 14:59 ~/jobSpider $ python testSoup.py
Traceback (most recent call last):
  File "/home/moodkiller2022/jobSpider/testSoup.py", line 12, in <module>
    job_elements = result.find_all("div", class_="job_seen_beacon")
AttributeError: 'NoneType' object has no attribute 'find_all'
It looks like soup.find(id="mosaic-provider-jobcards") is returning None. Could you print out the page that you're getting for soup and check what it contains? It's possible that the site you're accessing is blocking requests from PythonAnywhere. Sites often don't like being scraped from cloud computing platforms, and it's entirely possible that they've blocked us but haven't yet blocked the IP of the Google Cloud Shell that you're using.
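A minimal sketch of guarding against that None, so the script reports a missing container instead of raising AttributeError. The helper name find_job_cards and the two sample HTML strings are hypothetical, used here in place of a live request; only the id and class names come from the script above.

```python
from bs4 import BeautifulSoup


def find_job_cards(html):
    """Return the list of job-card divs, or None if the container is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="mosaic-provider-jobcards")
    if container is None:
        # soup.find() returns None when nothing matches, so stop here
        # instead of calling .find_all() on None.
        return None
    return container.find_all("div", class_="job_seen_beacon")


# Hypothetical sample pages standing in for page.content from requests.get().
good_page = (
    '<div id="mosaic-provider-jobcards">'
    '<div class="job_seen_beacon">Job A</div>'
    "</div>"
)
blocked_page = "<html><head><title>hCaptcha solve page</title></head><body></body></html>"

for name, html in [("good", good_page), ("blocked", blocked_page)]:
    cards = find_job_cards(html)
    if cards is None:
        print(name, "-> container not found; inspect the returned HTML")
    else:
        print(name, "->", len(cards), "job card(s)")
```

In the real script, page.content would be passed in place of the sample strings, and printing page.status_code alongside the result may also help show whether the request was blocked.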
Thank you for getting back to me.
So there isn't any way around that?
I tried printing the soup object to see what it contains. Here is the modified script:
import requests
from bs4 import BeautifulSoup
import time
url ="https://www.indeed.com/jobs?q=Developer&l=waterbury%2C%20CT&fromage=1&"
page = requests.get(url)
soup = BeautifulSoup(page.content,'html.parser')
#result = soup.find(id="mosaic-provider-jobcards")
#job_elements = result.find_all("div", class_="job_seen_beacon")
print(soup)
Output:
(jobSpider) 15:45 ~/jobSpider $ python testSoup.py
<title>hCaptcha solve page</title>
<script async="" defer="" src="https://www.hcaptcha.com/1/api.js"></script>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
</head>
<form action="/jobs?q=Developer&amp;l=waterbury,%20CT&amp;fromage=1&amp;redirected=1" method="POST" style="margin: 80px;">
<div class="h-captcha" data-sitekey="eb27f525-f936-43b4-91e2-95a426d4a8bd" data-size="compact"></div>
<input type="submit" value="Submit"/>
</form>
</body>
</html>
Looks like I got this captcha page.
But the same soup object returned actual HTML details; it looks like I can't even print the soup anymore.
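One way to make the script notice this case is a rough heuristic that looks for the markers visible in the output above. This is only a sketch: the function name looks_like_captcha is hypothetical, and the marker strings are taken from this one response, so a different block page may not match them.

```python
def looks_like_captcha(html):
    """Heuristic: True if the page text contains markers from the hCaptcha page above."""
    lowered = html.lower()
    markers = (
        "hcaptcha solve page",      # the <title> of the block page
        'class="h-captcha"',        # the widget div
        "hcaptcha.com/1/api.js",    # the script the page loads
    )
    return any(marker in lowered for marker in markers)


# Small samples copied from the output above and from the original script's target id.
captcha_sample = '<title>hCaptcha solve page</title><div class="h-captcha"></div>'
normal_sample = '<div id="mosaic-provider-jobcards"></div>'

print(looks_like_captcha(captcha_sample))
print(looks_like_captcha(normal_sample))
```

In the script, this would be called with page.text (the decoded string) before parsing, so the spider can log that it was served a captcha and skip the page instead of failing later.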