    import requests
    from bs4 import BeautifulSoup

    tw_link = open("TW_Links.txt", "r", encoding='utf-8')
    im_link = open("DCDN_Links.txt", "w+")

    def get_images(urli):
        rs = requests.Session()
        urls = rs.get(urli)
        soup = BeautifulSoup(urls.text, "html5lib")
        # print(soup.prettify())
        content = soup.find("div", {"class": "tt_article_useless_p_margin"})
        images = content.findAll('img')
        for img in images:
            img_url = img['src'] + "?original"
            print(img_url, file=im_link)

    def get_links():
        count = 1
        for line in tw_link:
            print(line, count)
            count += 1
            get_images(line)

    get_links()
What I have tried:
The code seems to work fine when using a single link, but when I pass the URLs to the function I'm getting the following error:

    AttributeError                            Traceback (most recent call last)
    in ()
         23     count += 1
         24     get_images(line)
    ---> 25 get_links()

    1 frames
    in get_links()
         22     print(line, count)
         23     count += 1
    ---> 24     get_images(line)
         25 get_links()

    in get_images(urli)
         12     print(soup.prettify())
         13     content = soup.find("div", {"class": "tt_article_useless_p_margin"})
    ---> 14     images = content.findAll('img')
         15     for img in images:
         16         img_url = img['src'] + "?original"

    AttributeError: 'NoneType' object has no attribute 'findAll'
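The traceback already narrows things down: soup.find returned None, so the fetched page did not contain the expected div at all. A minimal guard, assuming the same variable names as in the code above, would surface the offending URL instead of crashing:

    content = soup.find("div", {"class": "tt_article_useless_p_margin"})
    if content is None:
        # The expected article markup is missing, probably a redirect or error page.
        print("Unexpected page for:", repr(urli))  # repr() exposes stray characters in the URL
        return
    images = content.findAll('img')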
My guess is that I'm triggering some sort of bot detection (because when passing a single link a different page is loaded, not the one that's being loaded currently). Is there any way to bypass that? I've tried using time.sleep(5), but that also didn't work.

    import time
    import requests
    from bs4 import BeautifulSoup

    tw_link = open("TW_Links.txt", "r", encoding='utf-8')
    im_link = open("DCDN_Links.txt", "w+")
    kak_link = open("KCDN_Links.txt", "w+")

    def get_images(urlset):
        for x in urlset:
            rs = requests.Session()
            urls = rs.get(x)
            soup = BeautifulSoup(urls.text, "html5lib")
            content = soup.find("div", {"class": "tt_article_useless_p_margin"})
            images = content.findAll('img')
            for img in images:
                img_url = img['src'] + "?original"
                if "blog" in img_url:
                    print(img_url, file=kak_link)
                    print(img_url)
                else:
                    print(img_url, file=im_link)
                    print(img_url)
            # print(x)
            time.sleep(2)

    def get_links():
        count = 1
        linklist = []
        for line in tw_link:
            line = line.replace("\n", "")
            linklist.append(line)
        get_images(linklist)

    get_links()
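One way to test the "different page" theory without a proxy is to let requests report where each fetch actually landed; url, status_code, and history are standard attributes of a requests Response object. A minimal diagnostic sketch for the loop above:

    for x in urlset:
        rs = requests.Session()
        resp = rs.get(x)
        print("requested:", repr(x))  # repr() makes a trailing '\n' in the URL visible
        print("landed on:", resp.url, resp.status_code)
        if resp.history:
            # A non-empty history means the site redirected us, e.g. to a 404 page.
            print("redirects:", [r.status_code for r in resp.history])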
For those waiting for a solution, it was pretty simple. I was doubtful of the requests module, so I intercepted the program's traffic through a proxy, and voila: it turned out the EOL symbol was being included in the request as well. While that might have worked with most sites, this particular site redirected to its 404 page, so simply removing the "\n" from the lines read did the trick.
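A minimal sketch of that fix, assuming TW_Links.txt holds one URL per line (str.strip() also covers Windows-style \r\n line endings):

    def get_links():
        linklist = []
        with open("TW_Links.txt", "r", encoding="utf-8") as tw_link:
            for line in tw_link:
                url = line.strip()  # drop the trailing EOL before it ends up in the request
                if url:             # skip blank lines
                    linklist.append(url)
        get_images(linklist)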