BeautifulSoup not working -- soup = BeautifulSoup(html, 'html.parser') - Python

Hi All,

I am following the Python course and I got to the "12 - urllinks - Python for Everybody Course" video.

I tried to install BeautifulSoup and placed the folder he suggested in the directory I'm running Python from, but it doesn't work.

The teacher's code is:

# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
When I run it, I get the following error:
Enter - http://www.dr-chuck.com/
Traceback (most recent call last):
  File "/Users/luis/Desktop/Py/code3/urllinks.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 164, in feed
    parser.feed(markup)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 170, in goahead
    k = self.parse_starttag(i)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 344, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 62, in handle_starttag
    self.soup.handle_starttag(name, None, None, attr_dict)
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 404, in handle_starttag
    self.currentTag, self._most_recent_element)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1001, in __getattr__
    return self.find(tag)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
I have already tried:
installing BeautifulSoup using sudo pip

downloading the zip file, unzipping it, and placing it in the same folder
Has anyone had the same issue?
Thanks a lot, guys!
Note: I'm using macOS.

It looks like the BS4 library is trying to use the class collections.Callable and not finding it. As of Python 3.3, the Callable base class lives in collections.abc, as collections.abc.Callable. The old collections.Callable name kept working as a deprecated alias for a while, but that alias was removed in Python 3.10, which matches the Python 3.10 paths in your traceback.
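To make the rename concrete, here is a minimal sketch using only the standard library. The helper name is_callable is made up for illustration and is not part of bs4; it simply performs the same kind of isinstance check that fails in the traceback, but against the current location of the class.

```python
import collections
import collections.abc
import sys


def is_callable(value):
    """Hypothetical helper: check callability via the current ABC location."""
    return isinstance(value, collections.abc.Callable)


# The old alias is expected to be missing on Python 3.10 and later.
print("Python version:", sys.version_info[:2])
print("collections.Callable present:", hasattr(collections, "Callable"))

# The same check works through collections.abc.
print(is_callable(len))     # a built-in function is callable
print(is_callable("text"))  # a plain string is not
```

If the second line prints False, any library code that still refers to collections.Callable will raise the AttributeError shown above.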
I would check your current versions of Python and BS4 (something like python --version and pip freeze from the terminal). Judging by the file paths in your error, it looks like Python is using the unzipped BS4 files rather than the module that you later installed via pip. I don't know how up to date that zipped module is, but you could try removing those unzipped files and reinstalling from the official release on PyPI via pip if need be.
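One way to see which copy of bs4 Python would pick up, without actually importing it (so it still works when the import itself crashes), is sketched below. The function name locate_module is an assumption for this example; importlib.util.find_spec is the standard-library call doing the work.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file Python would load `name` from, or None if not found.

    Hypothetical helper: it only looks the module up, it does not run it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("Python", sys.version.split()[0])
location = locate_module("bs4")
if location is None:
    print("bs4 cannot be found from here")
else:
    print("bs4 would be loaded from:", location)
```

Run it from the same folder as the course script. A path inside that folder (such as the code3/bs4 directory in the traceback) means the unzipped copy is shadowing anything installed with pip, while a path containing site-packages usually points at the pip-installed package.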

Welcome there,
I’ve edited your post for readability.  When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.
You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.
See this post to find the backtick on your keyboard.

Note: Backticks (`)  are not single quotes (’).

Hi, you are right. After searching the internet, I found that the solution is to amend the code in “bs4 > element.py”,
i.e. change every “collections.Callable” to “collections.abc.Callable”.
It works for me after that.
Cheers
Locate your “bs4” folder (you can search for it in your file explorer).
Open the “bs4” folder and find the “element” file.
Open the “element” file in an editor, e.g. Atom.
Search for every “collections.Callable” and change it to “collections.abc.Callable”. See attached for an example.
Remember to save the “element” file after you have made the changes.
If you have more than one copy of “bs4”, remember to change the element.py file in each of them.
Hope this helps.
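The manual steps above can also be sketched as a small script. This is only an illustration under some assumptions: the function names patch_text, patch_file and patch_all are invented here, the files are assumed to be UTF-8, and a .bak copy is written next to each changed file so the edit can be undone. If the patched file only has "import collections" at the top, adding "import collections.abc" as well may be the safer choice.

```python
from pathlib import Path
import shutil

OLD = "collections.Callable"
NEW = "collections.abc.Callable"


def patch_text(source):
    """Return the source text with every old-style reference rewritten."""
    return source.replace(OLD, NEW)


def patch_file(path):
    """Patch one file in place, keeping a .bak copy; return True if changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = patch_text(original)
    if patched == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched, encoding="utf-8")
    return True


def patch_all(root):
    """Patch every bs4/element.py found under root; return the changed paths."""
    # Collect the candidates first so new .bak files do not affect the search.
    candidates = sorted(Path(root).rglob("element.py"))
    return [p for p in candidates if p.parent.name == "bs4" and patch_file(p)]


# Demonstration on a string, so nothing on disk is touched here.
print(patch_text("isinstance(value, collections.Callable)"))
```

Calling patch_all on the folder that holds the course files would cover the "more than one copy" case in one go. Running it twice is harmless, because the already-corrected text no longer contains the old name.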

I’m having the same problem. I am using Python 3.10.5 and BeautifulSoup doesn’t work.

I have changed every collections.Callable to collections.abc.Callable and still get the following error message.
Traceback (most recent call last):
  File "C:\Joe\12_tags.py", line 6, in <module>
    from bs4 import BeautifulSoup
  File "C:\Joe\bs4\__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "C:\Joe\bs4\builder\__init__.py", line 314, in <module>
    from . import _html5lib
  File "C:\Joe\bs4\builder\_html5lib.py", line 151, in <module>
    class Element(html5lib.treebuilders._base.Node):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'. Did you mean: 'base'?
Any help will be greatly appreciated!
# Do the imports
from urllib import request
from bs4 import BeautifulSoup as bs
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
# Ask for user input
url = input('Enter url\n>> ')
# Add a scheme only if the URL does not already start with one,
# so that https:// URLs are left untouched
if not url.startswith(('http://', 'https://')):
    url = 'http://' + url
# Get the page contents
html = request.urlopen(url, context=ctx).read()
# Parse with BeautifulSoup
soup = bs(html, 'html.parser')
# Get anchor tags
tags = soup('a')
# Loop through and print each link target
for tag in tags:
    print(tag.get('href', None))
It won’t allow me to post output because of the links