
Hi All,

I am following the Python course and I got to the "12 - urllinks - Python for Everybody Course" video.

I installed BeautifulSoup and placed the folder he suggested in the directory I'm running Python from, but it doesn't work.

The teacher's code is:

# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

When I run it, I get the following error:

Enter - http://www.dr-chuck.com/
Traceback (most recent call last):
  File "/Users/luis/Desktop/Py/code3/urllinks.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 164, in feed
    parser.feed(markup)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 170, in goahead
    k = self.parse_starttag(i)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 344, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 62, in handle_starttag
    self.soup.handle_starttag(name, None, None, attr_dict)
  File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 404, in handle_starttag
    self.currentTag, self._most_recent_element)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1001, in __getattr__
    return self.find(tag)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'

I have already tried:

  • installing BeautifulSoup with sudo pip
  • downloading the zip file, unzipping it, and placing it in the same folder

Has anyone had the same issue?

Thanks a lot, guys!

Note: I'm using macOS.

It looks like the BS4 library is trying to use collections.Callable and not finding it. As of Python 3.3, the abstract base classes such as Callable live in collections.abc (so the name is collections.abc.Callable). The old aliases directly under collections were kept for backward compatibility for a while, but they were removed in Python 3.10, which is why the lookup fails on your version.
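For illustration, here is a minimal sketch (assuming any Python 3 interpreter) that reports whether the old alias still exists and performs a simplified version of the failing element.py check using the new collections.abc location. The function name looks_like_search_value is hypothetical, and it only mirrors the first three conditions visible in the traceback, not the full bs4 check.

```python
import collections
import collections.abc
import sys

# Python 3.3 moved the ABCs (Callable, Mapping, ...) into collections.abc.
# The old aliases such as collections.Callable were removed in Python 3.10.
old_alias_present = hasattr(collections, "Callable")
print("Python", ".".join(map(str, sys.version_info[:3])),
      "- collections.Callable present:", old_alias_present)


def looks_like_search_value(value):
    """Hypothetical, simplified variant of the check that fails in old bs4.

    Uses collections.abc.Callable, which exists on every Python 3 version.
    """
    return (isinstance(value, str)
            or isinstance(value, collections.abc.Callable)
            or hasattr(value, "match"))


print(looks_like_search_value("a"))  # True (a string)
print(looks_like_search_value(len))  # True (a callable)
print(looks_like_search_value(42))   # False
```

For a plain "is this callable?" test, the built-in callable() gives the same answer without importing anything.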

I would check your current versions of Python and BS4 (for example, run python3 --version and pip freeze from the terminal). Judging by the file paths in your error (they all point into /Users/luis/Desktop/Py/code3/bs4), Python is using the unzipped bs4 folder next to your script rather than the package you later installed via pip, because the script's own directory is searched first when importing. I don't know how up to date that zipped copy is, but you could try removing those unzipped files and, if need be, reinstalling the official beautifulsoup4 release from PyPI via pip.
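A small diagnostic sketch along those lines, to be run with the same interpreter you use for the script. The function name describe_bs4 is made up for this example; it relies only on the standard library (importlib.util.find_spec and importlib.metadata, Python 3.8+) and does not import bs4 itself, so it still works when the bundled copy is broken.

```python
import sys
import importlib.util
from importlib import metadata


def describe_bs4():
    """Print which interpreter is running and where 'import bs4' would load from.

    Returns the path of the bs4 package that would be imported, or None.
    """
    print("Python", sys.version.split()[0], "at", sys.executable)
    # When running a script, sys.path[0] is typically the script's folder,
    # so a bs4/ directory there wins over the copy installed by pip.
    print("First import location searched:", sys.path[0] or "(current directory)")
    spec = importlib.util.find_spec("bs4")  # locates bs4 without importing it
    if spec is None:
        print("bs4 is not importable from this interpreter")
        return None
    print("bs4 would be imported from:", spec.origin)
    try:
        # Version of the pip-installed distribution, which may NOT be the copy above
        print("pip-installed beautifulsoup4:", metadata.version("beautifulsoup4"))
    except metadata.PackageNotFoundError:
        print("beautifulsoup4 is not installed via pip for this interpreter")
    return spec.origin


describe_bs4()
```

If the reported location is inside your code3 folder, that is the old bundled copy shadowing the pip-installed one.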

Welcome there,

I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.

You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.


See this post to find the backtick on your keyboard.
Note: Backticks (`) are not single quotes (’).

Hi, you are right. After searching the internet, I found that the solution is to edit the code in bs4/element.py, i.e., change every collections.Callable to collections.abc.Callable.

It works for me after that.

Cheers

  • Locate your bs4 folder (you can search for it in your file explorer).
  • Open the bs4 folder; you will see the element.py file.
  • Open element.py in an editor, e.g. Atom.
  • Search for every collections.Callable and change it to collections.abc.Callable (see the attached example).
  • Remember to save element.py after making the changes.
  • If you have more than one copy of bs4, remember to change the element.py file in each of them.

    Hope this helps.
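The manual edit above can also be scripted. This is a rough sketch, not an official fix: patch_element_py is a hypothetical helper that makes a .bak backup and rewrites only exact collections.Callable references, leaving existing collections.abc.Callable ones alone, so running it twice is harmless.

```python
import re
import shutil
from pathlib import Path

# Matches collections.Callable but not collections.abc.Callable
OLD_NAME = re.compile(r"\bcollections\.Callable\b")


def patch_element_py(path):
    """Hypothetical helper: replace collections.Callable in one element.py.

    Writes a '<name>.bak' copy before changing anything and returns the
    number of replacements made (0 means the file was left untouched).
    """
    path = Path(path)
    source = path.read_text(encoding="utf-8")
    patched, count = OLD_NAME.subn("collections.abc.Callable", source)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep the original
        path.write_text(patched, encoding="utf-8")
    return count
```

To patch every copy, you could loop over something like Path('.').rglob('bs4/element.py') from the folder that holds your bs4 copies. Installing a current beautifulsoup4 release from PyPI instead should make the patch unnecessary.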

    I'm having the same problem. I am using Python 3.10.5, and BeautifulSoup doesn't work.
    I have changed every collections.Callable to collections.abc.Callable and still get the following error message:

    Traceback (most recent call last):
      File "C:\Joe\12_tags.py", line 6, in <module>
        from bs4 import BeautifulSoup
      File "C:\Joe\bs4\__init__.py", line 30, in <module>
        from .builder import builder_registry, ParserRejectedMarkup
      File "C:\Joe\bs4\builder\__init__.py", line 314, in <module>
        from . import _html5lib
      File "C:\Joe\bs4\builder\_html5lib.py", line 151, in <module>
        class Element(html5lib.treebuilders._base.Node):
    AttributeError: module 'html5lib.treebuilders' has no attribute '_base'. Did you mean: 'base'?

    Any help will be greatly appreciated!

    # Do the imports
    from urllib import request
    from bs4 import BeautifulSoup as bs
    import ssl
    # Ignore SSL certificate errors
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Ask for user input
    url = input('Enter url\n>> ').strip()
    # Add a scheme only if the user did not type one (keeps https:// URLs intact)
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    # Get the url
    html = request.urlopen(url, context=ctx).read()
    # Parse with BeautifulSoup
    soup = bs(html, 'html.parser')
    # Get anchor tags
    tags = soup('a')
    # Loop through and print each href (None if the tag has no href)
    for tag in tags:
        print(tag.get('href', None))

    It won't let me post the output because it contains links.