I am following the Python course and got to the "12 - urllinks - Python for Everybody Course" video.
I tried to install BeautifulSoup and placed the folder he suggested in the directory I'm running Python from, but it doesn't work.
The teacher's code is:
# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
When I run it, I get the following error:
Enter - http://www.dr-chuck.com/
Traceback (most recent call last):
File "/Users/luis/Desktop/Py/code3/urllinks.py", line 16, in <module>
soup = BeautifulSoup(html, 'html.parser')
File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 215, in __init__
self._feed()
File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 239, in _feed
self.builder.feed(self.markup)
File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 164, in feed
parser.feed(markup)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 110, in feed
self.goahead(0)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 170, in goahead
k = self.parse_starttag(i)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/html/parser.py", line 344, in parse_starttag
self.handle_starttag(tag, attrs)
File "/Users/luis/Desktop/Py/code3/bs4/builder/_htmlparser.py", line 62, in handle_starttag
self.soup.handle_starttag(name, None, None, attr_dict)
File "/Users/luis/Desktop/Py/code3/bs4/__init__.py", line 404, in handle_starttag
self.currentTag, self._most_recent_element)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1001, in __getattr__
return self.find(tag)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1238, in find
l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1259, in find_all
return self._find_all(name, attrs, text, limit, generator, **kwargs)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 516, in _find_all
strainer = SoupStrainer(name, attrs, text, **kwargs)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1560, in __init__
self.text = self._normalize_search_value(text)
File "/Users/luis/Desktop/Py/code3/bs4/element.py", line 1565, in _normalize_search_value
if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
I have already tried:
installing BeautifulSoup using sudo pip
downloading the zip file, unzipping it, and placing it in the same folder
Has anyone had the same issue?
Thanks a lot, guys!
Note: I'm using macOS.
It looks like the BS4 library is trying to use collections.Callable and not finding it. As of Python 3.3, the Callable abstract base class lives in collections.abc, as collections.abc.Callable. The old name in collections was kept as a deprecated alias for a while, but that alias was removed in Python 3.10, which matches the python3.10 paths in your traceback.
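If you want to confirm this on your own machine, here is a minimal sketch. The helper name `is_callable_abc` is made up for illustration and is not part of BS4; it only shows the check that old BS4 code was trying to do, written against `collections.abc`.

```python
import collections
import collections.abc
import sys


def is_callable_abc(value):
    """Return True if value is callable, using the collections.abc location."""
    return isinstance(value, collections.abc.Callable)


# Show which Python is running and whether the old alias still exists.
print("Python version:", sys.version_info[:2])
print("collections has Callable:", hasattr(collections, "Callable"))

# The collections.abc spelling works on every Python 3.3+ release.
print(is_callable_abc(len))
print(is_callable_abc("just a string"))
```

On Python 3.10 or later the second line should report that the old alias is gone, which is exactly the AttributeError in the traceback.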
I would check your current versions of Python and BS4 (something like python --version and pip freeze from the terminal). Judging by the file paths in your error, it looks like Python is importing the unzipped BS4 files in /Users/luis/Desktop/Py/code3/bs4 rather than the module you later installed via pip. I don't know how up to date that zipped module is, but you could try removing those unzipped files and reinstalling from the official release on PyPI via pip if need be.
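To see which copy Python would actually pick up, something like the sketch below may help. The function name `describe_package` is hypothetical; it relies only on the standard library (`importlib.util.find_spec` and `importlib.metadata`, the latter available from Python 3.8) and does not import bs4 itself, so it runs even when the import is broken.

```python
import importlib.metadata
import importlib.util


def describe_package(module_name, dist_name):
    """Return (file that would be imported, version installed via pip).

    Either value is None when it cannot be found.
    """
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return location, version


# "beautifulsoup4" is the name of the package on PyPI; "bs4" is the import name.
location, version = describe_package("bs4", "beautifulsoup4")
print("bs4 would be imported from:", location)
print("beautifulsoup4 version installed via pip:", version)
```

Run it from the same folder as your script. If the first path points into that folder rather than into site-packages, the unzipped copy is shadowing the pip install, because the script's own directory comes first on the import path.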
Welcome there,
I’ve edited your post for readability. When you enter a code block into a forum post, please precede it with a separate line of three backticks and follow it with a separate line of three backticks to make it easier to read.
You can also use the “preformatted text” tool in the editor (</>) to add backticks around text.
See this post to find the backtick on your keyboard. Note: Backticks (`) are not single quotes (’).
Hi, you are right. After searching the internet, I found that the solution is to amend the code in “bs4 > element.py”,
i.e. change every “collections.Callable” to “collections.abc.Callable”.
It has worked for me since then.
Cheers
Locate your “bs4” folder (you can search for it in your file explorer).
Open the “bs4” folder and you will see the “element” file.
Open the “element” file in an editor, e.g. Atom.
Search for “collections.Callable” and change every occurrence to “collections.abc.Callable”. See attached for example.
Remember to save the “element” file after you have made the changes.
Also remember to change every element.py file if you have more than one copy of “bs4”.
Hope this helps.
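The manual steps above could also be scripted. This is only a rough sketch of the same search-and-replace: the names `patch_source` and `patch_file` are made up for illustration, and the path in the usage comment is a placeholder you would need to replace with the real location of your own element.py.

```python
from pathlib import Path

OLD = "collections.Callable"
NEW = "collections.abc.Callable"


def patch_source(text):
    """Replace the removed alias with its collections.abc spelling.

    Already patched text is left alone, because NEW does not contain OLD.
    """
    return text.replace(OLD, NEW)


def patch_file(path):
    """Patch one file in place, keeping a .bak copy. Return True if changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched = patch_source(original)
    if patched == original:
        return False
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original, encoding="utf-8")
    path.write_text(patched, encoding="utf-8")
    return True


# Usage (placeholder path, adjust to your own bs4 folder):
# patch_file("bs4/element.py")
```

Editing a bundled library like this is a workaround; it only covers this one error and other incompatibilities in an old copy may still show up.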
I’m having the same problem. I am using Python 3.10.5 and BeautifulSoup doesn’t work.
I have changed every collections.Callable to collections.abc.Callable and still get the following error message.
Traceback (most recent call last):
File "C:\Joe\12_tags.py", line 6, in <module>
from bs4 import BeautifulSoup
File "C:\Joe\bs4\__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\Joe\bs4\builder\__init__.py", line 314, in <module>
from . import _html5lib
File "C:\Joe\bs4\builder\_html5lib.py", line 151, in <module>
class Element(html5lib.treebuilders._base.Node):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'. Did you mean: 'base'?
Any help will be greatly appreciated!
# Do the imports
from urllib import request, parse, error
from bs4 import BeautifulSoup as bs
import ssl
# SSL
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
# Ask for user input
url = input('Enter url\n>> ')
# Add http:// only if the url does not already start with a scheme
if not url.startswith(('http://', 'https://')):
    url = 'http://' + url
# Get the url
html = request.urlopen(url, context=ctx).read()
# Parse with BeautifulSoup
soup = bs(html, 'html.parser')
# Get anchor tags
tags = soup('a')
# Loop through and print
for tag in tags:
    print(tag.get('href', None))
It won’t allow me to post output because of the links
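Since the real output could not be posted, here is a small offline sketch of what the loop produces. Note that it swaps in the standard library's `html.parser.HTMLParser` instead of BeautifulSoup, so it runs even where bs4 will not import. The `LinkCollector` class, the `extract_links` helper, and the sample HTML with its example.com links are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, or None when it has no href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            self.links.append(dict(attrs).get("href"))


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = (
    '<p><a href="https://example.com/a">A</a> '
    '<a name="top">no href</a> '
    '<a href="/b">B</a></p>'
)
for link in extract_links(sample):
    print(link)
```

For this sample it prints the first link, then None for the anchor without an href, then /b, which mirrors what `tag.get('href', None)` does in the BeautifulSoup version.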