NLTK's `stopwords` corpus must first be downloaded via the NLTK Data installer. This is a one-time setup, after which you can freely use `from nltk.corpus import stopwords`.
To download the stopwords, open the Python interpreter by running `python` in your terminal of choice, and type:
>>> import nltk
>>> nltk.download("stopwords")
Afterwards, you're good to go!
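If you would rather not trigger the download every time a script starts, a small guard can check for the corpus first. The following is only a sketch: `ensure_stopwords` is a hypothetical helper name, not part of NLTK, and it relies on `nltk.data.find` raising `LookupError` when the resource is missing and on `nltk.download` returning `False` on failure.

```python
def ensure_stopwords(download_dir=None):
    """Return True if the stopwords corpus is available, downloading it if needed.

    Hypothetical helper, not part of NLTK itself.
    """
    import nltk  # imported lazily so the helper can be defined before NLTK is set up

    try:
        # Succeeds when the corpus is already present in one of NLTK's data paths.
        nltk.data.find("corpora/stopwords")
        return True
    except LookupError:
        # nltk.download reports failure through its return value, not an exception.
        return bool(nltk.download("stopwords", download_dir=download_dir, quiet=True))
```

A script could then call `ensure_stopwords()` once at start-up and stop with a clear message if it returns `False`, instead of failing later inside `stopwords.words('english')`.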
Originally posted by @tomaarsen in #3063 (comment)
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv("/Users/atatekeli/PycharmProjects/NetflixRecm/netflixData.csv")
print(data.head())
data.info()  # info() prints its summary itself
print(data.isnull().sum())

data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())
data = data.dropna()

import nltk
import re
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Title"] = data["Title"].apply(clean)
print(data.Title.sample(10))
And this is the error message:
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1129)>
Traceback (most recent call last):
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 84, in __load
root = nltk.data.find(f"{self.subdir}/{zip_name}")
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/stopwords.zip/stopwords/
Searched in:
- '/Users/atatekeli/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/atatekeli/PycharmProjects/NetflixRecm/netflix_recm.py", line 22, in <module>
stopword=set(stopwords.words('english'))
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
self.__load()
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 86, in __load
raise e
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 81, in __load
root = nltk.data.find(f"{self.subdir}/{self.__name}")
File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/stopwords
Searched in:
- '/Users/atatekeli/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
- '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
Process finished with exit code 1
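The "Searched in" lists above show the directories NLTK tried. To check by hand whether any of them already holds the corpus, something along these lines may help. This is a standard-library sketch, not NLTK's own lookup logic: `locate_resource` is a hypothetical name, and the example paths are copied from the traceback (at runtime NLTK keeps its actual list in `nltk.data.path`).

```python
import os


def locate_resource(search_paths, resource="corpora/stopwords"):
    """Return the directories in search_paths that contain the resource.

    Hypothetical helper: accepts either an unpacked folder or a .zip archive.
    """
    hits = []
    for root in search_paths:
        unpacked = os.path.join(root, *resource.split("/"))
        zipped = unpacked + ".zip"
        if os.path.isdir(unpacked) or os.path.isfile(zipped):
            hits.append(root)
    return hits


if __name__ == "__main__":
    # A few of the directories listed in the traceback.
    candidates = [
        os.path.expanduser("~/nltk_data"),
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
    ]
    print(locate_resource(candidates))
```

An empty result is consistent with the `LookupError`: the download never completed, so no search path contains the corpus.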
This seems to be the problem:
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data] unable to get local issuer certificate (_ssl.c:1129)>
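"Unable to get local issuer certificate" generally means Python's SSL layer could not find a CA certificate that vouches for the server. As a first diagnostic, it can help to see where this interpreter looks for CA certificates. The sketch below uses only the standard library; `describe_ca_locations` is a hypothetical name chosen for illustration.

```python
import os
import ssl


def describe_ca_locations():
    """Collect the CA certificate locations this Python interpreter would use."""
    paths = ssl.get_default_verify_paths()
    return {
        # cafile/capath are None when the configured location does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # The locations compiled into the linked OpenSSL build.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Environment overrides, if any are set.
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "SSL_CERT_DIR": os.environ.get("SSL_CERT_DIR"),
    }


if __name__ == "__main__":
    for key, value in describe_ca_locations().items():
        print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None`, the interpreter has no usable default CA store, which would be consistent with the error above.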
I haven't seen this before. Perhaps this Stack Overflow answer (or other answers in the same thread) will resolve your issue:
https://stackoverflow.com/a/45018725/17936326. The traceback shows Python 3.9, so adapt any version-specific steps there accordingly. There are certainly some useful tips in that thread.
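One option that keeps certificate verification enabled is to point Python at an explicit CA bundle before downloading. The sketch below is an assumption-laden illustration rather than a confirmed fix for this report: `use_ca_bundle` is a hypothetical helper, and it relies on OpenSSL honouring the `SSL_CERT_FILE` environment variable when default certificates are loaded.

```python
import os
import ssl


def use_ca_bundle(cafile):
    """Point OpenSSL's default verification at an explicit CA bundle file.

    Hypothetical helper. Returns the CA file Python will now report as default.
    """
    if not os.path.isfile(cafile):
        raise FileNotFoundError(cafile)
    # SSL contexts created after this point pick up the bundle by default.
    os.environ["SSL_CERT_FILE"] = cafile
    return ssl.get_default_verify_paths().cafile
```

For example, with the third-party `certifi` package installed (an assumption; it is not mentioned in this thread), calling `use_ca_bundle(certifi.where())` before `nltk.download("stopwords")` should let the download verify the server's certificate.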