
NLTK's `stopwords` requires the stopwords to be first downloaded via the NLTK Data installer. This is a one-time setup, after which you will be able to freely use `from nltk.corpus import stopwords`. #3107 · Killpit opened this issue Jan 18, 2023 · 6 comments
          NLTK's `stopwords` requires the stopwords to be first downloaded via the NLTK Data installer. This is a one-time setup, after which you will be able to freely use `from nltk.corpus import stopwords`.

To download the stopwords, open the Python interpreter by running `python` in your terminal of choice, and type:

```python
>>> import nltk
>>> nltk.download("stopwords")
```

Afterwards, you're good to go!

Originally posted by @tomaarsen in #3063 (comment)
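If you want the download to happen automatically from a script, but only once, a small guard like the following works (a minimal sketch; `ensure_stopwords` is a hypothetical helper name, not an NLTK API):

```python
import nltk

def ensure_stopwords() -> bool:
    """Download the stopwords corpus only if it is not already installed."""
    try:
        # Raises LookupError when the corpus is missing from nltk.data.path
        nltk.data.find("corpora/stopwords")
        return True  # already present locally, nothing to download
    except LookupError:
        # One-time download into the default nltk_data directory
        return nltk.download("stopwords")

ensure_stopwords()
```

On later runs `nltk.data.find` succeeds immediately, so no network access is needed.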

Here is my code (note: markdown had swallowed the asterisks in the regex patterns; they are restored below, and `print(data.info)` is corrected to `print(data.info())`):

```python
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv("/Users/atatekeli/PycharmProjects/NetflixRecm/netflixData.csv")
print(data.head())
print(data.info())
print(data.isnull().sum())

data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())

data = data.dropna()

import nltk
import re
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)        # bracketed notes
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)         # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub('\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)       # words containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Title"] = data["Title"].apply(clean)

print(data.Title.sample(10))
```
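For reference, the cleaning steps above can be exercised in isolation without NLTK (a simplified sketch: a tiny hard-coded stopword set stands in for `stopwords.words('english')`, and the stemming step is omitted; the sample string is hypothetical):

```python
import re
import string

# Stand-in for NLTK's English stopword list (illustrative only)
STOPWORDS = {"the", "a", "an", "and", "of", "in"}

def clean(text: str) -> str:
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                    # bracketed notes
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # URLs
    text = re.sub(r"<.*?>+", "", text)                     # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\w*\d\w*", "", text)                   # words with digits
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(clean("The <b>Crown</b> (2016) and a https://example.com drama"))
# → crown drama
```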

And this is the error message:

```
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>
Traceback (most recent call last):
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords.zip/stopwords/

  Searched in:
    - '/Users/atatekeli/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/netflix_recm.py", line 22, in <module>
    stopword=set(stopwords.words('english'))
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/Users/atatekeli/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Process finished with exit code 1
```
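As an aside, the "Searched in:" list mirrors `nltk.data.path`, the ordered list of directories NLTK scans for corpora. You can inspect it, or append a custom location such as a project-local folder (the appended path below is hypothetical, for illustration):

```python
import nltk

# Directories NLTK searches for data, in order
print(nltk.data.path)

# Add a project-local data directory to the end of the search path
nltk.data.path.append("/Users/atatekeli/PycharmProjects/NetflixRecm/nltk_data")
```

A corpus unpacked into any of these directories is found without calling `nltk.download` at all.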

This seems to be the problem:

```
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>
```

I haven't seen this before. Perhaps the help from this comment (or other comments in the thread) will resolve your issue:
https://stackoverflow.com/a/45018725/17936326. Presumably you are using Python 3.6? There are certainly some useful tips in that thread, I think.
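The core of the workaround in that Stack Overflow answer is to make urllib skip certificate verification before downloading, roughly like this (a stopgap only; on macOS, running the "Install Certificates.command" script shipped with the python.org installer is the cleaner long-term fix):

```python
import ssl

# Replace the default HTTPS context with an unverified one so that
# subsequent urllib-based downloads skip certificate verification.
try:
    unverified_context = ssl._create_unverified_context
except AttributeError:
    pass  # older Python builds lack this hook; nothing to patch
else:
    ssl._create_default_https_context = unverified_context

# After this, nltk.download("stopwords") should no longer fail with
# CERTIFICATE_VERIFY_FAILED on a machine with a broken certificate chain.
```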