
NLTK's `stopwords` requires the stopwords to be first downloaded via the NLTK Data installer. This is a one-time setup, after which you will be able to freely use `from nltk.corpus import stopwords`. #3107 · Killpit opened this issue Jan 18, 2023 · 6 comments
          NLTK's `stopwords` requires the stopwords to be first downloaded via the NLTK Data installer. This is a one-time setup, after which you will be able to freely use `from nltk.corpus import stopwords`.

To download the stopwords, open the Python interpreter by running `python` in your terminal of choice, and type:

```python
>>> import nltk
>>> nltk.download("stopwords")
```

Afterwards, you're good to go!

Originally posted by @tomaarsen in #3063 (comment)
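If you want the download to happen automatically from a script, but only once, a small guard like the following works (a minimal sketch; `ensure_stopwords` is a hypothetical helper name, not an NLTK API):

```python
import nltk

def ensure_stopwords() -> bool:
    """Download the stopwords corpus only if it is not already installed."""
    try:
        # Raises LookupError when the corpus is missing from nltk.data.path
        nltk.data.find("corpora/stopwords")
        return True  # already present locally, nothing to download
    except LookupError:
        # One-time download into the default nltk_data directory
        return nltk.download("stopwords")

ensure_stopwords()
```

On later runs `nltk.data.find` succeeds immediately, so no network access is needed.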

Here is my code (note: markdown had swallowed the asterisks in the regex patterns; they are restored below, and `print(data.info)` is corrected to `print(data.info())`):

```python
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv("/Users/atatekeli/PycharmProjects/NetflixRecm/netflixData.csv")
print(data.head())
print(data.info())
print(data.isnull().sum())

data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())

data = data.dropna()

import nltk
import re
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)        # bracketed notes
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)         # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub('\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)       # words containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Title"] = data["Title"].apply(clean)

print(data.Title.sample(10))
```
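For reference, the cleaning steps above can be exercised in isolation without NLTK (a simplified sketch: a tiny hard-coded stopword set stands in for `stopwords.words('english')`, and the stemming step is omitted; the sample string is hypothetical):

```python
import re
import string

# Stand-in for NLTK's English stopword list (illustrative only)
STOPWORDS = {"the", "a", "an", "and", "of", "in"}

def clean(text: str) -> str:
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                    # bracketed notes
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # URLs
    text = re.sub(r"<.*?>+", "", text)                     # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\w*\d\w*", "", text)                   # words with digits
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(clean("The <b>Crown</b> (2016) and a https://example.com drama"))
# → crown drama
```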

And this is the error message:

```
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>
Traceback (most recent call last):
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 84, in __load
    root = nltk.data.find(f"{self.subdir}/{zip_name}")
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords.zip/stopwords/

  Searched in:
    - '/Users/atatekeli/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/netflix_recm.py", line 22, in <module>
    stopword=set(stopwords.words('english'))
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 121, in __getattr__
    self.__load()
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 86, in __load
    raise e
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/corpus/util.py", line 81, in __load
    root = nltk.data.find(f"{self.subdir}/{self.__name}")
  File "/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/stopwords

  Searched in:
    - '/Users/atatekeli/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/share/nltk_data'
    - '/Users/atatekeli/PycharmProjects/NetflixRecm/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Process finished with exit code 1
```
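As an aside, the "Searched in:" list mirrors `nltk.data.path`, the ordered list of directories NLTK scans for corpora. You can inspect it, or append a custom location such as a project-local folder (the appended path below is hypothetical, for illustration):

```python
import nltk

# Directories NLTK searches for data, in order
print(nltk.data.path)

# Add a project-local data directory to the end of the search path
nltk.data.path.append("/Users/atatekeli/PycharmProjects/NetflixRecm/nltk_data")
```

A corpus unpacked into any of these directories is found without calling `nltk.download` at all.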

This seems to be the problem:

```
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1129)>
```

I haven't seen this before. Perhaps the help from this comment (or other comments in the thread) will resolve your issue:
https://stackoverflow.com/a/45018725/17936326. Presumably you are using Python 3.6? There are certainly some useful tips in that thread, I think.
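The core of the workaround in that Stack Overflow answer is to make urllib skip certificate verification before downloading, roughly like this (a stopgap only; on macOS, running the "Install Certificates.command" script shipped with the python.org installer is the cleaner long-term fix):

```python
import ssl

# Replace the default HTTPS context with an unverified one so that
# subsequent urllib-based downloads skip certificate verification.
try:
    unverified_context = ssl._create_unverified_context
except AttributeError:
    pass  # older Python builds lack this hook; nothing to patch
else:
    ssl._create_default_https_context = unverified_context

# After this, nltk.download("stopwords") should no longer fail with
# CERTIFICATE_VERIFY_FAILED on a machine with a broken certificate chain.
```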