
Hey there folks. First-time user of Streamlit and I’m loving it. It’s also my first time trying to deploy to Streamlit Cloud. Locally everything works fine, since the app can download the NLTK stopwords dataset, but it doesn’t seem to have permission to do so in the VM.

Here’s the error:

[05:54:32] 🐍 Python dependencies were installed from /mount/src/streamlit_llamadocs_chat/requirements.txt using pip.
Check if streamlit is installed
Streamlit is already installed
[05:54:34] 📦 Processed dependencies!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/appuser/nltk_data...[2024-02-15 05:54:43.125453] 
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/adminuser/venv/lib/python3.10/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
2024-02-15 05:54:43.763 Uncaught app exception
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 60, in __init__
    nltk.data.find("corpora/stopwords", paths=[self._nltk_data_dir])
  File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/_static/nltk_cache'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/mount/src/streamlit_llamadocs_chat/main.py", line 8, in <module>
    from llama_index.core import VectorStoreIndex
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/__init__.py", line 8, in <module>
    from llama_index.core.base.response.schema import Response
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/base/response/schema.py", line 7, in <module>
    from llama_index.core.schema import NodeWithScore
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/schema.py", line 14, in <module>
    from llama_index.core.utils import SAMPLE_TEXT, truncate_text
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 89, in <module>
    globals_helper = GlobalsHelper()
  File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 62, in __init__
    nltk.download("stopwords", download_dir=self._nltk_data_dir)
  File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 777, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 642, in incr_download
    yield from self._download_package(info, download_dir, force)
  File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 701, in _download_package
    os.makedirs(os.path.join(download_dir, info.subdir))
  File "/usr/local/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/_static/nltk_cache/corpora'
[05:54:43] ❗️

Here’s my app: https://llamadocschat.streamlit.app/
Here’s the source code: github.com/amnotme/streamlit_llamadocs_chat (main.py)
Python: 3.10
I’m using llama-index v0.10.3, which requires NLTK 3.8.1, so I can’t downgrade.

Any help is welcome. :slight_smile:

I think the application is set up correctly otherwise. It’s better to pass the package name in single quotes, as follows:

nltk.download('stopwords')

Happy Streamlit-ing :balloon:

@Guna_Sekhar_Venkata Unfortunately it didn’t work. I redeployed with the suggested change and I still get the permission error.

PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora'

app: https://llamachatdocs.streamlit.app/
repo: github.com/amnotme/streamlit_llamadocs_chat (main.py)
python version: 3.9

any other suggestions :smiley:

[11:29:12] 🐍 Python dependencies were installed from /mount/src/streamlit_llamadocs_chat/requirements.txt using pip.
Check if streamlit is installed
Streamlit is already installed
[11:29:13] 📦 Processed dependencies!
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/adminuser/venv/lib/python3.9/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
2024-02-15 11:29:24.810 Uncaught app exception
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 60, in __init__
    nltk.data.find("corpora/stopwords", paths=[self._nltk_data_dir])
  File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/mount/src/streamlit_llamadocs_chat/main.py", line 8, in <module>
    from llama_index.core import VectorStoreIndex
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/__init__.py", line 8, in <module>
    from llama_index.core.base.response.schema import Response
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/base/response/schema.py", line 7, in <module>
    from llama_index.core.schema import NodeWithScore
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/schema.py", line 14, in <module>
    from llama_index.core.utils import SAMPLE_TEXT, truncate_text
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 89, in <module>
    globals_helper = GlobalsHelper()
  File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 62, in __init__
    nltk.download("stopwords", download_dir=self._nltk_data_dir)
  File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 777, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 642, in incr_download
    yield from self._download_package(info, download_dir, force)
  File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 701, in _download_package
    os.makedirs(os.path.join(download_dir, info.subdir))
  File "/usr/local/lib/python3.9/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora'
[11:29:24] ❗️

Please fix this: it’s not under userland control, it’s done lazily by the library on first run, so we can’t do anything about it. And Streamlit shouldn’t restrict apps from doing file operations, in my opinion; that severely limits things from a UX perspective.

Are you part of the team, @Guna_Sekhar_Venkata?

Why not create proper jails and chown them to the user running the code?

I see that the problem comes down to an SDK module attempting to set a download path on its own.

llama_index.core.utils.GlobalsHelper sets the download path here if there is no NLTK_DATA environment variable:

class GlobalsHelper:
    """Helper to retrieve globals.

    Helpful for global caching of certain variables that can be expensive to load.
    (e.g. tokenization)
    """

    _stopwords: Optional[List[str]] = None
    _nltk_data_dir: Optional[str] = None

    def __init__(self) -> None:
        """Initialize NLTK stopwords and punkt."""
        import nltk

        self._nltk_data_dir = os.environ.get(
            "NLTK_DATA",
            os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
                "_static/nltk_cache",
            ),
        )
You can set the environment variable programmatically, or simply add it to secrets.toml via the app’s Manage app menu; I set it through the latter.
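If you go the programmatic route instead, a minimal sketch looks like the following (the directory name is my own choice; any path the app can write to works). The important part is that it runs before llama_index is first imported, since GlobalsHelper reads NLTK_DATA at import time:

```python
import os

# Assumed path: any writable directory inside the app works.
nltk_data_dir = "./resources/nltk_data_dir/"
os.makedirs(nltk_data_dir, exist_ok=True)

# Must happen BEFORE llama_index is imported (directly or indirectly),
# because GlobalsHelper reads NLTK_DATA when the module loads.
os.environ["NLTK_DATA"] = nltk_data_dir
```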

Then you’ll need to point all of your downloads there:

import os
import nltk

# Use a writable directory inside the app for all NLTK data.
nltk_data_dir = "./resources/nltk_data_dir/"
if not os.path.exists(nltk_data_dir):
    os.makedirs(nltk_data_dir, exist_ok=True)

# Make this the only place NLTK searches, then download into it.
nltk.data.path.clear()
nltk.data.path.append(nltk_data_dir)
nltk.download("stopwords", download_dir=nltk_data_dir)
nltk.download("punkt", download_dir=nltk_data_dir)

This solved THIS issue, but once the app is up and running, attempting to cache a resource means writing to the filesystem, and that breaks as well:

File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 310, in <module>
    main_chat_functionality()
File "/mount/src/streamlit_llamadocs_chat/main.py", line 286, in main_chat_functionality
    index = get_index(api_key=st.session_state.openai_key)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 212, in wrapper
    return cached_func(*args, **kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 241, in __call__
    return self._get_or_create_cached_value(args, kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 268, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 324, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 104, in get_index
    return VectorStoreIndex.from_vector_store(vector_store=vector_store)
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 103, in from_vector_store
    return cls(
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 74, in __init__
    super().__init__(
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 99, in __init__
    or transformations_from_settings_or_context(Settings, service_context)
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 316, in transformations_from_settings_or_context
    return settings.transformations
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 243, in transformations
    self._transformations = [self.node_parser]
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 144, in node_parser
    self._node_parser = SentenceSplitter()
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/node_parser/text/sentence.py", line 91, in __init__
    self._tokenizer = tokenizer or get_tokenizer()
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 129, in get_tokenizer
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/model.py", line 101, in encoding_for_model
    return get_encoding(encoding_name_for_model(model_name))
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/registry.py", line 73, in get_encoding
    enc = Encoding(**constructor())
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken_ext/openai_public.py", line 72, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/load.py", line 147, in load_tiktoken_bpe
    contents = read_file_cached(tiktoken_bpe_file, expected_hash)
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/load.py", line 74, in read_file_cached
    with open(tmp_filename, "wb") as f:

Sooo… still looking, but at least NLTK is up and running.

The last part for me was to set a caching directory for the tiktoken module. Fortunately there is an environment variable I could set for that as well:

"TIKTOKEN_CACHE_DIR"

App is up and running

Sure thing. The secrets.toml file is populated via the app’s advanced settings. You can do this just before deploying the app or after deploying it.

You’ll see the Settings gear icon once you click on the three dots.

You will then see the Secrets menu item, which opens an editor for you to add your secrets. These are effectively your environment variables, set at runtime.

I don’t use the secrets.toml file locally, since I use a .env file with the dotenv module to load them.

Add your secrets as follows

ONE_API="thisIsTheSecret"
ANOTHER_ENV_VAR="thisIsTheOtherSecret"
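At runtime those root-level entries can then be read like ordinary environment variables, or through st.secrets inside the app. A sketch, simulating the injected value (the key name matches the example above):

```python
import os

# Simulating what the platform does: root-level secrets are injected as
# environment variables when the app starts.
os.environ["ONE_API"] = "thisIsTheSecret"

# Inside the app either style works; here, the plain environment lookup
# (st.secrets["ONE_API"] would be the Streamlit-native way):
one_api = os.environ["ONE_API"]
```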
              

Sure. So that variable should be the directory where you want your NLTK data.

Please make sure the folder is writable. I just added mine directly where the app lives, under resources:

NLTK_DATA="./resources/nltk_data_dir/"
              

Thanks! For some reason, after implementing these changes and running my app for a little while, I’m now getting this error from llama-index:

PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora'

LookupError:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache'