Hey there folks! First-time Streamlit user and I’m loving it. It’s also my first time deploying to Streamlit Cloud. Locally everything works fine, since the app can download the nltk stopwords dataset, but it doesn’t seem to have permission to do so in the Cloud VM.
Here’s the error:
[05:54:32] 🐍 Python dependencies were installed from /mount/src/streamlit_llamadocs_chat/requirements.txt using pip.
Check if streamlit is installed
Streamlit is already installed
[05:54:34] 📦 Processed dependencies!
[nltk_data] Downloading package stopwords to
[nltk_data] /home/appuser/nltk_data...[2024-02-15 05:54:43.125453]
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package stopwords to
[nltk_data] /home/adminuser/venv/lib/python3.10/site-
[nltk_data] packages/llama_index/core/_static/nltk_cache...
2024-02-15 05:54:43.763 Uncaught app exception
Traceback (most recent call last):
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 60, in __init__
nltk.data.find("corpora/stopwords", paths=[self._nltk_data_dir])
File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/stopwords
Searched in:
- '/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/_static/nltk_cache'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
exec(code, module.__dict__)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 8, in <module>
from llama_index.core import VectorStoreIndex
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/__init__.py", line 8, in <module>
from llama_index.core.base.response.schema import Response
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/base/response/schema.py", line 7, in <module>
from llama_index.core.schema import NodeWithScore
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/schema.py", line 14, in <module>
from llama_index.core.utils import SAMPLE_TEXT, truncate_text
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 89, in <module>
globals_helper = GlobalsHelper()
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 62, in __init__
nltk.download("stopwords", download_dir=self._nltk_data_dir)
File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 777, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 642, in incr_download
yield from self._download_package(info, download_dir, force)
File "/home/adminuser/venv/lib/python3.10/site-packages/nltk/downloader.py", line 701, in _download_package
os.makedirs(os.path.join(download_dir, info.subdir))
File "/usr/local/lib/python3.10/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/_static/nltk_cache/corpora'
[05:54:43] ❗️
Here’s my app: https://llamadocschat.streamlit.app/
Here’s the source code: streamlit_llamadocs_chat/main.py at main · amnotme/streamlit_llamadocs_chat · GitHub
Python: 3.10
I’m using llama-index v0.10.3, which requires nltk 3.8.1, so I can’t downgrade.
Any help is welcome. 
I think the application is running perfectly. It’s better to pass the package name in single quotes, like this:
nltk.download('stopwords')
Happy Streamlit-ing 
@Guna_Sekhar_Venkata Unfortunately, it didn’t work. I redeployed with the suggested change and I still get the permission error.
PermissionError: [Errno 13] Permission denied: ‘/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora’
app: https://llamachatdocs.streamlit.app/
repo: streamlit_llamadocs_chat/main.py at main · amnotme/streamlit_llamadocs_chat · GitHub
python version: 3.9
Any other suggestions?
[11:29:12] 🐍 Python dependencies were installed from /mount/src/streamlit_llamadocs_chat/requirements.txt using pip.
Check if streamlit is installed
Streamlit is already installed
[11:29:13] 📦 Processed dependencies!
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package stopwords to
[nltk_data] /home/adminuser/venv/lib/python3.9/site-
[nltk_data] packages/llama_index/core/_static/nltk_cache...
2024-02-15 11:29:24.810 Uncaught app exception
Traceback (most recent call last):
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 60, in __init__
nltk.data.find("corpora/stopwords", paths=[self._nltk_data_dir])
File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/stopwords
Searched in:
- '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache'
**********************************************************************
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/adminuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
exec(code, module.__dict__)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 8, in <module>
from llama_index.core import VectorStoreIndex
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/__init__.py", line 8, in <module>
from llama_index.core.base.response.schema import Response
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/base/response/schema.py", line 7, in <module>
from llama_index.core.schema import NodeWithScore
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/schema.py", line 14, in <module>
from llama_index.core.utils import SAMPLE_TEXT, truncate_text
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 89, in <module>
globals_helper = GlobalsHelper()
File "/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/utils.py", line 62, in __init__
nltk.download("stopwords", download_dir=self._nltk_data_dir)
File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 777, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 642, in incr_download
yield from self._download_package(info, download_dir, force)
File "/home/adminuser/venv/lib/python3.9/site-packages/nltk/downloader.py", line 701, in _download_package
os.makedirs(os.path.join(download_dir, info.subdir))
File "/usr/local/lib/python3.9/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora'
[11:29:24] ❗️
Please fix this: it isn’t under userland control; the download is triggered lazily by the library on first run, so we can’t do anything about it. And Streamlit shouldn’t restrict apps from doing file operations, in my opinion; that severely limits things from a UX perspective.
Are you part of the team, @Guna_Sekhar_Venkata?
Why not create proper jails and chown them to the user running the code?
I see that the problem comes down to a module attempting to set a download path from an SDK module: llama_index.core.utils.GlobalsHelper sets the download path to a directory inside the installed package if there is no NLTK_DATA environment variable set:
class GlobalsHelper:
    """Helper to retrieve globals.

    Helpful for global caching of certain variables that can be expensive to load.
    (e.g. tokenization)
    """

    _stopwords: Optional[List[str]] = None
    _nltk_data_dir: Optional[str] = None

    def __init__(self) -> None:
        """Initialize NLTK stopwords and punkt."""
        import nltk

        self._nltk_data_dir = os.environ.get(
            "NLTK_DATA",
            os.path.join(
                os.path.dirname(os.path.abspath(__file__)),
                "_static/nltk_cache",
            ),
        )
You can set the environment variable programmatically, or simply add it to secrets.toml via the Manage app menu; I went with the latter. Then you’ll need to point all of your downloads there:
import os
import nltk

# Use a writable directory inside the app repo for all NLTK data.
nltk_data_dir = "./resources/nltk_data_dir/"
if not os.path.exists(nltk_data_dir):
    os.makedirs(nltk_data_dir, exist_ok=True)

# Make NLTK search only that directory, then download into it.
nltk.data.path.clear()
nltk.data.path.append(nltk_data_dir)
nltk.download("stopwords", download_dir=nltk_data_dir)
nltk.download("punkt", download_dir=nltk_data_dir)
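For completeness, here is a minimal sketch of the programmatic route (instead of secrets.toml), assuming the same writable directory; since GlobalsHelper only falls back to the read-only package path when NLTK_DATA is unset, this has to run before llama_index is ever imported:

```python
import os

# Hypothetical writable directory inside the repo; create it if missing.
nltk_data_dir = "./resources/nltk_data_dir/"
os.makedirs(nltk_data_dir, exist_ok=True)

# GlobalsHelper reads NLTK_DATA at import time, so set it
# before any `import llama_index` statement runs.
os.environ["NLTK_DATA"] = nltk_data_dir
```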
This has solved THIS issue, but once the app is up and running and one attempts to cache a resource… well, you need to write to the filesystem, and that breaks as well:
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
exec(code, module.__dict__)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 310, in <module>
main_chat_functionality()
File "/mount/src/streamlit_llamadocs_chat/main.py", line 286, in main_chat_functionality
index = get_index(api_key=st.session_state.openai_key)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 241, in __call__
return self._get_or_create_cached_value(args, kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 268, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/home/adminuser/venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 324, in _handle_cache_miss
computed_value = self._info.func(*func_args, **func_kwargs)
File "/mount/src/streamlit_llamadocs_chat/main.py", line 104, in get_index
return VectorStoreIndex.from_vector_store(vector_store=vector_store)
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 103, in from_vector_store
return cls(
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/vector_store/base.py", line 74, in __init__
super().__init__(
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 99, in __init__
or transformations_from_settings_or_context(Settings, service_context)
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 316, in transformations_from_settings_or_context
return settings.transformations
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 243, in transformations
self._transformations = [self.node_parser]
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/settings.py", line 144, in node_parser
self._node_parser = SentenceSplitter()
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/node_parser/text/sentence.py", line 91, in __init__
self._tokenizer = tokenizer or get_tokenizer()
File "/home/adminuser/venv/lib/python3.10/site-packages/llama_index/core/utils.py", line 129, in get_tokenizer
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/model.py", line 101, in encoding_for_model
return get_encoding(encoding_name_for_model(model_name))
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/registry.py", line 73, in get_encoding
enc = Encoding(**constructor())
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken_ext/openai_public.py", line 72, in cl100k_base
mergeable_ranks = load_tiktoken_bpe(
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/load.py", line 147, in load_tiktoken_bpe
contents = read_file_cached(tiktoken_bpe_file, expected_hash)
File "/home/adminuser/venv/lib/python3.10/site-packages/tiktoken/load.py", line 74, in read_file_cached
with open(tmp_filename, "wb") as f:
Sooo… still looking, but at least nltk is up and running.
The last part for me was to set a caching directory for the tiktoken module as well. Fortunately there is an environment variable I could set for that too:
"TIKTOKEN_CACHE_DIR"
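A sketch of how I set it, before anything imports tiktoken (the directory name here is just an example):

```python
import os

# Hypothetical writable cache location; tiktoken checks TIKTOKEN_CACHE_DIR
# when deciding where to cache downloaded BPE files.
tiktoken_cache_dir = "./resources/tiktoken_cache/"
os.makedirs(tiktoken_cache_dir, exist_ok=True)
os.environ["TIKTOKEN_CACHE_DIR"] = tiktoken_cache_dir
```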
App is up and running
Sure thing. The secrets.toml file is populated via the app’s advanced settings; you can do this just before deploying the app or after it’s deployed.
Click the three dots and you’ll see the Settings gear icon.
You will then see the Secrets menu item, which displays
an editor for you to add your secrets. THESE are effectively your environment variables at runtime.
I don’t use the secrets.toml file locally, since I use a .env file with the dotenv module to load them.
Add your secrets as follows:
ONE_API="thisIsTheSecret"
ANOTHER_ENV_VAR="thisIsTheOtherSecret"
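In the app those entries can then be read like ordinary environment variables (ONE_API here is just the placeholder name from above); as far as I can tell, Streamlit Community Cloud also exposes secrets through os.environ, so a sketch that works both locally and on Cloud might look like:

```python
import os

try:
    # Local dev: pull variables from a .env file if python-dotenv is available.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # on Streamlit Cloud the secrets are already in the environment

api_key = os.environ.get("ONE_API")  # None if the secret/env var isn't set
```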
Sure. That variable should be the directory where you want your nltk data.
Please make sure the folder is writeable; I just added mine directly where the app lives, under resources.
NLTK_DATA="./resources/nltk_data_dir/"
Thanks! For some reason, after implementing these changes and running my app for a little bit, I’m now getting this error…
Permission denied: ‘/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache/corpora’
LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
For more information see: NLTK :: Installing NLTK Data
Attempted to load corpora/stopwords
Searched in:
- '/home/adminuser/venv/lib/python3.9/site-packages/llama_index/core/_static/nltk_cache'