I'm trying to run 'lda2vec', which calls 'chainer', on my AWS g2.2xlarge Ubuntu 14.04 LTS instance.
I'm getting this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-4-bccd002b2788> in <module>()
21 gpu_id = int(os.getenv('CUDA_GPU', 0))
---> 22 cuda.get_device(gpu_id).use()
23 print "Using GPU " + str(gpu_id)
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/cuda.pyc in get_device(*args)
220 for arg in args:
221 if type(arg) in _integer_types:
--> 222 check_cuda_available()
223 return Device(arg)
224 if isinstance(arg, ndarray):
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/cuda.pyc in check_cuda_available()
85 '(see https://github.com/pfnet/chainer#installation).')
86 msg += str(_resolution_error)
---> 87 raise RuntimeError(msg)
88 if (not cudnn_enabled and
89 not _cudnn_disabled_by_user and
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/pfnet/chainer#installation).CuPy is not correctly installed. Please check your environment, uninstall Chainer and reinstall it with `pip install chainer --no-cache-dir -vvvv`.
original error: libcublas.so.7.0: cannot open shared object file: No such file or directory
I installed 'chainer' from source:
In [9]: import chainer
In [10]: chainer.__version__
Out[10]: '1.22.0'
TensorFlow runs and uses the GPU (so we know CUDA 7.5 is working):
In [1]: import tensorflow
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
NVIDIA graphics driver: nvidia-graphics-drivers-367_367.57.orig.tar
cuda 7.5: cudnn-7.5-linux-x64-v5.1.tgz
ubuntu@ip-10-0-1-127:~$ ls -l /usr/local/cuda-7.0/lib64/libcublas.so.7.0*
lrwxrwxrwx 1 root root 19 Mar 27 16:01 /usr/local/cuda-7.0/lib64/libcublas.so.7.0 -> libcublas.so.7.0.28
-rwxr-xr-x 1 root root 31160168 Mar 27 16:02 /usr/local/cuda-7.0/lib64/libcublas.so.7.0.28
ubuntu@ip-10-0-1-127:~$ ls -l /usr/local/cuda-7.0/lib64/
total 723244
-rw-r--r-- 1 root root 26032916 Mar 27 16:01 libcublas_device.a
lrwxrwxrwx 1 root root 16 Mar 27 16:01 libcublas.so -> libcublas.so.7.0
lrwxrwxrwx 1 root root 19 Mar 27 16:01 libcublas.so.7.0 -> libcublas.so.7.0.28
-rwxr-xr-x 1 root root 31160168 Mar 27 16:02 libcublas.so.7.0.28
-rw-r--r-- 1 root root 35269768 Mar 27 16:02 libcublas_static.a
-rw-r--r-- 1 root root 310328 Mar 27 16:01 libcudadevrt.a
lrwxrwxrwx 1 root root 16 Mar 27 16:02 libcudart.so -> libcudart.so.7.0
lrwxrwxrwx 1 root root 19 Mar 27 16:01 libcudart.so.7.0 -> libcudart.so.7.0.28
-rwxr-xr-x 1 root root 377896 Mar 27 16:01 libcudart.so.7.0.28
-rw-r--r-- 1 root root 708938 Mar 27 16:02 libcudart_static.a
lrwxrwxrwx 1 root root 15 Mar 27 16:01 libcufft.so -> libcufft.so.7.0
lrwxrwxrwx 1 root root 18 Mar 27 16:01 libcufft.so.7.0 -> libcufft.so.7.0.28
-rwxr-xr-x 1 root root 62554648 Mar 27 16:01 libcufft.so.7.0.28
-rw-r--r-- 1 root root 91554528 Mar 27 16:02 libcufft_static.a
lrwxrwxrwx 1 root root 16 Mar 27 16:02 libcufftw.so -> libcufftw.so.7.0
lrwxrwxrwx 1 root root 19 Mar 27 16:02 libcufftw.so.7.0 -> libcufftw.so.7.0.28
-rwxr-xr-x 1 root root 443648 Mar 27 16:01 libcufftw.so.7.0.28
-rw-r--r-- 1 root root 44288 Mar 27 16:01 libcufftw_static.a
lrwxrwxrwx 1 root root 17 Mar 27 16:01 libcuinj64.so -> libcuinj64.so.7.0
lrwxrwxrwx 1 root root 20 Mar 27 16:01 libcuinj64.so.7.0 -> libcuinj64.so.7.0.28
-rwxr-xr-x 1 root root 5276064 Mar 27 16:01 libcuinj64.so.7.0.28
-rw-r--r-- 1 root root 1649726 Mar 27 16:01 libculibos.a
lrwxrwxrwx 1 root root 16 Mar 27 16:01 libcurand.so -> libcurand.so.7.0
lrwxrwxrwx 1 root root 19 Mar 27 16:02 libcurand.so.7.0 -> libcurand.so.7.0.28
-rwxr-xr-x 1 root root 51746464 Mar 27 16:01 libcurand.so.7.0.28
-rw-r--r-- 1 root root 51981092 Mar 27 16:02 libcurand_static.a
lrwxrwxrwx 1 root root 18 Mar 27 16:01 libcusolver.so -> libcusolver.so.7.0
lrwxrwxrwx 1 root root 21 Mar 27 16:01 libcusolver.so.7.0 -> libcusolver.so.7.0.28
-rwxr-xr-x 1 root root 41170544 Mar 27 16:01 libcusolver.so.7.0.28
-rw-r--r-- 1 root root 17900430 Mar 27 16:01 libcusolver_static.a
lrwxrwxrwx 1 root root 18 Mar 27 16:01 libcusparse.so -> libcusparse.so.7.0
lrwxrwxrwx 1 root root 21 Mar 27 16:01 libcusparse.so.7.0 -> libcusparse.so.7.0.28
-rwxr-xr-x 1 root root 43486576 Mar 27 16:01 libcusparse.so.7.0.28
-rw-r--r-- 1 root root 51094980 Mar 27 16:01 libcusparse_static.a
lrwxrwxrwx 1 root root 14 Mar 27 16:01 libnppc.so -> libnppc.so.7.0
lrwxrwxrwx 1 root root 17 Mar 27 16:02 libnppc.so.7.0 -> libnppc.so.7.0.28
-rwxr-xr-x 1 root root 418944 Mar 27 16:02 libnppc.so.7.0.28
-rw-r--r-- 1 root root 20568 Mar 27 16:01 libnppc_static.a
lrwxrwxrwx 1 root root 14 Mar 27 16:02 libnppi.so -> libnppi.so.7.0
lrwxrwxrwx 1 root root 17 Mar 27 16:02 libnppi.so.7.0 -> libnppi.so.7.0.28
-rwxr-xr-x 1 root root 70595696 Mar 27 16:01 libnppi.so.7.0.28
-rw-r--r-- 1 root root 99608960 Mar 27 16:01 libnppi_static.a
lrwxrwxrwx 1 root root 14 Mar 27 16:02 libnpps.so -> libnpps.so.7.0
lrwxrwxrwx 1 root root 17 Mar 27 16:01 libnpps.so.7.0 -> libnpps.so.7.0.28
-rwxr-xr-x 1 root root 5851016 Mar 27 16:01 libnpps.so.7.0.28
-rw-r--r-- 1 root root 8461132 Mar 27 16:02 libnpps_static.a
lrwxrwxrwx 1 root root 16 Mar 27 16:02 libnvblas.so -> libnvblas.so.7.0
lrwxrwxrwx 1 root root 19 Mar 27 16:02 libnvblas.so.7.0 -> libnvblas.so.7.0.28
-rwxr-xr-x 1 root root 451984 Mar 27 16:01 libnvblas.so.7.0.28
lrwxrwxrwx 1 root root 24 Mar 27 16:01 libnvrtc-builtins.so -> libnvrtc-builtins.so.7.0
lrwxrwxrwx 1 root root 27 Mar 27 16:01 libnvrtc-builtins.so.7.0 -> libnvrtc-builtins.so.7.0.28
-rwxr-xr-x 1 root root 24079328 Mar 27 16:01 libnvrtc-builtins.so.7.0.28
lrwxrwxrwx 1 root root 15 Mar 27 16:01 libnvrtc.so -> libnvrtc.so.7.0
lrwxrwxrwx 1 root root 18 Mar 27 16:01 libnvrtc.so.7.0 -> libnvrtc.so.7.0.27
-rwxr-xr-x 1 root root 18179472 Mar 27 16:01 libnvrtc.so.7.0.27
lrwxrwxrwx 1 root root 18 Mar 27 16:02 libnvToolsExt.so -> libnvToolsExt.so.1
lrwxrwxrwx 1 root root 22 Mar 27 16:01 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
-rwxr-xr-x 1 root root 37568 Mar 27 16:01 libnvToolsExt.so.1.0.0
-rw-r--r-- 1 root root 25840 Mar 27 16:02 libOpenCL.so
drwxr-xr-x 2 root root 4096 Mar 27 16:02 stubs
ubuntu@ip-10-0-1-127:~$
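The traceback says libcublas.so.7.0 cannot be opened, yet the listing shows the file under /usr/local/cuda-7.0/lib64. That combination usually means the dynamic loader is not searching that directory. Here is a minimal sketch for checking this from Python with the standard library only. The helper names are hypothetical, and the library name and directory are copied from the output above.

```python
import ctypes
import os


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


def dir_in_search_path(directory, env_value=None):
    """Return True if `directory` is listed in LD_LIBRARY_PATH (or in env_value)."""
    if env_value is None:
        env_value = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in env_value.split(os.pathsep) if p]
    return wanted in entries


if __name__ == "__main__":
    # Both values come from the error message and the directory listing above.
    print("loadable: %s" % can_load("libcublas.so.7.0"))
    print("on LD_LIBRARY_PATH: %s" % dir_in_search_path("/usr/local/cuda-7.0/lib64"))
```

The loader also consults the ldconfig cache, so a False from the second check does not by itself prove the library is unreachable. The first check is the one that mirrors what the import actually does.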
Is this going to be a problem?
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
OSError Traceback (most recent call last)
in ()
102 t0 = time.time()
103 optimizer.zero_grads()
--> 104 l = model.fit_partial(s.copy(), a.copy(), f.copy())
105 prior = model.prior()
106 loss = prior * fraction
/hdfs/lda2vec/examples/hacker_news/lda2vec/lda2vec_model.pyc in fit_partial(self, rsty_ids, raut_ids, rwrd_ids, window)
40 sty_ids, aut_ids, wrd_ids = move(self.xp, rsty_ids, raut_ids, rwrd_ids)
41 pivot_idx = next(move(self.xp, rwrd_ids[window: -window]))
---> 42 pivot = F.embed_id(pivot_idx, self.sampler.W)
43 sty_at_pivot = rsty_ids[window: -window]
44 aut_at_pivot = raut_ids[window: -window]
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/functions/connection/embed_id.pyc in embed_id(x, W, ignore_label)
110 """
--> 111 return EmbedIDFunction(ignore_label=ignore_label)(x, W)
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/function.pyc in call(self, *inputs)
197 # Forward prop
198 with cuda.get_device(*in_data):
--> 199 outputs = self.forward(in_data)
200 assert type(outputs) == tuple
201 for hook in six.itervalues(hooks):
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/functions/connection/embed_id.pyc in forward(self, inputs)
47 mask[..., None], 0, W.take(xp.where(mask, 0, x), axis=0)),
---> 49 return W.take(x, axis=0),
51 def backward(self, inputs, grad_outputs):
cupy/core/core.pyx in cupy.core.core.ndarray.take (cupy/core/core.cpp:13176)()
cupy/core/core.pyx in cupy.core.core.ndarray.take (cupy/core/core.cpp:13065)()
cupy/core/core.pyx in cupy.core.core._take (cupy/core/core.cpp:67356)()
cupy/core/elementwise.pxi in cupy.core.core.ElementwiseKernel.call (cupy/core/core.cpp:42995)()
cupy/util.pyx in cupy.util.memoize.decorator.ret (cupy/util.cpp:1471)()
cupy/core/elementwise.pxi in cupy.core.core._get_elementwise_kernel (cupy/core/core.cpp:41341)()
cupy/core/elementwise.pxi in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:33972)()
cupy/core/elementwise.pxi in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:33794)()
cupy/core/carray.pxi in cupy.core.core.compile_with_cache (cupy/core/core.cpp:33449)()
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in compile_with_cache(source, options, arch, cache_dir)
123 options += '-m32',
--> 125 env = (arch, options, _get_nvcc_version())
126 if '#include' in source:
127 pp_src = '%s %s' % (env, preprocess(source, options))
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in _get_nvcc_version()
20 if _nvcc_version is None:
21 cmd = ['nvcc', '--version']
---> 22 _nvcc_version = _run_nvcc(cmd, '.')
24 return _nvcc_version
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in _run_nvcc(cmd, cwd)
59 'Check PATH environment variable: '
60 + str(e)
---> 61 raise OSError(msg)
OSError: Failed to run nvcc
command. Check PATH environment variable: [Errno 2] No such file or directory
In [5]:
I tried installing the cuda-toolkit to get nvcc.
$ sudo apt-get install nvidia-cuda-toolkit
It installed the 375 version of the driver and broke CUDA. I need the 367 version.
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: unknown error)
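After a package install swaps the driver, it can help to confirm which driver version the kernel actually has loaded. The sketch below reads /proc/driver/nvidia/version, which the NVIDIA Linux driver exposes while its module is loaded. The regular expression assumes the usual "Kernel Module  <version>" wording of that file, and the function names are hypothetical.

```python
import re

PROC_VERSION_FILE = "/proc/driver/nvidia/version"  # present only while the driver is loaded


def parse_driver_version(text):
    """Extract the version number that follows 'Kernel Module' in the NVRM line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def loaded_driver_version(path=PROC_VERSION_FILE):
    """Return the loaded driver version, or None if the file cannot be read."""
    try:
        with open(path) as handle:
            return parse_driver_version(handle.read())
    except IOError:
        return None


if __name__ == "__main__":
    print("loaded NVIDIA driver: %s" % loaded_driver_version())
```

A result of None means either that no NVIDIA module is loaded or that the file format differs from the assumption above.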
How did you install CUDA? Chainer uses the nvcc command, the CUDA compiler that ships with the CUDA toolkit, so nvcc must be on your PATH.
If you are using the Amazon Linux AMI provided by NVIDIA, nvcc is already on PATH, and all you need to do is pip install chainer.
If you are not using it and install CUDA manually, I recommend installing CUDA from the deb package. You then need to add /usr/local/cuda/bin to the PATH environment variable. My fabfile may help you: https://github.com/unnonouno/chainer-vagrant-ec2/blob/master/fabfile.py
I’m using my own AMI. I’ve been having trouble getting the GPU on the g2.2xlarge instance to work.
I had to use the 367 version of the NVIDIA driver to get TensorFlow to work.
nvidia-graphics-drivers-367_367.57.orig.tar
Unfortunately, this package did not install ‘nvcc’. Other attempts to install CUDA installed the 375 version of the driver, which doesn’t work on the g2.2xlarge instance.
I’m trying this now:
cuda_7.5.18_linux.run
from:
https://developer.nvidia.com/cuda-75-downloads-archive
Installation failed:
The NVIDIA CUDA Toolkit provides command-line and graphical
Do you accept the previously read EULA? (accept/decline/quit): accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): y
Do you want to install the OpenGL libraries? ((y)es/(n)o/(q)uit) [ default is yes ]: n
Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]: y
Toolkit location must be an absolute path.
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/ubuntu ]:
Installing the NVIDIA display driver...
The driver installation has failed due to an unknown error. Please consult the driver installation log located at /var/log/nvidia-installer.log.
===========
= Summary =
===========
Driver: Installation Failed
Toolkit: Installation skipped
Samples: Installation skipped
Logfile is /tmp/cuda_install_14884.log
ubuntu@ip-10-0-1-189:~$
ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at
www.nvidia.com.
ubuntu@ip-10-0-1-189:~$ lsmod | grep nvidia
nvidia_drm 14412 0
nvidia_modeset 764387 1 nvidia_drm
nvidia 11488748 1 nvidia_modeset
drm 303102 4 ttm,drm_kms_helper,cirrus,nvidia_drm
ubuntu@ip-10-0-1-189:~$ sudo apt-get --purge remove nvidia*
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package nvidia-graphics-drivers
E: Unable to locate package nvidia-graphics-drivers-367_367.57.orig.tar
E: Couldn't find any package by regex 'nvidia-graphics-drivers-367_367.57.orig.tar'
This is the history of the issue:
http://stackoverflow.com/questions/42984743/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver
TensorFlow still works with the GPU. Any suggestions on how to get only the toolkit with ‘nvcc’?
ubuntu@ip-10-0-1-189:~$ nvidia-smi
Sun Apr 9 16:23:03 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 33C P0 41W / 125W | 3742MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3222 C python 3740MiB |
+-----------------------------------------------------------------------------+
ubuntu@ip-10-0-1-189:~$
ubuntu@ip-10-0-1-189:~/models/neural_gpu$ python neural_gpu_trainer.py --problem=bmul
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
Error appending to /tmp/neural_gpu/log
Creating checkpoint directory /tmp/neural_gpu.
Generating data for bmul.
cut 1.20 lr 0.100 iw 0.80 cr 0.30 nm 64 d0.1000 gn 4.00 layers 2 kw 3 h 4 kh 3 batch 32 noise 0.00
Creating model.
Creating backward pass for the model.
WARNING:tensorflow:Tried to colocate gpu0/gradients/gpu0/Gather_2_grad/Shape with an op target_embedding/read that had a different device: /device:GPU:0 vs /device:CPU:0. Ignoring colocation property.
Created model for gpu 0 in 6.29 s.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Created model. Checkpoint dir /tmp/neural_gpu
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2495 get requests, put_count=2461 evicted_count=1000 eviction_rate=0.406339 and unsatisfied allocation rate=0.454509
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6023 get requests, put_count=4617 evicted_count=1000 eviction_rate=0.216591 and unsatisfied allocation rate=0.403287
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
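One way to get only the toolkit is to rerun the same installer and answer "n" at the driver prompt shown in the earlier log, so the working 367 driver is left alone. A non-interactive variant is sketched below. The --silent, --toolkit and --toolkitpath flags are assumed from NVIDIA's runfile installation guide and should be confirmed against the installer's own help output before use. The function name is hypothetical.

```python
import subprocess
import sys


def toolkit_only_command(runfile, toolkit_path="/usr/local/cuda-7.5"):
    """Build an argv list that asks the runfile for the toolkit and nothing else."""
    return [
        "sudo", "sh", runfile,
        "--silent",                         # assumed flag: no interactive prompts
        "--toolkit",                        # assumed flag: toolkit only, no driver or samples
        "--toolkitpath=%s" % toolkit_path,  # assumed flag: install location
    ]


if __name__ == "__main__":
    command = toolkit_only_command("cuda_7.5.18_linux.run")
    print(" ".join(command))
    # The installer is only launched when this script is called with --run.
    if "--run" in sys.argv:
        subprocess.check_call(command)
```

Printing the command first makes it easy to review before anything touches the system. The default toolkit path matches the installer's own default from the log above.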
I got further … by copying ‘nvcc’ from an NVIDIA AMI into /usr/local/cuda-7.0/bin.
Any ideas?
===============================
nvcc fatal : Path to libdevice library not specified
['nvcc', '-shared', '-O3', '-use_fast_math', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray', '-I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda', '-I/home/ubuntu/anaconda/lib/python2.7/site-packages/numpy/core/include', '-I/home/ubuntu/anaconda/include/python2.7', '-I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/gof', '-o', '/home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/home/ubuntu/anaconda/lib', '-lcublas', '-lpython2.7', '-lcudart']
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -O3 -use_fast_math -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray -I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda -I/home/ubuntu/anaconda/lib/python2.7/site-packages/numpy/core/include -I/home/ubuntu/anaconda/include/python2.7 -I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/gof -o /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray/cuda_ndarray.so mod.cu -L/home/ubuntu/anaconda/lib -lcublas -lpython2.7 -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: cuda unavailable)
In [3]: quit
ubuntu@ip-10-0-1-189:/hdfs/lda2vec/examples/hacker_news/lda2vec$ nvcc -shared -O3 -use_fast_math -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray -I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/sandbox/cuda -I/home/ubuntu/anaconda/lib/python2.7/site-packages/numpy/core/include -I/home/ubuntu/anaconda/include/python2.7 -I/home/ubuntu/anaconda/lib/python2.7/site-packages/theano/gof -o /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.11-64/cuda_ndarray/cuda_ndarray.so mod.cu -L/home/ubuntu/anaconda/lib -lcublas -lpython2.7 -lcudart
nvcc fatal : Path to libdevice library not specified
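The "Path to libdevice library not specified" error suggests that copying the nvcc binary alone is not enough. In a standard toolkit install, nvcc appears to locate libdevice relative to its own directory, through an nvcc.profile file next to the binary and an nvvm/libdevice tree one level up. That layout is an assumption here and is not confirmed by the thread. The sketch below reports which of those pieces are present. The function name is hypothetical.

```python
import os


def toolkit_layout(nvcc_path):
    """Report which files nvcc normally expects around itself are present."""
    bin_dir = os.path.dirname(os.path.abspath(nvcc_path))
    root = os.path.dirname(bin_dir)  # toolkit root, e.g. /usr/local/cuda-7.0
    return {
        "nvcc": os.path.isfile(nvcc_path),
        "nvcc.profile": os.path.isfile(os.path.join(bin_dir, "nvcc.profile")),
        "nvvm/libdevice": os.path.isdir(os.path.join(root, "nvvm", "libdevice")),
        "include/cuda.h": os.path.isfile(os.path.join(root, "include", "cuda.h")),
    }


if __name__ == "__main__":
    # Path taken from the comment above, where nvcc was copied by hand.
    report = toolkit_layout("/usr/local/cuda-7.0/bin/nvcc")
    for name in sorted(report):
        print("%-16s %s" % (name, "found" if report[name] else "MISSING"))
```

If anything other than nvcc itself is reported missing, installing the complete toolkit is likely a safer route than copying more files by hand.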
Created model for gpu 0 in 6.29 s.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Created model. Checkpoint dir /tmp/neural_gpu
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2495 get requests, put_count=2461 evicted_count=1000 eviction_rate=0.406339 and unsatisfied allocation rate=0.454509
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6023 get requests, put_count=4617 evicted_count=1000 eviction_rate=0.216591 and unsatisfied allocation rate=0.403287
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
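The TensorFlow log shows libcublas.so.7.5 being opened, whereas the original Chainer error asked for libcublas.so.7.0. One way to see which versions the dynamic loader can actually resolve is to try opening them with `ctypes`. This is only a diagnostic sketch; the helper name `can_load` is hypothetical and the library names are taken from the logs in this thread.

```python
from __future__ import print_function
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in ("libcublas.so.7.0", "libcublas.so.7.5", "libcudart.so.7.5"):
        status = "loadable" if can_load(name) else "NOT found by the loader"
        print(name, status)
```

If only the 7.5 libraries load, the installed Chainer build was presumably compiled against a different CUDA version than the one present now, which would fit the reinstall hint (`pip install chainer --no-cache-dir -vvvv`) in the error message.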
> On Apr 9, 2017, at 9:04 AM, David Laxer ***@***.***> wrote:
> Installation failed:
> The NVIDIA CUDA Toolkit provides command-line and graphical
> Do you accept the previously read EULA? (accept/decline/quit): accept
> Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): y
> Do you want to install the OpenGL libraries? ((y)es/(n)o/(q)uit) [ default is yes ]: n
> Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
> Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]: y
> Toolkit location must be an absolute path.
> Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
> Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
> Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
> Enter CUDA Samples Location [ default is /home/ubuntu ]:
> Installing the NVIDIA display driver...
> The driver installation has failed due to an unknown error. Please consult the driver installation log located at /var/log/nvidia-installer.log.
> ===========
> = Summary =
> ===========
> Driver: Installation Failed
> Toolkit: Installation skipped
> Samples: Installation skipped
> Logfile is /tmp/cuda_install_14884.log
> ***@***.***:~$
> ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
> ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
> ***@***.***:~$ lsmod | grep nvidia
> nvidia_drm 14412 0
> nvidia_modeset 764387 1 nvidia_drm
> nvidia 11488748 1 nvidia_modeset
> drm 303102 4 ttm,drm_kms_helper,cirrus,nvidia_drm
> ***@***.***:~$ sudo apt-get --purge remove nvidia*
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> E: Unable to locate package nvidia-graphics-drivers
> E: Unable to locate package nvidia-graphics-drivers-367_367.57.orig.tar
> E: Couldn't find any package by regex 'nvidia-graphics-drivers-367_367.57.orig.tar'
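The package names in these apt errors match files in the home directory, which suggests the shell expanded the unquoted `nvidia*` before apt-get ever saw it. The sketch below mimics that expansion in a scratch directory. `shell_style_expand` is a hypothetical helper, and the two file names are recreated from the error output purely for illustration.

```python
from __future__ import print_function
import glob
import os
import tempfile


def shell_style_expand(pattern, cwd):
    """Mimic a POSIX shell expanding an unquoted glob inside `cwd`.

    When nothing matches, the pattern is passed through unchanged,
    which is the default behaviour of bash.
    """
    matches = sorted(glob.glob(os.path.join(cwd, pattern)))
    return [os.path.basename(m) for m in matches] or [pattern]


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    print(shell_style_expand("nvidia*", scratch))  # empty dir: pattern survives

    # Recreate the two names that apt-get complained about.
    os.mkdir(os.path.join(scratch, "nvidia-graphics-drivers"))
    open(os.path.join(scratch, "nvidia-graphics-drivers-367_367.57.orig.tar"), "w").close()
    print(shell_style_expand("nvidia*", scratch))  # now apt-get would see file names
```

Quoting the pattern, as in `sudo apt-get --purge remove 'nvidia*'`, keeps the shell from expanding it, so apt-get receives the wildcard itself.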
>> On Apr 9, 2017, at 8:39 AM, David Laxer ***@***.***> wrote:
>> I’m using my own AMI. I’ve been having trouble getting the GPU on the g2.2xlarge instance to work.
>> I had to use the 367 version of the NVIDIA driver to get TensorFlow to work.
>> nvidia-graphics-drivers-367_367.57.orig.tar
>> Unfortunately, this package did not install ‘nvcc’. Other attempts to install CUDA pulled in the 375 version of the driver, which doesn’t work on the g2.2xlarge instance.
>> I’m trying this now:
>> cuda_7.5.18_linux.run
>> from: https://developer.nvidia.com/cuda-75-downloads-archive
>>> On Apr 9, 2017, at 12:13 AM, Yuya Unno ***@***.***> wrote:
>>> How did you install CUDA? Chainer uses the nvcc command, the CUDA compiler that is provided by the CUDA toolkit, so the PATH environment variable must be set correctly.
>>> If you are using the Amazon Linux AMI provided by NVIDIA, nvcc is already in PATH, and all you need to do is pip install chainer.
>>> If you are not using it and install CUDA manually, I recommend installing CUDA from the deb package. You also need to add /usr/local/cuda/bin to the PATH environment variable. My fabfile may help you: https://github.com/unnonouno/chainer-vagrant-ec2/blob/master/fabfile.py
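The PATH advice above can also be applied from inside a Python session, for example in IPython before Chainer is imported. This is a minimal sketch, assuming the toolkit lives under /usr/local/cuda as mentioned above; `with_cuda_bin` is a hypothetical helper name.

```python
from __future__ import print_function
import os


def with_cuda_bin(path, cuda_bin="/usr/local/cuda/bin"):
    """Return `path` with `cuda_bin` prepended unless it is already listed."""
    parts = [p for p in path.split(os.pathsep) if p]
    if cuda_bin not in parts:
        parts.insert(0, cuda_bin)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Only affects this process and the children it spawns (such as nvcc).
    os.environ["PATH"] = with_cuda_bin(os.environ.get("PATH", ""))
    print(os.environ["PATH"])
```

For a permanent change, the equivalent `export PATH=/usr/local/cuda/bin:$PATH` line would go into the shell profile instead.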
I was told CUDA 8.0 doesn’t work with the g2.2xlarge AWS machine.
I tried running ‘lda2vec’ & chainer on the public AMI:
NVIDIA DIGITS 4.0 on Ubuntu 14.04-6250ff83-f573-4050-80fc-413ac84d044d-ami-94054383.3
ami-d5ce80b5
This works. Chainer can’t detect the cuDNN 6.5 that I installed (cudnn-6.5-linux-x64-v2.tar), but lda2vec appears to run properly.
In [3]: %load lda2vec_run.py
In [4]: for s, a, f in utils.chunks(batchsize, story_id, author_id, flattened):
...: t0 = time.time()
...: optimizer.zero_grads()
...: l = model.fit_partial(s.copy(), a.copy(), f.copy())
...: prior = model.prior()
...: loss = prior * fraction
...: loss.backward()
...: optimizer.update()
...: msg = ("J:{j:05d} E:{epoch:05d} L:{loss:1.3e} "
...: "P:{prior:1.3e} R:{rate:1.3e}")
...: prior.to_cpu()
...: loss.to_cpu()
...: t1 = time.time()
...: dt = t1 - t0
...: rate = batchsize / dt
...: logs = dict(loss=float(l), epoch=epoch, j=j,
...: prior=float(prior.data), rate=rate)
...: print msg.format(**logs)
...: j += 1
...: serializers.save_hdf5("lda2vec.hdf5", model)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-4-baa2abd917df> in <module>()
21 gpu_id = int(os.getenv('CUDA_GPU', 0))
---> 22 cuda.get_device(gpu_id).use()
23 print "Using GPU " + str(gpu_id)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/chainer-2.0.0b1-py2.7.egg/chainer/cuda.pyc in get_device(*args)
168 for arg in args:
169 if type(arg) in _integer_types:
--> 170 check_cuda_available()
171 return Device(arg)
172 if isinstance(arg, ndarray):
/home/ubuntu/anaconda2/lib/python2.7/site-packages/chainer-2.0.0b1-py2.7.egg/chainer/cuda.pyc in check_cuda_available()
73 '(see https://github.com/pfnet/chainer#installation).')
74 msg += str(_resolution_error)
---> 75 raise RuntimeError(msg)
76 if (not cudnn_enabled and
77 not _cudnn_disabled_by_user and
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/pfnet/chainer#installation).No module named cupy
Still not sure how to configure my AMI.
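The traceback above comes from chainer-2.0.0b1, and the final message is a plain Python import failure rather than a missing shared library. As far as I know, CuPy is distributed as a separate package in the Chainer v2 series, so it may simply need to be installed (`pip install cupy`, with nvcc on PATH). A quick check, with the hypothetical helper `module_available`:

```python
from __future__ import print_function
import importlib


def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    for name in ("chainer", "cupy"):
        print(name, "importable" if module_available(name) else "NOT importable")
```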
On Apr 11, 2017, at 1:46 AM, Yuya Unno ***@***.***> wrote:
How about using the latest CUDA 8.0? I recommend using the deb file instead of the run file; I find the run file unstable.
I’m running on AWS EC2 g2.2xlarge spot instances.
Does Chainer support checkpointing, so that I can resume training my models from the point at which the spot instance was preempted?
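The training loop quoted earlier already calls `serializers.save_hdf5("lda2vec.hdf5", model)`, and Chainer also provides `serializers.load_hdf5(filename, obj)`, so a simple resume scheme can be built on those two calls. The sketch below is one possible arrangement, not Chainer's own mechanism: `save_checkpoint` and `resume_if_possible` are hypothetical helpers, and the save and load functions are passed in so the logic does not depend on a particular serializer.

```python
from __future__ import print_function
import os


def save_checkpoint(path, obj, save_fn):
    """Save `obj` via save_fn(filename, obj), writing to a temp file first.

    The rename is atomic on POSIX when both names are on one filesystem,
    so a preempted instance never leaves a half-written checkpoint behind.
    """
    tmp = path + ".tmp"
    save_fn(tmp, obj)
    os.rename(tmp, path)


def resume_if_possible(path, obj, load_fn):
    """Load `obj` in place via load_fn(filename, obj) if a checkpoint exists."""
    if os.path.exists(path):
        load_fn(path, obj)
        return True
    return False


# Intended use with Chainer (assumes `model` and `optimizer` already exist):
#   from chainer import serializers
#   resume_if_possible("lda2vec.hdf5", model, serializers.load_hdf5)
#   resume_if_possible("lda2vec.opt.hdf5", optimizer, serializers.load_hdf5)
#   ... then, every N iterations inside the training loop:
#   save_checkpoint("lda2vec.hdf5", model, serializers.save_hdf5)
#   save_checkpoint("lda2vec.opt.hdf5", optimizer, serializers.save_hdf5)
```

Loop counters such as `epoch` and `j` are not part of the model, so they would need to be stored separately. The checkpoint files should also live on storage that survives the spot instance, for example an attached EBS volume. Chainer's Trainer API with its snapshot extension may be a more complete alternative, but lda2vec's script uses a hand-written loop.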