I'm trying to run 'lda2vec' which calls 'chainer' on my AWS g2.2xlarge Ubuntu 14.04 LTS instance.
I'm getting this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-bccd002b2788> in <module>()
     21 gpu_id = int(os.getenv('CUDA_GPU', 0))
---> 22 cuda.get_device(gpu_id).use()
     23 print "Using GPU " + str(gpu_id)
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/cuda.pyc in get_device(*args)
    220     for arg in args:
    221         if type(arg) in _integer_types:
--> 222             check_cuda_available()
    223             return Device(arg)
    224         if isinstance(arg, ndarray):
/home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/cuda.pyc in check_cuda_available()
     85                '(see https://github.com/pfnet/chainer#installation).')
     86         msg += str(_resolution_error)
---> 87         raise RuntimeError(msg)
     88     if (not cudnn_enabled and
     89             not _cudnn_disabled_by_user and
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/pfnet/chainer#installation).CuPy is not correctly installed. Please check your environment, uninstall Chainer and reinstall it with `pip install chainer --no-cache-dir -vvvv`.
original error: libcublas.so.7.0: cannot open shared object file: No such file or directory

I installed 'chainer' from source:

In [9]: import chainer
In [10]: chainer.__version__
Out[10]: '1.22.0'

TensorFlow runs and uses the GPU, so we know CUDA 7.5 is working:

In [1]: import tensorflow
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally

NVIDIA graphics driver: nvidia-graphics-drivers-367_367.57.orig.tar
cuda 7.5: cudnn-7.5-linux-x64-v5.1.tgz

ubuntu@ip-10-0-1-127:~$ ls -l /usr/local/cuda-7.0/lib64/libcublas.so.7.0*
lrwxrwxrwx 1 root root       19 Mar 27 16:01 /usr/local/cuda-7.0/lib64/libcublas.so.7.0 -> libcublas.so.7.0.28
-rwxr-xr-x 1 root root 31160168 Mar 27 16:02 /usr/local/cuda-7.0/lib64/libcublas.so.7.0.28
ubuntu@ip-10-0-1-127:~$ ls -l /usr/local/cuda-7.0/lib64/
total 723244
-rw-r--r-- 1 root root 26032916 Mar 27 16:01 libcublas_device.a
lrwxrwxrwx 1 root root       16 Mar 27 16:01 libcublas.so -> libcublas.so.7.0
lrwxrwxrwx 1 root root       19 Mar 27 16:01 libcublas.so.7.0 -> libcublas.so.7.0.28
-rwxr-xr-x 1 root root 31160168 Mar 27 16:02 libcublas.so.7.0.28
-rw-r--r-- 1 root root 35269768 Mar 27 16:02 libcublas_static.a
-rw-r--r-- 1 root root   310328 Mar 27 16:01 libcudadevrt.a
lrwxrwxrwx 1 root root       16 Mar 27 16:02 libcudart.so -> libcudart.so.7.0
lrwxrwxrwx 1 root root       19 Mar 27 16:01 libcudart.so.7.0 -> libcudart.so.7.0.28
-rwxr-xr-x 1 root root   377896 Mar 27 16:01 libcudart.so.7.0.28
-rw-r--r-- 1 root root   708938 Mar 27 16:02 libcudart_static.a
lrwxrwxrwx 1 root root       15 Mar 27 16:01 libcufft.so -> libcufft.so.7.0
lrwxrwxrwx 1 root root       18 Mar 27 16:01 libcufft.so.7.0 -> libcufft.so.7.0.28
-rwxr-xr-x 1 root root 62554648 Mar 27 16:01 libcufft.so.7.0.28
-rw-r--r-- 1 root root 91554528 Mar 27 16:02 libcufft_static.a
lrwxrwxrwx 1 root root       16 Mar 27 16:02 libcufftw.so -> libcufftw.so.7.0
lrwxrwxrwx 1 root root       19 Mar 27 16:02 libcufftw.so.7.0 -> libcufftw.so.7.0.28
-rwxr-xr-x 1 root root   443648 Mar 27 16:01 libcufftw.so.7.0.28
-rw-r--r-- 1 root root    44288 Mar 27 16:01 libcufftw_static.a
lrwxrwxrwx 1 root root       17 Mar 27 16:01 libcuinj64.so -> libcuinj64.so.7.0
lrwxrwxrwx 1 root root       20 Mar 27 16:01 libcuinj64.so.7.0 -> libcuinj64.so.7.0.28
-rwxr-xr-x 1 root root  5276064 Mar 27 16:01 libcuinj64.so.7.0.28
-rw-r--r-- 1 root root  1649726 Mar 27 16:01 libculibos.a
lrwxrwxrwx 1 root root       16 Mar 27 16:01 libcurand.so -> libcurand.so.7.0
lrwxrwxrwx 1 root root       19 Mar 27 16:02 libcurand.so.7.0 -> libcurand.so.7.0.28
-rwxr-xr-x 1 root root 51746464 Mar 27 16:01 libcurand.so.7.0.28
-rw-r--r-- 1 root root 51981092 Mar 27 16:02 libcurand_static.a
lrwxrwxrwx 1 root root       18 Mar 27 16:01 libcusolver.so -> libcusolver.so.7.0
lrwxrwxrwx 1 root root       21 Mar 27 16:01 libcusolver.so.7.0 -> libcusolver.so.7.0.28
-rwxr-xr-x 1 root root 41170544 Mar 27 16:01 libcusolver.so.7.0.28
-rw-r--r-- 1 root root 17900430 Mar 27 16:01 libcusolver_static.a
lrwxrwxrwx 1 root root       18 Mar 27 16:01 libcusparse.so -> libcusparse.so.7.0
lrwxrwxrwx 1 root root       21 Mar 27 16:01 libcusparse.so.7.0 -> libcusparse.so.7.0.28
-rwxr-xr-x 1 root root 43486576 Mar 27 16:01 libcusparse.so.7.0.28
-rw-r--r-- 1 root root 51094980 Mar 27 16:01 libcusparse_static.a
lrwxrwxrwx 1 root root       14 Mar 27 16:01 libnppc.so -> libnppc.so.7.0
lrwxrwxrwx 1 root root       17 Mar 27 16:02 libnppc.so.7.0 -> libnppc.so.7.0.28
-rwxr-xr-x 1 root root   418944 Mar 27 16:02 libnppc.so.7.0.28
-rw-r--r-- 1 root root    20568 Mar 27 16:01 libnppc_static.a
lrwxrwxrwx 1 root root       14 Mar 27 16:02 libnppi.so -> libnppi.so.7.0
lrwxrwxrwx 1 root root       17 Mar 27 16:02 libnppi.so.7.0 -> libnppi.so.7.0.28
-rwxr-xr-x 1 root root 70595696 Mar 27 16:01 libnppi.so.7.0.28
-rw-r--r-- 1 root root 99608960 Mar 27 16:01 libnppi_static.a
lrwxrwxrwx 1 root root       14 Mar 27 16:02 libnpps.so -> libnpps.so.7.0
lrwxrwxrwx 1 root root       17 Mar 27 16:01 libnpps.so.7.0 -> libnpps.so.7.0.28
-rwxr-xr-x 1 root root  5851016 Mar 27 16:01 libnpps.so.7.0.28
-rw-r--r-- 1 root root  8461132 Mar 27 16:02 libnpps_static.a
lrwxrwxrwx 1 root root       16 Mar 27 16:02 libnvblas.so -> libnvblas.so.7.0
lrwxrwxrwx 1 root root       19 Mar 27 16:02 libnvblas.so.7.0 -> libnvblas.so.7.0.28
-rwxr-xr-x 1 root root   451984 Mar 27 16:01 libnvblas.so.7.0.28
lrwxrwxrwx 1 root root       24 Mar 27 16:01 libnvrtc-builtins.so -> libnvrtc-builtins.so.7.0
lrwxrwxrwx 1 root root       27 Mar 27 16:01 libnvrtc-builtins.so.7.0 -> libnvrtc-builtins.so.7.0.28
-rwxr-xr-x 1 root root 24079328 Mar 27 16:01 libnvrtc-builtins.so.7.0.28
lrwxrwxrwx 1 root root       15 Mar 27 16:01 libnvrtc.so -> libnvrtc.so.7.0
lrwxrwxrwx 1 root root       18 Mar 27 16:01 libnvrtc.so.7.0 -> libnvrtc.so.7.0.27
-rwxr-xr-x 1 root root 18179472 Mar 27 16:01 libnvrtc.so.7.0.27
lrwxrwxrwx 1 root root       18 Mar 27 16:02 libnvToolsExt.so -> libnvToolsExt.so.1
lrwxrwxrwx 1 root root       22 Mar 27 16:01 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
-rwxr-xr-x 1 root root    37568 Mar 27 16:01 libnvToolsExt.so.1.0.0
-rw-r--r-- 1 root root    25840 Mar 27 16:02 libOpenCL.so
drwxr-xr-x 2 root root     4096 Mar 27 16:02 stubs
ubuntu@ip-10-0-1-127:~$ 
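The listing above shows that libcublas.so.7.0 exists under /usr/local/cuda-7.0/lib64, yet the error says the loader cannot open it. One plausible explanation (an assumption, not confirmed in this thread) is that this directory is not on the dynamic loader's search path, while the CUDA 7.5 libraries that TensorFlow opens are. A minimal sketch to check which versions the loader can find from inside Python; `try_load` is a hypothetical helper name, and the library names are taken from the error message and the TensorFlow log:

```python
import ctypes


def try_load(library_names):
    """Map each shared-library name to None on success, or to the loader's error text."""
    results = {}
    for name in library_names:
        try:
            ctypes.CDLL(name)  # asks the dynamic loader to resolve the name
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


# Names taken from the error message (7.0) and the TensorFlow log (7.5).
for name, error in sorted(try_load(["libcublas.so.7.0", "libcublas.so.7.5"]).items()):
    print("%s: %s" % (name, "ok" if error is None else error))
```

If only the 7.0 name fails, a common remedy is to add /usr/local/cuda-7.0/lib64 to LD_LIBRARY_PATH before starting Python, or to rebuild Chainer against the CUDA version that is already working.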
  • Is this going to be a problem?
  • ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.

    OSError Traceback (most recent call last)
    in ()
    102 t0 = time.time()
    103 optimizer.zero_grads()
    --> 104 l = model.fit_partial(s.copy(), a.copy(), f.copy())
    105 prior = model.prior()
    106 loss = prior * fraction

    /hdfs/lda2vec/examples/hacker_news/lda2vec/lda2vec_model.pyc in fit_partial(self, rsty_ids, raut_ids, rwrd_ids, window)
    40 sty_ids, aut_ids, wrd_ids = move(self.xp, rsty_ids, raut_ids, rwrd_ids)
    41 pivot_idx = next(move(self.xp, rwrd_ids[window: -window]))
    ---> 42 pivot = F.embed_id(pivot_idx, self.sampler.W)
    43 sty_at_pivot = rsty_ids[window: -window]
    44 aut_at_pivot = raut_ids[window: -window]

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/functions/connection/embed_id.pyc in embed_id(x, W, ignore_label)
    110 """
    --> 111 return EmbedIDFunction(ignore_label=ignore_label)(x, W)

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/function.pyc in __call__(self, *inputs)
    197 # Forward prop
    198 with cuda.get_device(*in_data):
    --> 199 outputs = self.forward(in_data)
    200 assert type(outputs) == tuple
    201 for hook in six.itervalues(hooks):

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/chainer/functions/connection/embed_id.pyc in forward(self, inputs)
    47 mask[..., None], 0, W.take(xp.where(mask, 0, x), axis=0)),
    ---> 49 return W.take(x, axis=0),
    51 def backward(self, inputs, grad_outputs):

    cupy/core/core.pyx in cupy.core.core.ndarray.take (cupy/core/core.cpp:13176)()

    cupy/core/core.pyx in cupy.core.core.ndarray.take (cupy/core/core.cpp:13065)()

    cupy/core/core.pyx in cupy.core.core._take (cupy/core/core.cpp:67356)()

    cupy/core/elementwise.pxi in cupy.core.core.ElementwiseKernel.__call__ (cupy/core/core.cpp:42995)()

    cupy/util.pyx in cupy.util.memoize.decorator.ret (cupy/util.cpp:1471)()

    cupy/core/elementwise.pxi in cupy.core.core._get_elementwise_kernel (cupy/core/core.cpp:41341)()

    cupy/core/elementwise.pxi in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:33972)()

    cupy/core/elementwise.pxi in cupy.core.core._get_simple_elementwise_kernel (cupy/core/core.cpp:33794)()

    cupy/core/carray.pxi in cupy.core.core.compile_with_cache (cupy/core/core.cpp:33449)()

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in compile_with_cache(source, options, arch, cache_dir)
    123 options += '-m32',
    --> 125 env = (arch, options, _get_nvcc_version())
    126 if '#include' in source:
    127 pp_src = '%s %s' % (env, preprocess(source, options))

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in _get_nvcc_version()
    20 if _nvcc_version is None:
    21 cmd = ['nvcc', '--version']
    ---> 22 _nvcc_version = _run_nvcc(cmd, '.')
    24 return _nvcc_version

    /home/ubuntu/anaconda/lib/python2.7/site-packages/chainer-1.22.0-py2.7-linux-x86_64.egg/cupy/cuda/compiler.pyc in _run_nvcc(cmd, cwd)
    59 'Check PATH environment variable: '
    60 + str(e)
    ---> 61 raise OSError(msg)

    OSError: Failed to run nvcc command. Check PATH environment variable: [Errno 2] No such file or directory

    In [5]:

    I tried installing the cuda-toolkit to get nvcc.

    $ sudo apt-get install nvidia-cuda-toolkit

    It installed the 375 version of the driver and broke CUDA. I need the 367 version.

    WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available (error: Unable to get the number of gpus available: unknown error)

    How did you install CUDA? Chainer calls the nvcc command, the CUDA compiler that ships with CUDA, so the directory containing nvcc must be on your PATH environment variable.
    If you are using the Amazon Linux AMI provided by NVIDIA, nvcc is already on PATH, and all you need to do is pip install chainer.
    If you are not using that AMI and install CUDA manually, I recommend installing CUDA from the deb package and adding /usr/local/cuda/bin to the PATH environment variable. My fabfile may help you: https://github.com/unnonouno/chainer-vagrant-ec2/blob/master/fabfile.py
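The PATH advice above can be checked from inside the same Python session before Chainer is imported. This is a rough sketch, not part of Chainer: `find_on_path` and `prepend_to_path` are hypothetical helper names, and `/usr/local/cuda/bin` is the location mentioned in the reply, which may differ on your machine.

```python
import os


def find_on_path(name, search_path=None):
    """Return the full path of executable `name`, or None if it is not on the search path."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def prepend_to_path(directory, search_path):
    """Return `search_path` with `directory` placed first, unless it is already listed."""
    parts = [p for p in search_path.split(os.pathsep) if p]
    if directory in parts:
        return search_path
    return os.pathsep.join([directory] + parts)


CUDA_BIN = "/usr/local/cuda/bin"  # assumption: the default location named in the reply

if find_on_path("nvcc") is None:
    # Only affects this process and the subprocesses it starts.
    os.environ["PATH"] = prepend_to_path(CUDA_BIN, os.environ.get("PATH", ""))

print("nvcc: %s" % find_on_path("nvcc"))
```

Changing `os.environ` this way lasts only for the current session. For a permanent fix, export the directory in your shell profile instead.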