
get_data(32,4) raises an error:
NotADirectoryError: [Errno 20] Not a directory: 'data/cifar10/train/0_frog.png'
How do I fix this?

Can you be a bit more specific? Which notebook and location? Any screenshots?

Setup: Are you using a local system (git clone?), Paperspace Fast.ai, Crestle, or some other environment?

Try a few things:

  • !pwd to see the current working directory
  • !ls (or !dir) to see what's in your current working directory. Do you see a folder called data? Then do the same for its subfolders.
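
If you'd rather run that check from Python than from the shell, here is a rough sketch. summarize_dir is a made-up helper name (not part of fastai), and the data/cifar10 paths are only taken from the error message above, so adjust them to your setup.

```python
import os

def summarize_dir(path):
    """Return (sorted subfolder names, number of plain files) for `path`."""
    entries = os.listdir(path)
    subdirs = sorted(e for e in entries if os.path.isdir(os.path.join(path, e)))
    n_files = sum(1 for e in entries if os.path.isfile(os.path.join(path, e)))
    return subdirs, n_files

if __name__ == "__main__":
    print("cwd:", os.getcwd())
    # Assumed locations, based on the path in the error message
    for folder in ("data", "data/cifar10", "data/cifar10/train"):
        if os.path.isdir(folder):
            print(folder, summarize_dir(folder))
        else:
            print(folder, "-> not found")
```

If train shows no subfolders and a large file count, the images are still sitting loose in the folder.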

Seeing this as well. I solved it by making folders for classes in “train” and “test”.

Remember how we did cats and dogs in lesson 1?
train/cats and train/dogs

I wanted to see what classes we had:
cd train && find . | grep -o '[a-z]*\.png' | sort -u && cd ..

We have: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
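
The same class listing can be done in Python if find/grep aren't handy. This is only a sketch: it assumes file names shaped like 0_frog.png (as in the error message), and classes_from_filenames is a hypothetical helper name.

```python
import os

def classes_from_filenames(folder):
    """Collect the label part of names shaped like '<index>_<label>.png'."""
    labels = set()
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".png" and "_" in stem:
            # '0_frog' -> 'frog'
            labels.add(stem.split("_", 1)[1])
    return sorted(labels)
```

Calling classes_from_filenames('train') from the cifar10 folder should give the list of labels found in the file names.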

I made new folders
mkdir train_ test_

I went into one of them to make our classes, created the fn to organize the files, and executed it:

cd train_

mkdir airplane automobile bird cat deer dog frog horse ship truck

cd ..

function copytrain { for arg in "$@"; do cp $(find train -name "*$arg.png") "train_/$arg/"; done; }

copytrain $(ls train_ | grep -o "[a-z]*")

It took a few minutes to run. Lots of files.

Then repeat the steps above, but with test and test_ instead of train and train_.

Now it all works. The from_paths method expects one folder per class.

Make sure the names of the new folders match what you pass to from_paths as trn_name and val_name.
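
A quick way to double-check the layout before calling from_paths, as a sketch only. missing_class_folders is a hypothetical helper (not a fastai function); it just reports which split/class folders are absent for the names you plan to pass as trn_name and val_name.

```python
import os

def missing_class_folders(path, classes, trn_name, val_name):
    """List the '<split>/<class>' folders that do not exist under `path`."""
    missing = []
    for split in (trn_name, val_name):
        for cls in classes:
            if not os.path.isdir(os.path.join(path, split, cls)):
                missing.append(f"{split}/{cls}")
    return missing
```

For example, missing_class_folders('data/cifar10', classes, 'train_', 'test_') should return an empty list if you kept the underscore folder names from the steps above.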

Confirmed. When I moved the 50,000 train and 10,000 test files into nested subdirectories, the dataloader worked. I don't know why this is 'new'.

Excuse me if the code below is awful Python; I don't like looping over individual files. (Note: I haven't put in code to delete the original files.)

# note: 'plane' would work for airplane, 'car' wouldn't for automobile
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cd ./data/cifar10/train
#cd ./data/cifar10/test

If someone wants to do this in Python (I went with Python since I'm on Windows), this is the code I used. It assumes your .py or notebook file is in the courses/dl1 directory:

import os
import glob
import shutil
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cwd = os.getcwd()
train_path = cwd + '/data/cifar/train/'
# go through classes and make a directory for each one
for class_now in classes:
    path_now = train_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
# go through classes and match them with file names
# file names are e.g. '123_frog.png' so glob picks out all the e.g. frog files
for class_now in classes:
    identifier = train_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = train_path + class_now
    # move all frog files to proper class directory
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)
# do all the same but now for the test data
test_path = cwd + '/data/cifar/test/'
for class_now in classes:
    path_now = test_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
for class_now in classes:
    identifier = test_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = test_path + class_now
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)

Hi @jsonm
Thanks a lot. This was very helpful. But I ran into a different error after following your instructions.
This is the error I get when I run data = get_data(32,4):

ValueError                                Traceback (most recent call last)
<ipython-input-47-6a185ac353fc> in <module>()
----> 1 data = get_data(32,4)
<ipython-input-45-88c9e0487857> in get_data(sz, bs)
      1 def get_data(sz,bs):
      2     tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz//8)
----> 3     return ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)
~/fastai/courses/dl1/fastai/dataset.py in from_paths(cls, path, bs, tfms, trn_name, val_name, test_name, test_with_labels, num_workers)
    423             test = folder_source(path, test_name) if test_with_labels else read_dir(path, test_name)
    424         else: test = None
--> 425         datasets = cls.get_ds(FilesIndexArrayDataset, trn, val, tfms, path=path, test=test)
    426         return cls(path, datasets, bs, num_workers, classes=trn[2])
~/fastai/courses/dl1/fastai/dataset.py in get_ds(fn, trn, val, tfms, test, **kwargs)
    362         res = [
    363             fn(trn[0], trn[1], tfms[0], **kwargs), # train
--> 364             fn(val[0], val[1], tfms[1], **kwargs), # val
    365             fn(trn[0], trn[1], tfms[1], **kwargs), # fix
    366             fn(val[0], val[1], tfms[0], **kwargs)  # aug
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, y, transform, path)
    259         self.y=y
    260         assert(len(fnames)==len(y))
--> 261         super().__init__(fnames, transform, path)
    262     def get_y(self, i): return self.y[i]
    263     def get_c(self):
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, transform, path)
    235     def __init__(self, fnames, transform, path):
    236         self.path,self.fnames = path,fnames
--> 237         super().__init__(transform)
    238     def get_sz(self): return self.transform.sz
    239     def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, transform)
    154         self.transform = transform
    155         self.n = self.get_n()
--> 156         self.c = self.get_c()
    157         self.sz = self.get_sz()
~/fastai/courses/dl1/fastai/dataset.py in get_c(self)
    266 class FilesIndexArrayDataset(FilesArrayDataset):
--> 267     def get_c(self): return int(self.y.max())+1
~/anaconda3/envs/fastai/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
     24 # small reductions
     25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26     return umr_maximum(a, axis, None, out, keepdims)
     28 def _amin(a, axis=None, out=None, keepdims=False):
ValueError: zero-size array to reduction operation maximum which has no identity
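
Reading the traceback: it fails at self.y.max() while building the validation dataset (the # val line), and NumPy raises this error when asked for the maximum of an empty array. So most likely no labelled images were found under the val_name folder (test here). A rough way to check, assuming the data lives under data/cifar10; count_images_per_class is a hypothetical helper, not part of fastai:

```python
import os

def count_images_per_class(split_dir, ext=".png"):
    """Map each class subfolder of `split_dir` to the number of `ext` files in it."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(ext)
            )
    return counts

if __name__ == "__main__":
    split_dir = "data/cifar10/test"  # assumed location; use your PATH + val_name
    if os.path.isdir(split_dir):
        counts = count_images_per_class(split_dir)
        print(counts)
        if sum(counts.values()) == 0:
            print("No images inside class folders, so the label array is likely empty.")
```

If every count is 0 (or the dict is empty), the test images probably were not moved into the class folders, or the script looked in a different directory than the one PATH points to.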

That’s written in bash.

So if you’re on a unix machine (linux / mac os), you can just run it from the target directory in the terminal.

If you’re on Windows, you’ll need a bash emulator; Git Bash works well.

Bash is useful, but it’s just one way to interact with Unix (which is what you’ll actually want to learn).

Unlike learning a language, it’s not vital to be fluent in bash to get some incredibly useful things done.

Learning basic syntax and how to use tools like grep and awk is helpful for most things you’ll want to do, but honestly, it’s very task specific.

Depending on what you’re trying to do, you’ll be using dramatically different tools/binaries which will have their own docs/usage.

I’d say the most useful things to learn would be (loosely in this order):

  • Variables (and inline execution)
  • Piping / Redirection
  • Popular unix commands
  • Regex
  • Conditions
  • Loops

This is a pretty good resource.

It’s one of those things that is easiest to learn by doing.

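import os

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# assumed setup, mirroring the test block below (the start of this snippet is missing)
images_name = os.listdir('cifar/train')
os.mkdir('cifar/train1')
for x in classes:
    os.mkdir('cifar/train1/' + x)
for x in images_name:
    # '123_frog.png' -> 'frog'
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/train/' + x, 'cifar/train1/' + dir_name + '/' + x)

# same again for the test images
images_name = os.listdir('cifar/test')
os.mkdir('cifar/test1')
for x in classes:
    os.mkdir('cifar/test1/' + x)
for x in images_name:
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/test/' + x, 'cifar/test1/' + dir_name + '/' + x)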