get_data(32,4) brings up an error
NotADirectoryError: [Errno 20] Not a directory: 'data/cifar10/train/0_frog.png'
How do I fix this?
Can you be a bit more specific? Which notebook and location? Any screenshots?
Setup: Are you using a local system (git clone?), Paperspace Fast.ai, Crestle, or another environment?
Try a few things:
!pwd
- to see what is the current working directory
!ls
or
!dir
to see what’s in your current working dir. Do you see a folder called data? Then do the same for subfolders.
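If you would rather check from Python than from the shell, here is a minimal sketch of the same inspection using only the standard library. The helper name summarize_dir is made up for illustration; it just reports whether each entry is a file or a folder.

```python
import os

def summarize_dir(path, limit=5):
    """Return up to `limit` (name, kind) pairs for the entries in `path`."""
    entries = sorted(os.listdir(path))[:limit]
    return [
        (name, "dir" if os.path.isdir(os.path.join(path, name)) else "file")
        for name in entries
    ]
```

For example, print(os.getcwd()) followed by summarize_dir('data/cifar10/train') would show whether train holds class folders or loose image files (the path is taken from the error message above, so adjust it to your setup).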
Seeing this as well. I solved it by making folders for classes in “train” and “test”.
Remember how we did cats and dogs in lesson 1?
train/cats
and
train/dogs
I wanted to see what classes we had:
cd train && find . | grep -o '[a-z]*\.png' | sort -u && cd ..
We have: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
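The same class list can be pulled out in Python. This is only a sketch, assuming every image is named like '<index>_<label>.png' (as in 0_frog.png); class_names is a hypothetical helper, not something from the course library.

```python
import os

def class_names(filenames):
    """Collect the distinct labels from names shaped like '<index>_<label>.png'."""
    labels = set()
    for name in filenames:
        stem, ext = os.path.splitext(os.path.basename(name))
        if ext.lower() != ".png" or "_" not in stem:
            continue  # skip anything that does not follow the naming pattern
        labels.add(stem.split("_", 1)[1])
    return sorted(labels)
```

Calling class_names(os.listdir('train')) from the cifar10 folder should give the list above, provided the files follow that naming pattern.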
I made new folders
mkdir train_ test_
I went into one of them to make our classes, created the fn to organize the files, and executed it:
cd train_
mkdir airplane automobile bird cat deer dog frog horse ship truck
cd ..
function copytrain { for arg in $@; do cp $(find train -name '*'$arg'.png') train_/$arg/; done; };
copytrain $(ls train_ | grep -o "[a-z]*")
It took a few minutes to run. Lots of files.
Then repeat the steps above, but with test and test_ instead of train and train_.
Now it all works. This is because the from_paths method expects one folder per class.
Make sure the new folders you created match the names you pass to from_paths as val_name and trn_name (here, test_ and train_).
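To see why the flat layout fails: a loader that treats every entry under the training folder as a class folder will presumably try to list 0_frog.png as if it were a directory, which is what the NotADirectoryError reports. The sketch below is not fastai code; stray_files is a hypothetical helper that reports which entries directly under a split folder are not directories.

```python
import os

def stray_files(path, split):
    """List entries directly under path/split that are not directories."""
    root = os.path.join(path, split)
    return sorted(
        name for name in os.listdir(root)
        if not os.path.isdir(os.path.join(root, name))
    )
```

If stray_files('data/cifar10', 'train_') comes back empty, every entry in that folder is a class directory, which is the layout described above.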
Confirmed. When I moved the 50,000 train and 10,000 test files into nested subdirectories, the dataloader worked. I don’t know why this is ‘new’.
Excuse me if the code below is awful Python. I don’t like looping over individual files. (Note: I haven’t put in code to delete the original files.)
# note: 'plane' would work for airplane, but 'car' wouldn't for automobile
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cd ./data/cifar10/train
#cd ./data/cifar10/test
If someone wants to do this in Python (I went with Python since I’m on Windows), this is the code I used (it assumes your .py file or notebook is located in the courses/dl1 directory):
import os
import glob
import shutil
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cwd = os.getcwd()
train_path = cwd + '/data/cifar/train/'
# go through classes and make a directory for each one
for class_now in classes:
    path_now = train_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
# go through classes and match them with file names
# file names are e.g. '123_frog.png' so glob picks out all the e.g. frog files
for class_now in classes:
    identifier = train_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = train_path + class_now
    # move all frog files to proper class directory
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)
# do all the same but now for the test data
test_path = cwd + '/data/cifar/test/'
for class_now in classes:
    path_now = test_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
for class_now in classes:
    identifier = test_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = test_path + class_now
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)
Hi @jsonm,
Thanks a lot. This was very helpful. But I ran into a different error after following your instructions.
This is the error I get when I run data = get_data(32,4):
ValueError Traceback (most recent call last)
<ipython-input-47-6a185ac353fc> in <module>()
----> 1 data = get_data(32,4)
<ipython-input-45-88c9e0487857> in get_data(sz, bs)
1 def get_data(sz,bs):
2 tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz//8)
----> 3 return ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)
~/fastai/courses/dl1/fastai/dataset.py in from_paths(cls, path, bs, tfms, trn_name, val_name, test_name, test_with_labels, num_workers)
423 test = folder_source(path, test_name) if test_with_labels else read_dir(path, test_name)
424 else: test = None
--> 425 datasets = cls.get_ds(FilesIndexArrayDataset, trn, val, tfms, path=path, test=test)
426 return cls(path, datasets, bs, num_workers, classes=trn[2])
~/fastai/courses/dl1/fastai/dataset.py in get_ds(fn, trn, val, tfms, test, **kwargs)
362 res = [
363 fn(trn[0], trn[1], tfms[0], **kwargs), # train
--> 364 fn(val[0], val[1], tfms[1], **kwargs), # val
365 fn(trn[0], trn[1], tfms[1], **kwargs), # fix
366 fn(val[0], val[1], tfms[0], **kwargs) # aug
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, y, transform, path)
259 self.y=y
260 assert(len(fnames)==len(y))
--> 261 super().__init__(fnames, transform, path)
262 def get_y(self, i): return self.y[i]
263 def get_c(self):
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, transform, path)
235 def __init__(self, fnames, transform, path):
236 self.path,self.fnames = path,fnames
--> 237 super().__init__(transform)
238 def get_sz(self): return self.transform.sz
239 def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, transform)
154 self.transform = transform
155 self.n = self.get_n()
--> 156 self.c = self.get_c()
157 self.sz = self.get_sz()
~/fastai/courses/dl1/fastai/dataset.py in get_c(self)
266 class FilesIndexArrayDataset(FilesArrayDataset):
--> 267 def get_c(self): return int(self.y.max())+1
~/anaconda3/envs/fastai/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
24 # small reductions
25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26 return umr_maximum(a, axis, None, out, keepdims)
28 def _amin(a, axis=None, out=None, keepdims=False):
ValueError: zero-size array to reduction operation maximum which has no identity
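Reading the traceback, the failing call is self.y.max() for the validation set, so y is empty: no labeled files were found under the folder passed as val_name ('test' in the get_data shown). One thing worth checking is whether the class folders under that name exist and actually contain images. A minimal sketch using only the standard library; count_per_class is a hypothetical helper, not part of fastai:

```python
import os

def count_per_class(path, split):
    """Map each class folder under path/split to the number of .png files in it."""
    root = os.path.join(path, split)
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(".png")
            )
    return counts
```

An empty result, or zeros for every class, from count_per_class(PATH, 'test') would be consistent with the error above. In that case the images may have ended up under a differently named folder than the one val_name points to.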
That’s written in bash.
So if you’re on a unix machine (linux / mac os), you can just run it from the target directory in the terminal.
If you’re on Windows, you’ll need a bash emulator; Git Bash works well.
Bash is useful, but it’s just one way to interact with Unix (which is what you’ll actually want to learn).
Unlike learning a language, it’s not vital to be fluent in bash to get some incredibly useful things done.
Learning basic syntax and how to use some tools like grep and awk is helpful for most things you’ll want to do, but honestly, it’s very task specific.
Depending on what you’re trying to do, you’ll be using dramatically different tools/binaries, each with its own docs and usage.
I’d say the most useful things to learn would be (loosely in this order):
Variables (and inline execution)
Piping / Redirection
Popular unix commands
Regex
Conditions
Loops
This is a pretty good resource.
It’s one of those things that is easiest to learn by doing.
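import os
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# train split (opening lines assumed to mirror the test block below)
images_name = os.listdir('cifar/train')
os.mkdir('cifar/train1')
for x in classes:
    os.mkdir('cifar/train1/' + x)
for x in images_name:
    dir_name = x.split('_')[1][:-4]  # '123_frog.png' -> 'frog'
    os.renames('cifar/train/' + x, 'cifar/train1/' + dir_name + '/' + x)
images_name = os.listdir('cifar/test')
os.mkdir('cifar/test1')
for x in classes:
    os.mkdir('cifar/test1/' + x)
for x in images_name:
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/test/' + x, 'cifar/test1/' + dir_name + '/' + x)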