Hello all

I am very confused about how data structures work in fast.ai.
I am trying to predict on the test data after training the model.

The default call works without any error:

data = ImageDataBunch.from_df(path='data/', df=df_trn, ds_tfms=tfms,
                              size=224, bs=bs).normalize(imagenet_stats)

When I pass a test dataset with test=df_tst, I get an error:

data = ImageDataBunch.from_df(path='data/', df=df_trn, ds_tfms=tfms,
                              size=224, bs=bs, test=df_tst).normalize(imagenet_stats)

TypeError: expected str, bytes or os.PathLike object, not list

So next I tried passing just a list of file names, test=list(df_tst['name']):

data = ImageDataBunch.from_df(path='data/', df=df_trn, ds_tfms=tfms,
                              size=224, bs=bs, test=list(df_tst['name'])).normalize(imagenet_stats)

and I still get an error.
Please help.
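
The message in that traceback is the one Python itself raises when a list is used where a single path is expected. As far as I know, in fastai v1 the `test` argument of the `ImageDataBunch` factory methods is meant to be the name of a folder under `path`, not a dataframe or a list of files, which would explain the error. A minimal sketch that reproduces the message with the standard library only (the helper `path_error_message` and the file names are made up for illustration):

```python
import os


def path_error_message(value):
    """Return the TypeError message raised when `value` is used as a path, or None."""
    try:
        os.fspath(value)  # the conversion path-handling code applies to its arguments
    except TypeError as err:
        return str(err)
    return None


# A list of (hypothetical) file names is rejected ...
print(path_error_message(["img_1.jpg", "img_2.jpg"]))
# expected str, bytes or os.PathLike object, not list

# ... while a single folder name is accepted.
print(path_error_message("test"))
# None
```

Where exactly fastai performs this conversion is not shown in the post, so the link to `test=` is an assumption based on the error text.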

Thanks for the fast reply :slight_smile:
The train and test dataframes look the same, just like in the lecture, something like this:

| name | label |
|---|---|
| path_img_1 | category_1 |
| path_img_2 | category_2 |
| path_img_3 | category_1 |
| path_img_4 | category_2 |

So each row has an image path and its label.

You’ll want to make a separate databunch just for the test set, and swap its data in as the validation set when you want to evaluate with learn.validate(). The reason is that test sets in fastai are unlabeled, so this is how we can get a labeled test set to work with. An example using tabular data is shown below:

data = (TabularList.from_df(train_set, path=Path(''), cat_names=cat_var, 
                            cont_names=cont_var, procs=procs)
       .split_by_rand_pct(0.2)
       .label_from_df(dep_var, classes=classes)
       .databunch(bs=5000))
data_test = (TabularList.from_df(test, path=Path(''), cat_names=cat_var, 
                            cont_names=cont_var, procs=procs, processor=data.processor)
       .split_none()
       .label_from_df(dep_var, classes=classes)
       .databunch(bs=5000))

Notice that I make a separate databunch here: classes ensures both databunches have the same classes, and processor makes sure the same preprocessing is applied to both. If you need it I can quickly write one for an ImageList in a moment, but see if you can’t work it out yourself first. Also notice the split_none() on the test set databunch.

Then when you are ready to use it and validate, you can do the following:

learn.data.valid_dl = data_test.train_dl
learn.validate()

Good luck!
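
For images, the same idea might look roughly like the sketch below. It assumes fastai v1, a test dataframe with file names in the first column and labels in the second, and an already built training databunch `data`; the function name and the default values are made up for illustration. Because the test images end up in the training split, the sketch passes the validation transforms for both splits so that no augmentation is applied, which is my own assumption rather than something from the reply above.

```python
def labeled_test_databunch(df_tst, data, tfms, path="data/", size=224, bs=64):
    """Build a databunch whose *training* split holds the labeled test images.

    `data` is the training databunch and `tfms` the (train, valid) pair
    returned by get_transforms(). Assumes fastai v1.
    """
    # Imported lazily so this file can be loaded without fastai installed.
    from fastai.vision import ImageList, imagenet_stats

    valid_tfms = tfms[1]  # validation-time transforms only, no augmentation
    return (ImageList.from_df(df_tst, path=path)           # file names in column 0
            .split_none()                                   # everything goes to the "train" split
            .label_from_df(cols=1, classes=data.classes)    # reuse the training classes
            .transform((valid_tfms, valid_tfms), size=size)
            .databunch(bs=bs)
            .normalize(imagenet_stats))
```

It would be used as `data_test = labeled_test_databunch(df_tst, data, tfms)`, followed by the two `learn.validate()` lines above. Keep in mind that, as far as I recall, the training dataloader in fastai v1 shuffles and drops the last incomplete batch, so a few test images may be skipped.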

hi @muellerzr ,

  • How do I set aside a validation set from a dataframe? Looking at the fast.ai code, from_df sets valid_pct to 0.2. Does that mean 20% of the data is automatically kept aside as the validation set?
  • Below is my code. When I call learn.recorder.plot_losses(), it doesn’t show the validation loss curve.

    I have one dataframe with the image path and label.

    image_dataset = pd.concat([df['image_path'], df['lesion']], axis=1, keys=['name', 'label'])
    bs = 8
    tfms = get_transforms(flip_vert=True)
    data = ImageDataBunch.from_df(".", image_dataset, ds_tfms=tfms, size=450, bs=bs).normalize(imagenet_stats)
    

    Fast.ai code

        @classmethod
        def from_df(cls, path:PathOrStr, df:pd.DataFrame, folder:PathOrStr=None, label_delim:str=None, valid_pct:float=0.2,
                    seed:int=None, fn_col:IntsOrStrs=0, label_col:IntsOrStrs=1, suffix:str='', **kwargs:Any)->'ImageDataBunch':
            "Create from a `DataFrame` `df`."
            src = (ImageList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
                    .split_by_rand_pct(valid_pct, seed)
                    .label_from_df(label_delim=label_delim, cols=label_col))
            return cls.create_from_ll(src, **kwargs)
    

    Thanks.

    from sklearn.model_selection import train_test_split
    train, test = train_test_split(df, test_size=0.2)
    

    After that, you can use the data block API and pass in a training and a validation dataframe (see the API for ImageList). If you need help with that, let me know.
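
One way to hand the two dataframes from `train_test_split` to the data block API is to stack them with a boolean column and split on it. This is a sketch assuming fastai v1 and dataframes with `name` and `label` columns as in the question above; the `is_valid` column name and the function name are my own choices.

```python
def databunch_from_train_valid(train_df, valid_df, tfms, path=".", size=450, bs=8):
    """Combine a training and a validation dataframe into one databunch.

    Assumes fastai v1 and that both dataframes have `name` and `label` columns.
    """
    # Imported lazily so this file can be loaded without pandas/fastai installed.
    import pandas as pd
    from fastai.vision import ImageList, imagenet_stats

    # Mark which rows belong to the validation set, then stack both frames.
    both = pd.concat([train_df.assign(is_valid=False),
                      valid_df.assign(is_valid=True)],
                     ignore_index=True)

    return (ImageList.from_df(both, path=path, cols="name")
            .split_from_df(col="is_valid")      # rows with True become the validation set
            .label_from_df(cols="label")
            .transform(tfms, size=size)
            .databunch(bs=bs)
            .normalize(imagenet_stats))
```

With an explicit validation set like this, there is no random `valid_pct` split involved, so the same images are held out on every run.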

    I am having a similar issue, but my test set is unlabelled
    (Leaf Identification Kaggle problem).
    Can I please get some help? I am stuck.
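
For an unlabelled test set, the data block API has `add_test`, which attaches items without labels. A rough sketch assuming fastai v1, dataframes with file names in the first column (and labels in the second for the training data), and `tfms` from `get_transforms()`; the function names and defaults are hypothetical:

```python
def databunch_with_unlabeled_test(df_trn, df_tst, tfms, path="data/", size=224, bs=64):
    """Training databunch with an unlabelled test set attached. Assumes fastai v1."""
    # Imported lazily so this file can be loaded without fastai installed.
    from fastai.vision import ImageList, imagenet_stats

    test_items = ImageList.from_df(df_tst, path=path)   # only the file names are used
    return (ImageList.from_df(df_trn, path=path)
            .split_by_rand_pct(0.2)
            .label_from_df(cols=1)
            .add_test(test_items)                        # attach before .transform()
            .transform(tfms, size=size)
            .databunch(bs=bs)
            .normalize(imagenet_stats))


def predict_test(learn):
    """Return the predicted class name for every test item, in dataframe order."""
    from fastai.basic_data import DatasetType

    preds, _ = learn.get_preds(ds_type=DatasetType.Test)  # the targets are dummies
    return [learn.data.classes[i] for i in preds.argmax(dim=1).tolist()]
```

Since the test set has no labels, `learn.validate()` cannot score it; the predictions have to be written out or submitted instead.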

    I have an ImageNet-style folder, i.e. with ‘test/’, ‘train/’ and ‘valid/’ subfolders. When I create an ImageDataBunch from this folder structure like this:

    [Screenshot: Screen Shot 2019-10-27 at 15.57.05.png]

    It only seems to detect the train and valid folders ignoring the test folder.

    How can I test my model on the ‘test’ folder images once it has been trained?

    Many thanks!
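
By default `from_folder` has `test=None`, so the test folder has to be named explicitly. A sketch assuming fastai v1 and the `train/`, `valid/`, `test/` layout described above; `missing_subfolders` and `databunch_from_folders` are hypothetical helper names:

```python
from pathlib import Path


def missing_subfolders(path, names=("train", "valid", "test")):
    """Return the expected subfolders that do not exist under `path`."""
    root = Path(path)
    return [name for name in names if not (root / name).is_dir()]


def databunch_from_folders(path, tfms, size=224, bs=64):
    """Build a databunch that includes the test folder. Assumes fastai v1."""
    missing = missing_subfolders(path)
    if missing:
        raise FileNotFoundError(f"missing subfolders under {path}: {missing}")

    # Imported lazily so the folder check works without fastai installed.
    from fastai.vision import ImageDataBunch, imagenet_stats

    return (ImageDataBunch.from_folder(path, train="train", valid="valid", test="test",
                                       ds_tfms=tfms, size=size, bs=bs)
            .normalize(imagenet_stats))
```

After training, predictions for the test images can be obtained with `learn.get_preds(ds_type=DatasetType.Test)`. The test set is treated as unlabelled, so this gives predictions rather than a score.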

    I faced a very similar issue today where I *did not* explicitly pass the test folder to ImageDataBunch.from_folder. As a result, the test images were included in the training dataset.

    Here’s the databunch creation code. It also added test as one of the labels in the confusion matrix: as you can see, all the images from the child directories became part of the training set (the test dataset is None).

    [Screenshot: screenshot-www.kaggle.com-2019.12 (2).png]

    After fixing the databunch creation code and explicitly specifying the train and test directory names, we can confirm that the databunch is created correctly.

    [Screenshot: screenshot-www.kaggle.com-2019.12 (3).png]

    Hi all,
    I’d like some help. I want to apply a quantile transformation (or any other transformation) to the images, and then extract HOG features followed by several other features. I then want to give that feature set as input to my CNN learner.

    How can I create that dataframe in fastai and pass it to the CNN learner? And how do I apply all of that to the test dataset and validate on it?

    Any information would be helpful, please help :slight_smile: