This project aims to **deepen my knowledge of CNNs**, especially feature extraction and image similarity computation. I decided to work with two CNNs pre-trained on ImageNet, the **VGG16** and the **ResNet50**, and to compare their cosine similarity performance. You can choose to load the models in two ways, as sketched in the snippet after this list:
- to **make predictions** (`include_top=True`: the model is composed of all layers, i.e. the feature-learning block plus the classification block)
- to **extract features** (`include_top=False`: the classification block is omitted)
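A minimal loading sketch with Keras (assuming the `tensorflow.keras` API; the original code may use standalone Keras):

```python
from tensorflow.keras.applications import VGG16, ResNet50

# Full models: feature-learning block + classification block, for predictions
vgg16_full = VGG16(weights="imagenet", include_top=True)
resnet50_full = ResNet50(weights="imagenet", include_top=True)

# Convolutional base only (classification block omitted), for feature extraction
vgg16_base = VGG16(weights="imagenet", include_top=False)
resnet50_base = ResNet50(weights="imagenet", include_top=False)
```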
[Figure 1]: Architecture of the VGG16 (left) and ResNet50 (right)
First, I wondered **which model could classify an image with the highest confidence**. Here I chose to compare their performance on a vase image: the ResNet50 came out ahead with a 99.89% confidence score against 95.06% for the VGG16. The idea in this part was **to manipulate the models and understand how prediction works**.
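As an illustration, a prediction with the VGG16 might look like the following (the image file name is hypothetical):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet", include_top=True)

img = image.load_img("vase.jpg", target_size=(224, 224))  # hypothetical image file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # [(class_id, class_name, score), ...]
```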
[Figure 2]: Comparison of predictions (VGG16/ResNet50)
Then I decided to **visualize the feature maps from the main blocks** of the VGG16. The feature maps output by each block are collected in a single forward pass and rendered as images. The VGG16 has 5 main blocks (block1, block2, etc.), each ending in a pooling layer. You can **choose which blocks to visualize** through their layer indices: `idx = [2, 5, 9, 13, 17]  # [block1, block2, block3, block4, block5]`. Figure 3 highlights that the abstraction level of the extracted features increases with network depth: early blocks respond to low-level details such as edges and textures, while deeper blocks respond to larger, more abstract patterns.
[Figure 3]: Visualization of the 5 main blocks from the VGG16
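One way to collect the selected block outputs in a single pass is a multi-output Keras `Model` (a sketch; the indices follow Keras' VGG16 layer ordering, and the image file name is hypothetical):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

base = VGG16(weights="imagenet", include_top=False)

idx = [2, 5, 9, 13, 17]  # one layer per block, as above
viz_model = Model(inputs=base.input,
                  outputs=[base.layers[i].output for i in idx])

img = image.load_img("vase.jpg", target_size=(224, 224))  # hypothetical image file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

feature_maps = viz_model.predict(x)  # one 4D tensor (1, h, w, channels) per block
```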
Now let's focus on **feature vector extraction**. Removing the last layers of the model (the classification block) makes it possible to extract a **feature vector**, as explained previously. The input image is then **preprocessed** (reshaping, RGB→BGR conversion, zero-centering with respect to the ImageNet dataset mean). The overall process in Figure 4 depicts how to compute the similarity between two images. The images were stored on AWS S3 and I used a notebook instance in AWS SageMaker. A feature vector was extracted for each image, and the two vectors were then compared with **cosine similarity**: the `compute_similarity_img()` function computes the cosine of the angle between the two feature vectors.
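A plausible sketch of `compute_similarity_img()` (the project's actual signature may differ), assuming features are extracted with a headless VGG16 and `preprocess_input`, which performs the RGB→BGR conversion and zero-centering mentioned above; global average pooling is an assumption here:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

feat_model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path):  # hypothetical helper
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return feat_model.predict(x).ravel()

def compute_similarity_img(path_a, path_b):
    """Cosine of the angle between the two images' feature vectors."""
    a, b = extract_features(path_a), extract_features(path_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```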
[Figure 5]: Cosine similarity using VGG16
I decided to enlarge the dataset and compare the results with **data augmentation**, as shown in Figure 6. For the augmentation, I used an `ImageDataGenerator` object to set up the augmentation parameters. It generates batches of tensor image data with real-time data augmentation:
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator(
    rotation_range=30,        # Int: degree range for random rotations
    width_shift_range=0.1,    # Float: fraction of total width if < 1, or pixels if >= 1
    height_shift_range=0.1,   # Float: fraction of total height if < 1, or pixels if >= 1
    shear_range=0.15,         # Float: shear intensity (shear angle in counter-clockwise direction, in degrees)
    zoom_range=0.1,           # Float: range for random zoom
    channel_shift_range=10.,  # Float: range for random channel shifts
    horizontal_flip=True,     # Boolean: randomly flip inputs horizontally
)
```
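Augmented samples can then be drawn from the generator, for example via `flow()` on a single image (the file name is hypothetical):

```python
import numpy as np
from tensorflow.keras.preprocessing import image

x = image.img_to_array(image.load_img("vase.jpg"))  # hypothetical image file
batches = gen.flow(np.expand_dims(x, axis=0), batch_size=1)

# Draw 5 augmented variants of the input image
augmented = [next(batches)[0].clip(0, 255).astype("uint8") for _ in range(5)]
```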