This example shows how to perform semantic segmentation of a multispectral image with seven channels using U-Net.
Semantic segmentation involves labeling each pixel in an image with a class. One application of semantic segmentation is tracking deforestation, which is the change in forest cover over time. Environmental agencies track deforestation to assess and quantify the environmental and ecological health of a region.
Deep learning based semantic segmentation can yield a precise measurement of vegetation cover from high-resolution aerial photographs. One challenge is differentiating classes with similar visual characteristics, such as trying to classify a green pixel as grass, shrubbery, or tree. To increase classification accuracy, some data sets contain multispectral images that provide additional information about each pixel. For example, the Hamlin Beach State Park data set supplements the color images with three near-infrared channels that provide a clearer separation of the classes.
This example first shows you how to perform semantic segmentation using a pretrained U-Net and then use the segmentation results to calculate the extent of vegetation cover. Then, you can optionally train a U-Net network on the Hamlin Beach State Park data set using a patch-based training methodology.
Download Pretrained U-Net
Specify
dataDir
as the desired location of the trained network and data set.
Download a pretrained U-Net network.
Download Data Set
This example uses a high-resolution multispectral data set to train the network [
1
]. The image set was captured using a drone over the Hamlin Beach State Park, NY. The data contains labeled training, validation, and test sets, with 18 object class labels. The size of the data file is 3.0 GB.
Download the MAT file version of the data set using the
downloadHamlinBeachMSIData
helper function. This function is attached to the example as a supporting file.
Load the data set.
Name Size Bytes Class Attributes
test_data 7x12446x7654 1333663576 uint16
train_data 7x9393x5642 741934284 uint16
val_data 7x8833x6918 855493716 uint16
The multispectral image data is arranged as
numChannels
-by-
width
-by-
height
arrays. However, in MATLAB®, multichannel images are arranged as
width
-by-
height
-by-
numChannels
arrays. To reshape the data so that the channels are in the third dimension, use the
permute
function.
Confirm that the data has the correct structure.
Name Size Bytes Class Attributes
test_data 12446x7654x7 1333663576 uint16
train_data 9393x5642x7 741934284 uint16
val_data 8833x6918x7 855493716 uint16
Visualize Multispectral Data
Display the center of each spectral band in nanometers.
In this data set, the RGB color channels are the 3rd, 2nd, and 1st image channels, respectively. Display the RGB component of the training, validation, and test images as a montage. To make the images appear brighter on the screen, equalize their histograms by using the
histeq
function.
The 4th, 5th, and 6th channels of the data correspond to near-infrared bands. Equalize the histogram of these three channels for the training image, then display these channels as a montage. The channels highlight different components of the image based on their heat signatures. For example, the trees are darker in the 4th channel than in the other two infrared channels.
The 7th channel of the data is a binary mask that indicates the valid segmentation region. Display the mask for the training, validation, and test images.
Visualize Ground Truth Labels
The labeled images contain the ground truth data for the segmentation, with each pixel assigned to one of the 18 classes. Get a list of the classes with their corresponding IDs.
0. Other Class/Image Border
1. Road Markings
2. Tree
3. Building
4. Vehicle (Car, Truck, or Bus)
5. Person
6. Lifeguard Chair
7. Picnic Table
8. Black Wood Panel
9. White Wood Panel
10. Orange Landing Pad
11. Water Buoy
12. Rocks
13. Other Vegetation
14. Grass
15. Sand
16. Water (Lake)
17. Water (Pond)
18. Asphalt (Parking Lot/Walkway)
This example aims to segment the images into two classes: vegetation and non-vegetation. Define the target class names.
Group the 18 original classes into the two target classes for the training and validation data. "Vegetation" is a combination of the original classes "Tree", "Other Vegetation", and "Grass", which have class IDs 2, 13, and 14. The original class "Other Class/Image Border" with class ID 0 belongs to the background class. All other original classes belong to the target label "NotVegetation".
Save the ground truth validation labels as a PNG file. The example uses this file to calculate accuracy metrics.
Overlay the labels on the histogram-equalized RGB training image. Add a color bar to the image.
Perform Semantic Segmentation on Test Image
The size of the image prevents segmenting the entire image at once. Instead, segment the image using a blocked image approach. This approach can scale to very large files because it loads and processes one block of data at a time.
Create a blocked image containing the six spectral channels of the test data by using the
blockedImage
function.
Segment a block of data by using the
semanticseg
(Computer Vision Toolbox)
function. Call the
sematicseg
function on all blocks in the blocked image by using the
apply
function.
Assemble all of the segmented blocks into a single image into the workspace by using the
gather
function.
To extract only the valid portion of the segmentation, multiply the segmented image by the mask channel of the validation data.
The output of semantic segmentation is noisy. Perform post image processing to remove noise and stray pixels. Remove salt-and-pepper noise from the segmentation by using the
medfilt2
function. Display the segmented image with the noise removed.
Overlay the segmented image on the histogram-equalized RGB validation image.
Calculate Extent of Vegetation Cover
The semantic segmentation results can be used to answer pertinent ecological questions. For example, what percentage of land area is covered by vegetation? To answer this question, find the number of pixels labeled vegetation in the segmented test image. Also, find the total number of pixels in the ROI by counting the number of nonzero pixels in the segmented image.
Calculate the percentage of vegetation cover by dividing the number of vegetation pixels by the number of pixels in the ROI.
The percentage of vegetation cover is 60.7028%
The rest of the example shows how to train U-Net on the Hamlin Beach data set.
Create Blocked Image Datastores for Training
Use a blocked image datastore to feed the training data to the network. This datastore extracts multiple corresponding patches from an image datastore and pixel label datastore that contain ground truth images and pixel label data.
Read the training images, training labels, and mask as blocked images.
Select blocks of image data that overlap with the mask.
One-hot encode the labels.
Write the data to blocked image datastores by using the
blockedImageDatastore
function.
Create a
CombinedDatastore
from the two blocked image datastores.
The blocked image datastore
dsTrain
provides mini-batches of data to the network at each iteration of the epoch. Preview the datastore to explore the data.
ans=1×2 cell array
{256×256×6 uint16} {256×256×2 double}
Create U-Net Network
This example uses a variation of the U-Net network. In U-Net, the initial series of convolutional layers are interspersed with max pooling layers, successively decreasing the resolution of the input image. These layers are followed by a series of convolutional layers interspersed with upsampling operators, successively increasing the resolution of the input image [
2
]. The name U-Net comes from the fact that the network can be drawn with a symmetric shape like the letter U.
Specify hyperparameters of the U-Net. The input depth is the number of hyperspectral channels, 6.
Create the U-Net network using the
unet
(Computer Vision Toolbox)
function.
Define Loss Function
Define a custom loss function called
lossFcn
that calculates calculates cross entropy loss over all unmasked pixels of an image.
Select Training Options
Train the network using stochastic gradient descent with momentum (SGDM) optimization. Specify the hyperparameter settings for SGDM by using the
trainingOptions
(Deep Learning Toolbox)
function. To enable gradient clipping, specify the
GradientThreshold
name-value argument as
0.05
and specify the
GradientThresholdMethod
to use the L2-norm of the gradients.
Train the Network
To train the network, set the
doTraining
variable in the following code to
true
. Train the model by using the
trainnet
(Deep Learning Toolbox)
function. Specify the loss as the custom loss function,
lossFcn
. By default, the
trainnet
function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see
GPU Computing Requirements
(Parallel Computing Toolbox)
. Otherwise, the
trainnet
function uses the CPU. To specify the execution environment, use the
ExecutionEnvironment
training option.
Evaluate Segmentation Accuracy
Segment the validation data.
Create a blocked image containing the six spectral channels of the validation data by using the
blockedImage
function.
Segment a block of data by using the
semanticseg
(Computer Vision Toolbox)
function. Call the
sematicseg
function on all blocks in the blocked image by using the
apply
function.
Assemble all of the segmented blocks into a single image into the workspace by using the
gather
function.
Save the segmented image as a PNG file.
Load the segmentation results and ground truth labels by using the
pixelLabelDatastore
(Computer Vision Toolbox)
function.
Measure the accuracy of the semantic segmentation by using the
evaluateSemanticSegmentation
(Computer Vision Toolbox)
function. The global accuracy score indicates that over 96% of the pixels are classified correctly.
Evaluating semantic segmentation results
----------------------------------------
* Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
* Processed 1 images.
* Finalizing... Done.
* Data set metrics:
GlobalAccuracy MeanAccuracy MeanIoU WeightedIoU MeanBFScore
______________ ____________ _______ ___________ ___________
0.9694 0.9704 0.9406 0.94064 0.79474
References
[1] Kemker, R., C. Salvaggio, and C. Kanan. "High-Resolution Multispectral Dataset for Semantic Segmentation." CoRR, abs/1703.01918. 2017.
[2] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." CoRR, abs/1505.04597. 2015.
[3] Kemker, Ronald, Carl Salvaggio, and Christopher Kanan. "Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery Using Deep Learning." ISPRS Journal of Photogrammetry and Remote Sensing, Deep Learning RS Data, 145 (November 1, 2018): 60-77. https://doi.org/10.1016/j.isprsjprs.2018.04.014.
Run the command by entering it in the MATLAB Command Window.
Web browsers do not support MATLAB commands.