All supervised deep learning tasks depend on labeled datasets, which means humans must apply their knowledge to identify the objects of interest. The neural network then uses the labeled objects to train a model that can perform inferencing on new data.
Labeling is the process of selecting representative samples of an object of interest. The objects selected for labeling must accurately depict the spatial, spectral, orientation, size, and condition characteristics of the objects of interest. The better the labeled objects represent the feature of interest, the better the training of the deep learning model and the more accurate the inferencing results will be.
Image annotation, or labeling, is vital for deep learning tasks such as computer vision. A large amount of labeled data is required to train a good deep learning model. When the right training data is available, deep learning systems can be accurate in feature extraction, pattern recognition, and complex problem solving. You can use the Label Objects for Deep Learning pane to label data.
Access the Label Objects for Deep Learning button from the Deep Learning Tools drop-down menu, in the Image Classification group on the Imagery tab. When the tool opens, choose whether to use an existing layer or create an image collection. For a new image collection, browse to the location of the imagery folder, and a layer will be created from the image collection.
Once the Images/Imagery Collection parameter value has been specified, the Label Objects pane appears. The pane is divided into two parts: the upper part is for managing classes, and the lower part is for managing the collection of samples and for exporting the training data for the deep learning frameworks.
Create classes and label objects
The upper portion of the pane allows you to manage object classes and create the objects used for training the deep learning model. The following sketch tools and artificial intelligence (AI) assisted tools are available for creating labeled objects:
- Create a labeled object by drawing a rectangle around a feature or object in the raster.
- Create a labeled object by drawing a polygon around a feature or object in the raster.
- Create a labeled object by drawing a circle around a feature or object in the raster.
- Create a labeled object by drawing a freehand shape around a feature or object in the raster.
- Automatically detect and label the feature or object. A polygon is drawn around the feature or object. This tool is only available if the deep learning frameworks libraries are installed.
- Create a feature by selecting a segment from a segmented layer. This option is only available if there is a segmented layer in the Contents pane. Activate the Segment Picker by selecting the segmented layer in the Contents pane, and select the layer from the Segment Picker drop-down list.
- Assign the selected class to the current image. This tool is only available in Image Collection mode.
- Select and edit a labeled object.
- Create a classification schema.
- Choose a classification schema option: browse to an existing schema, generate a new schema from an existing training sample feature class, generate a new schema from an existing classified raster, or generate a new schema using the default 2011 National Land Cover Database schema.
- Save changes to the schema.
- Save a copy of the schema.
- Add a class category to the schema. Select the name of the schema first to create a parent class at the highest level. Select the name of an existing class to create a subclass.
- Remove the selected class or subclass category from the schema.
AI-assisted labeling tools
There are two types of AI tools for labeling objects: Auto Detect and Text Prompt.
Auto Detect tool
The Auto Detect tool automatically draws a rectangle around a feature. Click the feature, and a rectangular bounding box containing the feature is drawn. If you want a polygon boundary of the feature, press the Shift key while clicking the feature, and a perimeter is drawn around the shape of the feature. For the tool to work well, the features must occupy a significant number of pixels on the map, so you must zoom in close to the features.
The Auto Detect tool works well on distinct features characterized by distinctive shapes, sharp edges, and high contrast. It is not recommended for continuous features in close proximity to each other.
Text Prompt tool
Using the Text Prompt tool, you can use text-based detection to assist with labeling. Type the name of the object in the Class Name text box, and click the Detect button. A high-end GPU with at least 12 GB of RAM is recommended for this feature. To run the text prompt feature automatically on an image collection, use the Shift+O keyboard shortcut.
To improve results, you can set the Box threshold and Text threshold values in the Configure options:
Box threshold—This value is used for object detection in the image. A higher value makes the model more selective, identifying only the most confident object instances, leading to fewer overall detections. A lower value makes the model more tolerant, leading to increased detections, including potentially less confident ones. Threshold values range from 0 to 1.
Text threshold—This value is used to associate the detected objects with the provided text prompt. A higher value requires a stronger association between the object and the text prompt, leading to more precise but potentially fewer associations. A lower value allows for weaker associations, which may increase the number of associations but also introduce less precise matches. Threshold values range from 0 to 1.
Remove Anomalies—This option removes any features whose shape area falls outside the median range of the shape area distribution. The default is False.
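Conceptually, the two thresholds act as confidence filters on the raw detections. The following Python sketch is purely illustrative (the detection records and field names are hypothetical, not the tool's internals); it shows why raising either threshold reduces the number of labeled objects:

# Illustrative only: hypothetical detection records, not the tool's internal API.
# Each detection carries a box confidence and a text-association confidence.
detections = [
    {"box": (10, 20, 60, 80), "box_score": 0.91, "text_score": 0.75},
    {"box": (100, 40, 150, 90), "box_score": 0.55, "text_score": 0.80},
    {"box": (200, 10, 240, 50), "box_score": 0.88, "text_score": 0.30},
]

box_threshold = 0.6   # higher value -> fewer, more confident boxes
text_threshold = 0.5  # higher value -> stronger match to the text prompt required

kept = [
    d for d in detections
    if d["box_score"] >= box_threshold and d["text_score"] >= text_threshold
]
print(len(kept))  # 1: only the first detection passes both filters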
The Define Class dialog box allows you to create a class or define an existing class. If you choose Use Existing Class, select the appropriate Class Name option for that object. If you choose Add New Class, edit the class information and click OK to create the class.
Labeled Objects tab
The Labeled Objects tab is in the lower section of the Label Objects pane and is where you manage the training samples you collected for each class. Collect representative sites, or training samples, for each class in the image. A training sample has location information (polygon) and an associated class. The image classification algorithm uses the training samples, saved as a feature class, to identify the land cover classes in the entire image.
You can view and manage training samples by adding, grouping, or removing them. When you click a training sample, it is selected on the map. Double-click a training sample in the table to zoom to it on the map.
The following tools are available on the Labeled Objects tab:
- Open an existing training samples feature class.
- Save edits made to the current labeled objects feature class.
- Save the current labeled objects as a new feature class.
- Delete the selected labeled objects.
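Because labeled objects are saved as a polygon feature class, they can also be inspected with ArcPy outside the pane. The following is a minimal sketch, assuming a hypothetical feature class path and a Classname field (field names can vary, so verify them against your data):

import arcpy
from collections import Counter

# Hypothetical path to a saved labeled-objects feature class.
labeled_objects = r"C:\data\labels.gdb\training_samples"

# Count training samples per class; "Classname" is an assumed field name.
counts = Counter()
with arcpy.da.SearchCursor(labeled_objects, ["Classname"]) as cursor:
    for (class_name,) in cursor:
        counts[class_name] += 1

for class_name, n in counts.items():
    print(class_name, n)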
Export Training Data tab
Once samples have been collected, you can export them to training data by clicking the Export Training Data tab. The training data can then be used in a deep learning model. Once you establish the parameter values described below, click Run to create the training data.
Output Folder
The output folder where the training data will be saved.
Mask Polygon Features
A polygon feature class that delineates the area where image chips will be created.
Only image chips that fall completely within the polygons will be created.
Image Format
Specifies the raster format for the image chip outputs:
TIFF—This is the default.
MRF (Meta Raster Format)
PNG
JPEG
The PNG and JPEG formats support up to three bands.
Tile Size X
The size of the image chips for the x dimension.
Tile Size Y
The size of the image chips for the y dimension.
Stride X
The distance to move in the x direction when creating the next image chips.
When stride is equal to tile size, there will be no overlap. When stride is equal to half the tile size, there will be 50 percent overlap.
Stride Y
The distance to move in the y direction when creating the next image chips.
When stride is equal to tile size, there will be no overlap. When stride is equal to half the tile size, there will be 50 percent overlap.
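The overlap fraction along an axis follows directly from these two values: overlap = 1 - (stride / tile size). A quick illustration in Python:

# Overlap between consecutive image chips along one axis.
def overlap_fraction(tile_size, stride):
    return max(0.0, 1.0 - stride / tile_size)

print(overlap_fraction(256, 256))  # 0.0 -> no overlap
print(overlap_fraction(256, 128))  # 0.5 -> 50 percent overlap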
Rotation Angle
The rotation angle that will be used to generate image chips.
An image chip will first be generated with no rotation. It will then be rotated at the specified angle to create additional image chips, until the image has been fully rotated. For example, if you specify a rotation angle of 45 degrees, the tool will create eight image chips at the following angles: 0, 45, 90, 135, 180, 225, 270, and 315.
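In other words, chips are generated at every multiple of the rotation angle below 360 degrees, which is where the eight chips in the 45-degree example come from:

# Angles at which chips are generated for a 45-degree rotation angle.
rotation_angle = 45
angles = list(range(0, 360, rotation_angle))
print(len(angles))  # 8
print(angles)       # [0, 45, 90, 135, 180, 225, 270, 315]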
Output No Feature Tiles
Specifies whether image chips that do not capture training samples will be exported.
Unchecked—Only image chips that capture training samples will be exported. This is the default.
Checked—All image chips, including those that do not capture training samples, will be exported.
Collecting image chips that do not contain training samples can help the model identify objects that should not be considered part of the results, such as false positive objects. It can also reduce overfitting.
Metadata Format
Specifies the format that will be used for the output metadata labels.
If the input training sample data is a feature class layer, such as a building layer or a standard classification training sample file, use the KITTI Labels or PASCAL Visual Object Classes option (KITTI_rectangles or PASCAL_VOC_rectangles in Python). The output metadata is a .txt file or an .xml file containing the training sample data contained in the minimum bounding rectangle. The name of the metadata file matches the input source image name. If the input training sample data is a class map, use the Classified Tiles option (Classified_Tiles in Python) as the output metadata format.
KITTI Labels—The metadata follows the same format as the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) Object Detection Evaluation dataset. The KITTI dataset is a vision benchmark suite. The label files are plain text files. All values, both numerical and strings, are separated by spaces, and each row corresponds to one object.
PASCAL Visual Object Classes—The metadata follows the same format as the Pattern Analysis, Statistical Modeling and Computational Learning Visual Object Classes (PASCAL VOC) dataset. The PASCAL VOC dataset is a standardized image dataset for object class recognition. The label files are .xml files and contain information about image name, class value, and bounding boxes. This is the default.
Classified Tiles—The output will be one classified image chip per input image chip. No other metadata for each image chip is used. Only the statistics output has more information about the classes, such as class names, class values, and output statistics.
RCNN Masks—The output will be image chips that have a mask on the areas where the sample exists. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on Feature Pyramid Network (FPN) and a ResNet101 backbone in the deep learning framework model.
Labeled Tiles—Each output tile will be labeled with a specific class. If you choose this metadata format, you can additionally refine the Blacken Around Feature and Crop Mode parameters.
Multi-labeled Tiles—Each output tile will be labeled with one or more classes. For example, a tile may be labeled agriculture and also cloudy. This format is used for object classification.
Export Tiles—The output will be image chips with no label. This format is used for image translation techniques, such as Pix2Pix and Super Resolution.
CycleGAN—The output will be image chips with no label. This format is used for the image translation technique CycleGAN, which is used to train images that do not overlap.
Imagenet—Each output tile will be labeled with a specific class. This format is used for object classification; however, it can also be used for object tracking when the Deep Sort model type is used during training.
For the KITTI metadata format, 15 columns are created, but only 5 of them are used in the tool. The first column is the class value. The next 3 columns are skipped. Columns 5 through 8 define the minimum bounding rectangle, which is composed of four image coordinate locations: left, top, right, and bottom pixels. The minimum bounding rectangle encompasses the training chip used in the deep learning classifier. The remaining columns are not used.
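As a concrete illustration of that layout, the following sketch pulls the class value (column 1) and the bounding rectangle (columns 5 through 8) out of a single KITTI label row; the sample row itself is made up for the example:

# One made-up KITTI label row: class value, three skipped values,
# then the left, top, right, and bottom pixel coordinates, then unused columns.
row = "building 0.0 0 0.0 120.0 45.0 260.0 190.0 0 0 0 0 0 0 0"
fields = row.split()

class_value = fields[0]                             # column 1: class value
left, top, right, bottom = map(float, fields[4:8])  # columns 5-8: bounding box

print(class_value, left, top, right, bottom)  # building 120.0 45.0 260.0 190.0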
Blacken Around Feature
Specifies whether the pixels around each object or feature in each image tile will be masked out.
Unchecked—Pixels surrounding objects or features will not be masked out. This is the default.
Checked—Pixels surrounding objects or features will be masked out.
This parameter only applies when the Metadata Format parameter is set to Labeled Tiles and an input feature class or classified raster has been specified.
Crop Mode
Specifies whether the exported tiles will be cropped so that they are all the same size.
Fixed size—Exported tiles will be the same size and will center on the feature. This is the default.
Bounding box—Exported tiles will be cropped so that the bounding geometry surrounds only the feature in the tile.
This parameter only applies when the Metadata Format parameter is set to either Labeled Tiles or Imagenet, and an input feature class or classified raster has been specified.
Reference System
Specifies the type of reference system that will be used to interpret the input image. The reference system specified must match the reference system used to train the deep learning model.
Map space—The input image is in a map-based coordinate system. This is the default.
Pixel space—The input image is in image space (rows and columns), with no rotation and no distortion.
Additional Input Raster
An additional input imagery source for image translation methods.
This parameter is valid when the Metadata Format parameter is set to Classified Tiles, Export Tiles, or CycleGAN.
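The same export can also be scripted with the Export Training Data For Deep Learning geoprocessing tool in ArcPy. The following is a minimal sketch using hypothetical input paths; verify the parameter names and values against the tool's Python reference for your ArcGIS Pro version:

import arcpy

arcpy.CheckOutExtension("ImageAnalyst")

# Hypothetical inputs: source imagery and a saved labeled-objects feature class.
in_raster = r"C:\data\imagery.tif"
in_class_data = r"C:\data\labels.gdb\training_samples"
out_folder = r"C:\data\chips"

# Export 256 x 256 chips with 50 percent overlap and PASCAL VOC metadata.
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster, out_folder, in_class_data,
    image_chip_format="TIFF",
    tile_size_x=256, tile_size_y=256,
    stride_x=128, stride_y=128,
    metadata_format="PASCAL_VOC_rectangles",
    rotation_angle=0,
    reference_system="MAP_SPACE",
)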
The exported training data can now be used in a deep learning model.