detectron2.utils¶
detectron2.utils.colormap module¶
An awesome colormap for really neat visualizations. Copied from Detectron, and removed gray colors.
detectron2.utils.comm module¶
This file contains primitives for multi-gpu communication. This is useful when doing distributed training.
-
detectron2.utils.comm.
get_local_rank
() → int[source]¶ - Returns
The rank of the current process within the local (per-machine) process group.
-
detectron2.utils.comm.
get_local_size
() → int[source]¶ - Returns
The size of the per-machine process group, i.e. the number of processes per machine.
-
detectron2.utils.comm.
synchronize
()[source]¶ Helper function to synchronize (barrier) among all processes when using distributed training.
-
detectron2.utils.comm.
all_gather
(data, group=None)[source]¶ Run all_gather on arbitrary picklable data (not necessarily tensors).
- Parameters
data – any picklable object
group – a torch process group. By default, will use a group which contains all ranks on gloo backend.
- Returns
list[data] – list of data gathered from each rank
-
detectron2.utils.comm.
gather
(data, dst=0, group=None)[source]¶ Run gather on arbitrary picklable data (not necessarily tensors).
- Parameters
data – any picklable object
dst (int) – destination rank
group – a torch process group. By default, will use a group which contains all ranks on gloo backend.
- Returns
list[data] –
- on dst, a list of data gathered from each rank. Otherwise,
an empty list.
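The return contract of gather can be illustrated without any GPUs. The following is a single-process simulation only, not the library's implementation: `simulated_gather` is a hypothetical helper that pickles each rank's object (as a real transport would need to) and returns what every rank would receive.

```python
import pickle


def simulated_gather(per_rank_data, dst=0):
    """Single-process stand-in for gather (hypothetical helper).

    Returns a list whose i-th entry is what rank i would receive.
    """
    # Serialize each rank's object, as a real transport between processes would.
    payloads = [pickle.dumps(obj) for obj in per_rank_data]
    results = []
    for rank in range(len(per_rank_data)):
        if rank == dst:
            # The destination rank receives every rank's object.
            results.append([pickle.loads(p) for p in payloads])
        else:
            # All other ranks receive an empty list.
            results.append([])
    return results


per_rank = [{"rank": 0, "n": 3}, {"rank": 1, "n": 5}]
received = simulated_gather(per_rank, dst=0)
print(received[0])  # data from every rank, seen only on dst
print(received[1])  # empty list on a non-destination rank
```

The practical consequence is that code consuming the gathered list should run only on the destination rank.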
-
detectron2.utils.comm.
shared_random_seed
()[source]¶ - Returns
int – a random number that is the same across all workers. If workers need a shared RNG, they can use this shared seed to create one.
All workers must call this function, otherwise it will deadlock.
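A minimal sketch of why a shared seed is useful: if every worker builds its RNG from the same integer, all workers draw the same sequence. The seed value and the three simulated workers below are illustrative assumptions; in real code the integer would come from the function documented above, called on every worker.

```python
import random

# Hypothetical value standing in for the integer every worker receives.
shared_seed = 12345

# Simulate three workers, each building its own RNG from the shared seed.
worker_rngs = [random.Random(shared_seed) for _ in range(3)]

# Each worker draws five numbers independently.
draws = [[rng.randint(0, 999) for _ in range(5)] for rng in worker_rngs]

# Every worker sees the same sequence, so decisions based on it stay in sync.
all_same = all(d == draws[0] for d in draws)
print(all_same)
```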
detectron2.utils.events module¶
-
detectron2.utils.events.
get_event_storage
()[source]¶ - Returns
The
EventStorage
object that’s currently being used. Throws an error if no EventStorage
is currently enabled.
-
class
detectron2.utils.events.
JSONWriter
(json_file, window_size=20)[source]¶ Bases:
detectron2.utils.events.EventWriter
Write scalars to a json file.
It saves scalars as one json per line (instead of a big json) for easy parsing.
Examples parsing such a json file:
$ cat metrics.json | jq -s '.[0:2]'
[
  {
    "data_time": 0.008433341979980469,
    "iteration": 19,
    "loss": 1.9228371381759644,
    "loss_box_reg": 0.050025828182697296,
    "loss_classifier": 0.5316952466964722,
    "loss_mask": 0.7236229181289673,
    "loss_rpn_box": 0.0856662318110466,
    "loss_rpn_cls": 0.48198649287223816,
    "lr": 0.007173333333333333,
    "time": 0.25401854515075684
  },
  {
    "data_time": 0.007216215133666992,
    "iteration": 39,
    "loss": 1.282649278640747,
    "loss_box_reg": 0.06222952902317047,
    "loss_classifier": 0.30682939291000366,
    "loss_mask": 0.6970193982124329,
    "loss_rpn_box": 0.038663312792778015,
    "loss_rpn_cls": 0.1471673548221588,
    "lr": 0.007706666666666667,
    "time": 0.2490077018737793
  }
]
$ cat metrics.json | jq '.loss_mask'
0.7126231789588928
0.689423680305481
0.6776131987571716
...
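The same one-object-per-line layout can be read from Python with the standard library. This is a sketch: `read_metrics` is a hypothetical helper, and the in-memory sample holds two shortened records rather than a real metrics file.

```python
import io
import json

# Two shortened records in the one-JSON-object-per-line layout described above.
sample = io.StringIO(
    '{"iteration": 19, "loss": 1.9228371381759644, "lr": 0.007173333333333333}\n'
    '{"iteration": 39, "loss": 1.282649278640747, "lr": 0.007706666666666667}\n'
)


def read_metrics(lines):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in lines if line.strip()]


records = read_metrics(sample)
losses = [(r["iteration"], r["loss"]) for r in records]
print(losses)
```

For a file on disk, passing an open file handle to `read_metrics` works the same way, because iterating a file yields one line at a time.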
-
class
detectron2.utils.events.
TensorboardXWriter
(log_dir: str, window_size: int = 20, **kwargs)[source]¶ Bases:
detectron2.utils.events.EventWriter
Write all scalars to a tensorboard file.
-
class
detectron2.utils.events.
CommonMetricPrinter
(max_iter: Optional[int] = None, window_size: int = 20)[source]¶ Bases:
detectron2.utils.events.EventWriter
Print common metrics to the terminal, including iteration time, ETA, memory, all losses, and the learning rate. It also applies smoothing using a window of 20 elements.
It’s meant to print common metrics in common ways. To print something in more customized ways, please implement a similar printer by yourself.
-
class
detectron2.utils.events.
EventStorage
(start_iter=0)[source]¶ Bases:
object
The user-facing class that provides metric storage functionalities.
In the future we may add support for storing / logging other types of data if needed.
-
put_image
(img_name, img_tensor)[source]¶ Add an img_tensor associated with img_name, to be shown on tensorboard.
- Parameters
img_name (str) – The name of the image to put into tensorboard.
img_tensor (torch.Tensor or numpy.array) – An uint8 or float Tensor of shape [channel, height, width] where channel is 3. The image format should be RGB. The elements in img_tensor can either have values in [0, 1] (float32) or [0, 255] (uint8). The img_tensor will be visualized in tensorboard.
-
put_scalar
(name, value, smoothing_hint=True)[source]¶ Add a scalar value to the HistoryBuffer associated with name.
- Parameters
smoothing_hint (bool) –
a ‘hint’ on whether this scalar is noisy and should be smoothed when logged. The hint will be accessible through
EventStorage.smoothing_hints()
. A writer may ignore the hint and apply a custom smoothing rule.
It defaults to True because most scalars we save need to be smoothed to provide any useful signal.
-
put_scalars
(*, smoothing_hint=True, **kwargs)[source]¶ Put multiple scalars from keyword arguments.
Examples
storage.put_scalars(loss=my_loss, accuracy=my_accuracy, smoothing_hint=True)
-
put_histogram
(hist_name, hist_tensor, bins=1000)[source]¶ Create a histogram from a tensor.
- Parameters
hist_name (str) – The name of the histogram to put into tensorboard.
hist_tensor (torch.Tensor) – A Tensor of arbitrary shape to be converted into a histogram.
bins (int) – Number of histogram bins.
-
latest
()[source]¶ - Returns
dict[str -> (float, int)] –
- mapping from the name of each scalar to the most
recent value and the iteration number at which it was added.
-
latest_with_smoothing_hint
(window_size=20)[source]¶ Similar to
latest()
, but the returned values are either the un-smoothed original latest value, or a median over the given window_size, depending on whether the smoothing_hint is True.
This provides a default behavior that other writers can use.
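The choice between the raw latest value and a windowed median can be sketched as follows. This is a simplified illustration, not the library's code: `latest_with_hint` is a hypothetical function, the histories are plain lists, and iteration numbers are omitted.

```python
from statistics import median


def latest_with_hint(history, smoothing_hints, window_size=20):
    """Return the raw latest value or a windowed median, depending on the hint."""
    result = {}
    for name, values in history.items():
        if smoothing_hints.get(name, True):
            # Noisy scalar: take the median of the most recent window.
            result[name] = median(values[-window_size:])
        else:
            # Scalar marked as not noisy: report the latest value unchanged.
            result[name] = values[-1]
    return result


history = {"loss": [4.0, 1.0, 3.0, 2.0, 10.0], "lr": [0.1, 0.1, 0.01]}
hints = {"loss": True, "lr": False}
summary = latest_with_hint(history, hints, window_size=3)
print(summary)  # {'loss': 3.0, 'lr': 0.01}
```

A median is less sensitive to a single spike (the 10.0 above) than a mean would be, which is why it suits noisy training losses.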
-
smoothing_hints
()[source]¶ - Returns
dict[name -> bool] –
- the user-provided hint on whether the scalar
is noisy and needs smoothing.
-
step
()[source]¶ User should either: (1) Call this function to increment storage.iter when needed. Or (2) Set storage.iter to the correct iteration number before each iteration.
The storage will then be able to associate the new data with an iteration number.
-
property
iter
¶ Returns: int: The current iteration number. When used together with a trainer,
this is ensured to be the same as trainer.iter.
-
property
iteration
¶
-
name_scope
(name)[source]¶ - Yields
A context within which all the events added to this storage will be prefixed by the name scope.
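A minimal sketch of how a name scope can prefix event names. `TinyStorage` is a hypothetical stand-in, and the "/" separator is an assumption made for illustration.

```python
from contextlib import contextmanager


class TinyStorage:
    """Hypothetical stand-in that only records scalar names and values."""

    def __init__(self):
        self.scalars = {}
        self._prefix = ""

    def put_scalar(self, name, value):
        # Every name is stored with the currently active prefix.
        self.scalars[self._prefix + name] = value

    @contextmanager
    def name_scope(self, name):
        old_prefix = self._prefix
        # Assumed convention: scope name followed by a single "/".
        self._prefix = name.rstrip("/") + "/"
        try:
            yield
        finally:
            # Restore the previous prefix even if the body raises.
            self._prefix = old_prefix


storage = TinyStorage()
with storage.name_scope("rpn"):
    storage.put_scalar("loss", 0.5)
storage.put_scalar("loss", 1.2)
print(sorted(storage.scalars))  # ['loss', 'rpn/loss']
```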
-
detectron2.utils.logger module¶
-
detectron2.utils.logger.
setup_logger
(output=None, distributed_rank=0, *, color=True, name='detectron2', abbrev_name=None)[source]¶ Initialize the detectron2 logger and set its verbosity level to “DEBUG”.
- Parameters
output (str) – a file name or a directory to save log. If None, will not save log file. If ends with “.txt” or “.log”, assumed to be a file name. Otherwise, logs will be saved to output/log.txt.
name (str) – the root module name of this logger
abbrev_name (str) – an abbreviation of the module, to avoid long names in logs. Set to “” to not log the root module in logs. By default, will abbreviate “detectron2” to “d2” and leave other modules unchanged.
- Returns
logging.Logger – a logger
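The rule for the output argument described above can be written as a small function. This is a sketch of the documented behavior only; `resolve_log_file` is a hypothetical name and not part of the module.

```python
import os


def resolve_log_file(output):
    """Map the `output` argument to a log file path, or None for no file."""
    if output is None:
        # No log file is written.
        return None
    if output.endswith(".txt") or output.endswith(".log"):
        # Treated as a file name.
        return output
    # Otherwise treated as a directory containing log.txt.
    return os.path.join(output, "log.txt")


print(resolve_log_file("run.log"))
print(resolve_log_file("output_dir"))
```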
-
detectron2.utils.logger.
log_first_n
(lvl, msg, n=1, *, name=None, key='caller')[source]¶ Log only for the first n times.
- Parameters
lvl (int) – the logging level
msg (str) –
n (int) –
name (str) – name of the logger to use. Will use the caller’s module by default.
key (str or tuple[str]) – the string(s) can be one of “caller” or “message”, which defines how to identify duplicated logs. For example, if called with n=1, key=”caller”, this function will only log the first call from the same caller, regardless of the message content. If called with n=1, key=”message”, this function will log the same content only once, even if they are called from different places. If called with n=1, key=(“caller”, “message”), this function will not log only if the same caller has logged the same message before.
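The three key modes can be compared with a small counter-based filter. This sketch only mimics the deduplication rule described above; `make_filter` is hypothetical, and the caller is passed in as a plain string instead of being detected from the call stack.

```python
from collections import Counter


def make_filter(n=1, key="caller"):
    """Build a predicate that returns True for the first n matching calls."""
    keys = (key,) if isinstance(key, str) else tuple(key)
    seen = Counter()

    def should_log(caller, message):
        # Build the identity of this log call from the selected parts.
        hash_key = ()
        if "caller" in keys:
            hash_key += (caller,)
        if "message" in keys:
            hash_key += (message,)
        seen[hash_key] += 1
        return seen[hash_key] <= n

    return should_log


by_caller = make_filter(key="caller")
r_caller = [by_caller("a.py:10", "x"), by_caller("a.py:10", "y"), by_caller("b.py:5", "x")]

by_message = make_filter(key="message")
r_message = [by_message("a.py:10", "x"), by_message("b.py:5", "x"), by_message("b.py:5", "y")]

by_both = make_filter(key=("caller", "message"))
r_both = [by_both("a.py:10", "x"), by_both("a.py:10", "y"), by_both("a.py:10", "x")]

print(r_caller, r_message, r_both)
```

With key="caller" the second message from the same place is dropped even though its text differs; with key="message" the same text from a second place is dropped; with both, only an exact repeat from the same place is dropped.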
detectron2.utils.registry module¶
-
class
detectron2.utils.registry.
Registry
(*args, **kwds)[source]¶ Bases:
collections.abc.Iterable
,typing.Generic
The registry that provides name -> object mapping, to support third-party users’ custom modules.
To create a registry (e.g. a backbone registry):
BACKBONE_REGISTRY = Registry('BACKBONE')
To register an object:
@BACKBONE_REGISTRY.register()
class MyBackbone():
    ...
Or:
BACKBONE_REGISTRY.register(MyBackbone)
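To show how both registration styles can share one method, here is a minimal registry sketch. `TinyRegistry` is a hypothetical illustration of the pattern, not the class documented above; the `get` method and the duplicate-name check are assumptions.

```python
class TinyRegistry:
    """Hypothetical name -> object mapping supporting two registration styles."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def _do_register(self, name, obj):
        if name in self._obj_map:
            raise KeyError(f"'{name}' is already registered in '{self._name}'")
        self._obj_map[name] = obj

    def register(self, obj=None):
        if obj is None:
            # Called as @registry.register(): return a decorator.
            def decorator(func_or_class):
                self._do_register(func_or_class.__name__, func_or_class)
                return func_or_class

            return decorator
        # Called as registry.register(obj): register directly.
        self._do_register(obj.__name__, obj)

    def get(self, name):
        if name not in self._obj_map:
            raise KeyError(f"No object named '{name}' in '{self._name}'")
        return self._obj_map[name]


BACKBONE_REGISTRY = TinyRegistry("BACKBONE")


@BACKBONE_REGISTRY.register()
class MyBackbone:
    pass


class OtherBackbone:
    pass


BACKBONE_REGISTRY.register(OtherBackbone)
print(BACKBONE_REGISTRY.get("MyBackbone").__name__)
```

Looking objects up by name is what lets a config file select a third-party class without the core code importing it explicitly.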
detectron2.utils.memory module¶
-
detectron2.utils.memory.
retry_if_cuda_oom
(func)[source]¶ Makes a function retry itself after encountering pytorch’s CUDA OOM error. It will first retry after calling torch.cuda.empty_cache().
If that still fails, it will then retry by trying to convert inputs to CPUs. In this case, it expects the function to dispatch to CPU implementation. The return values may become CPU tensors as well and it’s user’s responsibility to convert it back to CUDA tensor if needed.
- Parameters
func – a stateless callable that takes tensor-like objects as arguments
- Returns
a callable which retries func if OOM is encountered.
Examples:
output = retry_if_cuda_oom(some_torch_function)(input1, input2) # output may be on CPU even if inputs are on GPU
Note
When converting inputs to CPU, it will only look at each argument and check if it has .device and .to for conversion. Nested structures of tensors are not supported.
Since the function might be called more than once, it has to be stateless.
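The retry order described above (plain call, retry after freeing cached memory, retry with converted inputs) can be sketched without a GPU. Everything here is a stand-in: `FakeOutOfMemory`, `retry_with_fallback`, and the tuple-based "device" tags are hypothetical and only illustrate the control flow.

```python
import functools


class FakeOutOfMemory(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def retry_with_fallback(func, free_cache, to_fallback):
    """Wrap func so it retries after free_cache(), then with converted inputs."""

    @functools.wraps(func)
    def wrapped(*args):
        try:
            return func(*args)
        except FakeOutOfMemory:
            pass
        # Second attempt: release cached memory, then call again unchanged.
        free_cache()
        try:
            return func(*args)
        except FakeOutOfMemory:
            pass
        # Last attempt: convert each top-level argument to the fallback device.
        return func(*[to_fallback(a) for a in args])

    return wrapped


calls = []
cache_clears = []


def flaky(x):
    # Fails whenever the input is tagged as living on the "gpu".
    calls.append(x)
    if x[0] == "gpu":
        raise FakeOutOfMemory("out of memory")
    return ("result", x[0])


safe = retry_with_fallback(
    flaky,
    free_cache=lambda: cache_clears.append(1),
    to_fallback=lambda a: ("cpu", a[1]),
)
out = safe(("gpu", 7))
print(out, len(calls), len(cache_clears))
```

Because the wrapped function may run up to three times, any side effect inside it (like the `calls` list here) is repeated, which is why the documentation requires it to be stateless.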
detectron2.utils.analysis module¶
-
detectron2.utils.analysis.
activation_count_operators
(model: torch.nn.Module, inputs: list, **kwargs) → DefaultDict[str, float][source]¶ Implement operator-level activations counting using jit. This is a wrapper of fvcore.nn.activation_count, that supports standard detection models in detectron2.
Note
The function runs the input through the model to compute activations. The activations of a detection model are often input-dependent; for example, the activations of the box and mask heads depend on the number of proposals and the number of detected objects.
-
detectron2.utils.analysis.
flop_count_operators
(model: torch.nn.Module, inputs: list) → DefaultDict[str, float][source]¶ Implement operator-level flops counting using jit. This is a wrapper of
fvcore.nn.flop_count()
and adds support for standard detection models in detectron2. Please use FlopCountAnalysis
for more advanced functionality.
Note
The function runs the input through the model to compute flops. The flops of a detection model are often input-dependent; for example, the flops of the box and mask heads depend on the number of proposals and the number of detected objects. Therefore, flops counted using a single input may not accurately reflect the computation cost of a model. It’s recommended to average across a number of inputs.
- Parameters
model – a detectron2 model that takes list[dict] as input.
inputs (list[dict]) – inputs to model, in detectron2’s standard format. Only “image” key will be used.
supported_ops (dict[str, Handle]) – see documentation of
fvcore.nn.flop_count()
- Returns
Counter – Gflop count per operator
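Averaging per-operator counts over several inputs, as recommended in the note above, can be sketched as follows. The per-input dictionaries are made-up numbers used purely for illustration, and `average_counts` is a hypothetical helper; in practice each dictionary would be the result of one counting run.

```python
from collections import Counter


def average_counts(per_input_counts):
    """Average per-operator counts over several inputs."""
    total = Counter()
    for counts in per_input_counts:
        # Counter.update adds values for matching keys.
        total.update(counts)
    n = len(per_input_counts)
    # An operator missing from some inputs contributes zero for those inputs.
    return {op: value / n for op, value in total.items()}


# Made-up per-operator Gflop counts for two inputs (illustrative only).
runs = [
    {"conv": 100.0, "matmul": 4.0},
    {"conv": 120.0, "matmul": 8.0, "upsample": 2.0},
]
avg = average_counts(runs)
print(avg)
```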
-
detectron2.utils.analysis.
parameter_count_table
(model: torch.nn.Module, max_depth: int = 3) → str[source]¶ Format the parameter count of the model (and its submodules or parameters) in a nice table. It looks like this:
| name                            | #elements or shape   |
|:--------------------------------|:---------------------|
| model                           | 37.9M                |
| backbone                        | 31.5M                |
| backbone.fpn_lateral3           | 0.1M                 |
| backbone.fpn_lateral3.weight    | (256, 512, 1, 1)     |
| backbone.fpn_lateral3.bias      | (256,)               |
| backbone.fpn_output3            | 0.6M                 |
| backbone.fpn_output3.weight     | (256, 256, 3, 3)     |
| backbone.fpn_output3.bias       | (256,)               |
| backbone.fpn_lateral4           | 0.3M                 |
| backbone.fpn_lateral4.weight    | (256, 1024, 1, 1)    |
| backbone.fpn_lateral4.bias      | (256,)               |
| backbone.fpn_output4            | 0.6M                 |
| backbone.fpn_output4.weight     | (256, 256, 3, 3)     |
| backbone.fpn_output4.bias       | (256,)               |
| backbone.fpn_lateral5           | 0.5M                 |
| backbone.fpn_lateral5.weight    | (256, 2048, 1, 1)    |
| backbone.fpn_lateral5.bias      | (256,)               |
| backbone.fpn_output5            | 0.6M                 |
| backbone.fpn_output5.weight     | (256, 256, 3, 3)     |
| backbone.fpn_output5.bias       | (256,)               |
| backbone.top_block              | 5.3M                 |
| backbone.top_block.p6           | 4.7M                 |
| backbone.top_block.p7           | 0.6M                 |
| backbone.bottom_up              | 23.5M                |
| backbone.bottom_up.stem         | 9.4K                 |
| backbone.bottom_up.res2         | 0.2M                 |
| backbone.bottom_up.res3         | 1.2M                 |
| backbone.bottom_up.res4         | 7.1M                 |
| backbone.bottom_up.res5         | 14.9M                |
| ......                          | .....                |
- Parameters
model – a torch module
max_depth (int) – maximum depth to recursively print submodules or parameters
- Returns
str – the table to be printed
-
detectron2.utils.analysis.
parameter_count
(model: torch.nn.Module) → DefaultDict[str, int][source]¶ Count parameters of a model and its submodules.
- Parameters
model – a torch module
- Returns
dict (str-> int) – the key is either a parameter name or a module name. The value is the number of elements in the parameter, or in all parameters of the module. The key “” corresponds to the total number of parameters of the model.
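The returned mapping can be understood as summing each parameter's element count into every prefix of its dotted name, with the empty string as the root. This sketch works on a plain dict of shapes so that it runs without a model; `count_by_prefix` is a hypothetical helper, and the two shapes are taken from the fpn_lateral3 rows of the table above.

```python
from collections import defaultdict
from math import prod


def count_by_prefix(param_shapes):
    """Sum element counts into every dotted prefix of each parameter name."""
    counts = defaultdict(int)
    for name, shape in param_shapes.items():
        size = prod(shape)
        parts = name.split(".")
        # k = 0 gives the "" key (whole model); k = len(parts) gives the parameter itself.
        for k in range(len(parts) + 1):
            counts[".".join(parts[:k])] += size
    return counts


shapes = {
    "backbone.fpn_lateral3.weight": (256, 512, 1, 1),
    "backbone.fpn_lateral3.bias": (256,),
}
counts = count_by_prefix(shapes)
print(counts[""], counts["backbone.fpn_lateral3"], counts["backbone.fpn_lateral3.bias"])
```

The module total here is 131328 elements, consistent with the 0.1M shown for backbone.fpn_lateral3 in the table.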
-
class
detectron2.utils.analysis.
FlopCountAnalysis
(model, inputs)[source]¶ Bases:
fvcore.nn.flop_count.FlopCountAnalysis
Same as
fvcore.nn.FlopCountAnalysis
, but supports detectron2 models.
detectron2.utils.visualizer module¶
-
class
detectron2.utils.visualizer.
ColorMode
(value)[source]¶ Bases:
enum.Enum
Enum of different color modes to use for instance visualizations.
-
IMAGE
= 0¶ Picks a random color for every instance and overlay segmentations with low opacity.
-
SEGMENTATION
= 1¶ Let instances of the same category have similar colors (from metadata.thing_colors), and overlay them with high opacity. This provides more attention on the quality of segmentation.
-
IMAGE_BW
= 2¶ Same as IMAGE, but convert all areas without masks to gray-scale. Only available for drawing per-instance mask predictions.
-
-
class
detectron2.utils.visualizer.
VisImage
(img, scale=1.0)[source]¶ Bases:
object
-
__init__
(img, scale=1.0)[source]¶ - Parameters
img (ndarray) – an RGB image of shape (H, W, 3) in range [0, 255].
scale (float) – scale the input image
-
-
class
detectron2.utils.visualizer.
Visualizer
(img_rgb, metadata=None, scale=1.0, instance_mode=<ColorMode.IMAGE: 0>)[source]¶ Bases:
object
Visualizer that draws data about detection/segmentation on images.
It contains methods like draw_{text,box,circle,line,binary_mask,polygon} that draw primitive objects to images, as well as high-level wrappers like draw_{instance_predictions,sem_seg,panoptic_seg_predictions,dataset_dict} that draw composite data in some pre-defined style.
Note that the exact visualization style for the high-level wrappers is subject to change. Style such as color, opacity, label contents, visibility of labels, or even the visibility of objects themselves (e.g. when the object is too small) may change according to different heuristics, as long as the results still look visually reasonable.
To obtain a consistent style, you can implement custom drawing functions with the above-mentioned primitive methods instead. If you need more customized visualization styles, you can process the data yourself following their format documented in tutorials (Use Models, Use Custom Datasets). This class does not intend to satisfy everyone’s preference on drawing styles.
This visualizer focuses on high rendering quality rather than performance. It is not designed to be used for real-time applications.
-
__init__
(img_rgb, metadata=None, scale=1.0, instance_mode=<ColorMode.IMAGE: 0>)[source]¶ - Parameters
img_rgb – a numpy array of shape (H, W, C), where H and W correspond to the height and width of the image respectively. C is the number of color channels. The image is required to be in RGB format since that is a requirement of the Matplotlib library. The image is also expected to be in the range [0, 255].
metadata (Metadata) – dataset metadata (e.g. class names and colors)
instance_mode (ColorMode) – defines one of the pre-defined style for drawing instances on an image.
-
draw_instance_predictions
(predictions)[source]¶ Draw instance-level prediction results on an image.
- Parameters
predictions (Instances) – the output of an instance detection/segmentation model. Following fields will be used to draw: “pred_boxes”, “pred_classes”, “scores”, “pred_masks” (or “pred_masks_rle”).
- Returns
output (VisImage) – image object with visualizations.
-
draw_sem_seg
(sem_seg, area_threshold=None, alpha=0.8)[source]¶ Draw semantic segmentation predictions/labels.
- Parameters
- Returns
output (VisImage) – image object with visualizations.
-
draw_panoptic_seg
(panoptic_seg, segments_info, area_threshold=None, alpha=0.7)[source]¶ Draw panoptic prediction annotations or results.
- Parameters
panoptic_seg (Tensor) – of shape (height, width) where the values are ids for each segment.
segments_info (list[dict] or None) – Describe each segment in panoptic_seg. If it is a
list[dict]
, each dict contains keys “id”, “category_id”. If None, the category id of each pixel is computed by
pixel // metadata.label_divisor
.
area_threshold (int) – stuff segments with less than area_threshold are not drawn.
- Returns
output (VisImage) – image object with visualizations.
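The fallback rule for a missing segments_info, pixel // metadata.label_divisor, is plain integer division. The divisor of 1000 and the pixel ids below are illustrative assumptions, not values taken from any dataset.

```python
def category_of(panoptic_id, label_divisor):
    """Recover the category id encoded in a panoptic segment id."""
    return panoptic_id // label_divisor


label_divisor = 1000  # hypothetical value, for illustration only
panoptic_row = [17003, 17003, 21000, 0]  # hypothetical ids along one image row
categories = [category_of(p, label_divisor) for p in panoptic_row]
print(categories)  # [17, 17, 21, 0]
```

Under this encoding the remainder (`panoptic_id % label_divisor`) is what distinguishes segments that share a category.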
-
draw_dataset_dict
(dic)[source]¶ Draw annotations/segmentations in Detectron2 Dataset format.
- Parameters
dic (dict) – annotation/segmentation data of one image, in Detectron2 Dataset format.
- Returns
output (VisImage) – image object with visualizations.
-
overlay_instances
(*, boxes=None, labels=None, masks=None, keypoints=None, assigned_colors=None, alpha=0.5)[source]¶ - Parameters
boxes (Boxes, RotatedBoxes or ndarray) – either a
Boxes
, or an Nx4 numpy array of XYXY_ABS format for the N objects in a single image, or a RotatedBoxes
, or an Nx5 numpy array of (x_center, y_center, width, height, angle_degrees) format for the N objects in a single image.
labels (list[str]) – the text to be displayed for each instance.
masks (masks-like object) –
Supported types are:
detectron2.structures.PolygonMasks
, detectron2.structures.BitMasks
.
list[list[ndarray]]: contains the segmentation masks for all objects in one image. The first level of the list corresponds to individual instances, the second level to all the polygons that compose the instance, and the third level to the polygon coordinates. The third level should have the format of [x0, y0, x1, y1, …, xn, yn] (n >= 3).
list[ndarray]: each ndarray is a binary mask of shape (H, W).
list[dict]: each dict is a COCO-style RLE.
keypoints (Keypoint or array like) – an array-like object of shape (N, K, 3), where the N is the number of instances and K is the number of keypoints. The last dimension corresponds to (x, y, visibility or score).
assigned_colors (list[matplotlib.colors]) – a list of colors, where each color corresponds to each mask or box in the image. Refer to ‘matplotlib.colors’ for full list of formats that the colors are accepted in.
- Returns
output (VisImage) – image object with visualizations.
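The flat polygon layout [x0, y0, x1, y1, …] used for masks above can be unpacked into (x, y) pairs with slicing. `polygon_points` is a hypothetical helper, and the requirement of at least three points is an assumption about what makes a valid polygon.

```python
def polygon_points(flat):
    """Convert [x0, y0, x1, y1, ...] into a list of (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("coordinates must come in x, y pairs")
    if len(flat) < 6:
        raise ValueError("a polygon needs at least three points")
    # Even positions are x values, odd positions are y values.
    return list(zip(flat[0::2], flat[1::2]))


triangle = polygon_points([0, 0, 4, 0, 4, 3])
print(triangle)  # [(0, 0), (4, 0), (4, 3)]
```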
-
overlay_rotated_instances
(boxes=None, labels=None, assigned_colors=None)[source]¶ - Parameters
boxes (ndarray) – an Nx5 numpy array of (x_center, y_center, width, height, angle_degrees) format for the N objects in a single image.
labels (list[str]) – the text to be displayed for each instance.
assigned_colors (list[matplotlib.colors]) – a list of colors, where each color corresponds to each mask or box in the image. Refer to ‘matplotlib.colors’ for full list of formats that the colors are accepted in.
- Returns
output (VisImage) – image object with visualizations.
-
draw_and_connect_keypoints
(keypoints)[source]¶ Draws keypoints of an instance and follows the rules for keypoint connections to draw lines between appropriate keypoints. This follows color heuristics for line color.
- Parameters
keypoints (Tensor) – a tensor of shape (K, 3), where K is the number of keypoints and the last dimension corresponds to (x, y, probability).
- Returns
output (VisImage) – image object with visualizations.
-
draw_text
(text, position, *, font_size=None, color='g', horizontal_alignment='center', rotation=0)[source]¶ - Parameters
text (str) – class label
position (tuple) – a tuple of the x and y coordinates to place text on image.
font_size (int, optional) – font size of the text. If not provided, a font size proportional to the image width is calculated and used.
color – color of the text. Refer to matplotlib.colors for full list of formats that are accepted.
horizontal_alignment (str) – see matplotlib.text.Text
rotation – rotation angle in degrees CCW
- Returns
output (VisImage) – image object with text drawn.
-
draw_box
(box_coord, alpha=0.5, edge_color='g', line_style='-')[source]¶ - Parameters
box_coord (tuple) – a tuple containing x0, y0, x1, y1 coordinates, where x0 and y0 are the coordinates of the box’s top left corner. x1 and y1 are the coordinates of the box’s bottom right corner.
alpha (float) – blending coefficient. Smaller values lead to more transparent masks.
edge_color – color of the outline of the box. Refer to matplotlib.colors for full list of formats that are accepted.
line_style (string) – the string to use to create the outline of the boxes.
- Returns
output (VisImage) – image object with box drawn.
-
draw_rotated_box_with_label
(rotated_box, alpha=0.5, edge_color='g', line_style='-', label=None)[source]¶ Draw a rotated box with label on its top-left corner.
- Parameters
rotated_box (tuple) – a tuple containing (cnt_x, cnt_y, w, h, angle), where cnt_x and cnt_y are the center coordinates of the box. w and h are the width and height of the box. angle represents how many degrees the box is rotated CCW with regard to the 0-degree box.
alpha (float) – blending coefficient. Smaller values lead to more transparent masks.
edge_color – color of the outline of the box. Refer to matplotlib.colors for full list of formats that are accepted.
line_style (string) – the string to use to create the outline of the boxes.
label (string) – label for rotated box. It will not be rendered when set to None.
- Returns
output (VisImage) – image object with box drawn.
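The (cnt_x, cnt_y, w, h, angle) parameterization can be turned into four corner points with a rotation about the center. This is a sketch under stated assumptions: image coordinates with y pointing down, so that a positive angle appears counter-clockwise on screen; `rotated_box_corners` is a hypothetical helper.

```python
import math


def rotated_box_corners(cnt_x, cnt_y, w, h, angle):
    """Corners of a box rotated by `angle` degrees CCW (y axis pointing down)."""
    theta = math.radians(angle)
    c, s = math.cos(theta), math.sin(theta)
    # Corners of the unrotated box, relative to its center.
    rect = [(-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2)]
    # Rotate each corner, then translate it to the box center.
    return [(s * yy + c * xx + cnt_x, c * yy - s * xx + cnt_y) for xx, yy in rect]


corners0 = rotated_box_corners(10, 20, 4, 2, 0)
corners90 = rotated_box_corners(10, 20, 4, 2, 90)
print(corners0)
print([(round(x, 6), round(y, 6)) for x, y in corners90])
```

At 90 degrees the 4x2 box becomes 2 wide and 4 tall around the same center, which is a quick sanity check on the sign convention.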
-
draw_circle
(circle_coord, color, radius=3)[source]¶ - Parameters
- Returns
output (VisImage) – image object with circle drawn.
-
draw_line
(x_data, y_data, color, linestyle='-', linewidth=None)[source]¶ - Parameters
x_data (list[int]) – a list containing x values of all the points being drawn. Length of list should match the length of y_data.
y_data (list[int]) – a list containing y values of all the points being drawn. Length of list should match the length of x_data.
color – color of the line. Refer to matplotlib.colors for a full list of formats that are accepted.
linestyle – style of the line. Refer to matplotlib.lines.Line2D for a full list of formats that are accepted.
linewidth (float or None) – width of the line. When it’s None, a default value will be computed and used.
- Returns
output (VisImage) – image object with line drawn.
-
draw_binary_mask
(binary_mask, color=None, *, edge_color=None, text=None, alpha=0.5, area_threshold=10)[source]¶ - Parameters
binary_mask (ndarray) – numpy array of shape (H, W), where H is the image height and W is the image width. Each value in the array is either a 0 or 1 value of uint8 type.
color – color of the mask. Refer to matplotlib.colors for a full list of formats that are accepted. If None, will pick a random color.
edge_color – color of the polygon edges. Refer to matplotlib.colors for a full list of formats that are accepted.
text (str) – if not None, the text will be drawn on the object.
alpha (float) – blending coefficient. Smaller values lead to more transparent masks.
area_threshold (float) – a connected component smaller than this area will not be shown.
- Returns
output (VisImage) – image object with mask drawn.
-
draw_soft_mask
(soft_mask, color=None, *, text=None, alpha=0.5)[source]¶ - Parameters
soft_mask (ndarray) – float array of shape (H, W), each value in [0, 1].
color – color of the mask. Refer to matplotlib.colors for a full list of formats that are accepted. If None, will pick a random color.
text (str) – if not None, the text will be drawn on the object.
alpha (float) – blending coefficient. Smaller values lead to more transparent masks.
- Returns
output (VisImage) – image object with mask drawn.
-
draw_polygon
(segment, color, edge_color=None, alpha=0.5)[source]¶ - Parameters
segment – numpy array of shape Nx2, containing all the points in the polygon.
color – color of the polygon. Refer to matplotlib.colors for a full list of formats that are accepted.
edge_color – color of the polygon edges. Refer to matplotlib.colors for a full list of formats that are accepted. If not provided, a darker shade of the polygon color will be used instead.
alpha (float) – blending coefficient. Smaller values lead to more transparent masks.
- Returns
output (VisImage) – image object with polygon drawn.
-
detectron2.utils.video_visualizer module¶
-
class
detectron2.utils.video_visualizer.
VideoVisualizer
(metadata, instance_mode=<ColorMode.IMAGE: 0>)[source]¶ Bases:
object
-
__init__
(metadata, instance_mode=<ColorMode.IMAGE: 0>)[source]¶ - Parameters
metadata (MetadataCatalog) – image metadata.
-
draw_instance_predictions
(frame, predictions)[source]¶ Draw instance-level prediction results on an image.
- Parameters
frame (ndarray) – an RGB image of shape (H, W, C), in the range [0, 255].
predictions (Instances) – the output of an instance detection/segmentation model. Following fields will be used to draw: “pred_boxes”, “pred_classes”, “scores”, “pred_masks” (or “pred_masks_rle”).
- Returns
output (VisImage) – image object with visualizations.
-