fvcore documentation¶
Detectron2 depends on utilities in fvcore. We include part of fvcore documentation here for easier reference.
fvcore.nn¶
-
fvcore.nn.
activation_count
(model: torch.nn.Module, inputs: Tuple[Any, …], supported_ops: Optional[Dict[str, Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]] = None) → Tuple[DefaultDict[str, float], Counter[str]][source]¶ Given a model and an input to the model, compute the total number of activations of the model.
- Parameters
model (nn.Module) – The model to compute activation counts.
inputs (tuple) – Inputs that are passed to model to count activations. Inputs need to be in a tuple.
supported_ops (dict(str,Callable) or None) – provide additional handlers for extra ops, or overwrite the existing handlers for convolution and matmul. The key is the operator name and the value is a function that takes the (inputs, outputs) of the op.
- Returns
tuple[defaultdict, Counter] – A dictionary that records the number of activations (in millions) for each operation, and a Counter that records the number of unsupported operations.
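A minimal usage sketch (the toy model and input shape below are illustrative, not from the original docs):
import torch
import torch.nn as nn
from fvcore.nn import activation_count

# Toy model and input; the shapes are arbitrary, for illustration only.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
inputs = (torch.randn(1, 3, 32, 32),)

# acts maps operator name -> activations in millions ("mega" activations);
# unsupported counts occurrences of ops that have no handler.
acts, unsupported = activation_count(model, inputs)
print(dict(acts), dict(unsupported))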
-
class
fvcore.nn.
ActivationCountAnalysis
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]])[source]¶ Bases:
fvcore.nn.jit_analysis.JitModelAnalysis
Provides access to per-submodule model activation count obtained by tracing a model with pytorch’s jit tracing functionality. By default, comes with standard activation counters for convolutional and dot-product operators.
Handles for additional operators may be added, or the default ones overwritten, using the .set_op_handle(name, func) method. See the method documentation for details.
Activation counts can be obtained as:
.total(module_name="") : total activation count for a module
.by_operator(module_name="") : activation counts for the module, as a Counter over different operator types
.by_module() : Counter of activation counts for all submodules
.by_module_and_operator() : dictionary of Counters over different operator types, indexed by submodule name
An operator is treated as within a module if it is executed inside the module’s __call__ method. Note that this does not include calls to other methods of the module or explicit calls to module.forward(...).
Example usage:
>>> import torch.nn as nn
>>> import torch
>>> class TestModel(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc = nn.Linear(in_features=1000, out_features=10)
...         self.conv = nn.Conv2d(
...             in_channels=3, out_channels=10, kernel_size=1
...         )
...         self.act = nn.ReLU()
...     def forward(self, x):
...         return self.fc(self.act(self.conv(x)).flatten(1))
>>> model = TestModel()
>>> inputs = (torch.randn((1,3,10,10)),)
>>> acts = ActivationCountAnalysis(model, inputs)
>>> acts.total()
1010
>>> acts.total("fc")
10
>>> acts.by_operator()
Counter({"conv" : 1000, "addmm" : 10})
>>> acts.by_module()
Counter({"" : 1010, "fc" : 10, "conv" : 1000, "act" : 0})
>>> acts.by_module_and_operator()
{"" : Counter({"conv" : 1000, "addmm" : 10}),
 "fc" : Counter({"addmm" : 10}),
 "conv" : Counter({"conv" : 1000}),
 "act" : Counter()}
-
__init__
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]]) → None[source]¶ - Parameters
model – The model to analyze
inputs – The inputs to the model for analysis.
We will trace the execution of model.forward(inputs). This means inputs have to be tensors or a tuple of tensors (see https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace). In order to trace other methods or unsupported input types, you may need to implement a wrapper module.
-
ancestor_mode
(mode: str) → T¶ Sets how to determine the ancestor modules of an operator. Must be one of “owner” or “caller”.
“caller”: an operator belongs to all modules that are currently executing forward() at the time the operator is called.
“owner”: an operator belongs to the last module that is executing forward() at the time the operator is called, plus this module’s recursive parents. If a module has multiple parents (e.g. a shared module), only one will be picked.
In most cases a module only calls submodules it owns, so both options behave identically. In certain edge cases this option affects the hierarchy of the results, but never the total count.
-
by_module
() → Counter[str]¶ Returns the statistics for all submodules, aggregated over all operators.
- Returns
Counter(str) – statistics counter grouped by submodule names
-
by_module_and_operator
() → Dict[str, Counter[str]]¶ Returns the statistics for all submodules, separated out by operator type for each submodule. The operator handle determines the name associated with each operator type.
- Returns
dict(str, Counter(str)) – The statistics for each submodule and each operator. Grouped by submodule names, then by operator name.
-
by_operator
(module_name: str = '') → Counter[str]¶ Returns the statistics for a requested module, grouped by operator type. The operator handle determines the name associated with each operator type.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
Counter(str) – The statistics for each operator.
-
canonical_module_name
(name: str) → str¶ Returns the canonical module name of the given name, which might be different from the given name if the module is shared. This is the name that will be used as a key when statistics are output using .by_module() and .by_module_and_operator().
- Parameters
name (str) – The name of the module to find the canonical name for.
- Returns
str – The canonical name of the module.
-
clear_op_handles
() → fvcore.nn.jit_analysis.JitModelAnalysis¶ Clears all operator handles currently set.
-
copy
(new_model: Optional[torch.nn.Module] = None, new_inputs: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Returns a copy of the
JitModelAnalysis
object, keeping all settings, but on a new model or new inputs.
-
set_op_handle
(*args, **kwargs: Optional[Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Sets additional operator handles, or replaces existing ones.
- Parameters
args – (str, Handle) pairs of operator names and handles.
kwargs – mapping from operator names to handles.
If a handle is None, the op will be explicitly ignored. Otherwise, the handle should be a function that calculates the desired statistic from an operator. The function must take two arguments, which are the inputs and outputs of the operator, in the form of list(torch._C.Value). The function should return a counter object with per-operator statistics.
Examples
handlers = {"aten::linear": my_handler} counter.set_op_handle("aten::matmul", None, "aten::bmm", my_handler2) .set_op_handle(**handlers)
-
total
(module_name: str = '') → int¶ Returns the total aggregated statistic across all operators for the requested module.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
int – The aggregated statistic.
-
tracer_warnings
(mode: str) → T¶ Sets which warnings to print when tracing the graph to calculate statistics. There are three modes. Defaults to ‘no_tracer_warning’. Allowed values are:
‘all’ : keeps all warnings raised while tracing
‘no_tracer_warning’ : suppress torch.jit.TracerWarning only
‘none’ : suppress all warnings raised while tracing
- Parameters
mode (str) – warning mode in one of the above values.
-
uncalled_modules
() → Set[str]¶ Returns a set of submodules that were never called during the trace of the graph. This may be because they were unused, or because they were accessed via direct calls to .forward() or other Python methods. In the latter case, statistics will not be attributed to the submodule, though the statistics will be included in the parent module.
- Returns
set(str) – The set of submodule names that were never called during the trace of the model.
-
uncalled_modules_warnings
(enabled: bool) → T¶ Sets if warnings from uncalled submodules are shown. Defaults to True. A submodule is considered “uncalled” if it is never called during tracing. This may be because it is actually unused, or because it is accessed via calls to .forward() or other methods of the module. The set of uncalled modules may be obtained from uncalled_modules() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show warnings.
-
unsupported_ops
(module_name: str = '') → Counter[str]¶ Lists the number of operators that were encountered but unsupported because no operator handle is available for them. Does not include operators that are explicitly ignored.
- Parameters
module_name (str) – The submodule to list unsupported ops. Defaults to the entire model.
- Returns
Counter(str) – The number of occurrences of each unsupported operator.
-
unsupported_ops_warnings
(enabled: bool) → T¶ Sets if warnings for unsupported operators are shown. Defaults to True. Counts of unsupported operators may be obtained from unsupported_ops() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show unsupported operator warnings.
-
fvcore.nn.
flop_count
(model: torch.nn.Module, inputs: Tuple[Any, …], supported_ops: Optional[Dict[str, Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]] = None) → Tuple[DefaultDict[str, float], Counter[str]][source]¶ Given a model and an input to the model, compute the per-operator Gflops of the given model.
- Parameters
model (nn.Module) – The model to compute flop counts.
inputs (tuple) – Inputs that are passed to model to count flops. Inputs need to be in a tuple.
supported_ops (dict(str,Callable) or None) – provide additional handlers for extra ops, or overwrite the existing handlers for convolution, matmul, and einsum. The key is the operator name and the value is a function that takes the (inputs, outputs) of the op. We count one Multiply-Add as one FLOP.
- Returns
tuple[defaultdict, Counter] – A dictionary that records the number of gflops for each operation and a Counter that records the number of unsupported operations.
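A minimal usage sketch (the model and input below are illustrative):
import torch
import torch.nn as nn
from fvcore.nn import flop_count

model = nn.Linear(in_features=1000, out_features=10)
inputs = (torch.randn(1, 1000),)

# gflops maps operator name -> Gflops; unsupported counts ops without handlers.
gflops, unsupported = flop_count(model, inputs)
print(dict(gflops), dict(unsupported))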
-
class
fvcore.nn.
FlopCountAnalysis
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]])[source]¶ Bases:
fvcore.nn.jit_analysis.JitModelAnalysis
Provides access to per-submodule model flop count obtained by tracing a model with pytorch’s jit tracing functionality. By default, comes with standard flop counters for a few common operators. Note that:
Flop is not a well-defined concept. We just produce our best estimate.
We count one fused multiply-add as one flop.
Handles for additional operators may be added, or the default ones overwritten, using the .set_op_handle(name, func) method. See the method documentation for details.
Flop counts can be obtained as:
.total(module_name="") : total flop count for a module
.by_operator(module_name="") : flop counts for the module, as a Counter over different operator types
.by_module() : Counter of flop counts for all submodules
.by_module_and_operator() : dictionary of Counters over different operator types, indexed by submodule name
An operator is treated as within a module if it is executed inside the module’s __call__ method. Note that this does not include calls to other methods of the module or explicit calls to module.forward(...).
Example usage:
>>> import torch.nn as nn
>>> import torch
>>> class TestModel(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc = nn.Linear(in_features=1000, out_features=10)
...         self.conv = nn.Conv2d(
...             in_channels=3, out_channels=10, kernel_size=1
...         )
...         self.act = nn.ReLU()
...     def forward(self, x):
...         return self.fc(self.act(self.conv(x)).flatten(1))
>>> model = TestModel()
>>> inputs = (torch.randn((1,3,10,10)),)
>>> flops = FlopCountAnalysis(model, inputs)
>>> flops.total()
13000
>>> flops.total("fc")
10000
>>> flops.by_operator()
Counter({"addmm" : 10000, "conv" : 3000})
>>> flops.by_module()
Counter({"" : 13000, "fc" : 10000, "conv" : 3000, "act" : 0})
>>> flops.by_module_and_operator()
{"" : Counter({"addmm" : 10000, "conv" : 3000}),
 "fc" : Counter({"addmm" : 10000}),
 "conv" : Counter({"conv" : 3000}),
 "act" : Counter()}
-
__init__
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]]) → None[source]¶ - Parameters
model – The model to analyze
inputs – The inputs to the model for analysis.
We will trace the execution of model.forward(inputs). This means inputs have to be tensors or a tuple of tensors (see https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace). In order to trace other methods or unsupported input types, you may need to implement a wrapper module.
-
ancestor_mode
(mode: str) → T¶ Sets how to determine the ancestor modules of an operator. Must be one of “owner” or “caller”.
“caller”: an operator belongs to all modules that are currently executing forward() at the time the operator is called.
“owner”: an operator belongs to the last module that is executing forward() at the time the operator is called, plus this module’s recursive parents. If a module has multiple parents (e.g. a shared module), only one will be picked.
In most cases a module only calls submodules it owns, so both options behave identically. In certain edge cases this option affects the hierarchy of the results, but never the total count.
-
by_module
() → Counter[str]¶ Returns the statistics for all submodules, aggregated over all operators.
- Returns
Counter(str) – statistics counter grouped by submodule names
-
by_module_and_operator
() → Dict[str, Counter[str]]¶ Returns the statistics for all submodules, separated out by operator type for each submodule. The operator handle determines the name associated with each operator type.
- Returns
dict(str, Counter(str)) – The statistics for each submodule and each operator. Grouped by submodule names, then by operator name.
-
by_operator
(module_name: str = '') → Counter[str]¶ Returns the statistics for a requested module, grouped by operator type. The operator handle determines the name associated with each operator type.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
Counter(str) – The statistics for each operator.
-
canonical_module_name
(name: str) → str¶ Returns the canonical module name of the given name, which might be different from the given name if the module is shared. This is the name that will be used as a key when statistics are output using .by_module() and .by_module_and_operator().
- Parameters
name (str) – The name of the module to find the canonical name for.
- Returns
str – The canonical name of the module.
-
clear_op_handles
() → fvcore.nn.jit_analysis.JitModelAnalysis¶ Clears all operator handles currently set.
-
copy
(new_model: Optional[torch.nn.Module] = None, new_inputs: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Returns a copy of the
JitModelAnalysis
object, keeping all settings, but on a new model or new inputs.
-
set_op_handle
(*args, **kwargs: Optional[Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Sets additional operator handles, or replaces existing ones.
- Parameters
args – (str, Handle) pairs of operator names and handles.
kwargs – mapping from operator names to handles.
If a handle is None, the op will be explicitly ignored. Otherwise, the handle should be a function that calculates the desired statistic from an operator. The function must take two arguments, which are the inputs and outputs of the operator, in the form of list(torch._C.Value). The function should return a counter object with per-operator statistics.
Examples
handlers = {"aten::linear": my_handler} counter.set_op_handle("aten::matmul", None, "aten::bmm", my_handler2) .set_op_handle(**handlers)
-
total
(module_name: str = '') → int¶ Returns the total aggregated statistic across all operators for the requested module.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
int – The aggregated statistic.
-
tracer_warnings
(mode: str) → T¶ Sets which warnings to print when tracing the graph to calculate statistics. There are three modes. Defaults to ‘no_tracer_warning’. Allowed values are:
‘all’ : keeps all warnings raised while tracing
‘no_tracer_warning’ : suppress torch.jit.TracerWarning only
‘none’ : suppress all warnings raised while tracing
- Parameters
mode (str) – warning mode in one of the above values.
-
uncalled_modules
() → Set[str]¶ Returns a set of submodules that were never called during the trace of the graph. This may be because they were unused, or because they were accessed via direct calls to .forward() or other Python methods. In the latter case, statistics will not be attributed to the submodule, though the statistics will be included in the parent module.
- Returns
set(str) – The set of submodule names that were never called during the trace of the model.
-
uncalled_modules_warnings
(enabled: bool) → T¶ Sets if warnings from uncalled submodules are shown. Defaults to True. A submodule is considered “uncalled” if it is never called during tracing. This may be because it is actually unused, or because it is accessed via calls to .forward() or other methods of the module. The set of uncalled modules may be obtained from uncalled_modules() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show warnings.
-
unsupported_ops
(module_name: str = '') → Counter[str]¶ Lists the number of operators that were encountered but unsupported because no operator handle is available for them. Does not include operators that are explicitly ignored.
- Parameters
module_name (str) – The submodule to list unsupported ops. Defaults to the entire model.
- Returns
Counter(str) – The number of occurrences of each unsupported operator.
-
unsupported_ops_warnings
(enabled: bool) → T¶ Sets if warnings for unsupported operators are shown. Defaults to True. Counts of unsupported operators may be obtained from unsupported_ops() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show unsupported operator warnings.
-
fvcore.nn.
sigmoid_focal_loss
(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = - 1, gamma: float = 2, reduction: str = 'none') → torch.Tensor[source]¶ Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.
- Parameters
inputs – A float tensor of arbitrary shape. The predictions for each example.
targets – A float tensor with the same shape as inputs. Stores the binary classification label for each element in inputs (0 for the negative class and 1 for the positive class).
alpha – (optional) Weighting factor in range (0,1) to balance positive vs negative examples. Default = -1 (no weighting).
gamma – Exponent of the modulating factor (1 - p_t) to balance easy vs hard examples.
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
Loss tensor with the reduction option applied.
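A short usage sketch with random logits and binary targets (the values and alpha/gamma choices below are illustrative):
import torch
from fvcore.nn import sigmoid_focal_loss

logits = torch.randn(8, 4)                     # raw predictions (before sigmoid)
targets = torch.randint(0, 2, (8, 4)).float()  # binary labels, same shape as logits
loss = sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2, reduction="mean")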
-
fvcore.nn.
sigmoid_focal_loss_star
(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = - 1, gamma: float = 1, reduction: str = 'none') → torch.Tensor[source]¶ FL* described in RetinaNet paper Appendix: https://arxiv.org/abs/1708.02002.
- Parameters
inputs – A float tensor of arbitrary shape. The predictions for each example.
targets – A float tensor with the same shape as inputs. Stores the binary classification label for each element in inputs (0 for the negative class and 1 for the positive class).
alpha – (optional) Weighting factor in range (0,1) to balance positive vs negative examples. Default = -1 (no weighting).
gamma – Gamma parameter described in FL*. Default = 1 (no weighting).
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
Loss tensor with the reduction option applied.
-
fvcore.nn.
giou_loss
(boxes1: torch.Tensor, boxes2: torch.Tensor, reduction: str = 'none', eps: float = 1e-07) → torch.Tensor[source]¶ Generalized Intersection over Union Loss (Hamid Rezatofighi et. al) https://arxiv.org/abs/1902.09630
Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box. This loss is symmetric, so the boxes1 and boxes2 arguments are interchangeable.
- Parameters
boxes1 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
boxes2 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
eps (float) – small number to prevent division by zero
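A short usage sketch with two small sets of boxes in XYXY format (the coordinates below are illustrative):
import torch
from fvcore.nn import giou_loss

boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 15.0, 15.0]])
boxes2 = torch.tensor([[1.0, 1.0, 11.0, 11.0], [20.0, 20.0, 30.0, 30.0]])
# The second pair does not overlap, so the enclosing-box penalty applies there.
loss = giou_loss(boxes1, boxes2, reduction="mean")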
-
fvcore.nn.
parameter_count
(model: torch.nn.Module) → DefaultDict[str, int][source]¶ Count parameters of a model and its submodules.
- Parameters
model – a torch module
- Returns
dict (str-> int) – the key is either a parameter name or a module name. The value is the number of elements in the parameter, or in all parameters of the module. The key “” corresponds to the total number of parameters of the model.
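A short usage sketch (the toy model below is illustrative; submodule names in the output depend on the model):
import torch.nn as nn
from fvcore.nn import parameter_count

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))
counts = parameter_count(model)
print(counts[""])   # total number of parameters in the model
print(counts["0"])  # parameters of the first submodule: 10 * 20 + 20 = 220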
-
fvcore.nn.
parameter_count_table
(model: torch.nn.Module, max_depth: int = 3) → str[source]¶ Format the parameter count of the model (and its submodules or parameters) in a nice table. It looks like this:
| name                             | #elements or shape   |
|:---------------------------------|:---------------------|
| model                            | 37.9M                |
| backbone                         | 31.5M                |
| backbone.fpn_lateral3            | 0.1M                 |
| backbone.fpn_lateral3.weight     | (256, 512, 1, 1)     |
| backbone.fpn_lateral3.bias       | (256,)               |
| backbone.fpn_output3             | 0.6M                 |
| backbone.fpn_output3.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output3.bias        | (256,)               |
| backbone.fpn_lateral4            | 0.3M                 |
| backbone.fpn_lateral4.weight     | (256, 1024, 1, 1)    |
| backbone.fpn_lateral4.bias       | (256,)               |
| backbone.fpn_output4             | 0.6M                 |
| backbone.fpn_output4.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output4.bias        | (256,)               |
| backbone.fpn_lateral5            | 0.5M                 |
| backbone.fpn_lateral5.weight     | (256, 2048, 1, 1)    |
| backbone.fpn_lateral5.bias       | (256,)               |
| backbone.fpn_output5             | 0.6M                 |
| backbone.fpn_output5.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output5.bias        | (256,)               |
| backbone.top_block               | 5.3M                 |
| backbone.top_block.p6            | 4.7M                 |
| backbone.top_block.p7            | 0.6M                 |
| backbone.bottom_up               | 23.5M                |
| backbone.bottom_up.stem          | 9.4K                 |
| backbone.bottom_up.res2          | 0.2M                 |
| backbone.bottom_up.res3          | 1.2M                 |
| backbone.bottom_up.res4          | 7.1M                 |
| backbone.bottom_up.res5          | 14.9M                |
| ......                           | .....                |
- Parameters
model – a torch module
max_depth (int) – maximum depth to recursively print submodules or parameters
- Returns
str – the table to be printed
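Typical usage is simply to print the returned string, e.g. (model here is any nn.Module, such as the one constructed above):
from fvcore.nn import parameter_count_table
print(parameter_count_table(model, max_depth=2))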
-
fvcore.nn.
get_bn_modules
(model: torch.nn.Module) → List[torch.nn.Module][source]¶ Find all BatchNorm (BN) modules that are in training mode. See fvcore.precise_bn.BN_MODULE_TYPES for a list of all modules that are included in this search.
- Parameters
model (nn.Module) – a model possibly containing BN modules.
- Returns
list[nn.Module] – all BN modules in the model.
-
fvcore.nn.
update_bn_stats
(model: torch.nn.Module, data_loader: Iterable[Any], num_iters: int = 200, progress: Optional[str] = None) → None[source]¶ Recompute and update the batch norm stats to make them more precise. During training both BN stats and the weight are changing after every iteration, so the running average can not precisely reflect the actual stats of the current model. In this function, the BN stats are recomputed with fixed weights, to make the running average more precise. Specifically, it computes the true average of per-batch mean/variance instead of the running average. See Sec. 3 of the paper “Rethinking Batch in BatchNorm” for details.
- Parameters
model (nn.Module) – the model whose BN stats will be recomputed.
Note that:
This function will not alter the training mode of the given model. Users are responsible for setting the layers that need precise-BN to training mode, prior to calling this function.
Be careful if your models contain other stateful layers in addition to BN, i.e. layers whose state can change in forward iterations. This function will alter their state. If you wish them unchanged, you need to either pass in a submodule without those layers, or back up their states.
data_loader (iterator) – an iterator. Produce data as inputs to the model.
num_iters (int) – number of iterations to compute the stats.
progress – None or “tqdm”. If set, use tqdm to report the progress.
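A hedged sketch of typical precise-BN usage; the model and the stand-in data loader below are illustrative:
import itertools
import torch
import torch.nn as nn
from fvcore.nn import get_bn_modules, update_bn_stats

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model.train()  # BN layers must be in training mode for their stats to be recomputed

# Stand-in data loader producing batches the model can consume directly.
data_loader = (torch.randn(4, 3, 16, 16) for _ in itertools.count())

if len(get_bn_modules(model)) > 0:
    update_bn_stats(model, data_loader, num_iters=100)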
-
fvcore.nn.
flop_count_str
(flops: fvcore.nn.flop_count.FlopCountAnalysis, activations: Optional[fvcore.nn.activation_count.ActivationCountAnalysis] = None) → str[source]¶ Calculates the parameters and flops of the model with the given inputs and returns a string representation of the model that includes the parameters and flops of every submodule. The string is structured to be similar to that given by str(model), though it is not guaranteed to be identical in form if the default string representation of a module has been overridden. If a module has zero parameters and flops, statistics will not be reported for succinctness.
The trace can only register the scope of a module if it is called directly, which means flops (and activations) arising from explicit calls to .forward() or to other python functions of the module will not be attributed to that module. Modules that are never called will have ‘N/A’ listed for their flops; this means they are either unused or their statistics are missing for this reason. Any such flops are still counted towards the parent module.
Example:
>>> import torch
>>> import torch.nn as nn
>>> class InnerNet(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(10,10)
...         self.fc2 = nn.Linear(10,10)
...     def forward(self, x):
...         return self.fc1(self.fc2(x))
>>> class TestNet(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(10,10)
...         self.fc2 = nn.Linear(10,10)
...         self.inner = InnerNet()
...     def forward(self, x):
...         return self.fc1(self.fc2(self.inner(x)))
>>> model = TestNet()
>>> inputs = torch.randn((1,10))
>>> print(flop_count_str(FlopCountAnalysis(model, inputs)))
TestNet(
  #params: 0.44K, #flops: 0.4K
  (fc1): Linear(
    in_features=10, out_features=10, bias=True
    #params: 0.11K, #flops: 100
  )
  (fc2): Linear(
    in_features=10, out_features=10, bias=True
    #params: 0.11K, #flops: 100
  )
  (inner): InnerNet(
    #params: 0.22K, #flops: 0.2K
    (fc1): Linear(
      in_features=10, out_features=10, bias=True
      #params: 0.11K, #flops: 100
    )
    (fc2): Linear(
      in_features=10, out_features=10, bias=True
      #params: 0.11K, #flops: 100
    )
  )
)
- Parameters
flops (FlopCountAnalysis) – the flop counting object
activations (ActivationCountAnalysis or None) – If given, the activations of each layer will also be calculated and included in the representation.
- Returns
str – a string representation of the model with the number of parameters and flops included.
-
fvcore.nn.
flop_count_table
(flops: fvcore.nn.flop_count.FlopCountAnalysis, max_depth: int = 3, activations: Optional[fvcore.nn.activation_count.ActivationCountAnalysis] = None, show_param_shapes: bool = True) → str[source]¶ Format the per-module parameters and flops of a model in a table. It looks like this:
| model                             | #parameters or shape  | #flops    |
|:----------------------------------|:----------------------|:----------|
| model                             | 34.6M                 | 65.7G     |
| s1                                | 15.4K                 | 4.32G     |
| s1.pathway0_stem                  | 9.54K                 | 1.23G     |
| s1.pathway0_stem.conv             | 9.41K                 | 1.23G     |
| s1.pathway0_stem.bn               | 0.128K                |           |
| s1.pathway1_stem                  | 5.9K                  | 3.08G     |
| s1.pathway1_stem.conv             | 5.88K                 | 3.08G     |
| s1.pathway1_stem.bn               | 16                    |           |
| s1_fuse                           | 0.928K                | 29.4M     |
| s1_fuse.conv_f2s                  | 0.896K                | 29.4M     |
| s1_fuse.conv_f2s.weight           | (16, 8, 7, 1, 1)      |           |
| s1_fuse.bn                        | 32                    |           |
| s1_fuse.bn.weight                 | (16,)                 |           |
| s1_fuse.bn.bias                   | (16,)                 |           |
| s2                                | 0.226M                | 7.73G     |
| s2.pathway0_res0                  | 80.1K                 | 2.58G     |
| s2.pathway0_res0.branch1          | 20.5K                 | 0.671G    |
| s2.pathway0_res0.branch1_bn       | 0.512K                |           |
| s2.pathway0_res0.branch2          | 59.1K                 | 1.91G     |
| s2.pathway0_res1.branch2          | 70.4K                 | 2.28G     |
| s2.pathway0_res1.branch2.a        | 16.4K                 | 0.537G    |
| s2.pathway0_res1.branch2.a_bn     | 0.128K                |           |
| s2.pathway0_res1.branch2.b        | 36.9K                 | 1.21G     |
| s2.pathway0_res1.branch2.b_bn     | 0.128K                |           |
| s2.pathway0_res1.branch2.c        | 16.4K                 | 0.537G    |
| s2.pathway0_res1.branch2.c_bn     | 0.512K                |           |
| s2.pathway0_res2.branch2          | 70.4K                 | 2.28G     |
| s2.pathway0_res2.branch2.a        | 16.4K                 | 0.537G    |
| s2.pathway0_res2.branch2.a_bn     | 0.128K                |           |
| s2.pathway0_res2.branch2.b        | 36.9K                 | 1.21G     |
| s2.pathway0_res2.branch2.b_bn     | 0.128K                |           |
| s2.pathway0_res2.branch2.c        | 16.4K                 | 0.537G    |
| s2.pathway0_res2.branch2.c_bn     | 0.512K                |           |
| .............................     | ......                | ......    |
- Parameters
flops (FlopCountAnalysis) – the flop counting object
max_depth (int) – The max depth of submodules to include in the table. Defaults to 3.
activations (ActivationCountAnalysis or None) – If given, include activation counts as an additional column in the table.
show_param_shapes (bool) – If true, shapes for parameters will be included in the table. Defaults to True.
- Returns
str – The formatted table.
Examples:
print(flop_count_table(FlopCountAnalysis(model, inputs)))
-
fvcore.nn.
smooth_l1_loss
(input: torch.Tensor, target: torch.Tensor, beta: float, reduction: str = 'none') → torch.Tensor[source]¶ Smooth L1 loss defined in the Fast R-CNN paper as:
                | 0.5 * x ** 2 / beta   if abs(x) < beta
smoothl1(x) =   |
                | abs(x) - 0.5 * beta   otherwise,
where x = input - target.
Smooth L1 loss is related to Huber loss, which is defined as:
             | 0.5 * x ** 2                  if abs(x) < beta
huber(x) =   |
             | beta * (abs(x) - 0.5 * beta)  otherwise
Smooth L1 loss is equal to huber(x) / beta. This leads to the following differences:
As beta -> 0, Smooth L1 loss converges to L1 loss, while Huber loss converges to a constant 0 loss.
As beta -> +inf, Smooth L1 converges to a constant 0 loss, while Huber loss converges to L2 loss.
For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For Huber loss, the slope of the L1 segment is beta.
Smooth L1 loss can be seen as exactly L1 loss, but with the abs(x) < beta portion replaced with a quadratic function such that at abs(x) = beta, its slope is 1. The quadratic segment smooths the L1 loss near x = 0.
- Parameters
input (Tensor) – input tensor of any shape
target (Tensor) – target value tensor with the same shape as input
beta (float) – L1 to L2 change point. For beta values < 1e-5, L1 loss is computed.
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
The loss with the reduction option applied.
Note
PyTorch’s builtin “Smooth L1 loss” implementation does not actually implement Smooth L1 loss, nor does it implement Huber loss. It implements the special case of both in which they are equal (beta=1). See: https://pytorch.org/docs/stable/nn.html#torch.nn.SmoothL1Loss.
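A short usage sketch (the values below are illustrative):
import torch
from fvcore.nn import smooth_l1_loss

pred = torch.tensor([0.5, 2.0, -1.5])
target = torch.zeros(3)
# beta is the point where the loss changes from quadratic (L2-like) to linear (L1-like).
loss = smooth_l1_loss(pred, target, beta=1.0, reduction="mean")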
-
fvcore.nn.
c2_msra_fill
(module: torch.nn.Module) → None[source]¶ Initialize module.weight using the “MSRAFill” implemented in Caffe2. Also initializes module.bias to 0.
- Parameters
module (torch.nn.Module) – module to initialize.
-
fvcore.nn.
c2_xavier_fill
(module: torch.nn.Module) → None[source]¶ Initialize module.weight using the “XavierFill” implemented in Caffe2. Also initializes module.bias to 0.
- Parameters
module (torch.nn.Module) – module to initialize.
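A short usage sketch applying these initializers to freshly constructed layers (the layer shapes are illustrative):
import torch.nn as nn
from fvcore.nn import c2_msra_fill, c2_xavier_fill

conv = nn.Conv2d(3, 64, kernel_size=3)
fc = nn.Linear(64, 10)
c2_msra_fill(conv)    # MSRA ("He") initialization for the conv weight; bias set to 0
c2_xavier_fill(fc)    # Xavier initialization for the fc weight; bias set to 0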
fvcore.common¶
-
class
fvcore.common.checkpoint.
Checkpointer
(model: torch.nn.Module, save_dir: str = '', *, save_to_disk: bool = True, **checkpointables: Any)[source]¶ Bases:
object
A checkpointer that can save/load model as well as extra checkpointable objects.
-
__init__
(model: torch.nn.Module, save_dir: str = '', *, save_to_disk: bool = True, **checkpointables: Any) → None[source]¶ - Parameters
model (nn.Module) – model.
save_dir (str) – a directory to save and find checkpoints.
save_to_disk (bool) – if True, save checkpoint to disk, otherwise disable saving for this checkpointer.
checkpointables (object) – any checkpointable objects, i.e., objects that have the state_dict() and load_state_dict() methods. For example, it can be used like Checkpointer(model, “dir”, optimizer=optimizer).
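A hedged usage sketch; the directory name and file name below are illustrative:
import torch
import torch.nn as nn
from fvcore.common.checkpoint import Checkpointer

model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpointer = Checkpointer(model, "output_dir", optimizer=optimizer)
checkpointer.save("model_0001", iteration=1)             # writes output_dir/model_0001.pth
extra = checkpointer.load("output_dir/model_0001.pth")   # unprocessed extra data, e.g. {"iteration": 1}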
-
add_checkpointable
(key: str, checkpointable: Any) → None[source]¶ Add checkpointable object for this checkpointer to track.
- Parameters
key (str) – the key used to save the object
checkpointable – any object with state_dict() and load_state_dict() methods
-
load
(path: str, checkpointables: Optional[List[str]] = None) → Dict[str, Any][source]¶ Load from the given checkpoint.
- Parameters
path (str) – path or URL of the checkpoint. If empty, will not load anything.
checkpointables (list) – List of checkpointable names to load. If not specified (None), will load all possible checkpointables.
- Returns
dict – extra data loaded from the checkpoint that has not been processed. For example, those saved with save(**extra_data).
-
has_checkpoint
() → bool[source]¶ - Returns
bool – whether a checkpoint exists in the target directory.
-
get_all_checkpoint_files
() → List[str][source]¶ - Returns
list –
- All available checkpoint files (.pth files) in target
directory.
-
-
class
fvcore.common.checkpoint.
PeriodicCheckpointer
(checkpointer: fvcore.common.checkpoint.Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model')[source]¶ Bases:
object
Save checkpoints periodically. When .step(iteration) is called, it will execute checkpointer.save on the given checkpointer, if iteration is a multiple of period or if max_iter is reached.
-
checkpointer
¶ the underlying checkpointer object
- Type
Checkpointer
-
__init__
(checkpointer: fvcore.common.checkpoint.Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model') → None[source]¶ - Parameters
checkpointer – the checkpointer object used to save checkpoints.
period (int) – the period to save checkpoint.
max_iter (int) – maximum number of iterations. When it is reached, a checkpoint named “{file_prefix}_final” will be saved.
max_to_keep (int) – maximum number of the most recent checkpoints to keep; older checkpoints will be deleted
file_prefix (str) – the prefix of checkpoint’s filename
-
step
(iteration: int, **kwargs: Any) → None[source]¶ Perform the appropriate action at the given iteration.
- Parameters
iteration (int) – the current iteration, ranged in [0, max_iter-1].
kwargs (Any) – extra data to save, same as in
Checkpointer.save()
.
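A sketch of the intended training-loop usage; the checkpointer is assumed to be constructed as in the Checkpointer example above, and the iteration counts are illustrative:
from fvcore.common.checkpoint import PeriodicCheckpointer

periodic_checkpointer = PeriodicCheckpointer(checkpointer, period=1000, max_iter=90000)
for iteration in range(90000):
    # ... run one training iteration ...
    periodic_checkpointer.step(iteration)  # saves every 1000 iterations and at max_iter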
-
save
(name: str, **kwargs: Any) → None[source]¶ Same arguments as Checkpointer.save(). Use this method to manually save checkpoints outside the schedule.
- Parameters
name (str) – file name.
kwargs (Any) – extra data to save, same as in
Checkpointer.save()
.
-
-
class
fvcore.common.config.
CfgNode
(init_dict=None, key_list=None, new_allowed=False)[source]¶ Bases:
yacs.config.CfgNode
Our own extended version of yacs.config.CfgNode. It contains the following extra features:
The merge_from_file() method supports the “_BASE_” key, which allows the new CfgNode to inherit all the attributes from the base configuration file(s).
Keys that start with “COMPUTED_” are treated as insertion-only “computed” attributes. They can be inserted regardless of whether the CfgNode is frozen or not.
With “allow_unsafe=True”, it supports pyyaml tags that evaluate expressions in config. See examples in https://pyyaml.org/wiki/PyYAMLDocumentation#yaml-tags-and-python-types Note that this may lead to arbitrary code execution: you must not load a config file from untrusted sources before manually inspecting the content of the file.
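A sketch of the _BASE_ mechanism; the file names and config keys below are illustrative:
import os, tempfile
from fvcore.common.config import CfgNode

tmpdir = tempfile.mkdtemp()
# A hypothetical base config and a child config that inherits from it via "_BASE_".
with open(os.path.join(tmpdir, "base.yaml"), "w") as f:
    f.write("MODEL:\n  DEPTH: 50\n  NORM: BN\n")
with open(os.path.join(tmpdir, "child.yaml"), "w") as f:
    f.write("_BASE_: base.yaml\nMODEL:\n  DEPTH: 101\n")

loaded = CfgNode.load_yaml_with_base(os.path.join(tmpdir, "child.yaml"))
cfg = CfgNode(loaded)
print(cfg.MODEL.DEPTH, cfg.MODEL.NORM)  # 101 BN -- DEPTH overridden, NORM inherited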
-
classmethod
load_yaml_with_base
(filename: str, allow_unsafe: bool = False) → Dict[str, Any][source]¶ Just like yaml.load(open(filename)), but inherits attributes from its _BASE_.
-
merge_from_file
(cfg_filename: str, allow_unsafe: bool = False) → None[source]¶ Merge configs from a given yaml file.
- Parameters
cfg_filename – the file name of the yaml config.
allow_unsafe – whether to allow loading the config file with yaml.unsafe_load.
-
merge_from_other_cfg
(cfg_other: fvcore.common.config.CfgNode) → Callable[[], None][source]¶ - Parameters
cfg_other (CfgNode) – configs to merge from.
-
class
fvcore.common.history_buffer.
HistoryBuffer
(max_length: int = 1000000)[source]¶ Bases:
object
Track a series of scalar values and provide access to smoothed values over a window or the global average of the series.
-
__init__
(max_length: int = 1000000) → None[source]¶ - Parameters
max_length – maximum number of values that can be stored in the buffer. When the capacity of the buffer is exhausted, old values will be removed.
-
update
(value: float, iteration: Optional[float] = None) → None[source]¶ Add a new scalar value produced at a certain iteration. If the length of the buffer exceeds self._max_length, the oldest element will be removed from the buffer.
-
median
(window_size: int) → float[source]¶ Return the median of the latest window_size values in the buffer.
-
avg
(window_size: int) → float[source]¶ Return the mean of the latest window_size values in the buffer.
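A short usage sketch (the values below are illustrative):
from fvcore.common.history_buffer import HistoryBuffer

buf = HistoryBuffer(max_length=1000)
for it, loss in enumerate([0.9, 0.7, 0.6, 0.55]):
    buf.update(loss, iteration=it)
print(buf.avg(window_size=2))     # mean of the latest 2 values
print(buf.median(window_size=2))  # median of the latest 2 values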
-
-
class
fvcore.common.param_scheduler.
ParamScheduler
[source]¶ Bases:
object
Base class for parameter schedulers. A parameter scheduler defines a mapping from a progress value in [0, 1) to a number (e.g. learning rate).
-
WHERE_EPSILON
= 1e-06¶
-
__call__
(where: float) → float[source]¶ Get the value of the param for a given point at training.
We update params (such as learning rate) based on the percent progress of training completed. This allows a scheduler to be agnostic to the exact length of a particular run (e.g. 120 epochs vs 90 epochs), as long as the relative progress where params should be updated is the same. However, it assumes that the total length of training is known.
- Parameters
where – A float in [0,1) that represents how far training has progressed
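A sketch of how a scheduler is typically queried from a training loop, using CosineParamScheduler as an example (the loop length and values are illustrative):
from fvcore.common.param_scheduler import CosineParamScheduler

scheduler = CosineParamScheduler(start_value=0.1, end_value=0.0)
max_iter = 100
for it in range(max_iter):
    where = it / max_iter      # fraction of training completed, in [0, 1)
    lr = scheduler(where)      # param value (e.g. learning rate) at this point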
-
-
class
fvcore.common.param_scheduler.
ConstantParamScheduler
(value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Returns a constant value for a param.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
CosineParamScheduler
(start_value: float, end_value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Cosine decay or cosine warmup schedules based on start and end values. The schedule is updated based on the fraction of training progress. The schedule was proposed in ‘SGDR: Stochastic Gradient Descent with Warm Restarts’ (https://arxiv.org/abs/1608.03983). Note that this class only implements the cosine annealing part of SGDR, and not the restarts.
Example
CosineParamScheduler(start_value=0.1, end_value=0.0001)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
ExponentialParamScheduler
(start_value: float, decay: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Exponential schedule parameterized by a start value and decay. The schedule is updated based on the fraction of training progress, where, with the formula param_t = start_value * (decay ** where).
Example
ExponentialParamScheduler(start_value=2.0, decay=0.02)
Corresponds to a decreasing schedule with values in [2.0, 0.04).
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
LinearParamScheduler
(start_value: float, end_value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Linearly interpolates the parameter between start_value and end_value. Can be used for either warmup or decay based on start and end values. The schedule is updated after every train step by default.
Example
LinearParamScheduler(start_value=0.0001, end_value=0.01)
Corresponds to a linearly increasing schedule with values in [0.0001, 0.01)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
CompositeParamScheduler
(schedulers: Sequence[fvcore.common.param_scheduler.ParamScheduler], lengths: List[float], interval_scaling: Sequence[str])[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Composite parameter scheduler composed of intermediate schedulers. Takes a list of schedulers and a list of lengths corresponding to percentage of training each scheduler should run for. Schedulers are run in order. All values in lengths should sum to 1.0.
Each scheduler also has a corresponding interval scale. If interval scale is ‘fixed’, the intermediate scheduler will be run without any rescaling of the time. If interval scale is ‘rescaled’, intermediate scheduler is run such that each scheduler will start and end at the same values as it would if it were the only scheduler. Default is ‘rescaled’ for all schedulers.
Example
schedulers = [
    ConstantParamScheduler(value=0.42),
    CosineParamScheduler(start_value=0.42, end_value=1e-4),
]
CompositeParamScheduler(
    schedulers=schedulers,
    interval_scaling=['rescaled', 'rescaled'],
    lengths=[0.3, 0.7],
)
The parameter value will be 0.42 for the first [0%, 30%) of steps, and then will cosine decay from 0.42 to 0.0001 for [30%, 100%) of training.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
MultiStepParamScheduler
(values: List[float], num_updates: Optional[int] = None, milestones: Optional[List[int]] = None)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Takes a predefined schedule for a param value, and a list of epochs or steps which stand for the upper boundary (excluded) of each range.
Example
MultiStepParamScheduler(
    values=[0.1, 0.01, 0.001, 0.0001],
    milestones=[30, 60, 80, 120]
)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epochs 60-79, and 0.0001 for epochs 80-119. Note that the length of values must equal the length of milestones plus one when num_updates is given separately; see __init__ below for the accepted combinations.
-
__init__
(values: List[float], num_updates: Optional[int] = None, milestones: Optional[List[int]] = None) → None[source]¶ - Parameters
values – param value in each range
num_updates – the end of the last range. If None, will use milestones[-1].
milestones – the boundary of each range. If None, will evenly split num_updates.
For example, all the following combinations define the same scheduler:
num_updates=90, milestones=[30, 60], values=[1, 0.1, 0.01]
num_updates=90, values=[1, 0.1, 0.01]
milestones=[30, 60, 90], values=[1, 0.1, 0.01]
milestones=[3, 6, 9], values=[1, 0.1, 0.01] (ParamScheduler is scale-invariant)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
StepParamScheduler
(num_updates: Union[int, float], values: List[float])[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Takes a fixed schedule for a param value. If the length of the fixed schedule is less than the number of epochs, then the epochs are divided evenly among the param schedule. The schedule is updated after every train epoch by default.
Example
StepParamScheduler(values=[0.1, 0.01, 0.001, 0.0001], num_updates=120)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epoch 60-89, 0.0001 for epochs 90-119.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
StepWithFixedGammaParamScheduler
(base_value: float, num_decays: int, gamma: float, num_updates: int)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Decays the param value by gamma at equally spaced steps so as to have the specified total number of decays.
Example
StepWithFixedGammaParamScheduler(
    base_value=0.1, gamma=0.1, num_decays=3, num_updates=120
)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epoch 60-89, 0.0001 for epochs 90-119.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
PolynomialDecayParamScheduler
(base_value: float, power: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Decays the param value after every epoch according to a polynomial function with a fixed power. The schedule is updated after every train step by default.
Example
PolynomialDecayParamScheduler(base_value=0.1, power=0.9)
Then the param value will be 0.1 for epoch 0, 0.099 for epoch 1, and so on.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.registry.
Registry
(*args, **kwds)[source]¶ Bases:
collections.abc.Iterable
,typing.Generic
The registry that provides name -> object mapping, to support third-party users’ custom modules.
To create a registry (e.g. a backbone registry):
BACKBONE_REGISTRY = Registry('BACKBONE')
To register an object:
@BACKBONE_REGISTRY.register()
class MyBackbone():
    ...
Or:
BACKBONE_REGISTRY.register(MyBackbone)
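To build an object from its registered name later (a short sketch continuing the example above):
backbone_cls = BACKBONE_REGISTRY.get("MyBackbone")  # name -> registered class
backbone = backbone_cls()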