fvcore documentation¶
Detectron2 depends on utilities in fvcore. We include part of fvcore documentation here for easier reference.
fvcore.nn¶
-
fvcore.nn.
activation_count
(model: torch.nn.Module, inputs: Tuple[Any, …], supported_ops: Optional[Dict[str, Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]] = None) → Tuple[DefaultDict[str, float], Counter[str]][source]¶ Given a model and an input to the model, compute the total number of activations of the model.
- Parameters
model (nn.Module) – The model to compute activation counts.
inputs (tuple) – Inputs that are passed to model to count activations. Inputs need to be in a tuple.
supported_ops (dict(str,Callable) or None) – provide additional handlers for extra ops, or overwrite the existing handlers for convolution and matmul. The key is the operator name and the value is a function that takes the (inputs, outputs) of the op.
- Returns
tuple[defaultdict, Counter] – A dictionary that records the number of activations (in millions) for each operation, and a Counter that records the number of unsupported operations.
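A minimal usage sketch (the toy model and input shape below are illustrative, not from the original docs):
import torch
import torch.nn as nn
from fvcore.nn import activation_count

# Toy model and input; the shapes are arbitrary, for illustration only.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
inputs = (torch.randn(1, 3, 32, 32),)

# acts maps operator name -> activations in millions ("mega" activations);
# unsupported counts occurrences of ops that have no handler.
acts, unsupported = activation_count(model, inputs)
print(dict(acts), dict(unsupported))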
-
class
fvcore.nn.
ActivationCountAnalysis
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]])[source]¶ Bases:
fvcore.nn.jit_analysis.JitModelAnalysis
Provides access to per-submodule model activation count obtained by tracing a model with pytorch’s jit tracing functionality. By default, comes with standard activation counters for convolutional and dot-product operators.
Handles for additional operators may be added, or the default ones overwritten, using the .set_op_handle(name, func) method. See the method documentation for details.
Activation counts can be obtained as:
.total(module_name="") : total activation count for a module
.by_operator(module_name="") : activation counts for the module, as a Counter over different operator types
.by_module() : Counter of activation counts for all submodules
.by_module_and_operator() : dictionary of Counters over different operator types, indexed by submodule name
An operator is treated as within a module if it is executed inside the module’s __call__ method. Note that this does not include calls to other methods of the module or explicit calls to module.forward(...).
Example usage:
>>> import torch.nn as nn
>>> import torch
>>> class TestModel(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc = nn.Linear(in_features=1000, out_features=10)
...         self.conv = nn.Conv2d(
...             in_channels=3, out_channels=10, kernel_size=1
...         )
...         self.act = nn.ReLU()
...     def forward(self, x):
...         return self.fc(self.act(self.conv(x)).flatten(1))
>>> model = TestModel()
>>> inputs = (torch.randn((1,3,10,10)),)
>>> acts = ActivationCountAnalysis(model, inputs)
>>> acts.total()
1010
>>> acts.total("fc")
10
>>> acts.by_operator()
Counter({"conv" : 1000, "addmm" : 10})
>>> acts.by_module()
Counter({"" : 1010, "fc" : 10, "conv" : 1000, "act" : 0})
>>> acts.by_module_and_operator()
{"" : Counter({"conv" : 1000, "addmm" : 10}),
 "fc" : Counter({"addmm" : 10}),
 "conv" : Counter({"conv" : 1000}),
 "act" : Counter()}
-
__init__
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]]) → None[source]¶ - Parameters
model – The model to analyze
inputs – The inputs to the model for analysis.
We will trace the execution of model.forward(inputs). This means inputs have to be tensors or a tuple of tensors (see https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace). In order to trace other methods or unsupported input types, you may need to implement a wrapper module.
-
ancestor_mode
(mode: str) → T¶ Sets how to determine the ancestor modules of an operator. Must be one of “owner” or “caller”.
“caller”: an operator belongs to all modules that are currently executing forward() at the time the operator is called.
“owner”: an operator belongs to the last module that is executing forward() at the time the operator is called, plus this module’s recursive parents. If a module has multiple parents (e.g. a shared module), only one will be picked.
In most cases a module only calls submodules it owns, so both options behave identically. In certain edge cases this option affects the hierarchy of the results, but never the total count.
-
by_module
() → Counter[str]¶ Returns the statistics for all submodules, aggregated over all operators.
- Returns
Counter(str) – statistics counter grouped by submodule names
-
by_module_and_operator
() → Dict[str, Counter[str]]¶ Returns the statistics for all submodules, separated out by operator type for each submodule. The operator handle determines the name associated with each operator type.
- Returns
dict(str, Counter(str)) – The statistics for each submodule and each operator. Grouped by submodule names, then by operator name.
-
by_operator
(module_name: str = '') → Counter[str]¶ Returns the statistics for a requested module, grouped by operator type. The operator handle determines the name associated with each operator type.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
Counter(str) – The statistics for each operator.
-
canonical_module_name
(name: str) → str¶ Returns the canonical module name of the given name, which might be different from the given name if the module is shared. This is the name that will be used as a key when statistics are output using .by_module() and .by_module_and_operator().
- Parameters
name (str) – The name of the module to find the canonical name for.
- Returns
str – The canonical name of the module.
-
clear_op_handles
() → fvcore.nn.jit_analysis.JitModelAnalysis¶ Clears all operator handles currently set.
-
copy
(new_model: Optional[torch.nn.Module] = None, new_inputs: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Returns a copy of the
JitModelAnalysis
object, keeping all settings, but on a new model or new inputs.
-
set_op_handle
(*args, **kwargs: Optional[Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Sets additional operator handles, or replaces existing ones.
- Parameters
args – (str, Handle) pairs of operator names and handles.
kwargs – mapping from operator names to handles.
If a handle is None, the op will be explicitly ignored. Otherwise, the handle should be a function that calculates the desired statistic from an operator. The function must take two arguments, which are the inputs and outputs of the operator, in the form of list(torch._C.Value). The function should return a counter object with per-operator statistics.
Examples
handlers = {"aten::linear": my_handler} counter.set_op_handle("aten::matmul", None, "aten::bmm", my_handler2) .set_op_handle(**handlers)
-
total
(module_name: str = '') → int¶ Returns the total aggregated statistic across all operators for the requested module.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
int – The aggregated statistic.
-
tracer_warnings
(mode: str) → T¶ Sets which warnings to print when tracing the graph to calculate statistics. There are three modes. Defaults to ‘no_tracer_warning’. Allowed values are:
‘all’ : keeps all warnings raised while tracing
‘no_tracer_warning’ : suppress torch.jit.TracerWarning only
‘none’ : suppress all warnings raised while tracing
- Parameters
mode (str) – warning mode in one of the above values.
-
uncalled_modules
() → Set[str]¶ Returns a set of submodules that were never called during the trace of the graph. This may be because they were unused, or because they were accessed via direct calls to .forward() or other Python methods. In the latter case, statistics will not be attributed to the submodule, though the statistics will be included in the parent module.
- Returns
set(str) – The set of submodule names that were never called during the trace of the model.
-
uncalled_modules_warnings
(enabled: bool) → T¶ Sets if warnings from uncalled submodules are shown. Defaults to True. A submodule is considered “uncalled” if it is never called during tracing. This may be because it is actually unused, or because it is accessed via calls to .forward() or other methods of the module. The set of uncalled modules may be obtained from uncalled_modules() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show warnings.
-
unsupported_ops
(module_name: str = '') → Counter[str]¶ Lists the number of operators that were encountered but unsupported because no operator handle is available for them. Does not include operators that are explicitly ignored.
- Parameters
module_name (str) – The submodule to list unsupported ops. Defaults to the entire model.
- Returns
Counter(str) – The number of occurrences of each unsupported operator.
-
unsupported_ops_warnings
(enabled: bool) → T¶ Sets if warnings for unsupported operators are shown. Defaults to True. Counts of unsupported operators may be obtained from unsupported_ops() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show unsupported operator warnings.
-
fvcore.nn.
flop_count
(model: torch.nn.Module, inputs: Tuple[Any, …], supported_ops: Optional[Dict[str, Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]] = None) → Tuple[DefaultDict[str, float], Counter[str]][source]¶ Given a model and an input to the model, compute the per-operator Gflops of the given model.
- Parameters
model (nn.Module) – The model to compute flop counts.
inputs (tuple) – Inputs that are passed to model to count flops. Inputs need to be in a tuple.
supported_ops (dict(str,Callable) or None) – provide additional handlers for extra ops, or overwrite the existing handlers for convolution, matmul, and einsum. The key is the operator name and the value is a function that takes the (inputs, outputs) of the op. We count one Multiply-Add as one FLOP.
- Returns
tuple[defaultdict, Counter] – A dictionary that records the number of gflops for each operation and a Counter that records the number of unsupported operations.
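A minimal usage sketch (the model and input below are illustrative):
import torch
import torch.nn as nn
from fvcore.nn import flop_count

model = nn.Linear(in_features=1000, out_features=10)
inputs = (torch.randn(1, 1000),)

# gflops maps operator name -> Gflops; unsupported counts ops without handlers.
gflops, unsupported = flop_count(model, inputs)
print(dict(gflops), dict(unsupported))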
-
class
fvcore.nn.
FlopCountAnalysis
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]])[source]¶ Bases:
fvcore.nn.jit_analysis.JitModelAnalysis
Provides access to per-submodule model flop count obtained by tracing a model with pytorch’s jit tracing functionality. By default, comes with standard flop counters for a few common operators. Note that:
Flop is not a well-defined concept. We just produce our best estimate.
We count one fused multiply-add as one flop.
Handles for additional operators may be added, or the default ones overwritten, using the .set_op_handle(name, func) method. See the method documentation for details.
Flop counts can be obtained as:
.total(module_name="") : total flop count for a module
.by_operator(module_name="") : flop counts for the module, as a Counter over different operator types
.by_module() : Counter of flop counts for all submodules
.by_module_and_operator() : dictionary of Counters over different operator types, indexed by submodule name
An operator is treated as within a module if it is executed inside the module’s __call__ method. Note that this does not include calls to other methods of the module or explicit calls to module.forward(...).
Example usage:
>>> import torch.nn as nn
>>> import torch
>>> class TestModel(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc = nn.Linear(in_features=1000, out_features=10)
...         self.conv = nn.Conv2d(
...             in_channels=3, out_channels=10, kernel_size=1
...         )
...         self.act = nn.ReLU()
...     def forward(self, x):
...         return self.fc(self.act(self.conv(x)).flatten(1))
>>> model = TestModel()
>>> inputs = (torch.randn((1,3,10,10)),)
>>> flops = FlopCountAnalysis(model, inputs)
>>> flops.total()
13000
>>> flops.total("fc")
10000
>>> flops.by_operator()
Counter({"addmm" : 10000, "conv" : 3000})
>>> flops.by_module()
Counter({"" : 13000, "fc" : 10000, "conv" : 3000, "act" : 0})
>>> flops.by_module_and_operator()
{"" : Counter({"addmm" : 10000, "conv" : 3000}),
 "fc" : Counter({"addmm" : 10000}),
 "conv" : Counter({"conv" : 3000}),
 "act" : Counter()}
-
__init__
(model: torch.nn.Module, inputs: Union[torch.Tensor, Tuple[torch.Tensor, …]]) → None[source]¶ - Parameters
model – The model to analyze
inputs – The inputs to the model for analysis.
We will trace the execution of model.forward(inputs). This means inputs have to be tensors or a tuple of tensors (see https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace). In order to trace other methods or unsupported input types, you may need to implement a wrapper module.
-
ancestor_mode
(mode: str) → T¶ Sets how to determine the ancestor modules of an operator. Must be one of “owner” or “caller”.
“caller”: an operator belongs to all modules that are currently executing forward() at the time the operator is called.
“owner”: an operator belongs to the last module that is executing forward() at the time the operator is called, plus this module’s recursive parents. If a module has multiple parents (e.g. a shared module), only one will be picked.
In most cases a module only calls submodules it owns, so both options behave identically. In certain edge cases this option affects the hierarchy of the results, but never the total count.
-
by_module
() → Counter[str]¶ Returns the statistics for all submodules, aggregated over all operators.
- Returns
Counter(str) – statistics counter grouped by submodule names
-
by_module_and_operator
() → Dict[str, Counter[str]]¶ Returns the statistics for all submodules, separated out by operator type for each submodule. The operator handle determines the name associated with each operator type.
- Returns
dict(str, Counter(str)) – The statistics for each submodule and each operator. Grouped by submodule names, then by operator name.
-
by_operator
(module_name: str = '') → Counter[str]¶ Returns the statistics for a requested module, grouped by operator type. The operator handle determines the name associated with each operator type.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
Counter(str) – The statistics for each operator.
-
canonical_module_name
(name: str) → str¶ Returns the canonical module name of the given name, which might be different from the given name if the module is shared. This is the name that will be used as a key when statistics are output using .by_module() and .by_module_and_operator().
- Parameters
name (str) – The name of the module to find the canonical name for.
- Returns
str – The canonical name of the module.
-
clear_op_handles
() → fvcore.nn.jit_analysis.JitModelAnalysis¶ Clears all operator handles currently set.
-
copy
(new_model: Optional[torch.nn.Module] = None, new_inputs: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Returns a copy of the
JitModelAnalysis
object, keeping all settings, but on a new model or new inputs.
-
set_op_handle
(*args, **kwargs: Optional[Callable[[List[Any], List[Any]], Union[Counter[str], numbers.Number]]]) → fvcore.nn.jit_analysis.JitModelAnalysis¶ Sets additional operator handles, or replaces existing ones.
- Parameters
args – (str, Handle) pairs of operator names and handles.
kwargs – mapping from operator names to handles.
If a handle is None, the op will be explicitly ignored. Otherwise, the handle should be a function that calculates the desired statistic from an operator. The function must take two arguments, which are the inputs and outputs of the operator, in the form of list(torch._C.Value). The function should return a counter object with per-operator statistics.
Examples
handlers = {"aten::linear": my_handler} counter.set_op_handle("aten::matmul", None, "aten::bmm", my_handler2) .set_op_handle(**handlers)
-
total
(module_name: str = '') → int¶ Returns the total aggregated statistic across all operators for the requested module.
- Parameters
module_name (str) – The submodule to get data for. Defaults to the entire model.
- Returns
int – The aggregated statistic.
-
tracer_warnings
(mode: str) → T¶ Sets which warnings to print when tracing the graph to calculate statistics. There are three modes. Defaults to ‘no_tracer_warning’. Allowed values are:
‘all’ : keeps all warnings raised while tracing
‘no_tracer_warning’ : suppress torch.jit.TracerWarning only
‘none’ : suppress all warnings raised while tracing
- Parameters
mode (str) – warning mode in one of the above values.
-
uncalled_modules
() → Set[str]¶ Returns a set of submodules that were never called during the trace of the graph. This may be because they were unused, or because they were accessed via direct calls to .forward() or other Python methods. In the latter case, statistics will not be attributed to the submodule, though the statistics will be included in the parent module.
- Returns
set(str) – The set of submodule names that were never called during the trace of the model.
-
uncalled_modules_warnings
(enabled: bool) → T¶ Sets if warnings from uncalled submodules are shown. Defaults to True. A submodule is considered “uncalled” if it is never called during tracing. This may be because it is actually unused, or because it is accessed via calls to .forward() or other methods of the module. The set of uncalled modules may be obtained from uncalled_modules() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show warnings.
-
unsupported_ops
(module_name: str = '') → Counter[str]¶ Lists the number of operators that were encountered but unsupported because no operator handle is available for them. Does not include operators that are explicitly ignored.
- Parameters
module_name (str) – The submodule to list unsupported ops. Defaults to the entire model.
- Returns
Counter(str) – The number of occurrences of each unsupported operator.
-
unsupported_ops_warnings
(enabled: bool) → T¶ Sets if warnings for unsupported operators are shown. Defaults to True. Counts of unsupported operators may be obtained from unsupported_ops() regardless of this setting.
- Parameters
enabled (bool) – Set to ‘True’ to show unsupported operator warnings.
-
fvcore.nn.
sigmoid_focal_loss
(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = - 1, gamma: float = 2, reduction: str = 'none') → torch.Tensor[source]¶ Loss used in RetinaNet for dense detection: https://arxiv.org/abs/1708.02002.
- Parameters
inputs – A float tensor of arbitrary shape. The predictions for each example.
targets – A float tensor with the same shape as inputs. Stores the binary classification label for each element in inputs (0 for the negative class and 1 for the positive class).
alpha – (optional) Weighting factor in range (0,1) to balance positive vs negative examples. Default = -1 (no weighting).
gamma – Exponent of the modulating factor (1 - p_t) to balance easy vs hard examples.
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
Loss tensor with the reduction option applied.
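A short usage sketch with random logits and binary targets (the values and alpha/gamma choices below are illustrative):
import torch
from fvcore.nn import sigmoid_focal_loss

logits = torch.randn(8, 4)                     # raw predictions (before sigmoid)
targets = torch.randint(0, 2, (8, 4)).float()  # binary labels, same shape as logits
loss = sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2, reduction="mean")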
-
fvcore.nn.
sigmoid_focal_loss_star
(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = - 1, gamma: float = 1, reduction: str = 'none') → torch.Tensor[source]¶ FL* described in RetinaNet paper Appendix: https://arxiv.org/abs/1708.02002.
- Parameters
inputs – A float tensor of arbitrary shape. The predictions for each example.
targets – A float tensor with the same shape as inputs. Stores the binary classification label for each element in inputs (0 for the negative class and 1 for the positive class).
alpha – (optional) Weighting factor in range (0,1) to balance positive vs negative examples. Default = -1 (no weighting).
gamma – Gamma parameter described in FL*. Default = 1 (no weighting).
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
Loss tensor with the reduction option applied.
-
fvcore.nn.
giou_loss
(boxes1: torch.Tensor, boxes2: torch.Tensor, reduction: str = 'none', eps: float = 1e-07) → torch.Tensor[source]¶ Generalized Intersection over Union Loss (Hamid Rezatofighi et. al) https://arxiv.org/abs/1902.09630
Gradient-friendly IoU loss with an additional penalty that is non-zero when the boxes do not overlap and scales with the size of their smallest enclosing box. This loss is symmetric, so the boxes1 and boxes2 arguments are interchangeable.
- Parameters
boxes1 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
boxes2 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
eps (float) – small number to prevent division by zero
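A short usage sketch with two small sets of boxes in XYXY format (the coordinates below are illustrative):
import torch
from fvcore.nn import giou_loss

boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 15.0, 15.0]])
boxes2 = torch.tensor([[1.0, 1.0, 11.0, 11.0], [20.0, 20.0, 30.0, 30.0]])
# The second pair does not overlap, so the enclosing-box penalty applies there.
loss = giou_loss(boxes1, boxes2, reduction="mean")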
-
fvcore.nn.
parameter_count
(model: torch.nn.Module) → DefaultDict[str, int][source]¶ Count parameters of a model and its submodules.
- Parameters
model – a torch module
- Returns
dict (str-> int) – the key is either a parameter name or a module name. The value is the number of elements in the parameter, or in all parameters of the module. The key “” corresponds to the total number of parameters of the model.
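A short usage sketch (the toy model below is illustrative; submodule names in the output depend on the model):
import torch.nn as nn
from fvcore.nn import parameter_count

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))
counts = parameter_count(model)
print(counts[""])   # total number of parameters in the model
print(counts["0"])  # parameters of the first submodule: 10 * 20 + 20 = 220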
-
fvcore.nn.
parameter_count_table
(model: torch.nn.Module, max_depth: int = 3) → str[source]¶ Format the parameter count of the model (and its submodules or parameters) in a nice table. It looks like this:
| name                             | #elements or shape   |
|:---------------------------------|:---------------------|
| model                            | 37.9M                |
| backbone                         | 31.5M                |
| backbone.fpn_lateral3            | 0.1M                 |
| backbone.fpn_lateral3.weight     | (256, 512, 1, 1)     |
| backbone.fpn_lateral3.bias       | (256,)               |
| backbone.fpn_output3             | 0.6M                 |
| backbone.fpn_output3.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output3.bias        | (256,)               |
| backbone.fpn_lateral4            | 0.3M                 |
| backbone.fpn_lateral4.weight     | (256, 1024, 1, 1)    |
| backbone.fpn_lateral4.bias       | (256,)               |
| backbone.fpn_output4             | 0.6M                 |
| backbone.fpn_output4.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output4.bias        | (256,)               |
| backbone.fpn_lateral5            | 0.5M                 |
| backbone.fpn_lateral5.weight     | (256, 2048, 1, 1)    |
| backbone.fpn_lateral5.bias       | (256,)               |
| backbone.fpn_output5             | 0.6M                 |
| backbone.fpn_output5.weight      | (256, 256, 3, 3)     |
| backbone.fpn_output5.bias        | (256,)               |
| backbone.top_block               | 5.3M                 |
| backbone.top_block.p6            | 4.7M                 |
| backbone.top_block.p7            | 0.6M                 |
| backbone.bottom_up               | 23.5M                |
| backbone.bottom_up.stem          | 9.4K                 |
| backbone.bottom_up.res2          | 0.2M                 |
| backbone.bottom_up.res3          | 1.2M                 |
| backbone.bottom_up.res4          | 7.1M                 |
| backbone.bottom_up.res5          | 14.9M                |
| ......                           | .....                |
- Parameters
model – a torch module
max_depth (int) – maximum depth to recursively print submodules or parameters
- Returns
str – the table to be printed
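Typical usage is simply to print the returned string, e.g. (model here is any nn.Module, such as the one constructed above):
from fvcore.nn import parameter_count_table
print(parameter_count_table(model, max_depth=2))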
-
fvcore.nn.
get_bn_modules
(model: torch.nn.Module) → List[torch.nn.Module][source]¶ Find all BatchNorm (BN) modules that are in training mode. See fvcore.precise_bn.BN_MODULE_TYPES for a list of all modules that are included in this search.
- Parameters
model (nn.Module) – a model possibly containing BN modules.
- Returns
list[nn.Module] – all BN modules in the model.
-
fvcore.nn.
update_bn_stats
(model: torch.nn.Module, data_loader: Iterable[Any], num_iters: int = 200, progress: Optional[str] = None) → None[source]¶ Recompute and update the batch norm stats to make them more precise. During training both BN stats and the weight are changing after every iteration, so the running average can not precisely reflect the actual stats of the current model. In this function, the BN stats are recomputed with fixed weights, to make the running average more precise. Specifically, it computes the true average of per-batch mean/variance instead of the running average. See Sec. 3 of the paper “Rethinking Batch in BatchNorm” for details.
- Parameters
model (nn.Module) – the model whose BN stats will be recomputed.
Note that:
This function will not alter the training mode of the given model. Users are responsible for setting the layers that need precise-BN to training mode, prior to calling this function.
Be careful if your models contain other stateful layers in addition to BN, i.e. layers whose state can change in forward iterations. This function will alter their state. If you wish them unchanged, you need to either pass in a submodule without those layers, or back up their states.
data_loader (iterator) – an iterator. Produce data as inputs to the model.
num_iters (int) – number of iterations to compute the stats.
progress – None or “tqdm”. If set, use tqdm to report the progress.
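A hedged sketch of typical precise-BN usage; the model and the stand-in data loader below are illustrative:
import itertools
import torch
import torch.nn as nn
from fvcore.nn import get_bn_modules, update_bn_stats

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model.train()  # BN layers must be in training mode for their stats to be recomputed

# Stand-in data loader producing batches the model can consume directly.
data_loader = (torch.randn(4, 3, 16, 16) for _ in itertools.count())

if len(get_bn_modules(model)) > 0:
    update_bn_stats(model, data_loader, num_iters=100)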
-
fvcore.nn.
flop_count_str
(flops: fvcore.nn.flop_count.FlopCountAnalysis, activations: Optional[fvcore.nn.activation_count.ActivationCountAnalysis] = None) → str[source]¶ Calculates the parameters and flops of the model with the given inputs and returns a string representation of the model that includes the parameters and flops of every submodule. The string is structured to be similar to that given by str(model), though it is not guaranteed to be identical in form if the default string representation of a module has been overridden. If a module has zero parameters and flops, statistics will not be reported for succinctness.
The trace can only register the scope of a module if it is called directly, which means flops (and activations) arising from explicit calls to .forward() or to other python functions of the module will not be attributed to that module. Modules that are never called will have ‘N/A’ listed for their flops; this means they are either unused or their statistics are missing for this reason. Any such flops are still counted towards the parent module.
Example:
>>> import torch
>>> import torch.nn as nn
>>> class InnerNet(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(10,10)
...         self.fc2 = nn.Linear(10,10)
...     def forward(self, x):
...         return self.fc1(self.fc2(x))
>>> class TestNet(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.fc1 = nn.Linear(10,10)
...         self.fc2 = nn.Linear(10,10)
...         self.inner = InnerNet()
...     def forward(self, x):
...         return self.fc1(self.fc2(self.inner(x)))
>>> model = TestNet()
>>> inputs = torch.randn((1,10))
>>> print(flop_count_str(FlopCountAnalysis(model, inputs)))
TestNet(
  #params: 0.44K, #flops: 0.4K
  (fc1): Linear(
    in_features=10, out_features=10, bias=True
    #params: 0.11K, #flops: 100
  )
  (fc2): Linear(
    in_features=10, out_features=10, bias=True
    #params: 0.11K, #flops: 100
  )
  (inner): InnerNet(
    #params: 0.22K, #flops: 0.2K
    (fc1): Linear(
      in_features=10, out_features=10, bias=True
      #params: 0.11K, #flops: 100
    )
    (fc2): Linear(
      in_features=10, out_features=10, bias=True
      #params: 0.11K, #flops: 100
    )
  )
)
- Parameters
flops (FlopCountAnalysis) – the flop counting object
activations (ActivationCountAnalysis or None) – If given, the activations of each layer will also be calculated and included in the representation.
- Returns
str – a string representation of the model with the number of parameters and flops included.
-
fvcore.nn.
flop_count_table
(flops: fvcore.nn.flop_count.FlopCountAnalysis, max_depth: int = 3, activations: Optional[fvcore.nn.activation_count.ActivationCountAnalysis] = None, show_param_shapes: bool = True) → str[source]¶ Format the per-module parameters and flops of a model in a table. It looks like this:
| model                             | #parameters or shape  | #flops    |
|:----------------------------------|:----------------------|:----------|
| model                             | 34.6M                 | 65.7G     |
| s1                                | 15.4K                 | 4.32G     |
| s1.pathway0_stem                  | 9.54K                 | 1.23G     |
| s1.pathway0_stem.conv             | 9.41K                 | 1.23G     |
| s1.pathway0_stem.bn               | 0.128K                |           |
| s1.pathway1_stem                  | 5.9K                  | 3.08G     |
| s1.pathway1_stem.conv             | 5.88K                 | 3.08G     |
| s1.pathway1_stem.bn               | 16                    |           |
| s1_fuse                           | 0.928K                | 29.4M     |
| s1_fuse.conv_f2s                  | 0.896K                | 29.4M     |
| s1_fuse.conv_f2s.weight           | (16, 8, 7, 1, 1)      |           |
| s1_fuse.bn                        | 32                    |           |
| s1_fuse.bn.weight                 | (16,)                 |           |
| s1_fuse.bn.bias                   | (16,)                 |           |
| s2                                | 0.226M                | 7.73G     |
| s2.pathway0_res0                  | 80.1K                 | 2.58G     |
| s2.pathway0_res0.branch1          | 20.5K                 | 0.671G    |
| s2.pathway0_res0.branch1_bn       | 0.512K                |           |
| s2.pathway0_res0.branch2          | 59.1K                 | 1.91G     |
| s2.pathway0_res1.branch2          | 70.4K                 | 2.28G     |
| s2.pathway0_res1.branch2.a        | 16.4K                 | 0.537G    |
| s2.pathway0_res1.branch2.a_bn     | 0.128K                |           |
| s2.pathway0_res1.branch2.b        | 36.9K                 | 1.21G     |
| s2.pathway0_res1.branch2.b_bn     | 0.128K                |           |
| s2.pathway0_res1.branch2.c        | 16.4K                 | 0.537G    |
| s2.pathway0_res1.branch2.c_bn     | 0.512K                |           |
| s2.pathway0_res2.branch2          | 70.4K                 | 2.28G     |
| s2.pathway0_res2.branch2.a        | 16.4K                 | 0.537G    |
| s2.pathway0_res2.branch2.a_bn     | 0.128K                |           |
| s2.pathway0_res2.branch2.b        | 36.9K                 | 1.21G     |
| s2.pathway0_res2.branch2.b_bn     | 0.128K                |           |
| s2.pathway0_res2.branch2.c        | 16.4K                 | 0.537G    |
| s2.pathway0_res2.branch2.c_bn     | 0.512K                |           |
| .............................     | ......                | ......    |
- Parameters
flops (FlopCountAnalysis) – the flop counting object
max_depth (int) – The max depth of submodules to include in the table. Defaults to 3.
activations (ActivationCountAnalysis or None) – If given, include activation counts as an additional column in the table.
show_param_shapes (bool) – If true, shapes for parameters will be included in the table. Defaults to True.
- Returns
str – The formatted table.
Examples:
print(flop_count_table(FlopCountAnalysis(model, inputs)))
-
fvcore.nn.
smooth_l1_loss
(input: torch.Tensor, target: torch.Tensor, beta: float, reduction: str = 'none') → torch.Tensor[source]¶ Smooth L1 loss defined in the Fast R-CNN paper as:
                | 0.5 * x ** 2 / beta   if abs(x) < beta
smoothl1(x) =   |
                | abs(x) - 0.5 * beta   otherwise,
where x = input - target.
Smooth L1 loss is related to Huber loss, which is defined as:
             | 0.5 * x ** 2                  if abs(x) < beta
huber(x) =   |
             | beta * (abs(x) - 0.5 * beta)  otherwise
Smooth L1 loss is equal to huber(x) / beta. This leads to the following differences:
As beta -> 0, Smooth L1 loss converges to L1 loss, while Huber loss converges to a constant 0 loss.
As beta -> +inf, Smooth L1 converges to a constant 0 loss, while Huber loss converges to L2 loss.
For Smooth L1 loss, as beta varies, the L1 segment of the loss has a constant slope of 1. For Huber loss, the slope of the L1 segment is beta.
Smooth L1 loss can be seen as exactly L1 loss, but with the abs(x) < beta portion replaced with a quadratic function such that at abs(x) = beta, its slope is 1. The quadratic segment smooths the L1 loss near x = 0.
- Parameters
input (Tensor) – input tensor of any shape
target (Tensor) – target value tensor with the same shape as input
beta (float) – L1 to L2 change point. For beta values < 1e-5, L1 loss is computed.
reduction – ‘none’ | ‘mean’ | ‘sum’ ‘none’: No reduction will be applied to the output. ‘mean’: The output will be averaged. ‘sum’: The output will be summed.
- Returns
The loss with the reduction option applied.
Note
PyTorch’s builtin “Smooth L1 loss” implementation does not actually implement Smooth L1 loss, nor does it implement Huber loss. It implements the special case of both in which they are equal (beta=1). See: https://pytorch.org/docs/stable/nn.html#torch.nn.SmoothL1Loss.
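A short usage sketch (the values below are illustrative):
import torch
from fvcore.nn import smooth_l1_loss

pred = torch.tensor([0.5, 2.0, -1.5])
target = torch.zeros(3)
# beta is the point where the loss changes from quadratic (L2-like) to linear (L1-like).
loss = smooth_l1_loss(pred, target, beta=1.0, reduction="mean")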
-
fvcore.nn.
c2_msra_fill
(module: torch.nn.Module) → None[source]¶ Initialize module.weight using the “MSRAFill” implemented in Caffe2. Also initializes module.bias to 0.
- Parameters
module (torch.nn.Module) – module to initialize.
-
fvcore.nn.
c2_xavier_fill
(module: torch.nn.Module) → None[source]¶ Initialize module.weight using the “XavierFill” implemented in Caffe2. Also initializes module.bias to 0.
- Parameters
module (torch.nn.Module) – module to initialize.
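A short usage sketch applying these initializers to freshly constructed layers (the layer shapes are illustrative):
import torch.nn as nn
from fvcore.nn import c2_msra_fill, c2_xavier_fill

conv = nn.Conv2d(3, 64, kernel_size=3)
fc = nn.Linear(64, 10)
c2_msra_fill(conv)    # MSRA ("He") initialization for the conv weight; bias set to 0
c2_xavier_fill(fc)    # Xavier initialization for the fc weight; bias set to 0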
fvcore.common¶
-
class
fvcore.common.checkpoint.
Checkpointer
(model: torch.nn.Module, save_dir: str = '', *, save_to_disk: bool = True, **checkpointables: Any)[source]¶ Bases:
object
A checkpointer that can save/load model as well as extra checkpointable objects.
-
__init__
(model: torch.nn.Module, save_dir: str = '', *, save_to_disk: bool = True, **checkpointables: Any) → None[source]¶ - Parameters
model (nn.Module) – model.
save_dir (str) – a directory to save and find checkpoints.
save_to_disk (bool) – if True, save checkpoint to disk, otherwise disable saving for this checkpointer.
checkpointables (object) – any checkpointable objects, i.e., objects that have the state_dict() and load_state_dict() methods. For example, it can be used like Checkpointer(model, “dir”, optimizer=optimizer).
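A hedged usage sketch; the directory name and file name below are illustrative:
import torch
import torch.nn as nn
from fvcore.common.checkpoint import Checkpointer

model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpointer = Checkpointer(model, "output_dir", optimizer=optimizer)
checkpointer.save("model_0001", iteration=1)             # writes output_dir/model_0001.pth
extra = checkpointer.load("output_dir/model_0001.pth")   # unprocessed extra data, e.g. {"iteration": 1}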
-
add_checkpointable
(key: str, checkpointable: Any) → None[source]¶ Add checkpointable object for this checkpointer to track.
- Parameters
key (str) – the key used to save the object
checkpointable – any object with state_dict() and load_state_dict() methods
-
load
(path: str, checkpointables: Optional[List[str]] = None) → Dict[str, Any][source]¶ Load from the given checkpoint.
- Parameters
path (str) – path or URL of the checkpoint. If empty, will not load anything.
checkpointables (list) – List of checkpointable names to load. If not specified (None), will load all possible checkpointables.
- Returns
dict – extra data loaded from the checkpoint that has not been processed. For example, those saved with save(**extra_data).
-
has_checkpoint
() → bool[source]¶ - Returns
bool – whether a checkpoint exists in the target directory.
-
get_all_checkpoint_files
() → List[str][source]¶ - Returns
list –
- All available checkpoint files (.pth files) in target
directory.
-
-
class
fvcore.common.checkpoint.
PeriodicCheckpointer
(checkpointer: fvcore.common.checkpoint.Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model')[source]¶ Bases:
object
Save checkpoints periodically. When .step(iteration) is called, it will execute checkpointer.save on the given checkpointer, if iteration is a multiple of period or if max_iter is reached.
-
checkpointer
¶ the underlying checkpointer object
- Type
Checkpointer
-
__init__
(checkpointer: fvcore.common.checkpoint.Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model') → None[source]¶ - Parameters
checkpointer – the checkpointer object used to save checkpoints.
period (int) – the period to save checkpoint.
max_iter (int) – maximum number of iterations. When it is reached, a checkpoint named “{file_prefix}_final” will be saved.
max_to_keep (int) – maximum number of the most recent checkpoints to keep; older checkpoints will be deleted
file_prefix (str) – the prefix of checkpoint’s filename
-
step
(iteration: int, **kwargs: Any) → None[source]¶ Perform the appropriate action at the given iteration.
- Parameters
iteration (int) – the current iteration, ranged in [0, max_iter-1].
kwargs (Any) – extra data to save, same as in
Checkpointer.save()
.
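A sketch of the intended training-loop usage; the checkpointer is assumed to be constructed as in the Checkpointer example above, and the iteration counts are illustrative:
from fvcore.common.checkpoint import PeriodicCheckpointer

periodic_checkpointer = PeriodicCheckpointer(checkpointer, period=1000, max_iter=90000)
for iteration in range(90000):
    # ... run one training iteration ...
    periodic_checkpointer.step(iteration)  # saves every 1000 iterations and at max_iter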
-
save
(name: str, **kwargs: Any) → None[source]¶ Same arguments as Checkpointer.save(). Use this method to manually save checkpoints outside the schedule.
- Parameters
name (str) – file name.
kwargs (Any) – extra data to save, same as in
Checkpointer.save()
.
-
-
class
fvcore.common.config.
CfgNode
(init_dict=None, key_list=None, new_allowed=False)[source]¶ Bases:
yacs.config.CfgNode
Our own extended version of yacs.config.CfgNode. It contains the following extra features:
The merge_from_file() method supports the “_BASE_” key, which allows the new CfgNode to inherit all the attributes from the base configuration file(s).
Keys that start with “COMPUTED_” are treated as insertion-only “computed” attributes. They can be inserted regardless of whether the CfgNode is frozen or not.
With “allow_unsafe=True”, it supports pyyaml tags that evaluate expressions in config. See examples in https://pyyaml.org/wiki/PyYAMLDocumentation#yaml-tags-and-python-types Note that this may lead to arbitrary code execution: you must not load a config file from untrusted sources before manually inspecting the content of the file.
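A sketch of the _BASE_ mechanism; the file names and config keys below are illustrative:
import os, tempfile
from fvcore.common.config import CfgNode

tmpdir = tempfile.mkdtemp()
# A hypothetical base config and a child config that inherits from it via "_BASE_".
with open(os.path.join(tmpdir, "base.yaml"), "w") as f:
    f.write("MODEL:\n  DEPTH: 50\n  NORM: BN\n")
with open(os.path.join(tmpdir, "child.yaml"), "w") as f:
    f.write("_BASE_: base.yaml\nMODEL:\n  DEPTH: 101\n")

loaded = CfgNode.load_yaml_with_base(os.path.join(tmpdir, "child.yaml"))
cfg = CfgNode(loaded)
print(cfg.MODEL.DEPTH, cfg.MODEL.NORM)  # 101 BN -- DEPTH overridden, NORM inherited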
-
classmethod
load_yaml_with_base
(filename: str, allow_unsafe: bool = False) → Dict[str, Any][source]¶ Just like yaml.load(open(filename)), but inherits attributes from its _BASE_.
-
merge_from_file
(cfg_filename: str, allow_unsafe: bool = False) → None[source]¶ Merge configs from a given yaml file.
- Parameters
cfg_filename – the file name of the yaml config.
allow_unsafe – whether to allow loading the config file with yaml.unsafe_load.
-
merge_from_other_cfg
(cfg_other: fvcore.common.config.CfgNode) → Callable[[], None][source]¶ - Parameters
cfg_other (CfgNode) – configs to merge from.
-
class
fvcore.common.history_buffer.
HistoryBuffer
(max_length: int = 1000000)[source]¶ Bases:
object
Track a series of scalar values and provide access to smoothed values over a window or the global average of the series.
-
__init__
(max_length: int = 1000000) → None[source]¶ - Parameters
max_length – maximum number of values that can be stored in the buffer. When the capacity of the buffer is exhausted, old values will be removed.
-
update
(value: float, iteration: Optional[float] = None) → None[source]¶ Add a new scalar value produced at a certain iteration. If the length of the buffer exceeds self._max_length, the oldest element will be removed from the buffer.
-
median
(window_size: int) → float[source]¶ Return the median of the latest window_size values in the buffer.
-
avg
(window_size: int) → float[source]¶ Return the mean of the latest window_size values in the buffer.
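A short usage sketch (the values below are illustrative):
from fvcore.common.history_buffer import HistoryBuffer

buf = HistoryBuffer(max_length=1000)
for it, loss in enumerate([0.9, 0.7, 0.6, 0.55]):
    buf.update(loss, iteration=it)
print(buf.avg(window_size=2))     # mean of the latest 2 values
print(buf.median(window_size=2))  # median of the latest 2 values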
-
-
class
fvcore.common.param_scheduler.
ParamScheduler
[source]¶ Bases:
object
Base class for parameter schedulers. A parameter scheduler defines a mapping from a progress value in [0, 1) to a number (e.g. learning rate).
-
WHERE_EPSILON
= 1e-06¶
-
__call__
(where: float) → float[source]¶ Get the value of the param for a given point at training.
We update params (such as learning rate) based on the percent progress of training completed. This allows a scheduler to be agnostic to the exact length of a particular run (e.g. 120 epochs vs 90 epochs), as long as the relative progress where params should be updated is the same. However, it assumes that the total length of training is known.
- Parameters
where – A float in [0,1) that represents how far training has progressed
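A sketch of how a scheduler is typically queried from a training loop, using CosineParamScheduler as an example (the loop length and values are illustrative):
from fvcore.common.param_scheduler import CosineParamScheduler

scheduler = CosineParamScheduler(start_value=0.1, end_value=0.0)
max_iter = 100
for it in range(max_iter):
    where = it / max_iter      # fraction of training completed, in [0, 1)
    lr = scheduler(where)      # param value (e.g. learning rate) at this point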
-
-
class
fvcore.common.param_scheduler.
ConstantParamScheduler
(value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Returns a constant value for a param.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
CosineParamScheduler
(start_value: float, end_value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Cosine decay or cosine warmup schedules based on start and end values. The schedule is updated based on the fraction of training progress. The schedule was proposed in ‘SGDR: Stochastic Gradient Descent with Warm Restarts’ (https://arxiv.org/abs/1608.03983). Note that this class only implements the cosine annealing part of SGDR, and not the restarts.
Example
CosineParamScheduler(start_value=0.1, end_value=0.0001)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
ExponentialParamScheduler
(start_value: float, decay: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Exponential schedule parameterized by a start value and decay. The schedule is updated based on the fraction of training progress, where, with the formula param_t = start_value * (decay ** where).
Example
ExponentialParamScheduler(start_value=2.0, decay=0.02)
Corresponds to a decreasing schedule with values in [2.0, 0.04).
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
LinearParamScheduler
(start_value: float, end_value: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Linearly interpolates the parameter between start_value and end_value. Can be used for either warmup or decay based on start and end values. The schedule is updated after every train step by default.
Example
LinearParamScheduler(start_value=0.0001, end_value=0.01)
Corresponds to a linearly increasing schedule with values in [0.0001, 0.01)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
CompositeParamScheduler
(schedulers: Sequence[fvcore.common.param_scheduler.ParamScheduler], lengths: List[float], interval_scaling: Sequence[str])[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Composite parameter scheduler composed of intermediate schedulers. Takes a list of schedulers and a list of lengths corresponding to percentage of training each scheduler should run for. Schedulers are run in order. All values in lengths should sum to 1.0.
Each scheduler also has a corresponding interval scale. If interval scale is ‘fixed’, the intermediate scheduler will be run without any rescaling of the time. If interval scale is ‘rescaled’, intermediate scheduler is run such that each scheduler will start and end at the same values as it would if it were the only scheduler. Default is ‘rescaled’ for all schedulers.
Example
schedulers = [
    ConstantParamScheduler(value=0.42),
    CosineParamScheduler(start_value=0.42, end_value=1e-4),
]
CompositeParamScheduler(
    schedulers=schedulers,
    interval_scaling=['rescaled', 'rescaled'],
    lengths=[0.3, 0.7],
)
The parameter value will be 0.42 for the first [0%, 30%) of steps, and then will cosine decay from 0.42 to 0.0001 for [30%, 100%) of training.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
MultiStepParamScheduler
(values: List[float], num_updates: Optional[int] = None, milestones: Optional[List[int]] = None)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Takes a predefined schedule for a param value, and a list of epochs or steps which stand for the upper boundary (excluded) of each range.
Example
MultiStepParamScheduler(
    values=[0.1, 0.01, 0.001, 0.0001],
    milestones=[30, 60, 80, 120]
)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epochs 60-79, and 0.0001 for epochs 80-119. Note that the length of values must equal the length of milestones plus one when num_updates is given separately; see __init__ below for the accepted combinations.
-
__init__
(values: List[float], num_updates: Optional[int] = None, milestones: Optional[List[int]] = None) → None[source]¶ - Parameters
values – param value in each range
num_updates – the end of the last range. If None, will use milestones[-1].
milestones – the boundary of each range. If None, will evenly split num_updates.
For example, all the following combinations define the same scheduler:
num_updates=90, milestones=[30, 60], values=[1, 0.1, 0.01]
num_updates=90, values=[1, 0.1, 0.01]
milestones=[30, 60, 90], values=[1, 0.1, 0.01]
milestones=[3, 6, 9], values=[1, 0.1, 0.01] (ParamScheduler is scale-invariant)
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
StepParamScheduler
(num_updates: Union[int, float], values: List[float])[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Takes a fixed schedule for a param value. If the length of the fixed schedule is less than the number of epochs, then the epochs are divided evenly among the param schedule. The schedule is updated after every train epoch by default.
Example
StepParamScheduler(values=[0.1, 0.01, 0.001, 0.0001], num_updates=120)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epoch 60-89, 0.0001 for epochs 90-119.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
StepWithFixedGammaParamScheduler
(base_value: float, num_decays: int, gamma: float, num_updates: int)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Decays the param value by gamma at equally spaced steps so as to have the specified total number of decays.
Example
StepWithFixedGammaParamScheduler(
    base_value=0.1, gamma=0.1, num_decays=3, num_updates=120
)
Then the param value will be 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epoch 60-89, 0.0001 for epochs 90-119.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.param_scheduler.
PolynomialDecayParamScheduler
(base_value: float, power: float)[source]¶ Bases:
fvcore.common.param_scheduler.ParamScheduler
Decays the param value after every epoch according to a polynomial function with a fixed power. The schedule is updated after every train step by default.
Example
PolynomialDecayParamScheduler(base_value=0.1, power=0.9)
Then the param value will be 0.1 for epoch 0, 0.099 for epoch 1, and so on.
-
WHERE_EPSILON
= 1e-06¶
-
-
class
fvcore.common.registry.
Registry
(*args, **kwds)[source]¶ Bases:
collections.abc.Iterable
,typing.Generic
The registry that provides name -> object mapping, to support third-party users’ custom modules.
To create a registry (e.g. a backbone registry):
BACKBONE_REGISTRY = Registry('BACKBONE')
To register an object:
@BACKBONE_REGISTRY.register()
class MyBackbone():
    ...
Or:
BACKBONE_REGISTRY.register(MyBackbone)
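To build an object from its registered name later (a short sketch continuing the example above):
backbone_cls = BACKBONE_REGISTRY.get("MyBackbone")  # name -> registered class
backbone = backbone_cls()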