datasetinsights.evaluation_metrics

datasetinsights.evaluation_metrics.average_log10_error

Average Log10 Error metric.

The average log10 error can be described as:

\[\frac{1}{n}\sum_{p}^{n} |log_{10}(y_p)-log_{10}(\hat{y_p})|\]
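A minimal NumPy sketch of this formula (the function name and array inputs below are illustrative only, not part of the library API):

    import numpy as np

    def average_log10_error(y_true, y_pred):
        # Mean absolute difference of log10 depth values; inputs must be positive.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(np.log10(y_true) - np.log10(y_pred))))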
class datasetinsights.evaluation_metrics.average_log10_error.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
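The update/compute/reset lifecycle shared by these metrics might be used roughly as follows; this is a sketch with toy NumPy depth maps, and the exact (prediction, target) ordering and array type expected by update() are assumptions rather than documented facts:

    import numpy as np
    from datasetinsights.evaluation_metrics import AverageLog10Error

    # Toy mini-batches of (prediction, target) depth maps.
    mini_batches = [
        (np.full((2, 4, 4), 2.0), np.full((2, 4, 4), 2.2)),
        (np.full((2, 4, 4), 3.0), np.full((2, 4, 4), 2.9)),
    ]

    metric = AverageLog10Error()
    metric.reset()
    for pred, target in mini_batches:
        metric.update((pred, target))  # accumulate sum_of_log10_error and num_samples
    print(metric.compute())            # average log10 error over all samples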

datasetinsights.evaluation_metrics.average_precision

class datasetinsights.evaluation_metrics.average_precision.AveragePrecision(config: datasetinsights.evaluation_metrics.average_precision.DetectionConfig = None)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

compute(boxes: Dict[str, Tuple[List[datasetinsights.io.bbox.BBox3D], List[datasetinsights.io.bbox.BBox3D]]] = None) → Dict[str, float]

Calculate the mean AP for all classes over all distance thresholds defined in the config. The equation is described in the second figure of the nuScenes paper (https://arxiv.org/pdf/1903.11027.pdf); it is the normalized sum of the ROC curves for each class and distance.

Parameters

boxes – the predicted and ground truth bounding boxes per sample.

Returns

a dictionary mapping each label to its average precision (averaged across all distance thresholds for that label).

reset()
update(boxes)
class datasetinsights.evaluation_metrics.average_precision.DetectionConfig(class_range: Dict[str, int] = {'barrier': 30, 'bicycle': 40, 'bus': 50, 'car': 50, 'construction_vehicle': 50, 'motorcycle': 40, 'pedestrian': 40, 'traffic_cone': 30, 'trailer': 50, 'truck': 50}, dist_fcn: str = 'center_distance', dist_ths: List[float] = [0.5, 1.0, 2.0, 4.0], min_recall: float = 0.1, min_precision: float = 0.1, max_boxes_per_sample: float = 500, mean_ap_weight: int = 5)

Bases: object

Data class that specifies the detection evaluation settings.

classmethod deserialize(content)

Initialize from serialized dictionary.

serialize() → dict

Serialize instance into json-friendly format.
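A short round-trip sketch, assuming serialize() and deserialize() are inverses as described above (the keyword values shown are just the documented defaults):

    from datasetinsights.evaluation_metrics.average_precision import DetectionConfig

    cfg = DetectionConfig(dist_ths=[0.5, 1.0, 2.0, 4.0], min_recall=0.1, min_precision=0.1)
    as_dict = cfg.serialize()                        # json-friendly dict
    restored = DetectionConfig.deserialize(as_dict)  # equivalent config instance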

datasetinsights.evaluation_metrics.average_precision.calc_ap(*, precision, min_recall: float, min_precision: float) → float

Calculate the average precision.

datasetinsights.evaluation_metrics.average_precision.center_distance(*, gt_box: datasetinsights.io.bbox.BBox3D, pred_box: datasetinsights.io.bbox.BBox3D) → float

L2 distance between the box centers (xy only).

Parameters
  • gt_box (BBox3D) – GT annotation sample.

  • pred_box (BBox3D) – Predicted sample.

Returns

L2 distance.
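The distance itself is the Euclidean norm of the xy offset between the two box centers. A sketch with hypothetical (x, y, z) center tuples (the library function takes BBox3D objects instead):

    import numpy as np

    def center_distance_xy(gt_center, pred_center):
        # L2 distance between box centers, ignoring the z coordinate.
        gt = np.asarray(gt_center[:2], dtype=float)
        pred = np.asarray(pred_center[:2], dtype=float)
        return float(np.linalg.norm(gt - pred))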

datasetinsights.evaluation_metrics.average_precision_2d

Average Precision metrics for 2D object detection

This module provides average precision metrics to evaluate 2D object detection models, such as the metrics defined in COCO evaluation. The most commonly used metrics are MeanAveragePrecisionIOU50 and MeanAveragePrecisionAverageOverIOU, which provide the average precision over all labels considered.

These metrics are based on an existing open-source reference implementation.

AveragePrecision provides AP for each label under a given IOU. AveragePrecisionIOU50 provides AP for each label at IOU=50%. MeanAveragePrecisionIOU50 provides mean AP over all labels at IOU=50%. MeanAveragePrecisionAverageOverIOU provides mean AP over all labels and IOU=[0.5:0.95:0.05].

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the every-point (all-point) interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float
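A sketch of the standard all-point interpolation this method describes (a generic computation, not the library's exact code):

    import numpy as np

    def every_point_interpolated_ap(recall, precision):
        # Pad the curve so the precision envelope is defined at both ends.
        mrec = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
        mpre = np.concatenate(([0.0], np.asarray(precision, dtype=float), [0.0]))
        # Replace each precision value by the maximum precision to its right.
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        # Sum the precision envelope over the recall steps where recall changes.
        idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
        return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))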

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float
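A corresponding sketch of the n-point (by default 11-point) interpolation, again generic rather than the library's exact code:

    import numpy as np

    def n_point_interpolated_ap(recall, precision, point=11):
        recall = np.asarray(recall, dtype=float)
        precision = np.asarray(precision, dtype=float)
        ap = 0.0
        for r in np.linspace(0.0, 1.0, point):
            # Interpolated precision at recall level r: max precision where recall >= r.
            mask = recall >= r
            ap += (precision[mask].max() if mask.any() else 0.0) / point
        return float(ap)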

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.
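For illustration, a mini_batch for two images could be assembled as below; the BBox2D constructor arguments shown are assumptions about its signature, not confirmed by this reference:

    from datasetinsights.evaluation_metrics.average_precision_2d import AveragePrecision
    from datasetinsights.io.bbox import BBox2D

    gt_bboxes1 = [BBox2D(label="car", x=10, y=20, w=30, h=40)]
    pred_bboxes1 = [BBox2D(label="car", x=12, y=22, w=30, h=40, score=0.9)]
    gt_bboxes2, pred_bboxes2 = [], []       # an image may have no boxes at all

    mini_batch = [
        [gt_bboxes1, pred_bboxes1],         # image 1
        [gt_bboxes2, pred_bboxes2],         # image 2
    ]

    metric = AveragePrecision(iou_threshold=0.5)
    metric.update(mini_batch)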

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at \(IOU=50\%\).

This implementation calculates AP for each label at \(IOU=50\%\) and provides a mapping from label to average precision. The maximum number of detections per image is 100.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as the Average Precision averaged over all labels and \(IOU = 0.5, 0.55, 0.60, ..., 0.95\). The max detections per image is limited to 100.

\[mAP = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AP(\text{label}, \text{IOU})\]
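A sketch of the averaging in this formula, using a made-up dictionary of per-(label, IOU) AP values:

    import numpy as np

    ap = {
        ("car", 0.5): 0.72, ("car", 0.75): 0.55,
        ("person", 0.5): 0.60, ("person", 0.75): 0.41,
    }
    # mAP is simply the mean over all labels and IOU thresholds.
    mAP = float(np.mean(list(ap.values())))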
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at \(IOU=50\%\).

This implementation calculates mAP at \(IOU=50\%\), averaged across all labels.

\[mAP(\text{IOU=50})=\frac{1}{N_{\text{label}}}\sum_{\text{label}}AP(\text{label}, \text{IOU=50})\]

where AP is the AveragePrecision metric computed separately for each label.

TYPE = 'scalar'
compute()
reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_precision_config

datasetinsights.evaluation_metrics.average_recall_2d

Average Recall metrics for 2D object detection

This module provides average recall metrics to evaluate 2D object detection models, such as the metrics defined in COCO evaluation. The most commonly used metric is MeanAverageRecallAverageOverIOU, which provides the average recall over all labels considered.

class datasetinsights.evaluation_metrics.average_recall_2d.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.average_recall_2d.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as the Average Recall averaged over all labels and \(IOU = 0.5:0.95:0.05\). The max detections per image is limited to 100.

\[mAR = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AR(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_relative_error

Average Relative Error metrics.

The average relative error can be described as:

\[\frac{1}{num\ samples}\sum_{p}^{num\ samples}\frac{|y_p-\hat{y_p}|}{y_p}\]
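A minimal NumPy sketch of this formula (illustrative function and inputs; ground truth values must be non-zero):

    import numpy as np

    def average_relative_error(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(y_true - y_pred) / y_true))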
class datasetinsights.evaluation_metrics.average_relative_error.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.base

class datasetinsights.evaluation_metrics.base.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass
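A sketch of the factory in use; the name string is assumed to match the subclass's registered identifier, which this reference does not spell out:

    from datasetinsights.evaluation_metrics import EvaluationMetric

    # Hypothetical registered name; construction kwargs are forwarded to the subclass.
    metric = EvaluationMetric.create("AverageRelativeError")
    metric.reset()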

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the class of the specified metric subclass

abstract reset()
abstract update(output)

datasetinsights.evaluation_metrics.confusion_matrix

datasetinsights.evaluation_metrics.confusion_matrix.precision_recall(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate precision and recall per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

(precision_per_image, recall_per_image).

Return type

tuple

datasetinsights.evaluation_metrics.confusion_matrix.prediction_records(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate prediction results per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

a Records object containing the match results.

Return type

Records

datasetinsights.evaluation_metrics.exceptions

exception datasetinsights.evaluation_metrics.exceptions.NoSampleError

Bases: Exception

Raised when the number of samples is zero.

datasetinsights.evaluation_metrics.iou

IoU evaluation metrics

class datasetinsights.evaluation_metrics.iou.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch-ignite confusion matrix object

Type

ignite.metrics.ConfusionMatrix
compute()
reset()
update(output)

datasetinsights.evaluation_metrics.records

class datasetinsights.evaluation_metrics.records.Records(iou_threshold=0.5)

Bases: object

Save prediction records during update.

iou_threshold

iou threshold

Type

float

match_results

the saved match results (TP/FP)

Type

list

Parameters

iou_threshold (float) – iou threshold (default: 0.5)

add_records(gt_bboxes, pred_bboxes)

Add ground truth and prediction records.

Parameters
  • gt_bboxes – ground truth bboxes in the current image

  • pred_bboxes – sorted prediction bboxes in the current image

reset()

datasetinsights.evaluation_metrics.root_mean_square_error

Root Mean Square Error metrics.

The root mean square error can be described as:

\[\sqrt{\frac{1}{n}\sum_{p}^{n}{(y_p-\hat{y_p})}^2}\]
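A minimal NumPy sketch of this formula (illustrative function and inputs, not the library's code):

    import numpy as np

    def root_mean_square_error(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))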
class datasetinsights.evaluation_metrics.root_mean_square_error.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.threshold_accuracy

Threshold Accuracy metric.

The threshold accuracy can be described as:

\[(\delta_i): \%\ \text{of}\ y_p\ \text{s.t.}\ \max\left(\frac{y_p}{\hat{y_p}}, \frac{\hat{y_p}}{y_p}\right)=\delta<thr\ \text{for}\ thr=1.25,1.25^2,1.25^3\]
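A minimal NumPy sketch of this definition for a single threshold (illustrative only; the library class takes the threshold as a constructor argument):

    import numpy as np

    def threshold_accuracy(y_true, y_pred, threshold=1.25):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        delta = np.maximum(y_true / y_pred, y_pred / y_true)
        return float(np.mean(delta < threshold))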
class datasetinsights.evaluation_metrics.threshold_accuracy.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a branch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the every-point (all-point) interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at \(IOU=50\%\).

This implementation calculates AP for each label at \(IOU=50\%\) and provides a mapping from label to average precision. The maximum number of detections per image is 100.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the class of the specified metric subclass

abstract reset()
abstract update(output)
class datasetinsights.evaluation_metrics.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch-ignite confusion matrix object

Type

ignite.metrics.ConfusionMatrix
compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as the Average Precision averaged over all labels and \(IOU = 0.5, 0.55, 0.60, ..., 0.95\). The max detections per image is limited to 100.

\[mAP = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AP(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at \(IOU=50\%\).

This implementation calculates mAP at \(IOU=50\%\), averaged across all labels.

\[mAP(\text{IOU=50})=\frac{1}{N_{\text{label}}}\sum_{\text{label}}AP(\text{label}, \text{IOU=50})\]

where AP is the AveragePrecision metric computed separately for each label.

TYPE = 'scalar'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as the Average Recall averaged over all labels and \(IOU = 0.5:0.95:0.05\). The max detections per image is limited to 100.

\[mAR = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AR(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a branch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)