datasetinsights.evaluation_metrics

datasetinsights.evaluation_metrics.average_log10_error

Average Log10 Error metric.

The average log10 error can be described as:

\[\frac{1}{n}\sum_{p}^{n} |log_{10}(y_p)-log_{10}(\hat{y_p})|\]
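A minimal NumPy sketch of this formula (the function name and array inputs below are illustrative only, not part of the library API):

    import numpy as np

    def average_log10_error(y_true, y_pred):
        # Mean absolute difference of log10 depth values; inputs must be positive.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(np.log10(y_true) - np.log10(y_pred))))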
class datasetinsights.evaluation_metrics.average_log10_error.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
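The update/compute/reset lifecycle shared by these metrics might be used roughly as follows; this is a sketch with toy NumPy depth maps, and the exact (prediction, target) ordering and array type expected by update() are assumptions rather than documented facts:

    import numpy as np
    from datasetinsights.evaluation_metrics import AverageLog10Error

    # Toy mini-batches of (prediction, target) depth maps.
    mini_batches = [
        (np.full((2, 4, 4), 2.0), np.full((2, 4, 4), 2.2)),
        (np.full((2, 4, 4), 3.0), np.full((2, 4, 4), 2.9)),
    ]

    metric = AverageLog10Error()
    metric.reset()
    for pred, target in mini_batches:
        metric.update((pred, target))  # accumulate sum_of_log10_error and num_samples
    print(metric.compute())            # average log10 error over all samples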

datasetinsights.evaluation_metrics.average_precision

class datasetinsights.evaluation_metrics.average_precision.AveragePrecision(config: datasetinsights.evaluation_metrics.average_precision.DetectionConfig = None)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

compute(boxes: Dict[str, Tuple[List[datasetinsights.io.bbox.BBox3D], List[datasetinsights.io.bbox.BBox3D]]] = None) → Dict[str, float]

Calculate the mean AP for all classes over all distance thresholds defined in the config. The equation is described in the second figure of the nuScenes paper (https://arxiv.org/pdf/1903.11027.pdf); it is the normalized sum of the ROC curves for each class and distance.

Parameters

boxes – the predicted and ground truth bounding boxes per sample.

Returns

a dictionary mapping each label to its average precision (averaged across all distance thresholds for that label).

reset()
update(boxes)
class datasetinsights.evaluation_metrics.average_precision.DetectionConfig(class_range: Dict[str, int] = {'barrier': 30, 'bicycle': 40, 'bus': 50, 'car': 50, 'construction_vehicle': 50, 'motorcycle': 40, 'pedestrian': 40, 'traffic_cone': 30, 'trailer': 50, 'truck': 50}, dist_fcn: str = 'center_distance', dist_ths: List[float] = [0.5, 1.0, 2.0, 4.0], min_recall: float = 0.1, min_precision: float = 0.1, max_boxes_per_sample: float = 500, mean_ap_weight: int = 5)

Bases: object

Data class that specifies the detection evaluation settings.

classmethod deserialize(content)

Initialize from serialized dictionary.

serialize() → dict

Serialize instance into json-friendly format.
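A short round-trip sketch, assuming serialize() and deserialize() are inverses as described above (the keyword values shown are just the documented defaults):

    from datasetinsights.evaluation_metrics.average_precision import DetectionConfig

    cfg = DetectionConfig(dist_ths=[0.5, 1.0, 2.0, 4.0], min_recall=0.1, min_precision=0.1)
    as_dict = cfg.serialize()                        # json-friendly dict
    restored = DetectionConfig.deserialize(as_dict)  # equivalent config instance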

datasetinsights.evaluation_metrics.average_precision.calc_ap(*, precision, min_recall: float, min_precision: float) → float

Calculate the average precision.

datasetinsights.evaluation_metrics.average_precision.center_distance(*, gt_box: datasetinsights.io.bbox.BBox3D, pred_box: datasetinsights.io.bbox.BBox3D) → float

L2 distance between the box centers (xy only).

Parameters
  • gt_box (BBox3D) – GT annotation sample.

  • pred_box (BBox3D) – Predicted sample.

Returns

L2 distance.
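The distance itself is the Euclidean norm of the xy offset between the two box centers. A sketch with hypothetical (x, y, z) center tuples (the library function takes BBox3D objects instead):

    import numpy as np

    def center_distance_xy(gt_center, pred_center):
        # L2 distance between box centers, ignoring the z coordinate.
        gt = np.asarray(gt_center[:2], dtype=float)
        pred = np.asarray(pred_center[:2], dtype=float)
        return float(np.linalg.norm(gt - pred))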

datasetinsights.evaluation_metrics.average_precision_2d

Average Precision metrics for 2D object detection

This module provides average precision metrics to evaluate 2D object detection models, such as the metrics defined in COCO evaluation. The most commonly used metrics are MeanAveragePrecisionIOU50 and MeanAveragePrecisionAverageOverIOU, which provide the average precision over all labels considered.

These metrics are based on an existing open-source reference implementation.

AveragePrecision provides AP for each label under a given IOU. AveragePrecisionIOU50 provides AP for each label at IOU=50%. MeanAveragePrecisionIOU50 provides mean AP over all labels at IOU=50%. MeanAveragePrecisionAverageOverIOU provides mean AP over all labels and IOU=[0.5:0.95:0.05].

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the every-point (all-point) interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float
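A sketch of the standard all-point interpolation this method describes (a generic computation, not the library's exact code):

    import numpy as np

    def every_point_interpolated_ap(recall, precision):
        # Pad the curve so the precision envelope is defined at both ends.
        mrec = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
        mpre = np.concatenate(([0.0], np.asarray(precision, dtype=float), [0.0]))
        # Replace each precision value by the maximum precision to its right.
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        # Sum the precision envelope over the recall steps where recall changes.
        idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
        return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))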

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float
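A corresponding sketch of the n-point (by default 11-point) interpolation, again generic rather than the library's exact code:

    import numpy as np

    def n_point_interpolated_ap(recall, precision, point=11):
        recall = np.asarray(recall, dtype=float)
        precision = np.asarray(precision, dtype=float)
        ap = 0.0
        for r in np.linspace(0.0, 1.0, point):
            # Interpolated precision at recall level r: max precision where recall >= r.
            mask = recall >= r
            ap += (precision[mask].max() if mask.any() else 0.0) / point
        return float(ap)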

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.
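For illustration, a mini_batch for two images could be assembled as below; the BBox2D constructor arguments shown are assumptions about its signature, not confirmed by this reference:

    from datasetinsights.evaluation_metrics.average_precision_2d import AveragePrecision
    from datasetinsights.io.bbox import BBox2D

    gt_bboxes1 = [BBox2D(label="car", x=10, y=20, w=30, h=40)]
    pred_bboxes1 = [BBox2D(label="car", x=12, y=22, w=30, h=40, score=0.9)]
    gt_bboxes2, pred_bboxes2 = [], []       # an image may have no boxes at all

    mini_batch = [
        [gt_bboxes1, pred_bboxes1],         # image 1
        [gt_bboxes2, pred_bboxes2],         # image 2
    ]

    metric = AveragePrecision(iou_threshold=0.5)
    metric.update(mini_batch)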

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at \(IOU=50\%\).

This implementation calculates AP for each label at \(IOU=50\%\) and provides a mapping from label to average precision. The maximum number of detections per image is 100.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as the Average Precision averaged over all labels and \(IOU = 0.5, 0.55, 0.60, ..., 0.95\). The max detections per image is limited to 100.

\[mAP = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AP(\text{label}, \text{IOU})\]
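A sketch of the averaging in this formula, using a made-up dictionary of per-(label, IOU) AP values:

    import numpy as np

    ap = {
        ("car", 0.5): 0.72, ("car", 0.75): 0.55,
        ("person", 0.5): 0.60, ("person", 0.75): 0.41,
    }
    # mAP is simply the mean over all labels and IOU thresholds.
    mAP = float(np.mean(list(ap.values())))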
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at \(IOU=50\%\).

This implementation calculates mAP at \(IOU=50\%\), averaged across all labels.

\[mAP(\text{IOU=50})=\frac{1}{N_{\text{label}}}\sum_{\text{label}}AP(\text{label}, \text{IOU=50})\]

where AP is the AveragePrecision metric computed separately for each label.

TYPE = 'scalar'
compute()
reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_precision_config

datasetinsights.evaluation_metrics.average_recall_2d

Average Recall metrics for 2D object detection

This module provides average recall metrics to evaluate 2D object detection models, such as the metrics defined in COCO evaluation. The most commonly used metric is MeanAverageRecallAverageOverIOU, which provides the average recall over all labels considered.

class datasetinsights.evaluation_metrics.average_recall_2d.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.average_recall_2d.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as the Average Recall averaged over all labels and \(IOU = 0.5:0.95:0.05\). The max detections per image is limited to 100.

\[mAR = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AR(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_relative_error

Average Relative Error metrics.

The average relative error can be described as:

\[\frac{1}{num\ samples}\sum_{p}^{num\ samples}\frac{|y_p-\hat{y_p}|}{y_p}\]
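A minimal NumPy sketch of this formula (illustrative function and inputs; ground truth values must be non-zero):

    import numpy as np

    def average_relative_error(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(y_true - y_pred) / y_true))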
class datasetinsights.evaluation_metrics.average_relative_error.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.base

class datasetinsights.evaluation_metrics.base.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass
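A sketch of the factory in use; the name string is assumed to match the subclass's registered identifier, which this reference does not spell out:

    from datasetinsights.evaluation_metrics import EvaluationMetric

    # Hypothetical registered name; construction kwargs are forwarded to the subclass.
    metric = EvaluationMetric.create("AverageRelativeError")
    metric.reset()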

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the class of the specified metric subclass

abstract reset()
abstract update(output)

datasetinsights.evaluation_metrics.confusion_matrix

datasetinsights.evaluation_metrics.confusion_matrix.precision_recall(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate precision and recall per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

(precision_per_image, recall_per_image).

Return type

tuple

datasetinsights.evaluation_metrics.confusion_matrix.prediction_records(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate prediction results per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

a Records object containing the match results.

Return type

Records

datasetinsights.evaluation_metrics.exceptions

exception datasetinsights.evaluation_metrics.exceptions.NoSampleError

Bases: Exception

Raised when the number of samples is zero.

datasetinsights.evaluation_metrics.iou

IoU evaluation metrics

class datasetinsights.evaluation_metrics.iou.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch-ignite confusion matrix object

Type

ignite.metrics.ConfusionMatrix
compute()
reset()
update(output)

datasetinsights.evaluation_metrics.records

class datasetinsights.evaluation_metrics.records.Records(iou_threshold=0.5)

Bases: object

Save prediction records during update.

iou_threshold

iou threshold

Type

float

match_results

the saved match results (TP/FP)

Type

list

Parameters

iou_threshold (float) – iou threshold (default: 0.5)

add_records(gt_bboxes, pred_bboxes)

Add ground truth and prediction records.

Parameters
  • gt_bboxes – ground truth bboxes in the current image

  • pred_bboxes – sorted prediction bboxes in the current image

reset()

datasetinsights.evaluation_metrics.root_mean_square_error

Root Mean Square Error metrics.

The root mean square error can be described as:

\[\sqrt{\frac{1}{n}\sum_{p}^{n}{(y_p-\hat{y_p})}^2}\]
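A minimal NumPy sketch of this formula (illustrative function and inputs, not the library's code):

    import numpy as np

    def root_mean_square_error(y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))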
class datasetinsights.evaluation_metrics.root_mean_square_error.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.threshold_accuracy

Threshold Accuracy metric.

The threshold accuracy can be described as:

\[(\delta_i): \%\ \text{of}\ y_p\ \text{s.t.}\ \max\left(\frac{y_p}{\hat{y_p}}, \frac{\hat{y_p}}{y_p}\right)=\delta<thr\ \text{for}\ thr=1.25,1.25^2,1.25^3\]
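A minimal NumPy sketch of this definition for a single threshold (illustrative only; the library class takes the threshold as a constructor argument):

    import numpy as np

    def threshold_accuracy(y_true, y_pred, threshold=1.25):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        delta = np.maximum(y_true / y_pred, y_pred / y_true)
        return float(np.mean(delta < threshold))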
class datasetinsights.evaluation_metrics.threshold_accuracy.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a branch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the every-point (all-point) interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolated average precision.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at \(IOU=50\%\).

This implementation calculates AP for each label at \(IOU=50\%\) and provides a mapping from label to average precision. The maximum number of detections per image is 100.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an iou threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list of length batch_size in which each element is a [gt_bboxes, pred_bboxes] pair for one image. For example, if batch size = 2, mini_batch looks like [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bboxes for one image.

class datasetinsights.evaluation_metrics.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the class of the specified metric subclass

abstract reset()
abstract update(output)
class datasetinsights.evaluation_metrics.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch-ignite confusion matrix object

Type

ignite.metrics.ConfusionMatrix
compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as the Average Precision averaged over all labels and \(IOU = 0.5, 0.55, 0.60, ..., 0.95\). The max detections per image is limited to 100.

\[mAP = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AP(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at \(IOU=50\%\).

This implementation calculates mAP at \(IOU=50\%\), averaged across all labels.

\[mAP(\text{IOU=50})=\frac{1}{N_{\text{label}}}\sum_{\text{label}}AP(\text{label}, \text{IOU=50})\]

where AP is the AveragePrecision metric computed separately for each label.

TYPE = 'scalar'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as the Average Recall averaged over all labels and \(IOU = 0.5:0.95:0.05\). The max detections per image is limited to 100.

\[mAR = \frac{1}{N_\text{IOU}N_\text{label}}\sum_{\text{label}, \text{IOU}}AR(\text{label}, \text{IOU})\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a branch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a branch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)