datasetinsights.evaluation_metrics

datasetinsights.evaluation_metrics.average_log10_error

Average Log10 Error metric.

The average log10 error can be described as:

\[\frac{1}{n}\sum_{p}^{n} |log_{10}(y_p)-log_{10}(\hat{y_p})|\]
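A minimal NumPy sketch of this formula (illustrative only, not the library's implementation):

    import numpy as np

    def average_log10_error(y_true, y_pred):
        """Mean absolute difference of log10 values, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(np.log10(y_true) - np.log10(y_pred))))

    # Hypothetical depth values for a handful of pixels:
    print(average_log10_error([1.0, 2.0, 4.0], [1.1, 1.9, 4.5]))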
class datasetinsights.evaluation_metrics.average_log10_error.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.average_precision

class datasetinsights.evaluation_metrics.average_precision.AveragePrecision(config: datasetinsights.evaluation_metrics.average_precision.DetectionConfig = None)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

compute(boxes: Dict[str, Tuple[List[datasetinsights.io.bbox.BBox3D], List[datasetinsights.io.bbox.BBox3D]]] = None) → Dict[str, float]

Calculate the mean AP for all classes over all distance thresholds defined in the config. The equation is described in the second figure of the nuScenes paper https://arxiv.org/pdf/1903.11027.pdf. It is the normalized sum of the ROC curves for each class and distance.

Parameters

boxes – the predicted and ground truth bounding boxes per sample.

Returns

a dictionary mapping each label to its average precision (averaged across all distance thresholds for that label).

reset()
update(boxes)
class datasetinsights.evaluation_metrics.average_precision.DetectionConfig(class_range: Dict[str, int] = {'barrier': 30, 'bicycle': 40, 'bus': 50, 'car': 50, 'construction_vehicle': 50, 'motorcycle': 40, 'pedestrian': 40, 'traffic_cone': 30, 'trailer': 50, 'truck': 50}, dist_fcn: str = 'center_distance', dist_ths: List[float] = [0.5, 1.0, 2.0, 4.0], min_recall: float = 0.1, min_precision: float = 0.1, max_boxes_per_sample: float = 500, mean_ap_weight: int = 5)

Bases: object

Data class that specifies the detection evaluation settings.

classmethod deserialize(content)

Initialize from serialized dictionary.

serialize() → dict

Serialize instance into json-friendly format.

datasetinsights.evaluation_metrics.average_precision.calc_ap(*, precision, min_recall: float, min_precision: float) → float

Calculate average precision.
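A rough sketch of the nuScenes-style normalization this is based on, assuming precision is sampled on a 101-point recall grid (an illustration under that assumption, not the exact library code):

    import numpy as np

    def calc_ap_sketch(precision, min_recall=0.1, min_precision=0.1):
        """Normalized AP: ignore the operating region below min_recall/min_precision."""
        prec = np.copy(np.asarray(precision, dtype=float))
        prec = prec[round(100 * min_recall) + 1:]   # drop recall bins below min_recall
        prec -= min_precision                       # discount precision below min_precision
        prec[prec < 0] = 0.0
        return float(np.mean(prec)) / (1.0 - min_precision)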

datasetinsights.evaluation_metrics.average_precision.center_distance(*, gt_box: datasetinsights.io.bbox.BBox3D, pred_box: datasetinsights.io.bbox.BBox3D) → float

L2 distance between the box centers (xy only).

Parameters
  • gt_box (BBox3D) – ground truth annotation sample.

  • pred_box (BBox3D) – predicted sample.

Returns

L2 distance.
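For illustration, the xy-only L2 distance can be sketched directly on (x, y, z) centers, so no assumptions about BBox3D attribute names are needed:

    import numpy as np

    def center_distance_sketch(gt_center, pred_center):
        """L2 distance between box centers, using only the x and y components."""
        gt_xy = np.asarray(gt_center[:2], dtype=float)
        pred_xy = np.asarray(pred_center[:2], dtype=float)
        return float(np.linalg.norm(gt_xy - pred_xy))

    # Centers given as (x, y, z) tuples:
    print(center_distance_sketch((1.0, 2.0, 0.5), (1.5, 2.5, 0.4)))  # ~0.707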

datasetinsights.evaluation_metrics.average_precision_2d

Reference.

We implement the average precision metrics for object detection based on this: https://github.com/rafaelpadilla/Object-Detection-Metrics#average-precision

We optimize the metric update algorithm based on this: https://github.com/rafaelpadilla/Object-Detection-Metrics/blob/master/lib/Evaluator.py

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name used for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the interpolation performed over all points.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float
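A sketch of the standard all-point interpolation described in the reference above, assuming recall and precision are cumulative histories sorted by prediction confidence:

    import numpy as np

    def every_point_interpolated_ap_sketch(recall, precision):
        """All-point interpolated AP: area under the precision envelope."""
        mrec = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
        mpre = np.concatenate(([0.0], np.asarray(precision, dtype=float), [0.0]))
        # Make precision monotonically non-increasing from right to left.
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        # Sum precision * recall-step at every point where recall changes.
        idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
        return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))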

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolation.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float
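A sketch of the standard n-point interpolation (e.g. the classic 11-point Pascal VOC variant):

    import numpy as np

    def n_point_interpolated_ap_sketch(recall, precision, point=11):
        """n-point interpolated AP: average the precision envelope at n recall levels."""
        recall = np.asarray(recall, dtype=float)
        precision = np.asarray(precision, dtype=float)
        ap = 0.0
        for r in np.linspace(0.0, 1.0, point):
            # Highest precision achieved at any recall >= r (0 if none).
            mask = recall >= r
            ap += precision[mask].max() if mask.any() else 0.0
        return ap / point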

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.
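A hedged usage sketch of this metric with the mini_batch format described above; the BBox2D keyword arguments are assumptions and should be checked against datasetinsights.io.bbox.BBox2D:

    from datasetinsights.evaluation_metrics.average_precision_2d import AveragePrecision
    from datasetinsights.io.bbox import BBox2D

    # Hypothetical boxes; the BBox2D keyword names here are assumptions --
    # verify the BBox2D signature in datasetinsights.io.bbox before use.
    gt_bboxes1 = [BBox2D(label="car", x=10, y=20, w=50, h=40)]
    pred_bboxes1 = [BBox2D(label="car", x=12, y=22, w=48, h=38, score=0.9)]

    metric = AveragePrecision(iou_threshold=0.5)
    metric.update([[gt_bboxes1, pred_bboxes1]])  # one image in this mini-batch
    print(metric.compute())  # e.g. {"car": 1.0}
    metric.reset()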

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at IOU = 50%.

This implementation calculates AP at IoU = 50% for each label.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as Average Precision averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAP^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AP^{label, IoU=0.5:0.95:0.05}\]
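Conceptually, this mAP is just the mean of the per-label AP values over all labels and IoU thresholds; a toy sketch with made-up numbers:

    import numpy as np

    # Hypothetical per-label AP values keyed by IoU threshold, e.g. from
    # AveragePrecision run at each threshold in 0.5:0.95:0.05.
    ap_per_iou = {
        0.50: {"car": 0.82, "pedestrian": 0.61},
        0.75: {"car": 0.55, "pedestrian": 0.33},
        # ... remaining thresholds ...
    }

    # mAP = mean over labels and IoU thresholds of AP(label, IoU).
    all_aps = [ap for per_label in ap_per_iou.values() for ap in per_label.values()]
    map_over_iou = float(np.mean(all_aps))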
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at IOU=50%.

This implementation calculates mAP at IoU = 50%.

\[mAP^{IoU=50} = mean_{label}AP^{label, IoU=50}\]
TYPE = 'scalar'
compute()
reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_precision_config

datasetinsights.evaluation_metrics.average_recall_2d

Reference.

http://cocodataset.org/#detection-eval

https://arxiv.org/pdf/1502.05082.pdf

https://github.com/rafaelpadilla/Object-Detection-Metrics/issues/22

class datasetinsights.evaluation_metrics.average_recall_2d.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.average_recall_2d.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as Average Recall averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAR^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AR^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_relative_error

Average Relative Error metrics.

The average relative error can be described as:

\[\frac{1}{num\ samples}\sum_{p}^{num\ samples}\frac{|y_p-\hat{y_p}|}{y_p}\]
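A minimal NumPy sketch of this formula (illustrative only):

    import numpy as np

    def average_relative_error(y_true, y_pred):
        """Mean of |y - y_hat| / y, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(y_true - y_pred) / y_true))

    print(average_relative_error([2.0, 4.0], [2.2, 3.8]))  # 0.075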
class datasetinsights.evaluation_metrics.average_relative_error.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.base

class datasetinsights.evaluation_metrics.base.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the EvaluationMetric subclass with the specified name

abstract reset()
abstract update(output)
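The factory methods above can be used to instantiate metrics by name; a hedged sketch, assuming the registry key is simply the subclass name (verify against the exported metric classes before relying on it):

    from datasetinsights.evaluation_metrics.base import EvaluationMetric

    # "AverageRecall" as the registry key is an assumption here.
    metric = EvaluationMetric.create("AverageRecall", iou_threshold=0.5, max_detections=100)
    metric_cls = EvaluationMetric.find("AverageRecall")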

datasetinsights.evaluation_metrics.confusion_matrix

datasetinsights.evaluation_metrics.confusion_matrix.precision_recall(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate precision and recall per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

(precision_per_image, recall_per_image).

Return type

tuple
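Per image, once predictions have been matched to ground truth boxes at the given IoU threshold (greedy matching by descending confidence is the usual convention and an assumption here), the returned values reduce to the standard definitions:

    def precision_recall_from_counts(tp, num_pred, num_gt):
        """Per-image precision and recall from the number of true-positive matches."""
        precision = tp / num_pred if num_pred else 0.0
        recall = tp / num_gt if num_gt else 0.0
        return precision, recall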

datasetinsights.evaluation_metrics.confusion_matrix.prediction_records(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate prediction results per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

a Records class contains match results.

Return type

Records

datasetinsights.evaluation_metrics.exceptions

exception datasetinsights.evaluation_metrics.exceptions.NoSampleError

Bases: Exception

Raised when the number of samples is zero.

datasetinsights.evaluation_metrics.iou

IoU evaluation metrics

class datasetinsights.evaluation_metrics.iou.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch ignite confusion matrix object.

Type

ignite.metrics.ConfusionMatrix

compute()
reset()
update(output)
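The per-class IoU that such a confusion matrix yields can be sketched as follows (the standard derivation, not necessarily the library's exact code):

    import numpy as np

    def per_class_iou(cm):
        """Per-class IoU from a (num_classes x num_classes) confusion matrix.

        IoU_c = cm[c, c] / (row_sum_c + col_sum_c - cm[c, c])
        """
        cm = np.asarray(cm, dtype=float)
        intersection = np.diag(cm)
        union = cm.sum(axis=1) + cm.sum(axis=0) - intersection
        return intersection / np.maximum(union, 1e-15)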

datasetinsights.evaluation_metrics.records

class datasetinsights.evaluation_metrics.records.Records(iou_threshold=0.5)

Bases: object

Save prediction records during update.

iou_threshold

iou threshold

Type

float

match_results

save the results (TP/FP)

Type

list

Parameters

iou_threshold (float) – iou threshold (default: 0.5)

add_records(gt_bboxes, pred_bboxes)

Add ground truth and prediction records.

Parameters
  • gt_bboxes – ground truth bboxes in the current image

  • pred_bboxes – sorted prediction bboxes in the current image

reset()

datasetinsights.evaluation_metrics.root_mean_square_error

Root Mean Square Error metrics.

The root mean square error can be described as:

\[\sqrt{\frac{1}{n}\sum_{p}^{n}{(y_p-\hat{y_p})}^2}\]
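A minimal NumPy sketch of the per-image RMSE above (illustrative only):

    import numpy as np

    def root_mean_square_error(y_true, y_pred):
        """Per-image RMSE, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    print(root_mean_square_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))  # ~0.408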
class datasetinsights.evaluation_metrics.root_mean_square_error.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.threshold_accuracy

Threshold Accuracy metric.

The threshold accuracy can be described as:

\[(\delta_i):\%\:of\:y_p\:s.t.\: \max(\frac{y_p}{\hat{y_p}}, \frac{\hat{y_p}}{y_p})=\delta<thr\:for\:thr=1.25,1.25^2,1.25^3\]
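A minimal NumPy sketch of this threshold (delta) accuracy, as commonly defined for depth estimation (illustrative only):

    import numpy as np

    def threshold_accuracy(y_true, y_pred, threshold=1.25):
        """Fraction of pixels where max(y/y_hat, y_hat/y) < threshold."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        ratio = np.maximum(y_true / y_pred, y_pred / y_true)
        return float(np.mean(ratio < threshold))

    # delta_1, delta_2, delta_3 use thresholds 1.25, 1.25**2, 1.25**3:
    print(threshold_accuracy([1.0, 2.0, 4.0], [1.1, 2.6, 4.2], threshold=1.25))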
class datasetinsights.evaluation_metrics.threshold_accuracy.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a batch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name used for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the interpolation performed over all points.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolation.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at IOU = 50%.

This implementation calculates AP at IoU = 50% for each label.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the EvaluationMetric subclass with the specified name

abstract reset()
abstract update(output)
class datasetinsights.evaluation_metrics.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch ignite confusion matrix object.

Type

ignite.metrics.ConfusionMatrix

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as Average Precision averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAP^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AP^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at IOU=50%.

This implementation calculates mAP at IoU = 50%.

\[mAP^{IoU=50} = mean_{label}AP^{label, IoU=50}\]
TYPE = 'scalar'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as Average Recall averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAR^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AR^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a batch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)