datasetinsights.evaluation_metrics

datasetinsights.evaluation_metrics.average_log10_error

Average Log10 Error metric.

The average log10 error can be described as:

\[\frac{1}{n}\sum_{p}^{n} |log_{10}(y_p)-log_{10}(\hat{y_p})|\]
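A minimal NumPy sketch of this formula (illustrative only, not the library's implementation):

    import numpy as np

    def average_log10_error(y_true, y_pred):
        """Mean absolute difference of log10 values, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(np.log10(y_true) - np.log10(y_pred))))

    # Hypothetical depth values for a handful of pixels:
    print(average_log10_error([1.0, 2.0, 4.0], [1.1, 1.9, 4.5]))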
class datasetinsights.evaluation_metrics.average_log10_error.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.average_precision

class datasetinsights.evaluation_metrics.average_precision.AveragePrecision(config: datasetinsights.evaluation_metrics.average_precision.DetectionConfig = None)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

compute(boxes: Dict[str, Tuple[List[datasetinsights.io.bbox.BBox3D], List[datasetinsights.io.bbox.BBox3D]]] = None) → Dict[str, float]

Calculate the mean AP for all classes over all distance thresholds defined in the config. The equation is described in the second figure of the nuScenes paper https://arxiv.org/pdf/1903.11027.pdf. It is the normalized sum of the ROC curves for each class and distance.

Parameters

boxes – the predicted and ground truth bounding boxes per sample.

Returns

a dictionary mapping each label to its average precision (averaged across all distance thresholds for that label).

reset()
update(boxes)
class datasetinsights.evaluation_metrics.average_precision.DetectionConfig(class_range: Dict[str, int] = {'barrier': 30, 'bicycle': 40, 'bus': 50, 'car': 50, 'construction_vehicle': 50, 'motorcycle': 40, 'pedestrian': 40, 'traffic_cone': 30, 'trailer': 50, 'truck': 50}, dist_fcn: str = 'center_distance', dist_ths: List[float] = [0.5, 1.0, 2.0, 4.0], min_recall: float = 0.1, min_precision: float = 0.1, max_boxes_per_sample: float = 500, mean_ap_weight: int = 5)

Bases: object

Data class that specifies the detection evaluation settings.

classmethod deserialize(content)

Initialize from serialized dictionary.

serialize() → dict

Serialize instance into json-friendly format.

datasetinsights.evaluation_metrics.average_precision.calc_ap(*, precision, min_recall: float, min_precision: float) → float

Calculate average precision.
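A rough sketch of the nuScenes-style normalization this is based on, assuming precision is sampled on a 101-point recall grid (an illustration under that assumption, not the exact library code):

    import numpy as np

    def calc_ap_sketch(precision, min_recall=0.1, min_precision=0.1):
        """Normalized AP: ignore the operating region below min_recall/min_precision."""
        prec = np.copy(np.asarray(precision, dtype=float))
        prec = prec[round(100 * min_recall) + 1:]   # drop recall bins below min_recall
        prec -= min_precision                       # discount precision below min_precision
        prec[prec < 0] = 0.0
        return float(np.mean(prec)) / (1.0 - min_precision)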

datasetinsights.evaluation_metrics.average_precision.center_distance(*, gt_box: datasetinsights.io.bbox.BBox3D, pred_box: datasetinsights.io.bbox.BBox3D) → float

L2 distance between the box centers (xy only).

Parameters
  • gt_box (BBox3D) – ground truth annotation sample.

  • pred_box (BBox3D) – predicted sample.

Returns

L2 distance.
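For illustration, the xy-only L2 distance can be sketched directly on (x, y, z) centers, so no assumptions about BBox3D attribute names are needed:

    import numpy as np

    def center_distance_sketch(gt_center, pred_center):
        """L2 distance between box centers, using only the x and y components."""
        gt_xy = np.asarray(gt_center[:2], dtype=float)
        pred_xy = np.asarray(pred_center[:2], dtype=float)
        return float(np.linalg.norm(gt_xy - pred_xy))

    # Centers given as (x, y, z) tuples:
    print(center_distance_sketch((1.0, 2.0, 0.5), (1.5, 2.5, 0.4)))  # ~0.707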

datasetinsights.evaluation_metrics.average_precision_2d

Reference.

We implement the average precision metrics for object detection based on this: https://github.com/rafaelpadilla/Object-Detection-Metrics#average-precision

We optimize the metric update algorithm based on this: https://github.com/rafaelpadilla/Object-Detection-Metrics/blob/master/lib/Evaluator.py

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name used for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the interpolation performed over all points.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float
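A sketch of the standard all-point interpolation described in the reference above, assuming recall and precision are cumulative histories sorted by prediction confidence:

    import numpy as np

    def every_point_interpolated_ap_sketch(recall, precision):
        """All-point interpolated AP: area under the precision envelope."""
        mrec = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
        mpre = np.concatenate(([0.0], np.asarray(precision, dtype=float), [0.0]))
        # Make precision monotonically non-increasing from right to left.
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        # Sum precision * recall-step at every point where recall changes.
        idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
        return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))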

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolation.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float
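A sketch of the standard n-point interpolation (e.g. the classic 11-point Pascal VOC variant):

    import numpy as np

    def n_point_interpolated_ap_sketch(recall, precision, point=11):
        """n-point interpolated AP: average the precision envelope at n recall levels."""
        recall = np.asarray(recall, dtype=float)
        precision = np.asarray(precision, dtype=float)
        ap = 0.0
        for r in np.linspace(0.0, 1.0, point):
            # Highest precision achieved at any recall >= r (0 if none).
            mask = recall >= r
            ap += precision[mask].max() if mask.any() else 0.0
        return ap / point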

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.
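A hedged usage sketch of this metric with the mini_batch format described above; the BBox2D keyword arguments are assumptions and should be checked against datasetinsights.io.bbox.BBox2D:

    from datasetinsights.evaluation_metrics.average_precision_2d import AveragePrecision
    from datasetinsights.io.bbox import BBox2D

    # Hypothetical boxes; the BBox2D keyword names here are assumptions --
    # verify the BBox2D signature in datasetinsights.io.bbox before use.
    gt_bboxes1 = [BBox2D(label="car", x=10, y=20, w=50, h=40)]
    pred_bboxes1 = [BBox2D(label="car", x=12, y=22, w=48, h=38, score=0.9)]

    metric = AveragePrecision(iou_threshold=0.5)
    metric.update([[gt_bboxes1, pred_bboxes1]])  # one image in this mini-batch
    print(metric.compute())  # e.g. {"car": 1.0}
    metric.reset()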

class datasetinsights.evaluation_metrics.average_precision_2d.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at IOU = 50%.

This implementation calculates AP at IoU = 50% for each label.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as Average Precision averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAP^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AP^{label, IoU=0.5:0.95:0.05}\]
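Conceptually, this mAP is just the mean of the per-label AP values over all labels and IoU thresholds; a toy sketch with made-up numbers:

    import numpy as np

    # Hypothetical per-label AP values keyed by IoU threshold, e.g. from
    # AveragePrecision run at each threshold in 0.5:0.95:0.05.
    ap_per_iou = {
        0.50: {"car": 0.82, "pedestrian": 0.61},
        0.75: {"car": 0.55, "pedestrian": 0.33},
        # ... remaining thresholds ...
    }

    # mAP = mean over labels and IoU thresholds of AP(label, IoU).
    all_aps = [ap for per_label in ap_per_iou.values() for ap in per_label.values()]
    map_over_iou = float(np.mean(all_aps))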
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.average_precision_2d.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at IOU=50%.

This implementation calculates mAP at IoU = 50%.

\[mAP^{IoU=50} = mean_{label}AP^{label, IoU=50}\]
TYPE = 'scalar'
compute()
reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_precision_config

datasetinsights.evaluation_metrics.average_recall_2d

Reference.

http://cocodataset.org/#detection-eval

https://arxiv.org/pdf/1502.05082.pdf

https://github.com/rafaelpadilla/Object-Detection-Metrics/issues/22

class datasetinsights.evaluation_metrics.average_recall_2d.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.average_recall_2d.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as Average Recall averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAR^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AR^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)

datasetinsights.evaluation_metrics.average_relative_error

Average Relative Error metrics.

The average relative error can be described as:

\[\frac{1}{num\ samples}\sum_{p}^{num\ samples}\frac{|y_p-\hat{y_p}|}{y_p}\]
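A minimal NumPy sketch of this formula (illustrative only):

    import numpy as np

    def average_relative_error(y_true, y_pred):
        """Mean of |y - y_hat| / y, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(y_true - y_pred) / y_true))

    print(average_relative_error([2.0, 4.0], [2.2, 3.8]))  # 0.075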
class datasetinsights.evaluation_metrics.average_relative_error.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.base

class datasetinsights.evaluation_metrics.base.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the EvaluationMetric subclass with the specified name

abstract reset()
abstract update(output)
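The factory methods above can be used to instantiate metrics by name; a hedged sketch, assuming the registry key is simply the subclass name (verify against the exported metric classes before relying on it):

    from datasetinsights.evaluation_metrics.base import EvaluationMetric

    # "AverageRecall" as the registry key is an assumption here.
    metric = EvaluationMetric.create("AverageRecall", iou_threshold=0.5, max_detections=100)
    metric_cls = EvaluationMetric.find("AverageRecall")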

datasetinsights.evaluation_metrics.confusion_matrix

datasetinsights.evaluation_metrics.confusion_matrix.precision_recall(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate precision and recall per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

(precision_per_image, recall_per_image).

Return type

tuple
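Per image, once predictions have been matched to ground truth boxes at the given IoU threshold (greedy matching by descending confidence is the usual convention and an assumption here), the returned values reduce to the standard definitions:

    def precision_recall_from_counts(tp, num_pred, num_gt):
        """Per-image precision and recall from the number of true-positive matches."""
        precision = tp / num_pred if num_pred else 0.0
        recall = tp / num_gt if num_gt else 0.0
        return precision, recall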

datasetinsights.evaluation_metrics.confusion_matrix.prediction_records(gt_bboxes, pred_bboxes, iou_thresh=0.5)

Calculate prediction results per image.

Parameters
  • gt_bboxes (List[BBox2D]) – a list of ground truth bounding boxes.

  • pred_bboxes (List[BBox2D]) – a list of predicted bounding boxes.

  • iou_thresh (float) – iou threshold. Defaults to 0.5.

Returns

a Records class contains match results.

Return type

Records

datasetinsights.evaluation_metrics.exceptions

exception datasetinsights.evaluation_metrics.exceptions.NoSampleError

Bases: Exception

Raised when the number of samples is zero.

datasetinsights.evaluation_metrics.iou

IoU evaluation metrics

class datasetinsights.evaluation_metrics.iou.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch ignite confusion matrix object.

Type

ignite.metrics.ConfusionMatrix

compute()
reset()
update(output)
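The per-class IoU that such a confusion matrix yields can be sketched as follows (the standard derivation, not necessarily the library's exact code):

    import numpy as np

    def per_class_iou(cm):
        """Per-class IoU from a (num_classes x num_classes) confusion matrix.

        IoU_c = cm[c, c] / (row_sum_c + col_sum_c - cm[c, c])
        """
        cm = np.asarray(cm, dtype=float)
        intersection = np.diag(cm)
        union = cm.sum(axis=1) + cm.sum(axis=0) - intersection
        return intersection / np.maximum(union, 1e-15)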

datasetinsights.evaluation_metrics.records

class datasetinsights.evaluation_metrics.records.Records(iou_threshold=0.5)

Bases: object

Save prediction records during update.

iou_threshold

iou threshold

Type

float

match_results

save the results (TP/FP)

Type

list

Parameters

iou_threshold (float) – iou threshold (default: 0.5)

add_records(gt_bboxes, pred_bboxes)

Add ground truth and prediction records.

Parameters
  • gt_bboxes – ground truth bboxes in the current image

  • pred_bboxes – sorted prediction bboxes in the current image

reset()

datasetinsights.evaluation_metrics.root_mean_square_error

Root Mean Square Error metrics.

The root mean square error can be described as:

\[\sqrt{\frac{1}{n}\sum_{p}^{n}{(y_p-\hat{y_p})}^2}\]
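A minimal NumPy sketch of the per-image RMSE above (illustrative only):

    import numpy as np

    def root_mean_square_error(y_true, y_pred):
        """Per-image RMSE, per the formula above."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    print(root_mean_square_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))  # ~0.408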
class datasetinsights.evaluation_metrics.root_mean_square_error.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)

datasetinsights.evaluation_metrics.threshold_accuracy

Threshold Accuracy metric.

The threshold accuracy can be described as:

\[(\delta_i):\%\:of\:y_p\:s.t.\: \max(\frac{y_p}{\hat{y_p}}, \frac{\hat{y_p}}{y_p})=\delta<thr\:for\:thr=1.25,1.25^2,1.25^3\]
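A minimal NumPy sketch of this threshold (delta) accuracy, as commonly defined for depth estimation (illustrative only):

    import numpy as np

    def threshold_accuracy(y_true, y_pred, threshold=1.25):
        """Fraction of pixels where max(y/y_hat, y_hat/y) < threshold."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        ratio = np.maximum(y_true / y_pred, y_pred / y_true)
        return float(np.mean(ratio < threshold))

    # delta_1, delta_2, delta_3 use thresholds 1.25, 1.25**2, 1.25**3:
    print(threshold_accuracy([1.0, 2.0, 4.0], [1.1, 2.6, 4.2], threshold=1.25))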
class datasetinsights.evaluation_metrics.threshold_accuracy.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a batch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AverageLog10Error

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Log10 Error metric.

The metric is defined for grayscale depth images.

sum_of_log10_error

the sum of the log10 errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.AveragePrecision(iou_threshold=0.5, interpolation='EveryPointInterpolation', max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision metrics.

This metric calculates average precision (AP) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • interpolation (string) – AP interpolation method name used for AP calculation

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AP for each label.

Returns

a dictionary of AP scores per label.

Return type

dict

static every_point_interpolated_ap(recall, precision)

Calculate the interpolation performed over all points.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

Returns

average precision for all points interpolation

Return type

float

static n_point_interpolated_ap(recall, precision, point=11)

Calculate the n-point interpolation.

Parameters
  • recall (list) – recall history of the prediction

  • precision (list) – precision history of the prediction

  • point (int) – n, n-point interpolation

Returns

average precision for n-point interpolation

Return type

float

reset()

Reset AP metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.AveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Precision at IOU = 50%.

This implementation calculates AP at IoU = 50% for each label.

TYPE = 'metric_per_label'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.AverageRecall(iou_threshold=0.5, max_detections=100)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Average Recall metrics.

This metric calculates average recall (AR) for each label under an IoU threshold (default: 0.5). The maximum number of detections per image is limited (default: 100).

Parameters
  • iou_threshold (float) – iou threshold (default: 0.5)

  • max_detections (int) – max detections per image (default: 100)

TYPE = 'metric_per_label'
compute()

Compute AR for each label.

Returns

a dictionary of AR scores per label.

Return type

dict

reset()

Reset AR metrics.

update(mini_batch)

Update records per mini-batch.

Parameters

mini_batch (list(list)) – a list containing batch_size pairs of ground truth and predicted bounding boxes, one pair per image. For example, if batch size = 2, mini_batch looks like: [[gt_bboxes1, pred_bboxes1], [gt_bboxes2, pred_bboxes2]], where gt_bboxes1 and pred_bboxes1 contain the ground truth and predicted bounding boxes of one image.

class datasetinsights.evaluation_metrics.AverageRelativeError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Average Relative Error metric.

The metric is defined for grayscale depth images.

sum_of_relative_error

the sum of the relative errors for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.EvaluationMetric

Bases: object

Abstract base class for metrics.

COMPUTE_TYPE = ''
abstract compute()
static create(name, **kwargs)

Create a new instance of the metric subclass

Parameters
  • name (str) – unique identifier for a metric subclass

  • config (dict) – parameters specific to each metric subclass used to create a metric instance

Returns

an instance of the specified metric subclass

static find(name)

Find EvaluationMetric subclass based on the given name

Parameters

name (str) – unique identifier for a metric subclass

Returns

the EvaluationMetric subclass with the specified name

abstract reset()
abstract update(output)
class datasetinsights.evaluation_metrics.IoU(num_classes, output_transform=<function IoU.<lambda>>)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Intersection over Union (IoU) metric per class

The metric is defined for a pair of grayscale semantic segmentation images.

Parameters
  • num_classes – number of classes in the ground truth image

  • output_transform – function that transforms the output pair of images

cm

pytorch ignite confusion matrix object.

Type

ignite.metrics.ConfusionMatrix

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics.

This implementation computes the Mean Average Precision (mAP) metric, defined as Average Precision averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAP^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AP^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAP over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAveragePrecisionIOU50

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Precision metrics at IOU=50%.

This implementation calculates mAP at IoU = 50%.

\[mAP^{IoU=50} = mean_{label}AP^{label, IoU=50}\]
TYPE = 'scalar'
compute()
reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.MeanAverageRecallAverageOverIOU

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

2D Bounding Box Mean Average Recall metrics.

This implementation computes the Mean Average Recall (mAR) metric, defined as Average Recall averaged over all labels and IoU thresholds 0.5:0.95:0.05. The maximum number of detections per image is limited to 100.

\[mAR^{IoU=0.5:0.95:0.05} = mean_{label,IoU} AR^{label, IoU=0.5:0.95:0.05}\]
IOU_THRESHOULDS = array([0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
TYPE = 'scalar'
compute()

Compute mAR over IOU.

reset()
update(mini_batch)
class datasetinsights.evaluation_metrics.RootMeanSquareError

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Root Mean Square Error metric.

The metric is defined for grayscale depth images.

sum_of_root_mean_square_error

the sum of RMSE for all the images in a batch

Type

float

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)
class datasetinsights.evaluation_metrics.ThresholdAccuracy(threshold)

Bases: datasetinsights.evaluation_metrics.base.EvaluationMetric

Threshold accuracy metric.

The metric is defined for grayscale depth images.

sum_of_threshold_acc

the sum of threshold accuracies for all the images in a batch

Type

int

num_samples

the number of samples in all mini-batches

Type

int

compute()
reset()
update(output)