datasetinsights.estimators¶
datasetinsights.estimators.base¶
-
class
datasetinsights.estimators.base.
Estimator
¶ Bases:
object
Abstract base class for estimator.
An estimator is the master class of all modeling operations. At minimum, it includes:
1. input data and output data transformations (e.g. input image cropping, removing unused output labels…) when applicable.
2. neural network graph (model) for either pytorch or tensorflow.
3. procedures to execute model training and evaluation.
One estimator could support multiple tasks (e.g. Mask R-CNN can be used for semantic segmentation and object detection)
-
abstract
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
abstract
train
(**kwargs)¶ Abstract method to train estimators
-
abstract
-
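The contract described above can be sketched as a minimal subclass. This is an illustrative stand-in rather than the library's code: `SketchEstimator`, `ConstantEstimator`, and their method bodies are hypothetical; only the abstract `train` and `evaluate` methods come from the documentation.

```python
from abc import ABC, abstractmethod


class SketchEstimator(ABC):
    """Hypothetical stand-in for the documented Estimator base class."""

    @abstractmethod
    def train(self, **kwargs):
        """Abstract method to train estimators."""

    @abstractmethod
    def evaluate(self, **kwargs):
        """Abstract method to evaluate estimators."""


class ConstantEstimator(SketchEstimator):
    """Toy subclass: 'trains' by storing the mean of the targets."""

    def __init__(self):
        self.mean = 0.0

    def train(self, targets=(), **kwargs):
        self.mean = sum(targets) / len(targets)

    def evaluate(self, targets=(), **kwargs):
        # Mean absolute error of the constant prediction.
        return sum(abs(t - self.mean) for t in targets) / len(targets)
```

Because both methods are abstract, the base class itself cannot be instantiated; only subclasses that implement `train` and `evaluate` can.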
datasetinsights.estimators.base.
create_estimator
(name, config, *, tb_log_dir=None, no_cuda=None, checkpoint_dir=None, kfp_log_dir='/home/docs/checkouts/readthedocs.org/user_builds/datasetinsights/checkouts/0.2.6/kfp/20210226-011335', kfp_metrics_filename='mlpipeline-metrics.json', kfp_ui_metadata_filename='mlpipeline-ui-metadata.json', no_val=None, **kwargs)¶ Create a new instance of the estimators subclass
- Parameters
name (str) – unique identifier for an estimator subclass
config (dict) – parameters specific to each estimator subclass, used to create an estimator instance
- Returns
an instance of the specified estimators subclass
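Creating an instance from a name can be illustrated with a small lookup over subclasses. This is a sketch of the general pattern only, not the library's implementation: `BaseSketch`, `TinyModel`, and `create_by_name` are hypothetical names, and the lookup by class name is an assumption.

```python
class BaseSketch:
    """Hypothetical base class standing in for Estimator."""

    def __init__(self, config, **kwargs):
        self.config = config
        self.extra = kwargs


class TinyModel(BaseSketch):
    """Hypothetical estimator subclass used only for this example."""


def create_by_name(name, config, **kwargs):
    # Look the subclass up by its class name among direct subclasses.
    registry = {cls.__name__: cls for cls in BaseSketch.__subclasses__()}
    if name not in registry:
        raise NotImplementedError(f"Unknown estimator: {name}")
    return registry[name](config, **kwargs)
```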
datasetinsights.estimators.deeplab¶
-
class
datasetinsights.estimators.deeplab.
DeeplabV3
(*, config, writer, checkpointer, device, checkpoint_file=None, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
DeeplabV3 Model https://arxiv.org/abs/1706.05587
- Parameters
config (CfgNode) – estimator config
writer – Tensorboard writer object
checkpointer – Model checkpointer callback to save models
device – model training on device (cpu|cuda)
-
backbone
¶ model backbone (resnet50|resnet101)
-
num_classes
¶ number of classes for semantic segmentation
-
model
¶ tensorflow or pytorch graph
-
writer
¶ Tensorboard writer object
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
optimizer
¶ pytorch optimizer
-
lr_scheduler
¶ pytorch learning rate scheduler
-
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
load
(path)¶ Load Estimator from path
- Parameters
path (str) – full path to the serialized estimator
-
save
(path)¶ Serialize Estimator to path
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(**kwargs)¶ Abstract method to train estimators
-
class
datasetinsights.estimators.deeplab.
Normalize
(mean, std)¶ Bases:
object
-
class
datasetinsights.estimators.deeplab.
RandomCrop
(size)¶ Bases:
object
-
class
datasetinsights.estimators.deeplab.
ToTensor
¶ Bases:
object
Convert a pair of (image, target) to tensor
-
datasetinsights.estimators.deeplab.
pad_if_smaller
(img, size, fill=0)¶
datasetinsights.estimators.densedepth¶
-
class
datasetinsights.estimators.densedepth.
Decoder
(num_features=2208, decoder_width=0.5)¶ Bases:
torch.nn.modules.module.Module
-
forward
(features)¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
datasetinsights.estimators.densedepth.
DenseDepth
(config, writer, checkpointer, device, checkpoint_file, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
-
config
¶ estimator config
-
writer
¶ Tensorboard writer object
-
model
¶ tensorflow or pytorch graph
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
optimizer
¶ pytorch optimizer
-
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
load
(path)¶ Load Estimator from path
- Parameters
path (str) – full path to the serialized estimator
-
save
(path)¶ Serialize Estimator to path
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(**kwargs)¶ Abstract method to train estimators
-
-
class
datasetinsights.estimators.densedepth.
DenseDepthModel
¶ Bases:
torch.nn.modules.module.Module
-
forward
(x)¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
datasetinsights.estimators.densedepth.
Encoder
¶ Bases:
torch.nn.modules.module.Module
-
forward
(x)¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
datasetinsights.estimators.densedepth.
RandomChannelSwap
(probability)¶ Bases:
object
Swap color channel of the image
- Parameters
probability – the probability to swap color channel of the image
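A probabilistic channel swap could look roughly like the sketch below. It is not the library's transform: `ChannelSwapSketch` is a hypothetical name, pixels are plain `(r, g, b)` tuples instead of image objects, and excluding the identity ordering is an assumption.

```python
import random
from itertools import permutations


class ChannelSwapSketch:
    """Hypothetical illustration; pixels are (r, g, b) tuples."""

    def __init__(self, probability, rng=None):
        self.probability = probability
        self.rng = rng or random.Random()
        # Every channel ordering except the identity.
        self.orders = [p for p in permutations(range(3)) if p != (0, 1, 2)]

    def __call__(self, pixels):
        if self.rng.random() >= self.probability:
            return list(pixels)  # leave the image unchanged
        order = self.rng.choice(self.orders)
        return [tuple(px[i] for i in order) for px in pixels]
```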
-
class
datasetinsights.estimators.densedepth.
ToTensor
¶ Bases:
object
Convert the image and depth to tensor.
-
class
datasetinsights.estimators.densedepth.
UpSample
(skip_input, output_features)¶ Bases:
torch.nn.modules.container.Sequential
-
forward
(x, concat_with)¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
datasetinsights.estimators.faster_rcnn¶
faster rcnn pytorch train and evaluate.
-
exception
datasetinsights.estimators.faster_rcnn.
BadLoss
¶ Bases:
Exception
pass the exception.
-
class
datasetinsights.estimators.faster_rcnn.
BoxListToTensor
¶ Bases:
object
transform bboxes to Tensor.
-
class
datasetinsights.estimators.faster_rcnn.
FasterRCNN
(*, config, logdir, kfp_writer, checkpointer, box_score_thresh=0.05, no_cuda=None, checkpoint_file=None, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
Faster-RCNN train/evaluate implementation for object detection.
https://github.com/pytorch/vision/tree/master/references/detection https://arxiv.org/abs/1506.01497
- Parameters
config (CfgNode) – estimator config
box_score_thresh – (optional) default threshold is 0.05
distributed – whether or not the estimator is distributed
kfp_metrics_filename – Kubeflow Metrics filename
kfp_metrics_dir – Path to the directory where Kubeflow metrics files are stored
-
model
¶ pytorch model
-
writer
¶ Tensorboard writer object
-
kfp_writer
¶ KubeflowPipelineWriter object
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
static
collate_fn
(batch)¶ Prepare batch in the format Faster RCNN expects.
- Parameters
batch – mini batch of the form ((x0,x1,x2…xn),(y0,y1,y2…yn))
- Returns
mini batch in the form [(x0,y0), (x1,y1), (x2,y2)… (xn,yn)]
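The reshaping described here is a transpose of the batch. A minimal sketch, assuming the input and output shapes are exactly as documented (`collate_sketch` is a hypothetical name, and returning a list rather than a tuple is an assumption):

```python
def collate_sketch(batch):
    # Transpose ((x0, ..., xn), (y0, ..., yn)) into [(x0, y0), ..., (xn, yn)].
    return list(zip(*batch))
```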
-
static
create_optimizer_lrs
(config, params)¶ create optimizer and learning rate scheduler.
- Parameters
config (CfgNode) – estimator config
params – model parameters
- Returns
optimizer: pytorch optimizer
lr_scheduler: pytorch LR scheduler
-
static
create_sampler
(is_distributed, *, dataset, is_train)¶ create a data sampler.
- Parameters
is_distributed – whether or not the model is distributed
dataset – dataset obj must have __len__ and __getitem__
is_train – whether or not the sampler is for training data
- Returns
data_sampler
- Return type
torch.utils.data.Sampler
-
evaluate
(test_data, **kwargs)¶ evaluate given dataset.
-
evaluate_per_epoch
(*, data_loader, epoch, label_mappings, max_detections_per_img=100, synchronize_metrics=True)¶ Evaluate model performance per epoch.
Note, torchvision’s implementation of faster rcnn requires input and gt data for training mode and returns a dictionary of losses (which we need to record the loss). We also need to get the raw predictions, which is only possible in model.eval() mode, to calculate the evaluation metric.
- Parameters
data_loader (DataLoader) – pytorch dataloader
epoch (int) – current epoch, used for logging
label_mappings (dict) – a dict of {label_id: label_name} mapping
max_detections_per_img – max number of targets or predictions allowed per example
is_distributed – whether or not the model is distributed
synchronize_metrics – whether or not to synchronize evaluation metrics across processes
-
static
get_transform
()¶ transform bounding box and tensor.
-
load
(path)¶ Load Estimator from path.
- Parameters
path (str) – full path to the serialized estimator
-
log_metric_val
(label_mappings, epoch)¶ log metric values.
- Parameters
label_mappings (dict) – a dict of {label_id: label_name} mapping
epoch (int) – current epoch, used for logging
-
predict
(pil_img, box_score_thresh=0.5)¶ Get prediction from one image using loaded model.
- Parameters
pil_img (PIL Image) – PIL image from dataset.
box_score_thresh (float) – box score threshold used to filter out lower score bounding boxes. Defaults to 0.5.
- Returns
filtered_pred_annotations: high predicted score bboxes from the model.
- Return type
List[BBox2D]
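The thresholding step can be sketched independently of the model. Everything below is illustrative: `ScoredBox` and `filter_by_score` are hypothetical, and whether the real comparison is inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredBox:
    """Hypothetical stand-in for a predicted bounding box."""
    label: int
    score: float


def filter_by_score(predictions, box_score_thresh=0.5):
    # Keep only predictions whose score reaches the threshold (assumed inclusive).
    return [p for p in predictions if p.score >= box_score_thresh]
```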
-
save
(path)¶ Serialize Estimator to path.
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(train_data, val_data=None, **kwargs)¶ start training, save trained model per epoch.
- Parameters
train_data – Directory on localhost where train dataset is located.
val_data – Directory on localhost where validation dataset is located.
-
train_loop
(*, train_dataloader, label_mappings, val_dataloader, train_sampler=None)¶ train on whole range of epochs.
- Parameters
train_dataloader (torch.utils.data.DataLoader) –
label_mappings (dict) – a dict of {label_id: label_name} mapping
val_dataloader (torch.utils.data.DataLoader) –
train_sampler – (torch.utils.data.Sampler)
-
train_one_epoch
(*, optimizer, data_loader, epoch, lr_scheduler, accumulation_steps)¶ train per epoch.
- Parameters
optimizer – pytorch optimizer
data_loader (DataLoader) – pytorch dataloader
epoch (int) – current epoch
lr_scheduler – Pytorch LR scheduler
accumulation_steps (int) – Accumulated gradients are only applied after X steps. This creates an effective batch size of batch_size * accumulation_steps
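The effect of accumulation_steps on the effective batch size can be checked on a toy problem. The sketch below uses a one-parameter least-squares model in plain Python rather than PyTorch; all function names are hypothetical, and it assumes mini-batches of equal size.

```python
def grad(w, batch):
    # Mean gradient of 0.5 * (w * x - y) ** 2 with respect to w over the batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def step_accumulated(w, mini_batches, lr):
    # Accumulate scaled gradients over several mini-batches, then update once.
    accumulated = 0.0
    for batch in mini_batches:
        accumulated += grad(w, batch) / len(mini_batches)
    return w - lr * accumulated


def step_full(w, batch, lr):
    # A single update computed on one large batch.
    return w - lr * grad(w, batch)
```

With two mini-batches of two examples each, `step_accumulated` produces the same update as `step_full` on all four examples, which is what an effective batch size of batch_size * accumulation_steps means.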
-
-
class
datasetinsights.estimators.faster_rcnn.
Loss
¶ Bases:
object
Record Loss during epoch.
-
compute
()¶ compute avg loss.
Returns (float): avg. loss
-
reset
()¶ reset loss.
-
update
(avg_loss, batch_size)¶ update loss.
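One plausible way to implement such a recorder is a running average weighted by batch size. This is an assumption about the behaviour, not the library's code; `RunningLoss` is a hypothetical name that only mirrors the documented `update`, `compute`, and `reset` methods.

```python
class RunningLoss:
    """Hypothetical sketch of an epoch loss recorder."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, avg_loss, batch_size):
        # Weight each batch's average loss by its number of examples.
        self.total += avg_loss * batch_size
        self.count += batch_size

    def compute(self):
        return self.total / self.count
```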
-
-
class
datasetinsights.estimators.faster_rcnn.
ToTensor
¶ Bases:
object
transform to tensor.
-
datasetinsights.estimators.faster_rcnn.
canonical2list
(bbox: datasetinsights.io.bbox.BBox2D)¶ convert a BBox2d into a single list.
- Parameters
bbox –
- Returns
attribute list of BBox2D
-
datasetinsights.estimators.faster_rcnn.
convert_bboxes2canonical
(bboxes)¶ convert bounding boxes to canonical.
convert bounding boxes from the format used by pytorch torchvision’s faster rcnn model into our canonical format, a list of list of BBox2Ds. Faster RCNN format: https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py#L45
- Parameters
bboxes (List[Dict[str, torch.Tensor]]) – A list of dictionaries. Each item in the list corresponds to the bounding boxes for one example. The dictionary must have the keys ‘boxes’ and ‘labels’. The value for ‘boxes’ is the ground-truth boxes in FloatTensor[N, 4] [x1, y1, x2, y2] format, with values between 0 and H and 0 and W. The value for labels is the Int64Tensor[N] class label for each ground-truth box. If the dictionary has the key scores then these values are used for the confidence score of the BBox2D, otherwise the score is set to 1.
- Returns (list[List[BBox2D]]):
Each element in the list corresponds to the list of bounding boxes for an example.
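The conversion can be sketched with plain lists standing in for tensors. `BoxSketch` is a hypothetical substitute for BBox2D and its field names are assumptions; only the 'boxes', 'labels', and optional 'scores' keys and the default score of 1 come from the description above.

```python
from dataclasses import dataclass


@dataclass
class BoxSketch:
    """Hypothetical stand-in for BBox2D; field names are assumptions."""
    label: int
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 1.0


def to_canonical_sketch(bboxes):
    converted = []
    for example in bboxes:
        n = len(example["boxes"])
        # Fall back to a score of 1 when no 'scores' key is present.
        scores = example.get("scores", [1.0] * n)
        converted.append([
            BoxSketch(label, *coords, score=score)
            for coords, label, score in zip(
                example["boxes"], example["labels"], scores
            )
        ])
    return converted
```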
-
datasetinsights.estimators.faster_rcnn.
create_dataloader
(distributed, dataset, sampler, train, *, batch_size=1, num_workers=0, collate_fn=None)¶ load dataset and create dataloader.
- Parameters
distributed – whether or not the dataloader is distributed
dataset – dataset obj must have __len__ and __getitem__
sampler – (torch.utils.data.Sampler)
train – whether or not the sampler is for training data
batch_size – batch_size
num_workers – num_workers
collate_fn – Prepare batch in the format Faster RCNN expects
- Returns data_loader:
torch.utils.data.DataLoader
-
datasetinsights.estimators.faster_rcnn.
create_dataset
(config, data_path, split)¶ download dataset from source.
- Parameters
config (CfgNode) – estimator config
data_path – Directory on localhost where datasets are located.
split – train, val, test
Returns dataset: dataset obj must have __len__ and __getitem__
-
datasetinsights.estimators.faster_rcnn.
dataloader_creator
(config, dataset, sampler, split, distributed)¶ initiate data loading.
- Parameters
config (CfgNode) – estimator config
dataset – dataset obj must have __len__ and __getitem__
sampler – (torch.utils.data.Sampler)
split – train, val, test
distributed –
- Returns data_loader:
torch.utils.data.DataLoader
-
datasetinsights.estimators.faster_rcnn.
gather_gt_preds
(*, gt_preds, device, max_boxes=100)¶ gather list of prediction.
- Parameters
gt_preds (list(tuple(list(BBox2d), (Bbox2d)))) – A list of tuples where the first element in each tuple is a list of bounding boxes corresponding to the targets in an example, and the second element in the tuple corresponds to the predictions in that example
device –
max_boxes – the maximum number of boxes allowed for either targets or predictions per image
Returns (list(tuple(list(BBox2d), (Bbox2d)))): a list in the same format as gt_preds but containing all the information across processes, e.g. if rank 0 has gt_preds = [([box_0], []), ([box_1], [box_2, box_2.5])] and rank 1 has gt_preds = [([], [box_3]), ([box_4, box_5], [])] then this function will return [([box_0], []), ([box_1], [box_2, box_2.5]), ([], [box_3]), ([box_4, box_5], [])]. The returned list is consistent across all processes.
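The documented result can be reproduced in a single process by concatenating each rank's list in rank order. This sketch only simulates the outcome; it performs no inter-process communication, and `gather_sketch` is a hypothetical name.

```python
def gather_sketch(per_rank_gt_preds):
    # Concatenate every rank's list in rank order, so each process
    # would end up holding the same combined list.
    gathered = []
    for rank_items in per_rank_gt_preds:
        gathered.extend(rank_items)
    return gathered
```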
-
datasetinsights.estimators.faster_rcnn.
list2canonical
(box_list)¶ convert a list into a Bbox2d.
- Parameters
box_list – box represented in list format
- Returns
BBox2d
-
datasetinsights.estimators.faster_rcnn.
list3d_2canonical
(batch)¶ convert 3d list to canonical.
convert a list of list of padded targets and predictions per examples where bounding boxes are represented by lists into the same format except the boxes are represented by the BBox2d class and the padded boxes are removed.
- Parameters
batch – [[[gt], [prds]], [[gt], [prds]], ] where gt and prds are list of lists
- Returns
[([gt],[preds]), ([gt],[preds]), …] where gt and preds are lists of BBox2ds
-
datasetinsights.estimators.faster_rcnn.
metric_per_class_plot
(metric_name, data, label_mappings, figsize=(20, 10))¶ Bar plot for metric per class.
- Parameters
metric_name (str) – metric name.
data (dict) – a dictionary of metric per label.
label_mappings (dict) – a dict of {label_id: label_name} mapping
figsize (tuple) – figure size of the plot. Default is (20, 10)
- Returns (matplotlib.pyplot.figure):
a bar plot for metric per class.
-
datasetinsights.estimators.faster_rcnn.
pad_box_lists
(gt_preds: List[Tuple[List[datasetinsights.io.bbox.BBox2D], List[datasetinsights.io.bbox.BBox2D]]], max_boxes_per_img=100)¶ Pad the list of boxes.
Pad the list of boxes and targets with place holder boxes so that all targets and predictions have the same number of elements.
- Parameters
gt_preds (list(tuple(list(BBox2d), (Bbox2d)))) – A list of tuples where the first element in each tuple is a list of bounding boxes corresponding to the targets in an example, and the second element in the tuple corresponds to the predictions in that example
max_boxes_per_img – maximum number of target boxes and predicted boxes per image
Returns: same format as gt_preds but all examples will have the same number of targets and predictions. If there are fewer targets or predictions than max_boxes_per_img, then boxes with nan values are added.
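Padding with nan placeholders can be sketched as follows. Boxes are plain four-element lists here rather than BBox2D objects, and `NAN_BOX` and `pad_sketch` are hypothetical names; lists already longer than the limit are left untouched in this sketch, which is an assumption.

```python
import math

NAN_BOX = [math.nan] * 4  # hypothetical placeholder box


def pad_sketch(gt_preds, max_boxes_per_img=100):
    padded = []
    for targets, preds in gt_preds:
        padded.append(tuple(
            # Append nan boxes until the list reaches max_boxes_per_img.
            list(boxes)
            + [list(NAN_BOX) for _ in range(max_boxes_per_img - len(boxes))]
            for boxes in (targets, preds)
        ))
    return padded
```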
-
datasetinsights.estimators.faster_rcnn.
prepare_bboxes
(bboxes: List[datasetinsights.io.bbox.BBox2D]) → Dict[str, torch.Tensor]¶ Prepare bounding boxes for model training.
- Parameters
bboxes – mini batch of bounding boxes (not including images). Each example is a list of bounding boxes. Torchvision's implementation of Faster-RCNN requires bounding boxes to be in the format of a dictionary {'labels': [label ids, …], 'boxes': [[xleft, ytop, xright, ybottom of box1], [xleft, ytop, xright, ybottom of box2], …]}
- Returns
bounding boxes in the form that Faster RCNN expects
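The target dictionary layout can be illustrated with plain Python lists in place of tensors. `prepare_sketch` is a hypothetical name, and representing each input box as a `(label, xleft, ytop, xright, ybottom)` tuple is an assumption made for this example.

```python
def prepare_sketch(bboxes):
    # bboxes: list of (label, xleft, ytop, xright, ybottom) tuples for one example.
    return {
        "labels": [b[0] for b in bboxes],
        "boxes": [list(b[1:]) for b in bboxes],
    }
```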
-
datasetinsights.estimators.faster_rcnn.
reduce_dict
(input_dict, average=True)¶ Reduce the values in dictionary.
Reduce the values in the dictionary from all processes so that all processes have the averaged results.
- Parameters
input_dict (dict) – all the values will be reduced
average (bool) – whether to do average or sum
- Returns
dict with the same fields as input_dict, after reduction.
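The reduction can be simulated in one process by treating a list of dictionaries as the per-process inputs. This sketch shows only the arithmetic, not the distributed communication; `reduce_sketch` is a hypothetical name.

```python
def reduce_sketch(per_process_dicts, average=True):
    # Sum each key across processes, then optionally divide by the process count.
    keys = sorted(per_process_dicts[0])
    reduced = {k: sum(d[k] for d in per_process_dicts) for k in keys}
    if average:
        world_size = len(per_process_dicts)
        reduced = {k: v / world_size for k, v in reduced.items()}
    return reduced
```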
-
datasetinsights.estimators.faster_rcnn.
tensorlist2canonical
(tensor_list)¶ convert tensorlist to canonical.
Converts the gt and predictions into the canonical format and removes the boxes with nan values that were added for padding.
- Parameters
tensor_list – [tensor([gt, prds]), tensor([gt, prds]), …]
- Returns (list(tuple(list(BBox2d), (Bbox2d)))): A list of tuples where
the first element in each tuple is a list of bounding boxes corresponding to the targets in an example, and the second element in the tuple corresponds to the predictions in that example
-
class
datasetinsights.estimators.
DeeplabV3
(*, config, writer, checkpointer, device, checkpoint_file=None, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
DeeplabV3 Model https://arxiv.org/abs/1706.05587
- Parameters
config (CfgNode) – estimator config
writer – Tensorboard writer object
checkpointer – Model checkpointer callback to save models
device – model training on device (cpu|cuda)
-
backbone
¶ model backbone (resnet50|resnet101)
-
num_classes
¶ number of classes for semantic segmentation
-
model
¶ tensorflow or pytorch graph
-
writer
¶ Tensorboard writer object
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
optimizer
¶ pytorch optimizer
-
lr_scheduler
¶ pytorch learning rate scheduler
-
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
load
(path)¶ Load Estimator from path
- Parameters
path (str) – full path to the serialized estimator
-
save
(path)¶ Serialize Estimator to path
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(**kwargs)¶ Abstract method to train estimators
-
class
datasetinsights.estimators.
DenseDepth
(config, writer, checkpointer, device, checkpoint_file, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
-
config
¶ estimator config
-
writer
¶ Tensorboard writer object
-
model
¶ tensorflow or pytorch graph
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
optimizer
¶ pytorch optimizer
-
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
load
(path)¶ Load Estimator from path
- Parameters
path (str) – full path to the serialized estimator
-
save
(path)¶ Serialize Estimator to path
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(**kwargs)¶ Abstract method to train estimators
-
-
class
datasetinsights.estimators.
Estimator
¶ Bases:
object
Abstract base class for estimator.
An estimator is the master class of all modeling operations. At minimum, it includes:
1. input data and output data transformations (e.g. input image cropping, removing unused output labels…) when applicable.
2. neural network graph (model) for either pytorch or tensorflow.
3. procedures to execute model training and evaluation.
One estimator could support multiple tasks (e.g. Mask R-CNN can be used for semantic segmentation and object detection)
-
abstract
evaluate
(**kwargs)¶ Abstract method to evaluate estimators
-
abstract
train
(**kwargs)¶ Abstract method to train estimators
-
abstract
-
class
datasetinsights.estimators.
FasterRCNN
(*, config, logdir, kfp_writer, checkpointer, box_score_thresh=0.05, no_cuda=None, checkpoint_file=None, **kwargs)¶ Bases:
datasetinsights.estimators.base.Estimator
Faster-RCNN train/evaluate implementation for object detection.
https://github.com/pytorch/vision/tree/master/references/detection https://arxiv.org/abs/1506.01497
- Parameters
config (CfgNode) – estimator config
box_score_thresh – (optional) default threshold is 0.05
distributed – whether or not the estimator is distributed
kfp_metrics_filename – Kubeflow Metrics filename
kfp_metrics_dir – Path to the directory where Kubeflow metrics files are stored
-
model
¶ pytorch model
-
writer
¶ Tensorboard writer object
-
kfp_writer
¶ KubeflowPipelineWriter object
-
checkpointer
¶ Model checkpointer callback to save models
-
device
¶ model training on device (cpu|cuda)
-
static
collate_fn
(batch)¶ Prepare batch in the format Faster RCNN expects.
- Parameters
batch – mini batch of the form ((x0,x1,x2…xn),(y0,y1,y2…yn))
- Returns
mini batch in the form [(x0,y0), (x1,y1), (x2,y2)… (xn,yn)]
-
static
create_optimizer_lrs
(config, params)¶ create optimizer and learning rate scheduler.
- Parameters
config (CfgNode) – estimator config
params – model parameters
- Returns
optimizer: pytorch optimizer
lr_scheduler: pytorch LR scheduler
-
static
create_sampler
(is_distributed, *, dataset, is_train)¶ create a data sampler.
- Parameters
is_distributed – whether or not the model is distributed
dataset – dataset obj must have __len__ and __getitem__
is_train – whether or not the sampler is for training data
- Returns
data_sampler
- Return type
torch.utils.data.Sampler
-
evaluate
(test_data, **kwargs)¶ evaluate given dataset.
-
evaluate_per_epoch
(*, data_loader, epoch, label_mappings, max_detections_per_img=100, synchronize_metrics=True)¶ Evaluate model performance per epoch.
Note, torchvision’s implementation of faster rcnn requires input and gt data for training mode and returns a dictionary of losses (which we need to record the loss). We also need to get the raw predictions, which is only possible in model.eval() mode, to calculate the evaluation metric.
- Parameters
data_loader (DataLoader) – pytorch dataloader
epoch (int) – current epoch, used for logging
label_mappings (dict) – a dict of {label_id: label_name} mapping
max_detections_per_img – max number of targets or predictions allowed per example
is_distributed – whether or not the model is distributed
synchronize_metrics – whether or not to synchronize evaluation metrics across processes
-
static
get_transform
()¶ transform bounding box and tensor.
-
load
(path)¶ Load Estimator from path.
- Parameters
path (str) – full path to the serialized estimator
-
log_metric_val
(label_mappings, epoch)¶ log metric values.
- Parameters
label_mappings (dict) – a dict of {label_id: label_name} mapping
epoch (int) – current epoch, used for logging
-
predict
(pil_img, box_score_thresh=0.5)¶ Get prediction from one image using loaded model.
- Parameters
pil_img (PIL Image) – PIL image from dataset.
box_score_thresh (float) – box score threshold used to filter out lower score bounding boxes. Defaults to 0.5.
- Returns
filtered_pred_annotations: high predicted score bboxes from the model.
- Return type
List[BBox2D]
-
save
(path)¶ Serialize Estimator to path.
- Parameters
path (str) – full path to save serialized estimator
- Returns
saved full path of the serialized estimator
-
train
(train_data, val_data=None, **kwargs)¶ start training, save trained model per epoch.
- Parameters
train_data – Directory on localhost where train dataset is located.
val_data – Directory on localhost where validation dataset is located.
-
train_loop
(*, train_dataloader, label_mappings, val_dataloader, train_sampler=None)¶ train on whole range of epochs.
- Parameters
train_dataloader (torch.utils.data.DataLoader) –
label_mappings (dict) – a dict of {label_id: label_name} mapping
val_dataloader (torch.utils.data.DataLoader) –
train_sampler – (torch.utils.data.Sampler)
-
train_one_epoch
(*, optimizer, data_loader, epoch, lr_scheduler, accumulation_steps)¶ train per epoch.
- Parameters
optimizer – pytorch optimizer
data_loader (DataLoader) – pytorch dataloader
epoch (int) – current epoch
lr_scheduler – Pytorch LR scheduler
accumulation_steps (int) – Accumulated gradients are only applied after X steps. This creates an effective batch size of batch_size * accumulation_steps
-
-
datasetinsights.estimators.
convert_bboxes2canonical
(bboxes)¶ convert bounding boxes to canonical.
convert bounding boxes from the format used by pytorch torchvision’s faster rcnn model into our canonical format, a list of list of BBox2Ds. Faster RCNN format: https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py#L45
- Parameters
bboxes (List[Dict[str, torch.Tensor]]) – A list of dictionaries. Each item in the list corresponds to the bounding boxes for one example. The dictionary must have the keys ‘boxes’ and ‘labels’. The value for ‘boxes’ is the ground-truth boxes in FloatTensor[N, 4] [x1, y1, x2, y2] format, with values between 0 and H and 0 and W. The value for labels is the Int64Tensor[N] class label for each ground-truth box. If the dictionary has the key scores then these values are used for the confidence score of the BBox2D, otherwise the score is set to 1.
- Returns (list[List[BBox2D]]):
Each element in the list corresponds to the list of bounding boxes for an example.
-
datasetinsights.estimators.
create_estimator
(name, config, *, tb_log_dir=None, no_cuda=None, checkpoint_dir=None, kfp_log_dir='/home/docs/checkouts/readthedocs.org/user_builds/datasetinsights/checkouts/0.2.6/kfp/20210226-011335', kfp_metrics_filename='mlpipeline-metrics.json', kfp_ui_metadata_filename='mlpipeline-ui-metadata.json', no_val=None, **kwargs)¶ Create a new instance of the estimators subclass
- Parameters
name (str) – unique identifier for an estimator subclass
config (dict) – parameters specific to each estimator subclass, used to create an estimator instance
- Returns
an instance of the specified estimators subclass