datasetinsights.io¶
datasetinsights.io.bbox¶
-
class
datasetinsights.io.bbox.
BBox2D
(label, x, y, w, h, score=1.0)¶ Bases:
object
Canonical Representation of a 2D bounding box.
-
label
¶ string representation of the label.
- Type
str
-
x
¶ x pixel coordinate of the upper left corner.
- Type
float
-
y
¶ y pixel coordinate of the upper left corner.
- Type
float
-
w
¶ width (number of pixels) of the bounding box.
- Type
float
-
h
¶ height (number of pixels) of the bounding box.
- Type
float
-
score
¶ detection confidence score. Defaults to score=1.0, which should be used if this is a ground truth bounding box.
- Type
float
Examples
Here is an example of how to use this class.
>>> gt_bbox = BBox2D(label='car', x=2, y=6, w=2, h=4)
>>> gt_bbox
"label='car'|score=1.0|x=2.0|y=6.0|w=2.0|h=4.0"
>>> pred_bbox = BBox2D(label='car', x=2, y=5, w=2, h=4, score=0.79)
>>> pred_bbox.area
8
>>> pred_bbox.intersect_with(gt_bbox)
True
>>> pred_bbox.intersection(gt_bbox)
6
>>> pred_bbox.union(gt_bbox)
10
>>> pred_bbox.iou(gt_bbox)
0.6
-
property
area
¶ Calculate the area of this bounding box.
- Returns
width x height of the bounding box
-
intersect_with
(other)¶ Check whether this box intersects with other bounding box
- Parameters
other (BBox2D) – other bounding box object to check intersection
- Returns
True if two bounding boxes intersect, False otherwise
-
intersection
(other)¶ Calculate the intersection area with other bounding box
- Parameters
other (BBox2D) – other bounding box object to calculate intersection
- Returns
float of the intersection area for two bounding boxes
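The intersection, union, and IoU operations above can be sketched for axis-aligned boxes as follows. This is a minimal illustration, not the library's implementation; the function names are ours:

```python
def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes given as (x, y, w, h) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis; a negative value means no overlap.
    ox = min(ax + aw, bx + bw) - max(ax, bx)
    oy = min(ay + ah, by + bh) - max(ay, by)
    return max(ox, 0) * max(oy, 0)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0
```

With the boxes from the doctest above, `intersection_area((2, 6, 2, 4), (2, 5, 2, 4))` is 6 and the IoU is 0.6, matching the example.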
-
-
class
datasetinsights.io.bbox.
BBox3D
(translation, size, label, sample_token, score=1, rotation: pyquaternion.quaternion.Quaternion = Quaternion(1.0, 0.0, 0.0, 0.0), velocity=(nan, nan, nan))¶ Bases:
object
Class for 3D bounding boxes, which can be either predictions or ground truths. This class is the primary representation of 3D bounding boxes in this repo and is based on the nuScenes-style dataset.
-
property
back_left_bottom_pt
¶ Back-left-bottom point.
- Type
float
-
property
back_left_top_pt
¶ Back-left-top point.
- Type
float
-
property
back_right_bottom_pt
¶ Back-right-bottom point.
- Type
float
-
property
back_right_top_pt
¶ Back-right-top point.
- Type
float
-
property
front_left_bottom_pt
¶ Front-left-bottom point.
- Type
float
-
property
front_left_top_pt
¶ Front-left-top point.
- Type
float
-
property
front_right_bottom_pt
¶ Front-right-bottom point.
- Type
float
-
property
front_right_top_pt
¶ Front-right-top point.
- Type
float
-
property
p
¶ List of all 8 corners of the box, beginning with the bottom four corners and then the top four corners, both in counterclockwise order (from bird's eye view) beginning with the back-left corner.
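Under the simplifying assumption of an identity rotation, the eight corner points can be sketched as half-size offsets from the box center; the real property also applies the quaternion rotation, and the exact corner ordering may differ:

```python
def box_corners(translation, size):
    """Eight corners of an axis-aligned 3D box (identity rotation assumed).

    translation: (x, y, z) center of the box.
    size: extents along each axis.
    """
    cx, cy, cz = translation
    hx, hy, hz = (s / 2 for s in size)
    corners = []
    for dz in (-hz, hz):  # bottom four corners first, then the top four
        for dx, dy in ((-hx, -hy), (hx, -hy), (hx, hy), (-hx, hy)):
            corners.append((cx + dx, cy + dy, cz + dz))
    return corners
```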
datasetinsights.io.checkpoint¶
Save estimator checkpoints
-
class
datasetinsights.io.checkpoint.
EstimatorCheckpoint
(estimator_name, checkpoint_dir, distributed)¶ Bases:
object
Saves and loads estimator checkpoints.
Assigns an estimator checkpoint writer according to log_dir, which is responsible for saving estimators. The writer can be a GCS or local writer. Assigns a loader, which is responsible for loading an estimator from a given path. The loader can be a local, GCS, or HTTP loader.
- Parameters
estimator_name (str) – name of the estimator
checkpoint_dir (str) – Directory where checkpoints are stored
distributed (bool) – boolean to determine distributed training
-
checkpoint_dir
¶ Directory where checkpoints are stored
- Type
str
-
distributed
¶ boolean to determine distributed training
- Type
bool
-
load
(estimator, path)¶ Loads estimator from given path.
Path can be either a local path or GCS path or HTTP url.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object
path (str) – path of estimator
-
save
(estimator, epoch)¶ Save estimator to the log_dir.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
epoch (int) – Epoch number.
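The writer-selection behavior described above (a GCS writer for gs:// paths, a local writer otherwise) can be sketched as simple scheme-based dispatch. The function and return labels are illustrative, not the library's API; the error for HTTP paths is our assumption, since the docs list HTTP as load-only:

```python
def select_writer(checkpoint_dir):
    """Pick a checkpoint writer kind based on the directory scheme.

    Returns a label for illustration; the real class would construct a
    GCSEstimatorWriter or LocalEstimatorWriter instead.
    """
    if checkpoint_dir.startswith("gs://"):
        return "gcs"
    if checkpoint_dir.startswith(("http://", "https://")):
        # Assumption: HTTP(S) paths support loading but not writing.
        raise ValueError("HTTP(S) paths are load-only, not writable")
    return "local"
```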
-
class
datasetinsights.io.checkpoint.
GCSEstimatorWriter
(cloud_path, prefix, *, suffix='estimator')¶ Bases:
object
Writes (saves) estimator checkpoints on GCS.
- Parameters
cloud_path (str) – GCS cloud path (e.g. gs://bucket/path/to/directory)
prefix (str) – filename prefix of the checkpoint files
suffix (str) – filename suffix of the checkpoint files
-
save
(estimator, epoch=None)¶ Save estimator to checkpoint files on GCS.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
epoch (int) – the current epoch number. Default: None
- Returns
Full GCS cloud path to the saved checkpoint file.
-
class
datasetinsights.io.checkpoint.
LocalEstimatorWriter
(dirname, prefix, *, suffix='estimator', create_dir=True)¶ Bases:
object
Writes (saves) estimator checkpoints locally.
- Parameters
dirname (str) – Directory where estimator is to be saved.
prefix (str) – Filename prefix of the checkpoint files.
suffix (str) – Filename suffix of the checkpoint files.
create_dir (bool) – Flag for creating new directory. Default: True.
-
dirname
¶ directory name of where checkpoint files are stored
- Type
str
-
prefix
¶ filename prefix of the checkpoint files
- Type
str
-
suffix
¶ filename suffix of the checkpoint files
- Type
str
-
save
(estimator, epoch=None)¶ Save estimator locally to log_dir.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
epoch (int) – The current epoch number. Default: None
- Returns
Full path to the saved checkpoint file.
-
datasetinsights.io.checkpoint.
load_from_gcs
(estimator, full_cloud_path)¶ Load estimator from checkpoint files on GCS.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
full_cloud_path – full path to the checkpoint file
-
datasetinsights.io.checkpoint.
load_from_http
(estimator, url)¶ Load estimator from a checkpoint file at the given HTTP(S) URL.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
url – URL of the checkpoint file
-
datasetinsights.io.checkpoint.
load_local
(estimator, path)¶ Loads estimator checkpoints from a local path.
datasetinsights.io.download¶
-
class
datasetinsights.io.download.
TimeoutHTTPAdapter
(timeout, *args, **kwargs)¶ Bases:
requests.adapters.HTTPAdapter
-
send
(request, **kwargs)¶ Sends PreparedRequest object. Returns Response object.
- Parameters
request – The PreparedRequest being sent.
stream – (optional) Whether to stream the request content.
timeout (float or tuple or urllib3 Timeout object) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.
verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use
cert – (optional) Any user-provided SSL certificate to be trusted.
proxies – (optional) The proxies dictionary to apply to the request.
- Return type
requests.Response
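The point of this adapter is to supply a default timeout when the caller does not pass one. The defaulting pattern can be sketched without requests as follows; the real class subclasses requests.adapters.HTTPAdapter and forwards to super().send(), whereas this stand-in just returns the kwargs to make the behavior visible:

```python
class TimeoutDefaulter:
    """Sketch of the TimeoutHTTPAdapter pattern: inject a default timeout.

    Illustrative only; the real adapter calls super().send(request, **kwargs).
    """

    def __init__(self, timeout):
        self.timeout = timeout

    def send(self, request, **kwargs):
        # Only apply the default when the caller did not set a timeout.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return kwargs
```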
-
-
datasetinsights.io.download.
checksum_matches
(filepath, expected_checksum, algorithm='CRC32')¶ Check if the checksum matches
- Parameters
filepath (str) – the downloaded file path
expected_checksum (int) – expected checksum of the file
algorithm (str) – checksum algorithm. Defaults to CRC32
- Returns
True if the file checksum matches.
-
datasetinsights.io.download.
compute_checksum
(filepath, algorithm='CRC32')¶ Compute the checksum of a file.
- Parameters
filepath (str) – the downloaded file path
algorithm (str) – checksum algorithm. Defaults to CRC32
- Returns
the checksum value
- Return type
int
-
datasetinsights.io.download.
download_file
(source_uri: str, dest_path: str, file_name: str = None)¶ Download a file from a specified source URI.
- Parameters
source_uri (str) – source url where the file should be downloaded
dest_path (str) – destination path of the file
file_name (str) – file name of the file to be downloaded
- Returns
String of destination path.
-
datasetinsights.io.download.
get_checksum_from_file
(filepath)¶ Returns the checksum of the file at the given filepath.
- Parameters
filepath (str) – Path of the checksum file. Path can be HTTP(s) url or local path.
- Raises
ValueError – Raised if filepath is neither a local path nor an HTTP(S) URL.
-
datasetinsights.io.download.
validate_checksum
(filepath, expected_checksum, algorithm='CRC32')¶ Validate checksum of the downloaded file.
- Parameters
filepath (str) – the downloaded file path
expected_checksum (int) – expected checksum of the file
algorithm (str) – checksum algorithm. Defaults to CRC32
- Raises
ChecksumError – Raised if the file checksum does not match.
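CRC32 checksum computation and validation can be sketched with only the standard library. This is a minimal sketch of the behavior documented above, not the library's code; chunked reading is assumed for large files:

```python
import zlib


class ChecksumError(Exception):
    """Raised when the file checksum does not match (mirrors the library's exception)."""


def compute_crc32(filepath):
    """Compute the CRC32 checksum of a file, reading it in chunks."""
    checksum = 0
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def validate_crc32(filepath, expected_checksum):
    """Raise ChecksumError if the computed checksum differs from the expected one."""
    if compute_crc32(filepath) != expected_checksum:
        raise ChecksumError(f"checksum mismatch for {filepath}")
```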
datasetinsights.io.exceptions¶
-
exception
datasetinsights.io.exceptions.
ChecksumError
¶ Bases:
Exception
Raised when the downloaded file checksum is incorrect.
-
exception
datasetinsights.io.exceptions.
DownloadError
¶ Bases:
Exception
Raised when a file download fails.
datasetinsights.io.gcs¶
-
class
datasetinsights.io.gcs.
GCSClient
(**kwargs)¶ Bases:
object
This class is used to download data from a GCS location and perform functions such as downloading the dataset and validating its checksum.
-
GCS_PREFIX
= '^gs://'¶
-
KEY_SEPARATOR
= '/'¶
-
download
(*, url=None, local_path=None, bucket=None, key=None)¶ This method is used to download the dataset from GCS.
- Parameters
url (str) – This is the downloader-uri that indicates where the dataset should be downloaded from.
local_path (str) – This is the path to the directory where the download will store the dataset.
bucket (str) – gcs bucket name
key (str) – object key path
Examples
>>> url = "gs://bucket/folder or gs://bucket/folder/data.zip"
>>> local_path = "/tmp/folder"
>>> bucket = "bucket"
>>> key = "folder/data.zip" or "folder"
-
upload
(*, local_path=None, bucket=None, key=None, url=None, pattern='*')¶ Upload a file or list of files from directory to GCS
- Parameters
url (str) – This is the GCS location that indicates where the dataset should be uploaded.
local_path (str) – This is the path to the directory or file where the data is stored.
bucket (str) – gcs bucket name
key (str) – object key path
pattern – Unix glob patterns. Use **/* for recursive glob.
Examples
- For file upload:
>>> url = "gs://bucket/folder/data.zip"
>>> local_path = "/tmp/folder/data.zip"
>>> bucket = "bucket"
>>> key = "folder/data.zip"
- For directory upload:
>>> url = "gs://bucket/folder"
>>> local_path = "/tmp/folder"
>>> bucket = "bucket"
>>> key = "folder"
>>> pattern = "**/*"
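The url and bucket/key argument forms above are equivalent; splitting a gs:// URL into bucket and key can be sketched with the class's GCS_PREFIX pattern. The helper name is illustrative, not part of the library's API:

```python
import re

GCS_PREFIX = "^gs://"
KEY_SEPARATOR = "/"


def parse_gcs_url(url):
    """Split a gs:// URL into (bucket, key). Illustrative helper."""
    if not re.match(GCS_PREFIX, url):
        raise ValueError(f"not a GCS URL: {url}")
    path = re.sub(GCS_PREFIX, "", url)
    bucket, _, key = path.partition(KEY_SEPARATOR)
    return bucket, key
```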
-
datasetinsights.io.kfp_output¶
-
class
datasetinsights.io.kfp_output.
KubeflowPipelineWriter
(tb_log_dir='/home/docs/checkouts/readthedocs.org/user_builds/datasetinsights/checkouts/0.2.2/runs/20201027-200642', filename='mlpipeline-metrics.json', filepath='/')¶ Bases:
object
Serializes the metrics dictionary generated during model training/evaluation to JSON and stores it in a file.
- Parameters
filename (str) – Name of the file to which the writer will save metrics
filepath (str) – Path where the file will be stored
-
filename
¶ Name of the file to which the writer will save metrics
- Type
str
-
filepath
¶ Path where the file will be stored
- Type
str
-
data_dict
¶ A dictionary to save metrics name and value pairs
- Type
dict
-
data
¶ Dictionary to be JSON serialized
-
add_metric
(name, val)¶ Adds a metric to the data dictionary of the writer.
Note: Using the same name key will overwrite the previous value, as the current strategy is to save only the metrics generated in the last epoch.
- Parameters
name (str) – Name of the metric
val (float) – Value of the metric
-
create_tb_visualization_json
()¶
-
write_metric
()¶ Saves all the metrics added previously to a file in the format required by Kubeflow.
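Kubeflow Pipelines (v1) reads metrics from a JSON file containing a metrics list; the field names below follow that convention, but treat them and the helper as a sketch of what write_metric produces rather than the library's exact output:

```python
import json
import os


def write_metrics(data_dict, filepath, filename="mlpipeline-metrics.json"):
    """Serialize {name: value} metrics into the Kubeflow metrics JSON shape."""
    data = {
        "metrics": [
            {"name": name, "numberValue": value}
            for name, value in data_dict.items()
        ]
    }
    out_path = os.path.join(filepath, filename)
    with open(out_path, "w") as f:
        json.dump(data, f)
    return out_path
```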
datasetinsights.io.loader¶
-
datasetinsights.io.loader.
create_loader
(dataset, *, dryrun=False, batch_size=1, num_workers=0, collate_fn=None)¶ Create data loader from dataset
Note: The data loader here is a PyTorch data loader object which does not assume tensor_type to be a PyTorch tensor. We only require the input dataset to support the __getitem__ and __len__ methods to iterate over items in the dataset.
Since the collate_fn method in torch.utils.data.DataLoader behaves differently when automatic batching is on, we might need to override this method. If the create_loader method becomes too complicated in order to support different estimators, we may expect each estimator to have its own create_loader method.
https://pytorch.org/docs/stable/data.html#working-with-collate-fn
- Parameters
dataset (Dataset) – dataset object derived from datasetinsights.data.datasets.Dataset class.
dryrun (bool) – indicator whether to use a very small subset of the dataset. This subset is useful to make sure we can quickly run estimator without loading the whole dataset. (default: False)
batch_size (int) – how many samples per batch to load (default: 1)
num_workers (int) – number of parallel workers used for data loader. Set to 0 to run on a single thread (instead of 1 which might introduce overhead). (default: 0)
- Returns
torch.utils.data.DataLoader object as data loader
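Per the note above, any object supporting __getitem__ and __len__ qualifies as a dataset. A minimal example of such a dataset, plus a naive batching loop standing in for what the PyTorch loader iterates over (torch itself is not needed for the sketch):

```python
class SquaresDataset:
    """Toy dataset: item i is (i, i * i). Only __getitem__/__len__ required."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx, idx * idx


def naive_batches(dataset, batch_size=1):
    """Yield lists of samples, mimicking a data loader's iteration."""
    batch = []
    for i in range(len(dataset)):
        batch.append(dataset[i])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```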
datasetinsights.io.transforms¶
-
class
datasetinsights.io.transforms.
Compose
(transforms)¶ Bases:
object
-
class
datasetinsights.io.transforms.
RandomHorizontalFlip
(flip_prob=0.5)¶ Bases:
object
Randomly flip the image horizontally (left to right).
- Parameters
flip_prob – the probability to flip the image
-
class
datasetinsights.io.transforms.
Resize
(img_size=-1, target_size=-1)¶ Bases:
object
Resize the (image, target) to the given sizes.
- Parameters
img_size (tuple or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
target_size (tuple or int) – Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
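The int-size behavior described above (match the smaller edge, scale the other proportionally) can be sketched as a small pure function; integer division is used here for simplicity, and the real implementation may round differently:

```python
def resized_hw(height, width, size):
    """Output (h, w) for an int size: smaller edge matched to `size`."""
    if isinstance(size, tuple):
        return size  # a sequence (h, w) is used as-is
    if height > width:
        # width is the smaller edge; scale height proportionally
        return size * height // width, size
    return size, size * width // height
```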
-
class
datasetinsights.io.
BBox2D
(label, x, y, w, h, score=1.0)¶ Bases:
object
Canonical Representation of a 2D bounding box.
-
label
¶ string representation of the label.
- Type
str
-
x
¶ x pixel coordinate of the upper left corner.
- Type
float
-
y
¶ y pixel coordinate of the upper left corner.
- Type
float
-
w
¶ width (number of pixels) of the bounding box.
- Type
float
-
h
¶ height (number of pixels) of the bounding box.
- Type
float
-
score
¶ detection confidence score. Defaults to score=1.0, which should be used if this is a ground truth bounding box.
- Type
float
Examples
Here is an example of how to use this class.
>>> gt_bbox = BBox2D(label='car', x=2, y=6, w=2, h=4)
>>> gt_bbox
"label='car'|score=1.0|x=2.0|y=6.0|w=2.0|h=4.0"
>>> pred_bbox = BBox2D(label='car', x=2, y=5, w=2, h=4, score=0.79)
>>> pred_bbox.area
8
>>> pred_bbox.intersect_with(gt_bbox)
True
>>> pred_bbox.intersection(gt_bbox)
6
>>> pred_bbox.union(gt_bbox)
10
>>> pred_bbox.iou(gt_bbox)
0.6
-
property
area
¶ Calculate the area of this bounding box.
- Returns
width x height of the bounding box
-
intersect_with
(other)¶ Check whether this box intersects with other bounding box
- Parameters
other (BBox2D) – other bounding box object to check intersection
- Returns
True if two bounding boxes intersect, False otherwise
-
intersection
(other)¶ Calculate the intersection area with other bounding box
- Parameters
other (BBox2D) – other bounding box object to calculate intersection
- Returns
float of the intersection area for two bounding boxes
-
-
class
datasetinsights.io.
EstimatorCheckpoint
(estimator_name, checkpoint_dir, distributed)¶ Bases:
object
Saves and loads estimator checkpoints.
Assigns an estimator checkpoint writer according to log_dir, which is responsible for saving estimators. The writer can be a GCS or local writer. Assigns a loader, which is responsible for loading an estimator from a given path. The loader can be a local, GCS, or HTTP loader.
- Parameters
estimator_name (str) – name of the estimator
checkpoint_dir (str) – Directory where checkpoints are stored
distributed (bool) – boolean to determine distributed training
-
checkpoint_dir
¶ Directory where checkpoints are stored
- Type
str
-
distributed
¶ boolean to determine distributed training
- Type
bool
-
load
(estimator, path)¶ Loads estimator from given path.
Path can be either a local path or GCS path or HTTP url.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object
path (str) – path of estimator
-
save
(estimator, epoch)¶ Save estimator to the log_dir.
- Parameters
estimator (datasetinsights.estimators.Estimator) – datasetinsights estimator object.
epoch (int) – Epoch number.
-
class
datasetinsights.io.
KubeflowPipelineWriter
(tb_log_dir='/home/docs/checkouts/readthedocs.org/user_builds/datasetinsights/checkouts/0.2.2/runs/20201027-200642', filename='mlpipeline-metrics.json', filepath='/')¶ Bases:
object
Serializes the metrics dictionary generated during model training/evaluation to JSON and stores it in a file.
- Parameters
filename (str) – Name of the file to which the writer will save metrics
filepath (str) – Path where the file will be stored
-
filename
¶ Name of the file to which the writer will save metrics
- Type
str
-
filepath
¶ Path where the file will be stored
- Type
str
-
data_dict
¶ A dictionary to save metrics name and value pairs
- Type
dict
-
data
¶ Dictionary to be JSON serialized
-
add_metric
(name, val)¶ Adds a metric to the data dictionary of the writer.
Note: Using the same name key will overwrite the previous value, as the current strategy is to save only the metrics generated in the last epoch.
- Parameters
name (str) – Name of the metric
val (float) – Value of the metric
-
create_tb_visualization_json
()¶
-
write_metric
()¶ Saves all the metrics added previously to a file in the format required by Kubeflow.
-
datasetinsights.io.
create_downloader
(source_uri, **kwargs)¶ This function instantiates the dataset downloader after finding it with the provided source-uri.
- Parameters
source_uri – URI used to look up the correct dataset downloader
**kwargs –
Returns: The dataset downloader instance matching the source-uri.
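The source-uri lookup can be sketched as scheme-based dispatch; the scheme-to-downloader mapping and labels below are illustrative assumptions, not the library's registry:

```python
def find_downloader_kind(source_uri):
    """Map a source URI scheme to a downloader kind (illustrative labels)."""
    schemes = {"gs://": "gcs", "http://": "http", "https://": "http"}
    for prefix, kind in schemes.items():
        if source_uri.startswith(prefix):
            return kind
    raise ValueError(f"no downloader registered for: {source_uri}")
```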