Dataset Insights

Unity Dataset Insights is a Python package for downloading, parsing, and analyzing synthetic datasets generated using the Unity Perception SDK.

Installation

Dataset Insights maintains a pip package for easy installation. It works in any standard Python environment and can be installed with the command pip install datasetinsights. Python 3.7 and 3.8 are supported.

Getting Started

Dataset Statistics

We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.
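As a rough illustration of the kind of statistic such a notebook visualizes, the sketch below counts labeled objects per label name. It does not use the Dataset Insights API. It assumes a hypothetical in-memory layout modelled on Perception captures: each capture has an "annotations" list, each annotation has an "annotation_definition" ID and an optional "values" list of dicts with a "label_name" key. The definition ID "bbox-def" is made up for the example.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_objects_per_label(captures: Iterable[Mapping[str, Any]], def_id: str) -> Counter:
    """Count labeled objects per label name for one annotation definition.

    Assumed (hypothetical) layout: capture -> "annotations" -> each annotation
    has "annotation_definition" and an optional "values" list of dicts that
    carry a "label_name" key.
    """
    counts: Counter = Counter()
    for capture in captures:
        for annotation in capture.get("annotations", []):
            # Skip annotations produced by other labelers (e.g. segmentation).
            if annotation.get("annotation_definition") != def_id:
                continue
            for obj in annotation.get("values") or []:
                counts[obj["label_name"]] += 1
    return counts


captures = [
    {"annotations": [
        {"annotation_definition": "bbox-def",
         "values": [{"label_name": "car"}, {"label_name": "person"}]},
        {"annotation_definition": "seg-def", "values": None},
    ]},
    {"annotations": [
        {"annotation_definition": "bbox-def", "values": [{"label_name": "car"}]},
    ]},
]
print(count_objects_per_label(captures, "bbox-def"))  # Counter({'car': 2, 'person': 1})
```

Filtering on the definition ID first keeps annotations from other labelers, whose values may have a different shape, out of the count.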

Dataset Download

You can download datasets from HTTP(S) locations, GCS, and Unity Simulation projects using the download command from the CLI or through the API.
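Since the source is identified by a URI, a downloader can be chosen from the URI scheme. The sketch below shows that idea with the standard library only. The function name downloader_kind and its return values are hypothetical, and Dataset Insights may select downloaders differently. Unity Simulation URIs are left out because their format is not described here.

```python
from urllib.parse import urlparse


def downloader_kind(source_uri: str) -> str:
    """Map a source URI to the kind of downloader it would need (illustrative only)."""
    scheme = urlparse(source_uri).scheme.lower()
    if scheme == "gs":
        return "gcs"  # Google Cloud Storage, e.g. gs://bucket/path
    if scheme in ("http", "https"):
        return "http"  # any HTTP(S) URL
    raise ValueError(f"unsupported source URI scheme {scheme!r} in {source_uri!r}")


print(downloader_kind("gs://url/to/file.zip"))  # gcs
print(downloader_kind("https://example.com/dataset.zip"))  # http
```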

CLI

datasetinsights download \
   --source-uri=<xxx> \
   --output=$HOME/data

API

GCSDatasetDownloader downloads a dataset from a GCS location.

from datasetinsights.io.downloader import GCSDatasetDownloader

# a single zip file, or a folder such as "gs://url/to/folder"
source_uri = "gs://url/to/file.zip"
dest = "~/data"
downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
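A gs:// source URI names either a single zip object or a folder prefix. The sketch below splits such a URI into bucket and path and flags whether it points at a zip file. GCSLocation and parse_gcs_uri are hypothetical helpers for illustration; they are not part of Dataset Insights, and treating a ".zip" suffix as "single archive" is an assumption.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class GCSLocation(NamedTuple):
    bucket: str
    path: str     # object name or folder prefix inside the bucket
    is_zip: bool  # True for a single .zip object, False for a folder


def parse_gcs_uri(uri: str) -> GCSLocation:
    """Split gs://bucket/path into its parts (illustrative helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs://bucket/path URI: {uri!r}")
    path = parsed.path.lstrip("/")
    return GCSLocation(parsed.netloc, path, path.lower().endswith(".zip"))


print(parse_gcs_uri("gs://url/to/file.zip"))
print(parse_gcs_uri("gs://url/to/folder"))
```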

HTTPDatasetDownloader downloads a dataset from any HTTP(S) location.

from datasetinsights.io.downloader import HTTPDatasetDownloader

source_uri = "http://url.to.file.zip"
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
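To make the HTTP(S) case concrete, here is a minimal standard-library sketch of what downloading a zipped dataset roughly involves: stream the archive to disk, extract it, and remove the archive. This is not how HTTPDatasetDownloader is implemented; download_and_extract is a hypothetical helper, and it assumes the URL points at a .zip file.

```python
import shutil
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, output: str) -> Path:
    """Download a .zip archive from `url` and extract it under `output` (sketch)."""
    out_dir = Path(output).expanduser()  # expand "~" as in "~/data"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Name the local archive after the last path segment of the URL.
    archive_name = Path(urllib.parse.urlparse(url).path).name or "dataset.zip"
    archive = out_dir / archive_name

    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as f:
        shutil.copyfileobj(response, f)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    archive.unlink()  # keep only the extracted files
    return out_dir
```

A call would look like download_and_extract("https://example.com/path/to/dataset.zip", "~/data"), where the URL is a placeholder.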

Dataset Exploration

You can explore the dataset schema using the following API:

Unity Perception

AnnotationDefinitions and MetricDefinitions load the synthetic dataset's annotation and metric definition tables; get_definition returns a dictionary containing the definition with the given ID.

from datasetinsights.datasets.unity_perception import (
    AnnotationDefinitions,
    MetricDefinitions,
)

dest = "~/data"  # root directory of the downloaded dataset
annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")

metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")
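Conceptually, looking up a definition means scanning a definitions table for the entry whose ID matches. The sketch below does that directly on a JSON file. It assumes a layout like {"version": ..., "annotation_definitions": [{"id": ..., "name": ...}, ...]} (with "metric_definitions" as the key for metrics); treat that layout and the helper get_definition here as assumptions, not the Dataset Insights implementation.

```python
import json
from typing import Any, Dict


def get_definition(definitions_file: str, table_key: str, def_id: str) -> Dict[str, Any]:
    """Return the entry under `table_key` whose "id" equals `def_id` (sketch).

    Assumed layout: {"version": ..., table_key: [{"id": ..., ...}, ...]}, with
    table_key "annotation_definitions" or "metric_definitions".
    """
    with open(definitions_file, encoding="utf-8") as f:
        table = json.load(f)
    for definition in table[table_key]:
        if definition["id"] == def_id:
            return definition
    raise KeyError(f"no definition with id {def_id!r} in {definitions_file}")
```

Raising KeyError for an unknown ID, rather than returning None, makes a mistyped definition ID fail loudly.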

Captures loads the synthetic dataset's captures tables and returns a pandas DataFrame with captures and annotations columns.

from datasetinsights.datasets.unity_perception import Captures
captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")
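The idea behind a captures DataFrame filtered by definition ID can be sketched with plain pandas: flatten captures into one row per (capture, annotation) pair, then keep rows for one annotation definition. The input layout and the column names ("id", "filename", "annotation.*") are illustrative assumptions; the actual columns produced by Captures may differ.

```python
from typing import Any, Dict, List

import pandas as pd

# Illustrative column names, not necessarily those used by Captures.
COLUMNS = ["id", "filename", "annotation.id",
           "annotation.annotation_definition", "annotation.values"]


def captures_to_frame(captures: List[Dict[str, Any]]) -> pd.DataFrame:
    """One row per (capture, annotation) pair."""
    rows = []
    for capture in captures:
        for annotation in capture.get("annotations", []):
            rows.append({
                "id": capture["id"],
                "filename": capture.get("filename"),
                "annotation.id": annotation.get("id"),
                "annotation.annotation_definition": annotation.get("annotation_definition"),
                "annotation.values": annotation.get("values"),
            })
    # Passing columns keeps the schema even when there are no rows.
    return pd.DataFrame(rows, columns=COLUMNS)


def filter_by_definition(df: pd.DataFrame, def_id: str) -> pd.DataFrame:
    """Keep only rows whose annotation comes from `def_id`."""
    mask = df["annotation.annotation_definition"] == def_id
    return df[mask].reset_index(drop=True)
```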

Metrics loads the synthetic dataset's metrics table, which holds extra metadata that can describe a particular sequence, capture, or annotation, and returns a pandas DataFrame with captures and metrics columns.

from datasetinsights.datasets.unity_perception import Metrics
metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")
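A metrics table can be turned into a tidy DataFrame the same way: keep metric records for one definition and emit one row per entry in their "values" list. The sketch assumes each record has "capture_id", "annotation_id", "metric_definition", and "values" keys, where values are either dicts (spread into columns) or scalars (stored in a hypothetical "value" column). These names are assumptions, not a description of what Metrics returns.

```python
from typing import Any, Dict, List

import pandas as pd


def metrics_to_frame(metrics: List[Dict[str, Any]], def_id: str) -> pd.DataFrame:
    """One row per value of every metric record produced by `def_id` (sketch)."""
    rows = []
    for metric in metrics:
        if metric.get("metric_definition") != def_id:
            continue
        for value in metric.get("values") or []:
            row = {"capture_id": metric.get("capture_id"),
                   "annotation_id": metric.get("annotation_id")}
            if isinstance(value, dict):
                row.update(value)  # e.g. {"label_name": "car", "count": 2}
            else:
                row["value"] = value  # scalar metric values
            rows.append(row)
    return pd.DataFrame(rows)
```

With one row per value, per-capture or per-label aggregates become ordinary pandas groupby operations.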


Citation

If you find this package useful, consider citing it using:

@misc{datasetinsights2020,
    title={Unity {D}ataset {I}nsights Package},
    author={{Unity Technologies}},
    howpublished={\url{https://github.com/Unity-Technologies/datasetinsights}},
    year={2020}
}