Dataset Insights¶
Unity Dataset Insights is a Python package for downloading, parsing, and analyzing synthetic datasets generated using the Unity Perception SDK.
Installation¶
Dataset Insights maintains a pip package for easy installation. It works in any standard Python environment; install it with:
pip install datasetinsights
We support Python 3 (3.7 and 3.8).
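Before installing, it can help to confirm that the active interpreter is one of the supported versions. The following is a minimal sketch: the helper name is_supported_python is hypothetical and not part of Dataset Insights, and the supported set simply mirrors the versions listed in this section.

```python
import sys

# Versions listed as supported in this document.
SUPPORTED_VERSIONS = {(3, 7), (3, 8)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is listed as supported.

    Hypothetical helper for illustration; not part of datasetinsights.
    """
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS
```

Calling is_supported_python() with no argument checks the running interpreter; passing a tuple such as (3, 8) checks an arbitrary version.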
Getting Started¶
Dataset Statistics¶
We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.
Dataset Download¶
You can download datasets from HTTP(S), GCS, and Unity Simulation projects using the download command from the CLI or the API.
datasetinsights download \
--source-uri=<xxx> \
--output=$HOME/data
GCSDatasetDownloader downloads a dataset from a GCS location.
from datasetinsights.io.downloader import GCSDatasetDownloader
# The source can be a single archive or a folder:
# "gs://url/to/file.zip" or "gs://url/to/folder"
source_uri = "gs://url/to/file.zip"
dest = "~/data"
downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
HTTPDatasetDownloader downloads a dataset from any HTTP(S) location.
from datasetinsights.io.downloader import HTTPDatasetDownloader
source_uri = "http://url.to.file.zip"
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
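Since the downloader class depends on where the dataset lives, one way to pick it is to inspect the URI scheme. The sketch below shows that idea using only the standard library. The mapping table and the downloader_name_for helper are assumptions made for illustration, not part of Dataset Insights; it returns class names as strings so that it runs without the package installed, and it covers only the two downloaders shown in this section.

```python
from urllib.parse import urlparse

# Maps a URI scheme to the name of a downloader class described above.
# The mapping itself is an illustrative assumption, not library code.
DOWNLOADER_BY_SCHEME = {
    "gs": "GCSDatasetDownloader",
    "http": "HTTPDatasetDownloader",
    "https": "HTTPDatasetDownloader",
}


def downloader_name_for(source_uri):
    """Return the downloader class name matching the URI scheme.

    Raises ValueError when the scheme is missing or not in the table.
    """
    scheme = urlparse(source_uri).scheme.lower()
    try:
        return DOWNLOADER_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"No downloader known for scheme {scheme!r}") from None
```

For example, downloader_name_for("gs://url/to/folder") selects the GCS downloader, while a URI with an unlisted scheme raises ValueError rather than silently falling back to a default.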
Dataset Explore¶
You can explore the dataset schema using the following API:
AnnotationDefinitions and MetricDefinitions load the synthetic dataset definition tables and return a dictionary containing the definitions.
from datasetinsights.datasets.unity_perception import (
    AnnotationDefinitions,
    MetricDefinitions,
)
annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")
metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")
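Conceptually, looking up a definition amounts to finding the record with a matching id in a definitions table. The sketch below illustrates that idea in plain Python. It does not reproduce the library's implementation: the table layout, the field names, the sample values, and the find_definition helper are all assumptions made for illustration.

```python
import json

# Hypothetical definitions table; the field names ("id", "name") and the
# values are assumptions for illustration, not real dataset output.
SAMPLE_TABLE = json.loads("""
{
  "annotation_definitions": [
    {"id": "my_definition_id", "name": "bounding box"},
    {"id": "another_definition_id", "name": "semantic segmentation"}
  ]
}
""")


def find_definition(table, def_id, key="annotation_definitions"):
    """Return the first record under `key` whose "id" equals def_id.

    Raises ValueError when no record matches.
    """
    for record in table[key]:
        if record["id"] == def_id:
            return record
    raise ValueError(f"No definition found for id {def_id!r}")


definition = find_definition(SAMPLE_TABLE, "my_definition_id")
print(definition["name"])
```

Raising an error for an unknown id, rather than returning None, makes a mistyped definition id fail at the lookup instead of later in the analysis.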
Captures loads the synthetic dataset captures tables and returns a pandas DataFrame with captures and annotations columns.
from datasetinsights.datasets.unity_perception import Captures
captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")
Metrics loads the synthetic dataset metrics table, which holds extra metadata describing a particular sequence, capture, or annotation, and returns a pandas DataFrame with captures and metrics columns.
from datasetinsights.datasets.unity_perception import Metrics
metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")
Citation¶
If you find this package useful, consider citing it using:
@misc{datasetinsights2020,
title={Unity {D}ataset {I}nsights Package},
author={{Unity Technologies}},
howpublished={\url{https://github.com/Unity-Technologies/datasetinsights}},
year={2020}
}