datasetinsights.io.downloader
datasetinsights.io.downloader.base
class datasetinsights.io.downloader.base.DatasetDownloader(**kwargs)
Bases: abc.ABC
This is the base class for all dataset downloaders. A new downloader is created by subclassing DatasetDownloader in the following way:
class NewDatasetDownloader(DatasetDownloader, protocol="protocol://")
Here, "protocol://" should match the source_uri prefix that the subclass's download method supports, for example http:// or gs://.
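The protocol= keyword in the class statement works because Python forwards extra class-statement keyword arguments to __init_subclass__. The following is a minimal, self-contained sketch of such a registry, assuming the base class keeps a protocol-to-class mapping. BaseDownloader, REGISTRY, and the two example subclasses are hypothetical stand-ins, not the datasetinsights implementation.

```python
from abc import ABC, abstractmethod


class BaseDownloader(ABC):
    """Hypothetical stand-in for DatasetDownloader with a protocol registry."""

    # Hypothetical mapping from a URI prefix such as "gs://" to its subclass.
    REGISTRY = {}

    def __init_subclass__(cls, protocol=None, **kwargs):
        # Python passes the class-statement keyword `protocol=...` here.
        super().__init_subclass__(**kwargs)
        if protocol is None:
            raise TypeError(f"{cls.__name__} must declare a protocol")
        # cls.REGISTRY resolves to the shared dict defined on the base class.
        cls.REGISTRY[protocol] = cls

    @abstractmethod
    def download(self, source_uri, output, **kwargs):
        """Download source_uri into the output directory."""


class ExampleHTTPDownloader(BaseDownloader, protocol="http://"):
    def download(self, source_uri, output, **kwargs):
        print(f"would fetch {source_uri} into {output}")


class ExampleGCSDownloader(BaseDownloader, protocol="gs://"):
    def download(self, source_uri, output, **kwargs):
        print(f"would copy {source_uri} into {output}")


print(sorted(BaseDownloader.REGISTRY))  # ['gs://', 'http://']
```

Registering in __init_subclass__ means that merely defining a subclass makes it discoverable, with no separate registration call.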
abstract download(source_uri, output, **kwargs)
This method downloads the dataset stored at source_uri and stores it in the output directory.
- Parameters
source_uri – URI that points to the dataset that should be downloaded.
output – path to the local folder where the dataset should be stored.
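Because download is abstract, every concrete downloader must override it, and a subclass that omits it cannot be instantiated. The sketch below illustrates that contract with a hypothetical stand-in base class and a hypothetical LocalCopyDownloader that handles file:// URIs by copying the file into output. Neither class is part of datasetinsights.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


class BaseDownloader(ABC):
    """Hypothetical stand-in for the abstract DatasetDownloader contract."""

    @abstractmethod
    def download(self, source_uri, output, **kwargs):
        raise NotImplementedError("Subclasses must implement download")


class LocalCopyDownloader(BaseDownloader):
    """Hypothetical downloader that copies a file:// URI into `output`."""

    def download(self, source_uri, output, **kwargs):
        # Convert the URI path component back into a local filesystem path.
        source = Path(url2pathname(urlparse(source_uri).path))
        target_dir = Path(output)
        target_dir.mkdir(parents=True, exist_ok=True)
        # Keep the original file name inside the output directory.
        return Path(shutil.copy2(source, target_dir / source.name))
```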
datasetinsights.io.downloader.base.create_dataset_downloader(source_uri, **kwargs)
This function looks up the dataset downloader that matches the provided source_uri and returns an instance of it.
- Parameters
source_uri – URI used to look up the correct dataset downloader.
**kwargs – keyword arguments passed to the constructor of the matched downloader.
Returns: The dataset downloader instance matching the source_uri.
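The lookup can be pictured as a prefix match against a registry followed by instantiation with the keyword arguments. This is a minimal sketch under assumptions: REGISTRY, the Fake*Downloader classes, create_downloader, and DownloaderNotFoundError are hypothetical, and the real function's matching rules and error type may differ.

```python
class DownloaderNotFoundError(Exception):
    """Hypothetical error raised when no downloader matches the URI."""


class FakeHTTPDownloader:
    def __init__(self, **kwargs):
        self.options = kwargs


class FakeGCSDownloader:
    def __init__(self, **kwargs):
        self.options = kwargs


# Hypothetical registry mapping URI prefixes to downloader classes.
REGISTRY = {
    "http://": FakeHTTPDownloader,
    "https://": FakeHTTPDownloader,
    "gs://": FakeGCSDownloader,
}


def create_downloader(source_uri, **kwargs):
    """Instantiate the downloader whose protocol prefix matches source_uri."""
    for protocol, downloader_class in REGISTRY.items():
        if source_uri.startswith(protocol):
            # Extra keyword arguments are forwarded to the constructor.
            return downloader_class(**kwargs)
    raise DownloaderNotFoundError(f"No downloader registered for {source_uri!r}")
```

For example, create_downloader("gs://bucket/folder") returns a FakeGCSDownloader, while an unregistered scheme such as ftp:// raises DownloaderNotFoundError.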
datasetinsights.io.downloader.gcs_downloader
class datasetinsights.io.downloader.gcs_downloader.GCSDatasetDownloader(**kwargs)
Bases: datasetinsights.io.downloader.base.DatasetDownloader
This class downloads datasets from Google Cloud Storage (GCS).
download(source_uri=None, output=None, **kwargs)
- Parameters
source_uri – URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip.
output – path to the directory where the downloaded dataset will be stored.
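Both accepted source_uri patterns share the form gs://bucket/path, where the path names either a folder prefix or a single object such as an archive. As an illustration only, the hypothetical helper split_gcs_uri below splits such a URI into its bucket and object path with the standard library; it does not contact GCS and is not part of datasetinsights.

```python
from urllib.parse import urlparse


def split_gcs_uri(source_uri):
    """Hypothetical helper: split a gs:// URI into (bucket, object_path)."""
    parsed = urlparse(source_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Expected gs://bucket/..., got {source_uri!r}")
    # netloc is the bucket name; the path without its leading slash is the
    # folder prefix or object name inside that bucket.
    return parsed.netloc, parsed.path.lstrip("/")
```

With datasetinsights installed and GCS credentials configured, the documented call would presumably look like GCSDatasetDownloader().download(source_uri="gs://bucket/folder", output="data"), where "data" is a placeholder output directory; the sketch above does not exercise that call.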
datasetinsights.io.downloader.http_downloader
class datasetinsights.io.downloader.http_downloader.HTTPDatasetDownloader(**kwargs)
Bases: datasetinsights.io.downloader.base.DatasetDownloader
This class downloads datasets from any public HTTP or HTTPS URL and, if a checksum file path is provided, validates the downloaded dataset against that checksum.
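As a rough illustration of the download step only (not the library's code), the sketch below streams a URL into the output directory using the standard library. The helper name fetch_to_directory and the rule of naming the saved file after the last URL path segment are assumptions.

```python
import os
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def fetch_to_directory(source_uri, output, chunk_size=1 << 20):
    """Hypothetical helper: stream source_uri into output/<file name>."""
    # Assumed naming rule: use the last path segment, with a fallback name.
    file_name = os.path.basename(unquote(urlparse(source_uri).path)) or "download"
    target_dir = Path(output)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    # Copy in chunks so a large dataset is never held fully in memory.
    with urlopen(source_uri) as response, open(target, "wb") as f:
        shutil.copyfileobj(response, f, chunk_size)
    return target
```

urlopen also accepts file:// URLs, which makes the helper easy to try locally without a web server.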
download(source_uri, output, checksum_file=None, **kwargs)
This method downloads the dataset from an HTTP or HTTPS URL.
- Parameters
source_uri (str) – URI that indicates where the dataset should be downloaded from.
output (str) – path to the directory where the downloaded dataset will be stored.
checksum_file (str) – path to the text file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.
- Raises
ChecksumError – raised if the checksum does not match.
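The documentation does not state which checksum algorithm is used or how the checksum file is formatted. The sketch below is therefore built on assumptions: CRC32 (via zlib.crc32), a checksum file holding a single decimal integer, and a local checksum path only (remote checksum URLs are not handled). The ChecksumError class here is a local stand-in for the one named above, and compute_crc32 and validate_checksum are hypothetical helpers.

```python
import zlib
from pathlib import Path


class ChecksumError(Exception):
    """Local stand-in for the documented ChecksumError."""


def compute_crc32(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned integer, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Feed the running value back in so chunks combine correctly.
            crc = zlib.crc32(chunk, crc)
    return crc


def validate_checksum(dataset_path, checksum_file):
    """Assumed format: checksum_file holds one decimal CRC32 value."""
    expected = int(Path(checksum_file).read_text().strip())
    actual = compute_crc32(dataset_path)
    if actual != expected:
        raise ChecksumError(f"expected checksum {expected}, got {actual}")
```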
class datasetinsights.io.downloader.GCSDatasetDownloader(**kwargs)
Bases: datasetinsights.io.downloader.base.DatasetDownloader
This class downloads datasets from Google Cloud Storage (GCS).
download(source_uri=None, output=None, **kwargs)
- Parameters
source_uri – URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip.
output – path to the directory where the downloaded dataset will be stored.
class datasetinsights.io.downloader.HTTPDatasetDownloader(**kwargs)
Bases: datasetinsights.io.downloader.base.DatasetDownloader
This class downloads datasets from any public HTTP or HTTPS URL and, if a checksum file path is provided, validates the downloaded dataset against that checksum.
download(source_uri, output, checksum_file=None, **kwargs)
This method downloads the dataset from an HTTP or HTTPS URL.
- Parameters
source_uri (str) – URI that indicates where the dataset should be downloaded from.
output (str) – path to the directory where the downloaded dataset will be stored.
checksum_file (str) – path to the text file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.
- Raises
ChecksumError – raised if the checksum does not match.
datasetinsights.io.downloader.create_dataset_downloader(source_uri, **kwargs)
This function looks up the dataset downloader that matches the provided source_uri and returns an instance of it.
- Parameters
source_uri – URI used to look up the correct dataset downloader.
**kwargs – keyword arguments passed to the constructor of the matched downloader.
Returns: The dataset downloader instance matching the source_uri.