Datasets

Module to get datasets in pycaret

pycaret.datasets.get_data(dataset: str = 'index', folder: Optional[str] = None, save_copy: bool = False, profile: bool = False, verbose: bool = True, address: Optional[str] = None)

Function to load sample datasets.

Order of read:

  1. Tries to read the dataset from a local folder first.

  2. Then tries to read the dataset from the folder in the GitHub “address” (see below).

  3. Then tries to read from sktime (if installed).

  4. Raises an error if none exist.
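The read order above can be pictured as a simple fallback chain. The following is only a minimal sketch of that idea, not pycaret's actual implementation: the loader functions and the toy dictionaries are hypothetical stand-ins for the local folder, the GitHub address, and sktime.

```python
from typing import Callable, Optional, Sequence

# Hypothetical loader type: returns the data, or None if the source lacks it.
Loader = Callable[[str], Optional[str]]


def load_with_fallback(name: str, loaders: Sequence[Loader]) -> str:
    """Try each source in order and return the first hit."""
    for loader in loaders:
        data = loader(name)
        if data is not None:
            return data
    # Corresponds to step 4: no source had the dataset.
    raise ImportError(f"Dataset {name!r} was not found in any source.")


# Toy stand-ins for the three sources (illustrative data only).
local_files = {"juice": "juice from local folder"}
remote_files = {"juice": "juice from remote", "airline": "airline from remote"}
sktime_files = {"lynx": "lynx from sktime"}

sources = [local_files.get, remote_files.get, sktime_files.get]

print(load_with_fallback("juice", sources))    # local copy wins
print(load_with_fallback("airline", sources))  # falls through to remote
```

Because the local source is consulted first, a dataset that exists both locally and remotely is served from the local copy in this sketch.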

The list of available datasets on GitHub can be checked using (1) get_data('index') or (2) get_data('index', folder='time_series/seasonal') (see available “folder” options below).

Example

>>> from pycaret.datasets import get_data
>>> all_datasets = get_data('index')
>>> juice = get_data('juice')

Parameters

dataset: str, default = ‘index’

Index value of the dataset, i.e. its name as listed in the output of get_data('index').

folder: Optional[str], default = None

The folder from which to get the data. If ‘None’, gets it from the “common” folder. Other options are:

  • time_series/seasonal

  • time_series/random_walk

  • time_series/white_noise
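To browse what each folder offers, the index can be requested once per folder. The helper below is a hypothetical convenience wrapper, not part of pycaret; it takes the loader as an argument so that it can be tried without a network connection.

```python
from typing import Any, Callable, Dict, List, Optional

# Folder values named in this section; None selects the "common" folder.
FOLDERS: List[Optional[str]] = [
    None,
    "time_series/seasonal",
    "time_series/random_walk",
    "time_series/white_noise",
]


def collect_indexes(loader: Callable[..., Any]) -> Dict[str, Any]:
    """Call loader('index', folder=...) for each folder, keyed by folder name."""
    indexes: Dict[str, Any] = {}
    for folder in FOLDERS:
        key = "common" if folder is None else folder
        indexes[key] = loader("index", folder=folder, verbose=False)
    return indexes


# Usage with pycaret (needs the package and an internet connection):
#     from pycaret.datasets import get_data
#     indexes = collect_indexes(get_data)
#     print(indexes["time_series/seasonal"])
```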

save_copy: bool, default = False

When set to True, saves a copy of the dataset in the current working directory.

profile: bool, default = False

When set to True, an interactive EDA report is displayed.

verbose: bool, default = True

When set to False, the head of the data is not displayed.

address: Optional[str], default = None

Download URL of the dataset. Defaults to None, which fetches the dataset from “https://raw.githubusercontent.com/pycaret/datasets/main/”. Users who have difficulty reaching GitHub can change the default address to their own (e.g. “https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/”).
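One way to use the address parameter is to retry with a mirror when the default location cannot be reached. The wrapper below is a hypothetical sketch: the function name is invented for illustration, and catching every exception is an assumption, since this page only documents ImportError.

```python
from typing import Any, Callable

# Mirror URL taken from the example in the "address" description.
MIRROR = "https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/"


def get_data_with_mirror(
    loader: Callable[..., Any], name: str, mirror: str = MIRROR, **kwargs: Any
) -> Any:
    """Try the default address first, then retry once with `mirror`."""
    try:
        return loader(name, **kwargs)
    except Exception:
        # Assumption: any failure is treated as "default address unreachable".
        return loader(name, address=mirror, **kwargs)


# Usage with pycaret (needs the package and an internet connection):
#     from pycaret.datasets import get_data
#     juice = get_data_with_mirror(get_data, "juice", verbose=False)
```

If the mirror is always preferred, passing address directly to get_data is simpler than wrapping the call.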

Returns

pandas.DataFrame

Warning

  • Use of get_data requires an internet connection.

Raises

ImportError

  1. When trying to import time series datasets that require sktime, but sktime has not been installed.

  2. If the data does not exist.
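Since both failure cases surface as ImportError, a caller may want to catch it and check whether sktime is present. The helpers below are hypothetical and shown only as a sketch; importlib.util.find_spec is a standard-library call, and the loader is passed in so the example can be tried without pycaret installed.

```python
import importlib.util
from typing import Any, Callable, Optional


def sktime_installed() -> bool:
    """Return True if the sktime package can be found in this environment."""
    return importlib.util.find_spec("sktime") is not None


def try_get_data(loader: Callable[..., Any], name: str, **kwargs: Any) -> Optional[Any]:
    """Return the dataset, or None if the loader raises ImportError."""
    try:
        return loader(name, **kwargs)
    except ImportError as exc:
        hint = "" if sktime_installed() else " (sktime is not installed)"
        print(f"Could not load {name!r}: {exc}{hint}")
        return None


# Usage with pycaret (needs the package and an internet connection):
#     from pycaret.datasets import get_data
#     data = try_get_data(get_data, "juice", verbose=False)
```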