Datasets
Module to get datasets in pycaret
- pycaret.datasets.get_data(dataset: str = 'index', folder: Optional[str] = None, save_copy: bool = False, profile: bool = False, verbose: bool = True, address: Optional[str] = None)
Function to load sample datasets.
Order of read: (1) tries to read the dataset from the local folder first; (2) then tries to read it from the folder at the GitHub “address” (see below); (3) then tries to read it from sktime (if installed); (4) raises an error if the dataset is not found in any of these.
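The read order above is a fallback chain: each source is tried in turn and the first one that succeeds wins. Below is a minimal sketch of that pattern. It assumes each source is a zero-argument callable. The names `load_with_fallback`, `local` and `remote` are hypothetical and are not part of PyCaret's API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def load_with_fallback(sources: Sequence[Callable[[], T]]) -> T:
    """Try each source in order and return the first successful result.

    Hypothetical helper illustrating the documented read order
    (local folder, then GitHub address, then sktime); it is not
    PyCaret's actual implementation.
    """
    errors = []
    for source in sources:
        try:
            return source()
        except Exception as exc:  # remember why this source failed, then move on
            errors.append(exc)
    # Mirrors step (4): no source had the dataset.
    raise ImportError(f"Dataset not found in any source: {errors}")


# Usage with stand-in sources: the local read fails, the remote one succeeds.
def local():
    raise FileNotFoundError("not in local folder")


def remote():
    return "juice data"


print(load_with_fallback([local, remote]))  # prints "juice data"
```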
The list of available datasets on GitHub can be checked using (1) get_data('index') or (2) get_data('index', folder='time_series/seasonal') (see the available “folder” options below).
Example
>>> from pycaret.datasets import get_data
>>> all_datasets = get_data('index')
>>> juice = get_data('juice')
- dataset: str, default = ‘index’
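Name of the dataset to load, as listed in the index (e.g. 'juice'). The default, 'index', returns the list of available datasets.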
- folder: Optional[str], default = None
The folder from which to get the data. If None, the data is taken from the “common” folder. Other options are:
time_series/seasonal
time_series/random_walk
time_series/white_noise
- save_copy: bool, default = False
When set to True, a copy of the dataset is saved in the current working directory.
- profile: bool, default = False
When set to True, an interactive EDA report is displayed.
- verbose: bool, default = True
When set to False, the head of the data is not displayed.
- address: Optional[str], default = None
Base URL from which the dataset is downloaded. Defaults to None, which fetches the dataset from “https://raw.githubusercontent.com/pycaret/datasets/main/”. Users who have difficulty reaching GitHub can change the default address to their own (e.g. “https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/”).
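The folder and address parameters together determine where the file is fetched from. The sketch below shows one way they could be combined into a download URL. Several parts of it are assumptions made for illustration and are not PyCaret's actual code: the helper `build_dataset_url`, the `<address><folder>/<dataset>.csv` layout, and the use of `urllib.parse.urljoin`. The sketch also shows why a custom address should keep its trailing slash.

```python
from typing import Optional
from urllib.parse import urljoin

DEFAULT_ADDRESS = "https://raw.githubusercontent.com/pycaret/datasets/main/"
# Folder options documented above; None falls back to "common".
KNOWN_FOLDERS = {
    "common",
    "time_series/seasonal",
    "time_series/random_walk",
    "time_series/white_noise",
}


def build_dataset_url(dataset: str, folder: Optional[str] = None,
                      address: Optional[str] = None) -> str:
    """Hypothetical helper: combine address, folder and dataset name.

    The '<address><folder>/<dataset>.csv' layout is an assumption for
    illustration; PyCaret's real path layout may differ.
    """
    folder = "common" if folder is None else folder
    if folder not in KNOWN_FOLDERS:
        raise ValueError(f"Unknown folder {folder!r}")
    base = DEFAULT_ADDRESS if address is None else address
    # urljoin keeps the whole base path only when the base ends with "/";
    # otherwise its last path segment is replaced.
    return urljoin(base, f"{folder}/{dataset}.csv")


print(build_dataset_url("juice"))
print(build_dataset_url("juice", address="https://gitee.com/IncubatorShokuhou/pycaret/raw/master/datasets/"))
# Without the trailing slash, "main" is dropped from the resulting URL.
print(build_dataset_url("juice", address="https://raw.githubusercontent.com/pycaret/datasets/main"))
```

Both documented addresses end with "/". Whatever the library's exact joining logic is, keeping that trailing slash on a custom address is likely the safer choice.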
- Returns
pandas.DataFrame
Warning
Use of get_data requires an internet connection.
- Raises
ImportError –
(1) When trying to import time series datasets that require sktime, but sktime has not been installed. (2) If the data does not exist.
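Raising an ImportError when sktime is missing follows the usual optional-dependency pattern. Below is a minimal sketch of such a guard using `importlib.util.find_spec`. The name `require_optional` is hypothetical, and PyCaret's internal check may be implemented differently. Callers of get_data can catch ImportError in the same way shown in the usage part.

```python
import importlib.util


def require_optional(module_name: str, purpose: str) -> None:
    """Raise ImportError if an optional dependency is not installed.

    Hypothetical guard in the spirit of the documented behaviour:
    loading time series datasets that need sktime fails with
    ImportError when sktime is missing.
    """
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"{module_name} is required for {purpose}; please install it.")


try:
    require_optional("sktime", "time series datasets")
    print("sktime is available")
except ImportError as exc:
    print(f"Cannot load time series data: {exc}")
```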