Time Series

class pycaret.time_series.TSForecastingExperiment
setup(data: Optional[Union[Series, DataFrame]] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, target: Optional[str] = None, index: Optional[str] = None, ignore_features: Optional[List] = None, numeric_imputation_target: Optional[Union[str, int, float]] = None, numeric_imputation_exogenous: Optional[Union[str, int, float]] = None, transform_target: Optional[str] = None, transform_exogenous: Optional[str] = None, scale_target: Optional[str] = None, scale_exogenous: Optional[str] = None, fe_target_rr: Optional[list] = None, fe_exogenous: Optional[list] = None, fold_strategy: Union[str, Any] = 'expanding', fold: int = 3, fh: Optional[Union[List[int], int, ndarray, ForecastingHorizon]] = 1, hyperparameter_split: str = 'all', seasonal_period: Optional[Union[List[Union[int, str]], int, str]] = None, ignore_seasonality_test: bool = False, sp_detection: str = 'auto', max_sp_to_consider: Optional[int] = 60, remove_harmonics: bool = False, harmonic_order_method: str = 'harmonic_max', num_sps_to_use: int = 1, seasonality_type: str = 'mul', point_alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, enforce_exogenous: bool = True, n_jobs: Optional[int] = -1, use_gpu: bool = False, custom_pipeline: Optional[Any] = None, html: bool = True, session_id: Optional[int] = None, system_log: Union[bool, str, Logger] = True, log_experiment: Union[bool, str, BaseLogger, List[Union[str, BaseLogger]]] = False, experiment_name: Optional[str] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, log_plots: Union[bool, list] = False, log_profile: bool = False, log_data: bool = False, engine: Optional[Dict[str, str]] = None, verbose: bool = True, profile: bool = False, profile_kwargs: Optional[Dict[str, Any]] = None, fig_kwargs: Optional[Dict[str, Any]] = None)

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes one mandatory parameter: data. All the other parameters are optional.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
data: pandas.Series or pandas.DataFrame, default = None

Shape (n_samples, 1) when pandas.DataFrame, otherwise (n_samples,).

data_func: Callable[[], Union[pd.Series, pd.DataFrame]] = None

The function that generates the data (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as compare_models. It avoids broadcasting a large dataset from the driver to the workers. Note that one and only one of data and data_func must be set.

target: Optional[str], default = None

Target name to be forecasted. Must be specified when data is a pandas DataFrame with more than 1 column. When data is a pandas Series or pandas DataFrame with 1 column, this can be left as None.

index: Optional[str], default = None

Column name to be used as the datetime index for modeling. If the ‘index’ column is specified and is of type string, it is assumed to be coercible to pd.DatetimeIndex using pd.to_datetime(). It can also be of type Int (e.g. RangeIndex, Int64Index), or DatetimeIndex or PeriodIndex, in which case it is processed appropriately. If None, then the data’s index is used as is for modeling.

ignore_features: Optional[List], default = None

List of features to ignore for modeling when the data is a pandas DataFrame with more than 1 column. Ignored when data is a pandas Series or DataFrame with 1 column.

numeric_imputation_target: Optional[Union[int, float, str]], default = None

Indicates how to impute missing values in the target. If None, no imputation is done. If the target has missing values, then imputation is mandatory. If str, the value is passed as is to the underlying sktime imputer. Allowed values are:

“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”

If int or float, imputation method is set to “constant” with the given value.

numeric_imputation_exogenous: Optional[Union[int, float, str]], default = None

Indicates how to impute missing values in the exogenous variables. If None, no imputation is done. If exogenous variables have missing values, then imputation is mandatory. If str, the value is passed as is to the underlying sktime imputer. Allowed values are:

“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”

If int or float, imputation method is set to “constant” with the given value.
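
For example, a minimal sketch of the imputation settings, assuming data is a pandas DataFrame whose target column "y" (hypothetical name) and exogenous columns contain missing values:

>>> from pycaret.time_series import TSForecastingExperiment
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, target="y", fh=12,
>>>     numeric_imputation_target="drift",   # sktime imputer method for the target
>>>     numeric_imputation_exogenous=0,      # constant imputation with the value 0
>>> )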

transform_target: Optional[str], default = None

Indicates how the target variable should be transformed. If None, no transformation is performed. Allowed values are

“box-cox”, “log”, “sqrt”, “exp”, “cos”

transform_exogenous: Optional[str], default = None

Indicates how the exogenous variables should be transformed. If None, no transformation is performed. Allowed values are

“box-cox”, “log”, “sqrt”, “exp”, “cos”

scale_target: Optional[str], default = None

Indicates how the target variable should be scaled. If None, no scaling is performed. Allowed values are

“zscore”, “minmax”, “maxabs”, “robust”

scale_exogenous: Optional[str], default = None

Indicates how the exogenous variables should be scaled. If None, no scaling is performed. Allowed values are

“zscore”, “minmax”, “maxabs”, “robust”
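
For instance, a minimal sketch combining a target transformation with target scaling (using the airline data from the earlier example):

>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=airline, fh=12,
>>>     transform_target="box-cox",   # variance-stabilizing transformation
>>>     scale_target="zscore",        # standard scaling of the target
>>> )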

fe_target_rr: Optional[list], default = None

The transformers to be applied to the target variable in order to extract useful features. By default, None, which means that the provided target variable is used “as is”.

NOTE: Most statistical and baseline models already use features (lags) for target variables implicitly. The only place where target features have to be created explicitly is in reduced regression models. Hence, this feature extraction is only applied to reduced regression models.

>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import TSForecastingExperiment
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> data = get_data("airline")
>>> kwargs = {"lag_feature": {"lag": [36, 24, 13, 12, 11, 9, 6, 3, 2, 1]}}
>>> fe_target_rr = [WindowSummarizer(n_jobs=1, truncate="bfill", **kwargs)]
>>> # Baseline
>>> exp = TSForecastingExperiment()
>>> exp.setup(data=data, fh=12, fold=3, session_id=42)
>>> model1 = exp.create_model("lr_cds_dt")
>>> # With Feature Engineering
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, fh=12, fold=3, fe_target_rr=fe_target_rr, session_id=42
>>> )
>>> model2 = exp.create_model("lr_cds_dt")
>>> exp.plot_model([model1, model2], data_kwargs={"labels": ["Baseline", "With FE"]})
fe_exogenous: Optional[list], default = None

The transformations to be applied to the exogenous variables. These transformations are used for all models that accept exogenous variables. By default, None, which means that the provided exogenous variables are used “as is”.

>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import TSForecastingExperiment
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> # Example: function num_above_thresh to count how many observations lie above
>>> # the threshold within a window of length 2, lagged by 0 periods.
>>> def num_above_thresh(x):
>>>     '''Count how many observations lie above threshold.'''
>>>     return np.sum((x > 0.7)[::-1])
>>> kwargs1 = {"lag_feature": {"lag": [0, 1], "mean": [[0, 4]]}}
>>> kwargs2 = {
>>>     "lag_feature": {
>>>         "lag": [0, 1], num_above_thresh: [[0, 2]],
>>>         "mean": [[0, 4]], "std": [[0, 4]]
>>>     }
>>> }
>>> fe_exogenous = [
>>>     (
            "a", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Income"], truncate="bfill", **kwargs1
>>>         )
>>>     ),
>>>     (
>>>         "b", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Unemployment", "Production"], truncate="bfill", **kwargs2
>>>         )
>>>     ),
>>> ]
>>> data = get_data("uschange")
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, target="Consumption", fh=12,
>>>     fe_exogenous=fe_exogenous, session_id=42
>>> )
>>> print(f"Feature Columns: {exp.get_config('X_transformed').columns}")
>>> model = exp.create_model("lr_cds_dt")
fold_strategy: str or sklearn CV generator object, default = ‘expanding’

Choice of cross validation strategy. Possible values are:

  • ‘expanding’

  • ‘rolling’ (same as/aliased to ‘expanding’)

  • ‘sliding’

You can also pass an sktime compatible cross validation object such as SlidingWindowSplitter or ExpandingWindowSplitter. In this case, the fold and fh parameters will be ignored and these values will be extracted from the fold_strategy object directly.

fold: int, default = 3

Number of folds to be used in cross validation. Must be at least 2. This is a global setting that can be over-written at function level by using fold parameter. Ignored when fold_strategy is a custom object.

fh: Optional[int or list or np.array or ForecastingHorizon], default = 1

The forecast horizon to be used for forecasting. Default is set to 1, i.e. forecast one point ahead. Valid options are (see the sketch below):

  1. Integer: When an integer is passed, it means N continuous points in the future without any gap.

  2. List or np.array: Indicates the specific points to predict in the future. e.g. fh = [1, 2, 3, 4] or np.arange(1, 5) will predict 4 points in the future.

  3. If you want to forecast values with gaps, you can pass a list or array with gaps. e.g. np.arange(13, 25) will skip the first 12 future points and forecast from the 13th point till the 24th point ahead (note that in numpy the left value is inclusive and the right value is exclusive).

  4. Can also be a sktime compatible ForecastingHorizon object.

  5. If fh = None, then fold_strategy must be a sktime compatible cross validation object. In this case, fh is derived from this object.
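
A minimal sketch of the different fh forms (using the airline data from the earlier example):

>>> import numpy as np
>>> setup(data=airline, fh=12)                  # next 12 points
>>> setup(data=airline, fh=[1, 2, 3, 4])        # next 4 points
>>> setup(data=airline, fh=np.arange(13, 25))   # points 13 through 24 (skips the first 12)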

hyperparameter_split: str, default = “all”

The split of data used to determine certain hyperparameters such as “seasonal_period”, whether multiplicative seasonality can be used or not, whether the data is white noise or not, the values of non-seasonal difference “d” and seasonal difference “D” to use in certain models. Allowed values are: [“all”, “train”]. Refer for more details: https://github.com/pycaret/pycaret/issues/3202

seasonal_period: list or int or str, default = None

Seasonal periods to use when performing seasonality checks (i.e. candidates).

Users can provide seasonal_period by passing it as an integer or a string corresponding to the keys below (e.g. ‘W’ for weekly data, ‘M’ for monthly data, etc.).

  • B, C = 5

  • D = 7

  • W = 52

  • M, BM, CBM, MS, BMS, CBMS = 12

  • SM, SMS = 24

  • Q, BQ, QS, BQS = 4

  • A, Y, BA, BY, AS, YS, BAS, BYS = 1

  • H = 24

  • T, min = 60

  • S = 60

Users can also provide a list of such values to use in models that accept multiple seasonal values (currently TBATS). For models that don’t accept multiple seasonal values, the first value of the list will be used as the seasonal period.

NOTE: (1) If seasonal_period is provided, whether the seasonality check is performed or not depends on the ignore_seasonality_test setting. (2) If seasonal_period is not provided, then the candidates are detected per the sp_detection setting. If seasonal_period is provided, sp_detection setting is ignored.
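
For example, a minimal sketch passing candidate seasonal periods explicitly (the values are illustrative):

>>> exp = TSForecastingExperiment()
>>> exp.setup(data=airline, fh=12, seasonal_period=[12, 24])  # multiple candidates (e.g. for TBATS)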

ignore_seasonality_test: bool, default = False

Whether to ignore the seasonality test or not. Applicable when seasonal_period is provided. If False, then a seasonality test is performed to determine if the provided seasonal_period is valid or not. If it is found to be not valid, no seasonal period is used for modeling. If True, then the provided seasonal_period is used as is.

sp_detection: str, default = “auto”

If seasonal_period is None, then this parameter determines the algorithm to use to detect the seasonal periods to use in the models.

Allowed values are [“auto” or “index”].

If “auto”, then seasonal periods are detected using statistical tests. If “index”, then the frequency of the data index is mapped to a seasonal period as shown in seasonal_period.

max_sp_to_consider: Optional[int], default = 60

Max period to consider when detecting seasonal periods. If None, all periods up to int((“length of data”-1)/2) are considered. Length of the data is determined by hyperparameter_split setting.

remove_harmonics: bool, default = False

Whether harmonics should be removed when considering which seasonal periods to use for modeling.

harmonic_order_method: str, default = “harmonic_max”

Applicable when remove_harmonics = True. This determines how the harmonics are replaced. Allowed values are “harmonic_strength”, “harmonic_max” or “raw_strength”.

  • If set to “harmonic_max”, then the lower seasonal period is replaced by its highest harmonic seasonal period in the same position as the lower seasonal period.

  • If set to “harmonic_strength”, then the lower seasonal period is replaced by its highest strength harmonic seasonal period in the same position as the lower seasonal period.

  • If set to “raw_strength”, then the lower seasonal periods are removed and the higher harmonic seasonal periods are retained in their original position based on their seasonal strength.

e.g. Assuming the detected seasonal periods in strength order are [2, 3, 4, 50] and remove_harmonics = True, then:

  • If harmonic_order_method = “harmonic_max”, result = [50, 3, 4]

  • If harmonic_order_method = “harmonic_strength”, result = [4, 3, 50]

  • If harmonic_order_method = “raw_strength”, result = [3, 4, 50]

num_sps_to_use: int, default = 1

It determines the maximum number of seasonal periods to use in the models. Set to -1 to use all detected seasonal periods (in models that allow multiple seasonalities). If a model only allows one seasonal period and num_sps_to_use > 1, then the most dominant (primary) seasonal period that is detected is used.

seasonality_type: str, default = “mul”

The type of seasonality to use. Allowed values are [“add”, “mul” or “auto”]

The detection flow sequence is as follows:

  1. If seasonality is not detected, then the seasonality type is set to None.

  2. If seasonality is detected but the data is not strictly positive, then the seasonality type is set to “add”.

  3. If seasonality_type is “auto”, then the type of seasonality is determined using an internal algorithm as follows: the data is decomposed using additive and multiplicative seasonal decomposition, and the seasonality type is selected based on the seasonality strength per FPP (https://otexts.com/fpp2/seasonal-strength.html). NOTE: For multiplicative, the denominator multiplies the seasonal and residual components instead of adding them; the rest of the calculations remain the same. If seasonal decomposition fails for any reason, it defaults to multiplicative seasonality.

  4. Otherwise, seasonality_type is set to the user provided value.

point_alpha: Optional[float], default = None

The alpha (quantile) value to use for the point predictions. By default this is set to None which uses sktime’s predict() method to get the point prediction (the mean or the median of the forecast distribution). If this is set to a floating point value, then it switches to using the predict_quantiles() method to get the point prediction at the user specified quantile. Reference: https://robjhyndman.com/hyndsight/quantile-forecasts-in-r/

NOTE: (1) Not all models support predict_quantiles(), hence, if a float value is provided, these models will be disabled. (2) Under some conditions, the user may want to only work with models that support prediction intervals. Utilizing note 1 to our advantage, the point_alpha argument can be set to 0.5 (or any float value depending on the quantile that the user wants to use for point predictions). This will disable models that do not support prediction intervals.
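
For example, a minimal sketch that uses the 0.5 quantile for point forecasts (and thereby restricts the experiment to models that support prediction intervals):

>>> exp = TSForecastingExperiment()
>>> exp.setup(data=airline, fh=12, point_alpha=0.5)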

coverage: Union[float, List[float]], default = 0.9

The coverage to be used for prediction intervals (only applicable for models that support prediction intervals).

If a float value is provided, it corresponds to the coverage needed (e.g. 0.9 means 90% coverage). This corresponds to lower and upper quantiles of 0.05 and 0.95 respectively.

Alternately, if the user wants the intervals at specific quantiles, a list of 2 values can be provided directly. e.g. coverage = [0.2, 0.9] will return the lower interval corresponding to a quantile of 0.2 and the upper interval corresponding to a quantile of 0.9.
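
A minimal sketch of the two ways to specify coverage:

>>> exp.setup(data=airline, fh=12, coverage=0.9)         # 90% interval (quantiles 0.05 and 0.95)
>>> exp.setup(data=airline, fh=12, coverage=[0.2, 0.9])  # explicit lower and upper quantiles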

enforce_exogenous: bool, default = True

When set to True and the data includes exogenous variables, only models that support exogenous variables are loaded in the environment. When set to False, all models are included; in this case, models that do not support exogenous variables will model the data as a univariate forecasting problem.

n_jobs: int, default = -1

The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor, set n_jobs to None.

use_gpu: bool or str, default = False

Parameter not in use for now. Behavior may change in future.

custom_pipeline: list of (str, transformer), dict or Pipeline, default = None

Parameter not in use for now. Behavior may change in future.

html: bool, default = True

When set to False, prevents runtime display of monitor. This must be set to False when the environment does not support IPython. For example, command line terminal, Databricks Notebook, Spyder and other similar IDEs.

session_id: int, default = None

Controls the randomness of experiment. It is equivalent to ‘random_state’ in scikit-learn. When None, a pseudo random number is generated. This can be used for later reproducibility of the entire experiment.

system_log: bool or str or logging.Logger, default = True

Whether to save the system logging file (as logs.log). If the input is a string, use that as the path to the logging file. If the input already is a logger object, use that one instead.

log_experiment: bool, default = False

When set to True, all metrics and parameters are logged on the MLflow server.

experiment_name: str, default = None

Name of the experiment for logging. Ignored when log_experiment is not True.

log_plots: bool or list, default = False

When set to True, certain plots are logged automatically in the MLFlow server. To change the type of plots to be logged, pass a list containing plot IDs. Refer to documentation of plot_model. Ignored when log_experiment is not True.
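
For example, a minimal sketch that logs a specific set of plots (the plot IDs follow plot_model, e.g. "ts" and "forecast"):

>>> exp.setup(
>>>     data=airline, fh=12,
>>>     log_experiment=True, experiment_name="airline_exp",
>>>     log_plots=["ts", "forecast"],
>>> )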

log_profile: bool, default = False

When set to True, data profile is logged on the MLflow server as a html file. Ignored when log_experiment is not True.

log_data: bool, default = False

When set to True, dataset is logged on the MLflow server as a csv file. Ignored when log_experiment is not True.

engine: Optional[Dict[str, str]] = None

The engine to use for the models, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine={“auto_arima”: “statsforecast”}
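
For example:

>>> exp = TSForecastingExperiment()
>>> exp.setup(data=airline, fh=12, engine={"auto_arima": "statsforecast"})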

verbose: bool, default = True

When set to False, Information grid is not printed.

profile: bool, default = False

When set to True, an interactive EDA report is displayed.

profile_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the ProfileReport method used to create the EDA report. Ignored if profile is False.

fig_kwargs: dict, default = {} (empty dict)

The global setting for any plots. Pass these as key-value pairs. Example: fig_kwargs = {“height”: 1000, “template”: “simple_white”}

Available keys are:

hoverinfo: hoverinfo passed to Plotly figures. Can be any value supported by Plotly (e.g. “text” to display, “skip” or “none” to disable). When not provided, hovering over certain plots may be disabled by PyCaret when the data exceeds a certain number of points (determined by big_data_threshold).

renderer: The renderer used to display the plotly figure. Can be any value supported by Plotly (e.g. “notebook”, “png”, “svg”, etc.). Note that certain renderers (like “svg”) may need additional libraries to be installed. Users will have to do this manually since they don’t come preinstalled with plotly. When not provided, plots use plotly’s default renderer when the data is below a certain number of points (determined by big_data_threshold), otherwise it switches to the static “png” renderer.

template: The template to use for the plots. Can be any value supported by Plotly. If not provided, defaults to “ggplot2”.

width: The width of the plot in pixels. If not provided, defaults to None, which lets Plotly decide the width.

height: The height of the plot in pixels. If not provided, defaults to None, which lets Plotly decide the height.

rows: The number of rows to use for plots where this can be customized, e.g. ccf. If not provided, defaults to None, which lets PyCaret decide based on the number of subplots to be plotted.

cols: The number of columns to use for plots where this can be customized, e.g. ccf. If not provided, defaults to 4.

big_data_threshold: The number of data points above which hovering over certain plots can be disabled and/or the renderer switched to a static renderer. This is useful when the time series being modeled has a lot of data, which can make notebooks slow to render. Also note that setting the display_format to a plotly-resampler figure (“plotly-dash” or “plotly-widget”) can circumvent these problems by performing dynamic data aggregation.

resampler_kwargs: The keyword arguments that are fed to configure the plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”): which downsampler will be used and how many data points are shown in the front-end. When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash method’s args.

example:

fig_kwargs = {
    ...,
    "resampler_kwargs":  {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
Returns

Global variables that can be changed using the set_config function.

compare_models(include: Optional[List[Union[str, Any]]] = None, exclude: Optional[List[str]] = None, fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, sort: str = 'MASE', n_select: int = 1, budget_time: Optional[float] = None, turbo: bool = True, errors: str = 'ignore', fit_kwargs: Optional[dict] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, engine: Optional[Dict[str, str]] = None, verbose: bool = True, parallel: Optional[ParallelBackend] = None)

This function trains and evaluates performance of all estimators available in the model library using cross validation. The output of this function is a score grid with average cross validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> best_model = compare_models()
include: list of str or sktime compatible object, default = None

To train and evaluate select models, a list containing model IDs or sktime compatible objects can be passed in the include param. To see a list of all models available in the model library, use the models function.

exclude: list of str, default = None

To omit certain models from training and evaluation, pass a list containing model id in the exclude parameter. To see a list of all models available in the model library use the models function.
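
For example, a minimal sketch of include/exclude (using model IDs from the models() table):

>>> best = compare_models(include = ['arima', 'ets', 'theta'])
>>> best = compare_models(exclude = ['auto_arima', 'prophet'])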

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

cross_validation: bool, default = True

When set to False, metrics are evaluated on holdout set. fold param is ignored when cross_validation is set to False.

sort: str, default = ‘MASE’

The sort order of the score grid. It also accepts custom metrics that are added through the add_metric function.

n_select: int, default = 1

Number of top_n models to return. For example, to select top 3 models use n_select = 3.

budget_time: int or float, default = None

If not None, will terminate execution of the function after budget_time minutes have passed and return results up to that point.

turbo: bool, default = True

When set to True, it excludes estimators with longer training times. To see which algorithms are excluded use the models function.

errors: str, default = ‘ignore’

When set to ‘ignore’, will skip the model with exceptions and continue. If ‘raise’, will break the function when exceptions are raised.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

engine: Optional[Dict[str, str]] = None

The engine to use for the models, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine={“auto_arima”: “statsforecast”}

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

parallel: pycaret.internal.parallel.parallel_backend.ParallelBackend, default = None

A ParallelBackend instance. For example, if you have a SparkSession session, you can use FugueBackend(session) to make this function run using Spark. For more details, see FugueBackend.
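
A minimal sketch, assuming a SparkSession named session is already available and the fugue dependency is installed:

>>> from pycaret.parallel import FugueBackend
>>> best = compare_models(n_select = 3, parallel = FugueBackend(session))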

Returns

Trained model or list of trained models, depending on the n_select param.

Warning

  • Changing turbo parameter to False may result in very high training times.

  • No models are logged in MLflow when cross_validation parameter is False.

create_model(estimator: Union[str, Any], fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, fit_kwargs: Optional[dict] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, engine: Optional[str] = None, verbose: bool = True, **kwargs)

This function trains and evaluates the performance of a given estimator using cross validation. The output of this function is a score grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function. All the available models can be accessed using the models function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> naive = create_model('naive')
estimator: str or sktime compatible object

ID of an estimator available in model library or pass an untrained model object consistent with scikit-learn API. Estimators available in the model library (ID - Name):

NOTE: The available estimators depend on multiple factors such as what libraries have been installed and the setup of the experiment. As such, some of these may not be available for your experiment. To see the list of available models, please run setup() first, then models().

  • ‘naive’ - Naive Forecaster

  • ‘grand_means’ - Grand Means Forecaster

  • ‘snaive’ - Seasonal Naive Forecaster (disabled when seasonal_period = 1)

  • ‘polytrend’ - Polynomial Trend Forecaster

  • ‘arima’ - ARIMA family of models (ARIMA, SARIMA, SARIMAX)

  • ‘auto_arima’ - Auto ARIMA

  • ‘exp_smooth’ - Exponential Smoothing

  • ‘stlf’ - STL Forecaster

  • ‘croston’ - Croston Forecaster

  • ‘ets’ - ETS

  • ‘theta’ - Theta Forecaster

  • ‘tbats’ - TBATS

  • ‘bats’ - BATS

  • ‘prophet’ - Prophet Forecaster

  • ‘lr_cds_dt’ - Linear w/ Cond. Deseasonalize & Detrending

  • ‘en_cds_dt’ - Elastic Net w/ Cond. Deseasonalize & Detrending

  • ‘ridge_cds_dt’ - Ridge w/ Cond. Deseasonalize & Detrending

  • ‘lasso_cds_dt’ - Lasso w/ Cond. Deseasonalize & Detrending

  • ‘llar_cds_dt’ - Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending

  • ‘br_cds_dt’ - Bayesian Ridge w/ Cond. Deseasonalize & Detrending

  • ‘huber_cds_dt’ - Huber w/ Cond. Deseasonalize & Detrending

  • ‘omp_cds_dt’ - Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending

  • ‘knn_cds_dt’ - K Neighbors w/ Cond. Deseasonalize & Detrending

  • ‘dt_cds_dt’ - Decision Tree w/ Cond. Deseasonalize & Detrending

  • ‘rf_cds_dt’ - Random Forest w/ Cond. Deseasonalize & Detrending

  • ‘et_cds_dt’ - Extra Trees w/ Cond. Deseasonalize & Detrending

  • ‘gbr_cds_dt’ - Gradient Boosting w/ Cond. Deseasonalize & Detrending

  • ‘ada_cds_dt’ - AdaBoost w/ Cond. Deseasonalize & Detrending

  • ‘lightgbm_cds_dt’ - Light Gradient Boosting w/ Cond. Deseasonalize & Detrending

  • ‘catboost_cds_dt’ - CatBoost w/ Cond. Deseasonalize & Detrending

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

cross_validation: bool, default = True

When set to False, metrics are evaluated on holdout set. fold param is ignored when cross_validation is set to False.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

engine: Optional[str] = None

The engine to use for the model, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine=”statsforecast”.
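
For example:

>>> arima_sf = create_model('auto_arima', engine = 'statsforecast')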

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

**kwargs:

Additional keyword arguments to pass to the estimator.

Returns

Trained Model

Warning

  • Models are not logged on the MLflow server when the cross_validation param is set to False.

static update_fit_kwargs_with_fh_from_cv(fit_kwargs: Optional[Dict], cv) Dict

Updates the fit_kwargs to include the fh parameter from cv.

Parameters
  • fit_kwargs (Optional[Dict]) – Original fit kwargs

  • cv ([type]) – cross validation object

Returns

Updated fit kwargs

Return type

Dict[Any]

tune_model(estimator, fold: Optional[Union[int, Any]] = None, round: int = 4, n_iter: int = 10, custom_grid: Optional[Union[Dict[str, list], Any]] = None, optimize: str = 'MASE', custom_scorer=None, search_algorithm: Optional[str] = None, choose_better: bool = True, fit_kwargs: Optional[dict] = None, return_tuner: bool = False, verbose: bool = True, tuner_verbose: Union[int, bool] = True, **kwargs)

This function tunes the hyperparameters of a given estimator. The output of this function is a score grid with CV scores by fold of the best selected model based on optimize parameter. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> dt = create_model('dt_cds_dt')
>>> tuned_dt = tune_model(dt)
estimator: sktime compatible object

Trained model object

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

n_iter: int, default = 10

Number of iterations in the grid search. Increasing ‘n_iter’ may improve model performance but also increases the training time.

custom_grid: dictionary, default = None

To define custom search space for hyperparameters, pass a dictionary with parameter name and values to be iterated. Custom grids must be in a format supported by the defined search_library.
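
A minimal sketch of a custom grid; the keys are illustrative and must match the parameters of the chosen estimator (e.g. as reported by the model's get_params()):

>>> exp_smooth = create_model('exp_smooth')
>>> # illustrative grid; valid keys depend on the underlying sktime model
>>> tuned = tune_model(exp_smooth, custom_grid = {'trend': ['add', 'mul'], 'sp': [12]})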

optimize: str, default = ‘MASE’

Metric name to be evaluated for hyperparameter tuning. It also accepts custom metrics that are added through the add_metric function.

custom_scorer: object, default = None

Custom scoring strategy can be passed to tune hyperparameters of the model. It must be created using sklearn.make_scorer. It is equivalent to adding a custom metric using the add_metric function and passing the name of the custom metric in the optimize parameter. Will be deprecated in future.

search_algorithm: str, default = ‘random’

use ‘random’ for random grid search and ‘grid’ for complete grid search.

choose_better: bool, default = True

When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the tuner.

return_tuner: bool, default = False

When set to True, will return a tuple of (model, tuner_object).

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

tuner_verbose: bool or int, default = True

If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when verbose param is False.

**kwargs:

Additional keyword arguments to pass to the optimizer.

Returns

Trained Model and Optional Tuner Object when return_tuner is True.

blend_models(estimator_list: list, method: str = 'mean', fold: Optional[Union[int, Any]] = None, round: int = 4, choose_better: bool = False, optimize: str = 'MASE', weights: Optional[List[float]] = None, fit_kwargs: Optional[dict] = None, verbose: bool = True)

This function trains an EnsembleForecaster for select models passed in the estimator_list param. It trains a sktime EnsembleForecaster under the hood. Refer to its documentation for more details.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> top3 = compare_models(n_select = 3)
>>> blender = blend_models(top3)
estimator_list: list of sktime compatible estimators

List of model objects

method: str, default = ‘mean’

Method to average the individual predictions to form a final prediction. Available Methods:

  • ‘mean’ - Mean of individual predictions

  • ‘gmean’ - Geometric Mean of individual predictions

  • ‘median’ - Median of individual predictions

  • ‘min’ - Minimum of individual predictions

  • ‘max’ - Maximum of individual predictions

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

choose_better: bool, default = False

When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.

optimize: str, default = ‘MASE’

Metric to compare for model selection when choose_better is True.

weights: list, default = None

Sequence of weights (float or int) to apply to the individual model predictions. Uses uniform weights when None. Note that weights only apply to the ‘mean’, ‘gmean’ and ‘median’ methods.
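
For example, a minimal sketch of a weighted blend:

>>> top3 = compare_models(n_select = 3)
>>> blender = blend_models(top3, method = 'mean', weights = [0.5, 0.3, 0.2])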

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

Returns

Trained Model

static plot_model_check_display_format_(display_format: Optional[str])

Checks if the display format is in the allowed list.

display_format: Optional[str], default = None

The display format to be used.

plot_model(estimator: Optional[Any] = None, plot: Optional[str] = None, return_fig: bool = False, return_data: bool = False, verbose: bool = False, display_format: Optional[str] = None, data_kwargs: Optional[Dict] = None, fig_kwargs: Optional[Dict] = None, save: Union[str, bool] = False) Optional[Tuple[str, list]]

This function analyzes the performance of a trained model on the holdout set. When used without any estimator, this function generates plots on the original dataset. When used with an estimator, it will generate plots on the model residuals.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "acf": True, "pacf": True})
>>> plot_model(plot="diff", data_kwargs={"lags_list": [[1], [1, 12]], "acf": True, "pacf": True})
>>> arima = create_model('arima')
>>> plot_model(plot = 'ts')
>>> plot_model(plot = 'decomp', data_kwargs = {'type' : 'multiplicative'})
>>> plot_model(plot = 'decomp', data_kwargs = {'seasonal_period': 24})
>>> plot_model(estimator = arima, plot = 'forecast', data_kwargs = {'fh' : 24})
>>> tuned_arima = tune_model(arima)
>>> plot_model([arima, tuned_arima], data_kwargs={"labels": ["Baseline", "Tuned"]})
estimator: sktime compatible object, default = None

Trained model object

plot: str, default = None

Default is ‘ts’ when estimator is None. When estimator is not None, the default is changed to ‘forecast’. List of available plots (ID - Name):

  • ‘ts’ - Time Series Plot

  • ‘train_test_split’ - Train Test Split

  • ‘cv’ - Cross Validation

  • ‘acf’ - Auto Correlation (ACF)

  • ‘pacf’ - Partial Auto Correlation (PACF)

  • ‘decomp’ - Classical Decomposition

  • ‘decomp_stl’ - STL Decomposition

  • ‘diagnostics’ - Diagnostics Plot

  • ‘diff’ - Difference Plot

  • ‘periodogram’ - Frequency Components (Periodogram)

  • ‘fft’ - Frequency Components (FFT)

  • ‘ccf’ - Cross Correlation (CCF)

  • ‘forecast’ - “Out-of-Sample” Forecast Plot

  • ‘insample’ - “In-Sample” Forecast Plot

  • ‘residuals’ - Residuals Plot

return_fig: bool, default = False

When set to True, it returns the figure used for plotting. When set to False (the default), it will print the plot, but not return it.

return_data: bool, default = False

When set to True, it returns the data for plotting. If both return_fig and return_data are set to True, the order of return is figure then data.

verbose: bool, default = False

Unused for now

display_format: str, default = None

Display format of the plot. Must be one of [None, ‘streamlit’, ‘plotly-dash’, ‘plotly-widget’]. If None, it will render the plot as a plain plotly figure.

The ‘plotly-dash’ and ‘plotly-widget’ formats will render the figure via plotly-resampler (https://github.com/predict-idlab/plotly-resampler) figures. These plots perform dynamic aggregation of the data based on the front-end graph view. This approach is especially useful when dealing with large data, as it will retain snappy, interactive performance.

  • ‘plotly-dash’ uses a dash-app to realize this dynamic aggregation. The dash app requires a network port, and can be configured with various modes; more information can be found in the show_dash documentation. (https://predict-idlab.github.io/plotly-resampler/figure_resampler.html#plotly_resampler.figure_resampler.FigureResampler.show_dash)

  • ‘plotly-widget’ uses a plotly FigureWidget to realize this dynamic aggregation, and should work in IPython based environments (given that the external widgets are supported and the jupyterlab-plotly extension is installed).

To display plots in Streamlit (https://www.streamlit.io/), set this to ‘streamlit’.

data_kwargs: dict, default = None

Dictionary of arguments passed to the data for plotting.

Available keys are:

nlags: The number of lags to use when plotting correlation plots, e.g. ACF, PACF, CCF. If not provided, default internally calculated values are used.

seasonal_period: The seasonal period to use for decomposition plots. If not provided, the default internally detected seasonal period is used.

type: The type of seasonal decomposition to perform. Options are: [“additive”, “multiplicative”]

order_list: The differencing orders to use for difference plots. e.g. [1, 2] will plot the first and second order differences (corresponding to d = 1 and 2 in ARIMA models).

lags_list: A more explicit alternative to “order_list” allowing users to specify the exact lags to plot. e.g. [1, [1, 12]] will plot the first difference and a second plot with the first difference (d = 1 in ARIMA) and the seasonal 12th difference (D = 1, s = 12 in ARIMA models). Also note that “order_list” = [2] can alternately be specified as lags_list = [[1, 1]], i.e. successive differencing twice.

acf: True/False. When specified in difference plots and set to True, this will plot the ACF of the differenced data as well.

pacf: True/False. When specified in difference plots and set to True, this will plot the PACF of the differenced data as well.

periodogram: True/False. When specified in difference plots and set to True, this will plot the Periodogram of the differenced data as well.

fft: True/False. When specified in difference plots and set to True, this will plot the FFT of the differenced data as well.

labels: When estimator(s) are provided, the corresponding labels to use for the plots. If not provided, the model class is used to derive the labels.

include: When data contains exogenous variables, then only specific exogenous variables can be plotted using this key. e.g. include = [“col1”, “col2”]

exclude: When data contains exogenous variables, specific exogenous variables can be excluded from the plots using this key. e.g. exclude = [“col1”, “col2”]

alpha: The quantile value to use for point prediction. If not provided, then the value specified during setup is used.

coverage: The coverage value to use for prediction intervals. If not provided, then the value specified during setup is used.

fh: The forecast horizon to use for forecasting. If not provided, then the one used during model training is used.

X: When a model trained with exogenous variables has been finalized, the user can provide the future values of the exogenous variables to make future target time series predictions using this key.

plot_data_type: When plotting the data used for modeling, the user may wish to see plots with the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.

NOTE: (1) If no imputation is specified, then plotting the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then plotting the “transformed” data type will produce the same results as the “imputed” data type.

Allowed values are (if not specified, defaults to the first one in the list):

  • “ts”: [“original”, “imputed”, “transformed”]

  • “train_test_split”: [“original”, “imputed”, “transformed”]

  • “cv”: [“original”]

  • “acf”: [“transformed”, “imputed”, “original”]

  • “pacf”: [“transformed”, “imputed”, “original”]

  • “decomp”: [“transformed”, “imputed”, “original”]

  • “decomp_stl”: [“transformed”, “imputed”, “original”]

  • “diagnostics”: [“transformed”, “imputed”, “original”]

  • “diff”: [“transformed”, “imputed”, “original”]

  • “forecast”: [“original”, “imputed”]

  • “insample”: [“original”, “imputed”]

  • “residuals”: [“original”, “imputed”]

  • “periodogram”: [“transformed”, “imputed”, “original”]

  • “fft”: [“transformed”, “imputed”, “original”]

  • “ccf”: [“transformed”, “imputed”, “original”]

Some plots (marked as True below) will also allow specifying multiple data types at once:

  • “ts”: True

  • “train_test_split”: True

  • “cv”: False

  • “acf”: True

  • “pacf”: True

  • “decomp”: True

  • “decomp_stl”: True

  • “diagnostics”: True

  • “diff”: False

  • “forecast”: False

  • “insample”: False

  • “residuals”: False

  • “periodogram”: True

  • “fft”: True

  • “ccf”: False

fig_kwargs: dict, default = {} (empty dict)

The setting to be used for the plot. Overrides any global setting passed during setup. Pass these as key-value pairs. For available keys, refer to the setup documentation.

Time-series plots support more display_formats; as a result, fig_kwargs can also contain the resampler_kwargs key and its corresponding dict. These are additional keyword arguments that are fed to the display function. This is mainly used for configuring plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”): which downsampler will be used and how many data points are shown in the front-end.

When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash args.

example:

fig_kwargs = {
    "width": None,
    "resampler_kwargs":  {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
save: string or bool, default = False

When set to True, the plot is saved as a ‘png’ file in the current working directory. When a path destination is given, the plot is saved as a ‘png’ file in the given directory.

Returns

Path to saved file and list containing figure and data, if any.

predict_model(estimator, fh=None, X: Optional[DataFrame] = None, return_pred_int: bool = False, alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, round: int = 4, verbose: bool = True) DataFrame

This function forecasts using a trained model. When fh is None, it forecasts using the same forecast horizon used during the training.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> arima = create_model('arima')
>>> pred_holdout = predict_model(arima)
>>> pred_unseen = predict_model(finalize_model(arima), fh = 24)
estimator: sktime compatible object

Trained model object

fh: Optional[Union[List[int], int, np.array, ForecastingHorizon]], default = None

Number of points from the last date of training to forecast. When fh is None, it forecasts using the same forecast horizon used during the training.

X: pd.DataFrame, default = None

Exogenous variables to be used for prediction. Before finalizing the estimator, X need not be passed even when the estimator is built using exogenous variables (since this is taken care of internally by using the exogenous variables from the test split). When the estimator has been finalized and it uses exogenous variables, then X must be passed (see the sketch below).
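
A minimal sketch, assuming the model was trained with exogenous variables and future_exog (hypothetical name) is a DataFrame holding their future values over the forecast horizon:

>>> final = finalize_model(model)
>>> preds = predict_model(final, X = future_exog, fh = 12)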

return_pred_int: bool, default = False

When set to True, it returns lower bound and upper bound prediction interval, in addition to the point prediction.

alpha: Optional[float], default = None

The alpha (quantile) value to use for the point predictions. Refer to the “point_alpha” description in the setup docstring for details.

coverage: Union[float, List[float]], default = 0.9

The coverage to be used for prediction intervals. Refer to the “coverage” description in the setup docstring for details.

round: int, default = 4

Number of decimal places to round predictions to.

verbose: bool, default = True

When set to False, holdout score grid is not printed.

Returns

pandas.DataFrame

finalize_model(estimator, fit_kwargs: Optional[dict] = None, model_only: bool = False, experiment_custom_tags: Optional[Dict[str, Any]] = None) Any

This function trains a given estimator on the entire dataset including the holdout set.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> final_arima = finalize_model(arima)
estimator: sktime compatible object

Trained model object

fit_kwargs: dict, default = None

Dictionary of arguments passed to the fit method of the model.

model_only: bool, default = False

Parameter not in use for now. Behavior may change in future.

Returns

Trained pipeline or model object fitted on complete dataset.

deploy_model(model, model_name: str, authentication: dict, platform: str = 'aws')

This function deploys the transformation pipeline and trained model on cloud.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> deploy_model(
        model = arima, model_name = 'arima-for-deployment',
        platform = 'aws', authentication = {'bucket' : 'S3-bucket-name'}
    )
Amazon Web Service (AWS) users:

To deploy a model on AWS S3 (‘aws’), environment variables must be set in your local environment. To configure AWS environment variables, type aws configure in the command line. The following information from the IAM portal of your Amazon console account is required:

  • AWS Access Key ID

  • AWS Secret Key Access

  • Default Region Name (can be seen under Global settings on your AWS console)

More info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html

Google Cloud Platform (GCP) users:

To deploy a model on Google Cloud Platform (‘gcp’), project must be created using command line or GCP console. Once project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment.

More info: https://cloud.google.com/docs/authentication/production

Microsoft Azure (Azure) users:

To deploy a model on Microsoft Azure (‘azure’), environment variables for connection string must be set in your local environment. Go to settings of storage account on Azure portal to access the connection string required.

More info: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?toc=%2Fpython%2Fazure%2FTOC.json

model: scikit-learn compatible object

Trained model object

model_name: str

Name of model.

authentication: dict

Dictionary of applicable authentication tokens.

When platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’, ‘path’: (optional) folder name under the bucket}

When platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}

When platform = ‘azure’: {‘container’: ‘azure-container-name’}

platform: str, default = ‘aws’

Name of the platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.

Returns

None

save_model(model, model_name: str, model_only: bool = False, verbose: bool = True)

This function saves the transformation pipeline and trained model object into the current working directory as a pickle file for later use.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> save_model(arima, 'saved_arima_model')
model: sktime compatible object

Trained model object

model_name: str

Name of the model.

model_only: bool, default = False

When set to True, only trained model object is saved instead of the entire pipeline.

verbose: bool, default = True

Success message is not printed when verbose is set to False.

Returns

Tuple of the model object and the filename.

load_model(model_name: str, platform: Optional[str] = None, authentication: Optional[Dict[str, str]] = None, verbose: bool = True)

This function loads a previously saved pipeline/model.

Example

>>> from pycaret.time_series import load_model
>>> saved_arima = load_model('saved_arima_model')
model_name: str

Name of the model.

platform: str, default = None

Name of the cloud platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.

authentication: dict, default = None

dictionary of applicable authentication tokens.

when platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’}

when platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}

when platform = ‘azure’: {‘container’: ‘azure-container-name’}

verbose: bool, default = True

Success message is not printed when verbose is set to False.

Returns

Trained Model

models(type: Optional[str] = None, internal: bool = False, raise_errors: bool = True) DataFrame

Returns table of models available in the model library.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> models()
type: str, default = None
  • baseline : filters and only return baseline models

  • classical : filters and only return classical models

  • linear : filters and only return linear models

  • tree : filters and only return tree based models

  • neighbors : filters and only return neighbors models

internal: bool, default = False

When True, will return extra columns and rows used internally.

raise_errors: bool, default = True

When False, will suppress all exceptions, ignoring models that couldn’t be created.

Returns

pandas.DataFrame

get_metrics(reset: bool = False, include_custom: bool = True, raise_errors: bool = True) DataFrame

Returns table of available metrics used for CV.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> all_metrics = get_metrics()
reset: bool, default = False

When True, will reset all changes made using the add_metric and remove_metric function.

include_custom: bool, default = True

Whether to include user added (custom) metrics or not.

raise_errors: bool, default = True

If False, will suppress all exceptions, ignoring models that couldn’t be created.

Returns

pandas.DataFrame

add_metric(id: str, name: str, score_func: type, greater_is_better: bool = True, **kwargs) Series

Adds a custom metric to be used for CV.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> from sklearn.metrics import explained_variance_score
>>> add_metric('evs', 'EVS', explained_variance_score)
id: str

Unique id for the metric.

name: str

Display name of the metric.

score_func: type

Score function (or loss function) with signature score_func(y, y_pred, **kwargs).

greater_is_better: bool, default = True

Whether score_func is higher the better or not.

**kwargs:

Arguments to be passed to score function.

Returns

pandas.Series

remove_metric(name_or_id: str)

Removes a metric from CV.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> remove_metric('MAPE')
name_or_id: str

Display name or ID of the metric.

Returns

None

get_logs(experiment_name: Optional[str] = None, save: bool = False) DataFrame

Returns a table of experiment logs. Only works when log_experiment is True when initializing the setup function.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> best = compare_models()
>>> exp_logs = get_logs()
experiment_name: str, default = None

When None current active run is used.

save: bool, default = False

When set to True, csv file is saved in current working directory.

Returns

pandas.DataFrame

get_fold_generator(fold: Optional[Union[int, Any]] = None, fold_strategy: Optional[str] = None) Union[ExpandingWindowSplitter, SlidingWindowSplitter]

Returns the cv object based on number of folds and fold_strategy

Parameters
  • fold (Optional[Union[int, Any]]) – The number of folds (int), by default None which returns the fold generator (cv object) defined during setup. Could also be a sktime cross-validation object. If it is a sktime cross-validation object, it is simply returned back

  • fold_strategy (Optional[str], optional) – The fold strategy - ‘expanding’ or ‘sliding’, by default None which takes the strategy set during setup

Returns

The sktime compatible cross-validation object. e.g. ExpandingWindowSplitter or SlidingWindowSplitter

Return type

Union[ExpandingWindowSplitter, SlidingWindowSplitter]

Raises

ValueError – If not enough data points to support the number of folds requested
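
For example, within an initialized experiment (assuming exp is a TSForecastingExperiment on which setup() has been called):

>>> cv = exp.get_fold_generator(fold=5, fold_strategy='sliding')
>>> print(cv)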

check_stats(estimator: Optional[Any] = None, test: str = 'all', alpha: float = 0.05, split: str = 'all', data_type: str = 'transformed', data_kwargs: Optional[Dict] = None) DataFrame

This function is used to get summary statistics and run statistical tests on the original data or model residuals.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> check_stats(test="summary")
>>> check_stats(test="adf")
>>> arima = create_model('arima')
>>> check_stats(arima, test = 'white_noise')
Parameters

estimator (sktime compatible object, optional) – Trained model object, by default None

test: str, optional

Name of the test to be performed, by default “all”

Options are:

  • ‘summary’ - Summary Statistics

  • ‘white_noise’ - Ljung-Box Test for white noise

  • ‘adf’ - ADF test for difference stationarity

  • ‘kpss’ - KPSS test for trend stationarity

  • ‘stationarity’ - ADF and KPSS test

  • ‘normality’ - Shapiro Test for Normality

  • ‘all’ - All of the above tests

alpha: float, optional

Significance Level, by default 0.05

split: str, optional

The split of the original data to run the test on. Only applicable when test is run on the original data (not residuals), by default “all”

Options are:

  • ‘all’ - Complete Dataset

  • ‘train’ - The Training Split of the dataset

  • ‘test’ - The Test Split of the dataset

data_type: str, optional

The data type to use for the statistical test, by default “transformed”.

User may wish to perform the tests on the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.

Allowed values are: [“original”, “imputed”, “transformed”]

NOTE: (1) If no imputation is specified, then testing on the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then testing the “transformed” data type will produce the same results as the “imputed” data type. (3) By default, tests are done on the “transformed” data since that is the data that is fed to the model during training.

data_kwargs: Optional[Dict], optional

Users can specify lags list or order_list to run the test for the data as well as for its lagged versions, by default None

>>> check_stats(test="white_noise", data_kwargs={"order_list": [1, 2]})
>>> check_stats(test="white_noise", data_kwargs={"lags_list": [1, [1, 12]]})

Returns

pd.DataFrame

Dataframe with the test results

get_residuals(estimator: BaseForecaster) Optional[Series]

Returns the insample residuals for the estimator.

Parameters

estimator (BaseForecaster) – sktime compatible model (without the pipeline). i.e. last step of the pipeline TransformedTargetForecaster

Returns

Insample residuals. None if estimator does not support insample predictions

Return type

Optional[pd.Series]

References

https://github.com/sktime/sktime/issues/1105#issuecomment-932216820

get_insample_predictions(estimator: BaseForecaster) Optional[DataFrame]

Returns the insample predictions for the estimator by appropriately taking the entire pipeline into consideration.

Parameters

estimator (BaseForecaster) – sktime compatible model (without the pipeline). i.e. last step of the pipeline TransformedTargetForecaster

Returns

Insample predictions. None if estimator does not support insample predictions

Return type

Optional[pd.DataFrame]

References

https://github.com/sktime/sktime/issues/1105#issuecomment-932216820
https://github.com/sktime/sktime/blob/87bdf36dbc0990f29942eb6f7fa56a8e6c5fa7b7/sktime/forecasting/base/_base.py#L699
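For illustration, a minimal sketch of using these two helpers via the object-oriented API (an assumption here is that the model object returned by create_model can be passed directly as the estimator; both calls return None when the model does not support insample predictions):

>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import TSForecastingExperiment
>>> exp = TSForecastingExperiment()
>>> exp.setup(data = get_data('airline'), fh = 12, session_id = 42)
>>> arima = exp.create_model('arima')
>>> resid = exp.get_residuals(arima)                # insample residuals (or None)
>>> insample = exp.get_insample_predictions(arima)  # insample predictions (or None)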

get_additional_scorer_kwargs() Dict[str, Any]

Returns additional kwargs required by some scorers (such as MASE).

NOTE: These are kwargs that are experiment specific (can only be derived from the experiment), e.g. sp and not fold specific like y_train. In other words, these kwargs are applicable to all folds. Fold specific kwargs such as y_train, lower, upper, etc. must be updated dynamically.

Returns

Additional kwargs to pass to scorers

Return type

Dict[str, Any]

pycaret.time_series.setup(data: Union[Series, DataFrame] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, target: Optional[str] = None, index: Optional[str] = None, ignore_features: Optional[List] = None, numeric_imputation_target: Optional[Union[str, int, float]] = None, numeric_imputation_exogenous: Optional[Union[str, int, float]] = None, transform_target: Optional[str] = None, transform_exogenous: Optional[str] = None, fe_target_rr: Optional[list] = None, fe_exogenous: Optional[list] = None, scale_target: Optional[str] = None, scale_exogenous: Optional[str] = None, fold_strategy: Union[str, Any] = 'expanding', fold: int = 3, fh: Optional[Union[List[int], int, ndarray, ForecastingHorizon]] = 1, hyperparameter_split: str = 'all', seasonal_period: Optional[Union[List[Union[int, str]], int, str]] = None, ignore_seasonality_test: bool = False, sp_detection: str = 'auto', max_sp_to_consider: Optional[int] = 60, remove_harmonics: bool = False, harmonic_order_method: str = 'harmonic_max', num_sps_to_use: int = 1, seasonality_type: str = 'mul', point_alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, enforce_exogenous: bool = True, n_jobs: Optional[int] = -1, use_gpu: bool = False, custom_pipeline: Optional[Any] = None, html: bool = True, session_id: Optional[int] = None, system_log: Union[bool, str, Logger] = True, log_experiment: bool = False, experiment_name: Optional[str] = None, log_plots: Union[bool, list] = False, log_profile: bool = False, log_data: bool = False, verbose: bool = True, profile: bool = False, profile_kwargs: Optional[Dict[str, Any]] = None, fig_kwargs: Optional[Dict[str, Any]] = None)

This function initializes the training environment and creates the transformation pipeline. Setup function must be called before executing any other function. It takes one mandatory parameter: data. All the other parameters are optional.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
data: pandas.Series or pandas.DataFrame = None

Shape (n_samples, 1), when pandas.DataFrame, otherwise (n_samples, ).

data_func: Callable[[], Union[pd.Series, pd.DataFrame]] = None

The function that generates data (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from the driver to workers. Note that one and only one of data and data_func must be set.

target: Optional[str], default = None

Target name to be forecasted. Must be specified when data is a pandas DataFrame with more than 1 column. When data is a pandas Series or pandas DataFrame with 1 column, this can be left as None.

index: Optional[str], default = None

Column name to be used as the datetime index for modeling. If ‘index’ column is specified & is of type string, it is assumed to be coercible to pd.DatetimeIndex using pd.to_datetime(). It can also be of type Int (e.g. RangeIndex, Int64Index), or DatetimeIndex or PeriodIndex in which case, it is processed appropriately. If None, then the data’s index is used as is for modeling.

ignore_features: Optional[List], default = None

List of features to ignore for modeling when the data is a pandas Dataframe with more than 1 column. Ignored when data is a pandas Series or Dataframe with 1 column.

numeric_imputation_target: Optional[Union[int, float, str]], default = None

Indicates how to impute missing values in the target. If None, no imputation is done. If the target has missing values, then imputation is mandatory. If str, then value passed as is to the underlying sktime imputer. Allowed values are:

“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”

If int or float, imputation method is set to “constant” with the given value.

numeric_imputation_exogenous: Optional[Union[int, float, str]], default = None

Indicates how to impute missing values in the exogenous variables. If None, no imputation is done. If exogenous variables have missing values, then imputation is mandatory. If str, then value passed as is to the underlying sktime imputer. Allowed values are:

“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”

If int or float, imputation method is set to “constant” with the given value.
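For illustration, a minimal sketch of enabling imputation during setup (the imputation values shown are illustrative; the airline dataset has no missing values, so the imputer has no effect here):

>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import *
>>> airline = get_data('airline')
>>> exp_name = setup(data = airline, fh = 12, numeric_imputation_target = "drift")
>>> # or impute with a constant value
>>> exp_name = setup(data = airline, fh = 12, numeric_imputation_target = 0)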

transform_target: Optional[str], default = None

Indicates how the target variable should be transformed. If None, no transformation is performed. Allowed values are

“box-cox”, “log”, “sqrt”, “exp”, “cos”

transform_exogenous: Optional[str], default = None

Indicates how the exogenous variables should be transformed. If None, no transformation is performed. Allowed values are

“box-cox”, “log”, “sqrt”, “exp”, “cos”

scale_target: Optional[str], default = None

Indicates how the target variable should be scaled. If None, no scaling is performed. Allowed values are

“zscore”, “minmax”, “maxabs”, “robust”

scale_exogenous: Optional[str], default = None

Indicates how the exogenous variables should be scaled. If None, no scaling is performed. Allowed values are

“zscore”, “minmax”, “maxabs”, “robust”
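As an illustrative sketch, target transformation and scaling can be combined in a single setup call (the specific choices below are illustrative):

>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import *
>>> airline = get_data('airline')
>>> exp_name = setup(
>>>     data = airline, fh = 12,
>>>     transform_target = "box-cox", scale_target = "zscore"
>>> )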

fe_target_rr: Optional[list], default = None

The transformers to be applied to the target variable in order to extract useful features. By default, None which means that the provided target variable is used “as is”.

NOTE: Most statistical and baseline models already use features (lags) for target variables implicitly. The only place where target features have to be created explicitly is in reduced regression models. Hence, this feature extraction is only applied to reduced regression models.

>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import TSForecastingExperiment
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> data = get_data("airline")
>>> kwargs = {"lag_feature": {"lag": [36, 24, 13, 12, 11, 9, 6, 3, 2, 1]}}
>>> fe_target_rr = [WindowSummarizer(n_jobs=1, truncate="bfill", **kwargs)]
>>> # Baseline
>>> exp = TSForecastingExperiment()
>>> exp.setup(data=data, fh=12, fold=3, session_id=42)
>>> model1 = exp.create_model("lr_cds_dt")
>>> # With Feature Engineering
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, fh=12, fold=3, fe_target_rr=fe_target_rr, session_id=42
>>> )
>>> model2 = exp.create_model("lr_cds_dt")
>>> exp.plot_model([model1, model2], data_kwargs={"labels": ["Baseline", "With FE"]})
fe_exogenous: Optional[list] = None

The transformations to be applied to the exogenous variables. These transformations are used for all models that accept exogenous variables. By default, None which means that the provided exogenous variables are used “as is”.

>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import TSForecastingExperiment
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> # Example: function num_above_thresh to count how many observations lie above
>>> # the threshold within a window of length 2, lagged by 0 periods.
>>> def num_above_thresh(x):
>>>     '''Count how many observations lie above threshold.'''
>>>     return np.sum((x > 0.7)[::-1])
>>> kwargs1 = {"lag_feature": {"lag": [0, 1], "mean": [[0, 4]]}}
>>> kwargs2 = {
>>>     "lag_feature": {
>>>         "lag": [0, 1], num_above_thresh: [[0, 2]],
>>>         "mean": [[0, 4]], "std": [[0, 4]]
>>>     }
>>> }
>>> fe_exogenous = [
>>>     (
            "a", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Income"], truncate="bfill", **kwargs1
>>>         )
>>>     ),
>>>     (
>>>         "b", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Unemployment", "Production"], truncate="bfill", **kwargs2
>>>         )
>>>     ),
>>> ]
>>> data = get_data("uschange")
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, target="Consumption", fh=12,
>>>     fe_exogenous=fe_exogenous, session_id=42
>>> )
>>> print(f"Feature Columns: {exp.get_config('X_transformed').columns}")
>>> model = exp.create_model("lr_cds_dt")
fold_strategy: str or sklearn CV generator object, default = ‘expanding’

Choice of cross validation strategy. Possible values are:

  • ‘expanding’

  • ‘rolling’ (same as/aliased to ‘expanding’)

  • ‘sliding’

You can also pass an sktime compatible cross validation object such as SlidingWindowSplitter or ExpandingWindowSplitter. In this case, the fold and fh parameters will be ignored and these values will be extracted from the fold_strategy object directly.

fold: int, default = 3

Number of folds to be used in cross validation. Must be at least 2. This is a global setting that can be over-written at function level by using fold parameter. Ignored when fold_strategy is a custom object.

fh: Optional[int or list or np.array or ForecastingHorizon], default = 1

The forecast horizon to be used for forecasting. Default is set to 1 i.e. forecast one point ahead. Valid options are:

  1. Integer: When an integer is passed it means N continuous points in the future without any gap.

  2. List or np.array: Indicates the specific points to predict in the future. e.g. fh = [1, 2, 3, 4] or np.arange(1, 5) will predict 4 points in the future.

  3. If you want to forecast values with gaps, you can pass a list or array with gaps. e.g. np.arange(13, 25) will skip the first 12 future points and forecast from the 13th point till the 24th point ahead (note that in numpy the left value is inclusive and the right value is exclusive); see the sketch after this list.

  4. Can also be a sktime compatible ForecastingHorizon object.

  5. If fh = None, then fold_strategy must be a sktime compatible cross validation object. In this case, fh is derived from this object.
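For example, a sketch of forecasting a horizon with a gap, per option 3 above (illustrative values):

>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import *
>>> airline = get_data('airline')
>>> exp_name = setup(data = airline, fh = np.arange(13, 25))  # points 13 through 24 ahead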

hyperparameter_split: str, default = “all”

The split of data used to determine certain hyperparameters such as “seasonal_period”, whether multiplicative seasonality can be used or not, whether the data is white noise or not, the values of non-seasonal difference “d” and seasonal difference “D” to use in certain models. Allowed values are: [“all”, “train”]. Refer for more details: https://github.com/pycaret/pycaret/issues/3202

seasonal_period: list or int or str, default = None

Seasonal periods to use when performing seasonality checks (i.e. candidates).

Users can provide seasonal_period by passing it as an integer or a string corresponding to the keys below (e.g. ‘W’ for weekly data, ‘M’ for monthly data, etc.).

  • B, C = 5

  • D = 7

  • W = 52

  • M, BM, CBM, MS, BMS, CBMS = 12

  • SM, SMS = 24

  • Q, BQ, QS, BQS = 4

  • A, Y, BA, BY, AS, YS, BAS, BYS = 1

  • H = 24

  • T, min = 60

  • S = 60

Users can also provide a list of such values to use in models that accept multiple seasonal values (currently TBATS). For models that don’t accept multiple seasonal values, the first value of the list will be used as the seasonal period.

NOTE: (1) If seasonal_period is provided, whether the seasonality check is performed or not depends on the ignore_seasonality_test setting. (2) If seasonal_period is not provided, then the candidates are detected per the sp_detection setting. If seasonal_period is provided, sp_detection setting is ignored.
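For illustration, candidate seasonal periods can be passed as an integer, a frequency string, or a list (values shown are illustrative):

>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import *
>>> airline = get_data('airline')
>>> exp_name = setup(data = airline, fh = 12, seasonal_period = 12)
>>> exp_name = setup(data = airline, fh = 12, seasonal_period = 'M')
>>> exp_name = setup(data = airline, fh = 12, seasonal_period = [12, 24])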

ignore_seasonality_test: bool = False

Whether to ignore the seasonality test or not. Applicable when seasonal_period is provided. If False, then a seasonality test is performed to determine if the provided seasonal_period is valid or not. If it is found to be invalid, no seasonal period is used for modeling. If True, then the provided seasonal_period is used as is.

sp_detection: str, default = “auto”

If seasonal_period is None, then this parameter determines the algorithm to use to detect the seasonal periods to use in the models.

Allowed values are [“auto” or “index”].

If “auto”, then seasonal periods are detected using statistical tests. If “index”, then the frequency of the data index is mapped to a seasonal period as shown in seasonal_period.

max_sp_to_consider: Optional[int], default = 60

Max period to consider when detecting seasonal periods. If None, all periods up to int((“length of data”-1)/2) are considered. Length of the data is determined by hyperparameter_split setting.

remove_harmonics: bool, default = False

Whether harmonics should be removed when considering which seasonal periods to use for modeling.

harmonic_order_method: str, default = “harmonic_max”

Applicable when remove_harmonics = True. This determines how the harmonics are replaced. Allowed values are “harmonic_strength”, “harmonic_max” or “raw_strength”.

  • If set to “harmonic_max”, then the lower seasonal period is replaced by its highest harmonic seasonal period, in the same position as the lower seasonal period.

  • If set to “harmonic_strength”, then the lower seasonal period is replaced by its highest strength harmonic seasonal period, in the same position as the lower seasonal period.

  • If set to “raw_strength”, then the lower seasonal periods are removed and the higher harmonic seasonal periods are retained in their original positions based on their seasonal strength.

e.g. Assuming the detected seasonal periods in strength order are [2, 3, 4, 50] and remove_harmonics = True, then:

  • If harmonic_order_method = “harmonic_max”, result = [50, 3, 4]

  • If harmonic_order_method = “harmonic_strength”, result = [4, 3, 50]

  • If harmonic_order_method = “raw_strength”, result = [3, 4, 50]

num_sps_to_use: int, default = 1

It determines the maximum number of seasonal periods to use in the models. Set to -1 to use all detected seasonal periods (in models that allow multiple seasonalities). If a model only allows one seasonal period and num_sps_to_use > 1, then the most dominant (primary) seasonal period that is detected is used.

seasonality_type: str, default = “mul”

The type of seasonality to use. Allowed values are [“add”, “mul” or “auto”]

The detection flow sequence is as follows:

  1. If seasonality is not detected, then the seasonality type is set to None.

  2. If seasonality is detected but the data is not strictly positive, then the seasonality type is set to “add”.

  3. If seasonality_type is “auto”, then the type of seasonality is determined using an internal algorithm: the data is decomposed using additive and multiplicative seasonal decomposition, and the seasonality type is selected based on seasonality strength per FPP (https://otexts.com/fpp2/seasonal-strength.html). NOTE: For multiplicative, the denominator multiplies the seasonal and residual components instead of adding them; the rest of the calculations remain the same. If seasonal decomposition fails for any reason, then it defaults to multiplicative seasonality.

  4. Otherwise, seasonality_type is set to the user provided value.

point_alpha: Optional[float], default = None

The alpha (quantile) value to use for the point predictions. By default this is set to None which uses sktime’s predict() method to get the point prediction (the mean or the median of the forecast distribution). If this is set to a floating point value, then it switches to using the predict_quantiles() method to get the point prediction at the user specified quantile. Reference: https://robjhyndman.com/hyndsight/quantile-forecasts-in-r/

NOTE: (1) Not all models support predict_quantiles(), hence, if a float value is provided, these models will be disabled. (2) Under some conditions, the user may want to only work with models that support prediction intervals. Utilizing note 1 to our advantage, the point_alpha argument can be set to 0.5 (or any float value depending on the quantile that the user wants to use for point predictions). This will disable models that do not support prediction intervals.

coverage: Union[float, List[float]], default = 0.9

The coverage to be used for prediction intervals (only applicable for models that support prediction intervals).

If a float value is provided, it corresponds to the coverage needed (e.g. 0.9 means 90% coverage). This corresponds to lower and upper quantiles = 0.05 and 0.95 respectively.

Alternately, if the user wants to get the intervals at specific quantiles, a list of 2 values can be provided directly. e.g. coverage = [0.2, 0.9] will return the lower interval corresponding to a quantile of 0.2 and an upper interval corresponding to a quantile of 0.9.
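For example, a sketch combining a quantile-based point forecast with explicit interval quantiles (the values are illustrative; models without predict_quantiles support will be disabled, as noted under point_alpha):

>>> from pycaret.datasets import get_data
>>> from pycaret.time_series import *
>>> airline = get_data('airline')
>>> exp_name = setup(data = airline, fh = 12, point_alpha = 0.5, coverage = [0.2, 0.9])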

enforce_exogenous: bool, default = True

When set to True and the data includes exogenous variables, only models that support exogenous variables are loaded in the environment. When set to False, all models are included and in this case, models that do not support exogenous variables will model the data as a univariate forecasting problem.

n_jobs: int, default = -1

The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor, set n_jobs to None.

use_gpu: bool or str, default = False

Parameter not in use for now. Behavior may change in future.

custom_pipeline: list of (str, transformer), dict or Pipeline, default = None

Parameter not in use for now. Behavior may change in future.

html: bool, default = True

When set to False, prevents runtime display of monitor. This must be set to False when the environment does not support IPython. For example, command line terminal, Databricks Notebook, Spyder and other similar IDEs.

session_id: int, default = None

Controls the randomness of experiment. It is equivalent to ‘random_state’ in scikit-learn. When None, a pseudo random number is generated. This can be used for later reproducibility of the entire experiment.

system_log: bool or str or logging.Logger, default = True

Whether to save the system logging file (as logs.log). If the input is a string, use that as the path to the logging file. If the input already is a logger object, use that one instead.

log_experiment: bool, default = False

When set to True, all metrics and parameters are logged on the MLflow server.

experiment_name: str, default = None

Name of the experiment for logging. Ignored when log_experiment is not True.

log_plots: bool or list, default = False

When set to True, certain plots are logged automatically in the MLFlow server. To change the type of plots to be logged, pass a list containing plot IDs. Refer to documentation of plot_model. Ignored when log_experiment is not True.

log_profile: bool, default = False

When set to True, data profile is logged on the MLflow server as a html file. Ignored when log_experiment is not True.

log_data: bool, default = False

When set to True, dataset is logged on the MLflow server as a csv file. Ignored when log_experiment is not True.

verbose: bool, default = True

When set to False, Information grid is not printed.

profile: bool, default = False

When set to True, an interactive EDA report is displayed.

profile_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the ProfileReport method used to create the EDA report. Ignored if profile is False.

fig_kwargs: dict, default = {} (empty dict)

The global setting for any plots. Pass these as key-value pairs. Example: fig_kwargs = {“height”: 1000, “template”: “simple_white”}

Available keys are:

hoverinfo: hoverinfo passed to Plotly figures. Can be any value supported by Plotly (e.g. “text” to display, “skip” or “none” to disable). When not provided, hovering over certain plots may be disabled by PyCaret when the data exceeds a certain number of points (determined by big_data_threshold).

renderer: The renderer used to display the plotly figure. Can be any value supported by Plotly (e.g. “notebook”, “png”, “svg”, etc.). Note that certain renderers (like “svg”) may need additional libraries to be installed. Users will have to do this manually since they don’t come preinstalled with plotly. When not provided, plots use plotly’s default renderer when the data is below a certain number of points (determined by big_data_threshold), otherwise it switches to a static “png” renderer.

template: The template to use for the plots. Can be any value supported by Plotly. If not provided, defaults to “ggplot2”.

width: The width of the plot in pixels. If not provided, defaults to None which lets Plotly decide the width.

height: The height of the plot in pixels. If not provided, defaults to None which lets Plotly decide the height.

rows: The number of rows to use for plots where this can be customized, e.g. ccf. If not provided, defaults to None which lets PyCaret decide based on the number of subplots to be plotted.

cols: The number of columns to use for plots where this can be customized, e.g. ccf. If not provided, defaults to 4.

big_data_threshold: The number of data points above which hovering over certain plots can be disabled and/or the renderer switched to a static renderer. This is useful when the time series being modeled has a lot of data, which can make notebooks slow to render. Also note that setting the display_format to a plotly-resampler figure (“plotly-dash” or “plotly-widget”) can circumvent these problems by performing dynamic data aggregation.

resampler_kwargs: The keyword arguments that are fed to configure the plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”), which determine which down sampler will be used and how many data points are shown in the front-end. When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash method and its args.

example:

fig_kwargs = {
    ...,
    "resampler_kwargs":  {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
Returns

Global variables that can be changed using the set_config function.

pycaret.time_series.create_model(estimator: Union[str, Any], fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, fit_kwargs: Optional[dict] = None, engine: Optional[str] = None, verbose: bool = True, **kwargs)

This function trains and evaluates the performance of a given estimator using cross validation. The output of this function is a score grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function. All the available models can be accessed using the models function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> naive = create_model('naive')
estimator: str or sktime compatible object

ID of an estimator available in model library or pass an untrained model object consistent with scikit-learn API. Estimators available in the model library (ID - Name):

NOTE: The available estimators depend on multiple factors such as what libraries have been installed and the setup of the experiment. As such, some of these may not be available for your experiment. To see the list of available models, please run setup() first, then models().

  • ‘naive’ - Naive Forecaster

  • ‘grand_means’ - Grand Means Forecaster

  • ‘snaive’ - Seasonal Naive Forecaster (disabled when seasonal_period = 1)

  • ‘polytrend’ - Polynomial Trend Forecaster

  • ‘arima’ - ARIMA family of models (ARIMA, SARIMA, SARIMAX)

  • ‘auto_arima’ - Auto ARIMA

  • ‘exp_smooth’ - Exponential Smoothing

  • ‘stlf’ - STL Forecaster

  • ‘croston’ - Croston Forecaster

  • ‘ets’ - ETS

  • ‘theta’ - Theta Forecaster

  • ‘tbats’ - TBATS

  • ‘bats’ - BATS

  • ‘prophet’ - Prophet Forecaster

  • ‘lr_cds_dt’ - Linear w/ Cond. Deseasonalize & Detrending

  • ‘en_cds_dt’ - Elastic Net w/ Cond. Deseasonalize & Detrending

  • ‘ridge_cds_dt’ - Ridge w/ Cond. Deseasonalize & Detrending

  • ‘lasso_cds_dt’ - Lasso w/ Cond. Deseasonalize & Detrending

  • ‘llar_cds_dt’ - Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending

  • ‘br_cds_dt’ - Bayesian Ridge w/ Cond. Deseasonalize & Detrending

  • ‘huber_cds_dt’ - Huber w/ Cond. Deseasonalize & Detrending

  • ‘omp_cds_dt’ - Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending

  • ‘knn_cds_dt’ - K Neighbors w/ Cond. Deseasonalize & Detrending

  • ‘dt_cds_dt’ - Decision Tree w/ Cond. Deseasonalize & Detrending

  • ‘rf_cds_dt’ - Random Forest w/ Cond. Deseasonalize & Detrending

  • ‘et_cds_dt’ - Extra Trees w/ Cond. Deseasonalize & Detrending

  • ‘gbr_cds_dt’ - Gradient Boosting w/ Cond. Deseasonalize & Detrending

  • ‘ada_cds_dt’ - AdaBoost w/ Cond. Deseasonalize & Detrending

  • ‘lightgbm_cds_dt’ - Light Gradient Boosting w/ Cond. Deseasonalize & Detrending

  • ‘catboost_cds_dt’ - CatBoost w/ Cond. Deseasonalize & Detrending

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

cross_validation: bool, default = True

When set to False, metrics are evaluated on holdout set. fold param is ignored when cross_validation is set to False.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

engine: Optional[str] = None

The engine to use for the model, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine=”statsforecast”.
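For illustration, continuing the example above, a sketch of switching the engine (assuming the optional statsforecast dependency is installed):

>>> auto_arima_default = create_model('auto_arima')                      # default engine
>>> auto_arima_sf = create_model('auto_arima', engine = 'statsforecast')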

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

**kwargs:

Additional keyword arguments to pass to the estimator.

Returns

Trained Model

Warning

  • Models are not logged on the MLFlow server when cross_validation param is set to False.

pycaret.time_series.compare_models(include: Optional[List[Union[str, Any]]] = None, exclude: Optional[List[str]] = None, fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, sort: str = 'MASE', n_select: int = 1, budget_time: Optional[float] = None, turbo: bool = True, errors: str = 'ignore', fit_kwargs: Optional[dict] = None, engine: Optional[Dict[str, str]] = None, verbose: bool = True, parallel: Optional[ParallelBackend] = None)

This function trains and evaluates performance of all estimators available in the model library using cross validation. The output of this function is a score grid with average cross validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> best_model = compare_models()
include: list of str or sktime compatible object, default = None

To train and evaluate select models, list containing model ID or scikit-learn compatible object can be passed in include param. To see a list of all models available in the model library use the models function.

exclude: list of str, default = None

To omit certain models from training and evaluation, pass a list containing model id in the exclude parameter. To see a list of all models available in the model library use the models function.

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

cross_validation: bool, default = True

When set to False, metrics are evaluated on holdout set. fold param is ignored when cross_validation is set to False.

sort: str, default = ‘MASE’

The sort order of the score grid. It also accepts custom metrics that are added through the add_metric function.

n_select: int, default = 1

Number of top_n models to return. For example, to select top 3 models use n_select = 3.

budget_time: int or float, default = None

If not None, will terminate execution of the function after budget_time minutes have passed and return results up to that point.

turbo: bool, default = True

When set to True, it excludes estimators with longer training times. To see which algorithms are excluded use the models function.

errors: str, default = ‘ignore’

When set to ‘ignore’, will skip the model with exceptions and continue. If ‘raise’, will break the function when exceptions are raised.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

engine: Optional[Dict[str, str]] = None

The engine to use for the models, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine={“auto_arima”: “statsforecast”}
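For illustration, a sketch restricting the candidates and switching the auto_arima engine (assuming statsforecast is installed; the model list is illustrative):

>>> top3 = compare_models(
>>>     include = ['arima', 'auto_arima', 'ets'],
>>>     engine = {'auto_arima': 'statsforecast'},
>>>     n_select = 3
>>> )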

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

parallel: pycaret.internal.parallel.parallel_backend.ParallelBackend, default = None

A ParallelBackend instance. For example, if you have a SparkSession session, you can use FugueBackend(session) to make this function run using Spark. For more details, see FugueBackend.

Returns

Trained model or list of trained models, depending on the n_select param.

Warning

  • Changing turbo parameter to False may result in very high training times.

  • No models are logged in MLflow when cross_validation parameter is False.

pycaret.time_series.tune_model(estimator, fold: Optional[Union[int, Any]] = None, round: int = 4, n_iter: int = 10, custom_grid: Optional[Union[Dict[str, list], Any]] = None, optimize: str = 'MASE', custom_scorer=None, search_algorithm: Optional[str] = None, choose_better: bool = True, fit_kwargs: Optional[dict] = None, return_tuner: bool = False, verbose: bool = True, tuner_verbose: Union[int, bool] = True, **kwargs)

This function tunes the hyperparameters of a given estimator. The output of this function is a score grid with CV scores by fold of the best selected model based on optimize parameter. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using add_metric and remove_metric function.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> dt = create_model('dt_cds_dt')
>>> tuned_dt = tune_model(dt)
estimator: sktime compatible object

Trained model object

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

n_iter: int, default = 10

Number of iterations in the grid search. Increasing ‘n_iter’ may improve model performance but also increases the training time.

custom_grid: dictionary, default = None

To define custom search space for hyperparameters, pass a dictionary with parameter name and values to be iterated. Custom grids must be in a format supported by the defined search_library.

optimize: str, default = ‘MASE’

Metric name to be evaluated for hyperparameter tuning. It also accepts custom metrics that are added through the add_metric function.

custom_scorer: object, default = None

A custom scoring strategy can be passed to tune hyperparameters of the model. It must be created using sklearn.make_scorer. It is equivalent to adding a custom metric using the add_metric function and passing the name of the custom metric in the optimize parameter. Will be deprecated in a future release.

search_algorithm: str, default = ‘random’

use ‘random’ for random grid search and ‘grid’ for complete grid search.

choose_better: bool, default = True

When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the tuner.

return_tuner: bool, default = False

When set to True, will return a tuple of (model, tuner_object).
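For illustration, continuing the example above, a sketch of retrieving the tuner object along with the tuned model:

>>> tuned_dt, tuner = tune_model(dt, return_tuner = True)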

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

tuner_verbose: bool or int, default = True

If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when verbose param is False.

**kwargs:

Additional keyword arguments to pass to the optimizer.

Returns

Trained Model and Optional Tuner Object when return_tuner is True.

pycaret.time_series.blend_models(estimator_list: list, method: str = 'mean', fold: Optional[Union[int, Any]] = None, round: int = 4, choose_better: bool = False, optimize: str = 'MASE', weights: Optional[List[float]] = None, fit_kwargs: Optional[dict] = None, verbose: bool = True)

This function trains an EnsembleForecaster for select models passed in the estimator_list param. It trains a sktime EnsembleForecaster under the hood. Refer to its documentation for more details.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> top3 = compare_models(n_select = 3)
>>> blender = blend_models(top3)
estimator_list: list of sktime compatible estimators

List of model objects

method: str, default = ‘mean’

Method to average the individual predictions to form a final prediction. Available Methods:

  • ‘mean’ - Mean of individual predictions

  • ‘gmean’ - Geometric Mean of individual predictions

  • ‘median’ - Median of individual predictions

  • ‘min’ - Minimum of individual predictions

  • ‘max’ - Maximum of individual predictions

fold: int or scikit-learn compatible CV generator, default = None

Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.

round: int, default = 4

Number of decimal places the metrics in the score grid will be rounded to.

choose_better: bool, default = False

When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.

optimize: str, default = ‘MASE’

Metric to compare for model selection when choose_better is True.

weights: list, default = None

Sequence of weights (float or int) to apply to the individual model predictions. Uses uniform weights when None. Note that weights only apply to the ‘mean’, ‘gmean’ and ‘median’ methods.
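For illustration, continuing the example above, a sketch of weighting the individual forecasters (the weights are illustrative):

>>> blender_weighted = blend_models(top3, method = 'mean', weights = [0.5, 0.3, 0.2])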

fit_kwargs: dict, default = {} (empty dict)

Dictionary of arguments passed to the fit method of the model.

verbose: bool, default = True

Score grid is not printed when verbose is set to False.

Returns

Trained Model

pycaret.time_series.plot_model(estimator: Optional[Any] = None, plot: Optional[str] = None, return_fig: bool = False, return_data: bool = False, verbose: bool = False, display_format: Optional[str] = None, data_kwargs: Optional[Dict] = None, fig_kwargs: Optional[Dict] = None, save: Union[str, bool] = False) Optional[Tuple[str, list]]

This function analyzes the performance of a trained model on holdout set. When used without any estimator, this function generates plots on the original data set. When used with an estimator, it will generate plots on the model residuals.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "acf": True, "pacf": True})
>>> plot_model(plot="diff", data_kwargs={"lags_list": [[1], [1, 12]], "acf": True, "pacf": True})
>>> arima = create_model('arima')
>>> plot_model(plot = 'ts')
>>> plot_model(plot = 'decomp', data_kwargs = {'type' : 'multiplicative'})
>>> plot_model(plot = 'decomp', data_kwargs = {'seasonal_period': 24})
>>> plot_model(estimator = arima, plot = 'forecast', data_kwargs = {'fh' : 24})
>>> tuned_arima = tune_model(arima)
>>> plot_model([arima, tuned_arima], data_kwargs={"labels": ["Baseline", "Tuned"]})
estimator: sktime compatible object, default = None

Trained model object

plot: str, default = None

Default is ‘ts’ when estimator is None. When estimator is not None, default is changed to ‘forecast’. List of available plots (ID - Name):

  • ‘ts’ - Time Series Plot

  • ‘train_test_split’ - Train Test Split

  • ‘cv’ - Cross Validation

  • ‘acf’ - Auto Correlation (ACF)

  • ‘pacf’ - Partial Auto Correlation (PACF)

  • ‘decomp’ - Classical Decomposition

  • ‘decomp_stl’ - STL Decomposition

  • ‘diagnostics’ - Diagnostics Plot

  • ‘diff’ - Difference Plot

  • ‘periodogram’ - Frequency Components (Periodogram)

  • ‘fft’ - Frequency Components (FFT)

  • ‘ccf’ - Cross Correlation (CCF)

  • ‘forecast’ - “Out-of-Sample” Forecast Plot

  • ‘insample’ - “In-Sample” Forecast Plot

  • ‘residuals’ - Residuals Plot

return_fig: bool, default = False

When set to True, it returns the figure used for plotting. When set to False (the default), it will print the plot, but not return it.

return_data: bool, default = False

When set to True, it returns the data for plotting. If both return_fig and return_data are set to True, the order of return is figure then data.

verbose: bool, default = False

Unused for now

display_format: str, default = None

To display plots in Streamlit (https://www.streamlit.io/), set this to ‘streamlit’. Currently, not all plots are supported.

data_kwargs: dict, default = None

Dictionary of arguments passed to the data for plotting.

Available keys are:

nlags: The number of lags to use when plotting correlation plots, e.g. ACF, PACF, CCF. If not provided, default internally calculated values are used.

seasonal_period: The seasonal period to use for decomposition plots. If not provided, the default internally detected seasonal period is used.

type: The type of seasonal decomposition to perform. Options are: [“additive”, “multiplicative”]

order_list: The differencing orders to use for difference plots. e.g. [1, 2] will plot first and second order differences (corresponding to d = 1 and 2 in ARIMA models).

lags_list: A more explicit alternative to “order_list”, allowing users to specify the exact lags to plot. e.g. [1, [1, 12]] will plot first difference and a second plot with first difference (d = 1 in ARIMA) and seasonal 12th difference (D=1, s=12 in ARIMA models). Also note that “order_list” = [2] can be alternately specified as lags_list = [[1, 1]] i.e. successive differencing twice.

acf: True/False. When specified in difference plots and set to True, this will plot the ACF of the differenced data as well.

pacf: True/False. When specified in difference plots and set to True, this will plot the PACF of the differenced data as well.

periodogram: True/False. When specified in difference plots and set to True, this will plot the Periodogram of the differenced data as well.

fft: True/False. When specified in difference plots and set to True, this will plot the FFT of the differenced data as well.

labels: When estimator(s) are provided, the corresponding labels to use for the plots. If not provided, the model class is used to derive the labels.

include: When data contains exogenous variables, then only specific exogenous variables can be plotted using this key. e.g. include = [“col1”, “col2”]

exclude: When data contains exogenous variables, specific exogenous variables can be excluded from the plots using this key. e.g. exclude = [“col1”, “col2”]

alpha: The quantile value to use for point prediction. If not provided, then the value specified during setup is used.

coverage: The coverage value to use for prediction intervals. If not provided, then the value specified during setup is used.

fh: The forecast horizon to use for forecasting. If not provided, then the one used during model training is used.

X: When a model trained with exogenous variables has been finalized, the user can provide the future values of the exogenous variables to make future target time series predictions using this key.

plot_data_type: When plotting the data used for modeling, the user may wish to see plots with the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.

NOTE: (1) If no imputation is specified, then plotting the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then plotting the “transformed” data type will produce the same results as the “imputed” data type.

Allowed values are (if not specified, defaults to the first one in the list):

  • “ts”: [“original”, “imputed”, “transformed”]
  • “train_test_split”: [“original”, “imputed”, “transformed”]
  • “cv”: [“original”]
  • “acf”: [“transformed”, “imputed”, “original”]
  • “pacf”: [“transformed”, “imputed”, “original”]
  • “decomp”: [“transformed”, “imputed”, “original”]
  • “decomp_stl”: [“transformed”, “imputed”, “original”]
  • “diagnostics”: [“transformed”, “imputed”, “original”]
  • “diff”: [“transformed”, “imputed”, “original”]
  • “forecast”: [“original”, “imputed”]
  • “insample”: [“original”, “imputed”]
  • “residuals”: [“original”, “imputed”]
  • “periodogram”: [“transformed”, “imputed”, “original”]
  • “fft”: [“transformed”, “imputed”, “original”]
  • “ccf”: [“transformed”, “imputed”, “original”]

Some plots (marked as True below) also allow specifying multiple data types at once.

  • “ts”: True
  • “train_test_split”: True
  • “cv”: False
  • “acf”: True
  • “pacf”: True
  • “decomp”: True
  • “decomp_stl”: True
  • “diagnostics”: True
  • “diff”: False
  • “forecast”: False
  • “insample”: False
  • “residuals”: False
  • “periodogram”: True
  • “fft”: True
  • “ccf”: False

fig_kwargs: dict, default = {} (empty dict)

The setting to be used for the plot. Overrides any global setting passed during setup. Pass these as key-value pairs. For available keys, refer to the setup documentation.

Time-series plots support more display_formats; as a result, fig_kwargs can also contain the resampler_kwargs key and its corresponding dict. These are additional keyword arguments that are fed to the display function. This is mainly used for configuring plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”), which determine which down sampler will be used and how many data points are shown in the front-end.

When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash args.

example:

fig_kwargs = {
    "width": None,
    "resampler_kwargs":  {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
save: string or bool, default = False

When set to True, the plot is saved as a ‘png’ file in the current working directory. When a path destination is given, the plot is saved as a ‘png’ file at the given path.

Returns

Path to saved file and list containing figure and data, if any.

pycaret.time_series.predict_model(estimator, fh=None, X=None, return_pred_int=False, alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, round: int = 4, verbose: bool = True) DataFrame

This function forecasts using a trained model. When fh is None, it forecasts using the same forecast horizon used during the training.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> arima = create_model('arima')
>>> pred_holdout = predict_model(arima)
>>> pred_unseen = predict_model(finalize_model(arima), fh = 24)
estimator: sktime compatible object

Trained model object

fh: int, default = None

Number of points from the last date of training to forecast. When fh is None, it forecasts using the same forecast horizon used during the training.

X: pd.DataFrame, default = None

Exogenous variables to be used for prediction. Before finalizing the estimator, X need not be passed even when the estimator is built using exogenous variables (since this is taken care of internally by using the exogenous variables from the test split). When the estimator has been finalized and the estimator uses exogenous variables, then X must be passed.

return_pred_int: bool, default = False

When set to True, it returns lower bound and upper bound prediction interval, in addition to the point prediction.
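For illustration, continuing the example above, a sketch of requesting prediction intervals along with the point forecast (the coverage value is illustrative; intervals are only returned by models that support them):

>>> pred_int = predict_model(arima, return_pred_int = True, coverage = 0.8)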

alpha: Optional[float], default = None

The alpha (quantile) value to use for the point predictions. Refer to the “point_alpha” description in the setup docstring for details.

coverage: Union[float, List[float]], default = 0.9

The coverage to be used for prediction intervals. Refer to the “coverage” description in the setup docstring for details.

round: int, default = 4

Number of decimal places to round predictions to.

verbose: bool, default = True

When set to False, holdout score grid is not printed.

Returns

pandas.DataFrame

pycaret.time_series.finalize_model(estimator, fit_kwargs: Optional[dict] = None, model_only: bool = False) Any

This function trains a given estimator on the entire dataset including the holdout set.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> final_arima = finalize_model(arima)
estimator: sktime compatible object

Trained model object

fit_kwargs: dict, default = None

Dictionary of arguments passed to the fit method of the model.

model_only: bool, default = False

Parameter not in use for now. Behavior may change in future.

Returns

Trained pipeline or model object fitted on complete dataset.

pycaret.time_series.deploy_model(model, model_name: str, authentication: dict, platform: str = 'aws')

This function deploys the transformation pipeline and trained model on cloud.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> deploy_model(
>>>     model = arima, model_name = 'arima-for-deployment',
>>>     platform = 'aws', authentication = {'bucket' : 'S3-bucket-name'}
>>> )
Amazon Web Service (AWS) users:

To deploy a model on AWS S3 (‘aws’), environment variables must be set in your local environment. To configure AWS environment variables, type aws configure in the command line. The following information from the IAM portal of your Amazon console account is required:

  • AWS Access Key ID

  • AWS Secret Access Key

  • Default Region Name (can be seen under Global settings on your AWS console)

More info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html

Google Cloud Platform (GCP) users:

To deploy a model on Google Cloud Platform (‘gcp’), a project must be created using the command line or the GCP console. Once the project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment.

More info: https://cloud.google.com/docs/authentication/production

Microsoft Azure (Azure) users:

To deploy a model on Microsoft Azure (‘azure’), environment variables for connection string must be set in your local environment. Go to settings of storage account on Azure portal to access the connection string required.

More info: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?toc=%2Fpython%2Fazure%2FTOC.json

model: scikit-learn compatible object

Trained model object

model_name: str

Name of model.

authentication: dict

Dictionary of applicable authentication tokens.

When platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’, ‘path’: (optional) folder name under the bucket}

When platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}

When platform = ‘azure’: {‘container’: ‘azure-container-name’}

platform: str, default = ‘aws’

Name of the platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.

Returns

None

pycaret.time_series.save_model(model, model_name: str, model_only: bool = False, verbose: bool = True)

This function saves the transformation pipeline and trained model object into the current working directory as a pickle file for later use.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> save_model(arima, 'saved_arima_model')
model: sktime compatible object

Trained model object

model_name: str

Name of the model.

model_only: bool, default = False

When set to True, only trained model object is saved instead of the entire pipeline.

verbose: bool, default = True

Success message is not printed when verbose is set to False.

Returns

Tuple of the model object and the filename.

pycaret.time_series.load_model(model_name: str, platform: Optional[str] = None, authentication: Optional[Dict[str, str]] = None, verbose: bool = True)

This function loads a previously saved pipeline/model.

Example

>>> from pycaret.time_series import load_model
>>> saved_arima = load_model('saved_arima_model')
model_name: str

Name of the model.

platform: str, default = None

Name of the cloud platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.

authentication: dict, default = None

dictionary of applicable authentication tokens.

when platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’}

when platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}

when platform = ‘azure’: {‘container’: ‘azure-container-name’}

verbose: bool, default = True

Success message is not printed when verbose is set to False.

Returns

Trained Model

pycaret.time_series.pull(pop: bool = False) DataFrame

Returns last printed score grid. Use pull function after any training function to store the score grid in pandas.DataFrame.

pop: bool, default = False

If True, will pop (remove) the returned dataframe from the display container.

Returns

pandas.DataFrame

pycaret.time_series.models(type: Optional[str] = None, internal: bool = False, raise_errors: bool = True) DataFrame

Returns table of models available in the model library.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> models()
type: str, default = None
  • baseline : filters and only return baseline models

  • classical : filters and only return classical models

  • linear : filters and only return linear models

  • tree : filters and only return tree based models

  • neighbors : filters and only return neighbors models
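For example, an illustrative sketch of filtering the table by type:

>>> tree_models = models(type = 'tree')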

internal: bool, default = False

When True, will return extra columns and rows used internally.

raise_errors: bool, default = True

When False, will suppress all exceptions, ignoring models that couldn’t be created.

Returns

pandas.DataFrame

pycaret.time_series.get_metrics(reset: bool = False, include_custom: bool = True, raise_errors: bool = True) DataFrame

Returns table of available metrics used for CV.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> all_metrics = get_metrics()
reset: bool, default = False

When True, will reset all changes made using the add_metric and remove_metric function.

include_custom: bool, default = True

Whether to include user added (custom) metrics or not.

raise_errors: bool, default = True

If False, will suppress all exceptions, ignoring metrics that could not be created.

Returns

pandas.DataFrame

pycaret.time_series.add_metric(id: str, name: str, score_func: type, greater_is_better: bool = True, **kwargs) Series

Adds a custom metric to be used for CV.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> from sklearn.metrics import explained_variance_score
>>> add_metric('evs', 'EVS', explained_variance_score)
id: str

Unique id for the metric.

name: str

Display name of the metric.

score_func: type

Score function (or loss function) with signature score_func(y, y_pred, **kwargs).

greater_is_better: bool, default = True

Whether a higher score_func value indicates better performance.

**kwargs:

Arguments to be passed to score function.

Returns

pandas.Series

pycaret.time_series.remove_metric(name_or_id: str)

Removes a metric from CV.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> remove_metric('MAPE')
name_or_id: str

Display name or ID of the metric.

Returns

None

pycaret.time_series.get_logs(experiment_name: Optional[str] = None, save: bool = False) DataFrame

Returns a table of experiment logs. Only works when log_experiment was set to True when initializing the setup function.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> best = compare_models()
>>> exp_logs = get_logs()
experiment_name: str, default = None

When None, the current active run is used.

save: bool, default = False

When set to True, a csv file is saved in the current working directory.

Returns

pandas.DataFrame
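
To also persist the logs to disk, a brief sketch (assuming log_experiment was enabled in setup):

>>> exp_logs = get_logs(save = True)   # also writes a csv to the current working directory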

pycaret.time_series.get_config(variable: Optional[str] = None)

This function retrieves the global variables created when initializing the setup function. The following variables are accessible:

  • X: Period/Index of X

  • y: Time Series as pd.Series

  • X_train: Period/Index of X_train

  • y_train: Time Series as pd.Series (Train set only)

  • X_test: Period/Index of X_test

  • y_test: Time Series as pd.Series (Test set only)

  • fh: forecast horizon

  • enforce_pi: enforce prediction interval in models

  • seed: random state set through session_id

  • prep_pipe: Transformation pipeline

  • n_jobs_param: n_jobs parameter used in model training

  • html_param: html_param configured through setup

  • _master_model_container: model storage container

  • _display_container: results display container

  • exp_name_log: Name of experiment

  • logging_param: log_experiment param

  • log_plots_param: log_plots param

  • USI: Unique session ID parameter

  • data_before_preprocess: data before preprocessing

  • gpu_param: use_gpu param configured through setup

  • fold_generator: CV splitter configured in fold_strategy

  • fold_param: fold params defined in the setup

  • seasonality_present: seasonality as detected in the setup

  • seasonality_period: seasonality_period as detected in the setup

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> X_train = get_config('X_train')
variable: str, default = None

Name of the variable to return the value of. If None, will return a list of possible names.

Returns

Global variable
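
Calling get_config with no argument returns the list of accessible variable names, which can be a useful first step:

>>> get_config()                      # list of accessible variable names
>>> y_test = get_config('y_test')     # test-split target as pd.Series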

pycaret.time_series.set_config(variable: str, value)

This function updates the value of a global variable. The following variables are accessible:

  • X: Period/Index of X

  • y: Time Series as pd.Series

  • X_train: Period/Index of X_train

  • y_train: Time Series as pd.Series (Train set only)

  • X_test: Period/Index of X_test

  • y_test: Time Series as pd.Series (Test set only)

  • fh: forecast horizon

  • enforce_pi: enforce prediction interval in models

  • seed: random state set through session_id

  • prep_pipe: Transformation pipeline

  • n_jobs_param: n_jobs parameter used in model training

  • html_param: html_param configured through setup

  • _master_model_container: model storage container

  • _display_container: results display container

  • exp_name_log: Name of experiment

  • logging_param: log_experiment param

  • log_plots_param: log_plots param

  • USI: Unique session ID parameter

  • data_before_preprocess: data before preprocessing

  • gpu_param: use_gpu param configured through setup

  • fold_generator: CV splitter configured in fold_strategy

  • fold_param: fold params defined in the setup

  • seasonality_present: seasonality as detected in the setup

  • seasonality_period: seasonality_period as detected in the setup

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> set_config('seed', 123)
Returns

None
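
A change made with set_config can be verified with get_config:

>>> set_config('seed', 123)
>>> get_config('seed')
123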

pycaret.time_series.save_experiment(path_or_file: Union[str, PathLike, BinaryIO], **cloudpickle_kwargs) None

Saves the experiment to a pickle file.

The experiment is saved using cloudpickle to deal with lambda functions. The data or test data is NOT saved with the experiment and will need to be specified again when loading using load_experiment.

path_or_file: str or BinaryIO (file pointer)

The path/file pointer to save the experiment to.

**cloudpickle_kwargs:

Kwargs to pass to the cloudpickle.dump call.

Returns

None
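
A minimal sketch, assuming an experiment has already been set up as in the earlier examples; the file name below is arbitrary:

>>> save_experiment('my_ts_experiment')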

pycaret.time_series.load_experiment(path_or_file: Union[str, PathLike, BinaryIO], data: Optional[Union[Series, DataFrame]] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, test_data: Optional[Union[Series, DataFrame]] = None, preprocess_data: bool = True, **cloudpickle_kwargs) TSForecastingExperiment

Load an experiment saved with save_experiment from path or file.

The data (and test data) is NOT saved with the experiment and will need to be specified again.

path_or_file: str or BinaryIO (file pointer)

The path/file pointer to load the experiment from. The pickle file must be created through save_experiment.

data: pandas.Series or pandas.DataFrame

Data set with shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. If data is not a pandas dataframe, it’s converted to one using default column names.

data_func: Callable[[], pandas.Series or pandas.DataFrame] = None

The function that generates data (the dataframe-like input). This is useful when the dataset is large, and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from the driver to workers. Note that one and only one of data and data_func must be set.

test_data: pandas.Series or pandas.DataFrame or None, default = None

If not None, test_data is used as a hold-out set and train_size parameter is ignored. The columns of data and test_data must match.

preprocess_data: bool, default = True

If True, the data will be preprocessed again (through running setup internally). If False, the data will not be preprocessed. This means you can save the value of the data attribute of an experiment separately, and then load it separately and pass it here with preprocess_data set to False. This is an advanced feature. We recommend leaving it set to True and passing the same data as passed to the initial setup call.

**cloudpickle_kwargs:

Kwargs to pass to the cloudpickle.load call.

Returns

loaded experiment
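
A sketch of restoring the experiment saved above; the data must be supplied again because it is not stored in the pickle (the file name and dataset are the same placeholders as before):

>>> from pycaret.time_series import load_experiment
>>> exp = load_experiment('my_ts_experiment', data = airline)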

pycaret.time_series.set_current_experiment(experiment: TSForecastingExperiment)

Set the current experiment to be used with the functional API.

experiment: TSForecastingExperiment

Experiment object to use.

Returns

None
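
A sketch of pointing the functional API at an experiment created through the OOP API (airline is the dataset from the earlier examples):

>>> from pycaret.time_series import TSForecastingExperiment, set_current_experiment
>>> exp = TSForecastingExperiment()
>>> exp.setup(data = airline, fh = 12)
>>> set_current_experiment(exp)   # functional calls such as create_model now use exp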

pycaret.time_series.get_current_experiment() TSForecastingExperiment

Obtain the current experiment object.

Returns

Current TSForecastingExperiment
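
A one-line sketch:

>>> exp = get_current_experiment()   # TSForecastingExperiment backing the functional API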

pycaret.time_series.check_stats(estimator: Optional[Any] = None, test: str = 'all', alpha: float = 0.05, split: str = 'all') DataFrame

This function is used to get summary statistics and run statistical tests on the original data or model residuals.

Example

>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline,  fh = 12)
>>> check_stats(test="summary")
>>> check_stats(test="adf")
>>> arima = create_model('arima')
>>> check_stats(arima, test = 'white_noise')
Parameters

estimator (sktime compatible object, optional) – Trained model object, by default None

test: str, optional

Name of the test to be performed, by default “all”

Options are:

  • ‘summary’ - Summary Statistics

  • ‘white_noise’ - Ljung-Box Test for white noise

  • ‘adf’ - ADF test for difference stationarity

  • ‘kpss’ - KPSS test for trend stationarity

  • ‘stationarity’ - ADF and KPSS test

  • ‘normality’ - Shapiro Test for Normality

  • ‘all’ - All of the above tests

alpha: float, optional

Significance Level, by default 0.05

split: str, optional

The split of the original data to run the test on. Only applicable when test is run on the original data (not residuals), by default “all”

Options are:

  • ‘all’ - Complete Dataset

  • ‘train’ - The Training Split of the dataset

  • ‘test’ - The Test Split of the dataset

data_type: str, optional

The data type to use for the statistical test, by default “transformed”.

User may wish to perform the tests on the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.

Allowed values are: [“original”, “imputed”, “transformed”]

NOTE:

  1. If no imputation is specified, then testing on the “imputed” data type will produce the same results as the “original” data type.

  2. If no transformations are specified, then testing the “transformed” data type will produce the same results as the “imputed” data type.

  3. By default, tests are done on the “transformed” data since that is the data that is fed to the model during training.

data_kwargs: Optional[Dict], optional

Users can specify lags_list or order_list to run the test for the data as well as for its lagged versions, by default None

>>> check_stats(test="white_noise", data_kwargs={"order_list": [1, 2]})
>>> check_stats(test="white_noise", data_kwargs={"lags_list": [1, [1, 12]]})

Returns

pandas.DataFrame

Dataframe with the test results
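
The significance level and data split can also be adjusted; a brief sketch (assuming the setup call above):

>>> check_stats(test = 'stationarity', alpha = 0.01, split = 'train')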

pycaret.time_series.get_allowed_engines(estimator: str) Optional[str]

Get all the allowed engines for the specified model.

Parameters

estimator (str) – Identifier for the model for which the engines should be retrieved, e.g. “auto_arima”

Returns

The allowed engines for the model. If the model only supports the default engine, then it returns None.

Return type

Optional[str]
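
A brief sketch (assuming setup has been called):

>>> allowed = get_allowed_engines(estimator = 'auto_arima')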

pycaret.time_series.get_engine(estimator: str) Optional[str]

Gets the model engine currently set in the experiment for the specified model.

Parameters

estimator (str) – Identifier for the model for which the engine should be retrieved, e.g. “auto_arima”

Returns

The engine for the model. If the model only supports the default sktime engine, then it returns None.

Return type

Optional[str]
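
A brief sketch (assuming setup has been called):

>>> engine = get_engine(estimator = 'auto_arima')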