Time Series
- class pycaret.time_series.TSForecastingExperiment
- setup(data: Optional[Union[Series, DataFrame]] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, target: Optional[str] = None, index: Optional[str] = None, ignore_features: Optional[List] = None, numeric_imputation_target: Optional[Union[str, int, float]] = None, numeric_imputation_exogenous: Optional[Union[str, int, float]] = None, transform_target: Optional[str] = None, transform_exogenous: Optional[str] = None, scale_target: Optional[str] = None, scale_exogenous: Optional[str] = None, fe_target_rr: Optional[list] = None, fe_exogenous: Optional[list] = None, fold_strategy: Union[str, Any] = 'expanding', fold: int = 3, fh: Optional[Union[List[int], int, ndarray, ForecastingHorizon]] = 1, hyperparameter_split: str = 'all', seasonal_period: Optional[Union[List[Union[int, str]], int, str]] = None, ignore_seasonality_test: bool = False, sp_detection: str = 'auto', max_sp_to_consider: Optional[int] = 60, remove_harmonics: bool = False, harmonic_order_method: str = 'harmonic_max', num_sps_to_use: int = 1, seasonality_type: str = 'mul', point_alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, enforce_exogenous: bool = True, n_jobs: Optional[int] = -1, use_gpu: bool = False, custom_pipeline: Optional[Any] = None, html: bool = True, session_id: Optional[int] = None, system_log: Union[bool, str, Logger] = True, log_experiment: Union[bool, str, BaseLogger, List[Union[str, BaseLogger]]] = False, experiment_name: Optional[str] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, log_plots: Union[bool, list] = False, log_profile: bool = False, log_data: bool = False, engine: Optional[Dict[str, str]] = None, verbose: bool = True, profile: bool = False, profile_kwargs: Optional[Dict[str, Any]] = None, fig_kwargs: Optional[Dict[str, Any]] = None)
This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes one mandatory parameter: data. All the other parameters are optional.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
- data: pandas.Series or pandas.DataFrame = None
Shape (n_samples, 1) when pandas.DataFrame, otherwise (n_samples,).
- data_func: Callable[[], Union[pd.Series, pd.DataFrame]] = None
The function that generates data (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from the driver to the workers. Note that one and only one of data and data_func must be set.
- target: Optional[str], default = None
Target name to be forecasted. Must be specified when data is a pandas DataFrame with more than 1 column. When data is a pandas Series or pandas DataFrame with 1 column, this can be left as None.
- index: Optional[str], default = None
Column name to be used as the datetime index for modeling. If the specified index column is of type string, it is assumed to be coercible to pd.DatetimeIndex using pd.to_datetime(). It can also be of type Int (e.g. RangeIndex, Int64Index), DatetimeIndex, or PeriodIndex, in which case it is processed appropriately. If None, the data’s index is used as is for modeling.
- ignore_features: Optional[List], default = None
List of features to ignore for modeling when the data is a pandas DataFrame with more than 1 column. Ignored when data is a pandas Series or a DataFrame with 1 column.
- numeric_imputation_target: Optional[Union[int, float, str]], default = None
Indicates how to impute missing values in the target. If None, no imputation is done. If the target has missing values, then imputation is mandatory. If str, the value is passed as is to the underlying sktime imputer. Allowed values are:
“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”
If int or float, imputation method is set to “constant” with the given value.
- numeric_imputation_exogenous: Optional[Union[int, float, str]], default = None
Indicates how to impute missing values in the exogenous variables. If None, no imputation is done. If exogenous variables have missing values, then imputation is mandatory. If str, the value is passed as is to the underlying sktime imputer. Allowed values are:
“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”
If int or float, imputation method is set to “constant” with the given value.
- transform_target: Optional[str], default = None
Indicates how the target variable should be transformed. If None, no transformation is performed. Allowed values are
“box-cox”, “log”, “sqrt”, “exp”, “cos”
- transform_exogenous: Optional[str], default = None
Indicates how the exogenous variables should be transformed. If None, no transformation is performed. Allowed values are
“box-cox”, “log”, “sqrt”, “exp”, “cos”
- scale_target: Optional[str], default = None
Indicates how the target variable should be scaled. If None, no scaling is performed. Allowed values are
“zscore”, “minmax”, “maxabs”, “robust”
- scale_exogenous: Optional[str], default = None
Indicates how the exogenous variables should be scaled. If None, no scaling is performed. Allowed values are
“zscore”, “minmax”, “maxabs”, “robust”
- fe_target_rr: Optional[list], default = None
The transformers to be applied to the target variable in order to extract useful features. By default None, which means the provided target variable is used “as is”.
NOTE: Most statistical and baseline models already use features (lags) for target variables implicitly. The only place where target features have to be created explicitly is in reduced regression models. Hence, this feature extraction is only applied to reduced regression models.
>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> data = get_data("airline")
>>> kwargs = {"lag_feature": {"lag": [36, 24, 13, 12, 11, 9, 6, 3, 2, 1]}}
>>> fe_target_rr = [WindowSummarizer(n_jobs=1, truncate="bfill", **kwargs)]
>>> # Baseline
>>> exp = TSForecastingExperiment()
>>> exp.setup(data=data, fh=12, fold=3, session_id=42)
>>> model1 = exp.create_model("lr_cds_dt")
>>> # With Feature Engineering
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, fh=12, fold=3, fe_target_rr=fe_target_rr, session_id=42
>>> )
>>> model2 = exp.create_model("lr_cds_dt")
>>> exp.plot_model([model1, model2], data_kwargs={"labels": ["Baseline", "With FE"]})
- fe_exogenous: Optional[list], default = None
The transformations to be applied to the exogenous variables. These transformations are used for all models that accept exogenous variables. By default None, which means the provided exogenous variables are used “as is”.
>>> import numpy as np
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> # Example: function num_above_thresh to count how many observations lie above
>>> # the threshold within a window of length 2, lagged by 0 periods.
>>> def num_above_thresh(x):
>>>     '''Count how many observations lie above threshold.'''
>>>     return np.sum((x > 0.7)[::-1])
>>> kwargs1 = {"lag_feature": {"lag": [0, 1], "mean": [[0, 4]]}}
>>> kwargs2 = {
>>>     "lag_feature": {
>>>         "lag": [0, 1], num_above_thresh: [[0, 2]],
>>>         "mean": [[0, 4]], "std": [[0, 4]]
>>>     }
>>> }
>>> fe_exogenous = [
>>>     (
>>>         "a", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Income"], truncate="bfill", **kwargs1
>>>         )
>>>     ),
>>>     (
>>>         "b", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Unemployment", "Production"], truncate="bfill", **kwargs2
>>>         )
>>>     ),
>>> ]
>>> data = get_data("uschange")
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, target="Consumption", fh=12,
>>>     fe_exogenous=fe_exogenous, session_id=42
>>> )
>>> print(f"Feature Columns: {exp.get_config('X_transformed').columns}")
>>> model = exp.create_model("lr_cds_dt")
- fold_strategy: str or sklearn CV generator object, default = ‘expanding’
Choice of cross validation strategy. Possible values are:
‘expanding’
‘rolling’ (same as/aliased to ‘expanding’)
‘sliding’
You can also pass an sktime compatible cross validation object such as SlidingWindowSplitter or ExpandingWindowSplitter. In this case, the fold and fh parameters will be ignored and these values will be extracted from the fold_strategy object directly.
- fold: int, default = 3
Number of folds to be used in cross validation. Must be at least 2. This is a global setting that can be over-written at function level by using the fold parameter. Ignored when fold_strategy is a custom object.
- fh: Optional[int or list or np.array or ForecastingHorizon], default = 1
The forecast horizon to be used for forecasting. Default is set to 1, i.e. forecast one point ahead. Valid options are:
(1) Integer: When an integer is passed, it means N continuous points in the future without any gap.
(2) List or np.array: Indicates the points to predict in the future. e.g. fh = [1, 2, 3, 4] or np.arange(1, 5) will predict 4 points in the future.
If you want to forecast values with gaps, you can pass a list or array with gaps. e.g. np.arange(13, 25) will skip the first 12 future points and forecast from the 13th point till the 24th point ahead (note that in numpy the left value is inclusive and the right value is exclusive).
Can also be a sktime compatible ForecastingHorizon object.
If fh = None, then fold_strategy must be a sktime compatible cross validation object. In this case, fh is derived from this object.
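As a sketch of the gap behaviour described above (illustrative, using numpy only; the resulting array can be passed as fh to setup()):

```python
import numpy as np

# Forecast horizon with a gap: skip the first 12 future points and
# predict points 13 through 24 ahead. np.arange is left-inclusive
# and right-exclusive, so 25 itself is not included.
fh_with_gap = np.arange(13, 25)
print(fh_with_gap)  # 12 horizon points: 13, 14, ..., 24
```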
- hyperparameter_split: str, default = “all”
The split of data used to determine certain hyperparameters such as “seasonal_period”, whether multiplicative seasonality can be used or not, whether the data is white noise or not, the values of non-seasonal difference “d” and seasonal difference “D” to use in certain models. Allowed values are: [“all”, “train”]. Refer for more details: https://github.com/pycaret/pycaret/issues/3202
- seasonal_period: list or int or str, default = None
Seasonal periods to use when performing seasonality checks (i.e. candidates).
Users can provide seasonal_period by passing it as an integer or a string corresponding to the keys below (e.g. ‘W’ for weekly data, ‘M’ for monthly data, etc.).
B, C = 5
D = 7
W = 52
M, BM, CBM, MS, BMS, CBMS = 12
SM, SMS = 24
Q, BQ, QS, BQS = 4
A, Y, BA, BY, AS, YS, BAS, BYS = 1
H = 24
T, min = 60
S = 60
Users can also provide a list of such values to use in models that accept multiple seasonal values (currently TBATS). For models that don’t accept multiple seasonal values, the first value of the list will be used as the seasonal period.
NOTE: (1) If seasonal_period is provided, whether the seasonality check is performed or not depends on the ignore_seasonality_test setting. (2) If seasonal_period is not provided, then the candidates are detected per the sp_detection setting. If seasonal_period is provided, sp_detection setting is ignored.
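The frequency-to-period table above can be written out as a plain dictionary (illustrative only; this is not PyCaret's internal data structure):

```python
# Mapping from pandas frequency alias to candidate seasonal period,
# mirroring the table above.
FREQ_TO_SP = {
    "B": 5, "C": 5,                                   # business-daily
    "D": 7,                                           # daily
    "W": 52,                                          # weekly
    "M": 12, "BM": 12, "CBM": 12,                     # monthly
    "MS": 12, "BMS": 12, "CBMS": 12,
    "SM": 24, "SMS": 24,                              # semi-monthly
    "Q": 4, "BQ": 4, "QS": 4, "BQS": 4,               # quarterly
    "A": 1, "Y": 1, "BA": 1, "BY": 1,                 # yearly
    "AS": 1, "YS": 1, "BAS": 1, "BYS": 1,
    "H": 24,                                          # hourly
    "T": 60, "min": 60,                               # minutely
    "S": 60,                                          # secondly
}
```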
- ignore_seasonality_test: bool, default = False
Whether to ignore the seasonality test or not. Applicable when seasonal_period is provided. If False, then a seasonality test is performed to determine if the provided seasonal_period is valid or not. If it is found to be not valid, no seasonal period is used for modeling. If True, then the provided seasonal_period is used as is.
- sp_detection: str, default = “auto”
If seasonal_period is None, then this parameter determines the algorithm to use to detect the seasonal periods to use in the models.
Allowed values are [“auto” or “index”].
If “auto”, then seasonal periods are detected using statistical tests. If “index”, then the frequency of the data index is mapped to a seasonal period as shown in seasonal_period.
- max_sp_to_consider: Optional[int], default = 60
Max period to consider when detecting seasonal periods. If None, all periods up to int((“length of data”-1)/2) are considered. Length of the data is determined by hyperparameter_split setting.
- remove_harmonics: bool, default = False
Should harmonics be removed when considering what seasonal periods to use for modeling.
- harmonic_order_method: str, default = “harmonic_max”
Applicable when remove_harmonics = True. This determines how the harmonics are replaced. Allowed values are “harmonic_strength”, “harmonic_max” or “raw_strength”.
- If set to “harmonic_max”, then the lower seasonal period is replaced by its highest harmonic seasonal period, in the same position as the lower seasonal period.
- If set to “harmonic_strength”, then the lower seasonal period is replaced by its highest-strength harmonic seasonal period, in the same position as the lower seasonal period.
- If set to “raw_strength”, then the lower seasonal periods are removed and the higher harmonic seasonal periods are retained in their original position, based on seasonal strength.
e.g. Assuming the detected seasonal periods in strength order are [2, 3, 4, 50] and remove_harmonics = True, then:
- If harmonic_order_method = “harmonic_max”, result = [50, 3, 4]
- If harmonic_order_method = “harmonic_strength”, result = [4, 3, 50]
- If harmonic_order_method = “raw_strength”, result = [3, 4, 50]
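A rough sketch of the “harmonic_max” replacement rule applied to the worked example above (an illustration only, not PyCaret's actual implementation):

```python
def remove_harmonics_max(sps):
    """Replace each seasonal period by its largest detected harmonic,
    keeping the original (strength-ordered) position."""
    replaced = []
    for sp in sps:
        # Harmonics of sp are the other detected periods that sp divides.
        harmonics = [c for c in sps if c != sp and c % sp == 0]
        replaced.append(max(harmonics) if harmonics else sp)
    # Drop duplicates while keeping the first occurrence.
    seen, result = set(), []
    for sp in replaced:
        if sp not in seen:
            seen.add(sp)
            result.append(sp)
    return result

print(remove_harmonics_max([2, 3, 4, 50]))  # [50, 3, 4]
```

Here 2 divides both 4 and 50, so it is replaced by its largest harmonic (50); 3 and 4 have no detected harmonics and survive unchanged, reproducing the “harmonic_max” result shown above.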
- num_sps_to_use: int, default = 1
It determines the maximum number of seasonal periods to use in the models. Set to -1 to use all detected seasonal periods (in models that allow multiple seasonalities). If a model only allows one seasonal period and num_sps_to_use > 1, then the most dominant (primary) seasonal period that is detected is used.
- seasonality_type: str, default = “mul”
The type of seasonality to use. Allowed values are [“add”, “mul” or “auto”]
The detection flow sequence is as follows:
(1) If seasonality is not detected, then the seasonality type is set to None.
(2) If seasonality is detected but the data is not strictly positive, then the seasonality type is set to “add”.
(3) If seasonality_type is “auto”, then the type of seasonality is determined using an internal algorithm as follows: the data is decomposed using both additive and multiplicative seasonal decomposition, and the seasonality type is selected based on seasonality strength per FPP (https://otexts.com/fpp2/seasonal-strength.html). NOTE: For the multiplicative case, the denominator multiplies the seasonal and residual components instead of adding them; the rest of the calculation remains the same. If seasonal decomposition fails for any reason, it defaults to multiplicative seasonality.
(4) Otherwise, seasonality_type is set to the user provided value.
- point_alpha: Optional[float], default = None
The alpha (quantile) value to use for the point predictions. By default this is set to None which uses sktime’s predict() method to get the point prediction (the mean or the median of the forecast distribution). If this is set to a floating point value, then it switches to using the predict_quantiles() method to get the point prediction at the user specified quantile. Reference: https://robjhyndman.com/hyndsight/quantile-forecasts-in-r/
NOTE: (1) Not all models support predict_quantiles(), hence, if a float value is provided, these models will be disabled. (2) Under some conditions, the user may want to only work with models that support prediction intervals. Utilizing note 1 to our advantage, the point_alpha argument can be set to 0.5 (or any float value depending on the quantile that the user wants to use for point predictions). This will disable models that do not support prediction intervals.
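As a rough numerical illustration of a quantile point forecast (using numpy on simulated samples; sktime's predict_quantiles() works per horizon step on the model's forecast distribution, not on raw samples like this):

```python
import numpy as np

# Simulate a forecast distribution for a single horizon step.
rng = np.random.default_rng(42)
forecast_samples = rng.normal(loc=100.0, scale=10.0, size=10_000)

# point_alpha = 0.5 gives the median point forecast; a higher alpha
# gives a deliberately conservative (higher) point forecast.
median_point = np.quantile(forecast_samples, 0.5)
upper_point = np.quantile(forecast_samples, 0.9)
```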
- coverage: Union[float, List[float]], default = 0.9
The coverage to be used for prediction intervals (only applicable for models that support prediction intervals).
If a float value is provided, it corresponds to the coverage needed (e.g. 0.9 means 90% coverage). This corresponds to lower and upper quantiles of 0.05 and 0.95 respectively.
Alternately, if the user wants to get the intervals at specific quantiles, a list of 2 values can be provided directly. e.g. coverage = [0.2, 0.9] will return the lower interval corresponding to a quantile of 0.2 and the upper interval corresponding to a quantile of 0.9.
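The coverage-to-quantiles translation described above can be sketched as follows (an illustrative helper, not part of the PyCaret API):

```python
def coverage_to_quantiles(coverage):
    """Translate a coverage spec into (lower, upper) quantiles.

    A float c maps to ((1 - c) / 2, 1 - (1 - c) / 2), i.e. a symmetric
    interval; a 2-element list is used as the quantiles directly.
    """
    if isinstance(coverage, float):
        alpha = (1.0 - coverage) / 2.0
        return (alpha, 1.0 - alpha)
    lower, upper = coverage
    return (lower, upper)
```

For example, `coverage_to_quantiles(0.9)` yields quantiles of roughly 0.05 and 0.95, matching the default described above.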
- enforce_exogenous: bool, default = True
When set to True and the data includes exogenous variables, only models that support exogenous variables are loaded in the environment. When set to False, all models are included, and in this case models that do not support exogenous variables will model the data as a univariate forecasting problem.
- n_jobs: int, default = -1
The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor, set n_jobs to None.
- use_gpu: bool or str, default = False
Parameter not in use for now. Behavior may change in future.
- custom_pipeline: list of (str, transformer), dict or Pipeline, default = None
Parameter not in use for now. Behavior may change in future.
- html: bool, default = True
When set to False, prevents runtime display of monitor. This must be set to False when the environment does not support IPython. For example, command line terminal, Databricks Notebook, Spyder and other similar IDEs.
- session_id: int, default = None
Controls the randomness of experiment. It is equivalent to ‘random_state’ in scikit-learn. When None, a pseudo random number is generated. This can be used for later reproducibility of the entire experiment.
- system_log: bool or str or logging.Logger, default = True
Whether to save the system logging file (as logs.log). If the input is a string, use that as the path to the logging file. If the input already is a logger object, use that one instead.
- log_experiment: bool, default = False
When set to True, all metrics and parameters are logged on the MLflow server.
- experiment_name: str, default = None
Name of the experiment for logging. Ignored when log_experiment is not True.
- log_plots: bool or list, default = False
When set to True, certain plots are logged automatically on the MLflow server. To change the type of plots to be logged, pass a list containing plot IDs. Refer to the documentation of plot_model. Ignored when log_experiment is not True.
- log_profile: bool, default = False
When set to True, the data profile is logged on the MLflow server as an html file. Ignored when log_experiment is not True.
- log_data: bool, default = False
When set to True, the dataset is logged on the MLflow server as a csv file. Ignored when log_experiment is not True.
- engine: Optional[Dict[str, str]] = None
The engine to use for the models, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine={“auto_arima”: “statsforecast”}
- verbose: bool, default = True
When set to False, the information grid is not printed.
- profile: bool, default = False
When set to True, an interactive EDA report is displayed.
- profile_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the ProfileReport method used to create the EDA report. Ignored if profile is False.
- fig_kwargs: dict, default = {} (empty dict)
The global setting for any plots. Pass these as key-value pairs. Example: fig_kwargs = {“height”: 1000, “template”: “simple_white”}
Available keys are:
- hoverinfo: hoverinfo passed to Plotly figures. Can be any value supported
by Plotly (e.g. “text” to display, “skip” or “none” to disable.). When not provided, hovering over certain plots may be disabled by PyCaret when the data exceeds a certain number of points (determined by big_data_threshold).
- renderer: The renderer used to display the plotly figure. Can be any value
supported by Plotly (e.g. “notebook”, “png”, “svg”, etc.). Note that certain renderers (like “svg”) may need additional libraries to be installed. Users will have to do this manually since they don’t come preinstalled with plotly. When not provided, plots use plotly’s default renderer when the data is below a certain number of points (determined by big_data_threshold); otherwise it switches to a static “png” renderer.
- template: The template to use for the plots. Can be any value supported by Plotly.
If not provided, defaults to “ggplot2”
- width: The width of the plot in pixels. If not provided, defaults to None
which lets Plotly decide the width.
- height: The height of the plot in pixels. If not provided, defaults to None
which lets Plotly decide the height.
- rows: The number of rows to use for plots where this can be customized,
e.g. ccf. If not provided, defaults to None which lets PyCaret decide based on number of subplots to be plotted.
- cols: The number of columns to use for plots where this can be customized,
e.g. ccf. If not provided, defaults to 4
- big_data_threshold: The number of data points above which hovering over
certain plots can be disabled and/or renderer switched to a static renderer. This is useful when the time series being modeled has a lot of data which can make notebooks slow to render. Also note that setting the display_format to a plotly-resampler figure (“plotly-dash” or “plotly-widget”) can circumvent these problems by performing dynamic data aggregation.
- resampler_kwargs: The keyword arguments used to configure the
plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”), e.g. which downsampler will be used and how many data points are shown in the front-end. When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash method and its args.
example:
fig_kwargs = {
    ...,
    "resampler_kwargs": {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
- Returns
Global variables that can be changed using the set_config function.
- compare_models(include: Optional[List[Union[str, Any]]] = None, exclude: Optional[List[str]] = None, fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, sort: str = 'MASE', n_select: int = 1, budget_time: Optional[float] = None, turbo: bool = True, errors: str = 'ignore', fit_kwargs: Optional[dict] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, engine: Optional[Dict[str, str]] = None, verbose: bool = True, parallel: Optional[ParallelBackend] = None)
This function trains and evaluates the performance of all estimators available in the model library using cross validation. The output of this function is a score grid with average cross validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> best_model = compare_models()
- include: list of str or sktime compatible object, default = None
To train and evaluate select models, a list containing model IDs or scikit-learn compatible objects can be passed in the include param. To see a list of all models available in the model library, use the models function.
- exclude: list of str, default = None
To omit certain models from training and evaluation, pass a list containing model IDs in the exclude parameter. To see a list of all models available in the model library, use the models function.
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- cross_validation: bool, default = True
When set to False, metrics are evaluated on the holdout set. The fold param is ignored when cross_validation is set to False.
- sort: str, default = ‘MASE’
The sort order of the score grid. It also accepts custom metrics that are added through the add_metric function.
- n_select: int, default = 1
Number of top_n models to return. For example, to select top 3 models use n_select = 3.
- budget_time: int or float, default = None
If not None, will terminate execution of the function after budget_time minutes have passed and return results up to that point.
- turbo: bool, default = True
When set to True, it excludes estimators with longer training times. To see which algorithms are excluded, use the models function.
- errors: str, default = ‘ignore’
When set to ‘ignore’, models that raise exceptions are skipped and execution continues. If ‘raise’, the function will break when an exception is raised.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- engine: Optional[Dict[str, str]] = None
The engine to use for the models, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine={“auto_arima”: “statsforecast”}
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- parallel: pycaret.internal.parallel.parallel_backend.ParallelBackend, default = None
A ParallelBackend instance. For example, if you have a SparkSession session, you can use FugueBackend(session) to make this function run using Spark. For more details, see FugueBackend.
- Returns
Trained model or list of trained models, depending on the n_select param.
Warning
Changing turbo parameter to False may result in very high training times.
No models are logged in MLflow when the cross_validation parameter is False.
- create_model(estimator: Union[str, Any], fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, fit_kwargs: Optional[dict] = None, experiment_custom_tags: Optional[Dict[str, Any]] = None, engine: Optional[str] = None, verbose: bool = True, **kwargs)
This function trains and evaluates the performance of a given estimator using cross validation. The output of this function is a score grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions. All the available models can be accessed using the models function.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> naive = create_model('naive')
- estimator: str or sktime compatible object
ID of an estimator available in model library or pass an untrained model object consistent with scikit-learn API. Estimators available in the model library (ID - Name):
NOTE: The available estimators depend on multiple factors such as what libraries have been installed and the setup of the experiment. As such, some of these may not be available for your experiment. To see the list of available models, please run setup() first, then models().
‘naive’ - Naive Forecaster
‘grand_means’ - Grand Means Forecaster
‘snaive’ - Seasonal Naive Forecaster (disabled when seasonal_period = 1)
‘polytrend’ - Polynomial Trend Forecaster
‘arima’ - ARIMA family of models (ARIMA, SARIMA, SARIMAX)
‘auto_arima’ - Auto ARIMA
‘exp_smooth’ - Exponential Smoothing
‘stlf’ - STL Forecaster
‘croston’ - Croston Forecaster
‘ets’ - ETS
‘theta’ - Theta Forecaster
‘tbats’ - TBATS
‘bats’ - BATS
‘prophet’ - Prophet Forecaster
‘lr_cds_dt’ - Linear w/ Cond. Deseasonalize & Detrending
‘en_cds_dt’ - Elastic Net w/ Cond. Deseasonalize & Detrending
‘ridge_cds_dt’ - Ridge w/ Cond. Deseasonalize & Detrending
‘lasso_cds_dt’ - Lasso w/ Cond. Deseasonalize & Detrending
‘llar_cds_dt’ - Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending
‘br_cds_dt’ - Bayesian Ridge w/ Cond. Deseasonalize & Detrending
‘huber_cds_dt’ - Huber w/ Cond. Deseasonalize & Detrending
‘omp_cds_dt’ - Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending
‘knn_cds_dt’ - K Neighbors w/ Cond. Deseasonalize & Detrending
‘dt_cds_dt’ - Decision Tree w/ Cond. Deseasonalize & Detrending
‘rf_cds_dt’ - Random Forest w/ Cond. Deseasonalize & Detrending
‘et_cds_dt’ - Extra Trees w/ Cond. Deseasonalize & Detrending
‘gbr_cds_dt’ - Gradient Boosting w/ Cond. Deseasonalize & Detrending
‘ada_cds_dt’ - AdaBoost w/ Cond. Deseasonalize & Detrending
‘lightgbm_cds_dt’ - Light Gradient Boosting w/ Cond. Deseasonalize & Detrending
‘catboost_cds_dt’ - CatBoost w/ Cond. Deseasonalize & Detrending
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- cross_validation: bool, default = True
When set to False, metrics are evaluated on the holdout set. The fold param is ignored when cross_validation is set to False.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- engine: Optional[str] = None
The engine to use for the model, e.g. for auto_arima, users can switch between “pmdarima” and “statsforecast” by specifying engine=”statsforecast”.
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- **kwargs:
Additional keyword arguments to pass to the estimator.
- Returns
Trained Model
Warning
Models are not logged on the MLflow server when the cross_validation param is set to False.
- static update_fit_kwargs_with_fh_from_cv(fit_kwargs: Optional[Dict], cv) → Dict
Updates the fit_kwargs to include the fh parameter from cv.
- Parameters
fit_kwargs (Optional[Dict]) – Original fit kwargs
cv ([type]) – cross validation object
- Returns
Updated fit kwargs
- Return type
Dict[Any]
- tune_model(estimator, fold: Optional[Union[int, Any]] = None, round: int = 4, n_iter: int = 10, custom_grid: Optional[Union[Dict[str, list], Any]] = None, optimize: str = 'MASE', custom_scorer=None, search_algorithm: Optional[str] = None, choose_better: bool = True, fit_kwargs: Optional[dict] = None, return_tuner: bool = False, verbose: bool = True, tuner_verbose: Union[int, bool] = True, **kwargs)
This function tunes the hyperparameters of a given estimator. The output of this function is a score grid with CV scores by fold of the best model selected based on the optimize parameter. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> dt = create_model('dt_cds_dt')
>>> tuned_dt = tune_model(dt)
- estimator: sktime compatible object
Trained model object
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- n_iter: int, default = 10
Number of iterations in the grid search. Increasing ‘n_iter’ may improve model performance but also increases the training time.
- custom_grid: dictionary, default = None
To define a custom search space for hyperparameters, pass a dictionary with parameter names and the values to be iterated. Custom grids must be in a format supported by the defined search_library.
- optimize: str, default = ‘MASE’
Metric name to be evaluated for hyperparameter tuning. It also accepts custom metrics that are added through the add_metric function.
- custom_scorer: object, default = None
A custom scoring strategy can be passed to tune the hyperparameters of the model. It must be created using sklearn.make_scorer. It is equivalent to adding a custom metric using the add_metric function and passing the name of the custom metric in the optimize parameter. Will be deprecated in the future.
- search_algorithm: str, default = ‘random’
use ‘random’ for random grid search and ‘grid’ for complete grid search.
- choose_better: bool, default = True
When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the tuner.
- return_tuner: bool, default = False
When set to True, will return a tuple of (model, tuner_object).
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- tuner_verbose: bool or int, default = True
If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when the verbose param is False.
- **kwargs:
Additional keyword arguments to pass to the optimizer.
- Returns
Trained Model and Optional Tuner Object when return_tuner is True.
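To make the search_algorithm options above concrete, the difference between ‘grid’ (enumerate every combination of custom_grid) and ‘random’ (sample n_iter combinations) can be sketched in plain Python. This is an illustrative analogue only, not PyCaret's internal tuner, and the helper names are hypothetical:

```python
import itertools
import random

def grid_candidates(grid):
    """Enumerate every combination in the grid ('grid' search)."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def random_candidates(grid, n_iter, seed=42):
    """Sample up to n_iter combinations at random ('random' search)."""
    rng = random.Random(seed)
    pool = list(grid_candidates(grid))
    return rng.sample(pool, min(n_iter, len(pool)))

# A toy custom_grid: 2 x 2 = 4 combinations in total.
grid = {"sp": [1, 12], "deseasonal_model": ["additive", "multiplicative"]}
print(len(list(grid_candidates(grid))))   # 4
print(len(random_candidates(grid, 3)))    # 3
```

In both cases each candidate dict would be handed to the model's constructor and scored by CV on the optimize metric; ‘random’ trades exhaustiveness for a bounded number of fits (n_iter).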
- blend_models(estimator_list: list, method: str = 'mean', fold: Optional[Union[int, Any]] = None, round: int = 4, choose_better: bool = False, optimize: str = 'MASE', weights: Optional[List[float]] = None, fit_kwargs: Optional[dict] = None, verbose: bool = True)
This function trains an EnsembleForecaster for the models passed in the estimator_list param. It trains a sktime EnsembleForecaster under the hood. Refer to its documentation for more details.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> top3 = compare_models(n_select = 3)
>>> blender = blend_models(top3)
- estimator_list: list of sktime compatible estimators
List of model objects
- method: str, default = ‘mean’
Method to average the individual predictions to form a final prediction. Available Methods:
‘mean’ - Mean of individual predictions
‘gmean’ - Geometric Mean of individual predictions
‘median’ - Median of individual predictions
‘min’ - Minimum of individual predictions
‘max’ - Maximum of individual predictions
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the ‘n_splits’ parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- choose_better: bool, default = False
When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.
- optimize: str, default = ‘MASE’
Metric to compare for model selection when choose_better is True.
- weights: list, default = None
Sequence of weights (float or int) to apply to the individual model predictions. Uses uniform weights when None. Note that weights only apply to the ‘mean’, ‘gmean’ and ‘median’ methods.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- Returns
Trained Model
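As a rough sketch of the averaging methods listed above (illustrative only; the real blending is performed by sktime's EnsembleForecaster, and for brevity this sketch applies weights only to ‘mean’ and ‘gmean’, not to the weighted median):

```python
import math
import statistics

def blend(predictions, method="mean", weights=None):
    """Combine one-step forecasts from several models into a single value.

    predictions: list of floats, one per model.
    """
    if method == "mean":
        w = weights or [1.0] * len(predictions)
        return sum(wi * p for wi, p in zip(w, predictions)) / sum(w)
    if method == "gmean":
        # Weighted geometric mean via logs; requires positive predictions.
        w = weights or [1.0] * len(predictions)
        return math.exp(sum(wi * math.log(p) for wi, p in zip(w, predictions)) / sum(w))
    if method == "median":
        return statistics.median(predictions)
    if method == "min":
        return min(predictions)
    if method == "max":
        return max(predictions)
    raise ValueError(f"unknown method: {method}")

preds = [100.0, 110.0, 130.0]
print(blend(preds, "mean"))    # ≈ 113.33
print(blend(preds, "median"))  # 110.0
```

The ‘gmean’ option dampens the influence of one model that forecasts far higher than the rest, which is why it can be preferable to a plain mean for multiplicative series.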
- static plot_model_check_display_format_(display_format: Optional[str])
Checks if the display format is in the allowed list.
- display_format: Optional[str], default = None
The display method to be used.
- plot_model(estimator: Optional[Any] = None, plot: Optional[str] = None, return_fig: bool = False, return_data: bool = False, verbose: bool = False, display_format: Optional[str] = None, data_kwargs: Optional[Dict] = None, fig_kwargs: Optional[Dict] = None, save: Union[str, bool] = False) → Optional[Tuple[str, list]]
This function analyzes the performance of a trained model on the holdout set. When used without an estimator, this function generates plots on the original data set. When used with an estimator, it generates plots on the model residuals.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "acf": True, "pacf": True})
>>> plot_model(plot="diff", data_kwargs={"lags_list": [[1], [1, 12]], "acf": True, "pacf": True})
>>> arima = create_model('arima')
>>> plot_model(plot = 'ts')
>>> plot_model(plot = 'decomp', data_kwargs = {'type' : 'multiplicative'})
>>> plot_model(plot = 'decomp', data_kwargs = {'seasonal_period': 24})
>>> plot_model(estimator = arima, plot = 'forecast', data_kwargs = {'fh' : 24})
>>> tuned_arima = tune_model(arima)
>>> plot_model([arima, tuned_arima], data_kwargs={"labels": ["Baseline", "Tuned"]})
- estimator: sktime compatible object, default = None
Trained model object
- plot: str, default = None
Default is ‘ts’ when estimator is None. When estimator is not None, the default changes to ‘forecast’. List of available plots (ID - Name):
‘ts’ - Time Series Plot
‘train_test_split’ - Train Test Split
‘cv’ - Cross Validation
‘acf’ - Auto Correlation (ACF)
‘pacf’ - Partial Auto Correlation (PACF)
‘decomp’ - Classical Decomposition
‘decomp_stl’ - STL Decomposition
‘diagnostics’ - Diagnostics Plot
‘diff’ - Difference Plot
‘periodogram’ - Frequency Components (Periodogram)
‘fft’ - Frequency Components (FFT)
‘ccf’ - Cross Correlation (CCF)
‘forecast’ - “Out-of-Sample” Forecast Plot
‘insample’ - “In-Sample” Forecast Plot
‘residuals’ - Residuals Plot
- return_fig: bool, default = False
When set to True, it returns the figure used for plotting. When set to False (the default), it will print the plot, but not return it.
- return_data: bool, default = False
When set to True, it returns the data for plotting. If both return_fig and return_data are set to True, the order of return is figure then data.
- verbose: bool, default = False
Unused for now
- display_format: str, default = None
Display format of the plot. Must be one of [None, ‘streamlit’, ‘plotly-dash’, ‘plotly-widget’], if None, it will render the plot as a plain plotly figure.
The ‘plotly-dash’ and ‘plotly-widget’ formats will render the figure via plotly-resampler (https://github.com/predict-idlab/plotly-resampler) figures. These plots perform dynamic aggregation of the data based on the front-end graph view. This approach is especially useful when dealing with large data, as it will retain snappy, interactive performance.
‘plotly-dash’ uses a dash-app to realize this dynamic aggregation. The dash app requires a network port, and can be configured with various modes; more information can be found in the show_dash documentation. (https://predict-idlab.github.io/plotly-resampler/figure_resampler.html#plotly_resampler.figure_resampler.FigureResampler.show_dash)
‘plotly-widget’ uses a plotly FigureWidget to realize this dynamic aggregation, and should work in IPython based environments (given that the external widgets are supported and the jupyterlab-plotly extension is installed).
To display plots in Streamlit (https://www.streamlit.io/), set this to ‘streamlit’.
- data_kwargs: dict, default = None
Dictionary of arguments passed to the data for plotting.
Available keys are:
- nlags: The number of lags to use when plotting correlation plots, e.g.
ACF, PACF, CCF. If not provided, default internally calculated values are used.
- seasonal_period: The seasonal period to use for decomposition plots.
If not provided, the default internally detected seasonal period is used.
- type: The type of seasonal decomposition to perform. Options are:
[“additive”, “multiplicative”]
- order_list: The differencing orders to use for difference plots. e.g.
[1, 2] will plot first and second order differences (corresponding to d = 1 and 2 in ARIMA models).
- lags_list: A more explicit alternative to “order_list”, allowing users to specify the exact lags to plot. e.g. [1, [1, 12]] will plot the first difference and a second plot with the first difference (d = 1 in ARIMA) and the seasonal 12th difference (D = 1, s = 12 in ARIMA models). Also note that “order_list” = [2] can alternately be specified as lags_list = [[1, 1]], i.e. successive differencing twice.
- acf: True/False
When specified in difference plots and set to True, this will plot the ACF of the differenced data as well.
- pacf: True/False
When specified in difference plots and set to True, this will plot the PACF of the differenced data as well.
- periodogram: True/False
When specified in difference plots and set to True, this will plot the Periodogram of the differenced data as well.
- fft: True/False
When specified in difference plots and set to True, this will plot the FFT of the differenced data as well.
- labels: When estimator(s) are provided, the corresponding labels to
use for the plots. If not provided, the model class is used to derive the labels.
- include: When data contains exogenous variables, then only specific
exogenous variables can be plotted using this key. e.g. include = [“col1”, “col2”]
- exclude: When data contains exogenous variables, specific exogenous
variables can be excluded from the plots using this key. e.g. exclude = [“col1”, “col2”]
- alpha: The quantile value to use for point prediction. If not provided,
then the value specified during setup is used.
- coverage: The coverage value to use for prediction intervals. If not
provided, then the value specified during setup is used.
- fh: The forecast horizon to use for forecasting. If not provided, then
the one used during model training is used.
- X: When a model trained with exogenous variables has been finalized,
user can provide the future values of the exogenous variables to make future target time series predictions using this key.
- plot_data_type: When plotting the data used for modeling, user may
wish to see plots with the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.
NOTE: (1) If no imputation is specified, then plotting the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then plotting the “transformed” data type will produce the same results as the “imputed” data type.
Allowed values are (if not specified, defaults to the first one in the list):
“ts”: [“original”, “imputed”, “transformed”]
“train_test_split”: [“original”, “imputed”, “transformed”]
“cv”: [“original”]
“acf”: [“transformed”, “imputed”, “original”]
“pacf”: [“transformed”, “imputed”, “original”]
“decomp”: [“transformed”, “imputed”, “original”]
“decomp_stl”: [“transformed”, “imputed”, “original”]
“diagnostics”: [“transformed”, “imputed”, “original”]
“diff”: [“transformed”, “imputed”, “original”]
“forecast”: [“original”, “imputed”]
“insample”: [“original”, “imputed”]
“residuals”: [“original”, “imputed”]
“periodogram”: [“transformed”, “imputed”, “original”]
“fft”: [“transformed”, “imputed”, “original”]
“ccf”: [“transformed”, “imputed”, “original”]
Some plots (marked as True below) will also allow specifying multiple data types at once.
“ts”: True
“train_test_split”: True
“cv”: False
“acf”: True
“pacf”: True
“decomp”: True
“decomp_stl”: True
“diagnostics”: True
“diff”: False
“forecast”: False
“insample”: False
“residuals”: False
“periodogram”: True
“fft”: True
“ccf”: False
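The “order_list” / “lags_list” semantics described above boil down to applying lag differences in succession. As a small sketch (illustrative only, not PyCaret's plotting code; the helper name is hypothetical):

```python
def difference(y, lags):
    """Apply successive lag differences.

    lags=[1] is the first difference (order_list d=1);
    lags=[1, 12] takes the first difference, then the seasonal
    12th difference of the result (d=1, D=1, s=12 in ARIMA terms);
    lags=[1, 1] is the same as order_list=[2] (differencing twice).
    """
    out = list(y)
    for lag in lags:
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out

y = [1, 2, 4, 7, 11, 16]
print(difference(y, [1]))     # [1, 2, 3, 4, 5]
print(difference(y, [1, 1]))  # [1, 1, 1, 1]
```

A trend-stationary series typically needs lags=[1]; a series with both trend and yearly seasonality on monthly data needs lags=[1, 12], which is exactly what the ‘diff’ plot with lags_list=[[1, 12]] visualizes.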
- fig_kwargs: dict, default = {} (empty dict)
The setting to be used for the plot. Overrides any global setting passed during setup. Pass these as key-value pairs. For available keys, refer to the setup documentation.
Time-series plots support more display_formats; as a result, fig_kwargs can also contain the resampler_kwargs key and its corresponding dict. These are additional keyword arguments that are fed to the display function. This is mainly used for configuring plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”), e.g. which downsampler will be used and how many data points are shown in the front-end.
When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash args.
example:
fig_kwargs = { "width": None, "resampler_kwargs": { "default_n_shown_samples": 1000, "show_dash": {"mode": "inline", "port": 9012} } }
- save: string or bool, default = False
When set to True, the plot is saved as a ‘png’ file in the current working directory. When a path destination is given, the plot is saved as a ‘png’ file at the given path.
- Returns
Path to saved file and list containing figure and data, if any.
- predict_model(estimator, fh=None, X: Optional[DataFrame] = None, return_pred_int: bool = False, alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, round: int = 4, verbose: bool = True) → DataFrame
This function forecasts using a trained model. When fh is None, it forecasts using the same forecast horizon used during training.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> arima = create_model('arima')
>>> pred_holdout = predict_model(arima)
>>> pred_unseen = predict_model(finalize_model(arima), fh = 24)
- estimator: sktime compatible object
Trained model object
- fh: Optional[Union[List[int], int, np.array, ForecastingHorizon]], default = None
Number of points from the last date of training to forecast. When fh is None, it forecasts using the same forecast horizon used during the training.
- X: pd.DataFrame, default = None
Exogenous variables to be used for prediction. Before finalizing the estimator, X need not be passed even when the estimator is built using exogenous variables (since this is taken care of internally by using the exogenous variables from the test split). When the estimator has been finalized and was trained with exogenous variables, X must be passed.
- return_pred_int: bool, default = False
When set to True, it returns lower bound and upper bound prediction interval, in addition to the point prediction.
- alpha: Optional[float], default = None
The alpha (quantile) value to use for the point predictions. Refer to the “point_alpha” description in the setup docstring for details.
- coverage: Union[float, List[float]], default = 0.9
The coverage to be used for prediction intervals. Refer to the “coverage” description in the setup docstring for details.
- round: int, default = 4
Number of decimal places to round predictions to.
- verbose: bool, default = True
When set to False, holdout score grid is not printed.
- Returns
pandas.DataFrame
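The relationship between the coverage argument and the returned interval bounds can be sketched as follows. This assumes the float form is treated as a symmetric coverage and the list form as explicit quantiles, as the setup docstring describes; the helper name is hypothetical, not a PyCaret API:

```python
def coverage_to_quantiles(coverage):
    """Map a coverage spec to (lower, upper) prediction-interval quantiles.

    A float is treated as symmetric coverage, e.g. 0.9 -> ~(0.05, 0.95);
    a 2-element list/tuple is taken as the quantiles directly.
    """
    if isinstance(coverage, (list, tuple)):
        lo, hi = coverage
        return (lo, hi)
    tail = (1.0 - coverage) / 2.0
    return (tail, 1.0 - tail)

print(coverage_to_quantiles(0.9))         # ≈ (0.05, 0.95)
print(coverage_to_quantiles([0.1, 0.9]))  # (0.1, 0.9)
```

So with the default coverage = 0.9 and return_pred_int = True, the lower/upper columns correspond to roughly the 5th and 95th percentile forecasts.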
- finalize_model(estimator, fit_kwargs: Optional[dict] = None, model_only: bool = False, experiment_custom_tags: Optional[Dict[str, Any]] = None) Any
This function trains a given estimator on the entire dataset including the holdout set.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> final_arima = finalize_model(arima)
- estimator: sktime compatible object
Trained model object
- fit_kwargs: dict, default = None
Dictionary of arguments passed to the fit method of the model.
- model_only: bool, default = False
Parameter not in use for now. Behavior may change in future.
- Returns
Trained pipeline or model object fitted on complete dataset.
- deploy_model(model, model_name: str, authentication: dict, platform: str = 'aws')
This function deploys the transformation pipeline and trained model on cloud.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> deploy_model(
>>>     model = arima, model_name = 'arima-for-deployment',
>>>     platform = 'aws',
>>>     authentication = {'bucket' : 'S3-bucket-name'}
>>> )
- Amazon Web Service (AWS) users:
To deploy a model on AWS S3 (‘aws’), environment variables must be set in your local environment. To configure AWS environment variables, type aws configure in the command line. The following information from the IAM portal of your Amazon console account is required:
AWS Access Key ID
AWS Secret Key Access
Default Region Name (can be seen under Global settings on your AWS console)
More info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
- Google Cloud Platform (GCP) users:
To deploy a model on Google Cloud Platform (‘gcp’), a project must be created using the command line or the GCP console. Once the project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment.
More info: https://cloud.google.com/docs/authentication/production
- Microsoft Azure (Azure) users:
To deploy a model on Microsoft Azure (‘azure’), environment variables for connection string must be set in your local environment. Go to settings of storage account on Azure portal to access the connection string required.
- model: scikit-learn compatible object
Trained model object
- model_name: str
Name of model.
- authentication: dict
Dictionary of applicable authentication tokens.
When platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’, ‘path’: (optional) folder name under the bucket}
When platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}
When platform = ‘azure’: {‘container’: ‘azure-container-name’}
- platform: str, default = ‘aws’
Name of the platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.
- Returns
None
- save_model(model, model_name: str, model_only: bool = False, verbose: bool = True)
This function saves the transformation pipeline and trained model object into the current working directory as a pickle file for later use.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> save_model(arima, 'saved_arima_model')
- model: sktime compatible object
Trained model object
- model_name: str
Name of the model.
- model_only: bool, default = False
When set to True, only trained model object is saved instead of the entire pipeline.
- verbose: bool, default = True
Success message is not printed when verbose is set to False.
- Returns
Tuple of the model object and the filename.
- load_model(model_name: str, platform: Optional[str] = None, authentication: Optional[Dict[str, str]] = None, verbose: bool = True)
This function loads a previously saved pipeline/model.
Example
>>> from pycaret.time_series import load_model
>>> saved_arima = load_model('saved_arima_model')
- model_name: str
Name of the model.
- platform: str, default = None
Name of the cloud platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.
- authentication: dict, default = None
dictionary of applicable authentication tokens.
when platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’}
when platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}
when platform = ‘azure’: {‘container’: ‘azure-container-name’}
- verbose: bool, default = True
Success message is not printed when verbose is set to False.
- Returns
Trained Model
- models(type: Optional[str] = None, internal: bool = False, raise_errors: bool = True) → DataFrame
Returns table of models available in the model library.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> models()
- type: str, default = None
baseline : filters and only return baseline models
classical : filters and only return classical models
linear : filters and only return linear models
tree : filters and only return tree based models
neighbors : filters and only return neighbors models
- internal: bool, default = False
When True, will return extra columns and rows used internally.
- raise_errors: bool, default = True
When False, will suppress all exceptions, ignoring models that couldn’t be created.
- Returns
pandas.DataFrame
- get_metrics(reset: bool = False, include_custom: bool = True, raise_errors: bool = True) → DataFrame
Returns table of available metrics used for CV.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> all_metrics = get_metrics()
- reset: bool, default = False
When True, will reset all changes made using the add_metric and remove_metric functions.
- include_custom: bool, default = True
Whether to include user added (custom) metrics or not.
- raise_errors: bool, default = True
If False, will suppress all exceptions, ignoring models that couldn’t be created.
- Returns
pandas.DataFrame
- add_metric(id: str, name: str, score_func: type, greater_is_better: bool = True, **kwargs) → Series
Adds a custom metric to be used for CV.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> from sklearn.metrics import explained_variance_score
>>> add_metric('evs', 'EVS', explained_variance_score)
- id: str
Unique id for the metric.
- name: str
Display name of the metric.
- score_func: type
Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
- greater_is_better: bool, default = True
Whether higher values of score_func are better or not.
- **kwargs:
Arguments to be passed to score function.
- Returns
pandas.Series
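A custom metric only has to honour the score_func(y, y_pred, **kwargs) signature described above. As a sketch, here is a MAPE-style error written that way (PyCaret already ships MAPE; this merely illustrates the expected shape of a function you could register yourself):

```python
def my_mape(y, y_pred, **kwargs):
    """Mean absolute percentage error with the score_func(y, y_pred, **kwargs)
    signature expected by add_metric. Lower is better, so it would be
    registered with greater_is_better=False. Assumes no zeros in y."""
    return sum(abs((a - p) / a) for a, p in zip(y, y_pred)) / len(y)

print(my_mape([100, 200], [110, 180]))  # ≈ 0.1
```

It would then be registered with something like add_metric('my_mape', 'MyMAPE', my_mape, greater_is_better=False) and selected for tuning via optimize='MyMAPE'.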
- remove_metric(name_or_id: str)
Removes a metric from CV.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> remove_metric('MAPE')
- name_or_id: str
Display name or ID of the metric.
- Returns
None
- get_logs(experiment_name: Optional[str] = None, save: bool = False) DataFrame
Returns a table of experiment logs. Only works when log_experiment is True when initializing the setup function.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> best = compare_models()
>>> exp_logs = get_logs()
- experiment_name: str, default = None
When None current active run is used.
- save: bool, default = False
When set to True, csv file is saved in current working directory.
- Returns
pandas.DataFrame
- get_fold_generator(fold: Optional[Union[int, Any]] = None, fold_strategy: Optional[str] = None) → Union[ExpandingWindowSplitter, SlidingWindowSplitter]
Returns the cv object based on number of folds and fold_strategy
- Parameters
fold (Optional[Union[int, Any]]) – The number of folds (int), by default None, which returns the fold generator (cv object) defined during setup. Could also be a sktime cross-validation object, in which case it is simply returned as-is.
fold_strategy (Optional[str], optional) – The fold strategy - ‘expanding’ or ‘sliding’, by default None which takes the strategy set during setup
- Returns
The sktime compatible cross-validation object. e.g. ExpandingWindowSplitter or SlidingWindowSplitter
- Return type
Union[ExpandingWindowSplitter, SlidingWindowSplitter]
- Raises
ValueError – If not enough data points to support the number of folds requested
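The ‘expanding’ vs ‘sliding’ distinction, and the ValueError above, can be sketched in plain Python. This is a simplified analogue of the fold_strategy behaviour, not sktime's splitters (which support initial windows, step lengths, and arbitrary horizons):

```python
def time_series_folds(n, fh, n_splits, strategy="expanding", window=None):
    """Yield (train_indices, test_indices) for time-ordered CV folds.

    'expanding': the training window grows each fold.
    'sliding':   the training window keeps a fixed length and slides forward.
    """
    initial = n - fh * n_splits  # size of the first training window
    if initial <= 0:
        raise ValueError("Not enough data points to support the number of folds requested")
    for k in range(n_splits):
        end = initial + k * fh
        start = 0 if strategy == "expanding" else max(0, end - (window or initial))
        yield list(range(start, end)), list(range(end, end + fh))

# 20 monthly points, fh=4, 3 folds: train sizes 8, 12, 16 (expanding).
for train, test in time_series_folds(n=20, fh=4, n_splits=3):
    print(len(train), test[0], test[-1])
```

With strategy="sliding" the same call keeps every training window at 8 points, which is the behaviour selected by fold_strategy='sliding' in setup.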
- check_stats(estimator: Optional[Any] = None, test: str = 'all', alpha: float = 0.05, split: str = 'all', data_type: str = 'transformed', data_kwargs: Optional[Dict] = None) → DataFrame
This function is used to get summary statistics and run statistical tests on the original data or model residuals.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> check_stats(test="summary")
>>> check_stats(test="adf")
>>> arima = create_model('arima')
>>> check_stats(arima, test = 'white_noise')
- Parameters
estimator (sktime compatible object, optional) – Trained model object, by default None
- test: str, optional
Name of the test to be performed, by default “all”
Options are:
‘summary’ - Summary Statistics
‘white_noise’ - Ljung-Box Test for white noise
‘adf’ - ADF test for difference stationarity
‘kpss’ - KPSS test for trend stationarity
‘stationarity’ - ADF and KPSS test
‘normality’ - Shapiro Test for Normality
‘all’ - All of the above tests
- alpha: float, optional
Significance Level, by default 0.05
- split: str, optional
The split of the original data to run the test on. Only applicable when test is run on the original data (not residuals), by default “all”
Options are:
‘all’ - Complete Dataset
‘train’ - The Training Split of the dataset
‘test’ - The Test Split of the dataset
- data_type: str, optional
The data type to use for the statistical test, by default “transformed”.
User may wish to perform the tests on the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.
Allowed values are: [“original”, “imputed”, “transformed”]
NOTE: (1) If no imputation is specified, then testing on the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then testing the “transformed” data type will produce the same results as the “imputed” data type.
By default, tests are done on the “transformed” data since that is the data that is fed to the model during training.
- data_kwargs: Optional[Dict], optional
Users can specify lags list or order_list to run the test for the data as well as for its lagged versions, by default None
>>> check_stats(test="white_noise", data_kwargs={"order_list": [1, 2]})
>>> check_stats(test="white_noise", data_kwargs={"lags_list": [1, [1, 12]]})
- Returns
Dataframe with the test results
- Return type
pd.DataFrame
- get_residuals(estimator: BaseForecaster) → Optional[Series]
Returns the insample residuals for the estimator.
- Parameters
estimator (BaseForecaster) – sktime compatible model (without the pipeline). i.e. last step of the pipeline TransformedTargetForecaster
- Returns
Insample residuals. None if estimator does not support insample predictions
- Return type
Optional[pd.Series]
References
https://github.com/sktime/sktime/issues/1105#issuecomment-932216820
- get_insample_predictions(estimator: BaseForecaster) → Optional[DataFrame]
Returns the insample predictions for the estimator by appropriately taking the entire pipeline into consideration.
- Parameters
estimator (BaseForecaster) – sktime compatible model (without the pipeline). i.e. last step of the pipeline TransformedTargetForecaster
- Returns
Insample predictions. None if estimator does not support insample predictions
- Return type
Optional[pd.DataFrame]
References
https://github.com/sktime/sktime/issues/1105#issuecomment-932216820
https://github.com/sktime/sktime/blob/87bdf36dbc0990f29942eb6f7fa56a8e6c5fa7b7/sktime/forecasting/base/_base.py#L699
- get_additional_scorer_kwargs() → Dict[str, Any]
Returns additional kwargs required by some scorers (such as MASE).
NOTE: These are kwargs that are experiment specific (can only be derived from the experiment), e.g. sp and not fold specific like y_train. In other words, these kwargs are applicable to all folds. Fold specific kwargs such as y_train, lower, upper, etc. must be updated dynamically.
- Returns
Additional kwargs to pass to scorers
- Return type
Dict[str, Any]
- pycaret.time_series.setup(data: Union[Series, DataFrame] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, target: Optional[str] = None, index: Optional[str] = None, ignore_features: Optional[List] = None, numeric_imputation_target: Optional[Union[str, int, float]] = None, numeric_imputation_exogenous: Optional[Union[str, int, float]] = None, transform_target: Optional[str] = None, transform_exogenous: Optional[str] = None, fe_target_rr: Optional[list] = None, fe_exogenous: Optional[list] = None, scale_target: Optional[str] = None, scale_exogenous: Optional[str] = None, fold_strategy: Union[str, Any] = 'expanding', fold: int = 3, fh: Optional[Union[List[int], int, ndarray, ForecastingHorizon]] = 1, hyperparameter_split: str = 'all', seasonal_period: Optional[Union[List[Union[int, str]], int, str]] = None, ignore_seasonality_test: bool = False, sp_detection: str = 'auto', max_sp_to_consider: Optional[int] = 60, remove_harmonics: bool = False, harmonic_order_method: str = 'harmonic_max', num_sps_to_use: int = 1, seasonality_type: str = 'mul', point_alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, enforce_exogenous: bool = True, n_jobs: Optional[int] = -1, use_gpu: bool = False, custom_pipeline: Optional[Any] = None, html: bool = True, session_id: Optional[int] = None, system_log: Union[bool, str, Logger] = True, log_experiment: bool = False, experiment_name: Optional[str] = None, log_plots: Union[bool, list] = False, log_profile: bool = False, log_data: bool = False, verbose: bool = True, profile: bool = False, profile_kwargs: Optional[Dict[str, Any]] = None, fig_kwargs: Optional[Dict[str, Any]] = None)
This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function. It takes one mandatory parameter: data. All the other parameters are optional.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
- data: pandas.Series or pandas.DataFrame = None
Shape (n_samples, 1) when pandas.DataFrame, otherwise (n_samples,).
- data_func: Callable[[], Union[pd.Series, pd.DataFrame]] = None
The function that generates data (the dataframe-like input). This is useful when the dataset is large, and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from driver to workers. Notice that one and only one of data and data_func must be set.
- target: Optional[str], default = None
Target name to be forecasted. Must be specified when data is a pandas DataFrame with more than 1 column. When data is a pandas Series or pandas DataFrame with 1 column, this can be left as None.
- index: Optional[str], default = None
Column name to be used as the datetime index for modeling. If the ‘index’ column is specified and is of type string, it is assumed to be coercible to pd.DatetimeIndex using pd.to_datetime(). It can also be of type Int (e.g. RangeIndex, Int64Index), or DatetimeIndex or PeriodIndex, in which case it is processed appropriately. If None, then the data’s index is used as-is for modeling.
- ignore_features: Optional[List], default = None
List of features to ignore for modeling when the data is a pandas Dataframe with more than 1 column. Ignored when data is a pandas Series or Dataframe with 1 column.
- numeric_imputation_target: Optional[Union[int, float, str]], default = None
Indicates how to impute missing values in the target. If None, no imputation is done. If the target has missing values, then imputation is mandatory. If str, then value passed as is to the underlying sktime imputer. Allowed values are:
“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”
If int or float, imputation method is set to “constant” with the given value.
- numeric_imputation_exogenous: Optional[Union[int, float, str]], default = None
Indicates how to impute missing values in the exogenous variables. If None, no imputation is done. If exogenous variables have missing values, then imputation is mandatory. If str, then value passed as is to the underlying sktime imputer. Allowed values are:
“drift”, “linear”, “nearest”, “mean”, “median”, “backfill”, “bfill”, “pad”, “ffill”, “random”
If int or float, imputation method is set to “constant” with the given value.
- transform_target: Optional[str], default = None
Indicates how the target variable should be transformed. If None, no transformation is performed. Allowed values are
“box-cox”, “log”, “sqrt”, “exp”, “cos”
- transform_exogenous: Optional[str], default = None
Indicates how the exogenous variables should be transformed. If None, no transformation is performed. Allowed values are
“box-cox”, “log”, “sqrt”, “exp”, “cos”
- scale_target: Optional[str], default = None
Indicates how the target variable should be scaled. If None, no scaling is performed. Allowed values are
“zscore”, “minmax”, “maxabs”, “robust”
- scale_exogenous: Optional[str], default = None
Indicates how the exogenous variables should be scaled. If None, no scaling is performed. Allowed values are
“zscore”, “minmax”, “maxabs”, “robust”
- fe_target_rr: Optional[list], default = None
The transformers to be applied to the target variable in order to extract useful features. By default, None, which means that the provided target variable is used "as is".
NOTE: Most statistical and baseline models already use features (lags) for target variables implicitly. The only place where target features have to be created explicitly is in reduced regression models. Hence, this feature extraction is only applied to reduced regression models.
>>> import numpy as np
>>> from pycaret.datasets import get_data
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> data = get_data("airline")
>>> kwargs = {"lag_feature": {"lag": [36, 24, 13, 12, 11, 9, 6, 3, 2, 1]}}
>>> fe_target_rr = [WindowSummarizer(n_jobs=1, truncate="bfill", **kwargs)]
>>> # Baseline
>>> exp = TSForecastingExperiment()
>>> exp.setup(data=data, fh=12, fold=3, session_id=42)
>>> model1 = exp.create_model("lr_cds_dt")
>>> # With Feature Engineering
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, fh=12, fold=3, fe_target_rr=fe_target_rr, session_id=42
>>> )
>>> model2 = exp.create_model("lr_cds_dt")
>>> exp.plot_model([model1, model2], data_kwargs={"labels": ["Baseline", "With FE"]})
- fe_exogenous: Optional[list], default = None
The transformations to be applied to the exogenous variables. These transformations are used for all models that accept exogenous variables. By default, None which means that the provided exogenous variables are used “as is”.
>>> import numpy as np
>>> from sktime.transformations.series.summarize import WindowSummarizer
>>> # Example: function num_above_thresh to count how many observations lie above
>>> # the threshold within a window of length 2, lagged by 0 periods.
>>> def num_above_thresh(x):
>>>     '''Count how many observations lie above threshold.'''
>>>     return np.sum((x > 0.7)[::-1])
>>> kwargs1 = {"lag_feature": {"lag": [0, 1], "mean": [[0, 4]]}}
>>> kwargs2 = {
>>>     "lag_feature": {
>>>         "lag": [0, 1], num_above_thresh: [[0, 2]],
>>>         "mean": [[0, 4]], "std": [[0, 4]]
>>>     }
>>> }
>>> fe_exogenous = [
>>>     (
>>>         "a", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Income"], truncate="bfill", **kwargs1
>>>         )
>>>     ),
>>>     (
>>>         "b", WindowSummarizer(
>>>             n_jobs=1, target_cols=["Unemployment", "Production"], truncate="bfill", **kwargs2
>>>         )
>>>     ),
>>> ]
>>> data = get_data("uschange")
>>> exp = TSForecastingExperiment()
>>> exp.setup(
>>>     data=data, target="Consumption", fh=12,
>>>     fe_exogenous=fe_exogenous, session_id=42
>>> )
>>> print(f"Feature Columns: {exp.get_config('X_transformed').columns}")
>>> model = exp.create_model("lr_cds_dt")
- fold_strategy: str or sklearn CV generator object, default = ‘expanding’
Choice of cross validation strategy. Possible values are:
‘expanding’
‘rolling’ (same as/aliased to ‘expanding’)
‘sliding’
You can also pass an sktime compatible cross validation object such as SlidingWindowSplitter or ExpandingWindowSplitter. In this case, the fold and fh parameters will be ignored and these values will be extracted from the fold_strategy object directly.
- fold: int, default = 3
Number of folds to be used in cross validation. Must be at least 2. This is a global setting that can be over-written at the function level by using the fold parameter. Ignored when fold_strategy is a custom object.
- fh: Optional[int or list or np.array or ForecastingHorizon], default = 1
The forecast horizon to be used for forecasting. Default is set to 1, i.e. forecast one point ahead. Valid options are:
(1) Integer: When an integer is passed, it means N contiguous points in the future without any gap.
(2) List or np.array: Indicates the points to predict in the future. e.g. fh = [1, 2, 3, 4] or np.arange(1, 5) will predict 4 points in the future.
If you want to forecast values with gaps, you can pass a list or array with gaps. e.g. np.arange(13, 25) will skip the first 12 future points and forecast from the 13th point to the 24th point ahead (note that in numpy the left value is inclusive and the right value is exclusive).
Can also be a sktime compatible ForecastingHorizon object.
If fh = None, then fold_strategy must be a sktime compatible cross validation object. In this case, fh is derived from this object.
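The horizon options above can be sketched with plain numpy (the values are illustrative; only the array construction is shown, not the setup call):

```python
import numpy as np

# Contiguous horizon: fh=4 is equivalent to forecasting points [1, 2, 3, 4]
contiguous = np.arange(1, 5)

# Horizon with a gap: skip the first 12 future points and forecast
# points 13 through 24 (left bound inclusive, right bound exclusive)
gapped = np.arange(13, 25)
```

Either array can then be passed directly as the fh argument to setup().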
- hyperparameter_split: str, default = “all”
The split of data used to determine certain hyperparameters such as "seasonal_period", whether multiplicative seasonality can be used or not, whether the data is white noise or not, and the values of the non-seasonal difference "d" and seasonal difference "D" to use in certain models. Allowed values are: ["all", "train"]. For more details, refer to: https://github.com/pycaret/pycaret/issues/3202
- seasonal_period: list or int or str, default = None
Seasonal periods to use when performing seasonality checks (i.e. candidates).
Users can provide seasonal_period by passing it as an integer or a string corresponding to the keys below (e.g. ‘W’ for weekly data, ‘M’ for monthly data, etc.).
B, C = 5
D = 7
W = 52
M, BM, CBM, MS, BMS, CBMS = 12
SM, SMS = 24
Q, BQ, QS, BQS = 4
A, Y, BA, BY, AS, YS, BAS, BYS = 1
H = 24
T, min = 60
S = 60
Users can also provide a list of such values to use in models that accept multiple seasonal values (currently TBATS). For models that don’t accept multiple seasonal values, the first value of the list will be used as the seasonal period.
NOTE: (1) If seasonal_period is provided, whether the seasonality check is performed or not depends on the ignore_seasonality_test setting. (2) If seasonal_period is not provided, then the candidates are detected per the sp_detection setting. If seasonal_period is provided, sp_detection setting is ignored.
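The alias-to-period table above can be mirrored as a plain dictionary for quick reference (an illustrative lookup, not PyCaret's internal table):

```python
# Illustrative mapping of pandas frequency aliases to candidate seasonal
# periods, mirroring the table above (not PyCaret's internal implementation).
FREQ_TO_SEASONAL_PERIOD = {
    "B": 5, "C": 5,        # business-day data
    "D": 7,                # daily data
    "W": 52,               # weekly data
    "M": 12, "MS": 12,     # monthly data
    "Q": 4, "QS": 4,       # quarterly data
    "Y": 1, "A": 1,        # yearly data
    "H": 24,               # hourly data
    "T": 60, "min": 60,    # minutely data
    "S": 60,               # secondly data
}
```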
- ignore_seasonality_test: bool = False
Whether to ignore the seasonality test or not. Applicable when seasonal_period is provided. If False, then a seasonality test is performed to determine if the provided seasonal_period is valid or not. If it is found to be not valid, no seasonal period is used for modeling. If True, then the provided seasonal_period is used as is.
- sp_detection: str, default = “auto”
If seasonal_period is None, then this parameter determines the algorithm to use to detect the seasonal periods to use in the models.
Allowed values are [“auto” or “index”].
If “auto”, then seasonal periods are detected using statistical tests. If “index”, then the frequency of the data index is mapped to a seasonal period as shown in seasonal_period.
- max_sp_to_consider: Optional[int], default = 60
Max period to consider when detecting seasonal periods. If None, all periods up to int((“length of data”-1)/2) are considered. Length of the data is determined by hyperparameter_split setting.
- remove_harmonics: bool, default = False
Should harmonics be removed when considering what seasonal periods to use for modeling.
- harmonic_order_method: str, default = “harmonic_max”
Applicable when remove_harmonics = True. This determines how the harmonics are replaced. Allowed values are "harmonic_strength", "harmonic_max" or "raw_strength".
- If set to "harmonic_max", then the lower seasonal period is replaced by its highest harmonic seasonal period, in the same position as the lower seasonal period.
- If set to "harmonic_strength", then the lower seasonal period is replaced by its highest strength harmonic seasonal period, in the same position as the lower seasonal period.
- If set to "raw_strength", then the lower seasonal period is removed and the higher harmonic seasonal period is retained in its original position based on its seasonal strength.
e.g. Assuming the detected seasonal periods in strength order are [2, 3, 4, 50] and remove_harmonics = True, then:
- If harmonic_order_method = "harmonic_max", result = [50, 3, 4]
- If harmonic_order_method = "harmonic_strength", result = [4, 3, 50]
- If harmonic_order_method = "raw_strength", result = [3, 4, 50]
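The "raw_strength" outcome in the example above can be sketched as a simple filter that drops a period whenever a higher harmonic (integer multiple) of it is also a candidate (an illustrative re-implementation, not PyCaret's internal code):

```python
def drop_lower_harmonics(periods):
    """Sketch of the "raw_strength" rule: remove a seasonal period when a
    higher harmonic (integer multiple) of it is also in the candidate list,
    keeping survivors in their original strength order. Illustrative only."""
    return [p for p in periods
            if not any(q != p and q % p == 0 for q in periods)]
```

For the example list [2, 3, 4, 50], the period 2 is dropped because 4 and 50 are harmonics of it, reproducing the [3, 4, 50] result.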
- num_sps_to_use: int, default = 1
It determines the maximum number of seasonal periods to use in the models. Set to -1 to use all detected seasonal periods (in models that allow multiple seasonalities). If a model only allows one seasonal period and num_sps_to_use > 1, then the most dominant (primary) seasonal period that is detected is used.
- seasonality_type: str, default = "mul"
The type of seasonality to use. Allowed values are [“add”, “mul” or “auto”]
The detection flow sequence is as follows:
(1) If seasonality is not detected, then the seasonality type is set to None.
(2) If seasonality is detected but the data is not strictly positive, then the seasonality type is set to "add".
(3) If seasonality_type is "auto", then the type of seasonality is determined using an internal algorithm as follows: if seasonality is detected, the data is decomposed using both additive and multiplicative seasonal decomposition, and the seasonality type is selected based on seasonality strength per FPP (https://otexts.com/fpp2/seasonal-strength.html). NOTE: For the multiplicative case, the denominator multiplies the seasonal and residual components instead of adding them; the rest of the calculation remains the same. If seasonal decomposition fails for any reason, it defaults to multiplicative seasonality.
(4) Otherwise, seasonality_type is set to the user provided value.
- point_alpha: Optional[float], default = None
The alpha (quantile) value to use for the point predictions. By default this is set to None which uses sktime’s predict() method to get the point prediction (the mean or the median of the forecast distribution). If this is set to a floating point value, then it switches to using the predict_quantiles() method to get the point prediction at the user specified quantile. Reference: https://robjhyndman.com/hyndsight/quantile-forecasts-in-r/
NOTE: (1) Not all models support predict_quantiles(), hence, if a float value is provided, these models will be disabled. (2) Under some conditions, the user may want to only work with models that support prediction intervals. Utilizing note 1 to our advantage, the point_alpha argument can be set to 0.5 (or any float value depending on the quantile that the user wants to use for point predictions). This will disable models that do not support prediction intervals.
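The effect of choosing a quantile-based point forecast can be sketched with plain numpy: for a skewed forecast distribution, the prediction at a chosen quantile (what point_alpha=0.5 requests via predict_quantiles()) can differ noticeably from the distribution mean. The sample values below are illustrative, not drawn from any real model.

```python
import numpy as np

# Illustrative skewed forecast samples for a single future point
samples = np.array([8.0, 9.0, 10.0, 11.0, 30.0])

median_point = np.quantile(samples, 0.5)  # point forecast at alpha = 0.5
mean_point = samples.mean()               # mean of the distribution
```

Here the outlier pulls the mean well above the median, which is why a user may prefer a quantile-based point prediction.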
- coverage: Union[float, List[float]], default = 0.9
The coverage to be used for prediction intervals (only applicable for models that support prediction intervals).
If a float value is provided, it corresponds to the coverage needed (e.g. 0.9 means 90% coverage). This corresponds to lower and upper quantiles of 0.05 and 0.95 respectively.
Alternately, if the user wants the intervals at specific quantiles, a list of 2 values can be provided directly. e.g. coverage = [0.2, 0.9] will return the lower interval corresponding to a quantile of 0.2 and the upper interval corresponding to a quantile of 0.9.
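The mapping from the coverage argument to quantiles can be sketched as follows (a hypothetical helper for illustration, not part of the PyCaret API):

```python
def coverage_to_quantiles(coverage):
    """Map the coverage argument to (lower, upper) quantiles: a float c
    becomes ((1-c)/2, 1-(1-c)/2); a 2-element list is taken as explicit
    quantiles. Hypothetical helper, not part of the PyCaret API."""
    if isinstance(coverage, (int, float)):
        alpha = (1.0 - coverage) / 2.0
        return (alpha, 1.0 - alpha)
    lower, upper = coverage
    return (lower, upper)
```

So coverage = 0.9 implies quantiles (0.05, 0.95), while coverage = [0.2, 0.9] is used as-is.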
- enforce_exogenous: bool, default = True
When set to True and the data includes exogenous variables, only models that support exogenous variables are loaded in the environment. When set to False, all models are included, and in this case, models that do not support exogenous variables will model the data as a univariate forecasting problem.
- n_jobs: int, default = -1
The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on a single processor set n_jobs to None.
- use_gpu: bool or str, default = False
Parameter not in use for now. Behavior may change in future.
- custom_pipeline: list of (str, transformer), dict or Pipeline, default = None
Parameter not in use for now. Behavior may change in future.
- html: bool, default = True
When set to False, prevents runtime display of monitor. This must be set to False when the environment does not support IPython. For example, command line terminal, Databricks Notebook, Spyder and other similar IDEs.
- session_id: int, default = None
Controls the randomness of experiment. It is equivalent to ‘random_state’ in scikit-learn. When None, a pseudo random number is generated. This can be used for later reproducibility of the entire experiment.
- system_log: bool or str or logging.Logger, default = True
Whether to save the system logging file (as logs.log). If the input is a string, use that as the path to the logging file. If the input already is a logger object, use that one instead.
- log_experiment: bool, default = False
When set to True, all metrics and parameters are logged on the MLflow server.
- experiment_name: str, default = None
Name of the experiment for logging. Ignored when log_experiment is not True.
- log_plots: bool or list, default = False
When set to True, certain plots are logged automatically in the MLflow server. To change the type of plots to be logged, pass a list containing plot IDs. Refer to the documentation of plot_model. Ignored when log_experiment is not True.
- log_profile: bool, default = False
When set to True, the data profile is logged on the MLflow server as an html file. Ignored when log_experiment is not True.
- log_data: bool, default = False
When set to True, the dataset is logged on the MLflow server as a csv file. Ignored when log_experiment is not True.
- verbose: bool, default = True
When set to False, Information grid is not printed.
- profile: bool, default = False
When set to True, an interactive EDA report is displayed.
- profile_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the ProfileReport method used to create the EDA report. Ignored if profile is False.
- fig_kwargs: dict, default = {} (empty dict)
The global setting for any plots. Pass these as key-value pairs. Example: fig_kwargs = {“height”: 1000, “template”: “simple_white”}
Available keys are:
- hoverinfo: hoverinfo passed to Plotly figures. Can be any value supported
by Plotly (e.g. “text” to display, “skip” or “none” to disable.). When not provided, hovering over certain plots may be disabled by PyCaret when the data exceeds a certain number of points (determined by big_data_threshold).
- renderer: The renderer used to display the plotly figure. Can be any value
supported by Plotly (e.g. "notebook", "png", "svg", etc.). Note that certain renderers (like "svg") may need additional libraries to be installed. Users will have to do this manually since they don't come preinstalled with plotly. When not provided, plots use plotly's default renderer when the data is below a certain number of points (determined by big_data_threshold); otherwise it switches to a static "png" renderer.
- template: The template to use for the plots. Can be any value supported by Plotly.
If not provided, defaults to “ggplot2”
- width: The width of the plot in pixels. If not provided, defaults to None
which lets Plotly decide the width.
- height: The height of the plot in pixels. If not provided, defaults to None
which lets Plotly decide the height.
- rows: The number of rows to use for plots where this can be customized,
e.g. ccf. If not provided, defaults to None which lets PyCaret decide based on number of subplots to be plotted.
- cols: The number of columns to use for plots where this can be customized,
e.g. ccf. If not provided, defaults to 4
- big_data_threshold: The number of data points above which hovering over
certain plots can be disabled and/or renderer switched to a static renderer. This is useful when the time series being modeled has a lot of data which can make notebooks slow to render. Also note that setting the display_format to a plotly-resampler figure (“plotly-dash” or “plotly-widget”) can circumvent these problems by performing dynamic data aggregation.
- resampler_kwargs: The keyword arguments fed to configure the
plotly-resampler visualizations (i.e., display_format "plotly-dash" or "plotly-widget"), e.g. which downsampler will be used and how many data points are shown in the front-end. When the plotly-resampler figure is rendered via Dash (by setting display_format to "plotly-dash"), one can also use the "show_dash" key within this dictionary to configure the show_dash method and its args.
example:
fig_kwargs = {
    ...,
    "resampler_kwargs": {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012}
    }
}
- Returns
Global variables that can be changed using the set_config function.
- pycaret.time_series.create_model(estimator: Union[str, Any], fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, fit_kwargs: Optional[dict] = None, engine: Optional[str] = None, verbose: bool = True, **kwargs)
This function trains and evaluates the performance of a given estimator using cross validation. The output of this function is a score grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions. All the available models can be accessed using the models function.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> naive = create_model('naive')
- estimator: str or sktime compatible object
ID of an estimator available in model library or pass an untrained model object consistent with scikit-learn API. Estimators available in the model library (ID - Name):
NOTE: The available estimators depend on multiple factors such as what libraries have been installed and the setup of the experiment. As such, some of these may not be available for your experiment. To see the list of available models, please run setup() first, then models().
‘naive’ - Naive Forecaster
‘grand_means’ - Grand Means Forecaster
‘snaive’ - Seasonal Naive Forecaster (disabled when seasonal_period = 1)
‘polytrend’ - Polynomial Trend Forecaster
‘arima’ - ARIMA family of models (ARIMA, SARIMA, SARIMAX)
‘auto_arima’ - Auto ARIMA
‘exp_smooth’ - Exponential Smoothing
‘stlf’ - STL Forecaster
‘croston’ - Croston Forecaster
‘ets’ - ETS
‘theta’ - Theta Forecaster
‘tbats’ - TBATS
‘bats’ - BATS
‘prophet’ - Prophet Forecaster
‘lr_cds_dt’ - Linear w/ Cond. Deseasonalize & Detrending
‘en_cds_dt’ - Elastic Net w/ Cond. Deseasonalize & Detrending
‘ridge_cds_dt’ - Ridge w/ Cond. Deseasonalize & Detrending
‘lasso_cds_dt’ - Lasso w/ Cond. Deseasonalize & Detrending
‘llar_cds_dt’ - Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending
‘br_cds_dt’ - Bayesian Ridge w/ Cond. Deseasonalize & Detrending
‘huber_cds_dt’ - Huber w/ Cond. Deseasonalize & Detrending
‘omp_cds_dt’ - Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending
‘knn_cds_dt’ - K Neighbors w/ Cond. Deseasonalize & Detrending
‘dt_cds_dt’ - Decision Tree w/ Cond. Deseasonalize & Detrending
‘rf_cds_dt’ - Random Forest w/ Cond. Deseasonalize & Detrending
‘et_cds_dt’ - Extra Trees w/ Cond. Deseasonalize & Detrending
‘gbr_cds_dt’ - Gradient Boosting w/ Cond. Deseasonalize & Detrending
‘ada_cds_dt’ - AdaBoost w/ Cond. Deseasonalize & Detrending
‘lightgbm_cds_dt’ - Light Gradient Boosting w/ Cond. Deseasonalize & Detrending
‘catboost_cds_dt’ - CatBoost w/ Cond. Deseasonalize & Detrending
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- cross_validation: bool, default = True
When set to False, metrics are evaluated on the holdout set. The fold param is ignored when cross_validation is set to False.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- engine: Optional[str], default = None
The engine to use for the model, e.g. for auto_arima, users can switch between "pmdarima" and "statsforecast" by specifying engine="statsforecast".
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- **kwargs:
Additional keyword arguments to pass to the estimator.
- Returns
Trained Model
Warning
Models are not logged on the MLflow server when the cross_validation param is set to False.
- pycaret.time_series.compare_models(include: Optional[List[Union[str, Any]]] = None, exclude: Optional[List[str]] = None, fold: Optional[Union[int, Any]] = None, round: int = 4, cross_validation: bool = True, sort: str = 'MASE', n_select: int = 1, budget_time: Optional[float] = None, turbo: bool = True, errors: str = 'ignore', fit_kwargs: Optional[dict] = None, engine: Optional[Dict[str, str]] = None, verbose: bool = True, parallel: Optional[ParallelBackend] = None)
This function trains and evaluates the performance of all estimators available in the model library using cross validation. The output of this function is a score grid with average cross validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> best_model = compare_models()
- include: list of str or sktime compatible object, default = None
To train and evaluate select models, a list containing model IDs or scikit-learn compatible objects can be passed in the include param. To see a list of all models available in the model library use the models function.
- exclude: list of str, default = None
To omit certain models from training and evaluation, pass a list containing model IDs in the exclude parameter. To see a list of all models available in the model library use the models function.
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- cross_validation: bool, default = True
When set to False, metrics are evaluated on the holdout set. The fold param is ignored when cross_validation is set to False.
- sort: str, default = ‘MASE’
The sort order of the score grid. It also accepts custom metrics that are added through the add_metric function.
- n_select: int, default = 1
Number of top_n models to return. For example, to select top 3 models use n_select = 3.
- budget_time: int or float, default = None
If not None, will terminate execution of the function after budget_time minutes have passed and return results up to that point.
- turbo: bool, default = True
When set to True, it excludes estimators with longer training times. To see which algorithms are excluded use the models function.
- errors: str, default = ‘ignore’
When set to ‘ignore’, will skip the model with exceptions and continue. If ‘raise’, will break the function when exceptions are raised.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- engine: Optional[Dict[str, str]], default = None
The engine to use for the models, e.g. for auto_arima, users can switch between "pmdarima" and "statsforecast" by specifying engine={"auto_arima": "statsforecast"}.
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- parallel: pycaret.internal.parallel.parallel_backend.ParallelBackend, default = None
A ParallelBackend instance. For example, if you have a SparkSession session, you can use FugueBackend(session) to make this function run using Spark. For more details, see FugueBackend.
- Returns
Trained model or list of trained models, depending on the n_select param.
Warning
Changing turbo parameter to False may result in very high training times.
No models are logged in MLflow when the cross_validation parameter is False.
- pycaret.time_series.tune_model(estimator, fold: Optional[Union[int, Any]] = None, round: int = 4, n_iter: int = 10, custom_grid: Optional[Union[Dict[str, list], Any]] = None, optimize: str = 'MASE', custom_scorer=None, search_algorithm: Optional[str] = None, choose_better: bool = True, fit_kwargs: Optional[dict] = None, return_tuner: bool = False, verbose: bool = True, tuner_verbose: Union[int, bool] = True, **kwargs)
This function tunes the hyperparameters of a given estimator. The output of this function is a score grid with CV scores by fold of the best selected model based on the optimize parameter. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> dt = create_model('dt_cds_dt')
>>> tuned_dt = tune_model(dt)
- estimator: sktime compatible object
Trained model object
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- n_iter: int, default = 10
Number of iterations in the grid search. Increasing ‘n_iter’ may improve model performance but also increases the training time.
- custom_grid: dictionary, default = None
To define a custom search space for hyperparameters, pass a dictionary with parameter names and values to be iterated. Custom grids must be in a format supported by the defined search_library.
- optimize: str, default = ‘MASE’
Metric name to be evaluated for hyperparameter tuning. It also accepts custom metrics that are added through the add_metric function.
- custom_scorer: object, default = None
A custom scoring strategy can be passed to tune hyperparameters of the model. It must be created using sklearn.make_scorer. It is the equivalent of adding a custom metric using the add_metric function and passing the name of the custom metric in the optimize parameter. Will be deprecated in future.
- search_algorithm: str, default = ‘random’
Use ‘random’ for random grid search and ‘grid’ for complete grid search.
- choose_better: bool, default = True
When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the tuner.
- return_tuner: bool, default = False
When set to True, will return a tuple of (model, tuner_object).
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- tuner_verbose: bool or int, default = True
If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when the verbose param is False.
- **kwargs:
Additional keyword arguments to pass to the optimizer.
- Returns
Trained Model and Optional Tuner Object when return_tuner is True.
- pycaret.time_series.blend_models(estimator_list: list, method: str = 'mean', fold: Optional[Union[int, Any]] = None, round: int = 4, choose_better: bool = False, optimize: str = 'MASE', weights: Optional[List[float]] = None, fit_kwargs: Optional[dict] = None, verbose: bool = True)
This function trains an EnsembleForecaster for select models passed in the estimator_list param. Trains a sktime EnsembleForecaster under the hood. Refer to its documentation for more details.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> top3 = compare_models(n_select = 3)
>>> blender = blend_models(top3)
- estimator_list: list of sktime compatible estimators
List of model objects
- method: str, default = ‘mean’
Method to average the individual predictions to form a final prediction. Available Methods:
‘mean’ - Mean of individual predictions
‘gmean’ - Geometric Mean of individual predictions
‘median’ - Median of individual predictions
‘min’ - Minimum of individual predictions
‘max’ - Maximum of individual predictions
- fold: int or scikit-learn compatible CV generator, default = None
Controls cross-validation. If None, the CV generator in the fold_strategy parameter of the setup function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the setup function.
- round: int, default = 4
Number of decimal places the metrics in the score grid will be rounded to.
- choose_better: bool, default = False
When set to True, the returned object is always better performing. The metric used for comparison is defined by the optimize parameter.
- optimize: str, default = ‘MASE’
Metric to compare for model selection when choose_better is True.
- weights: list, default = None
Sequence of weights (float or int) to apply to the individual model predictions. Uses uniform weights when None. Note that weights only apply to the ‘mean’, ‘gmean’ and ‘median’ methods.
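How the 'mean' method combines member forecasts with weights can be sketched in plain numpy (the forecast values and weights below are illustrative, not PyCaret internals):

```python
import numpy as np

# Each row holds one member model's forecasts for fh = [1, 2]
member_preds = np.array([
    [10.0, 12.0],   # model 1
    [14.0, 16.0],   # model 2
    [12.0, 14.0],   # model 3
])
weights = [0.5, 0.3, 0.2]

# Weighted average across the model axis, point by point
blended = np.average(member_preds, axis=0, weights=weights)
```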
- fit_kwargs: dict, default = {} (empty dict)
Dictionary of arguments passed to the fit method of the model.
- verbose: bool, default = True
Score grid is not printed when verbose is set to False.
- Returns
Trained Model
- pycaret.time_series.plot_model(estimator: Optional[Any] = None, plot: Optional[str] = None, return_fig: bool = False, return_data: bool = False, verbose: bool = False, display_format: Optional[str] = None, data_kwargs: Optional[Dict] = None, fig_kwargs: Optional[Dict] = None, save: Union[str, bool] = False) -> Optional[Tuple[str, list]]
This function analyzes the performance of a trained model on holdout set. When used without any estimator, this function generates plots on the original data set. When used with an estimator, it will generate plots on the model residuals.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "acf": True, "pacf": True})
>>> plot_model(plot="diff", data_kwargs={"lags_list": [[1], [1, 12]], "acf": True, "pacf": True})
>>> arima = create_model('arima')
>>> plot_model(plot = 'ts')
>>> plot_model(plot = 'decomp', data_kwargs = {'type' : 'multiplicative'})
>>> plot_model(plot = 'decomp', data_kwargs = {'seasonal_period': 24})
>>> plot_model(estimator = arima, plot = 'forecast', data_kwargs = {'fh' : 24})
>>> tuned_arima = tune_model(arima)
>>> plot_model([arima, tuned_arima], data_kwargs={"labels": ["Baseline", "Tuned"]})
- estimator: sktime compatible object, default = None
Trained model object
- plot: str, default = None
Default is ‘ts’ when estimator is None, When estimator is not None, default is changed to ‘forecast’. List of available plots (ID - Name):
‘ts’ - Time Series Plot
‘train_test_split’ - Train Test Split
‘cv’ - Cross Validation
‘acf’ - Auto Correlation (ACF)
‘pacf’ - Partial Auto Correlation (PACF)
‘decomp’ - Classical Decomposition
‘decomp_stl’ - STL Decomposition
‘diagnostics’ - Diagnostics Plot
‘diff’ - Difference Plot
‘periodogram’ - Frequency Components (Periodogram)
‘fft’ - Frequency Components (FFT)
‘ccf’ - Cross Correlation (CCF)
‘forecast’ - “Out-of-Sample” Forecast Plot
‘insample’ - “In-Sample” Forecast Plot
‘residuals’ - Residuals Plot
- return_fig: bool, default = False
When set to True, it returns the figure used for plotting. When set to False (the default), it will print the plot, but not return it.
- return_data: bool, default = False
When set to True, it returns the data for plotting. If both return_fig and return_data are set to True, the order of return is figure then data.
- verbose: bool, default = False
Unused for now
- display_format: str, default = None
To display plots in Streamlit (https://www.streamlit.io/), set this to ‘streamlit’. Currently, not all plots are supported.
- data_kwargs: dict, default = None
Dictionary of arguments passed to the data for plotting.
Available keys are:
- nlags: The number of lags to use when plotting correlation plots, e.g.
ACF, PACF, CCF. If not provided, default internally calculated values are used.
- seasonal_period: The seasonal period to use for decomposition plots.
If not provided, the default internally detected seasonal period is used.
- type: The type of seasonal decomposition to perform. Options are:
[“additive”, “multiplicative”]
- order_list: The differencing orders to use for difference plots. e.g.
[1, 2] will plot first and second order differences (corresponding to d = 1 and 2 in ARIMA models).
- lags_list: A more explicit alternative to “order_list”
allowing users to specify the exact lags to plot. e.g. [1, [1, 12]] will plot the first difference and a second plot with the first difference (d = 1 in ARIMA) plus the seasonal 12th difference (D = 1, s = 12 in ARIMA models). Also note that “order_list” = [2] can alternately be specified as lags_list = [[1, 1]], i.e. successive differencing twice.
- acf: True/False
When specified in difference plots and set to True, this will plot the ACF of the differenced data as well.
- pacf: True/False
When specified in difference plots and set to True, this will plot the PACF of the differenced data as well.
- periodogram: True/False
When specified in difference plots and set to True, this will plot the Periodogram of the differenced data as well.
- fft: True/False
When specified in difference plots and set to True, this will plot the FFT of the differenced data as well.
- labels: When estimator(s) are provided, the corresponding labels to
use for the plots. If not provided, the model class is used to derive the labels.
- include: When data contains exogenous variables, then only specific
exogenous variables can be plotted using this key. e.g. include = [“col1”, “col2”]
- exclude: When data contains exogenous variables, specific exogenous
variables can be excluded from the plots using this key. e.g. exclude = [“col1”, “col2”]
- alpha: The quantile value to use for point prediction. If not provided,
then the value specified during setup is used.
- coverage: The coverage value to use for prediction intervals. If not
provided, then the value specified during setup is used.
- fh: The forecast horizon to use for forecasting. If not provided, then
the one used during model training is used.
- X: When a model trained with exogenous variables has been finalized,
user can provide the future values of the exogenous variables to make future target time series predictions using this key.
- plot_data_type: When plotting the data used for modeling, user may
wish to see plots with the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.
NOTE: (1) If no imputation is specified, then plotting the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then plotting the “transformed” data type will produce the same results as the “imputed” data type.
Allowed values are (if not specified, defaults to the first one in the list):
“ts”: [“original”, “imputed”, “transformed”]
“train_test_split”: [“original”, “imputed”, “transformed”]
“cv”: [“original”]
“acf”: [“transformed”, “imputed”, “original”]
“pacf”: [“transformed”, “imputed”, “original”]
“decomp”: [“transformed”, “imputed”, “original”]
“decomp_stl”: [“transformed”, “imputed”, “original”]
“diagnostics”: [“transformed”, “imputed”, “original”]
“diff”: [“transformed”, “imputed”, “original”]
“forecast”: [“original”, “imputed”]
“insample”: [“original”, “imputed”]
“residuals”: [“original”, “imputed”]
“periodogram”: [“transformed”, “imputed”, “original”]
“fft”: [“transformed”, “imputed”, “original”]
“ccf”: [“transformed”, “imputed”, “original”]
Some plots (marked as True below) also allow specifying multiple data types at once.
“ts”: True
“train_test_split”: True
“cv”: False
“acf”: True
“pacf”: True
“decomp”: True
“decomp_stl”: True
“diagnostics”: True
“diff”: False
“forecast”: False
“insample”: False
“residuals”: False
“periodogram”: True
“fft”: True
“ccf”: False
- fig_kwargs: dict, default = {} (empty dict)
The setting to be used for the plot. Overrides any global setting passed during setup. Pass these as key-value pairs. For available keys, refer to the setup documentation.
Time-series plots support more display formats; as a result, fig_kwargs can also contain a resampler_kwargs key and its corresponding dict. These are additional keyword arguments fed to the display function, mainly used to configure plotly-resampler visualizations (i.e., display_format “plotly-dash” or “plotly-widget”), e.g. which downsampler is used and how many data points are shown in the front end.
When the plotly-resampler figure is rendered via Dash (by setting the display_format to “plotly-dash”), one can also use the “show_dash” key within this dictionary to configure the show_dash args.
example:
fig_kwargs = {
    "width": None,
    "resampler_kwargs": {
        "default_n_shown_samples": 1000,
        "show_dash": {"mode": "inline", "port": 9012},
    },
}
- save: string or bool, default = False
When set to True, the plot is saved as a ‘png’ file in the current working directory. When a path destination is given, the plot is saved as a ‘png’ file in the given directory.
- Returns
Path to saved file and list containing figure and data, if any.
- pycaret.time_series.predict_model(estimator, fh=None, X=None, return_pred_int=False, alpha: Optional[float] = None, coverage: Union[float, List[float]] = 0.9, round: int = 4, verbose: bool = True) DataFrame
This function forecasts using a trained model. When fh is None, it forecasts using the same forecast horizon used during training.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> arima = create_model('arima')
>>> pred_holdout = predict_model(arima)
>>> pred_unseen = predict_model(finalize_model(arima), fh = 24)
- estimator: sktime compatible object
Trained model object
- fh: int, default = None
Number of points from the last date of training to forecast. When fh is None, it forecasts using the same forecast horizon used during the training.
- X: pd.DataFrame, default = None
Exogenous variables to be used for prediction. Before the estimator is finalized, X need not be passed even when the estimator was built using exogenous variables, since the exogenous variables from the test split are used internally. Once the estimator has been finalized and it uses exogenous variables, X must be passed.
- return_pred_int: bool, default = False
When set to True, it returns lower bound and upper bound prediction interval, in addition to the point prediction.
- alpha: Optional[float], default = None
The alpha (quantile) value to use for the point predictions. Refer to the “point_alpha” description in the setup docstring for details.
- coverage: Union[float, List[float]], default = 0.9
The coverage to be used for prediction intervals. Refer to the “coverage” description in the setup docstring for details.
- round: int, default = 4
Number of decimal places to round predictions to.
- verbose: bool, default = True
When set to False, holdout score grid is not printed.
- Returns
pandas.DataFrame
- pycaret.time_series.finalize_model(estimator, fit_kwargs: Optional[dict] = None, model_only: bool = False) Any
This function trains a given estimator on the entire dataset including the holdout set.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> final_arima = finalize_model(arima)
- estimator: sktime compatible object
Trained model object
- fit_kwargs: dict, default = None
Dictionary of arguments passed to the fit method of the model.
- model_only: bool, default = False
Parameter not in use for now. Behavior may change in future.
- Returns
Trained pipeline or model object fitted on complete dataset.
- pycaret.time_series.deploy_model(model, model_name: str, authentication: dict, platform: str = 'aws')
This function deploys the transformation pipeline and trained model on cloud.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> deploy_model(model = arima, model_name = 'arima-for-deployment',
...              platform = 'aws', authentication = {'bucket': 'S3-bucket-name'})
- Amazon Web Service (AWS) users:
To deploy a model on AWS S3 (‘aws’), environment variables must be set in your local environment. To configure AWS environment variables, type aws configure in the command line. The following information from the IAM portal of your Amazon console account is required:
AWS Access Key ID
AWS Secret Key Access
Default Region Name (can be seen under Global settings on your AWS console)
More info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
- Google Cloud Platform (GCP) users:
To deploy a model on Google Cloud Platform (‘gcp’), project must be created using command line or GCP console. Once project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment.
More info: https://cloud.google.com/docs/authentication/production
- Microsoft Azure (Azure) users:
To deploy a model on Microsoft Azure (‘azure’), environment variables for connection string must be set in your local environment. Go to settings of storage account on Azure portal to access the connection string required.
- model: scikit-learn compatible object
Trained model object
- model_name: str
Name of model.
- authentication: dict
Dictionary of applicable authentication tokens.
When platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’, ‘path’: (optional) folder name under the bucket}
When platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}
When platform = ‘azure’: {‘container’: ‘azure-container-name’}
- platform: str, default = ‘aws’
Name of the platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.
- Returns
None
- pycaret.time_series.save_model(model, model_name: str, model_only: bool = False, verbose: bool = True)
This function saves the transformation pipeline and trained model object into the current working directory as a pickle file for later use.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> arima = create_model('arima')
>>> save_model(arima, 'saved_arima_model')
- model: sktime compatible object
Trained model object
- model_name: str
Name of the model.
- model_only: bool, default = False
When set to True, only trained model object is saved instead of the entire pipeline.
- verbose: bool, default = True
Success message is not printed when verbose is set to False.
- Returns
Tuple of the model object and the filename.
- pycaret.time_series.load_model(model_name: str, platform: Optional[str] = None, authentication: Optional[Dict[str, str]] = None, verbose: bool = True)
This function loads a previously saved pipeline/model.
Example
>>> from pycaret.time_series import load_model
>>> saved_arima = load_model('saved_arima_model')
- model_name: str
Name of the model.
- platform: str, default = None
Name of the cloud platform. Currently supported platforms: ‘aws’, ‘gcp’ and ‘azure’.
- authentication: dict, default = None
dictionary of applicable authentication tokens.
when platform = ‘aws’: {‘bucket’ : ‘S3-bucket-name’}
when platform = ‘gcp’: {‘project’: ‘gcp-project-name’, ‘bucket’ : ‘gcp-bucket-name’}
when platform = ‘azure’: {‘container’: ‘azure-container-name’}
- verbose: bool, default = True
Success message is not printed when verbose is set to False.
- Returns
Trained Model
- pycaret.time_series.pull(pop: bool = False) DataFrame
Returns the last printed score grid. Use the pull function after any training function to store the score grid in a pandas.DataFrame.
- pop: bool, default = False
If True, will pop (remove) the returned dataframe from the display container.
- Returns
pandas.DataFrame
- pycaret.time_series.models(type: Optional[str] = None, internal: bool = False, raise_errors: bool = True) DataFrame
Returns table of models available in the model library.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> models()
- type: str, default = None
baseline : filters and only returns baseline models
classical : filters and only returns classical models
linear : filters and only returns linear models
tree : filters and only returns tree based models
neighbors : filters and only returns neighbors models
- internal: bool, default = False
When True, will return extra columns and rows used internally.
- raise_errors: bool, default = True
When False, will suppress all exceptions, ignoring models that couldn’t be created.
- Returns
pandas.DataFrame
- pycaret.time_series.get_metrics(reset: bool = False, include_custom: bool = True, raise_errors: bool = True) DataFrame
Returns table of available metrics used for CV.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> all_metrics = get_metrics()
- reset: bool, default = False
When True, will reset all changes made using the add_metric and remove_metric functions.
- include_custom: bool, default = True
Whether to include user added (custom) metrics or not.
- raise_errors: bool, default = True
If False, will suppress all exceptions, ignoring models that couldn’t be created.
- Returns
pandas.DataFrame
- pycaret.time_series.add_metric(id: str, name: str, score_func: type, greater_is_better: bool = True, **kwargs) Series
Adds a custom metric to be used for CV.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> from sklearn.metrics import explained_variance_score
>>> add_metric('evs', 'EVS', explained_variance_score)
- id: str
Unique id for the metric.
- name: str
Display name of the metric.
- score_func: type
Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
- greater_is_better: bool, default = True
Whether a higher score_func value is better or not.
- **kwargs:
Arguments to be passed to score function.
- Returns
pandas.Series
- pycaret.time_series.remove_metric(name_or_id: str)
Removes a metric from CV.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> remove_metric('MAPE')
- name_or_id: str
Display name or ID of the metric.
- Returns
None
- pycaret.time_series.get_logs(experiment_name: Optional[str] = None, save: bool = False) DataFrame
Returns a table of experiment logs. Only works when log_experiment is True when initializing the setup function.
Example
>>> from pycaret.datasets import get_data
>>> data = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = data, fh = 12)
>>> best = compare_models()
>>> exp_logs = get_logs()
- experiment_name: str, default = None
When None, the current active run is used.
- save: bool, default = False
When set to True, csv file is saved in current working directory.
- Returns
pandas.DataFrame
- pycaret.time_series.get_config(variable: Optional[str] = None)
This function retrieves the global variables created when initializing the setup function. The following variables are accessible:
X: Period/Index of X
y: Time Series as pd.Series
X_train: Period/Index of X_train
y_train: Time Series as pd.Series (Train set only)
X_test: Period/Index of X_test
y_test: Time Series as pd.Series (Test set only)
fh: forecast horizon
enforce_pi: enforce prediction interval in models
seed: random state set through session_id
prep_pipe: Transformation pipeline
n_jobs_param: n_jobs parameter used in model training
html_param: html_param configured through setup
_master_model_container: model storage container
_display_container: results display container
exp_name_log: Name of experiment
logging_param: log_experiment param
log_plots_param: log_plots param
USI: Unique session ID parameter
data_before_preprocess: data before preprocessing
gpu_param: use_gpu param configured through setup
fold_generator: CV splitter configured in fold_strategy
fold_param: fold params defined in the setup
seasonality_present: seasonality as detected in the setup
seasonality_period: seasonality_period as detected in the setup
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> X_train = get_config('X_train')
- variable: str, default = None
Name of the variable to return the value of. If None, will return a list of possible names.
- Returns
Global variable
- pycaret.time_series.set_config(variable: str, value)
This function resets the global variables. Following variables are accessible:
X: Period/Index of X
y: Time Series as pd.Series
X_train: Period/Index of X_train
y_train: Time Series as pd.Series (Train set only)
X_test: Period/Index of X_test
y_test: Time Series as pd.Series (Test set only)
fh: forecast horizon
enforce_pi: enforce prediction interval in models
seed: random state set through session_id
prep_pipe: Transformation pipeline
n_jobs_param: n_jobs parameter used in model training
html_param: html_param configured through setup
_master_model_container: model storage container
_display_container: results display container
exp_name_log: Name of experiment
logging_param: log_experiment param
log_plots_param: log_plots param
USI: Unique session ID parameter
data_before_preprocess: data before preprocessing
gpu_param: use_gpu param configured through setup
fold_generator: CV splitter configured in fold_strategy
fold_param: fold params defined in the setup
seasonality_present: seasonality as detected in the setup
seasonality_period: seasonality_period as detected in the setup
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> set_config('seed', 123)
- Returns
None
- pycaret.time_series.save_experiment(path_or_file: Union[str, PathLike, BinaryIO], **cloudpickle_kwargs) None
Saves the experiment to a pickle file.
The experiment is saved using cloudpickle to deal with lambda functions. The data or test data is NOT saved with the experiment and will need to be specified again when loading using load_experiment.
- path_or_file: str or BinaryIO (file pointer)
The path/file pointer to save the experiment to.
- **cloudpickle_kwargs:
Kwargs to pass to the cloudpickle.dump call.
- Returns
None
- pycaret.time_series.load_experiment(path_or_file: Union[str, PathLike, BinaryIO], data: Optional[Union[Series, DataFrame]] = None, data_func: Optional[Callable[[], Union[Series, DataFrame]]] = None, test_data: Optional[Union[Series, DataFrame]] = None, preprocess_data: bool = True, **cloudpickle_kwargs) TSForecastingExperiment
Load an experiment saved with save_experiment from a path or file.
The data (and test data) is NOT saved with the experiment and will need to be specified again.
- path_or_file: str or BinaryIO (file pointer)
The path/file pointer to load the experiment from. The pickle file must be created through save_experiment.
- data: pandas.Series or pandas.DataFrame
Data set with shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features. If data is not a pandas dataframe, it’s converted to one using default column names.
- data_func: Callable[[], pandas.Series or pandas.DataFrame] = None
The function that generates data (the dataframe-like input). This is useful when the dataset is large and you need parallel operations such as compare_models. It can avoid broadcasting a large dataset from driver to workers. Note that one and only one of data and data_func must be set.
- test_data: pandas.Series or pandas.DataFrame or None, default = None
If not None, test_data is used as a hold-out set and train_size parameter is ignored. The columns of data and test_data must match.
- preprocess_data: bool, default = True
If True, the data will be preprocessed again (by running setup internally). If False, the data will not be preprocessed. This means you can save the value of the data attribute of an experiment separately, then load it separately and pass it here with preprocess_data set to False. This is an advanced feature. We recommend leaving it set to True and passing the same data as passed to the initial setup call.
- **cloudpickle_kwargs:
Kwargs to pass to the cloudpickle.load call.
- Returns
loaded experiment
- pycaret.time_series.set_current_experiment(experiment: TSForecastingExperiment)
Set the current experiment to be used with the functional API.
- experiment: TSForecastingExperiment
Experiment object to use.
- Returns
None
- pycaret.time_series.get_current_experiment() TSForecastingExperiment
Obtain the current experiment object.
- Returns
Current TSForecastingExperiment
- pycaret.time_series.check_stats(estimator: Optional[Any] = None, test: str = 'all', alpha: float = 0.05, split: str = 'all') DataFrame
This function is used to get summary statistics and run statistical tests on the original data or model residuals.
Example
>>> from pycaret.datasets import get_data
>>> airline = get_data('airline')
>>> from pycaret.time_series import *
>>> exp_name = setup(data = airline, fh = 12)
>>> check_stats(test="summary")
>>> check_stats(test="adf")
>>> arima = create_model('arima')
>>> check_stats(arima, test = 'white_noise')
- Parameters
estimator (sktime compatible object, optional) – Trained model object, by default None
- test: str, optional
Name of the test to be performed, by default “all”
Options are:
‘summary’ - Summary Statistics
‘white_noise’ - Ljung-Box Test for white noise
‘adf’ - ADF test for difference stationarity
‘kpss’ - KPSS test for trend stationarity
‘stationarity’ - ADF and KPSS test
‘normality’ - Shapiro Test for Normality
‘all’ - All of the above tests
- alpha: float, optional
Significance Level, by default 0.05
- split: str, optional
The split of the original data to run the test on. Only applicable when test is run on the original data (not residuals), by default “all”
Options are:
‘all’ - Complete Dataset
‘train’ - The Training Split of the dataset
‘test’ - The Test Split of the dataset
- data_type: str, optional
The data type to use for the statistical test, by default “transformed”.
User may wish to perform the tests on the original data set provided, the imputed dataset (if imputation is set) or the transformed dataset (which includes any imputation and transformation set by the user). This keyword can be used to specify which data type to use.
Allowed values are: [“original”, “imputed”, “transformed”]
NOTE: (1) If no imputation is specified, then testing on the “imputed” data type will produce the same results as the “original” data type. (2) If no transformations are specified, then testing on the “transformed” data type will produce the same results as the “imputed” data type.
By default, tests are done on the “transformed” data since that is the data that is fed to the model during training.
- data_kwargs: Optional[Dict], optional
Users can specify lags list or order_list to run the test for the data as well as for its lagged versions, by default None
>>> check_stats(test="white_noise", data_kwargs={"order_list": [1, 2]})
>>> check_stats(test="white_noise", data_kwargs={"lags_list": [1, [1, 12]]})
- Returns
- pd.DataFrame
Dataframe with the test results
- pycaret.time_series.get_allowed_engines(estimator: str) Optional[str]
Get all the allowed engines for the specified model
- Parameters
estimator (str) – Identifier for the model for which the engines should be retrieved, e.g. “auto_arima”
- Returns
The allowed engines for the model. If the model only supports the default engine, then it returns None.
- Return type
Optional[str]
- pycaret.time_series.get_engine(estimator: str) Optional[str]
Gets the model engine currently set in the experiment for the specified model.
- Parameters
estimator (str) – Identifier for the model for which the engine should be retrieved, e.g. “auto_arima”
- Returns
The engine for the model. If the model only supports the default sktime engine, then it returns None.
- Return type
Optional[str]