Association Rules

pycaret.arules.setup(data, transaction_id, item_id, ignore_items=None, session_id=None)

This function initializes the environment in pycaret. setup() must be called before executing any other function in pycaret. It takes three mandatory parameters: (i) data, (ii) transaction_id, the column identifying each basket, and (iii) item_id, the column used to create rules. These three are normally found in any transactional dataset. pycaret internally converts the pandas.DataFrame into a sparse matrix, which is required for association rule mining.
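The pivot from transaction rows to a basket matrix can be pictured with a small, library-free sketch. This is not pycaret's implementation: the helper name `to_basket_matrix` and the toy rows are hypothetical, and a dense list of 0/1 rows stands in for the sparse matrix.

```python
from collections import defaultdict

def to_basket_matrix(rows, ignore_items=None):
    """Pivot (transaction_id, item_id) pairs into a 0/1 basket matrix.

    Hypothetical helper for illustration only; not part of pycaret.
    """
    ignored = set(ignore_items or [])
    baskets = defaultdict(set)
    for transaction_id, item_id in rows:
        if item_id not in ignored:
            baskets[transaction_id].add(item_id)
    # One column per distinct item, one row per transaction.
    items = sorted({item for basket in baskets.values() for item in basket})
    transactions = sorted(baskets)
    matrix = [
        [1 if item in baskets[t] else 0 for item in items]
        for t in transactions
    ]
    return transactions, items, matrix

# Toy data (assumed): three invoices, with a non-product line to ignore.
rows = [
    ("inv1", "bread"), ("inv1", "milk"),
    ("inv2", "bread"),
    ("inv3", "milk"), ("inv3", "eggs"), ("inv3", "POSTAGE"),
]
transactions, items, matrix = to_basket_matrix(rows, ignore_items=["POSTAGE"])
```

Each row of `matrix` is one transaction and each column one item, which is the shape a rule-mining step would typically consume.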

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')

data: pandas.DataFrame

Shape (n_samples, n_features) where n_samples is the number of samples and n_features is the number of features.

transaction_id: str

Name of column representing transaction id. This will be used to pivot the matrix.

item_id: str

Name of column used for creation of rules. Normally, this will be the variable of interest.

ignore_items: list, default = None

List of strings to be ignored when considering rule mining.

session_id: int, default = None

If None, a random seed is generated and returned in the Information grid. The unique number is then distributed as a seed in all functions used during the experiment. This can be used for later reproducibility of the entire experiment.

Returns

Global variables.

pycaret.arules.create_model(metric='confidence', threshold=0.5, min_support=0.05, round=4)

This function creates an association rules model using the data and identifiers passed at the setup stage. It internally transforms the data for association rule mining.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp_name = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
>>> model1 = create_model(metric = 'confidence')

metric: str, default = ‘confidence’

Metric to evaluate if a rule is of interest. Default is set to confidence. Other available metrics include ‘support’, ‘lift’, ‘leverage’, ‘conviction’. These metrics are computed as follows:

  • support(A->C) = support(A+C) [aka ‘support’], range: [0, 1]

  • confidence(A->C) = support(A+C) / support(A), range: [0, 1]

  • lift(A->C) = confidence(A->C) / support(C), range: [0, inf]

  • leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1]

  • conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf]
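The formulas above can be checked by hand on a toy dataset. The following is a minimal sketch in plain Python, not the code pycaret runs; the function names `support` and `rule_metrics` and the four transactions are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Compute the five metrics for the rule antecedent -> consequent.

    Assumes the antecedent and consequent each occur at least once.
    """
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence)
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        "conviction": conviction,
    }

# Toy baskets (assumed), each transaction represented as a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
]
metrics = rule_metrics(transactions, {"bread"}, {"milk"})
```

For the rule bread -> milk in this toy data, support is 0.5, confidence is 2/3, lift is about 0.889, leverage is -0.0625 and conviction is 0.75. A lift below 1 and a negative leverage both indicate that the two items co-occur less often than independence would predict.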

threshold: float, default = 0.5

Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest.

min_support: float, default = 0.05

A float between 0 and 1 for the minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions.

round: int, default = 4

Number of decimal places to which metrics in the score grid are rounded.

Returns

pandas.DataFrame

Warning

  • Setting low values for min_support may increase training time.
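One way to see why low min_support values cost more is to count how many itemsets survive at each threshold. The sketch below enumerates itemsets by brute force on toy data; it is an illustration under assumed data, not the mining algorithm pycaret uses, and `frequent_itemsets` is a hypothetical helper.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with support >= min_support.

    Illustrative only: practical miners prune the search space instead
    of testing every combination.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= t for t in transactions)
            if count / n >= min_support:
                result[candidate] = count / n
    return result

# Toy baskets (assumed).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"bread", "milk", "butter"},
]
for min_support in (0.6, 0.4, 0.2):
    print(min_support, len(frequent_itemsets(transactions, min_support)))
```

On this toy data the number of qualifying itemsets grows from 3 to 5 to 11 as the threshold drops from 0.6 to 0.4 to 0.2. Every extra itemset is a further source of candidate rules, which is where the additional work comes from.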

pycaret.arules.plot_model(model, plot='2d', scale=1)

This function takes the rules dataframe returned by create_model() and plots it. ‘2d’ and ‘3d’ plots are available.

Example

>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp_name = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
>>> rule1 = create_model(metric='confidence', threshold=0.7, min_support=0.05)
>>> plot_model(rule1, plot='2d')

model: pandas.DataFrame

The pandas.DataFrame of rules returned by create_model().

plot: str, default = ‘2d’

Abbreviation of the plot type. The currently supported plots are (Name - Abbreviated String):

  • Support, Confidence and Lift (2d) - ‘2d’

  • Support, Confidence and Lift (3d) - ‘3d’

scale: float, default = 1

The resolution scale of the figure.

Returns

None

pycaret.arules.get_rules(data, transaction_id, item_id, ignore_items=None, metric='confidence', threshold=0.5, min_support=0.05)

Magic function to get Association Rules in Power Query / Power BI.