setup(data, transaction_id, item_id, ignore_items=None, session_id=None)
This function initializes the environment in pycaret. setup() must be called before executing any other function in pycaret. It takes three mandatory parameters: (i) data, (ii) transaction_id, the column identifying each basket, and (iii) item_id, the column used to create rules. Transaction and item identifier columns are normally present in any transactional dataset. pycaret internally converts the pandas.DataFrame into a sparse matrix, which is required for association rule mining.
>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
- data: pandas.DataFrame
Shape (n_samples, n_features) where n_samples is the number of samples and n_features is the number of features.
- transaction_id: str
Name of column representing transaction id. This will be used to pivot the matrix.
- item_id: str
Name of column used for creation of rules. Normally, this will be the variable of interest.
- ignore_items: list, default = None
List of item names (strings) to exclude from rule mining.
- session_id: int, default = None
If None, a random seed is generated and returned in the Information grid. The unique number is then distributed as a seed in all functions used during the experiment. This can be used for later reproducibility of the entire experiment.
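To make the roles of transaction_id, item_id and ignore_items concrete, the sketch below groups a few hand-written rows into baskets and one-hot encodes them in plain Python. It only illustrates the idea: the rows, the helper name build_baskets and the choice of 'POSTAGE' as an ignored item are assumptions made for this example, and pycaret's own conversion to a sparse matrix is not reproduced here.

```python
from collections import defaultdict

# Hypothetical transactional rows; column names mirror the 'france' example above.
rows = [
    {"InvoiceNo": "536370", "Description": "ALARM CLOCK"},
    {"InvoiceNo": "536370", "Description": "POSTAGE"},
    {"InvoiceNo": "536370", "Description": "LUNCH BOX"},
    {"InvoiceNo": "536852", "Description": "LUNCH BOX"},
    {"InvoiceNo": "536852", "Description": "POSTAGE"},
]


def build_baskets(rows, transaction_id, item_id, ignore_items=None):
    """Group rows into one set of items per transaction (illustrative helper)."""
    ignored = set(ignore_items or [])
    baskets = defaultdict(set)
    for row in rows:
        if row[item_id] not in ignored:
            baskets[row[transaction_id]].add(row[item_id])
    return dict(baskets)


baskets = build_baskets(rows, "InvoiceNo", "Description", ignore_items=["POSTAGE"])

# One-hot encode: one row per transaction, one boolean column per item.
items = sorted({item for basket in baskets.values() for item in basket})
one_hot = {tid: [item in basket for item in items] for tid, basket in baskets.items()}
```

Each key of one_hot is a transaction and each position in its list corresponds to an entry of items, which is the basket-by-item layout that rule mining operates on.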
create_model(metric='confidence', threshold=0.5, min_support=0.05, round=4, low_memory=False, max_len=None)
This function creates an association rules model using data and identifiers passed at setup stage. This function internally transforms the data for association rule mining.
>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp_name = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
>>> model1 = create_model(metric = 'confidence')
- metric: str, default = ‘confidence’
Metric to evaluate if a rule is of interest. Default is set to confidence. Other available metrics include ‘support’, ‘lift’, ‘leverage’, ‘conviction’. These metrics are computed as follows:
support(A->C) = support(A+C) [aka ‘support’], range: [0, 1]
confidence(A->C) = support(A+C) / support(A), range: [0, 1]
lift(A->C) = confidence(A->C) / support(C), range: [0, inf]
leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1]
conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf]
- threshold: float, default = 0.5
Minimal threshold for the evaluation metric, via the metric parameter, to decide whether a candidate rule is of interest.
- min_support: float, default = 0.05
A float between 0 and 1 for the minimum support of the itemsets returned. The support is computed as the fraction transactions_where_item(s)_occur / total_transactions.
- round: int, default = 4
Number of decimal places metrics in score grid will be rounded to.
- low_memory: bool, default = False
If True, uses an iterator for apriori to search for combinations above
min_support. Note that low_memory=True should only be used for large datasets when memory resources are limited, because this implementation is approx. 3-6x slower than the default.
- max_len: int, default = None
Maximum length of the itemsets generated in apriori. If None (default), all
possible itemset lengths (under the apriori condition) are evaluated.
Setting low values for min_support may increase training time.
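The metric formulas and the min_support, max_len and threshold parameters described above can be checked by hand on a tiny dataset. The following is a minimal sketch in plain Python, not pycaret's implementation: the four baskets and the helper names support, frequent_itemsets and rule_metrics are hypothetical, the itemset search is brute force rather than apriori pruning, and treating both cut-offs as inclusive (>=) is an assumption.

```python
from itertools import combinations

# Hypothetical baskets: one set of items per transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(min_support=0.05, max_len=None):
    """Brute-force search for itemsets with support >= min_support."""
    all_items = sorted(set().union(*transactions))
    longest = max_len or len(all_items)
    found = {}
    for size in range(1, longest + 1):
        for combo in combinations(all_items, size):
            s = support(combo)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def rule_metrics(antecedent, consequent):
    """Compute the five metrics for the rule antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": confidence / s_c,
        "leverage": s_ac - s_a * s_c,
        # Conviction is infinite when confidence is exactly 1.
        "conviction": float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence),
    }


frequent = frequent_itemsets(min_support=0.5, max_len=2)
m = rule_metrics({"bread"}, {"butter"})

# Keep the rule only if its chosen metric reaches the threshold.
keep = m["confidence"] >= 0.5
```

For the rule bread -> butter this gives support 0.5, confidence of about 0.667, lift of about 1.333, leverage 0.125 and conviction 1.5, so the rule passes a confidence threshold of 0.5. Lowering min_support in frequent_itemsets admits more candidate itemsets, which is why low values can increase training time.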
plot_model(model, plot='2d', scale=1, display_format=None)
This function plots the rules DataFrame returned by create_model(). ‘2d’ and ‘3d’ plots are available.
>>> from pycaret.datasets import get_data
>>> data = get_data('france')
>>> from pycaret.arules import *
>>> exp_name = setup(data = data, transaction_id = 'InvoiceNo', item_id = 'Description')
>>> rule1 = create_model(metric='confidence', threshold=0.7, min_support=0.05)
>>> plot_model(rule1, plot='2d')
- model: pandas.DataFrame
The pandas.DataFrame of rules returned by create_model().
- plot: str, default = ‘2d’
Abbreviation of the plot type. The currently supported plots are (Name - Abbreviated String):
Support, Confidence and Lift (2d) - ‘2d’
Support, Confidence and Lift (3d) - ‘3d’
- scale: float, default = 1
The resolution scale of the figure.
- display_format: str, default = None
To display plots in Streamlit (https://www.streamlit.io/), set this to ‘streamlit’.
get_rules(data, transaction_id, item_id, ignore_items=None, metric='confidence', threshold=0.5, min_support=0.05)
Magic function to get Association Rules in Power Query / Power BI.