Commit 7ee6b0d9 authored by Almouhannad

Add performance comparison

parent f233bfcb
......@@ -32,8 +32,9 @@
"metadata": {},
"outputs": [],
"source": [
"# %pip install pandas\n",
"# %pip install mlxtend"
"%pip install pandas\n",
"%pip install mlxtend\n",
"# `time` is part of the Python standard library, so it needs no install"
]
},
{
......@@ -45,7 +46,9 @@
"import pandas as pd\n",
"\n",
"from mlxtend.frequent_patterns import apriori, association_rules\n",
"from mlxtend.frequent_patterns import fpgrowth"
"from mlxtend.frequent_patterns import fpgrowth\n",
"\n",
"import time"
]
},
{
......@@ -1113,6 +1116,110 @@
"source": [
"# ***4. Performance comparison***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ***4.1. Load dataset***"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset loaded successfully\n"
]
}
],
"source": [
"df = None\n",
"df = pd.read_csv(CONSTANTS.PREPROCESSED_DATASET_PATH)\n",
"assert df.shape == CONSTANTS.PREPROCESSED_DATASET_SHAPE, f\"Expected shape {CONSTANTS.PREPROCESSED_DATASET_SHAPE}, but got {df.shape}\"\n",
"print(\"Dataset loaded successfully\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ***4.2. Measure time for Apriori***"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Execution time for Apriori: 11.543023109436035 seconds\n"
]
}
],
"source": [
"start_time = time.perf_counter()\n",
"min_support = 0.0001\n",
"repeated_item_sets_apriori = apriori(df, min_support=min_support, use_colnames=True)\n",
"min_confidence = 0.0001\n",
"rules_apriori = association_rules(repeated_item_sets_apriori, metric=\"confidence\", min_threshold=min_confidence)\n",
"end_time = time.perf_counter()\n",
"execution_time = end_time - start_time\n",
"print(f\"Execution time for Apriori: {execution_time} seconds\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ***4.3. Measure time for FP Growth***"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Execution time for FP Growth: 2.7200090885162354 seconds\n"
]
}
],
"source": [
"start_time = time.perf_counter()\n",
"min_support = 0.0001\n",
"repeated_item_sets_fpg = fpgrowth(df, min_support=min_support, use_colnames=True)\n",
"min_confidence = 0.0001\n",
"rules_fpg = association_rules(repeated_item_sets_fpg, metric=\"confidence\", min_threshold=min_confidence)\n",
"end_time = time.perf_counter()\n",
"execution_time = end_time - start_time\n",
"print(f\"Execution time for FP Growth: {execution_time} seconds\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ***4.4. Results***"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **As we can see, `FP Growth` is much faster than `Apriori` ***(about 4 times faster: 2.72 s vs. 11.54 s)***.** \n",
"> **This is because `Apriori` has to scan the dataset multiple times to generate and count candidate itemsets, whereas `FP Growth` builds the FP-tree at the beginning (in two passes over the dataset) and then works only with the tree, without accessing the dataset again.**"
]
}
],
"metadata": {
......
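Note on the timing cells: sections 4.2 and 4.3 repeat the same start/stop pattern and each records a single run, which can be noisy. A minimal sketch of a reusable helper follows, assuming `time.perf_counter` is an acceptable clock and that a few repeated runs are affordable. The names `time_call` and `speedup` are hypothetical and do not appear in the notebook.

```python
import time
from statistics import median


def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns the result of the last call and a list of wall-clock
    durations in seconds, one per call.
    """
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, timings


def speedup(slow_timings, fast_timings):
    """Ratio of median durations: how many times faster the second run was."""
    return median(slow_timings) / median(fast_timings)


# Ratio of the two single-run durations recorded in the notebook outputs.
recorded = speedup([11.543023109436035], [2.7200090885162354])
print(f"Recorded speedup: {recorded:.2f}x")

# Hypothetical usage inside the notebook (df as loaded in section 4.1):
# _, t_apriori = time_call(apriori, df, min_support=0.0001, use_colnames=True)
# _, t_fpg = time_call(fpgrowth, df, min_support=0.0001, use_colnames=True)
# print(f"FP Growth speedup: {speedup(t_apriori, t_fpg):.2f}x")
```

Taking the median over several runs makes the comparison less sensitive to a single slow run, at the cost of a longer notebook execution.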