Add performance comparison

7ee6b0d9 · Almouhannad · f233bfcb · 7ee6b0d9
Commit 7ee6b0d9 authored Oct 29, 2024 by Almouhannad

Showing with 110 additions and 3 deletions

hw1.ipynb +110 -3

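Note on the timings in this commit: each algorithm is timed once with `time.time()`, so the reported numbers can shift from run to run. Below is a minimal sketch of a more repeatable measurement. The helpers `time_call` and `speedup` are hypothetical and are not part of the notebook. The workload in the example is a stand-in; in the notebook the callable would wrap the `apriori` or `fpgrowth` call instead.

```python
import statistics
import time


def time_call(fn, repeats=3):
    """Run fn() `repeats` times and return (best, median) wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations), statistics.median(durations)


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Stand-in workload (assumption). In the notebook this could be, for example:
    #   lambda: fpgrowth(df, min_support=0.0001, use_colnames=True)
    best, median = time_call(lambda: sum(range(100_000)), repeats=5)
    print(f"best={best:.6f}s median={median:.6f}s")

    # Ratio of the two single-run timings recorded in the notebook output.
    ratio = speedup(11.543023109436035, 2.7200090885162354)
    print(f"speedup: {ratio:.2f}x")
```

Applied to the two timings recorded in the notebook output, `speedup` gives a ratio of roughly 4.2. Reporting the best or median of several runs makes that ratio less sensitive to one-off noise.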
--- a/hw1.ipynb
+++ b/hw1.ipynb
@@ -32,8 +32,9 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# %pip install pandas\n",
-    "# %pip install mlxtend"
+    "%pip install pandas\n",
+    "%pip install mlxtend\n",
+    "# `time` is part of the Python standard library, so it needs no %pip install"
   ]
  },
  {
@@ -45,7 +46,9 @@
    "import pandas as pd\n",
    "\n",
    "from mlxtend.frequent_patterns import apriori, association_rules\n",
-    "from mlxtend.frequent_patterns import fpgrowth"
+    "from mlxtend.frequent_patterns import fpgrowth\n",
+    "\n",
+    "import time"
   ]
  },
  {
@@ -1113,6 +1116,110 @@
   "source": [
    "# ***4. Performance comparison***"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ***4.1. Load dataset***"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Dataset loaded successfully\n"
+     ]
+    }
+   ],
+   "source": [
+    "df = None\n",
+    "df = pd.read_csv(CONSTANTS.PREPROCESSED_DATASET_PATH)\n",
+    "assert df.shape == CONSTANTS.PREPROCESSED_DATASET_SHAPE, f\"Expected shape {CONSTANTS.PREPROCESSED_DATASET_SHAPE}, but got {df.shape}\"\n",
+    "print(\"Dataset loaded successfully\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ***4.2. Measure time for Apriori***"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Execution time for Apriori: 11.543023109436035 seconds\n"
+     ]
+    }
+   ],
+   "source": [
+    "start_time = time.time()\n",
+    "min_support = 0.0001\n",
+    "repeated_item_sets_apriori = apriori(df, min_support=min_support, use_colnames=True)\n",
+    "min_confidence = 0.0001\n",
+    "rules_apriori = association_rules(repeated_item_sets_apriori, metric=\"confidence\", min_threshold=min_confidence)\n",
+    "end_time = time.time()\n",
+    "execution_time = end_time - start_time\n",
+    "print(f\"Execution time for Apriori: {execution_time} seconds\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ***4.3. Measure time for FP Growth***"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Execution time for FP Growth: 2.7200090885162354 seconds\n"
+     ]
+    }
+   ],
+   "source": [
+    "start_time = time.time()\n",
+    "min_support = 0.0001\n",
+    "repeated_item_sets_fpg = fpgrowth(df, min_support=min_support, use_colnames=True)\n",
+    "min_confidence = 0.0001\n",
+    "rules_fpg = association_rules(repeated_item_sets_fpg, metric=\"confidence\", min_threshold=min_confidence)\n",
+    "end_time = time.time()\n",
+    "execution_time = end_time - start_time\n",
+    "print(f\"Execution time for FP Growth: {execution_time} seconds\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ***4.4. Results***"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **As the timings above show, `FP Growth` is much faster than `Apriori` on this dataset ***(about 4 times faster: roughly 2.7 s versus 11.5 s)***.**  \n",
+    "> **This is because `Apriori` generates candidate itemsets level by level and has to go back to the dataset at every level to count their support, whereas `FP Growth` scans the dataset only twice to build a compact FP-tree and then mines the frequent itemsets from the tree without accessing the dataset again.**"
+   ]
  }
 ],
 "metadata": {