Update results

Commit 7351d642 authored Nov 16, 2024 by Almouhannad Hafez

Showing with 24 additions and 0 deletions

README.md +24 -0

Results.xlsx +0 -0

--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@
 **[Part1. Data preprocessing](#part1-data-preprocessing)**  
 **[Part2. Basic Morphological analyzer](#part2-basic-morpholgical-analyzer)**  
 **[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**  
+**[Part4. Data augmentation](#part4-data-augmentation)**  
 **[Results](#results)**
 ## ***Description***  
@@ -83,7 +84,30 @@
 **`3.3.N-Grams.ipynb`**  
 - **Applying the classification task with n-grams, using only TF-IDF features built from different n-gram ranges**
+## ***Part4. Data augmentation***
+**Files:**
+> **`4.data_augmentation.ipynb`**  
+- **Applying data augmentation to the original dataset: 5 new rephrased rows were added for each original row using the LLM `LLama3`**
 ## ***Results***
+> ***Using augmented dataset*** 
+| Case\\Criterion         | Accuracy(Train) | Accuracy(Test) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes                      |
+| ----------------------- | --------------- | -------------- | ----------------------- | -------------------- | ---------------------- | -------------------------- |
+| nltk stemmer            | 0.9629          | 0.9524         | 0.9513                  | 0.9522               | 0.9509                 | alpha=0.1, 300 features    |
+| nltk lemmatizer         | 0.9832          | 0.9699         | 0.9703                  | 0.9699               | 0.9696                 | alpha=0.1, 700 features    |
+| Stanza lemmatizer       |                 |                |                         |                      |                        |                            |
+| SpaCy lemmatizer        | 0.9776          | 0.9657         | 0.9655                  | 0.9656               | 0.9652                 | alpha=0.1, 550 features    |
+| Lemma + Verbs only      | 0.7106          | 0.6321         | 0.6293                  | 0.6278               | 0.6214                 | alpha=0.1, 400 features    |
+| Lemma + Adjectives only | 0.7990          | 0.7357         | 0.7383                  | 0.7351               | 0.7299                 | alpha=0.1, 450 features    |
+| Lemma + Nouns only      | 0.9678          | 0.9419         | 0.9406                  | 0.9419               | 0.9406                 | alpha=0.1, 600 features    |
+| Text + (1,2)Gram        | 0.9965          | 0.9800         | 0.9801                  | 0.9799               | 0.9798                 | alpha=0.01, 3100 features  |
+| Text + (1,3)Gram        | 0.9960          | 0.9807         | 0.9806                  | 0.9805               | 0.9803                 | alpha=0.01, 6600 features  |
+| Text + (1,4)Gram        | 0.9967          | 0.9807         | 0.9802                  | 0.9805               | 0.9802                 | alpha=0.01, 12100 features |
+| Text + (2,3)Gram        | 0.9951          | 0.9695         | 0.9688                  | 0.9694               | 0.9688                 | alpha=0.01, 9100 features  |
+| Text + (2,4)Gram        | 0.9951          | 0.9646         | 0.9634                  | 0.9645               | 0.9635                 | alpha=0.01, 14100 features |
 > ***Applied features selection and model's hyperparameters tuning*** 
 | Case\\Criterion         | Accuracy(Train) | Accuracy(Test) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes                     |

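The Part4 entry describes the augmentation in a single line. Below is a minimal sketch of that loop. It assumes each row is a `(text, label)` pair and that the LLM call is wrapped in a caller-supplied `rephrase(text, n)` function. `augment` and `rephrase` are hypothetical names: the notebook's actual prompt, client code and data layout are not part of this commit. Under this reading, every original row is kept and followed by 5 paraphrases that inherit its label, so the augmented dataset is 6× the original size.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: rephrase(text, n) returns n paraphrases of `text`.
# In the notebook this is where the LLM (LLama3) would be prompted; the exact
# client, prompt and model name are assumptions not shown in this commit.
Rephraser = Callable[[str, int], List[str]]


def augment(
    rows: Iterable[Tuple[str, str]],
    rephrase: Rephraser,
    n_new: int = 5,
) -> List[Tuple[str, str]]:
    """Return every original (text, label) row followed by n_new paraphrases of it.

    Each paraphrase inherits the label of the row it was generated from.
    """
    augmented: List[Tuple[str, str]] = []
    for text, label in rows:
        augmented.append((text, label))  # keep the original row
        paraphrases = rephrase(text, n_new)
        if len(paraphrases) != n_new:
            # LLM output can be malformed; fail loudly instead of silently skewing classes.
            raise ValueError(f"expected {n_new} paraphrases, got {len(paraphrases)}")
        augmented.extend((p, label) for p in paraphrases)
    return augmented


if __name__ == "__main__":
    # Deterministic stand-in for the LLM so the sketch runs offline.
    def stub_rephrase(text: str, n: int) -> List[str]:
        return [f"{text} (variant {i + 1})" for i in range(n)]

    data = [("first example sentence", "class_a"), ("second example sentence", "class_b")]
    out = augment(data, stub_rephrase)
    print(len(out))  # 12: 2 originals + 2 * 5 paraphrases
```

Keeping the LLM behind a plain callable lets the loop be tested with a stub and swapped to any client later without touching the bookkeeping.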
--- a/Results.xlsx
+++ b/Results.xlsx
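The `(1,2)Gram` … `(2,4)Gram` rows in the results table read like scikit-learn's `TfidfVectorizer(ngram_range=(min_n, max_n), max_features=...)` convention. Under that reading, the `features` value in the Notes column is the vocabulary cap, and `alpha` is plausibly the smoothing parameter of a classifier such as `MultinomialNB`. The diff does not confirm any of this, so treat it as an assumption. The pure-Python sketch below illustrates only the feature-side ideas: which word n-grams a given range produces, and how keeping the most frequent n-grams bounds the feature count. It does not compute TF-IDF weights. Its whitespace tokenizer and alphabetical tie-breaking are simplifications, and `word_ngrams` and `top_ngram_vocabulary` are hypothetical helpers, not code from the notebooks.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], ngram_range: Tuple[int, int]) -> List[str]:
    """All contiguous word n-grams with min_n <= n <= max_n, joined by single spaces."""
    min_n, max_n = ngram_range
    grams: List[str] = []
    for n in range(min_n, max_n + 1):
        # A sequence of length L has L - n + 1 n-grams (none if L < n).
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def top_ngram_vocabulary(
    docs: Sequence[str], ngram_range: Tuple[int, int], max_features: int
) -> List[str]:
    """Keep the max_features n-grams with the highest total count across all docs."""
    counts: Counter = Counter()
    for doc in docs:
        # Naive lowercase + whitespace tokenization; the notebook's tokenizer may differ.
        counts.update(word_ngrams(doc.lower().split(), ngram_range))
    # Highest count first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:max_features]]


if __name__ == "__main__":
    tokens = ["the", "cat", "sat"]
    print(word_ngrams(tokens, (1, 2)))  # ['the', 'cat', 'sat', 'the cat', 'cat sat']
    print(word_ngrams(tokens, (2, 3)))  # ['the cat', 'cat sat', 'the cat sat']
```

Widening the range adds every longer n-gram on top of the shorter ones, so the candidate vocabulary grows quickly. This is consistent with the larger feature caps listed for the wider ranges in the table. A `(2,x)` range, by construction, contains no single-word features at all.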