Update results

Commit 220bba63 authored Jan 18, 2025 by Almouhannad Hafez
Showing with 40 additions and 22 deletions

README.md +40 -22
Results.xlsx +0 -0
images/Embedding_result.png +0 -0
images/Ontology_results.png +0 -0

--- a/README.md
+++ b/README.md
@@ -116,32 +116,50 @@
    1. `(root_word, "ROOT")`
        - i.e. Head words for sentences

+## ***Part6. Ontology***
+**Files:**
+> **`6.1.BO_synsets_classifier.ipynb`**  
+- **Classification using Bag Of Synsets (BO)**
+> **`6.2.BOS_ParsingTree_NGrams.ipynb`**  
+- **Classification using Bag Of Synsets (BO) and other features from previous steps**
+![Ontology_results](./images/Ontology_results.png)
+
+
+## ***Part7. Word embedding***
+**Files:**
+> **`7.1.Word2Vec_classifier.ipynb`**  
+- **Classification using a POS-weighted average of Word2Vec word vectors**
+> **`7.2.BERT_classifier.ipynb`**  
+- **Classification using BERT CLS token**
+![Embedding_result.png](./images/Embedding_result.png)
+
 ## ***Results***

 > ***Using augmented dataset*** 

-| Case\\Criterion                                              | Accuracy(Train) | Accuracy(Test) | Difference(%) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes                     |
-| ------------------------------------------------------------ | --------------- | -------------- | ------------- | ----------------------- | -------------------- | ---------------------- | ------------------------- |
-| nltk stemmer                                                 | 0.9852          | 0.9604         | 2.5           | 0.9593                  | 0.9587               | 0.9574                 | alpha=0.1, 450features    |
-| nltk lemmatizer                                              | 0.9891          | 0.9625         | 2.7           | 0.9635                  | 0.9626               | 0.9608                 | alpha=0.1, 700features    |
-| Stanza lemmatizer                                            | 0.9843          | 0.9646         | 2.0           | 0.9652                  | 0.9642               | 0.9623                 | alpha=0.1, 550features    |
-| SpaCy lemmatizer                                             | 0.9657          | 0.9563         | 0.9           | 0.9582                  | 0.9550               | 0.9526                 | alpha=0.1, 300features    |
-| Lemma + Verbs only                                           | 0.7229          | 0.6438         | 7.9           | 0.6675                  | 0.6400               | 0.6341                 | alpha=0.1, 350features    |
-| Lemma + Adjectives only                                      | 0.8037          | 0.6250         | 17.9          | 0.6531                  | 0.6128               | 0.6057                 | alpha=0.1, 450features    |
-| Lemma + Nouns only                                           | 0.9766          | 0.9229         | 5.4           | 0.9230                  | 0.9204               | 0.9175                 | alpha=0.1, 850features    |
-| Text + (1,2)Gram                                             | 0.9958          | 0.9688         | 2.7           | 0.9679                  | 0.9681               | 0.9662                 | alpha=0.01, 3100features  |
-| Text + (1,3)Gram                                             | 0.9977          | 0.9708         | 2.7           | 0.9709                  | 0.9704               | 0.9677                 | alpha=0.01, 9600features  |
-| Text + (1,4)Gram                                             | 0.9956          | 0.9667         | 2.9           | 0.9671                  | 0.9660               | 0.9631                 | alpha=0.01, 8600features  |
-| Text + (2,3)Gram                                             | 0.9970          | 0.9500         | 4.7           | 0.9505                  | 0.9467               | 0.9452                 | alpha=0.01, 10100features |
-| Text + (2,4)Gram                                             | 0.9975          | 0.9375         | 6.0           | 0.9366                  | 0.9334               | 0.9311                 | alpha=0.01, 16600features |
-| Stanza Dep. Relation tuples                                  | 0.9995          | 0.9521         | 4.7           | 0.9513                  | 0.9503               | 0.9484                 | alpha=0.01, 8000features  |
-| Stanza Dep.Relation+POS Relations+Headwords tuples           | 0.9986          | 0.9479         | 5.1           | 0.9481                  | 0.9471               | 0.9440                 | alpha=0.01, 7500features  |
-| Stanza Dep. Relation tuples + (1,3) Grams                    | 1.0000          | 0.9750         | 2.5           | 0.9758                  | 0.9747               | 0.9734                 | alpha=0.01, 66000features |
-| BO synsets                                                   | 0.9782          | 0.9333         | 4.5           | 0.9325                  | 0.9308               | 0.9272                 | alpha=0.01, 1500features  |
-| BO synsets + POS filtering                                   | 0.9810          | 0.9271         | 5.4           | 0.9287                  | 0.9256               | 0.9224                 | alpha=0.01, 1500features  |
-| BO synsets + WSD                                             | 0.9961          | 0.9563         | 4.0           | 0.9594                  | 0.9564               | 0.9542                 | alpha=0.01,1750features   |
-| BO synsets + WSD + Stanza Dep. Relation tuples + (1,3) Grams | 0.9963          | 0.9708         | 2.5           | 0.9713                  | 0.9706               | 0.9683                 | alpha=0.01,5500features   |
-|                                                              |                 |                |               |                         |                      |                        |                           |
+| Case\\Criterion                                              | Accuracy(Train) | Accuracy(Test) | Difference(%) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes                          |
+| ------------------------------------------------------------ | --------------- | -------------- | ------------- | ----------------------- | -------------------- | ---------------------- | ------------------------------ |
+| nltk stemmer                                                 | 0.9852          | 0.9604         | 2.5           | 0.9593                  | 0.9587               | 0.9574                 | alpha=0.1, 450features         |
+| nltk lemmatizer                                              | 0.9891          | 0.9625         | 2.7           | 0.9635                  | 0.9626               | 0.9608                 | alpha=0.1, 700features         |
+| Stanza lemmatizer                                            | 0.9843          | 0.9646         | 2.0           | 0.9652                  | 0.9642               | 0.9623                 | alpha=0.1, 550features         |
+| SpaCy lemmatizer                                             | 0.9657          | 0.9563         | 0.9           | 0.9582                  | 0.9550               | 0.9526                 | alpha=0.1, 300features         |
+| Lemma + Verbs only                                           | 0.7229          | 0.6438         | 7.9           | 0.6675                  | 0.6400               | 0.6341                 | alpha=0.1, 350features         |
+| Lemma + Adjectives only                                      | 0.8037          | 0.6250         | 17.9          | 0.6531                  | 0.6128               | 0.6057                 | alpha=0.1, 450features         |
+| Lemma + Nouns only                                           | 0.9766          | 0.9229         | 5.4           | 0.9230                  | 0.9204               | 0.9175                 | alpha=0.1, 850features         |
+| Text + (1,2)Gram                                             | 0.9958          | 0.9688         | 2.7           | 0.9679                  | 0.9681               | 0.9662                 | alpha=0.01, 3100features       |
+| Text + (1,3)Gram                                             | 0.9977          | 0.9708         | 2.7           | 0.9709                  | 0.9704               | 0.9677                 | alpha=0.01, 9600features       |
+| Text + (1,4)Gram                                             | 0.9956          | 0.9667         | 2.9           | 0.9671                  | 0.9660               | 0.9631                 | alpha=0.01, 8600features       |
+| Text + (2,3)Gram                                             | 0.9970          | 0.9500         | 4.7           | 0.9505                  | 0.9467               | 0.9452                 | alpha=0.01, 10100features      |
+| Text + (2,4)Gram                                             | 0.9975          | 0.9375         | 6.0           | 0.9366                  | 0.9334               | 0.9311                 | alpha=0.01, 16600features      |
+| Stanza Dep. Relation tuples                                  | 0.9995          | 0.9521         | 4.7           | 0.9513                  | 0.9503               | 0.9484                 | alpha=0.01, 8000features       |
+| Stanza Dep.Relation+POS Relations+Headwords tuples           | 0.9986          | 0.9479         | 5.1           | 0.9481                  | 0.9471               | 0.9440                 | alpha=0.01, 7500features       |
+| Stanza Dep. Relation tuples + (1,3) Grams                    | 1.0000          | 0.9750         | 2.5           | 0.9758                  | 0.9747               | 0.9734                 | alpha=0.01, 66000features      |
+| BO synsets                                                   | 0.9782          | 0.9333         | 4.5           | 0.9325                  | 0.9308               | 0.9272                 | alpha=0.01, 1500features       |
+| BO synsets + POS filtering                                   | 0.9810          | 0.9271         | 5.4           | 0.9287                  | 0.9256               | 0.9224                 | alpha=0.01, 1500features       |
+| BO synsets + WSD                                             | 0.9961          | 0.9563         | 4.0           | 0.9594                  | 0.9564               | 0.9542                 | alpha=0.01,1750features        |
+| BO synsets + WSD + Stanza Dep. Relation tuples + (1,3) Grams | 0.9963          | 0.9708         | 2.5           | 0.9713                  | 0.9706               | 0.9683                 | alpha=0.01,5500features        |
+| Word2Vec embedding using weighted vector based on POS        | 0.9632          | 0.9000         | 6.3           | 0.9079                  | 0.8970               | 0.8943                 | KNN, n=20, cosine, 185features |
+| BERT embedding using CLS token                               | 1.0000          | 0.9899         | 1.0           | 0.9861                  | 0.9886               | 0.9872                 | KNN, n=20, cosine, 695features |
 ---

 > ***Applied features selection and model's hyperparameters tuning*** 

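Note on the `Difference(%)` column in the README.md table: the values appear to be the gap between train and test accuracy in percentage points (for example, 0.9852 - 0.9604 gives roughly 2.5). A minimal sketch of that arithmetic follows. It assumes this reading of the column is correct, and the function name `accuracy_gap` is hypothetical, not taken from the notebooks.

```python
def accuracy_gap(train_acc: float, test_acc: float) -> float:
    """Train/test accuracy gap in percentage points, rounded to one decimal."""
    return round((train_acc - test_acc) * 100, 1)


# (train accuracy, test accuracy) pairs copied from the results table.
rows = {
    "nltk stemmer": (0.9852, 0.9604),
    "Lemma + Adjectives only": (0.8037, 0.6250),
    "BERT embedding using CLS token": (1.0000, 0.9899),
}

# Recompute the Difference(%) value for each selected row.
gaps = {name: accuracy_gap(train, test) for name, (train, test) in rows.items()}
```

A large gap, such as the one for "Lemma + Adjectives only", points to a model that fits the training split much better than the test split.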
--- a/Results.xlsx
+++ b/Results.xlsx
--- a/images/Embedding_result.png
+++ b/images/Embedding_result.png
--- a/images/Ontology_results.png
+++ b/images/Ontology_results.png
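
Note on the Word2Vec row added to README.md: it describes a document vector built as a POS-weighted average of word vectors and classified with KNN (n=20, cosine). The sketch below illustrates that idea in plain Python and is not the notebook's code. The POS weights, the 2-dimensional toy vectors, the class labels, and every function name are assumptions made for illustration. The actual weights and the Word2Vec model are not shown in this commit.

```python
import math
from collections import Counter

# Hypothetical POS weights; the notebook's real weights are not shown in the commit.
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.0, "ADJ": 1.0}
DEFAULT_WEIGHT = 0.5  # assumed weight for any other POS tag


def weighted_average(tokens, vectors, pos_weights=POS_WEIGHTS):
    """Average the vectors of (word, pos) tokens, weighting each word by its POS."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, pos in tokens:
        vec = vectors.get(word)
        if vec is None:  # skip out-of-vocabulary words
            continue
        w = pos_weights.get(pos, DEFAULT_WEIGHT)
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    if weight_sum == 0:
        return total
    return [t / weight_sum for t in total]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vectors, train_labels, k=20):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(query, train_vectors[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up): word vectors, one tagged document, and a tiny training set.
vectors = {"goal": [1.0, 0.0], "scored": [0.8, 0.2], "market": [0.0, 1.0]}
doc = weighted_average([("goal", "NOUN"), ("scored", "VERB")], vectors)
train_vectors = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["sport", "sport", "business", "business"]
prediction = knn_predict(doc, train_vectors, train_labels, k=3)
```

In this toy setup the noun counts twice as much as the verb, so the document vector is pulled toward "goal". With real embeddings the same pattern applies, using the trained model's vectors and the project's own weights and `k`.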