almohanad.hafez / NLP-Project
Commit 220bba63 authored Jan 18, 2025 by Almouhannad Hafez
Update results
parent 85b5c0db
Showing 4 changed files with 40 additions and 22 deletions
README.md (+40 -22)
Results.xlsx (+0 -0)
images/Embedding_result.png (+0 -0)
images/Ontology_results.png (+0 -0)
README.md @ 220bba63
...
@@ -116,32 +116,50 @@
1. `(root_word, "ROOT")` - i.e. Head words for sentences
## ***Part6. Ontology***
**Files:**
> **`6.1.BO_synsets_classifier.ipynb`** - **Classification using Bag Of Synsets (BO)**
> **`6.2.BOS_ParsingTree_NGrams.ipynb`** - **Classification using Bag Of Synsets (BO) and other features from previous steps**
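
A minimal sketch of the bag-of-synsets idea, to show why it can help: words that share a synset collapse onto the same feature. The `TOY_SYNSETS` lexicon and the helper names are hypothetical and only for illustration; the notebooks presumably look synsets up in a real resource such as WordNet, which this sketch does not do.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a lemma to a single synset ID.
# A real pipeline would query WordNet instead of a hard-coded dict.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "buy": "buy.v.01",
    "purchase": "buy.v.01",
}


def bag_of_synsets(tokens, lexicon=TOY_SYNSETS):
    """Count synset IDs instead of surface words; unknown tokens are skipped."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def vectorize(bag, vocabulary):
    """Turn a synset bag into a fixed-length count vector."""
    return [bag.get(synset, 0) for synset in vocabulary]


doc_a = bag_of_synsets(["buy", "car"])
doc_b = bag_of_synsets(["purchase", "automobile", "today"])
vocabulary = sorted(set(doc_a) | set(doc_b))

# Both documents map to the same vector although they share no surface word.
print(vocabulary)
print(vectorize(doc_a, vocabulary))
print(vectorize(doc_b, vocabulary))
```

The resulting count vectors can be fed to any classifier in place of, or next to, ordinary bag-of-words features.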

## ***Part7. Word embedding***
**Files:**
> **`7.1.Word2Vec_classifier.ipynb`** - **Classification using a POS-weighted average of Word2Vec word vectors**
> **`7.2.BERT_classifier.ipynb`** - **Classification using the BERT CLS token embedding**
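
A minimal sketch of a POS-weighted average of word vectors, as one way to read the description of `7.1`. The weights in `POS_WEIGHTS`, the `DEFAULT_WEIGHT`, and the 3-dimensional `TOY_VECTORS` are all hypothetical; the weights and the Word2Vec model actually used in the notebook are not stated here.

```python
# Hypothetical POS weights: nouns count twice as much as verbs,
# every other tag falls back to DEFAULT_WEIGHT.
POS_WEIGHTS = {"NOUN": 2.0, "VERB": 1.0}
DEFAULT_WEIGHT = 0.5

# Hypothetical 3-dimensional vectors standing in for Word2Vec embeddings.
TOY_VECTORS = {
    "dog": [1.0, 0.0, 0.0],
    "runs": [0.0, 1.0, 0.0],
}


def pos_weighted_average(tagged_tokens, vectors=TOY_VECTORS, weights=POS_WEIGHTS):
    """Average the vectors of known words, weighting each word by its POS tag."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, pos in tagged_tokens:
        if word not in vectors:
            continue  # out-of-vocabulary words do not contribute
        w = weights.get(pos, DEFAULT_WEIGHT)
        for i, x in enumerate(vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


doc_vector = pos_weighted_average([("dog", "NOUN"), ("runs", "VERB"), ("the", "DET")])
print(doc_vector)
```

Each document then becomes one fixed-length vector, which is what a distance-based classifier needs as input.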

## ***Results***
> ***Using augmented dataset***
| Case \\ Criterion | Accuracy(Train) | Accuracy(Test) | Difference(%) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes |
| ------------------------------------------------------------ | --------------- | -------------- | ------------- | ----------------------- | -------------------- | ---------------------- | ------------------------------ |
| nltk stemmer | 0.9852 | 0.9604 | 2.5 | 0.9593 | 0.9587 | 0.9574 | alpha=0.1, 450 features |
| nltk lemmatizer | 0.9891 | 0.9625 | 2.7 | 0.9635 | 0.9626 | 0.9608 | alpha=0.1, 700 features |
| Stanza lemmatizer | 0.9843 | 0.9646 | 2.0 | 0.9652 | 0.9642 | 0.9623 | alpha=0.1, 550 features |
| SpaCy lemmatizer | 0.9657 | 0.9563 | 0.9 | 0.9582 | 0.9550 | 0.9526 | alpha=0.1, 300 features |
| Lemma + Verbs only | 0.7229 | 0.6438 | 7.9 | 0.6675 | 0.6400 | 0.6341 | alpha=0.1, 350 features |
| Lemma + Adjectives only | 0.8037 | 0.6250 | 17.9 | 0.6531 | 0.6128 | 0.6057 | alpha=0.1, 450 features |
| Lemma + Nouns only | 0.9766 | 0.9229 | 5.4 | 0.9230 | 0.9204 | 0.9175 | alpha=0.1, 850 features |
| Text + (1,2)Gram | 0.9958 | 0.9688 | 2.7 | 0.9679 | 0.9681 | 0.9662 | alpha=0.01, 3100 features |
| Text + (1,3)Gram | 0.9977 | 0.9708 | 2.7 | 0.9709 | 0.9704 | 0.9677 | alpha=0.01, 9600 features |
| Text + (1,4)Gram | 0.9956 | 0.9667 | 2.9 | 0.9671 | 0.9660 | 0.9631 | alpha=0.01, 8600 features |
| Text + (2,3)Gram | 0.9970 | 0.9500 | 4.7 | 0.9505 | 0.9467 | 0.9452 | alpha=0.01, 10100 features |
| Text + (2,4)Gram | 0.9975 | 0.9375 | 6.0 | 0.9366 | 0.9334 | 0.9311 | alpha=0.01, 16600 features |
| Stanza Dep. Relation tuples | 0.9995 | 0.9521 | 4.7 | 0.9513 | 0.9503 | 0.9484 | alpha=0.01, 8000 features |
| Stanza Dep. Relation + POS Relations + Headwords tuples | 0.9986 | 0.9479 | 5.1 | 0.9481 | 0.9471 | 0.9440 | alpha=0.01, 7500 features |
| Stanza Dep. Relation tuples + (1,3) Grams | 1.0000 | 0.9750 | 2.5 | 0.9758 | 0.9747 | 0.9734 | alpha=0.01, 66000 features |
| BO synsets | 0.9782 | 0.9333 | 4.5 | 0.9325 | 0.9308 | 0.9272 | alpha=0.01, 1500 features |
| BO synsets + POS filtering | 0.9810 | 0.9271 | 5.4 | 0.9287 | 0.9256 | 0.9224 | alpha=0.01, 1500 features |
| BO synsets + WSD | 0.9961 | 0.9563 | 4.0 | 0.9594 | 0.9564 | 0.9542 | alpha=0.01, 1750 features |
| BO synsets + WSD + Stanza Dep. Relation tuples + (1,3) Grams | 0.9963 | 0.9708 | 2.5 | 0.9713 | 0.9706 | 0.9683 | alpha=0.01, 5500 features |
| Word2Vec embedding using weighted vector based on POS | 0.9632 | 0.9000 | 6.3 | 0.9079 | 0.8970 | 0.8943 | KNN, n=20, cosine, 185 features |
| BERT embedding using CLS token | 1.0000 | 0.9899 | 1.0 | 0.9861 | 0.9886 | 0.9872 | KNN, n=20, cosine, 695 features |
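
The `Difference(%)` column appears to be the gap between train and test accuracy expressed in percentage points; that reading is an assumption inferred from the numbers, not something the README states. A short sketch with a hypothetical helper `train_test_gap`, checked against three rows of the table:

```python
def train_test_gap(train_accuracy, test_accuracy):
    """Train/test accuracy gap in percentage points, rounded to one decimal."""
    return round((train_accuracy - test_accuracy) * 100, 1)


# (train accuracy, test accuracy) copied from three rows of the table above.
rows = {
    "nltk stemmer": (0.9852, 0.9604),
    "Lemma + Adjectives only": (0.8037, 0.6250),
    "BERT embedding using CLS token": (1.0000, 0.9899),
}

for case, (train, test) in rows.items():
    print(f"{case}: {train_test_gap(train, test)}")
```

A larger gap suggests more overfitting; the adjectives-only features show the largest one among these rows.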
---
> ***Applied feature selection and model hyperparameter tuning***
...
...
Results.xlsx @ 220bba63
No preview for this file type
images/Embedding_result.png @ 220bba63 (0 → 100644, 278 KB)
images/Ontology_results.png @ 220bba63 (0 → 100644, 145 KB)