almohanad.hafez / NLP-Project

Commit 7351d642, authored Nov 16, 2024 by Almouhannad Hafez

Update results

Parent: 2b008fe9

Showing 2 changed files with 24 additions and 0 deletions:
README.md (+24 -0)
Results.xlsx (+0 -0)
README.md @ 7351d642
@@ -7,6 +7,7 @@
**[Part1. Data preprocessing](#part1-data-preprocessing)**
**[Part2. Basic Morphological analyzer](#part2-basic-morphological-analyzer)**
**[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**
**[Part4. Data augmentation](#part4-data-augmentation)**
**[Results](#results)**
## ***Description***
@@ -83,7 +84,30 @@
**`3.3.N-Grams.ipynb`**
- **Performing the classification task using only TF-IDF features over different n-gram ranges**
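A minimal pure-Python sketch of what the "(1,2)Gram", "(2,3)Gram", etc. cases in the results mean: every contiguous run of `n` tokens, with `n` inside the given range, becomes a feature weighted by TF-IDF. It assumes simple lowercase whitespace tokenization and uses the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default; the helper names `ngrams` and `tfidf` are hypothetical and are not taken from `3.3.N-Grams.ipynb`.

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range):
    """Return all n-grams with lo <= n <= hi, each joined by a single space."""
    lo, hi = ngram_range
    return [
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, ngram_range=(1, 2)):
    """Return (L2-normalised TF-IDF vectors as dicts, sorted vocabulary).

    Assumption: lowercase whitespace tokenization. IDF is the smoothed
    form idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    grams = [Counter(ngrams(doc.lower().split(), ngram_range)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(term for counts in grams for term in counts)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for counts in grams:
        vec = {t: tf * idf[t] for t, tf in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors, sorted(idf)


if __name__ == "__main__":
    _, vocab = tfidf(["the cat sat", "the dog sat"], ngram_range=(1, 2))
    print(vocab)
    # → ['cat', 'cat sat', 'dog', 'dog sat', 'sat', 'the', 'the cat', 'the dog']
```

With `ngram_range=(2, 3)` the single-word features drop out of the vocabulary entirely, which is the practical difference between the (1,x) and (2,x) rows.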
## ***Part4. Data augmentation***
**Files:**
> **`4.data_augmentation.ipynb`**
- **Applying data augmentation to the original dataset: 5 new rephrased rows were added for each original row using the LLM `LLama3`**
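The augmentation step can be sketched as below, assuming the dataset is a list of `(text, label)` pairs. `paraphrase` is a hypothetical callable standing in for the `LLama3` call (prompting and model access are not described in this README); each original row is kept and followed by 5 rephrased rows that inherit its label.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str]  # (text, label)


def augment(
    rows: List[Row],
    paraphrase: Callable[[str, int], List[str]],  # hypothetical LLM wrapper
    n_new: int = 5,
) -> List[Row]:
    """Keep every original row and append n_new rephrased copies with the same label."""
    augmented: List[Row] = []
    for text, label in rows:
        augmented.append((text, label))
        variants = paraphrase(text, n_new)
        if len(variants) != n_new:
            raise ValueError(f"expected {n_new} paraphrases, got {len(variants)}")
        # A paraphrase keeps the meaning, so it keeps the original label.
        augmented.extend((v, label) for v in variants)
    return augmented
```

One design consideration: if augmentation runs before the train/test split, paraphrases of a test sentence can land in the training set, so augmenting only the training portion after splitting keeps the test scores independent.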
## ***Results***
> ***Using the augmented dataset***
| Case \\ Criterion | Accuracy(Train) | Accuracy(Test) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes |
| ----------------------- | --------------- | -------------- | ----------------------- | -------------------- | ---------------------- | ------------------------- |
| nltk stemmer | 0.9629 | 0.9524 | 0.9513 | 0.9522 | 0.9509 | alpha=0.1, 300 features |
| nltk lemmatizer | 0.9832 | 0.9699 | 0.9703 | 0.9699 | 0.9696 | alpha=0.1, 700 features |
| Stanza lemmatizer | | | | | | |
| SpaCy lemmatizer | 0.9776 | 0.9657 | 0.9655 | 0.9656 | 0.9652 | alpha=0.1, 550 features |
| Lemma + Verbs only | 0.7106 | 0.6321 | 0.6293 | 0.6278 | 0.6214 | alpha=0.1, 400 features |
| Lemma + Adjectives only | 0.7990 | 0.7357 | 0.7383 | 0.7351 | 0.7299 | alpha=0.1, 450 features |
| Lemma + Nouns only | 0.9678 | 0.9419 | 0.9406 | 0.9419 | 0.9406 | alpha=0.1, 600 features |
| Text + (1,2)Gram | 0.9965 | 0.9800 | 0.9801 | 0.9799 | 0.9798 | alpha=0.01, 3100 features |
| Text + (1,3)Gram | 0.9960 | 0.9807 | 0.9806 | 0.9805 | 0.9803 | alpha=0.01, 6600 features |
| Text + (1,4)Gram | 0.9967 | 0.9807 | 0.9802 | 0.9805 | 0.9802 | alpha=0.01, 12100 features |
| Text + (2,3)Gram | 0.9951 | 0.9695 | 0.9688 | 0.9694 | 0.9688 | alpha=0.01, 9100 features |
| Text + (2,4)Gram | 0.9951 | 0.9646 | 0.9634 | 0.9645 | 0.9635 | alpha=0.01, 14100 features |
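The metric columns can be reproduced from test-set predictions roughly as follows. The README does not say which averaging "Test-Average" refers to; this sketch assumes macro averaging (unweighted mean over classes, with a per-class score of 0 when its denominator is 0), which corresponds to scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")`. The function name `classification_scores` is hypothetical.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        # Macro F1 is the mean of per-class F1, not the F1 of macro P and R.
        "f1": sum(f1s) / n,
    }
```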
> ***With feature selection and model hyperparameter tuning***
| Case \\ Criterion | Accuracy(Train) | Accuracy(Test) | Precision(Test-Average) | Recall(Test-Average) | F1-Score(Test-Average) | Notes |
...
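Tuning of the kind named in the "feature selection and model hyperparameter tuning" caption might look like the sketch below. It assumes, without confirmation from the README, that `alpha` in the Notes column is the additive smoothing parameter of a Multinomial Naive Bayes classifier and that "N features" means keeping the N most frequent vocabulary terms; the classifier choice and all helper names (`top_k_features`, `train_nb`, `predict_nb`, `grid_search`) are hypothetical. Each (`alpha`, N) pair is scored on a held-out validation set and the best pair is kept.

```python
import math
from collections import Counter
from itertools import product


def top_k_features(docs, k):
    """Keep the k most frequent tokens in the training corpus (ties broken alphabetically)."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {t for t, _ in ranked[:k]}


def train_nb(docs, labels, vocab, alpha):
    """Multinomial Naive Bayes with additive (Lidstone) smoothing alpha > 0."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    loglik = {}
    for c in classes:
        counts = Counter(
            t for d, y in zip(docs, labels) if y == c for t in d.split() if t in vocab
        )
        total = sum(counts.values()) + alpha * len(vocab)
        loglik[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return prior, loglik


def predict_nb(model, doc):
    """Pick the class with the highest log-posterior; out-of-vocabulary tokens are ignored."""
    prior, loglik = model
    return max(prior, key=lambda c: prior[c] + sum(loglik[c].get(t, 0.0) for t in doc.split()))


def grid_search(train, val, alphas, ks):
    """Return (validation accuracy, alpha, k) for the best-scoring combination."""
    (tr_docs, tr_y), (va_docs, va_y) = train, val
    best = None
    for alpha, k in product(alphas, ks):
        model = train_nb(tr_docs, tr_y, top_k_features(tr_docs, k), alpha)
        acc = sum(predict_nb(model, d) == y for d, y in zip(va_docs, va_y)) / len(va_y)
        if best is None or acc > best[0]:
            best = (acc, alpha, k)
    return best
```

In scikit-learn the same search is more commonly written as a `Pipeline` of `TfidfVectorizer(max_features=...)` and `MultinomialNB(alpha=...)` wrapped in `GridSearchCV`; the pure-Python version only makes each step explicit.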
Results.xlsx @ 7351d642 (no preview for this file type)