Commit 673dc40c authored by Almouhannad Hafez

Update README.md + Add Results.xlsx

parent c53f2af5
......@@ -5,7 +5,9 @@
**[How to run](#how-to-run)**
**[Part1. Data preprocessing](#part1-data-preprocessing)**
**[Part2. Basic Morphological analyzer](#part2-basic-morpholgical-analyzer)**
**[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**
**[Results](#results)**
## ***Description***
**Classifying symptom descriptions (text data) into diseases**
> [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
......@@ -14,6 +16,8 @@
- **Data folder**: Contains the dataset and the train and test sets
- **Constants.py**: Fixed values shared by the other files, exposed through the `CONSTANTS` class
- **Other .ipynb files**: Jupyter notebooks containing the actual work
- **Results.xlsx**: Excel workbook containing the results
- **conda_nlp_environment.yml**: Conda environment file listing the required Python modules
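A constants class like the one in **Constants.py** usually groups shared values as class attributes. The sketch below only illustrates that pattern: every attribute name and value in it is a hypothetical placeholder, not the content of the real file.

```python
from typing import Final


class CONSTANTS:
    """Hypothetical sketch of a shared-constants class.

    The attribute names and values are illustrative assumptions and do not
    come from the repository's Constants.py.
    """

    # Hypothetical locations inside the Data folder
    DATA_DIR: Final[str] = "Data"
    TRAIN_PATH: Final[str] = "Data/train.csv"
    TEST_PATH: Final[str] = "Data/test.csv"

    # Hypothetical seed for reproducible splits and models
    RANDOM_STATE: Final[int] = 42


# A notebook would then do something like:
#   from Constants import CONSTANTS
#   path = CONSTANTS.TRAIN_PATH
```

Class attributes let notebooks read values as `CONSTANTS.TRAIN_PATH` without creating an instance. `Final` only signals intent to type checkers; it does not block reassignment at runtime.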
## ***How to run***
> **Using [Anaconda](https://www.anaconda.com/)**
......@@ -78,4 +82,20 @@
1. `2-Gram`
1. `3-Gram`
1. `4-Gram`
1. `5-Gram`
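The list above goes up to 5-grams. As a rough illustration of what an n-gram feature is, here is a minimal sketch that slides an n-token window over a tokenized sentence using only the standard library. It assumes plain whitespace tokenization and an invented example sentence; the notebooks may build their features differently (NLTK, for instance, provides `nltk.util.ngrams`).

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n tokens (empty if n > len(tokens))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Whitespace tokenization and this sentence are assumptions for the example
tokens = "i have a high fever and a cough".split()

for n in range(1, 6):
    grams = ngrams(tokens, n)
    print(f"{n}-Gram: {len(grams)} windows, first = {grams[0]}")

# Counting the windows gives a bag-of-n-grams representation of the text
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts[("high", "fever")])  # 1
```

Each increase in n yields one fewer window per sentence while the number of distinct possible n-grams grows, so higher-order n-gram features are sparser.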
## ***Results***
| Case \\ Criterion | Accuracy (Train) | Accuracy (Test) | Precision (Test, average) | Recall (Test, average) | F1-score (Test, average) |
| ----------------------- | --------------- | -------------- | ----------------------- | -------------------- | ---------------------- |
| NLTK stemmer | 0.994211288 | 0.91991342 | 0.925513814 | 0.923767509 | 0.919308 |
| NLTK lemmatizer | 0.994211288 | 0.924242424 | 0.929407177 | 0.927885156 | 0.923453 |
| Stanza lemmatizer | 0.994211288 | 0.928571429 | 0.932850383 | 0.931115994 | 0.927117 |
| spaCy lemmatizer | 0.995658466 | 0.928571429 | 0.934227363 | 0.931373992 | 0.928329 |
| Lemma + Verbs only | 0.781476122 | 0.608225108 | 0.656473336 | 0.62431736 | 0.6131 |
| Lemma + Adjectives only | 0.868306802 | 0.606060606 | 0.681515062 | 0.620177307 | 0.614097 |
| Lemma + Nouns only | 0.97829233 | 0.876623377 | 0.886798865 | 0.880636574 | 0.873959 |
| Text + 1Gram | 0.997105644 | 0.898268398 | 0.912889052 | 0.90487335 | 0.89945 |
| Text + 2Gram | 0.998552822 | 0.885281385 | 0.894742015 | 0.891828538 | 0.883421 |
| Text + 3Gram | 0.997105644 | 0.867965368 | 0.881810904 | 0.874727644 | 0.866752 |
| Text + 4Gram | 1 | 0.800865801 | 0.848577524 | 0.814521589 | 0.809801 |
| Text + 5Gram | 1 | 0.707792208 | 0.839340945 | 0.72337248 | 0.739326 |
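The table reports accuracy plus precision, recall, and F1-score averaged over the disease classes. The README does not say which averaging was used, so the sketch below assumes macro averaging (an unweighted mean of per-class scores), which is one common choice, and treats a class with no predictions as having precision 0. It uses only the standard library, and the labels are made-up toy data.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    predicted = Counter(y_pred)  # tp + fp per class
    actual = Counter(y_true)     # tp + fn per class

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Assumed convention: a zero denominator gives a score of 0.0
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / actual[label] if actual[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not F1 of the averages
    }


# Toy labels, invented for illustration
y_true = ["flu", "flu", "cold", "cold", "allergy"]
y_pred = ["flu", "cold", "cold", "cold", "flu"]
metrics = classification_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in metrics.items()})
# {'accuracy': 0.6, 'precision': 0.3889, 'recall': 0.5, 'f1': 0.4333}
```

Because the F1 entry averages per-class F1 scores, it can fall below both the averaged precision and the averaged recall, which a harmonic mean of those two averages never does.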
\ No newline at end of file
File added