Commit e6ced94e authored by Almouhannad Hafez

Update README.md

parent b8e522e7
@@ -4,7 +4,8 @@
**[Description](#description)**
**[How to run](#how-to-run)**
**[Part1. Data preprocessing](#part1-data-preprocessing)**
**[Part2. Basic Morphological analyzer](#part2-basic-morpholgical-analyzer)**
**[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**
## ***Description***
**Classifying symptoms (as text data) into diseases**
> [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
@@ -37,5 +38,36 @@
``` ```
## ***Part1. Data preprocessing***
**Files:**
> **`1.Data_Preprocessing.ipynb`**
- **Applies preprocessing steps to the dataset, including:**
1. Refactoring dataset schema
1. Handling nulls/duplicates
1. Shuffling
1. Converting text to lowercase
1. Expanding contractions
1. Splitting into train/test sets
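
The steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the `(label, text)` row layout, the tiny `CONTRACTIONS` map, and the `preprocess` helper are all assumptions made for the example, and the schema refactoring step is left out because the README does not describe it.

```python
import random
import re

# Hypothetical, deliberately tiny contraction map; a real notebook would
# likely use a fuller list or a dedicated package.
CONTRACTIONS = {
    "i've": "i have",
    "i'm": "i am",
    "can't": "cannot",
    "don't": "do not",
    "it's": "it is",
}
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def expand_contractions(text):
    """Replace each known (lowercase) contraction with its expanded form."""
    return _CONTRACTION_RE.sub(lambda match: CONTRACTIONS[match.group(0)], text)


def preprocess(rows, test_ratio=0.2, seed=42):
    """Clean (label, text) rows and return (train, test) lists."""
    # Drop rows with a missing label or text, then exact duplicates.
    cleaned = [(label, text) for label, text in rows if label and text]
    cleaned = list(dict.fromkeys(cleaned))
    # Shuffle reproducibly with a fixed seed.
    random.Random(seed).shuffle(cleaned)
    # Lowercase first so the contraction map only needs lowercase keys.
    cleaned = [(label, expand_contractions(text.lower())) for label, text in cleaned]
    # Hold out the last test_ratio share of the rows as the test set.
    n_test = round(len(cleaned) * test_ratio)
    split = len(cleaned) - n_test
    return cleaned[:split], cleaned[split:]
```

Lowercasing before expanding contractions keeps the lookup table small, which is presumably why the list above orders the two steps that way.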
## ***Part2. Basic Morpholgical analyzer***
**Files:**
> **`2.Stemmer.ipynb`**
- **Runs the classification task, including:**
1. Using `nltk` modules
1. Tokenizing text
1. Stemming tokens
1. Removing stopwords
1. Vectorizing using `TF-IDF`
1. Training a `Naive Bayes` classifier and evaluating it
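
A pipeline along these lines could look like the sketch below. It is an assumption-laden illustration rather than the notebook's code: it uses `PorterStemmer` from `nltk`, but swaps in a regex tokenizer and scikit-learn's built-in `ENGLISH_STOP_WORDS` so that no `nltk` data downloads are needed, and it assumes the multinomial variant of Naive Bayes. Stopwords are removed before stemming here because stopword lists contain unstemmed words. The helper names `tokenize_and_stem` and `build_classifier` are hypothetical.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def tokenize_and_stem(text):
    """Tokenize with a simple regex, drop stopwords, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in ENGLISH_STOP_WORDS]


def build_classifier():
    """Return an unfitted TF-IDF + Naive Bayes pipeline."""
    # token_pattern=None avoids the warning about the unused default pattern
    # when a custom tokenizer is supplied.
    vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem, token_pattern=None)
    return make_pipeline(vectorizer, MultinomialNB())
```

The pipeline is trained with `build_classifier().fit(train_texts, train_labels)` and evaluated with `.score(test_texts, test_labels)`, which reports accuracy on the held-out set.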
## ***Part3. Lemmatization, POS Tagging, and N-Gram***
**Files:**
> **`3.1.Lemmatizer.ipynb`**
- **Runs the classification task on lemmatized tokens, comparing different modules:**
1. `nltk`
1. `spaCy`
1. `Stanza`
> **`3.2.POS_Tagging_Filter.ipynb`**
- **Runs the classification task on tokens filtered by a POS tagger, keeping only one tag at a time:**
1. Testing ***Verbs*** only
1. Testing ***Adjectives*** only
1. Testing ***Nouns*** only
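
The filtering idea can be illustrated without running a tagger. The sketch below assumes tokens have already been tagged with Penn Treebank tags (the scheme used by `nltk.pos_tag` for English); the README does not say which tagger or tag set the notebook uses, so both the tag prefixes and the `filter_by_pos` helper are hypothetical.

```python
# Penn Treebank tag prefixes for the three word classes tested above.
# Assumption: the tagger emits Penn Treebank tags (VB*, JJ*, NN*).
TAG_PREFIXES = {"verb": "VB", "adjective": "JJ", "noun": "NN"}


def filter_by_pos(tagged_tokens, word_class):
    """Keep only the tokens whose tag starts with the prefix for word_class."""
    prefix = TAG_PREFIXES[word_class]
    return [token for token, tag in tagged_tokens if tag.startswith(prefix)]


# Hand-tagged example sentence, standing in for real tagger output.
tagged = [
    ("i", "PRP"), ("have", "VBP"), ("a", "DT"), ("red", "JJ"),
    ("itchy", "JJ"), ("rash", "NN"), ("on", "IN"), ("my", "PRP$"),
    ("arms", "NNS"),
]
nouns_only = filter_by_pos(tagged, "noun")
```

Matching on the tag prefix keeps all inflected forms of a class together, for example both `NN` and `NNS` count as nouns.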
\ No newline at end of file