Update README.md

Commit e6ced94e authored Nov 02, 2024 by Almouhannad Hafez

Showing with 33 additions and 1 deletion

2.Stemmer.ipynb 2.Stemmer.ipynb +0 -0

README.md README.md +33 -1

--- a/2.Basic_Morphological_Analyzer.ipynb
+++ b/2.Basic_Morphological_Analyzer.ipynb
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@
 **[How to run](#how-to-run)**  
 **[Part1. Data preprocessing](#part1-data-preprocessing)**  
 **[Part2. Basic Morphological analyzer](#part2-basic-morpholgical-analyzer)**  
+**[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**
 ## ***Description***  
 **Classifying symptom (as a text data) into a disease**  
 > [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
@@ -37,5 +38,36 @@
    ```
 ## ***Part1. Data preprocessing***
+**Files:**
+> **`1.Data_Preprocessing.ipynb`**  
+- **Applying preprocessing steps to the dataset, including:**
+    1. Refactoring dataset schema
+    1. Handling nulls/duplicates
+    1. Shuffling
+    1. Converting text to lowercase
+    1. Expanding contractions
+    1. Splitting into train/test sets
 ## ***Part2. Basic Morpholgical analyzer***
+**Files:**
+> **`2.Stemmer.ipynb`**  
+- **Running the classification task, including:**
+    1. Using `nltk` modules
+    1. Tokenizing text
+    1. Stemming tokens
+    1. Removing stopwords
+    1. Vectorizing using `TF-IDF`
+    1. Training a `Naive Bayes` classifier and evaluating it
+## ***Part3. Lemmatization, POS Tagging, and N-Gram***
+**Files:**
+> **`3.1.Lemmatizer.ipynb`**  
+- **Running the classification task with token lemmatization from different libraries:**
+    1. `nltk`
+    1. `spaCy`
+    1. `Stanza`  
+> **`3.2.POS_Tagging_Filter.ipynb`**  
+- **Running the classification task on tokens filtered by a POS tagger to a single tag at a time:**
+    1. Testing ***Verbs*** only
+    1. Testing ***Adjectives*** only
+    1. Testing ***Nouns*** only
\ No newline at end of file
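
Reviewer note: the Part1 list added to the README names the preprocessing steps but not how they fit together. The following is a minimal sketch of that order (nulls/duplicates, shuffling, lowercasing, contraction expansion, train/test split) using only the standard library. It is not taken from `1.Data_Preprocessing.ipynb`: the `CONTRACTIONS` map, the `(label, text)` row layout, the function names, and the `test_ratio` and `seed` parameters are all assumptions made for illustration.

```python
import random
import re

# Hypothetical, deliberately tiny contraction map (an assumption for this
# sketch; the notebook may rely on a dedicated library instead).
CONTRACTIONS = {
    "i've": "i have",
    "i'm": "i am",
    "it's": "it is",
    "can't": "cannot",
    "don't": "do not",
}
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def expand_contractions(text):
    """Replace each known contraction in already-lowercased text."""
    return _CONTRACTION_RE.sub(lambda match: CONTRACTIONS[match.group(0)], text)


def preprocess(rows, test_ratio=0.2, seed=42):
    """Clean (label, text) pairs and return a (train, test) tuple of lists."""
    # Handle nulls: drop rows with a missing or empty label or text.
    cleaned = [(label, text) for label, text in rows if label and text]
    # Handle duplicates: keep the first occurrence, preserving order.
    cleaned = list(dict.fromkeys(cleaned))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(cleaned)
    # Lowercase first, then expand contractions on the lowercased text.
    cleaned = [(label, expand_contractions(text.lower())) for label, text in cleaned]
    # Split into train/test sets.
    split = int(len(cleaned) * (1 - test_ratio))
    return cleaned[:split], cleaned[split:]


if __name__ == "__main__":
    sample = [
        ("flu", "I've got a fever"),
        ("flu", "I've got a fever"),  # duplicate, dropped
        (None, "missing label"),      # null label, dropped
        ("cold", "Can't stop sneezing"),
    ]
    train, test = preprocess(sample, test_ratio=0.5)
    print(train, test)
```

Lowercasing before contraction expansion keeps the lookup table small, since only lowercase keys are needed; a real dataset would likely need a far larger map or a library-backed expander.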