almohanad.hafez / NLP-Project
Commit e6ced94e authored Nov 02, 2024 by Almouhannad Hafez
Update README.md
parent b8e522e7
Showing 2 changed files with 33 additions and 1 deletion:
- 2.Stemmer.ipynb (+0 -0)
- README.md (+33 -1)
2.Basic_Morphological_Analyzer.ipynb → 2.Stemmer.ipynb (file moved)
README.md
@@ -4,7 +4,8 @@
**[Description](#description)**
**[How to run](#how-to-run)**
**[Part1. Data preprocessing](#part1-data-preprocessing)**
**[Part2. Basic Morphological analyzer](#part2-basic-morpholgical-analyzer)**
**[Part3. Lemmatization, POS Tagging, and N-Gram](#part3-lemmatization-pos-tagging-and-n-gram)**
## ***Description***
**Classifying symptoms (given as text data) into diseases**
> [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
@@ -37,5 +38,36 @@
```
```
## ***Part1. Data preprocessing***
**Files:**
> **`1.Data_Preprocessing.ipynb`**
- **Applying preprocessing steps to the dataset, including:**
    1. Refactoring the dataset schema
    1. Handling nulls/duplicates
    1. Shuffling
    1. Converting text to lowercase
    1. Expanding contractions
    1. Splitting into train/test sets
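
The preprocessing steps listed above can be sketched with the standard library alone. This is only an illustration, not the notebook's code: the `preprocess` function, the tiny `CONTRACTIONS` table, the 80/20 split, and the toy `ROWS` are all assumptions made for the example, and the schema-refactoring step is skipped because the diff does not describe it.

```python
import random
import re

# Tiny illustrative contraction table (an assumption; a real run would need
# a much larger table or a dedicated library).
CONTRACTIONS = {"i'm": "i am", "i've": "i have", "can't": "cannot", "don't": "do not"}
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b"
)


def expand_contractions(text):
    """Replace every known contraction in already-lowercased text."""
    return _CONTRACTION_RE.sub(lambda match: CONTRACTIONS[match.group(0)], text)


def preprocess(rows, test_ratio=0.2, seed=42):
    """rows: list of (label, text) pairs. Returns (train, test) lists."""
    # Handle nulls and duplicates, keeping the first occurrence of each row.
    seen, clean = set(), []
    for label, text in rows:
        if not label or not text or (label, text) in seen:
            continue
        seen.add((label, text))
        clean.append((label, text))

    # Shuffle reproducibly with a fixed seed.
    random.Random(seed).shuffle(clean)

    # Convert to lowercase, then expand contractions.
    clean = [(label, expand_contractions(text.lower())) for label, text in clean]

    # Split into train/test sets.
    n_test = max(1, round(len(clean) * test_ratio))
    return clean[n_test:], clean[:n_test]


# Made-up rows for demonstration only (not taken from the dataset).
ROWS = [
    ("Psoriasis", "I've had a skin rash"),
    ("Psoriasis", "I've had a skin rash"),  # duplicate, dropped
    ("Migraine", None),                     # null text, dropped
    ("Migraine", "I can't stand bright light"),
    ("Typhoid", "I'm feverish and weak"),
    ("Acne", "Pimples keep appearing on my face"),
    ("Jaundice", "My skin looks yellow"),
]
train, test = preprocess(ROWS)
```

Lowercasing before expanding contractions keeps the lookup table small, since only lowercase keys are needed.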
## ***Part2. Basic Morpholgical analyzer***
**Files:**
> **`2.Stemmer.ipynb`**
- **Running the classification task, which includes:**
    1. Using `nltk` modules
    1. Tokenizing text
    1. Stemming tokens
    1. Removing stopwords
    1. Vectorizing using `TF-IDF`
    1. Training a `Naive Bayes` classifier and evaluating it
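
A minimal sketch of such a pipeline is shown below. It assumes `nltk` and `scikit-learn` are installed and is not the notebook's actual code. To avoid downloading NLTK corpora, it tokenizes with a regular expression and uses scikit-learn's built-in English stopword list. It also drops stopwords before stemming so that the unstemmed list still matches. All three choices are assumptions of this example, and the training sentences and labels are made up.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

_stemmer = PorterStemmer()


def stem_analyzer(text):
    """Tokenize, drop stopwords, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [_stemmer.stem(tok) for tok in tokens if tok not in ENGLISH_STOP_WORDS]


def build_model():
    """TF-IDF features over stemmed tokens, fed to a Naive Bayes classifier."""
    return make_pipeline(TfidfVectorizer(analyzer=stem_analyzer), MultinomialNB())


# Made-up toy data for demonstration only (not taken from the dataset).
train_texts = [
    "i have a red itchy rash on my skin",
    "my skin is itching and has rashes",
    "i keep coughing and have a high fever",
    "fever, chills and constant coughing",
]
train_labels = ["skin", "skin", "respiratory", "respiratory"]

model = build_model().fit(train_texts, train_labels)
prediction = model.predict(["itchy skin rash"])[0]
```

For evaluation, `model.score(test_texts, test_labels)` would report accuracy on a held-out split, where `test_texts` and `test_labels` are hypothetical names for that split.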
## ***Part3. Lemmatization, POS Tagging, and N-Gram***
**Files:**
> **`3.1.Lemmatizer.ipynb`**
- **Running the classification task with token lemmatization from different modules, including:**
    1. `nltk`
    1. `SpaCy`
    1. `Stanza`
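
One way to compare lemmatizers from different modules is to wrap each one in a callable with the same `token -> lemma` signature and run them over the same tokens. The sketch below shows only that harness. `compare_lemmatizers`, `toy_lemmatize`, and the `TOY_LEMMAS` table are hypothetical stand-ins so the example runs without model downloads. In practice the callables would presumably wrap `nltk`'s `WordNetLemmatizer().lemmatize`, spaCy's `token.lemma_`, or Stanza's `word.lemma`.

```python
def compare_lemmatizers(tokens, lemmatizers):
    """Apply each lemmatizer (a callable: token -> lemma) to the same tokens."""
    return {
        name: [lemmatize(tok) for tok in tokens]
        for name, lemmatize in lemmatizers.items()
    }


# Hypothetical stand-in for a real lemmatizer: a small lookup table.
TOY_LEMMAS = {"rashes": "rash", "feet": "foot", "itching": "itch"}


def toy_lemmatize(token):
    """Return the table entry if there is one, otherwise the token unchanged."""
    return TOY_LEMMAS.get(token, token)


result = compare_lemmatizers(
    ["rashes", "on", "my", "feet"],
    {"identity": lambda tok: tok, "toy": toy_lemmatize},
)
```

Keeping the signature uniform means the rest of the classification pipeline does not need to change when the lemmatizer is swapped.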
> **`3.2.POS_Tagging_Filter.ipynb`**
- **Running the classification task with a POS tagger so that only tokens with a single tag are used, including:**
    1. Testing ***Verbs*** only
    1. Testing ***Adjectives*** only
    1. Testing ***Nouns*** only
\ No newline at end of file
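
The single-tag filtering described above can be sketched as follows. The example assumes Penn Treebank tags, which is what `nltk.pos_tag` returns by default, and works on hand-tagged `(token, tag)` pairs so it runs without a tagger model. `TAG_PREFIXES`, `keep_only`, and the tagged sentence are illustrative names and data, not the notebook's code.

```python
# Penn Treebank tag prefixes for each group (VB, VBD, VBG, ... all start with "VB").
TAG_PREFIXES = {"verbs": "VB", "adjectives": "JJ", "nouns": "NN"}


def keep_only(tagged_tokens, group):
    """Keep the tokens whose POS tag belongs to the requested group."""
    prefix = TAG_PREFIXES[group]
    return [tok for tok, tag in tagged_tokens if tag.startswith(prefix)]


# Hand-tagged example sentence (made up for demonstration).
tagged = [
    ("my", "PRP$"), ("skin", "NN"), ("is", "VBZ"),
    ("peeling", "VBG"), ("and", "CC"), ("itchy", "JJ"),
]

verbs = keep_only(tagged, "verbs")            # ["is", "peeling"]
adjectives = keep_only(tagged, "adjectives")  # ["itchy"]
nouns = keep_only(tagged, "nouns")            # ["skin"]
```

The filtered tokens can then be joined back into text and passed to the same vectorizer and classifier used in the earlier parts.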