almohanad.hafez / NLP-Project
Commit 613e9fdc, authored Nov 02, 2024 by Almouhannad Hafez
Update README.md
parent 52736320
Showing 3 changed files with 233 additions and 15 deletions
.gitignore +1 -2
README.md +37 -13
conda_nlp_environment.yml +195 -0
.gitignore
test.ipynb
clf-test.ipynb
__pycache__/constants.cpython-39.pyc
__pycache__/constants.cpython-311.pyc
README.md
# ***NLP course project***
***Part1-Morphological analyzer***
***Almouhannad Hafez + Mariam Khierbek***
> ***Description:***
> Classifying symptoms (as text data) into diseases
**[Description](#description)**
**[How to run](#how-to-run)**
**[Part1. Basic morphological analyzer](#part1-basic-morphological-analyzer)**
## ***Description***
**Classifying symptoms (as text data) into diseases**
> [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
----
> ***Contents:***
> - **Data folder**: Containing dataset and train and test sets
> - **Constants.py**: Some fixed values to use in other files as `CONSTANTS` class
> - **1.1-Preprocessing.ipynb**: Preprocessing on dataset (Cleaning, ...)
> - **1.2-Morphological-Analyzer-Classifier.ipynb**: A classifier using:
>   - Porter Stemmer
>   - TF-IDF
>   - Naive Bayes
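
The classifier notebook is described only by its three ingredients (stemming, TF-IDF, Naive Bayes), so the following is a minimal, dependency-free sketch of how those pieces could fit together. It is not the notebook's code. `crude_stem` is a hypothetical stand-in that strips a few suffixes and is not the Porter algorithm, `TfidfNaiveBayes` is an illustrative name, and the training sentences are made-up toy data rather than rows from the dataset. In the project's conda environment one would more likely reach for `nltk.stem.PorterStemmer` together with scikit-learn's `TfidfVectorizer` and `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in, NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, and stem every token."""
    return [crude_stem(w) for w in text.lower().split()]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.n_docs = len(docs)
        # Document frequency: in how many documents each term occurs.
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + self.n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        self.class_counts = Counter(labels)
        # Per-class sum of TF-IDF weights for every term.
        self.term_weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                self.term_weights[label][term] += tf * self.idf[term]
        return self

    def predict(self, text):
        # Terms never seen during training are ignored.
        term_counts = Counter(t for t in tokenize(text) if t in self.idf)
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.term_weights[label].values())
            score = math.log(count / self.n_docs)  # log prior
            for term, tf in term_counts.items():
                likelihood = (self.term_weights[label][term] + 1) / (total + vocab_size)
                score += tf * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy examples, only to exercise the sketch.
train_texts = [
    "itchy skin rash with red patches",
    "red scaly patches and itching skin",
    "high fever with chills and sweating",
    "chills sweating and recurring fever",
]
train_labels = ["psoriasis", "psoriasis", "malaria", "malaria"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("skin is itching with patches"))  # prints: psoriasis
```

Stemming happens before the TF-IDF step so that inflected forms such as "itching" and "patches" share weight with "itch" and "patch" when the class scores are computed.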
### ***Main contents***
- **Data folder**: Containing dataset and train and test sets
- **Constants.py**: Some fixed values to use in other files as `CONSTANTS` class
- **Helpers.py**: Some helper functions to use in other files as `HELPERS` class
- **Other .ipynb files**: Jupyter notebooks containing actual work
## ***How to run***
> **Using [Anaconda](https://www.anaconda.com/)**
1. **Clone this repository**
```bash
git clone git@git.hiast.edu.sy:almohanad.hafez/nlp-project.git
```
1. **Open Anaconda Prompt**
1. **Navigate to the repository directory on your machine:** `cd <path_to_repository_directory>/nlp-project`
   **If the repository is on a drive other than `C:`, say `D:`, switch to that drive first by running `D:` (or use `cd /d <path>` to change drive and directory in one step)**
1. **Create the Conda environment from the .yml file**
```bash
conda env create -f conda_nlp_environment.yml
```
1. **Activate the new environment**
```bash
conda activate NLP
```
1. **Open Jupyter Notebook**
```bash
jupyter notebook
```
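
After activating the environment, a quick sanity check can confirm that the main libraries resolve before any notebook is opened. The snippet below is a small sketch under stated assumptions: `missing_packages` and the `REQUIRED` list are hypothetical names, and the import names are inferred from `conda_nlp_environment.yml` (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names assumed from conda_nlp_environment.yml.
REQUIRED = ["nltk", "sklearn", "pandas", "numpy", "spacy", "gensim"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", missing if missing else "none")
```

If the list is not empty, the environment was probably not activated or not created from the .yml file.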
## ***Part1. Basic morphological analyzer***
conda_nlp_environment.yml
0 → 100644
name: NLP
channels:
  - defaults
dependencies:
  - annotated-types=0.6.0=py39haa95532_0
  - anyio=4.6.2=py39haa95532_0
  - argon2-cffi=21.3.0=pyhd3eb1b0_0
  - argon2-cffi-bindings=21.2.0=py39h2bbff1b_0
  - asttokens=2.0.5=pyhd3eb1b0_0
  - async-lru=2.0.4=py39haa95532_0
  - attrs=24.2.0=py39haa95532_0
  - babel=2.11.0=py39haa95532_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - beautifulsoup4=4.12.3=py39haa95532_0
  - blas=1.0=mkl
  - bleach=4.1.0=pyhd3eb1b0_0
  - bottleneck=1.3.7=py39h9128911_0
  - brotli-python=1.0.9=py39hd77b12b_8
  - ca-certificates=2024.9.24=haa95532_0
  - catalogue=2.0.10=py39haa95532_0
  - certifi=2024.8.30=py39haa95532_0
  - cffi=1.17.1=py39h827c3e9_0
  - charset-normalizer=3.3.2=pyhd3eb1b0_0
  - click=8.1.7=py39haa95532_0
  - cloudpathlib=0.16.0=py39haa95532_1
  - colorama=0.4.6=py39haa95532_0
  - comm=0.2.1=py39haa95532_0
  - confection=0.1.4=py39h9909e9c_0
  - cymem=2.0.6=py39hd77b12b_0
  - cython-blis=0.7.9=py39h080aedc_0
  - debugpy=1.6.7=py39hd77b12b_0
  - decorator=5.1.1=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - exceptiongroup=1.2.0=py39haa95532_0
  - executing=0.8.3=pyhd3eb1b0_0
  - h11=0.14.0=py39haa95532_0
  - httpcore=1.0.2=py39haa95532_0
  - httpx=0.27.0=py39haa95532_0
  - icu=73.1=h6c2663c_0
  - idna=3.7=py39haa95532_0
  - importlib-metadata=7.0.1=py39haa95532_0
  - importlib_metadata=7.0.1=hd3eb1b0_0
  - intel-openmp=2023.1.0=h59b6b97_46320
  - ipykernel=6.29.5=py39haa95532_0
  - ipython=8.15.0=py39haa95532_0
  - ipywidgets=8.1.2=py39haa95532_0
  - jedi=0.19.1=py39haa95532_0
  - jinja2=3.1.4=py39haa95532_0
  - joblib=1.4.2=py39haa95532_0
  - jpeg=9e=h827c3e9_3
  - json5=0.9.6=pyhd3eb1b0_0
  - jsonschema=4.23.0=py39haa95532_0
  - jsonschema-specifications=2023.7.1=py39haa95532_0
  - jupyter=1.0.0=py39haa95532_9
  - jupyter-lsp=2.2.0=py39haa95532_0
  - jupyter_client=8.6.0=py39haa95532_0
  - jupyter_console=6.6.3=py39haa95532_0
  - jupyter_core=5.7.2=py39haa95532_0
  - jupyter_events=0.10.0=py39haa95532_0
  - jupyter_server=2.14.1=py39haa95532_0
  - jupyter_server_terminals=0.4.4=py39haa95532_1
  - jupyterlab=4.2.5=py39haa95532_0
  - jupyterlab_pygments=0.1.2=py_0
  - jupyterlab_server=2.27.3=py39haa95532_0
  - jupyterlab_widgets=3.0.10=py39haa95532_0
  - krb5=1.20.1=h5b6d351_0
  - langcodes=3.3.0=pyhd3eb1b0_0
  - libclang=14.0.6=default_hb5a9fac_1
  - libclang13=14.0.6=default_h8e68704_1
  - libpng=1.6.39=h8cc25b3_0
  - libpq=12.17=h906ac69_0
  - libsodium=1.0.18=h62dcd97_0
  - lz4-c=1.9.4=h2bbff1b_1
  - markdown-it-py=2.2.0=py39haa95532_1
  - markupsafe=2.1.3=py39h2bbff1b_0
  - matplotlib-inline=0.1.6=py39haa95532_0
  - mdurl=0.1.0=py39haa95532_0
  - mistune=2.0.4=py39haa95532_0
  - mkl=2023.1.0=h6b88ed4_46358
  - mkl-service=2.4.0=py39h2bbff1b_1
  - mkl_fft=1.3.10=py39h827c3e9_0
  - mkl_random=1.2.7=py39hc64d2fc_0
  - murmurhash=1.0.7=py39hd77b12b_0
  - nbclient=0.8.0=py39haa95532_0
  - nbconvert=7.16.4=py39haa95532_0
  - nbformat=5.10.4=py39haa95532_0
  - nest-asyncio=1.6.0=py39haa95532_0
  - nltk=3.9.1=py39haa95532_0
  - notebook=7.2.2=py39haa95532_1
  - notebook-shim=0.2.3=py39haa95532_0
  - numexpr=2.10.1=py39h4cd664f_0
  - numpy=1.26.4=py39h055cbcc_0
  - numpy-base=1.26.4=py39h65a83cf_0
  - openssl=3.0.15=h827c3e9_0
  - overrides=7.4.0=py39haa95532_0
  - packaging=24.1=py39haa95532_0
  - pandas=2.2.2=py39h5da7b33_0
  - pandocfilters=1.5.0=pyhd3eb1b0_0
  - parso=0.8.3=pyhd3eb1b0_0
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pip=24.2=py39haa95532_0
  - platformdirs=3.10.0=py39haa95532_0
  - ply=3.11=py39haa95532_0
  - preshed=3.0.6=py39h6c2663c_0
  - prometheus_client=0.14.1=py39haa95532_0
  - prompt-toolkit=3.0.43=py39haa95532_0
  - prompt_toolkit=3.0.43=hd3eb1b0_0
  - psutil=5.9.0=py39h2bbff1b_0
  - pure_eval=0.2.2=pyhd3eb1b0_0
  - pycparser=2.21=pyhd3eb1b0_0
  - pydantic=2.8.2=py39haa95532_0
  - pydantic-core=2.20.1=py39hefb1915_0
  - pygments=2.15.1=py39haa95532_1
  - pyqt=5.15.10=py39hd77b12b_0
  - pyqt5-sip=12.13.0=py39h2bbff1b_0
  - pysocks=1.7.1=py39haa95532_0
  - python=3.9.20=h8205438_1
  - python-dateutil=2.9.0post0=py39haa95532_2
  - python-fastjsonschema=2.16.2=py39haa95532_0
  - python-json-logger=2.0.7=py39haa95532_0
  - python-tzdata=2023.3=pyhd3eb1b0_0
  - pytz=2024.1=py39haa95532_0
  - pywin32=305=py39h2bbff1b_0
  - pywinpty=2.0.10=py39h5da7b33_0
  - pyyaml=6.0.2=py39h827c3e9_0
  - pyzmq=25.1.2=py39hd77b12b_0
  - qt-main=5.15.2=h19c9488_10
  - qtconsole=5.6.0=py39haa95532_0
  - qtpy=2.4.1=py39haa95532_0
  - referencing=0.30.2=py39haa95532_0
  - regex=2024.9.11=py39h827c3e9_0
  - requests=2.32.3=py39haa95532_0
  - rfc3339-validator=0.1.4=py39haa95532_0
  - rfc3986-validator=0.1.1=py39haa95532_0
  - rich=13.7.1=py39haa95532_0
  - rpds-py=0.10.6=py39h062c2fa_0
  - send2trash=1.8.2=py39haa95532_0
  - setuptools=75.1.0=py39haa95532_0
  - shellingham=1.5.0=py39haa95532_0
  - sip=6.7.12=py39hd77b12b_0
  - six=1.16.0=pyhd3eb1b0_1
  - smart_open=5.2.1=py39haa95532_0
  - sniffio=1.3.0=py39haa95532_0
  - soupsieve=2.5=py39haa95532_0
  - spacy=3.7.2=py39hef0f399_0
  - spacy-legacy=3.0.12=py39haa95532_0
  - spacy-loggers=1.0.4=py39haa95532_0
  - sqlite=3.45.3=h2bbff1b_0
  - srsly=2.4.8=py39hd77b12b_1
  - stack_data=0.2.0=pyhd3eb1b0_0
  - tabulate=0.9.0=py39haa95532_0
  - tbb=2021.8.0=h59b6b97_0
  - terminado=0.17.1=py39haa95532_0
  - thinc=8.2.2=py39hf497b98_0
  - tinycss2=1.2.1=py39haa95532_0
  - tomli=2.0.1=py39haa95532_0
  - tornado=6.4.1=py39h827c3e9_0
  - tqdm=4.66.5=py39h9909e9c_0
  - traitlets=5.14.3=py39haa95532_0
  - typer=0.9.0=py39haa95532_0
  - typing-extensions=4.11.0=py39haa95532_0
  - typing_extensions=4.11.0=py39haa95532_0
  - tzdata=2024b=h04d1e81_0
  - urllib3=2.2.3=py39haa95532_0
  - vc=14.40=h2eaa2aa_1
  - vs2015_runtime=14.40.33807=h98bb1dd_1
  - wasabi=0.9.1=py39haa95532_0
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - weasel=0.3.4=py39haa95532_0
  - webencodings=0.5.1=py39haa95532_1
  - websocket-client=1.8.0=py39haa95532_0
  - wheel=0.44.0=py39haa95532_0
  - widgetsnbextension=4.0.10=py39haa95532_0
  - win_inet_pton=1.1.0=py39haa95532_0
  - winpty=0.4.3=4
  - xz=5.4.6=h8cc25b3_1
  - yaml=0.2.5=he774522_0
  - zeromq=4.3.5=hd77b12b_0
  - zipp=3.20.2=py39haa95532_0
  - zlib=1.2.13=h8cc25b3_1
  - zstd=1.5.6=h8880b57_0
  - pip:
      - anyascii==0.3.2
      - contractions==0.1.73
      - gensim==4.3.3
      - huggingface-hub==0.26.2
      - pyahocorasick==2.1.0
      - safetensors==0.4.5
      - scikit-learn==1.5.2
      - textblob==0.18.0.post0
      - textsearch==0.0.24
      - threadpoolctl==3.5.0
      - tokenizers==0.20.1
      - transformers==4.46.1
prefix: D:\Programs\anaconda3\envs\NLP
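
The exported environment pins build strings (the third `=`-separated field) and includes Windows-only packages such as `pywin32` and `vs2015_runtime`, plus a machine-specific `prefix:` line, so it will likely not resolve unchanged on Linux or macOS. The sketch below shows one way to reduce a conda spec to `name=version`; `strip_build` is a hypothetical helper, not part of the repository. A similar result can be obtained at export time with `conda env export --no-builds`.

```python
def strip_build(spec):
    """Turn 'name=version=build' into 'name=version'; leave other specs untouched."""
    if "==" in spec:  # pip-style pin, e.g. 'gensim==4.3.3'
        return spec
    parts = spec.split("=")
    return "=".join(parts[:2])


# A few specs copied from conda_nlp_environment.yml.
specs = ["nltk=3.9.1=py39haa95532_0", "blas=1.0=mkl", "gensim==4.3.3"]
print([strip_build(s) for s in specs])
# prints: ['nltk=3.9.1', 'blas=1.0', 'gensim==4.3.3']
```

Dropping build strings does not remove the Windows-only packages themselves; those entries would still need to be deleted by hand for a cross-platform file.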