Commit 613e9fdc authored by Almouhannad Hafez

Update README.md

parent 52736320
test.ipynb test.ipynb
clf-test.ipynb __pycache__/constants.cpython-39.pyc
__pycache__/constants.cpython-311.pyc
# ***NLP course project***
***Part1-Morphological analyzer***
***Almouhannad Hafez + Mariam Khierbek***
> ***Description:***
> Classifying symptoms (given as text data) into diseases

**[Description](#description)**
**[How to run](#how-to-run)**
**[Part1. Basic morphological analyzer](#part1-basic-morphological-analyzer)**
## ***Description***
**Classifying symptoms (given as text data) into diseases**
> [Dataset link](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease)
----
> ***Contents:***
> - **Data folder**: Contains the dataset and the train and test sets
> - **Constants.py**: Fixed values used by other files, exposed as the `CONSTANTS` class
> - **1.1-Preprocessing.ipynb**: Preprocessing of the dataset (cleaning, ...)
> - **1.2-Morphological-Analyzer-Classifier.ipynb**: A classifier using:
>   - Porter Stemmer
>   - TF-IDF
>   - Naive Bayes
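
The stem, weight, classify pipeline listed above can be sketched in plain Python. This is only an illustration, not the code from the notebooks: the suffix stripper is a toy stand-in for a real Porter stemmer, the training sentences and labels are made up, and every function and class name here is hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy suffix stripper: a stand-in for a Porter stemmer, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words such as "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    return [stem(w) for w in text.lower().split() if w.isalpha()]

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, vectors, labels, vocab_size):
        self.vocab_size = vocab_size
        self.priors, self.weights, self.totals = {}, {}, {}
        for label, count in Counter(labels).items():
            self.priors[label] = math.log(count / len(labels))
            summed = defaultdict(float)
            for vec, y in zip(vectors, labels):
                if y == label:
                    for term, value in vec.items():
                        summed[term] += value
            self.weights[label] = summed
            self.totals[label] = sum(summed.values())
        return self

    def predict(self, vec):
        def score(label):
            denom = self.totals[label] + self.vocab_size
            return self.priors[label] + sum(
                value * math.log((self.weights[label].get(term, 0.0) + 1) / denom)
                for term, value in vec.items()
            )
        return max(self.priors, key=score)

# Made-up training data, for illustration only.
train = [
    ("itchy red rash on skin", "skin"),
    ("skin peeling and itching", "skin"),
    ("high fever and coughing", "flu"),
    ("fever with chills and coughs", "flu"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], labels, len(idf))

print(model.predict(tfidf(tokenize("coughing and fever"), idf)))
```

In practice a library stemmer and vectorizer (for example the `nltk` and `scikit-learn` packages pinned in the environment file) would replace the hand-written parts.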
### ***Main contents***
- **Data folder**: Contains the dataset and the train and test sets
- **Constants.py**: Fixed values used by other files, exposed as the `CONSTANTS` class
- **Helpers.py**: Helper functions used by other files, exposed as the `HELPERS` class
- **Other .ipynb files**: Jupyter notebooks containing the actual work
## ***How to run***
> **Using [Anaconda](https://www.anaconda.com/)**
1. **Clone this repository**
```bash
git clone git@git.hiast.edu.sy:almohanad.hafez/nlp-project.git
```
1. **Open Anaconda Prompt**
1. **Navigate to the repository directory on your machine: `cd <path_to_repository_directory>/nlp-project`**
**If the repository is on a drive other than C:, say D:, switch to it first by running `D:`**
1. **Create the Conda Environment from the .yml File**
```bash
conda env create -f conda_nlp_environment.yml
```
1. **Activate the New Environment**
```bash
conda activate NLP
```
1. **Open Jupyter Notebook**
```bash
jupyter notebook
```
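
Once the notebook server is running, a quick sanity check in a notebook cell can confirm that the `NLP` environment is the one actually in use. The sketch below relies only on the standard library; the helper name `installed_versions` is hypothetical, and the package list is an arbitrary selection from the environment file.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# A few packages pinned in the environment file; adjust the list as needed.
report = installed_versions(["nltk", "scikit-learn", "spacy", "gensim"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any entry prints `NOT INSTALLED`, the notebook is probably running on a different kernel than the activated environment.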
## ***Part1. Basic morphological analyzer***
name: NLP
channels:
- defaults
dependencies:
- annotated-types=0.6.0=py39haa95532_0
- anyio=4.6.2=py39haa95532_0
- argon2-cffi=21.3.0=pyhd3eb1b0_0
- argon2-cffi-bindings=21.2.0=py39h2bbff1b_0
- asttokens=2.0.5=pyhd3eb1b0_0
- async-lru=2.0.4=py39haa95532_0
- attrs=24.2.0=py39haa95532_0
- babel=2.11.0=py39haa95532_0
- backcall=0.2.0=pyhd3eb1b0_0
- beautifulsoup4=4.12.3=py39haa95532_0
- blas=1.0=mkl
- bleach=4.1.0=pyhd3eb1b0_0
- bottleneck=1.3.7=py39h9128911_0
- brotli-python=1.0.9=py39hd77b12b_8
- ca-certificates=2024.9.24=haa95532_0
- catalogue=2.0.10=py39haa95532_0
- certifi=2024.8.30=py39haa95532_0
- cffi=1.17.1=py39h827c3e9_0
- charset-normalizer=3.3.2=pyhd3eb1b0_0
- click=8.1.7=py39haa95532_0
- cloudpathlib=0.16.0=py39haa95532_1
- colorama=0.4.6=py39haa95532_0
- comm=0.2.1=py39haa95532_0
- confection=0.1.4=py39h9909e9c_0
- cymem=2.0.6=py39hd77b12b_0
- cython-blis=0.7.9=py39h080aedc_0
- debugpy=1.6.7=py39hd77b12b_0
- decorator=5.1.1=pyhd3eb1b0_0
- defusedxml=0.7.1=pyhd3eb1b0_0
- exceptiongroup=1.2.0=py39haa95532_0
- executing=0.8.3=pyhd3eb1b0_0
- h11=0.14.0=py39haa95532_0
- httpcore=1.0.2=py39haa95532_0
- httpx=0.27.0=py39haa95532_0
- icu=73.1=h6c2663c_0
- idna=3.7=py39haa95532_0
- importlib-metadata=7.0.1=py39haa95532_0
- importlib_metadata=7.0.1=hd3eb1b0_0
- intel-openmp=2023.1.0=h59b6b97_46320
- ipykernel=6.29.5=py39haa95532_0
- ipython=8.15.0=py39haa95532_0
- ipywidgets=8.1.2=py39haa95532_0
- jedi=0.19.1=py39haa95532_0
- jinja2=3.1.4=py39haa95532_0
- joblib=1.4.2=py39haa95532_0
- jpeg=9e=h827c3e9_3
- json5=0.9.6=pyhd3eb1b0_0
- jsonschema=4.23.0=py39haa95532_0
- jsonschema-specifications=2023.7.1=py39haa95532_0
- jupyter=1.0.0=py39haa95532_9
- jupyter-lsp=2.2.0=py39haa95532_0
- jupyter_client=8.6.0=py39haa95532_0
- jupyter_console=6.6.3=py39haa95532_0
- jupyter_core=5.7.2=py39haa95532_0
- jupyter_events=0.10.0=py39haa95532_0
- jupyter_server=2.14.1=py39haa95532_0
- jupyter_server_terminals=0.4.4=py39haa95532_1
- jupyterlab=4.2.5=py39haa95532_0
- jupyterlab_pygments=0.1.2=py_0
- jupyterlab_server=2.27.3=py39haa95532_0
- jupyterlab_widgets=3.0.10=py39haa95532_0
- krb5=1.20.1=h5b6d351_0
- langcodes=3.3.0=pyhd3eb1b0_0
- libclang=14.0.6=default_hb5a9fac_1
- libclang13=14.0.6=default_h8e68704_1
- libpng=1.6.39=h8cc25b3_0
- libpq=12.17=h906ac69_0
- libsodium=1.0.18=h62dcd97_0
- lz4-c=1.9.4=h2bbff1b_1
- markdown-it-py=2.2.0=py39haa95532_1
- markupsafe=2.1.3=py39h2bbff1b_0
- matplotlib-inline=0.1.6=py39haa95532_0
- mdurl=0.1.0=py39haa95532_0
- mistune=2.0.4=py39haa95532_0
- mkl=2023.1.0=h6b88ed4_46358
- mkl-service=2.4.0=py39h2bbff1b_1
- mkl_fft=1.3.10=py39h827c3e9_0
- mkl_random=1.2.7=py39hc64d2fc_0
- murmurhash=1.0.7=py39hd77b12b_0
- nbclient=0.8.0=py39haa95532_0
- nbconvert=7.16.4=py39haa95532_0
- nbformat=5.10.4=py39haa95532_0
- nest-asyncio=1.6.0=py39haa95532_0
- nltk=3.9.1=py39haa95532_0
- notebook=7.2.2=py39haa95532_1
- notebook-shim=0.2.3=py39haa95532_0
- numexpr=2.10.1=py39h4cd664f_0
- numpy=1.26.4=py39h055cbcc_0
- numpy-base=1.26.4=py39h65a83cf_0
- openssl=3.0.15=h827c3e9_0
- overrides=7.4.0=py39haa95532_0
- packaging=24.1=py39haa95532_0
- pandas=2.2.2=py39h5da7b33_0
- pandocfilters=1.5.0=pyhd3eb1b0_0
- parso=0.8.3=pyhd3eb1b0_0
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pip=24.2=py39haa95532_0
- platformdirs=3.10.0=py39haa95532_0
- ply=3.11=py39haa95532_0
- preshed=3.0.6=py39h6c2663c_0
- prometheus_client=0.14.1=py39haa95532_0
- prompt-toolkit=3.0.43=py39haa95532_0
- prompt_toolkit=3.0.43=hd3eb1b0_0
- psutil=5.9.0=py39h2bbff1b_0
- pure_eval=0.2.2=pyhd3eb1b0_0
- pycparser=2.21=pyhd3eb1b0_0
- pydantic=2.8.2=py39haa95532_0
- pydantic-core=2.20.1=py39hefb1915_0
- pygments=2.15.1=py39haa95532_1
- pyqt=5.15.10=py39hd77b12b_0
- pyqt5-sip=12.13.0=py39h2bbff1b_0
- pysocks=1.7.1=py39haa95532_0
- python=3.9.20=h8205438_1
- python-dateutil=2.9.0post0=py39haa95532_2
- python-fastjsonschema=2.16.2=py39haa95532_0
- python-json-logger=2.0.7=py39haa95532_0
- python-tzdata=2023.3=pyhd3eb1b0_0
- pytz=2024.1=py39haa95532_0
- pywin32=305=py39h2bbff1b_0
- pywinpty=2.0.10=py39h5da7b33_0
- pyyaml=6.0.2=py39h827c3e9_0
- pyzmq=25.1.2=py39hd77b12b_0
- qt-main=5.15.2=h19c9488_10
- qtconsole=5.6.0=py39haa95532_0
- qtpy=2.4.1=py39haa95532_0
- referencing=0.30.2=py39haa95532_0
- regex=2024.9.11=py39h827c3e9_0
- requests=2.32.3=py39haa95532_0
- rfc3339-validator=0.1.4=py39haa95532_0
- rfc3986-validator=0.1.1=py39haa95532_0
- rich=13.7.1=py39haa95532_0
- rpds-py=0.10.6=py39h062c2fa_0
- send2trash=1.8.2=py39haa95532_0
- setuptools=75.1.0=py39haa95532_0
- shellingham=1.5.0=py39haa95532_0
- sip=6.7.12=py39hd77b12b_0
- six=1.16.0=pyhd3eb1b0_1
- smart_open=5.2.1=py39haa95532_0
- sniffio=1.3.0=py39haa95532_0
- soupsieve=2.5=py39haa95532_0
- spacy=3.7.2=py39hef0f399_0
- spacy-legacy=3.0.12=py39haa95532_0
- spacy-loggers=1.0.4=py39haa95532_0
- sqlite=3.45.3=h2bbff1b_0
- srsly=2.4.8=py39hd77b12b_1
- stack_data=0.2.0=pyhd3eb1b0_0
- tabulate=0.9.0=py39haa95532_0
- tbb=2021.8.0=h59b6b97_0
- terminado=0.17.1=py39haa95532_0
- thinc=8.2.2=py39hf497b98_0
- tinycss2=1.2.1=py39haa95532_0
- tomli=2.0.1=py39haa95532_0
- tornado=6.4.1=py39h827c3e9_0
- tqdm=4.66.5=py39h9909e9c_0
- traitlets=5.14.3=py39haa95532_0
- typer=0.9.0=py39haa95532_0
- typing-extensions=4.11.0=py39haa95532_0
- typing_extensions=4.11.0=py39haa95532_0
- tzdata=2024b=h04d1e81_0
- urllib3=2.2.3=py39haa95532_0
- vc=14.40=h2eaa2aa_1
- vs2015_runtime=14.40.33807=h98bb1dd_1
- wasabi=0.9.1=py39haa95532_0
- wcwidth=0.2.5=pyhd3eb1b0_0
- weasel=0.3.4=py39haa95532_0
- webencodings=0.5.1=py39haa95532_1
- websocket-client=1.8.0=py39haa95532_0
- wheel=0.44.0=py39haa95532_0
- widgetsnbextension=4.0.10=py39haa95532_0
- win_inet_pton=1.1.0=py39haa95532_0
- winpty=0.4.3=4
- xz=5.4.6=h8cc25b3_1
- yaml=0.2.5=he774522_0
- zeromq=4.3.5=hd77b12b_0
- zipp=3.20.2=py39haa95532_0
- zlib=1.2.13=h8cc25b3_1
- zstd=1.5.6=h8880b57_0
- pip:
    - anyascii==0.3.2
    - contractions==0.1.73
    - gensim==4.3.3
    - huggingface-hub==0.26.2
    - pyahocorasick==2.1.0
    - safetensors==0.4.5
    - scikit-learn==1.5.2
    - textblob==0.18.0.post0
    - textsearch==0.0.24
    - threadpoolctl==3.5.0
    - tokenizers==0.20.1
    - transformers==4.46.1
prefix: D:\Programs\anaconda3\envs\NLP
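
Each conda entry in the file above follows the `name=version=build` pattern, while the entries under `pip:` use `name==version`. As a rough illustration of how those pins break down, the sketch below splits one entry into its parts; `parse_pin` is a hypothetical helper, not part of conda or of this repository.

```python
def parse_pin(line):
    """Split one dependency entry into (name, version, build).

    Conda entries look like 'name=version=build'; pip entries look like
    'name==version' and carry no build string, so build is None for them.
    """
    spec = line.strip().lstrip("-").strip()
    if "==" in spec:
        name, ver = spec.split("==", 1)
        return name, ver, None
    parts = spec.split("=")
    name = parts[0]
    ver = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, ver, build

for entry in ["- numpy=1.26.4=py39h055cbcc_0", "- gensim==4.3.3"]:
    print(parse_pin(entry))
```

The build strings, together with entries such as `pywin32` and `vs2015_runtime` and the `prefix` path, suggest the file was exported on Windows, so creating the environment on another platform may fail. Exporting with `conda env export --no-builds` is one way to drop the build part.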