52 lines
2.0 KiB
Markdown
52 lines
2.0 KiB
Markdown
|
# KOST
|
||
|
## Instructions
|
||
|
- Create a new virtual environment (using i.e. `virtualenv`)
|
||
|
- Run `pip install -r requirements.txt`, to install necessary libraries.
|
||
|
- Using python console download classla models (used for annotation and part of tokenization):
|
||
|
```python
|
||
|
import classla
|
||
|
classla.download(lang='sl', type='standard_jos')
|
||
|
```
|
||
|
- Run `svala2tei.py` script.
|
||
|
|
||
|
### Example
|
||
|
```python
|
||
|
python svala2tei.py --svala_folder data_sample/KOST/svala_small --raw_text data_sample/KOST/raw_small --results_folder data_sample/KOST/results_small --texts_metadata data_sample/KOST/texts_metadata5.csv --authors_metadata data_sample/KOST/authors_metadata5.csv --teachers_metadata data_sample/KOST/teachers_metadata.csv --translations data_sample/KOST/translations.csv --tokenization_interprocessing data_sample/processing.tokenization --annotation_interprocessing data_sample/processing.annotation --overwrite_tokenization --overwrite_annotation
|
||
|
```
|
||
|
|
||
|
## Parameter descriptions
|
||
|
### --svala_folder
|
||
|
Path to directory with `*.svala` files.
|
||
|
|
||
|
### --results_folder
|
||
|
Destination of results folder.
|
||
|
|
||
|
### --raw_text
|
||
|
Path to directory that contains raw texts.
|
||
|
|
||
|
### --texts_metadata
|
||
|
Location of metadata csv that contains information about texts.
|
||
|
|
||
|
### --authors_metadata
|
||
|
Location of metadata csv that contains information about authors.
|
||
|
|
||
|
### --teachers_metadata
|
||
|
Location of metadata csv that contains information about teachers.
|
||
|
|
||
|
### --translations
|
||
|
Path to mapper that translates column names in metadata files.
|
||
|
|
||
|
### --tokenization_interprocessing
|
||
|
Path to file where tokenized semi processed data is stored, to be able to proceed with processsing without rerunning whole test.
|
||
|
|
||
|
### --overwrite_tokenization
|
||
|
Tag that forces script to redo tokenization and overrides interprocessing file.
|
||
|
|
||
|
### --annotation_interprocessing
|
||
|
Path to file where annotated semi processed data is stored, to be able to proceed with processsing without rerunning whole test.
|
||
|
|
||
|
### --overwrite_annotation
|
||
|
Tag that forces script to redo annotation and overrides interprocessing file.
|
||
|
|
||
|
|
||
|
##
|