@@ -6,16 +6,13 @@ provisional new structures if necessary.
## Installation
-Installation requires the [CLASSLA](https://github.com/clarinsi/classla) standard_jos models, as
-well as (for now) the wani.py script from
-[luscenje_struktur](https://gitea.cjvt.si/ozbolt/luscenje_struktur):
+Installation requires the [CLASSLA](https://github.com/clarinsi/classla) standard_jos models:
pip install .
python -c "import classla; classla.download('sl', dir='resources/classla', type='standard_jos')"
-curl -o resources/wani.py https://gitea.cjvt.si/ozbolt/luscenje_struktur/raw/branch/master/wani.py
-The classla directory and wani.py file do not necessarily need to be placed under resources/, but
-the wrapper script scripts/process.py assumes that they are.
+The classla directory does not necessarily need to be placed under resources/, but the wrapper
+script scripts/process.py assumes that it is.
## Usage
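The `parse_to_dictionary` mode touched by the next hunk assigns structures in stages: collocation matches from the MWE extraction step come first, and for the remaining units an existing single-component or other structure is looked up, or a new one is created. A minimal Python sketch of that assignment step, under stated assumptions, follows. It is not the code in process.py: `Unit`, `assign_structures`, the use of a component signature as the lookup key, and the `new-N` id scheme are all hypothetical, and the final TEI-to-dictionary conversion is not shown. The input registry stands in for `-instructs` and the returned one for `-outstructs`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    # Hypothetical stand-in for one lexical unit in the parsed TEI.
    unit_id: str
    components: tuple[str, ...]              # assumed signature, e.g. the units' POS tags
    collocation_match: Optional[str] = None  # structure id found by the MWE extraction step


def assign_structures(units, structures):
    """Assign a structure id to every unit (hypothetical sketch).

    `structures` maps a component signature to a structure id (the role of
    -instructs); the returned copy, extended with any newly created
    structures, plays the role of -outstructs.
    """
    structures = dict(structures)  # leave the caller's registry untouched
    next_new = 1
    assignments = {}
    for unit in units:
        if unit.collocation_match is not None:
            # Stage 1: keep the collocation match from the extraction step.
            assignments[unit.unit_id] = unit.collocation_match
            continue
        # Stage 2: find an existing structure for this shape, creating one if needed.
        if unit.components not in structures:
            structures[unit.components] = f"new-{next_new}"
            next_new += 1
        assignments[unit.unit_id] = structures[unit.components]
    return assignments, structures
```

Returning an extended copy rather than mutating the input mirrors the separate `-instructs` and `-outstructs` files in the example command: the old structure specifications are read, and a possibly larger set is written out.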
@@ -44,12 +41,11 @@ $ python process.py -mode strings_to_parse -infile /tmp/strings.txt -outfile /tm
The input should be a TEI XML file (in the same particular format as
the output of strings_to_parse) and an xml file of structure
-specifications. The script first uses the MWE extraction script
-[wani.py](https://gitea.cjvt.si/ozbolt/luscenje_struktur) to find and
-assign all matches for collocation structures. For units without such
-matches, it then finds (creating, if necessary) and assigns
-single-component or other structures. Finally the TEI is converted to
-CJVT dictionary XML format. Example:
+specifications. The script first uses the MWE extraction script to
+find and assign all matches for collocation structures. For units
+without such matches, it then finds (creating, if necessary) and
+assigns single-component or other structures. Finally the TEI is
+converted to CJVT dictionary XML format. Example:
```
$ python process.py -mode parse_to_dictionary -infile /tmp/parsed.xml -instructs /tmp/structures_old.xml -outfile /tmp/dictionary.xml -outstructs /tmp/structures_new.xml