diff --git a/README.md b/README.md
index 7dab919..1207694 100644
--- a/README.md
+++ b/README.md
@@ -40,20 +40,15 @@ $ python scripts/process.py -mode strings_to_parse -infile /tmp/strings.txt -out
 
 The input should be a TEI XML file (in the same particular format as
 the output of strings_to_parse) and an xml file of structure
-specifications. It first splits the TEI file into two files, one with
-the single-component units and the other with the multiple-component
-units. For each, it then assigns each unit to a syntactic structure
-from the DDD database and converts the output into CJVT dictionary XML
-format. For the single-component units, this is pretty trivial, but
-for multiple-component units it is more involved, and includes two
-runs of the MWE extraction script
-[wani.py](https://gitea.cjvt.si/ozbolt/luscenje_struktur), generating
-missing structures in between. At the end, the single-component and
-multiple-component dictionary files are merged into one dictionary
-file. Example:
+specifications. The script first uses the MWE extraction script
+[wani.py](https://gitea.cjvt.si/ozbolt/luscenje_struktur) to find and
+assign all matches for collocation structures. For units without such
+matches, it then finds (creating, if necessary) and assigns
+single-component or other structures. Finally the TEI is converted to
+CJVT dictionary XML format. Example:
 
 ```
-$ python scripts/process.py -mode parse_to_dictionary -infile /tmp/parsed.xml -instructs /tmp/structures_old.xml -outfile /tmp/dictionary.xml -structures /tmp/structures_new.xml
+$ python scripts/process.py -mode parse_to_dictionary -infile /tmp/parsed.xml -instructs /tmp/structures_old.xml -outfile /tmp/dictionary.xml -outstructs /tmp/structures_new.xml
 ```
 
 ### strings_to_dictionary
@@ -97,6 +92,6 @@ Note that any new structures generated are given temporary ids
 (@tempId), because they only get real ids once they are approved and
 added to the DDD database. That is normally done via the django
 import_structures.py script in the [ddd_core
-repository](https://gitea.cjvt.si/ddd/ddd_core), as part of a batch of
-DDD updates. That script replaces the temporary ids in the structure
-specifications and updates the ids in the dictionary file.
+repository](https://gitea.cjvt.si/ddd/ddd_core), which replaces the
+temporary ids in the structure specifications and updates the ids in
+the dictionary file.