Conversion scripts and resources for converting between established formats.

Conversion utilities

This repository is intended for common conversions needed by CJVT developers. For the moment, it covers only JOS MSDs (morphosyntactic descriptions) and properties.

JOS MSDs and properties

The Msd and Properties classes encode the two representations. A Converter converts between them in either direction (you specify the language of both the input and the output, English or Slovene), or simply translates an Msd or a Properties object between English and Slovene. Example usage:

>>> from conversion_utils.jos_msds_and_properties import Converter, Msd, Properties
>>> converter = Converter()
>>> msd = Msd('Sozem', 'sl')
>>> msd.code
'Sozem'
>>> msd.language
'sl'
>>> properties = converter.msd_to_properties(msd, 'en')
>>> properties.category
'noun'
>>> properties.lexeme_feature_map
{'type': 'common', 'gender': 'feminine'}
>>> properties.form_feature_map
{'number': 'singular', 'case': 'locative'}
>>> properties.language
'en'
>>> msd2 = converter.properties_to_msd(properties, 'sl')
>>> msd2 == msd
True
>>> print(converter.translate_msd(msd, 'en'))
code=Ncfsl, language=en
>>> print(converter.translate_properties(properties, 'sl'))
language=sl, category=samostalnik, lexeme features={'vrsta': 'občno_ime', 'spol': 'ženski'}, form_features={'število': 'ednina', 'sklon': 'mestnik'}
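
To make the relationship between the two representations more concrete, here is a minimal, self-contained sketch of how a positional code such as 'Ncfsl' could be unpacked into a category plus lexeme-level and form-level features. It is not the conversion_utils implementation: the TOY_SPEC table and the decode_toy_msd function are hypothetical names introduced for illustration, and the table covers only the values that appear in the example above.

```python
# Illustrative only: a toy positional decoder, NOT the conversion_utils code.
# TOY_SPEC and decode_toy_msd are hypothetical names; the table holds just
# the values seen in the README example (English code 'Ncfsl').

# First letter -> (category name, ordered attribute specs).
# Each spec is (attribute name, feature level, {code letter: value}).
TOY_SPEC = {
    'N': ('noun', [
        ('type', 'lexeme', {'c': 'common'}),
        ('gender', 'lexeme', {'f': 'feminine'}),
        ('number', 'form', {'s': 'singular'}),
        ('case', 'form', {'l': 'locative'}),
    ]),
}


def decode_toy_msd(code):
    """Split a positional code into (category, lexeme features, form features)."""
    category, attributes = TOY_SPEC[code[0]]
    lexeme_features, form_features = {}, {}
    # Each letter after the first fills the attribute at the same position.
    for letter, (name, level, values) in zip(code[1:], attributes):
        target = lexeme_features if level == 'lexeme' else form_features
        target[name] = values[letter]
    return category, lexeme_features, form_features


category, lexeme_features, form_features = decode_toy_msd('Ncfsl')
print(category, lexeme_features, form_features)
```

The split into lexeme and form features mirrors the lexeme_feature_map and form_feature_map attributes shown in the session above. For real conversions, use Converter, which handles the full JOS specification in both languages.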