Conversion scripts and resources for converting between established formats.
Cyprian Laskowski 03ce9f8ac7
Added rudimentary module documentation and made a couple of basic fixes
7 months ago


CJVT conversion utilities

This repository is intended for common conversions needed by CJVT developers. It can of course also be used more broadly, but most of the scripts (with the exception of jos_msds_and_properties.py) were written with specific tasks in mind, and may not generalise as expected. Use at your own risk.

JOS msds and properties

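You can use the Msd and Properties classes to represent morphosyntactic information either as a compact MSD code or as a set of named properties. The Converter class converts between the two types in either direction (you specify the output language, English or Slovene, and the input carries its own language), and can also translate an object of either type between English and Slovene. Example usage: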

>>> from conversion_utils.jos_msds_and_properties import Converter, Msd, Properties
>>> converter = Converter()
>>> msd = Msd('Sozem', 'sl')
>>> msd.code
'Sozem'
>>> msd.language
'sl'
>>> properties = converter.msd_to_properties(msd, 'en')
>>> properties.category
'noun'
>>> properties.lexeme_feature_map
{'type': 'common', 'gender': 'feminine'}
>>> properties.form_feature_map
{'number': 'singular', 'case': 'locative'}
>>> properties.language
'en'
>>> msd2 = converter.properties_to_msd(properties, 'sl')
>>> msd2 == msd
True
>>> print(converter.translate_msd(msd, 'en'))
code=Ncfsl, language=en
>>> print(converter.translate_properties(properties, 'sl'))
language=sl, category=samostalnik, lexeme features={'vrsta': 'občno_ime', 'spol': 'ženski'}, form_features={'število': 'ednina', 'sklon': 'mestnik'}
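
To make the relationship between an MSD code and its properties more concrete, here is a minimal, self-contained sketch of how a positional code such as Ncfsl could be decoded by hand. It does not use or reproduce the conversion_utils implementation: the table NOUN_POSITIONS and the helper decode_noun_msd are hypothetical names introduced only for illustration, the table covers just four noun attributes in English, and the split into lexeme-level and form-level features simply mirrors the output shown above.

```python
# Illustrative only: a tiny hand-written table for English-language noun MSDs.
# The real conversion_utils module may organise its data quite differently.
NOUN_POSITIONS = [
    # (feature name, level, {code letter: feature value})
    ("type", "lexeme", {"c": "common", "p": "proper"}),
    ("gender", "lexeme", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", "form", {"s": "singular", "d": "dual", "p": "plural"}),
    ("case", "form", {"n": "nominative", "g": "genitive", "d": "dative",
                      "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_noun_msd(code):
    """Decode an English-language noun MSD such as 'Ncfsl' (hypothetical helper)."""
    if not code.startswith("N"):
        raise ValueError(f"not a noun MSD: {code!r}")
    lexeme_features, form_features = {}, {}
    # Each letter after the category letter fills one fixed position.
    for letter, (name, level, values) in zip(code[1:], NOUN_POSITIONS):
        if letter == "-":  # assumed here to mean "position not specified"
            continue
        target = lexeme_features if level == "lexeme" else form_features
        target[name] = values[letter]
    return "noun", lexeme_features, form_features


print(decode_noun_msd("Ncfsl"))
```

For real work, prefer Converter.msd_to_properties as shown in the session above, since it handles every category and both languages rather than this single hard-coded case.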