Compare commits


No commits in common. "master" and "master" have entirely different histories.

4 changed files with 19 additions and 181 deletions

.gitignore (1 line changed)

@@ -2,4 +2,3 @@
data/
__pycache__/
venv/
.venv

README.md (118 lines changed)

@@ -3,20 +3,20 @@
## Description
This tool exposes the classla library through an HTTP API. It accepts calls that use one of several preset classla settings, as well as calls with custom settings.
## Slovenian Standard UD
## Standard UD
Preset classla settings:
```json
{
"lang": "sl",
"pos_use_lexicon": true
"pos_lemma_pretag": false
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi."}' https://orodja.cjvt.si/oznacevalnik/standard-ud
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi."}' http://127.0.0.1:5000/standard-ud
```
## Slovenian Standard JOS
## Standard JOS
Preset classla settings:
```json
{
@@ -27,24 +27,24 @@ Preset classla settings:
```
Usage example:
```commandline
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi."}' https://orodja.cjvt.si/oznacevalnik/standard-jos
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi."}' http://127.0.0.1:5000/standard-jos
```
## Slovenian Nonstandard UD
## Nonstandard UD
Preset classla settings:
```json
{
"lang": "sl",
"pos_use_lexicon": true,
"type": "nonstandard"
"type": "nonstandard_jos"
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "kva smo mi zurali zadnje leto v zagrebu..."}' https://orodja.cjvt.si/oznacevalnik/nonstandard-ud
curl -X POST -d '{"text": "kva smo mi zurali zadnje leto v zagrebu..."}' http://127.0.0.1:5000/nonstandard-ud
```
## Slovenian Nonstandard JOS
## Nonstandard JOS
Preset classla settings:
```json
{
@@ -61,105 +61,9 @@ Preset classla settings:
```
Usage example:
```commandline
curl -X POST -d '{"text": "kva smo mi zurali zadnje leto v zagrebu..."}' https://orodja.cjvt.si/oznacevalnik/nonstandard-jos
curl -X POST -d '{"text": "kva smo mi zurali zadnje leto v zagrebu..."}' http://127.0.0.1:5000/nonstandard-jos
```
## Slovenian Spoken
Preset classla settings:
```json
{
"lang": "sl",
"pos_use_lexicon": false,
"processors": {
"tokenize": "standard",
"lemma": "spoken",
"pos": "spoken",
"depparse": "spoken",
"ner": "nonstandard"
}
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "kva smo mi zurali zadnje leto v zagrebu..."}' https://orodja.cjvt.si/oznacevalnik/sl-spoken
```
## Croatian Standard UD
Preset classla settings:
```json
{
"lang": "hr",
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "Ante Starčević rođen je u Velikom Žitniku."}' https://orodja.cjvt.si/oznacevalnik/hr-standard-ud
```
## Croatian Nonstandard UD
Preset classla settings:
```json
{
"lang": "hr",
"type": "nonstandard"
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "kaj sam ja tulumaril jucer u ljubljani..."}' https://orodja.cjvt.si/oznacevalnik/hr-nonstandard-ud
```
## Serbian Standard UD
Preset classla settings:
```json
{
"lang": "sr",
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "Slobodan Jovanović rođen je u Novom Sadu."}' https://orodja.cjvt.si/oznacevalnik/sr-standard-ud
```
## Serbian Nonstandard UD
Preset classla settings:
```json
{
"lang": "sr",
"type": "nonstandard"
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "ne mogu da verujem kakvo je zezanje bilo prosle godine u zagrebu..."}' https://orodja.cjvt.si/oznacevalnik/sr-nonstandard-ud
```
## Bulgarian Standard UD
Preset classla settings:
```json
{
"lang": "bg",
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "Алеко Константинов е роден в Свищов."}' https://orodja.cjvt.si/oznacevalnik/bg-standard-ud
```
## Macedonian Standard UD
Preset classla settings:
```json
{
"lang": "mk",
}
```
Usage example:
```commandline
curl -X POST -d '{"text": "Крсте Петков Мисирков е роден во Постол."}' https://orodja.cjvt.si/oznacevalnik/mk-standard-ud
```
## Custom settings
Custom settings may be used, however they have to be in compliance with what the library allows (you can check this on https://github.com/clarinsi/classla)
@@ -167,5 +71,5 @@ Custom settings may be used, however they have to be in compliance with what the
Usage example:
```commandline
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi.", "settings": {"lang": "sl", "pos_lemma_pretag": false}}' https://orodja.cjvt.si/oznacevalnik/custom-settings
curl -X POST -d '{"text": "France Prešeren je rojen v Vrbi.", "settings": {"lang": "sl", "pos_lemma_pretag": false}}' http://127.0.0.1:5000/custom-settings
```
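The README documents the endpoints only through curl. As a sketch, the same calls can be built from Python with the standard library alone. `build_request` and `parse_conllu` are hypothetical helper names, the base URLs are the two that appear in the curl examples, and the parser assumes the response body is the CoNLL-U text produced by `doc.to_conll()` in app.py. The JSON `Content-Type` header is a courtesy; the handlers call `request.get_json(force=True)`, which parses the body regardless of the declared type.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumption: production URL taken from the README; pass
# base_url="http://127.0.0.1:5000" when running app.py locally.
BASE_URL = "https://orodja.cjvt.si/oznacevalnik"


def build_request(endpoint: str, text: str,
                  settings: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl examples."""
    payload: Dict[str, Any] = {"text": text}
    if settings is not None:
        # Only the /custom-settings endpoint reads the "settings" key.
        payload["settings"] = settings
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_conllu(conllu: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences of token dicts.

    Comment lines ("# ...") are skipped; multiword-token range rows
    (e.g. "1-2") are kept as ordinary rows in this simple sketch.
    """
    fields = ["id", "form", "lemma", "upos", "xpos", "feats",
              "head", "deprel", "deps", "misc"]
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(dict(zip(fields, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


# Usage (performs a network call, so it is left commented out):
# req = build_request("standard-ud", "France Prešeren je rojen v Vrbi.")
# with urllib.request.urlopen(req) as resp:
#     sentences = parse_conllu(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to point the same code at the production service or at a local `app.py` instance.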

app.py (67 lines changed)

@@ -1,18 +1,10 @@
from flask import Flask, request
import classla
import gc
import torch
classla.download('sl')
classla.download('sl', type='standard_jos')
classla.download('sl', type='nonstandard')
classla.download('sl', type='spoken')
classla.download('hr')
classla.download('hr', type='nonstandard')
classla.download('sr')
classla.download('sr', type='nonstandard')
classla.download('mk')
classla.download('bg')
nlp_standard_UD = classla.Pipeline('sl', pos_use_lexicon=True)
nlp_standard_JOS = classla.Pipeline('sl', pos_use_lexicon=True, type='standard_jos')
@@ -24,13 +16,6 @@ nlp_nonstandard_JOS = classla.Pipeline('sl', processors={
    "depparse": "standard_jos",
    "ner": "nonstandard"
})
nlp_standard_SPOKEN = classla.Pipeline('sl', type='spoken')
nlp_hr_standard_UD = classla.Pipeline('hr')
nlp_hr_nonstandard_UD = classla.Pipeline('hr', type='nonstandard')
nlp_sr_standard_UD = classla.Pipeline('sr')
nlp_sr_nonstandard_UD = classla.Pipeline('sr', type='nonstandard')
nlp_bg_standard_UD = classla.Pipeline('bg')
nlp_mk_standard_UD = classla.Pipeline('mk')
app = Flask(__name__)
@@ -45,10 +30,10 @@ def custom_settings():
    if 'settings' in input_json:
        settings = input_json['settings']
        nlp = classla.Pipeline(**settings)
        # classla.Pipeline('sl', processors='tokenize,pos,lemma', pos_use_lexicon=True)
        result = nlp(input_json['text']).to_conll()
        del(nlp)
        gc.collect()
        torch.cuda.empty_cache()
    else:
        return f'ERROR `settings` were not given!'
    return result
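In `custom_settings()` a missing `settings` key is reported as a plain string with Flask's default 200 status, and a missing `text` key would surface as a `KeyError` (a 500 error). A minimal, hypothetical validation helper that checks the payload before any pipeline is built might look like this; `InvalidPayload` and `parse_custom_settings` are illustrative names, not part of app.py.

```python
from typing import Any, Dict, Tuple


class InvalidPayload(ValueError):
    """Raised when a /custom-settings body cannot be used (hypothetical)."""


def parse_custom_settings(payload: Any) -> Tuple[str, Dict[str, Any]]:
    """Return (text, settings) from a decoded JSON body, or raise InvalidPayload."""
    if not isinstance(payload, dict):
        raise InvalidPayload("request body must be a JSON object")
    text = payload.get("text")
    if not isinstance(text, str):
        raise InvalidPayload("`text` must be a string")
    settings = payload.get("settings")
    if settings is None:
        raise InvalidPayload("`settings` were not given")
    if not isinstance(settings, dict):
        raise InvalidPayload("`settings` must be a JSON object")
    return text, settings


# Possible use inside custom_settings() (sketch, not the current code):
#     try:
#         text, settings = parse_custom_settings(request.get_json(force=True))
#     except InvalidPayload as exc:
#         return f"ERROR {exc}", 400
#     nlp = classla.Pipeline(**settings)
```

Returning a `(body, status)` tuple lets clients distinguish a malformed request from a successful annotation without inspecting the response text.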
@@ -82,55 +67,5 @@ def nonstandard_jos():
    return doc.to_conll()
@app.route('/sl-spoken', methods=["POST"])
def sl_spoken():
    input_json = request.get_json(force=True)
    doc = nlp_standard_SPOKEN(input_json['text'])
    return doc.to_conll()
@app.route('/hr-standard-ud', methods=["POST"])
def hr_standard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_hr_standard_UD(input_json['text'])
    return doc.to_conll()
@app.route('/hr-nonstandard-ud', methods=["POST"])
def hr_nonstandard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_hr_nonstandard_UD(input_json['text'])
    return doc.to_conll()
@app.route('/sr-standard-ud', methods=["POST"])
def sr_standard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_sr_standard_UD(input_json['text'])
    return doc.to_conll()
@app.route('/sr-nonstandard-ud', methods=["POST"])
def sr_nonstandard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_sr_nonstandard_UD(input_json['text'])
    return doc.to_conll()
@app.route('/bg-standard-ud', methods=["POST"])
def bg_standard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_bg_standard_UD(input_json['text'])
    return doc.to_conll()
@app.route('/mk-standard-ud', methods=["POST"])
def mk_standard_ud():
    input_json = request.get_json(force=True)
    doc = nlp_mk_standard_UD(input_json['text'])
    return doc.to_conll()
if __name__ == '__main__':
    app.run(host="0.0.0.0")
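The per-language handlers in app.py differ only in which pipeline they call. One hypothetical way to remove that duplication is a lookup table from route name to pipeline arguments. The arguments below are copied from the `classla.Pipeline(...)` calls visible in this diff (the Slovenian nonstandard pipelines are left out because their full arguments are not shown). `PIPELINE_KWARGS`, `build_pipelines` and `annotate` are illustrative names, and the pipeline factory is injected so the sketch runs without classla installed; in the app it would be `classla.Pipeline`.

```python
from typing import Any, Callable, Dict

# Route name -> keyword arguments for the pipeline factory, taken from the
# classla.Pipeline(...) calls shown in app.py.
PIPELINE_KWARGS: Dict[str, Dict[str, Any]] = {
    "standard-ud": {"lang": "sl", "pos_use_lexicon": True},
    "standard-jos": {"lang": "sl", "pos_use_lexicon": True, "type": "standard_jos"},
    "sl-spoken": {"lang": "sl", "type": "spoken"},
    "hr-standard-ud": {"lang": "hr"},
    "hr-nonstandard-ud": {"lang": "hr", "type": "nonstandard"},
    "sr-standard-ud": {"lang": "sr"},
    "sr-nonstandard-ud": {"lang": "sr", "type": "nonstandard"},
    "bg-standard-ud": {"lang": "bg"},
    "mk-standard-ud": {"lang": "mk"},
}

# A pipeline is anything callable on a string that returns a document
# object with a to_conll() method.
Pipeline = Callable[[str], Any]


def build_pipelines(factory: Callable[..., Pipeline]) -> Dict[str, Pipeline]:
    """Create one pipeline per route, eagerly, as app.py does at import time."""
    return {route: factory(**kwargs) for route, kwargs in PIPELINE_KWARGS.items()}


def annotate(pipelines: Dict[str, Pipeline], route: str, text: str) -> str:
    """Run the pipeline registered for `route` and return its CoNLL-U output."""
    if route not in pipelines:
        raise KeyError(f"unknown route: {route}")
    return pipelines[route](text).to_conll()


# Possible Flask wiring (sketch; requires flask and classla):
#     pipelines = build_pipelines(classla.Pipeline)
#     for name in pipelines:
#         app.add_url_rule(
#             f"/{name}", endpoint=name, methods=["POST"],
#             view_func=lambda name=name: annotate(
#                 pipelines, name, request.get_json(force=True)["text"]))
```

Building every pipeline up front mirrors the current module-level globals: requests stay fast, but all models are held in memory at once. Adding a language then becomes a one-line change to the table plus the matching `classla.download(...)` call.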


@@ -1,6 +1,6 @@
certifi==2021.10.8
charset-normalizer==2.0.8
classla==2.2
classla==1.1.0
click==8.0.3
Flask==2.0.2
idna==3.3
@@ -9,13 +9,13 @@ itsdangerous==2.0.1
Jinja2==3.0.3
lxml==4.6.4
MarkupSafe==2.0.1
numpy==1.23.0
obeliks==1.1.6
protobuf==4.21.2
numpy==1.21.4
obeliks==1.1.3
protobuf==3.19.1
regex==2021.11.10
reldi-tokeniser==1.0.3
requests==2.28.0
torch==1.12.0
reldi-tokeniser==1.0.0
requests==2.26.0
torch==1.10.0
tqdm==4.62.3
typing_extensions==4.0.1
urllib3==1.26.7