# cjvt-srl-tagging
We'll be using mate-tools to perform semantic role labeling (SRL) on the Kres corpus.
## workspace
The tools require Java.
Go to `./dockerfiles/python-java/` and run `make`.
This should drop you into a Docker container with this repo mounted.
## mate-tools
Check out `./tools/srl-20131216/README.md`.
## Scripts
List all XML tags that occur after the `<body>` tag:
```bash
# Print every distinct tag token from the <body> line onward
# (a token ends at the first space, quote or '>').
grep -A 999999999999 -e '<body>' F0006347.xml.parsed.xml | grep -o -e '<[^ ">]*' | sort | uniq
```
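The same check can be sketched in Python, which may be handier when it has to run over many files. This is only an illustration: the function name `tags_after_body` and the sample string are assumptions, and unlike the line-based `grep` it starts scanning at the position of `<body>` itself.

```python
import re


def tags_after_body(xml_text):
    """Return the sorted, de-duplicated tag tokens found from <body> onward."""
    start = xml_text.find("<body>")
    if start == -1:
        # No <body> tag, so there is nothing to report.
        return []
    # A tag token is '<' followed by everything up to the first
    # space, double quote or '>'.
    return sorted(set(re.findall(r'<[^ ">]*', xml_text[start:])))


# Hypothetical miniature document, not taken from Kres.
sample = (
    '<TEI><teiHeader/><text><body><p><s>'
    '<w lemma="biti">je</w></s></p></body></text></TEI>'
)
print(tags_after_body(sample))
```

Closing tags such as `</w` show up as separate entries, just as they do in the shell one-liner.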
## Tools
* A parser for reading both SSJ500k 2.1 TEI XML and Kres `F....xml.parsed.xml` files, found in `./tools/parser/parser.py`.
* `fillpred_model` for creating a yes/no model that predicts whether a token is a predicate (based on SSJ500k data).
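
How `fillpred_model` works internally is not described here, so the following is only a hypothetical illustration of what a yes/no predicate model can look like: a per-lemma majority vote. The names `train_fillpred` and `predict_fillpred` and the toy data are assumptions, not the repository's code. The `Y`/`_` labels follow the FILLPRED column convention of the CoNLL 2009 format.

```python
from collections import Counter


def train_fillpred(rows):
    """Build a lemma -> bool table from (lemma, is_predicate) pairs."""
    yes, total = Counter(), Counter()
    for lemma, is_pred in rows:
        total[lemma] += 1
        if is_pred:
            yes[lemma] += 1
    # Answer "yes" for lemmas that were predicates in more than half
    # of their occurrences in the training data.
    return {lemma: yes[lemma] * 2 > total[lemma] for lemma in total}


def predict_fillpred(model, lemma):
    """Return 'Y' if the lemma is predicted to be a predicate, else '_'."""
    # Lemmas never seen in training default to "no".
    return "Y" if model.get(lemma, False) else "_"


# Hypothetical toy data, not taken from ssj500k.
toy = [
    ("biti", True), ("biti", True), ("biti", False),
    ("hiša", False), ("iti", True),
]
model = train_fillpred(toy)
print([predict_fillpred(model, lemma) for lemma in ["biti", "hiša", "iti", "miza"]])
```

A lookup table like this ignores context entirely; it is meant to show the shape of the task, not a competitive approach.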
## Usage
```bash
$ cd ./dockerfiles/python-java
$ make
# you should be inside a container now
$ cd ./cjvt-srl-tagging
$ make
```
To run it on a server overnight, you might want to use `nohup`, so you can close the SSH connection without killing the process.
```bash
# 2>&1 sends error output to the log as well
$ nohup make > tagging.log 2>&1 &
```
## Makefile
The Makefile runs the following steps:
1. Create a fillpred model.
2. Parse `.xml` files and create `.tsv` files.
3. Run *mate-tools srl-tagger* on the created `.tsv` files.
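
To make step 2 more concrete, here is a rough sketch of writing one sentence as tab-separated lines with the first 14 columns of the CoNLL 2009 format (see the format link under Sources). Whether the repository's `.tsv` files use exactly this layout is an assumption, as are the helper names and the toy sentence with its tag values.

```python
# First 14 columns of the CoNLL 2009 format; the per-predicate
# argument columns (APRED1, APRED2, ...) would follow after PRED.
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def to_conll09_row(token):
    """Format one token dict as a tab-separated line; missing fields become '_'."""
    return "\t".join(str(token.get(col, "_")) for col in CONLL09_COLUMNS)


def sentence_to_tsv(tokens):
    """One line per token, with a blank line terminating the sentence."""
    return "\n".join(to_conll09_row(t) for t in tokens) + "\n\n"


# Hypothetical two-token sentence; the tag values are made up for illustration.
sentence = [
    {"ID": 1, "FORM": "Pes", "LEMMA": "pes", "POS": "S", "HEAD": 2, "DEPREL": "Sb"},
    {"ID": 2, "FORM": "laja", "LEMMA": "lajati", "POS": "G", "HEAD": 0,
     "DEPREL": "Root", "FILLPRED": "Y"},
]
print(sentence_to_tsv(sentence), end="")
```

Filling absent fields with `_` keeps every row at the same column count, which is what column-based readers generally expect.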
## Sources
* [1] (mate-tools) https://code.google.com/archive/p/mate-tools/
* [2] (benchmarking) https://github.com/clarinsi/bilateral-srl
* [3] (CoNLL 2008 paper) http://www.aclweb.org/anthology/W08-2121.pdf
* [4] (CoNLL 2009 format) https://wiki.ufal.ms.mff.cuni.cz/format-conll