# Instructions
To mine ssj500k, check out the `ssj500k` branch.
For the running order, look at the Makefile. Generally it works like this:
- `tools/parse_all.py` - creates the mate file that is necessary for running the Java-based `srl.jar`
- `tools/srl-20131216/tag_all.sh` - tags ssj500k
- `tools/gen_json.py` - mines SRL to JSON
- `tools/gen_tei.py` - mines SRL to TEI

# cjvt-srl-tagging
We'll be using mate-tools to perform SRL on Kres.

## workspace
The tools require Java.
Go to `./dockerfiles/python-java/` and run `make`. You should get a Docker environment with this repo mounted.

## mate-tools
Check out `./tools/srl-20131216/README.md`.

## Scripts
Check all possible XML tags (that occur after the tag).
``` bash
cat F0006347.xml.parsed.xml | grep -A 999999999999 -e '' | grep -o -e '<[^" "]*' | sort | uniq
```

## Tools
* Parser for reading both `SSJ500k 2.1 TEI xml` and `Kres F....xml.parsed.xml` files, found in `./tools/parser/parser.py`.
* `fillpred_model` for creating a yes/no model for predicting the predicate (based on ssj500k data).

## Usage
```bash
$ cd ./dockerfiles/python-java
$ make
# you should be inside a container now
$ cd ./cjvt-srl-tagging
$ make
```

If you want to run it on a server overnight, you might want to use `nohup`, so that you can close the ssh connection without killing the process.
```
$ nohup make &
```
Follow the progress in the generated logfile (check the git root).

# Makefile
The Makefile follows these steps:
1. Create a fillpred model.
2. Parse `.xml` files and create `.tsv` files.
3. Run *mate-tools srl-tagger* on the created `.tsv` files.

## Sources
* [1] (mate-tools) https://code.google.com/archive/p/mate-tools/
* [2] (benchmarking) https://github.com/clarinsi/bilateral-srl
* [3] (conll 2008 paper) http://www.aclweb.org/anthology/W08-2121.pdf
* [4] (format CoNLL 2009) https://wiki.ufal.ms.mff.cuni.cz/format-conll
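
The tag-listing one-liner from the Scripts section can also be approximated in Python. The following is a minimal sketch, assuming the input is well-formed XML. The function name `distinct_tags` is hypothetical and not part of this repo. Unlike the `grep` pipeline, it reports element names only, so closing tags and the `<?xml` declaration are not listed.

```python
import io
import xml.etree.ElementTree as ET


def distinct_tags(source):
    """Return the sorted, de-duplicated element names in an XML file.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration, not part of this repo.
    """
    tags = set()
    # iterparse reads the document incrementally instead of in one go.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Namespaced names look like "{uri}local"; keep only the local part.
        tags.add(elem.tag.rsplit("}", 1)[-1])
        # Drop the text and children of finished elements to limit memory use.
        elem.clear()
    return sorted(tags)


if __name__ == "__main__":
    # Tiny in-memory sample; pass a path such as a parsed Kres file instead.
    sample = io.BytesIO(b"<TEI><text><body><p><s><w>a</w></s></p></body></text></TEI>")
    print(distinct_tags(sample))
```

To use it on real data, pass the path of a parsed file (for example `F0006347.xml.parsed.xml`) instead of the in-memory sample.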