# mate-tools

Using the **Full srl pipeline (including anna-3.3)** package from the Downloads section of the mate-tools project page [1].

Benchmarking the tool for Slovenian (slo) and Croatian (hr): see [2] (a submodule of this repo).

The mate-tools SRL tagger lives in `./tools/srl-20131216/`.

## train

Create the `model-file`. `--help` output:

```bash
$ java -cp srl.jar se.lth.cs.srl.Learn --help
Not enough arguments, aborting.
Usage:
 java -cp <classpath> se.lth.cs.srl.Learn <lang> <input-corpus> <model-file> [options]

 Example:
  java -cp srl.jar:lib/liblinear-1.51-with-deps.jar se.lth.cs.srl.Learn eng ~/corpora/eng/CoNLL2009-ST-English-train.txt eng-srl.mdl -reranker -fdir ~/features/eng -llbinary ~/liblinear-1.6/train
  trains a complete pipeline and reranker based on the corpus and saves it to eng-srl.mdl

 <lang> corresponds to the language and is one of chi, eng, ger

 Options:
  -aibeam                 the size of the ai-beam for the reranker
  -acbeam                 the size of the ac-beam for the reranker
  -help                   prints this message

 Learning-specific options:
  -fdir                   the directory with feature files (see below)
  -reranker               trains a reranker also (not done by default)
  -llbinary               a reference to a precompiled version of liblinear, makes training much faster than the java version.
  -partitions             number of partitions used for the reranker
  -dontInsertGold         don't insert the gold standard proposition during training of the reranker.
  -skipUnknownPredicates  skips predicates not matching any POS-tags from the feature files.
  -dontDeleteTrainData    doesn't delete the temporary files from training on exit. (For debug purposes)
  -ndPipeline             Causes the training data and feature mappings to be derived in a non-deterministic way. I.e. training the pipeline on the same corpus twice does not yield the exact same models. This is however slightly faster.

 The feature file dir needs to contain four files with feature sets. See the website for further documentation.
 The files are called pi.feats, pd.feats, ai.feats, and ac.feats. All need to be in the feature file dir, otherwise you will get an error.
```

Inputs: `lang`, `input-corpus`; the result is the trained `model-file`.

## parse

`--help` output:

```bash
$ java -cp srl.jar se.lth.cs.srl.Parse --help
Not enough arguments, aborting.
Usage:
 java -cp <classpath> se.lth.cs.srl.Parse <lang> <input-corpus> <model-file> [options] <output>

 Example:
  java -cp srl.jar:lib/liblinear-1.51-with-deps.jar se.lth.cs.srl.Parse eng ~/corpora/eng/CoNLL2009-ST-English-evaluation-SRLonly.txt eng-srl.mdl -reranker -nopi -alfa 1.0 eng-eval.out
  parses the input corpus using the model eng-srl.mdl and saves it to eng-eval.out, using a reranker and skipping the predicate identification step

 <lang> corresponds to the language and is one of chi, eng, ger

 Options:
  -aibeam      the size of the ai-beam for the reranker
  -acbeam      the size of the ac-beam for the reranker
  -help        prints this message

 Parsing-specific options:
  -nopi        skips the predicate identification. This is equivalent to the setting in the CoNLL 2009 ST.
  -reranker    uses a reranker (assumed to be included in the model)
  -alfa        the alfa used by the reranker. (default 1.0)
```

We need to provide `lang` (`ger` for the German feature functions?), `input-corpus` and `model` (see train above), plus an output file. A sketched end-to-end run on the `sl` data is given at the end of this README.

## input data

* `ssj500k` data is in `./bilateral-srl/data/sl/sl.{test,train}`; the same data formatted for mate-tools is in `./bilateral-srl/tools/mate-tools/sl.{test,train}.mate` (line counts match).

## Sources

* [1] (mate-tools) https://code.google.com/archive/p/mate-tools/
* [2] (benchmarking) https://github.com/clarinsi/bilateral-srl
* [3] (CoNLL 2008 paper) http://www.aclweb.org/anthology/W08-2121.pdf
* [4] (CoNLL 2009 format) https://wiki.ufal.ms.mff.cuni.cz/format-conll
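
## example run (sl)

A minimal sketch of pushing the `sl` data through `Learn` and `Parse`, assuming the jar layout under `./tools/srl-20131216/` matches the `--help` examples above. The model name `sl-srl.mdl`, the feature directory `features/sl`, the output name `sl.test.out`, and the choice of `ger` as the language argument (see the open question in the parse section) are assumptions, not verified settings from this repo.

```bash
# Sketch only: paths, model/output names and the `ger` language argument are assumptions.
CP=tools/srl-20131216/srl.jar:tools/srl-20131216/lib/liblinear-1.51-with-deps.jar

# 1) train a model on the mate-formatted ssj500k training split
java -cp "$CP" se.lth.cs.srl.Learn \
  ger bilateral-srl/tools/mate-tools/sl.train.mate sl-srl.mdl \
  -reranker -fdir features/sl

# 2) parse the test split with the trained model, skipping predicate identification
java -cp "$CP" se.lth.cs.srl.Parse \
  ger bilateral-srl/tools/mate-tools/sl.test.mate sl-srl.mdl \
  -reranker -nopi -alfa 1.0 sl.test.out
```

Both commands follow the argument order from the `--help` output above; `-llbinary /path/to/liblinear/train` can be added to the `Learn` call if a native liblinear binary is available.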