voje a6cee3d459 migrated to cjvt-gitea 2 years ago
featuresets added mate-tools srl parser 3 years ago
lib added mate-tools srl parser 3 years ago
scripts srl taggin pipeline (output in .tsv) 2 years ago
LICENSE added mate-tools srl parser 3 years ago
README.md parser.py can read kres and/or ssj500k 3 years ago
ger-eval.out srl taggin pipeline (output in .tsv) 2 years ago
ger-eval.out.tmp finished parse + tag toolchain -> TODO: tagger error 2 years ago
srl-ger.model asdf 2 years ago
srl-src.jar added mate-tools srl parser 3 years ago
srl.jar added mate-tools srl parser 3 years ago
tag_all.sh added tools.cfg for configurable paths 2 years ago

README.md

mate-tools

This setup uses the full SRL pipeline (including anna-3.3) from the Downloads section. The tool is benchmarked for Slovenian (slo) and Croatian (hr) in [2] (a submodule of this repo).

The mate-tools SRL tagger can be found in ./tools/srl-20131216/.

train

Create the model-file:

--help output:

$ java -cp srl.jar se.lth.cs.srl.Learn --help
Not enough arguments, aborting.
Usage:
 java -cp <classpath> se.lth.cs.srl.Learn <lang> <input-corpus> <model-file> [options]

Example:
 java -cp srl.jar:lib/liblinear-1.51-with-deps.jar se.lth.cs.srl.Learn eng ~/corpora/eng/CoNLL2009-ST-English-train.txt eng-srl.mdl -reranker -fdir ~/features/eng -llbinary ~/liblinear-1.6/train

 trains a complete pipeline and reranker based on the corpus and saves it to eng-srl.mdl

<lang> corresponds to the language and is one of
 chi, eng, ger

Options:
 -aibeam <int>     the size of the ai-beam for the reranker
 -acbeam <int>     the size of the ac-beam for the reranker
 -help             prints this message

Learning-specific options:
 -fdir <dir>             the directory with feature files (see below)
 -reranker               trains a reranker also (not done by default)
 -llbinary <file>        a reference to a precompiled version of liblinear,
                         makes training much faster than the java version.
 -partitions <int>       number of partitions used for the reranker
 -dontInsertGold         don't insert the gold standard proposition during
                         training of the reranker.
 -skipUnknownPredicates  skips predicates not matching any POS-tags from
                         the feature files.
 -dontDeleteTrainData    doesn't delete the temporary files from training
                         on exit. (For debug purposes)
 -ndPipeline             Causes the training data and feature mappings to be
                         derived in a non-deterministic way. I.e. training the pipeline
                         on the same corpus twice does not yield the exact same models.
                         This is however slightly faster.

The feature file dir needs to contain four files with feature sets. See
the website for further documentation. The files are called
pi.feats, pd.feats, ai.feats, and ac.feats
All need to be in the feature file dir, otherwise you will get an error.
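The Learn help text requires all four feature files to be present in the -fdir directory, and a missing one produces an error. A quick pre-flight check can avoid a failed run. Below is a minimal POSIX shell sketch; check_fdir is a hypothetical helper, not part of mate-tools, and it only verifies that the files exist, not that they contain valid feature sets.

```sh
# Hypothetical helper (not part of mate-tools): check that a feature-file
# directory passed via -fdir contains all four feature-set files.
# Returns 0 if all files exist, 1 otherwise; missing files are listed on stderr.
check_fdir() {
    local dir="$1"
    local rc=0
    for f in pi.feats pd.feats ai.feats ac.feats; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing feature file: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Placing `check_fdir ~/features/eng &&` in front of the Learn example above starts training only when the check passes.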

Inputs: lang and input-corpus; the trained model is written to model-file.
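The Learn arguments follow a fixed order: language, training corpus, then the model file to write, with options last. The wrapper below is a hedged sketch that fails fast on an unsupported language code or a missing corpus before starting the JVM. It assumes srl.jar and lib/liblinear-1.51-with-deps.jar (the jar named in the help example) are reachable from the current directory; srl_learn and the SRL_DRY_RUN switch are hypothetical names.

```sh
# Hypothetical wrapper around se.lth.cs.srl.Learn (not part of mate-tools).
# Usage: srl_learn <lang> <input-corpus> <model-file> [options...]
# Assumption: srl.jar and lib/liblinear-1.51-with-deps.jar (the jar named in
# the --help example) are reachable from the current directory.
# Set SRL_DRY_RUN=1 to print the command instead of running it.
srl_learn() {
    if [ "$#" -lt 3 ]; then
        echo "usage: srl_learn <lang> <input-corpus> <model-file> [options]" >&2
        return 2
    fi
    case "$1" in
        chi|eng|ger) ;;  # the only languages listed by --help
        *) echo "unsupported lang: $1 (expected chi, eng or ger)" >&2; return 2 ;;
    esac
    if [ ! -f "$2" ]; then
        echo "input corpus not found: $2" >&2
        return 2
    fi
    local classpath="srl.jar:lib/liblinear-1.51-with-deps.jar"
    if [ "${SRL_DRY_RUN:-0}" = 1 ]; then
        # Display only: arguments are joined with spaces, not shell-quoted.
        printf '%s\n' "java -cp $classpath se.lth.cs.srl.Learn $*"
    else
        java -cp "$classpath" se.lth.cs.srl.Learn "$@"
    fi
}
```

With a hypothetical corpus file, `srl_learn ger train.conll ger-srl.mdl -reranker` trains a model using the German feature functions; prefixing it with `SRL_DRY_RUN=1` only prints the command, which is handy for checking the argument order.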

parse

--help output:

$ java -cp srl.jar se.lth.cs.srl.Parse --help
Not enough arguments, aborting.
Usage:
 java -cp <classpath> se.lth.cs.srl.Parse <lang> <input-corpus> <model-file> [options] <output>

Example:
 java -cp srl.jar:lib/liblinear-1.51-with-deps.jar se.lth.cs.srl.Parse eng ~/corpora/eng/CoNLL2009-ST-English-evaluation-SRLonly.txt eng-srl.mdl -reranker -nopi -alfa 1.0 eng-eval.out

 parses in the input corpus using the model eng-srl.mdl and saves it to eng-eval.out, using a reranker and skipping the predicate identification step

<lang> corresponds to the language and is one of
 chi, eng, ger

Options:
 -aibeam <int>     the size of the ai-beam for the reranker
 -acbeam <int>     the size of the ac-beam for the reranker
 -help             prints this message

Parsing-specific options:
 -nopi           skips the predicate identification. This is equivalent to the
                 setting in the CoNLL 2009 ST.
 -reranker       uses a reranker (assumed to be included in the model)
 -alfa <double>  the alfa used by the reranker. (default 1.0)

Parse needs lang (ger, presumably so that the German feature functions are used), input-corpus, and the model file (see train).
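Unlike Learn, Parse expects <output> after the options. A minimal sketch of a wrapper that takes the output path as its fourth argument and moves it to the end, so the ordering rule lives in one place; srl_parse and SRL_DRY_RUN are hypothetical names, and the classpath is copied from the help example above.

```sh
# Hypothetical wrapper around se.lth.cs.srl.Parse (not part of mate-tools).
# Usage: srl_parse <lang> <input-corpus> <model-file> <output> [options...]
# Parse expects <output> AFTER the options, so the wrapper moves it to the end.
# The classpath is copied from the --help example (an assumption about layout).
# Set SRL_DRY_RUN=1 to print the command instead of running it.
srl_parse() {
    if [ "$#" -lt 4 ]; then
        echo "usage: srl_parse <lang> <input-corpus> <model-file> <output> [options]" >&2
        return 2
    fi
    local lang="$1" input="$2" model="$3" output="$4"
    shift 4
    if [ ! -f "$model" ]; then
        echo "model file not found (train one with Learn first): $model" >&2
        return 2
    fi
    # Rebuild the argument list in the order Parse expects:
    # <lang> <input-corpus> <model-file> [options] <output>
    set -- "$lang" "$input" "$model" "$@" "$output"
    local classpath="srl.jar:lib/liblinear-1.51-with-deps.jar"
    if [ "${SRL_DRY_RUN:-0}" = 1 ]; then
        # Display only: arguments are joined with spaces, not shell-quoted.
        printf '%s\n' "java -cp $classpath se.lth.cs.srl.Parse $*"
    else
        java -cp "$classpath" se.lth.cs.srl.Parse "$@"
    fi
}
```

For instance, `srl_parse ger <input-corpus> srl-ger.model ger-eval.out` would parse with the srl-ger.model file from this repository, assuming that file is a model produced by Learn.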

input data:

  • ssj500k data is in ./bilateral-srl/data/sl/sl.{test,train}; the copies formatted for mate-tools are in ./bilateral-srl/tools/mate-tools/sl.{test,train}.mate (line counts match).
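The line-count match can be re-checked whenever the .mate files are regenerated. A minimal sketch, assuming the paths above are unchanged; same_line_count is a hypothetical helper, and equal line counts are a necessary sanity check, not proof that the conversion is correct.

```sh
# Hypothetical helper: succeed when two files have the same number of lines,
# the sanity check noted for the sl.{test,train} -> sl.{test,train}.mate conversion.
same_line_count() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "file not found: $1 or $2" >&2
        return 2
    fi
    local a b
    # tr strips the padding that some wc implementations (e.g. BSD/macOS) add.
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    if [ "$a" = "$b" ]; then
        return 0
    fi
    echo "line counts differ: $1 ($a) vs $2 ($b)" >&2
    return 1
}
```

For example: `same_line_count ./bilateral-srl/data/sl/sl.train ./bilateral-srl/tools/mate-tools/sl.train.mate`, and likewise for sl.test.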

Sources