Commit Graph

99 Commits

Author SHA1 Message Date
307007218d Work to fix #757-104 and #757-89
for word_form all, now removing duplicates
for word_form msd, now word_forms from the collocation, not from whole corpus
determining more specific msd for agreements, so that it gets a better match when using backup-lemma representation
for agreements, now ordered by collocation's own number of occurrences, not global
removed a bit of debug code
2019-05-29 20:22:22 +02:00
4c2b5f2b13 Updating for lemma representation of word_form. Also cleaning code, adding tqdm,... 2019-05-24 18:15:21 +02:00
3c669c7901 looking for agreements from the whole corpus 2019-05-23 08:13:29 +02:00
e99ba59908 lemma/msd representations now global! Need to also use for agreements 2019-05-22 11:55:51 +02:00
d14efff709 Intermediate UGLY CODE commit. Working more on representations 2019-05-22 11:22:07 +02:00
dce55d04a3 Does not yet work, agreements in representation 2019-05-20 18:14:11 +02:00
5bd0b4a064 correct representation when rep_failed 2019-05-17 20:45:39 +02:00
111512a901 no more structureselection enum 2019-05-17 20:45:10 +02:00
d2f1e95a8f continued work on representation, almost there... 2019-05-16 01:53:38 +02:00
84a184c44d I think this is the way to set representations, all info is available
... just have to actually use it
2019-05-13 10:48:21 +02:00
6eefd9c9f6 redid representation storage (as prev commit: to make it easier to use)
find_next does not collect representations, no separate
class to parse representation features,
2019-05-13 09:52:29 +02:00
19067e4135 Moving matches into colocation ids, now easier for representation 2019-05-13 08:35:55 +02:00
87712128be joint representation form 2019-05-13 00:26:00 +02:00
401698409e Implementing new output formats, all and normal, no more lemma_only and stuff
Still need to implement representation in normal form.
2019-05-12 23:00:38 +02:00
b4b93022fe Updating for new representations, for now only parsing 2019-05-12 22:13:22 +02:00
de6c73980e adding min-frequency option 2019-02-19 15:04:44 +01:00
93d7af3aea Reversed order sorting 2019-02-19 13:56:32 +01:00
1c9ac7c867 Adding sorting 2019-02-19 11:29:40 +01:00
8107a9f647 Adding parallel execution using subprocesses 2019-02-17 16:01:03 +01:00
dec173ae33 Restructuring, now words are parsed right after loading one file, not after loading all of them. Should be easily parallelizable now 2019-02-14 14:33:15 +01:00
2f2bb91d0f Supporting different xml:id variations 2019-02-12 17:38:32 +01:00
31483c79ff count-files for more verbose output added 2019-02-12 12:19:21 +01:00
2d373ab477 Adding changeable pc tag (when it is c and not pc) 2019-02-12 12:08:30 +01:00
c1e85255c7 msd of <pc> now always N 2019-02-12 11:59:51 +01:00
40db51adf1 msd translate now optional 2019-02-12 11:58:04 +01:00
f89212f7c9 Parsing files as they come instead of parsing all at once.
Thus removed temporary load/save stuff
2019-02-12 11:41:35 +01:00
25f3918170 Loading/Saving to temporary file 2019-02-09 13:40:57 +01:00
518fe5e113 Multiple input files support 2019-02-09 13:25:26 +01:00
b4e73e2d60 Implemented multiple output option 2019-02-07 10:19:36 +01:00
8b47e2b317 lemma_only bug fixed and skip-check-id instead of check-id (opt out). 2019-02-06 15:46:02 +01:00
5f7b5f969c Check root ids is now skipped by default. 2019-02-06 15:33:33 +01:00
27a60c439b Working: using all new stuff 2019-02-06 15:29:37 +01:00
916269e710 NW: Writer class implemented 2019-02-06 15:29:19 +01:00
1298a45d0f NW: ColocationIds class implemented 2019-02-06 15:29:03 +01:00
5b75d6e4fa Using argparse 2019-02-06 15:28:39 +01:00
3dc69158b9 NW: switching print for logging 2019-02-06 15:26:09 +01:00
f8103990a8 Link order added. 2019-02-04 11:01:30 +01:00
4dc87ce953 Handling empty 2019-01-28 09:39:57 +01:00
a9b6681576 removed restriction on number of rules 2019-01-28 08:50:32 +01:00
6d574f674f removing pickle stuff for faster loading... 2019-01-25 18:44:41 +01:00
6a221ae8fe Fixes for msd length matching and pc matching
Also some cleanup and fix output formatting
2019-01-25 11:58:40 +01:00
cddeb9c4e4 accommodating for #773 2019-01-19 22:42:51 +01:00
106db9394e Removing getchildren() and adding root_words (don't know why yet, will remove if I don't remember) 2019-01-08 21:17:35 +01:00
aeb2770966 files input as argv 2019-01-08 21:13:36 +01:00
36d4a217f7 Catching modra links from root 2019-01-08 19:37:28 +01:00
06d4217b0b Moving from linkedlist of component to tree structure. 2018-10-30 13:33:08 +01:00
319800e0ca Links with | now parsed 2018-10-29 12:43:07 +01:00
52e6fc92c6 Two fixes, "10-1"-like structures and restriction_or 2018-10-29 12:16:42 +01:00
74a1e4834b First commit 2018-10-29 11:29:51 +01:00