96e61d2f64
Defining Formatter parent class for out/all/stats output files
2019-06-09 13:27:04 +02:00
2387bd7cb7
Stats flag
2019-06-09 10:20:29 +02:00
6a9ee516a3
EMPTY COMMIT - fixing some pylint warnings
2019-06-09 10:13:46 +02:00
9117734b91
EMPTY COMMIT - assert statement vs function call
...
and one if statement simplified and unused variable
2019-06-08 15:43:53 +02:00
46e169095c
EMPTY COMMIT - removing too long lines
2019-06-08 11:54:47 +02:00
797060f619
EMPTY COMMIT - removing trailing whitespace
2019-06-08 11:42:57 +02:00
3a22cd91c3
determining jppb (for 2 word statistics)
2019-06-08 11:31:52 +02:00
30a5e80569
determine polnopomenska-beseda components in structure (for now only type='main')
2019-06-08 11:27:51 +02:00
9ae7e1e9f6
Determine distinct matches for one colocation id.
2019-06-08 11:25:55 +02:00
2773a8b9e9
Getters for number of lemmas and number of all words
2019-06-08 11:25:00 +02:00
2167e4b6fe
Restrictions now always a list, removes/simplifies a bit of code
2019-06-08 11:23:50 +02:00
d83d619dc0
removing old __str__ and __repr__ debugging code
2019-06-08 11:19:40 +02:00
b2baedca52
determining dispersions
2019-06-08 11:18:49 +02:00
3263125898
Also need to check msd for agreements in the whole corpus.
2019-06-03 15:09:22 +02:00
44d532808d
tqdm now optional
2019-06-03 09:47:36 +02:00
08c8050f3f
Removing old logging.debug calls, makes matching stuff much faster :)
2019-06-02 14:03:29 +02:00
2c8a9f0ed0
Whitespace fixes
2019-06-02 13:51:32 +02:00
460a55cb6c
Improving representation speed ~5%
2019-06-02 13:50:53 +02:00
5f226d0cd4
fixing matching of agreements with msd
2019-06-02 12:53:16 +02:00
5b9859af3e
Removing dead code
2019-06-02 12:50:43 +02:00
44f0a6762e
Improving speed of matching ~40%
2019-06-02 12:50:04 +02:00
fe4c95939f
Removing deprecated commented out code.
2019-06-01 10:40:44 +02:00
ed83b2b9c4
implementing multiple agreements to one cid.
2019-06-01 10:36:28 +02:00
0249ef1523
Correct order for wordform any/msd rendering
...
(most frequent first)
2019-06-01 10:35:51 +02:00
119b85568f
actually not showing components without representation
2019-06-01 10:35:23 +02:00
7d1bfbf73e
wordform all only lowercase
2019-06-01 10:33:02 +02:00
ad7ba8c0b2
removing debugging/dead code
2019-06-01 10:31:29 +02:00
09bd4f55ef
mor->more typo
2019-06-01 10:30:07 +02:00
bfd4d4a747
Refactoring representations. Now muuuuch nicer code, not yet working though :)
...
Added: multiple representations per component id
2019-05-30 11:34:31 +02:00
307007218d
Work to fix #757-104 and #757-89
...
for word_form all, now removing duplicates
for word_form msd, now word_forms from the collocation, not from whole corpus
determining more specific msd for agreements, so that it gets a better match when using backup-lemma representation
for agreements, now ordered by collocation's own number of occurrences, not global
removed a bit of debug code
2019-05-29 20:22:22 +02:00
4c2b5f2b13
Updating for lemma representation of word_form. Also cleaning code, adding tqdm,...
2019-05-24 18:15:21 +02:00
3c669c7901
looking for agreements from the whole corpus
2019-05-23 08:13:29 +02:00
e99ba59908
lemma/msd representations now global! Need to also use for agreements
2019-05-22 11:55:51 +02:00
d14efff709
Intermediate UGLY CODE commit. Working more on representations
2019-05-22 11:22:07 +02:00
dce55d04a3
Does not yet work, agreements in representation
2019-05-20 18:14:11 +02:00
5bd0b4a064
correct representation when rep_failed
2019-05-17 20:45:39 +02:00
111512a901
no more structureselection enum
2019-05-17 20:45:10 +02:00
d2f1e95a8f
continued work on representation, almost there...
2019-05-16 01:53:38 +02:00
84a184c44d
I think this is the way to set representations, all info is available
...
... just have to actually use it
2019-05-13 10:48:21 +02:00
6eefd9c9f6
redid representation storage, (as prev commit: to make it easier to use)
...
find_next does not collect representations, no separate
class to parse representation features,
2019-05-13 09:52:29 +02:00
19067e4135
Moving matches into colocation ids, now easier for representation
2019-05-13 08:35:55 +02:00
87712128be
joint representation form
2019-05-13 00:26:00 +02:00
401698409e
Implementing new output formats, all and normal, no more lemma_only and stuff
...
Still need to implement representation in normal form.
2019-05-12 23:00:38 +02:00
b4b93022fe
Updating for new representations, for now only parsing
2019-05-12 22:13:22 +02:00
de6c73980e
adding min-frequency option
2019-02-19 15:04:44 +01:00
93d7af3aea
Reversed order sorting
2019-02-19 13:56:32 +01:00
1c9ac7c867
Adding sorting
2019-02-19 11:29:40 +01:00
8107a9f647
Adding parallel execution using subprocesses
2019-02-17 16:01:03 +01:00
dec173ae33
Restructuring, now words are parsed right after loading one file, not after loading all of them. Should be easily parallelizable now
2019-02-14 14:33:15 +01:00
2f2bb91d0f
Supporting different xml:id variations
2019-02-12 17:38:32 +01:00