Commit Graph

145 Commits

SHA1 Message Date
9ccbd02603 Implementing the rest of stats. Maybe ok? 2019-06-10 00:25:36 +02:00
d7f97ba9b3 implementing but commenting out distinct_2w_forms 2019-06-10 00:25:14 +02:00
ca0d6f0f55 num_words now proper dict 2019-06-10 00:24:47 +02:00
865351b3f6 Turns out previous commit was OK. Proceeding with stats work 2019-06-09 23:00:19 +02:00
c6440162b8 NOT WORKING inbetween commit 2019-06-09 22:25:58 +02:00
dff9643edf Simplifying main writing stuff 2019-06-09 13:36:31 +02:00
89f35f5259 handling writers for when we don't need outputs (no --all for example) 2019-06-09 13:36:07 +02:00
5929004c44 now using new formatters, simplifies the code nicely 2019-06-09 13:35:34 +02:00
111b088c6c defining formatter for --output 2019-06-09 13:33:03 +02:00
2a437b1703 Defining writer for --all 2019-06-09 13:32:10 +02:00
96e61d2f64 Defining Formatter parent class for out/all/stats output files 2019-06-09 13:27:04 +02:00
2387bd7cb7 Stats flag 2019-06-09 10:20:29 +02:00
6a9ee516a3 EMPTY COMMIT - fixing some pylint warnings 2019-06-09 10:13:46 +02:00
9117734b91 EMPTY COMMIT - assert statement vs function call
and one if statement simplified and unused variable
2019-06-08 15:43:53 +02:00
46e169095c EMPTY COMMIT - removing too long lines 2019-06-08 11:54:47 +02:00
797060f619 EMPTY COMMIT - removing trailing whitespace 2019-06-08 11:42:57 +02:00
3a22cd91c3 determining jppb (for 2 word statistics) 2019-06-08 11:31:52 +02:00
30a5e80569 determine polnopomenska-beseda components in structure (for now only type='main') 2019-06-08 11:27:51 +02:00
9ae7e1e9f6 Determine distinct matches for one collocation id. 2019-06-08 11:25:55 +02:00
2773a8b9e9 Getters for number of lemmas and number of all words 2019-06-08 11:25:00 +02:00
2167e4b6fe Restrictions now always a list, removes/simplifies a bit of code 2019-06-08 11:23:50 +02:00
d83d619dc0 removing old __str__ and __repr__ debugging code 2019-06-08 11:19:40 +02:00
b2baedca52 determining dispersions 2019-06-08 11:18:49 +02:00
57c0ff6f85 Removing prints from slimmer 2019-06-08 10:20:53 +02:00
3263125898 Also need to check msd for agreements in the whole corpus. 2019-06-03 15:09:22 +02:00
44d532808d tqdm now optional 2019-06-03 09:47:36 +02:00
ed27e549b7 Adding slimming script 2019-06-03 09:37:48 +02:00
08c8050f3f Removing old logging.debug calls, makes matching stuff much faster :) 2019-06-02 14:03:29 +02:00
2c8a9f0ed0 Whitespace fixes 2019-06-02 13:51:32 +02:00
460a55cb6c Improving representation speed ~5% 2019-06-02 13:50:53 +02:00
5f226d0cd4 fixing matching of agreements with msd 2019-06-02 12:53:16 +02:00
5b9859af3e Removing dead code 2019-06-02 12:50:43 +02:00
44f0a6762e Improving speed of matching ~40% 2019-06-02 12:50:04 +02:00
fe4c95939f Removing deprecated commented out code. 2019-06-01 10:40:44 +02:00
ed83b2b9c4 implementing multiple agreements to one cid. 2019-06-01 10:36:28 +02:00
0249ef1523 Correct order for wordform any/msd rendering
(most frequent first)
2019-06-01 10:35:51 +02:00
119b85568f actually not showing components without representation 2019-06-01 10:35:23 +02:00
7d1bfbf73e wordform all only lowercase 2019-06-01 10:33:02 +02:00
ad7ba8c0b2 removing debugging/dead code 2019-06-01 10:31:29 +02:00
09bd4f55ef mor->more typo 2019-06-01 10:30:07 +02:00
bfd4d4a747 Refactoring representations. Now muuuuch nicer code, not yet working though :)
Added: multiple representations per component id
2019-05-30 11:34:31 +02:00
307007218d Work to fix #757-104 and #757-89
for word_form all, now removing duplicates
for word_form msd, now word_forms from the collocation, not from whole corpus
determining more specific msd for agreements, so that it gets better match when using backup-lemma representation
for agreements, now ordered by collocation's own number of occurrences, not global
removed a bit of debug code
2019-05-29 20:22:22 +02:00
4c2b5f2b13 Updating for lemma representation of word_form. Also cleaning code, adding tqdm,... 2019-05-24 18:15:21 +02:00
3c669c7901 looking for agreements from the whole corpus 2019-05-23 08:13:29 +02:00
e99ba59908 lemma/msd representations now global! Need to also use for agreements 2019-05-22 11:55:51 +02:00
d14efff709 Intermediate UGLY CODE commit. Working more on representations 2019-05-22 11:22:07 +02:00
dce55d04a3 Does not yet work, agreements in representation 2019-05-20 18:14:11 +02:00
5bd0b4a064 correct representation when rep_failed 2019-05-17 20:45:39 +02:00
111512a901 no more structureselection enum 2019-05-17 20:45:10 +02:00
d2f1e95a8f continued work on representation, almost there... 2019-05-16 01:53:38 +02:00