c0939fbbd4
fixed performance bug for representations
...
No more creating millions of namedtuple classes. Works about 15x faster
2019-06-11 10:26:10 +02:00
3be4118dc0
Refactoring lexis/morphology matchers, now "picklable".
2019-06-11 10:02:24 +02:00
ad0f9b0956
Fixing logdice all stat (and mini refactoring)
2019-06-11 09:22:25 +02:00
d30f8c1980
Dynamically calculated max num components
2019-06-10 14:05:40 +02:00
c0a22a4ef3
float formatting for stats
2019-06-10 11:05:46 +02:00
bf0ed35e00
removing old unused commented out code
2019-06-10 10:54:01 +02:00
68c22d4e27
deprecating output to stdout
2019-06-10 10:52:00 +02:00
b819d9953f
using new formatters via --out and --out-no-stat
2019-06-10 10:50:51 +02:00
432dc87a5f
new outformatter, old is now outnostatformatter
2019-06-10 10:49:53 +02:00
cb53a9c7b3
moving delta_p12/21 to the end of stats formatter
2019-06-10 10:25:42 +02:00
9ccbd02603
Implementing the rest of stats. Maybe ok?
2019-06-10 00:25:36 +02:00
d7f97ba9b3
implementing but commenting out distinct_2w_forms
2019-06-10 00:25:14 +02:00
ca0d6f0f55
num_words now proper dict
2019-06-10 00:24:47 +02:00
865351b3f6
Turns out previous commit was OK. Proceeding with stats work
2019-06-09 23:00:19 +02:00
c6440162b8
NOT WORKING in-between commit
2019-06-09 22:25:58 +02:00
dff9643edf
Simplifying main writing stuff
2019-06-09 13:36:31 +02:00
89f35f5259
handling writers for when we don't need outputs (no --all for example)
2019-06-09 13:36:07 +02:00
5929004c44
now using new formatters, simplifies the code nicely
2019-06-09 13:35:34 +02:00
111b088c6c
defining formatter for --output
2019-06-09 13:33:03 +02:00
2a437b1703
Defining writer for --all
2019-06-09 13:32:10 +02:00
96e61d2f64
Defining Formatter parent class for out/all/stats output files
2019-06-09 13:27:04 +02:00
2387bd7cb7
Stats flag
2019-06-09 10:20:29 +02:00
6a9ee516a3
EMPTY COMMIT - fixing some pylint warnings
2019-06-09 10:13:46 +02:00
9117734b91
EMPTY COMMIT - assert statement vs function call
...
and one if statement simplified and unused variable
2019-06-08 15:43:53 +02:00
46e169095c
EMPTY COMMIT - removing too long lines
2019-06-08 11:54:47 +02:00
797060f619
EMPTY COMMIT - removing trailing whitespace
2019-06-08 11:42:57 +02:00
3a22cd91c3
determining jppb (for 2 word statistics)
2019-06-08 11:31:52 +02:00
30a5e80569
determine polnopomenska-beseda components in structure (for now only type='main')
2019-06-08 11:27:51 +02:00
9ae7e1e9f6
Determine distinct matches for one collocation id.
2019-06-08 11:25:55 +02:00
2773a8b9e9
Getters for number of lemmas and number of all words
2019-06-08 11:25:00 +02:00
2167e4b6fe
Restrictions now always a list, removes/simplifies a bit of code
2019-06-08 11:23:50 +02:00
d83d619dc0
removing old __str__ and __repr__ debugging code
2019-06-08 11:19:40 +02:00
b2baedca52
determining dispersions
2019-06-08 11:18:49 +02:00
3263125898
Also need to check msd for agreements in the whole corpus.
2019-06-03 15:09:22 +02:00
44d532808d
tqdm now optional
2019-06-03 09:47:36 +02:00
08c8050f3f
Removing old logging.debug calls, makes matching stuff much faster :)
2019-06-02 14:03:29 +02:00
2c8a9f0ed0
Whitespace fixes
2019-06-02 13:51:32 +02:00
460a55cb6c
Improving representation speed ~5%
2019-06-02 13:50:53 +02:00
5f226d0cd4
fixing matching of agreements with msd
2019-06-02 12:53:16 +02:00
5b9859af3e
Removing dead code
2019-06-02 12:50:43 +02:00
44f0a6762e
Improving speed of matching ~40%
2019-06-02 12:50:04 +02:00
fe4c95939f
Removing deprecated commented out code.
2019-06-01 10:40:44 +02:00
ed83b2b9c4
implementing multiple agreements to one cid.
2019-06-01 10:36:28 +02:00
0249ef1523
Correct order for wordform any/msd rendering
...
(most frequent first)
2019-06-01 10:35:51 +02:00
119b85568f
actually not showing components without representation
2019-06-01 10:35:23 +02:00
7d1bfbf73e
wordform all only lowercase
2019-06-01 10:33:02 +02:00
ad7ba8c0b2
removing debugging/dead code
2019-06-01 10:31:29 +02:00
09bd4f55ef
mor->more typo
2019-06-01 10:30:07 +02:00
bfd4d4a747
Refactoring representations. Now muuuuch nicer code, not yet working though :)
...
Added: multiple representations per component id
2019-05-30 11:34:31 +02:00
307007218d
Work to fix #757-104 and #757-89
...
for word_form all, now removing duplicates
for word_form msd, now word_forms from the collocation, not from whole corpus
determining more specific msd for agreements, so that it gets better match when using backup-lemma representation
for agreements, now ordered by collocation's own number of occurrences, not global
removed a bit of debug code
2019-05-29 20:22:22 +02:00