114 Commits

Author SHA1 Message Date
ozbolt 51cf3e7064 Improving debugging output 2019-06-16 01:32:31 +02:00
ozbolt dc285ce265 Saving memory in word-stats 2019-06-16 01:31:40 +02:00
ozbolt 37acabc076 able to load pickled structures 2019-06-16 01:31:14 +02:00
ozbolt f0109771aa chunk size now handled in file-sentence-generator 2019-06-16 00:59:44 +02:00
ozbolt 0d8aeb2282 load_files now returns a generator of sentences, not a generator of the whole file 2019-06-15 22:30:43 +02:00
    This makes it much slower, but more adaptable for huge files.
ozbolt a8183cf507 word stats now collected more memory-efficiently 2019-06-15 22:20:20 +02:00
ozbolt 90dbbca5d5 HUGE refactor, creating lots of modules, no code changes though! 2019-06-15 18:55:35 +02:00
ozbolt 43c6c9151b Simplifying and also improving the speed (fewer regex comparisons!) 2019-06-15 13:10:23 +02:00
ozbolt 09bdd0fe3f Adding gitignore 2019-06-15 12:53:16 +02:00
ozbolt c0939fbbd4 fixed performance bug for representations 2019-06-11 10:26:10 +02:00
    No more creating millions of namedtuple classes. Works about 15x faster
ozbolt 3be4118dc0 Refactoring lexis/morphology matchers, now "picklable". 2019-06-11 10:02:24 +02:00
ozbolt ad0f9b0956 Fixing logdice all stat (and mini refactoring) 2019-06-11 09:22:25 +02:00
ozbolt d30f8c1980 Dynamically calculated max num components 2019-06-10 14:05:40 +02:00
ozbolt c0a22a4ef3 float formatting for stats 2019-06-10 11:05:46 +02:00
ozbolt bf0ed35e00 removing old unused commented out code 2019-06-10 10:54:01 +02:00
ozbolt 68c22d4e27 deprecating output to stdout 2019-06-10 10:52:00 +02:00
ozbolt b819d9953f using new formatters via --out and --out-no-stat 2019-06-10 10:50:51 +02:00
ozbolt 432dc87a5f new outformatter, old is now outnostatformatter 2019-06-10 10:49:53 +02:00
ozbolt cb53a9c7b3 moving delta_p12/21 to the end of stats formatter 2019-06-10 10:25:42 +02:00
ozbolt 9ccbd02603 Implementing the rest of stats. Maybe ok? 2019-06-10 00:25:36 +02:00
ozbolt d7f97ba9b3 implementing but commenting out distinct_2w_forms 2019-06-10 00:25:14 +02:00
ozbolt ca0d6f0f55 num_words now proper dict 2019-06-10 00:24:47 +02:00
ozbolt 865351b3f6 Turns out previous commit was OK. Proceeding with stats work 2019-06-09 23:00:19 +02:00
ozbolt c6440162b8 NOT WORKING inbetween commit 2019-06-09 22:25:58 +02:00
ozbolt dff9643edf Simplifying main writing stuff 2019-06-09 13:36:31 +02:00
ozbolt 89f35f5259 handling writers for when we don't need outputs (no --all for example) 2019-06-09 13:36:07 +02:00
ozbolt 5929004c44 now using new formatters, simplifies the code nicely 2019-06-09 13:35:34 +02:00
ozbolt 111b088c6c defining formatter for --output 2019-06-09 13:33:03 +02:00
ozbolt 2a437b1703 Defining writer for --all 2019-06-09 13:32:10 +02:00
ozbolt 96e61d2f64 Defining Formatter parent class for out/all/stats output files 2019-06-09 13:27:04 +02:00
ozbolt 2387bd7cb7 Stats flag 2019-06-09 10:20:29 +02:00
ozbolt 6a9ee516a3 EMPTY COMMIT - fixing some pylint warnings 2019-06-09 10:13:46 +02:00
ozbolt 9117734b91 EMPTY COMMIT - assert statement vs function call 2019-06-08 15:43:53 +02:00
    and one if statement simplified and unused variable
ozbolt 46e169095c EMPTY COMMIT - removing too long lines 2019-06-08 11:54:47 +02:00
ozbolt 797060f619 EMPTY COMMIT - removing trailing whitespace 2019-06-08 11:42:57 +02:00
ozbolt 3a22cd91c3 determining jppb (for 2 word statistics) 2019-06-08 11:31:52 +02:00
ozbolt 30a5e80569 determine polnopomenska-beseda (content-word) components in structure (for now only type='main') 2019-06-08 11:27:51 +02:00
ozbolt 9ae7e1e9f6 Determine distinct matches for one collocation id. 2019-06-08 11:25:55 +02:00
ozbolt 2773a8b9e9 Getters for number of lemmas and number of all words 2019-06-08 11:25:00 +02:00
ozbolt 2167e4b6fe Restrictions now always a list, removes/simplifies a bit of code 2019-06-08 11:23:50 +02:00
ozbolt d83d619dc0 removing old __str__ and __repr__ debugging code 2019-06-08 11:19:40 +02:00
ozbolt b2baedca52 determining dispersions 2019-06-08 11:18:49 +02:00
ozbolt 57c0ff6f85 Removing prints from slimmer 2019-06-08 10:20:53 +02:00
ozbolt 3263125898 Also need to check msd for agreements in the whole corpus. 2019-06-03 15:09:22 +02:00
ozbolt 44d532808d tqdm now optional 2019-06-03 09:47:36 +02:00
ozbolt ed27e549b7 Adding slimming script 2019-06-03 09:37:48 +02:00
ozbolt 08c8050f3f Removing old logging.debug calls, makes matching stuff much faster :) 2019-06-02 14:03:29 +02:00
ozbolt 2c8a9f0ed0 Whitespace fixes 2019-06-02 13:51:32 +02:00
ozbolt 460a55cb6c Improving representation speed ~5% 2019-06-02 13:50:53 +02:00
ozbolt 5f226d0cd4 fixing matching of agreements with msd 2019-06-02 12:53:16 +02:00
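Commits 0d8aeb2282 and f0109771aa switch load_files from yielding whole files to yielding sentences, and move chunk-size handling into the file-sentence generator. The repository code is not shown here. The following is only a minimal sketch of that idea. It assumes a hypothetical plain-text input with one token per line and a blank line between sentences. The names `sentences_from_lines`, `chunked`, `file_sentence_generator` and `chunk_size` are illustrative and are not the project's actual API.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sentences_from_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one sentence (a list of tokens) at a time.

    Hypothetical input format: one token per line, blank line(s) between sentences.
    """
    sentence: List[str] = []
    for line in lines:
        token = line.strip()
        if token:
            sentence.append(token)
        elif sentence:  # blank line closes the current sentence
            yield sentence
            sentence = []
    if sentence:  # last sentence may not be followed by a blank line
        yield sentence


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Group items into lists of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[T] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def file_sentence_generator(path: str, chunk_size: int) -> Iterator[List[List[str]]]:
    """Lazily read a file and yield chunks of sentences.

    Only the current chunk is held in memory, never the whole file.
    """
    with open(path, encoding="utf-8") as f:
        yield from chunked(sentences_from_lines(f), chunk_size)
```

Memory stays bounded by one chunk instead of the whole file. That is the adaptability the commit message refers to, and it comes at the cost of per-sentence overhead.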
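Commit c0939fbbd4 attributes a roughly 15x speed-up to no longer creating millions of namedtuple classes. The project's representation code is not shown, so this sketch only illustrates the general Python pitfall. The function names and the `form`/`lemma`/`msd` fields are hypothetical. Calling `collections.namedtuple` inside a function builds a brand-new class on every call. Defining the class once at module level reuses a single class for all instances.

```python
from collections import namedtuple


def make_token_slow(form, lemma, msd):
    # Anti-pattern: a new class object is created on every call.
    token_cls = namedtuple("Token", ["form", "lemma", "msd"])
    return token_cls(form, lemma, msd)


# Fix: create the class once and reuse it for every instance.
Token = namedtuple("Token", ["form", "lemma", "msd"])


def make_token_fast(form, lemma, msd):
    return Token(form, lemma, msd)


if __name__ == "__main__":
    import timeit

    n = 10_000
    slow = timeit.timeit(lambda: make_token_slow("w", "w", "X"), number=n)
    fast = timeit.timeit(lambda: make_token_fast("w", "w", "X"), number=n)
    # Actual timings depend on the machine and Python version.
    print(f"class per call: {slow:.3f}s, shared class: {fast:.3f}s")
```

Both variants produce equal tuples. Only the shared-class version avoids repeated class construction.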