137 Commits

Author SHA1 Message Date
ozbolt ec02242f47 num-words now part of database 2019-07-03 13:08:32 +02:00
ozbolt ea92b44d71 Removing parallel stuff 2019-07-03 13:06:59 +02:00
ozbolt d771137dc7 removing pickled structures 2019-07-03 13:05:52 +02:00
ozbolt a07d14011d simplifying progress, because I will remove the parallel stuff 2019-07-03 13:05:31 +02:00
ozbolt 577983427e Better error reporting in parsing syntactic structures 2019-07-01 17:22:30 +02:00
ozbolt 48795c6227 common msd now calculated per colocation id and not for whole corpus 2019-07-01 17:22:01 +02:00
ozbolt 2f789e6550 last agreement now confirms some matches even if not all matches are ok 2019-07-01 17:20:27 +02:00
ozbolt 1401b82324 Adding msd to out formatter 2019-07-01 17:18:25 +02:00
ozbolt 47340fe80c common msd now based on (lemma,msd0) not only lemma #757-127 2019-06-28 22:00:38 +02:00
ozbolt 8c20295adf Adding dispersions to sqlite, finished moving to it. 2019-06-27 22:04:33 +02:00
ozbolt b5e281bdf4 adding indexes for speed and set_representations via database 2019-06-27 17:16:27 +02:00
ozbolt 188763c06a Incorporating database also in MatchStore 2019-06-27 16:51:58 +02:00
ozbolt c25844a335 adding separate database class 2019-06-27 12:37:23 +02:00
ozbolt fa8a5e55f8 Merge branch 'sqlite' 2019-06-27 11:45:20 +02:00
ozbolt c2c2ce7ff8 making sorted words sorted a bit more non-randomly. 2019-06-27 11:44:02 +02:00
ozbolt 8b06c4ec38 Skipping already used available words, stupid refactoring bug 2019-06-27 00:57:46 +02:00
ozbolt 11706b6f81 word stats on sqlite now, not yet really working. 2019-06-27 00:37:47 +02:00
ozbolt 1256a4de40 Fixing loading bad gz files and progress showing 2019-06-26 13:06:43 +02:00
ozbolt 049f5ca3dc Adding new N* msds 2019-06-26 12:47:02 +02:00
ozbolt cfdb36b894 Adding ability to load gz files. 2019-06-17 20:41:11 +02:00
ozbolt d2f6f8dac8 adding new Nw msd 2019-06-17 20:39:07 +02:00
ozbolt 70b05e8637 New progress bar 2019-06-17 17:30:51 +02:00
ozbolt 3552f14b81 Loader to its own module 2019-06-17 15:38:55 +02:00
ozbolt 51cf3e7064 Improving debugging output 2019-06-16 01:32:31 +02:00
ozbolt dc285ce265 Saving memory in word-stats 2019-06-16 01:31:40 +02:00
ozbolt 37acabc076 able to load pickled structures 2019-06-16 01:31:14 +02:00
ozbolt f0109771aa chunk size now handled in file-sentence-generator 2019-06-16 00:59:44 +02:00
ozbolt 0d8aeb2282 load_files now returns a generator of sentences, not a generator of the whole file. This makes it much slower, but more adaptable for huge files. 2019-06-15 22:30:43 +02:00
ozbolt a8183cf507 word stats now collected more memory-efficient 2019-06-15 22:20:20 +02:00
ozbolt 90dbbca5d5 HUGE refactor, creating lots of modules, no code changes though! 2019-06-15 18:55:35 +02:00
ozbolt 43c6c9151b Simplifying and also improving the speed (less regex comparisons!) 2019-06-15 13:10:23 +02:00
ozbolt 09bdd0fe3f Adding gitignore 2019-06-15 12:53:16 +02:00
ozbolt c0939fbbd4 fixed performance bug for representations. No more creating millions of namedtuple classes. Works about 15x faster. 2019-06-11 10:26:10 +02:00
ozbolt 3be4118dc0 Refactoring lexis/morphology matchers, now "picklable". 2019-06-11 10:02:24 +02:00
ozbolt ad0f9b0956 Fixing logdice all stat (and mini refactoring) 2019-06-11 09:22:25 +02:00
ozbolt d30f8c1980 Dynamically calculated max num components 2019-06-10 14:05:40 +02:00
ozbolt c0a22a4ef3 float formatting for stats 2019-06-10 11:05:46 +02:00
ozbolt bf0ed35e00 removing old unused commented out code 2019-06-10 10:54:01 +02:00
ozbolt 68c22d4e27 deprecating output to stdout 2019-06-10 10:52:00 +02:00
ozbolt b819d9953f using new formatters via --out and --out-no-stat 2019-06-10 10:50:51 +02:00
ozbolt 432dc87a5f new outformatter, old is now outnostatformatter 2019-06-10 10:49:53 +02:00
ozbolt cb53a9c7b3 moving delta_p12/21 to the end of stats formatter 2019-06-10 10:25:42 +02:00
ozbolt 9ccbd02603 Implementing the rest of stats. Maybe ok? 2019-06-10 00:25:36 +02:00
ozbolt d7f97ba9b3 implementing but commenting out distinct_2w_forms 2019-06-10 00:25:14 +02:00
ozbolt ca0d6f0f55 num_words now proper dict 2019-06-10 00:24:47 +02:00
ozbolt 865351b3f6 Turns out previous commit was OK. Proceeding with stats work 2019-06-09 23:00:19 +02:00
ozbolt c6440162b8 NOT WORKING inbetween commit 2019-06-09 22:25:58 +02:00
ozbolt dff9643edf Simplifying main writing stuff 2019-06-09 13:36:31 +02:00
ozbolt 89f35f5259 handling writers for when we don't need outputs (no --all for example) 2019-06-09 13:36:07 +02:00
ozbolt 5929004c44 now using new formatters, simplifies the code nicely 2019-06-09 13:35:34 +02:00