|
f0109771aa
|
chunk size now handled in file-sentence-generator
|
2019-06-16 00:59:44 +02:00 |
|
|
0d8aeb2282
|
load_files now returns a generator of sentences, not a generator of the whole file
This makes it much slower, but more adaptable for huge files.
|
2019-06-15 22:30:43 +02:00 |
|
|
a8183cf507
|
word stats now collected more memory-efficiently
|
2019-06-15 22:20:20 +02:00 |
|
|
90dbbca5d5
|
HUGE refactor, creating lots of modules, no code changes though!
|
2019-06-15 18:55:35 +02:00 |
|
|
43c6c9151b
|
Simplifying and also improving the speed (fewer regex comparisons!)
|
2019-06-15 13:10:23 +02:00 |
|
|
09bdd0fe3f
|
Adding gitignore
|
2019-06-15 12:53:16 +02:00 |
|
|
c0939fbbd4
|
fixed performance bug for representations
No more creating millions of namedtuple classes. Works about 15x faster
|
2019-06-11 10:26:10 +02:00 |
|
|
3be4118dc0
|
Refactoring lexis/morphology matchers, now "picklable".
|
2019-06-11 10:02:24 +02:00 |
|
|
ad0f9b0956
|
Fixing logdice all stat (and mini refactoring)
|
2019-06-11 09:22:25 +02:00 |
|
|
d30f8c1980
|
Dynamically calculated max num components
|
2019-06-10 14:05:40 +02:00 |
|
|
c0a22a4ef3
|
float formatting for stats
|
2019-06-10 11:05:46 +02:00 |
|
|
bf0ed35e00
|
removing old unused commented out code
|
2019-06-10 10:54:01 +02:00 |
|
|
68c22d4e27
|
deprecating output to stdout
|
2019-06-10 10:52:00 +02:00 |
|
|
b819d9953f
|
using new formatters via --out and --out-no-stat
|
2019-06-10 10:50:51 +02:00 |
|
|
432dc87a5f
|
new outformatter, old is now outnostatformatter
|
2019-06-10 10:49:53 +02:00 |
|
|
cb53a9c7b3
|
moving delta_p12/21 to the end of stats formatter
|
2019-06-10 10:25:42 +02:00 |
|
|
9ccbd02603
|
Implementing the rest of stats. Maybe ok?
|
2019-06-10 00:25:36 +02:00 |
|
|
d7f97ba9b3
|
implementing but commenting out distinct_2w_forms
|
2019-06-10 00:25:14 +02:00 |
|
|
ca0d6f0f55
|
num_words now proper dict
|
2019-06-10 00:24:47 +02:00 |
|
|
865351b3f6
|
Turns out previous commit was OK. Proceeding with stats work
|
2019-06-09 23:00:19 +02:00 |
|
|
c6440162b8
|
NOT WORKING in-between commit
|
2019-06-09 22:25:58 +02:00 |
|
|
dff9643edf
|
Simplifying main writing stuff
|
2019-06-09 13:36:31 +02:00 |
|
|
89f35f5259
|
handling writers for when we don't need outputs (no --all for example)
|
2019-06-09 13:36:07 +02:00 |
|
|
5929004c44
|
now using new formatters, simplifies the code nicely
|
2019-06-09 13:35:34 +02:00 |
|
|
111b088c6c
|
defining formatter for --output
|
2019-06-09 13:33:03 +02:00 |
|
|
2a437b1703
|
Defining writer for --all
|
2019-06-09 13:32:10 +02:00 |
|
|
96e61d2f64
|
Defining Formatter parent class for out/all/stats output files
|
2019-06-09 13:27:04 +02:00 |
|
|
2387bd7cb7
|
Stats flag
|
2019-06-09 10:20:29 +02:00 |
|
|
6a9ee516a3
|
EMPTY COMMIT - fixing some pylint warnings
|
2019-06-09 10:13:46 +02:00 |
|
|
9117734b91
|
EMPTY COMMIT - assert statement vs function call
and one if statement simplified and unused variable
|
2019-06-08 15:43:53 +02:00 |
|
|
46e169095c
|
EMPTY COMMIT - removing too long lines
|
2019-06-08 11:54:47 +02:00 |
|
|
797060f619
|
EMPTY COMMIT - removing trailing whitespace
|
2019-06-08 11:42:57 +02:00 |
|
|
3a22cd91c3
|
determining jppb (for 2 word statistics)
|
2019-06-08 11:31:52 +02:00 |
|
|
30a5e80569
|
determine polnopomenska-beseda components in structure (for now only type='main')
|
2019-06-08 11:27:51 +02:00 |
|
|
9ae7e1e9f6
|
Determine distinct matches for one collocation id.
|
2019-06-08 11:25:55 +02:00 |
|
|
2773a8b9e9
|
Getters for number of lemmas and number of all words
|
2019-06-08 11:25:00 +02:00 |
|
|
2167e4b6fe
|
Restrictions now always a list, removes/simplifies a bit of code
|
2019-06-08 11:23:50 +02:00 |
|
|
d83d619dc0
|
removing old __str__ and __repr__ debugging code
|
2019-06-08 11:19:40 +02:00 |
|
|
b2baedca52
|
determining dispersions
|
2019-06-08 11:18:49 +02:00 |
|
|
57c0ff6f85
|
Removing prints from slimmer
|
2019-06-08 10:20:53 +02:00 |
|
|
3263125898
|
Also need to check msd for agreements in the whole corpus.
|
2019-06-03 15:09:22 +02:00 |
|
|
44d532808d
|
tqdm now optional
|
2019-06-03 09:47:36 +02:00 |
|
|
ed27e549b7
|
Adding slimming script
|
2019-06-03 09:37:48 +02:00 |
|
|
08c8050f3f
|
Removing old logging.debug calls, makes matching stuff much faster :)
|
2019-06-02 14:03:29 +02:00 |
|
|
2c8a9f0ed0
|
Whitespace fixes
|
2019-06-02 13:51:32 +02:00 |
|
|
460a55cb6c
|
Improving representation speed ~5%
|
2019-06-02 13:50:53 +02:00 |
|
|
5f226d0cd4
|
fixing matching of agreements with msd
|
2019-06-02 12:53:16 +02:00 |
|
|
5b9859af3e
|
Removing dead code
|
2019-06-02 12:50:43 +02:00 |
|
|
44f0a6762e
|
Improving speed of matching ~40%
|
2019-06-02 12:50:04 +02:00 |
|
|
fe4c95939f
|
Removing deprecated commented out code.
|
2019-06-01 10:40:44 +02:00 |
|