-
7c735e33f7
Db and syntactic_structures fixes
master
Luka
2022-10-23 21:50:08 +0200
-
d0bec69fd8
Redmine #2198: Limited wani to "collocation" type structures
i2198
Cyprian Laskowski
2021-12-03 15:23:29 +0100
-
598ab102b3
Adding uncommitted changes
Luka
2021-10-01 14:42:29 +0200
-
d67976c3d9
Modified prints + sqlalchemy and psycopg2cffi made optional
Luka
2021-04-15 14:16:34 +0200
-
39692e839f
Extended recalculate statistics to filtered output
Luka
2021-02-16 17:01:02 +0100
-
f1366548b6
Whitespace reset at paragraphs not sentences + progress bar updates on paragraphs not sentences.
Luka
2021-01-26 14:57:42 +0100
-
552f2e4bd0
Changed whitespace aspect from document to sentence based.
Luka
2021-01-23 09:28:10 +0100
-
361331515e
Ignoring @type=single and added option for --new-tei
Luka
2021-01-13 16:36:44 +0100
-
fa4479af60
Fixed repeating words bug
Luka
2020-11-26 09:45:22 +0100
-
25db8eeb7a
Adding --fixed-restriction-order parameter
Luka
2020-10-27 09:48:34 +0100
-
dd5fa4a1b8
Changed spaces settings - both switched with neither and left with right.
Luka
2020-10-26 15:25:46 +0100
-
c63a9d47da
Adding restriction on spaces on punctuations.
Luka
2020-10-22 13:16:58 +0200
-
6dd97838b4
Added fix for when two restrictions are satisfied with the same word.
Luka
2020-10-19 15:40:43 +0200
-
8c87d07b8a
Scripts adapted to changes of new structures.xml format
Luka
2020-10-14 14:50:35 +0200
-
09c4277ebe
Modified error signal + Fixed no_stat
Luka
2020-10-09 20:13:37 +0200
-
06435aa3a2
Added options for "modra"
Luka
2020-10-09 15:18:52 +0200
-
1ea454f63c
Added fix for punctuations
Luka
2020-10-08 18:31:50 +0200
-
d5668c8b68
Moved wani.py + Added ignore of .zstd files for valency
Luka
2020-10-01 16:20:52 +0200
-
412d0c0f62
Changing file structure
Luka
2020-09-17 14:17:40 +0200
-
c19c95ad97
Renaming src to luscenje struktur
Luka
2020-09-17 14:02:56 +0200
-
5bff3e370f
Added setup.py
Luka
2020-09-17 13:09:20 +0200
-
01b08667d2
Added some functions for compatibility with valency, fixed readme and fixed some minor bugs.
Luka
2020-09-10 15:06:09 +0200
-
1b0e6a27eb
Modified readme.md + Removed obligatory sloleks_db + Added frequency_limit and sorted parameters in recalculate_statistics.py
Luka
2020-09-02 10:53:45 +0200
-
41952738ed
Added support for valency
valency
Luka
2020-09-01 13:35:22 +0200
-
e38ff4c7b0
Added limit to minimum frequency = 10 + Ordered by frequency
Luka
2020-08-21 15:05:30 +0200
-
edea80e6e0
Added script for file extension
Luka
2020-08-20 16:13:22 +0200
-
e8fdbfdb6a
Merge branch 'master' of https://gitea.cjvt.si/ozbolt/luscenje_struktur
lkrsnik
2020-07-24 10:07:22 +0200
-
49a8d5123e
Quick fix for missing dispersions
lkrsnik
2020-07-24 10:06:54 +0200
-
8cf9083421
Removing results
Luka
2020-07-24 10:00:12 +0200
-
23b062cc1b
Adding issue992 fixes
Luka
2020-07-24 09:59:07 +0200
-
f330a37764
Improved representations speed + Fixed bug in representations
Luka
2020-07-22 11:16:28 +0200
-
4c84873ff5
Fixing for run.sh and adding run.sh
lkrsnik
2020-07-20 17:36:44 +0200
-
14951e8422
Added multi file reading
Luka
2020-07-20 15:52:01 +0200
-
eb86a6bb1c
Added collocation_sentence_map_dest
Luka
2020-07-20 10:51:09 +0200
-
9a9d344510
Created new column "Joint_representative_form_variable" + Fixed collocation structures + Fixed bug with wrong lemma_fallback msds
Luka
2020-07-16 20:53:59 +0200
-
de3e52c57c
Changed output document to reflect most frequent word order
Luka
2020-07-10 13:43:52 +0200
-
777791ad1e
Added s/z, k/h + fixed bug 90 + connecting with sloleks on lemma_fallback
Luka
2020-07-08 19:23:56 +0200
-
ec113f9cd2
Merge branch 'sql-join-test' of ozbolt/luscenje_struktur into master
ozbolt
2020-03-02 19:12:37 +0000
-
9e8cd2a2ec
Issue #1000
sql-join-test
Ozbolt Menegatti
2020-03-02 19:13:19 +0100
-
1d4c0238a6
fixing how min_freq is used and more verbose writer
Ozbolt Menegatti
2019-11-06 02:39:26 +0100
-
8fee3f8a8e
Testing delayed insertions of representations
Ozbolt Menegatti
2019-09-11 08:58:02 +0200
-
6bb3586051
Attempt at speed optimization with sql-join
Ozbolt Menegatti
2019-09-10 16:22:43 +0200
-
4124036474
match_num now loaded from database
Ozbolt Menegatti
2019-09-09 15:29:15 +0200
-
07242f74c8
Also remember representations step.
Ozbolt Menegatti
2019-09-06 14:55:36 +0200
-
33528f1495
step_done now implemented in database.py
Ozbolt Menegatti
2019-08-21 12:57:42 +0200
-
3ea62ed242
dispersions now loaded into database and stored/loaded.
Ozbolt Menegatti
2019-08-21 12:49:03 +0200
-
dedc031696
Step recorded: generate_renders
Ozbolt Menegatti
2019-08-21 12:16:10 +0200
-
046aef031f
adding timeinfo
Ozbolt Menegatti
2019-08-21 11:13:23 +0200
-
2018745d52
files loaded now in database
Ozbolt Menegatti
2019-08-21 11:12:38 +0200
-
8cca761b91
min frequency now part of writer
Ozbolt Menegatti
2019-08-21 11:11:06 +0200
-
3f1c154705
can now load csv files
Ozbolt Menegatti
2019-08-21 11:09:47 +0200
-
d497749c78
better database committing
Ozbolt Menegatti
2019-08-21 11:08:08 +0200
-
b25e3de76b
adding total keyword to progress and total time spent
Ozbolt Menegatti
2019-07-03 14:54:23 +0200
-
771547b7e4
progress for dispersions
Ozbolt Menegatti
2019-07-03 14:53:51 +0200
-
f9bfac6430
If no output, then just commit stuff to database and exit.
Ozbolt Menegatti
2019-07-03 13:10:55 +0200
-
ec02242f47
num-words now part of database
Ozbolt Menegatti
2019-07-03 13:08:32 +0200
-
ea92b44d71
Removing parallel stuff
Ozbolt Menegatti
2019-07-03 13:06:59 +0200
-
d771137dc7
removing pickled structures
Ozbolt Menegatti
2019-07-03 13:05:52 +0200
-
a07d14011d
simplifying progress, because I will remove the parallel stuff
Ozbolt Menegatti
2019-07-03 10:23:18 +0200
-
577983427e
Better error reporting in parsing syntactic structures
Ozbolt Menegatti
2019-07-01 17:22:30 +0200
-
48795c6227
common msd now calculated per collocation id and not for whole corpus
Ozbolt Menegatti
2019-07-01 17:21:28 +0200
-
2f789e6550
last agreement now confirms some matches even if not all matches are ok
Ozbolt Menegatti
2019-07-01 17:20:27 +0200
-
1401b82324
Adding msd to out formatter
Ozbolt Menegatti
2019-07-01 17:18:25 +0200
-
47340fe80c
common msd now based on (lemma,msd0) not only lemma #757-127
Ozbolt Menegatti
2019-06-28 22:00:38 +0200
-
8c20295adf
Adding dispersions to sqlite, finished moving to it.
Ozbolt Menegatti
2019-06-27 22:04:33 +0200
-
b5e281bdf4
adding indexes for speed and set_representations via database
Ozbolt Menegatti
2019-06-27 17:16:27 +0200
-
188763c06a
Incorporating database also in MatchStore
Ozbolt Menegatti
2019-06-27 16:51:58 +0200
-
c25844a335
adding separate database class
Ozbolt Menegatti
2019-06-27 12:37:23 +0200
-
fa8a5e55f8
Merge branch 'sqlite'
Ozbolt Menegatti
2019-06-27 11:45:20 +0200
-
c2c2ce7ff8
making sorted words sorted a bit more non-randomly.
Ozbolt Menegatti
2019-06-27 11:44:02 +0200
-
8b06c4ec38
Skipping already used available words, stupid refactoring bug
Ozbolt Menegatti
2019-06-27 00:57:46 +0200
-
11706b6f81
word stats on sqlite now, not yet really working.
Ozbolt Menegatti
2019-06-27 00:37:47 +0200
-
1256a4de40
Fixing loading bad gz files and progress showing
Ozbolt Menegatti
2019-06-26 13:06:43 +0200
-
049f5ca3dc
Adding new N* msds
Ozbolt Menegatti
2019-06-26 12:47:02 +0200
-
cfdb36b894
Adding ability to load gz files.
Ozbolt Menegatti
2019-06-17 20:41:11 +0200
-
d2f6f8dac8
adding new Nw msd
Ozbolt Menegatti
2019-06-17 20:39:07 +0200
-
70b05e8637
New progress bar
Ozbolt Menegatti
2019-06-17 17:30:51 +0200
-
3552f14b81
Loader to its own module
Ozbolt Menegatti
2019-06-17 15:38:55 +0200
-
51cf3e7064
Improving debugging output
Ozbolt Menegatti
2019-06-16 01:32:31 +0200
-
dc285ce265
Saving memory in word-stats
Ozbolt Menegatti
2019-06-16 01:31:40 +0200
-
37acabc076
able to load pickled structures
Ozbolt Menegatti
2019-06-16 01:00:22 +0200
-
f0109771aa
chunk size now handled in file-sentence-generator
Ozbolt Menegatti
2019-06-16 00:59:44 +0200
-
0d8aeb2282
load_files now returns a generator of sentences, not a generator of the whole file
Ozbolt Menegatti
2019-06-15 22:30:43 +0200
-
a8183cf507
word stats now collected more memory-efficient
Ozbolt Menegatti
2019-06-15 22:20:20 +0200
-
90dbbca5d5
HUGE refactor, creating lots of modules, no code changes though!
Ozbolt Menegatti
2019-06-15 18:55:35 +0200
-
43c6c9151b
Simplifying and also improving the speed (less regex comparisons!)
Ozbolt Menegatti
2019-06-15 13:10:23 +0200
-
09bdd0fe3f
Adding gitignore
Ozbolt Menegatti
2019-06-15 12:53:16 +0200
-
c0939fbbd4
fixed performance bug for representations
Ozbolt Menegatti
2019-06-11 10:26:10 +0200
-
3be4118dc0
Refactoring lexis/morphology matchers, now "picklable".
Ozbolt Menegatti
2019-06-11 10:02:24 +0200
-
ad0f9b0956
Fixing logdice all stat (and mini refactoring)
Ozbolt Menegatti
2019-06-11 09:22:25 +0200
-
d30f8c1980
Dynamically calculated max num components
Ozbolt Menegatti
2019-06-10 14:05:40 +0200
-
c0a22a4ef3
float formatting for stats
Ozbolt Menegatti
2019-06-10 11:05:46 +0200
-
bf0ed35e00
removing old unused commented out code
Ozbolt Menegatti
2019-06-10 10:54:01 +0200
-
68c22d4e27
deprecating output to stdout
Ozbolt Menegatti
2019-06-10 10:52:00 +0200
-
b819d9953f
using new formatters via --out and --out-no-stat
Ozbolt Menegatti
2019-06-10 10:50:51 +0200
-
432dc87a5f
new outformatter, old is now outnostatformatter
Ozbolt Menegatti
2019-06-10 10:49:53 +0200
-
cb53a9c7b3
moving delta_p12/21 to the end of stats formatter
Ozbolt Menegatti
2019-06-10 10:25:42 +0200
-
9ccbd02603
Implementing the rest of stats. Maybe ok?
Ozbolt Menegatti
2019-06-10 00:25:36 +0200
-
d7f97ba9b3
implementing but commenting out distinct_2w_forms
Ozbolt Menegatti
2019-06-10 00:25:14 +0200
-
ca0d6f0f55
num_words now proper dict
Ozbolt Menegatti
2019-06-10 00:24:47 +0200