Commit Graph

  • 7c735e33f7 Db and syntactic_structures fixes master lkrsnik 2022-10-23 21:50:08 +02:00
  • d0bec69fd8 Redmine #2198: Limited wani to "collocation" type structures i2198 cyp 2021-12-03 15:23:29 +01:00
  • 598ab102b3 Adding uncommitted changes lkrsnik 2021-10-01 14:42:29 +02:00
  • d67976c3d9 Modified prints + sqlalchemy and psycopg2cffi made optional lkrsnik 2021-04-15 14:16:34 +02:00
  • 39692e839f Extended recalculate statistics to filtered output lkrsnik 2021-02-16 17:01:02 +01:00
  • f1366548b6 White reset at paragraphs not sentences + progress bar updates on paragraphs not sentences. lkrsnik 2021-01-26 14:57:42 +01:00
  • 552f2e4bd0 Changed whitespace aspect from document to sentence based. lkrsnik 2021-01-23 09:28:10 +01:00
  • 361331515e Ignoring @type=single and added option for --new-tei lkrsnik 2021-01-13 16:36:44 +01:00
  • fa4479af60 Fixed repeating words bug lkrsnik 2020-11-26 09:45:22 +01:00
  • 25db8eeb7a Adding --fixed-restriction-order parameter lkrsnik 2020-10-27 09:48:34 +01:00
  • dd5fa4a1b8 Changed spaces settings - both switched with neither and left with right. lkrsnik 2020-10-26 15:25:46 +01:00
  • c63a9d47da Adding restriction on spaces on punctuations. lkrsnik 2020-10-22 13:16:58 +02:00
  • 6dd97838b4 Added fix for when two restrictions are satisfied with the same word. lkrsnik 2020-10-19 15:40:43 +02:00
  • 8c87d07b8a Scripts adapted to changes of new structures.xml format lkrsnik 2020-10-14 14:50:35 +02:00
  • 09c4277ebe Modified error signal + Fixed no_stat lkrsnik 2020-10-09 20:13:37 +02:00
  • 06435aa3a2 Added options for "modra" lkrsnik 2020-10-09 15:18:52 +02:00
  • 1ea454f63c Added fix for punctuations lkrsnik 2020-10-08 18:31:50 +02:00
  • d5668c8b68 Moved wani.py + Added ignore of .zstd files for valency lkrsnik 2020-10-01 16:20:52 +02:00
  • 412d0c0f62 Changing file structure lkrsnik 2020-09-17 14:17:40 +02:00
  • c19c95ad97 Renaming src to luscenje struktur lkrsnik 2020-09-17 14:02:56 +02:00
  • 5bff3e370f Added setup.py lkrsnik 2020-09-17 13:09:20 +02:00
  • 01b08667d2 Added some functions for compatibility with valency, fixed readme and fixed some minor bugs. lkrsnik 2020-09-10 15:06:09 +02:00
  • 1b0e6a27eb Modified readme.md + Removed obligatory sloleks_db + Added frequency_limit and sorted parameters in recalculate_statistics.py lkrsnik 2020-09-02 10:53:45 +02:00
  • 41952738ed Added support for valency valency lkrsnik 2020-09-01 13:35:22 +02:00
  • e38ff4c7b0 Added limit to minimum frequency = 10 + Ordered by frequency lkrsnik 2020-08-21 15:05:30 +02:00
  • edea80e6e0 Added script for file extension lkrsnik 2020-08-20 16:13:22 +02:00
  • e8fdbfdb6a Merge branch 'master' of https://gitea.cjvt.si/ozbolt/luscenje_struktur lkrsnik 2020-07-24 10:07:22 +02:00
  • 49a8d5123e Quick fix for missing dispersions lkrsnik 2020-07-24 10:06:54 +02:00
  • 8cf9083421 Removing results lkrsnik 2020-07-24 10:00:12 +02:00
  • 23b062cc1b Adding issue992 fixes lkrsnik 2020-07-24 09:59:07 +02:00
  • f330a37764 Improved representations speed + Fixed bug in representations lkrsnik 2020-07-22 11:16:28 +02:00
  • 4c84873ff5 Fixing for run.sh and adding run.sh lkrsnik 2020-07-20 17:36:44 +02:00
  • 14951e8422 Added multi file reading lkrsnik 2020-07-20 15:52:01 +02:00
  • eb86a6bb1c Added collocation_sentence_map_dest lkrsnik 2020-07-20 10:51:09 +02:00
  • 9a9d344510 Created new column "Joint_representative_form_variable" + Fixed collocation structures + Fixed bug with wrong lemma_fallback msds lkrsnik 2020-07-16 20:53:59 +02:00
  • de3e52c57c Changed output document to reflect most frequent word order lkrsnik 2020-07-10 13:43:52 +02:00
  • 777791ad1e Added s/z, k/h + fixed bug 90 + connecting with sloleks on lemma_fallback lkrsnik 2020-07-08 19:23:56 +02:00
  • ec113f9cd2 Merge branch 'sql-join-test' of ozbolt/luscenje_struktur into master ozbolt 2020-03-02 19:12:37 +00:00
  • 9e8cd2a2ec Issue #1000 sql-join-test ozbolt 2020-03-02 19:13:19 +01:00
  • 1d4c0238a6 fixing how min_freq is used and more verbose writer ozbolt 2019-11-06 02:39:26 +01:00
  • 8fee3f8a8e Testing delayed insertions of representations ozbolt 2019-09-11 08:58:02 +02:00
  • 6bb3586051 Attempt at speed optimization with sql-join ozbolt 2019-09-10 16:22:43 +02:00
  • 4124036474 match_num now loaded from database ozbolt 2019-09-09 15:29:15 +02:00
  • 07242f74c8 Also remember representations step. ozbolt 2019-09-06 14:55:36 +02:00
  • 33528f1495 step_done now implemented in database.py ozbolt 2019-08-21 12:57:42 +02:00
  • 3ea62ed242 dispersions now loaded into database and stored/loaded. ozbolt 2019-08-21 12:49:03 +02:00
  • dedc031696 Step recorded: generate_renders ozbolt 2019-08-21 12:16:10 +02:00
  • 046aef031f adding timeinfo ozbolt 2019-08-21 11:13:23 +02:00
  • 2018745d52 files loaded now in database ozbolt 2019-08-21 11:12:38 +02:00
  • 8cca761b91 min frequency now part of writer ozbolt 2019-08-21 11:11:06 +02:00
  • 3f1c154705 can now load csv files ozbolt 2019-08-21 11:09:47 +02:00
  • d497749c78 better database committing ozbolt 2019-08-21 11:08:08 +02:00
  • b25e3de76b adding total keyword to progress and total time spent ozbolt 2019-07-03 14:54:23 +02:00
  • 771547b7e4 progress for dispersions ozbolt 2019-07-03 14:53:51 +02:00
  • f9bfac6430 If no output, then just commit stuff to database and exit. ozbolt 2019-07-03 13:10:55 +02:00
  • ec02242f47 num-words now part of database ozbolt 2019-07-03 13:08:32 +02:00
  • ea92b44d71 Removing parallel stuff ozbolt 2019-07-03 13:06:59 +02:00
  • d771137dc7 removing pickled structures ozbolt 2019-07-03 13:05:52 +02:00
  • a07d14011d simplifying progress, because I will remove the parallel stuff ozbolt 2019-07-03 10:23:18 +02:00
  • 577983427e Better error reporting in parsing syntactic structures ozbolt 2019-07-01 17:22:30 +02:00
  • 48795c6227 common msd now calculated per collocation id and not for whole corpus ozbolt 2019-07-01 17:21:28 +02:00
  • 2f789e6550 last agreement now confirms some matches even if not all matches are ok ozbolt 2019-07-01 17:20:27 +02:00
  • 1401b82324 Adding msd to out formatter ozbolt 2019-07-01 17:18:25 +02:00
  • 47340fe80c common msd now based on (lemma,msd0) not only lemma #757-127 ozbolt 2019-06-28 22:00:38 +02:00
  • 8c20295adf Adding dispersions to sqlite, finished moving to it. ozbolt 2019-06-27 22:04:33 +02:00
  • b5e281bdf4 adding indexes for speed and set_representations via database ozbolt 2019-06-27 17:16:27 +02:00
  • 188763c06a Incorporating database also in MatchStore ozbolt 2019-06-27 16:51:58 +02:00
  • c25844a335 adding separate database class ozbolt 2019-06-27 12:37:23 +02:00
  • fa8a5e55f8 Merge branch 'sqlite' ozbolt 2019-06-27 11:45:20 +02:00
  • c2c2ce7ff8 making sorted words sorted a bit more non-randomly. ozbolt 2019-06-27 11:44:02 +02:00
  • 8b06c4ec38 Skipping already used available words, stupid refactoring bug ozbolt 2019-06-27 00:57:46 +02:00
  • 11706b6f81 word stats on sqlite now, not yet really working. ozbolt 2019-06-27 00:37:47 +02:00
  • 1256a4de40 Fixing loading bad gz files and progress showing ozbolt 2019-06-26 13:06:43 +02:00
  • 049f5ca3dc Adding new N* msds ozbolt 2019-06-26 12:47:02 +02:00
  • cfdb36b894 Adding ability to load gz files. ozbolt 2019-06-17 20:41:11 +02:00
  • d2f6f8dac8 adding new Nw msd ozbolt 2019-06-17 20:39:07 +02:00
  • 70b05e8637 New progress bar ozbolt 2019-06-17 17:30:51 +02:00
  • 3552f14b81 Loader to its own module ozbolt 2019-06-17 15:38:55 +02:00
  • 51cf3e7064 Improving debugging output ozbolt 2019-06-16 01:32:31 +02:00
  • dc285ce265 Saving memory in word-stats ozbolt 2019-06-16 01:31:40 +02:00
  • 37acabc076 able to load pickled structures ozbolt 2019-06-16 01:00:22 +02:00
  • f0109771aa chunk size now handled in file-sentence-generator ozbolt 2019-06-16 00:59:44 +02:00
  • 0d8aeb2282 load_files now returns a generator of sentences, not a generator of the whole file ozbolt 2019-06-15 22:30:43 +02:00
  • a8183cf507 word stats now collected more memory-efficient ozbolt 2019-06-15 22:20:20 +02:00
  • 90dbbca5d5 HUGE refactor, creating lots of modules, no code changes though! ozbolt 2019-06-15 18:55:35 +02:00
  • 43c6c9151b Simplifying and also improving the speed (less regex comparisons!) ozbolt 2019-06-15 13:10:23 +02:00
  • 09bdd0fe3f Adding gitignore ozbolt 2019-06-15 12:53:16 +02:00
  • c0939fbbd4 fixed performance bug for representations ozbolt 2019-06-11 10:26:10 +02:00
  • 3be4118dc0 Refactoring lexis/morphology matchers, now "picklable". ozbolt 2019-06-11 10:02:24 +02:00
  • ad0f9b0956 Fixing logdice all stat (and mini refactoring) ozbolt 2019-06-11 09:22:25 +02:00
  • d30f8c1980 Dynamically calculated max num components ozbolt 2019-06-10 14:05:40 +02:00
  • c0a22a4ef3 float formatting for stats ozbolt 2019-06-10 11:05:46 +02:00
  • bf0ed35e00 removing old unused commented out code ozbolt 2019-06-10 10:54:01 +02:00
  • 68c22d4e27 deprecating output to stdout ozbolt 2019-06-10 10:52:00 +02:00
  • b819d9953f using new formatters via --out and --out-no-stat ozbolt 2019-06-10 10:50:51 +02:00
  • 432dc87a5f new outformatter, old is now outnostatformatter ozbolt 2019-06-10 10:49:53 +02:00
  • cb53a9c7b3 moving delta_p12/21 to the end of stats formatter ozbolt 2019-06-10 10:25:42 +02:00
  • 9ccbd02603 Implementing the rest of stats. Maybe ok? ozbolt 2019-06-10 00:25:36 +02:00
  • d7f97ba9b3 implementing but commenting out distinct_2w_forms ozbolt 2019-06-10 00:25:14 +02:00
  • ca0d6f0f55 num_words now proper dict ozbolt 2019-06-10 00:24:47 +02:00