luscenje_struktur

Author	SHA1	Message	Date
Ozbolt Menegatti	96e61d2f64	Defining Formatter parent class for out/all/stats output files	2019-06-09 13:27:04 +02:00
Ozbolt Menegatti	2387bd7cb7	Stats flag	2019-06-09 10:20:29 +02:00
Ozbolt Menegatti	6a9ee516a3	EMPTY COMMIT - fixing some pylint warnings	2019-06-09 10:13:46 +02:00
Ozbolt Menegatti	9117734b91	EMPTY COMMIT - assert statement vs function call and one if statement simplified and unused variable	2019-06-08 15:43:53 +02:00
Ozbolt Menegatti	46e169095c	EMPTY COMMIT - removing too long lines	2019-06-08 11:54:47 +02:00
Ozbolt Menegatti	797060f619	EMPTY COMMIT - removing trailing whitespace	2019-06-08 11:42:57 +02:00
Ozbolt Menegatti	3a22cd91c3	determining jppb (for 2 word statistics)	2019-06-08 11:31:52 +02:00
Ozbolt Menegatti	30a5e80569	determine polnopomenska-beseda components in structure (for now only type='main')	2019-06-08 11:27:51 +02:00
Ozbolt Menegatti	9ae7e1e9f6	Determine distinct matches for one colocation id.	2019-06-08 11:25:55 +02:00
Ozbolt Menegatti	2773a8b9e9	Getters for number of lemmas and number of all words	2019-06-08 11:25:00 +02:00
Ozbolt Menegatti	2167e4b6fe	Restrictions now always a list, removes/simplifies a bit of code	2019-06-08 11:23:50 +02:00
Ozbolt Menegatti	d83d619dc0	removing old __str__ and __repr__ debugging code	2019-06-08 11:19:40 +02:00
Ozbolt Menegatti	b2baedca52	determining dispersions	2019-06-08 11:18:49 +02:00
Ozbolt Menegatti	3263125898	Also need to check msd for agreements in the whole corpus.	2019-06-03 15:09:22 +02:00
Ozbolt Menegatti	44d532808d	tqdm now optional	2019-06-03 09:47:36 +02:00
Ozbolt Menegatti	08c8050f3f	Removing old logging.debug calls, makes matching stuff much faster :)	2019-06-02 14:03:29 +02:00
Ozbolt Menegatti	2c8a9f0ed0	Whitespace fixes	2019-06-02 13:51:32 +02:00
Ozbolt Menegatti	460a55cb6c	Improving representation speed ~5%	2019-06-02 13:50:53 +02:00
Ozbolt Menegatti	5f226d0cd4	fixing matching of agreements with msd	2019-06-02 12:53:16 +02:00
Ozbolt Menegatti	5b9859af3e	Removing dead code	2019-06-02 12:50:43 +02:00
Ozbolt Menegatti	44f0a6762e	Improving speed of matching ~40%	2019-06-02 12:50:04 +02:00
Ozbolt Menegatti	fe4c95939f	Removing deprecated commented out code.	2019-06-01 10:40:44 +02:00
Ozbolt Menegatti	ed83b2b9c4	implementing multiple agreements to one cid.	2019-06-01 10:36:28 +02:00
Ozbolt Menegatti	0249ef1523	Correct order for wordform any/msd rendering (most frequent first)	2019-06-01 10:35:51 +02:00
Ozbolt Menegatti	119b85568f	actually not showing components without representation	2019-06-01 10:35:23 +02:00
Ozbolt Menegatti	7d1bfbf73e	wordform all only lowercase	2019-06-01 10:33:02 +02:00
Ozbolt Menegatti	ad7ba8c0b2	removing debugging/dead code	2019-06-01 10:31:29 +02:00
Ozbolt Menegatti	09bd4f55ef	mor->more typo	2019-06-01 10:30:07 +02:00
Ozbolt Menegatti	bfd4d4a747	Refactoring representations. Now muuuuch nicer code, not yet working though :) Added: multiple representations per component id	2019-05-30 11:34:31 +02:00
Ozbolt Menegatti	307007218d	Work to fix #757-104 and #757-89 for word_form all, now removing duplicates for word_form msd, now word_forms from the collocation, not from whole corpus; determining more specific msd for agreements, so that it gets better match when using backup-lemma representation for agreements, now ordered by colocation's own number of occurrences, not global; removed a bit of debug code	2019-05-29 20:22:22 +02:00
Ozbolt Menegatti	4c2b5f2b13	Updating for lemma representation of word_form. Also cleaning code, adding tqdm,...	2019-05-24 18:15:21 +02:00
Ozbolt Menegatti	3c669c7901	looking for agreements from the whole corpus	2019-05-23 08:13:29 +02:00
Ozbolt Menegatti	e99ba59908	lemma/msd representations now global! Need to also use for agreements	2019-05-22 11:55:51 +02:00
Ozbolt Menegatti	d14efff709	Intermediate UGLY CODE commit. Working more on representations	2019-05-22 11:22:07 +02:00
Ozbolt Menegatti	dce55d04a3	Does not yet work, agreements in representation	2019-05-20 18:14:11 +02:00
Ozbolt Menegatti	5bd0b4a064	correct representation when rep_failed	2019-05-17 20:45:39 +02:00
Ozbolt Menegatti	111512a901	no more structureselection enum	2019-05-17 20:45:10 +02:00
Ozbolt Menegatti	d2f1e95a8f	continued work on representation, almost there...	2019-05-16 01:53:38 +02:00
Ozbolt Menegatti	84a184c44d	I think this is the way to set representations, all info is available ... just have to actually use it	2019-05-13 10:48:21 +02:00
Ozbolt Menegatti	6eefd9c9f6	redid representation storage (as prev commit: to make it easier to use), find_next does not collect representations, no separate class to parse representation features	2019-05-13 09:52:29 +02:00
Ozbolt Menegatti	19067e4135	Moving matches into colocation ids, now easier for representation	2019-05-13 08:35:55 +02:00
Ozbolt Menegatti	87712128be	joint representation form	2019-05-13 00:26:00 +02:00
Ozbolt Menegatti	401698409e	Implementing new output formats, all and normal, no more lemma_only and stuff. Still need to implement representation in normal form.	2019-05-12 23:00:38 +02:00
Ozbolt Menegatti	b4b93022fe	Updating for new representations, for now only parsing	2019-05-12 22:13:22 +02:00
Ozbolt Menegatti	de6c73980e	adding min-frequency option	2019-02-19 15:04:44 +01:00
Ozbolt Menegatti	93d7af3aea	Reversed order sorting	2019-02-19 13:56:32 +01:00
Ozbolt Menegatti	1c9ac7c867	Adding sorting	2019-02-19 11:29:40 +01:00
Ozbolt Menegatti	8107a9f647	Adding parallel execution using subprocesses	2019-02-17 16:01:03 +01:00
Ozbolt Menegatti	dec173ae33	Restructuring, now words are parsed right after loading one file, not after loading all of them. Should be easily parallelizable now	2019-02-14 14:33:15 +01:00
Ozbolt Menegatti	2f2bb91d0f	Supporting different xml:id variations	2019-02-12 17:38:32 +01:00

78 Commits