

LIST – Corpus Extraction Tool

Version: 1.0 (Last update: 21 March 2019)

Authors: Luka Krsnik, Špela Arhar Holdt, Jaka Čibej, Kaja Dobrovoljc, Aleksander Ključevšek, Simon Krek, Marko Robnik Šikonja

The LIST corpus extraction tool is a program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets.

The program was developed within the project New Grammar of Modern Standard Slovene: Resources and Methods (J6-8256), which was co-financed by the Slovenian Research Agency from the state budget between 2017 and 2020. The authors also acknowledge the financial support of the Slovenian Research Agency (research core funding No. P6-0411, Language Resources and Technologies for Slovene).

Publisher: Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute; Faculty of Computer and Information Science, University of Ljubljana

Maintenance: Centre for Language Resources and Technologies, University of Ljubljana

The program is available under the MIT License at CLARIN.SI (http://hdl.handle.net/11356/1227) and in the CJVT Gitea repository (https://gitea.cjvt.si/lkrsnik/list).

INSTRUCTIONS FOR INSTALLATION AND USE:

1) Make sure that 64-bit Java is installed on your computer (https://java.com/en/download/manual.jsp).
2) Copy all three program files (run.sh, run.bat, list1.0.jar) into a single folder.
3) Run the program by double-clicking run.bat on Windows or run.sh on Linux.
4) When selecting the location of the corpus, make sure the folder contains files from only one corpus.
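
The 64-bit Java requirement in step 1 can be checked from a terminal before launching the program. The following is a minimal sketch and not part of the LIST distribution: the helper name `is_64bit_java` is hypothetical, and the check assumes a HotSpot-based JVM (Oracle Java or OpenJDK), whose `java -version` output contains the string "64-Bit". Other JVMs may word this differently.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LIST): succeeds if the given
# `java -version` output text mentions a 64-bit VM.
# Assumption: HotSpot-based JVMs print "64-Bit" in the VM line.
is_64bit_java() {
    printf '%s\n' "$1" | grep -qi '64-bit'
}

# `java -version` writes to stderr, so redirect it to stdout before capturing.
if ! command -v java >/dev/null 2>&1; then
    echo "Java was not found on PATH"
elif is_64bit_java "$(java -version 2>&1)"; then
    echo "64-bit Java found"
else
    echo "Java found, but it does not report a 64-bit VM"
fi
```

If the script reports that Java is missing or not 64-bit, install a 64-bit version from the link in step 1 before running run.sh or run.bat.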