Sunday, October 16, 2016

ToPan (multilingual topic modelling for Greek, Latin, Arabic and other languages)


(Meletē)ToPān v.0.1
The name (Meletē)ToPān v.0.1 is based on the Greek principle μελέτη τὸ πᾶν, which roughly translates to "take into care everything". I chose the name because Topic-Modelling performs well on large amounts of logically structured chunks of text, and it helps to select the interesting bits in a large corpus by technically having looked at everything. The butterfly in the logo belongs to the genus Melete. The original photograph is by Didier Descouens, who has licensed it under CC BY-SA 4.0. I changed the image slightly for the logo. I'd strongly suggest starting with the original if you want to use it, but you can also use this slightly modified logo under the CC BY-SA 4.0 license, as I am required to share it under the same license as the original image.
ToPān is Topic-Modelling for everyone: from people without programming knowledge to people who want to build teaching and text-reuse tools and apps based on Topic-Modelling data without having to develop their own tool or substantially restructure their textual data. ToPān is made to be shared and used. That is why I tried to modularise ToPān so that you can ingest your own data at each step. It works best, however, if you work your way from left to right: from "Data Input" to "LDA Tables" (please find more details under "Instructions"). ToPān also works best with files that are structured according to the CTS/CITE architecture.
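To give a rough idea of what "structured according to the CTS/CITE architecture" means in practice, here is a minimal Python sketch that splits a CTS URN into its parts. It is only an illustration: the function name `parse_cts_urn`, the `CtsUrn` type and the example URN are assumptions made for this sketch, and it says nothing about how ToPān itself reads identifiers.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Parts of a CTS URN (hypothetical helper type, not part of ToPān)."""
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split 'urn:cts:<namespace>:<work component>[:<passage>]' into parts.

    Simplification: an exemplar (fourth part of the work component)
    is ignored, and the passage is returned as an unparsed string.
    """
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace, work_component = parts[2], parts[3]
    # A trailing colon with no passage is treated as "no passage".
    passage = parts[4] if len(parts) == 5 and parts[4] else None

    # Work component: textgroup[.work[.version[.exemplar]]]
    work_parts = work_component.split(".")
    padded = work_parts + [None] * (3 - len(work_parts))
    textgroup, work, version = padded[:3]
    if not textgroup:
        raise ValueError(f"missing textgroup in: {urn!r}")

    return CtsUrn(namespace, textgroup, work, version, passage)


# Illustrative URN only; not taken from the ToPān sample data.
example = parse_cts_urn("urn:cts:latinLit:phi0472.phi001.perseus-lat2:1.1")
print(example.textgroup, example.work, example.passage)
```

Because every passage carries an identifier like this, a corpus can be cut into logically structured chunks (for example, one chunk per passage or per work) before topic modelling, without restructuring the text itself.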
ToPān is still under active development, and this is an alpha release. More features will be added, and you are encouraged to road-test ToPān and send me feedback or report bugs.


File                           Last commit message                          Last updated
Catullus                       fix stop word bug, create sample datasets    11 days ago
Models                         fix stop word bug, create sample datasets    11 days ago
www                            fix stop word bug, create sample datasets    11 days ago
.gitignore                     Create .gitignore                            4 months ago
Catullus.R                     fix stop word bug, create sample datasets    11 days ago
LICENSE                        Create LICENSE                               3 months ago
Petronius.csv                  recovery                                     4 months ago
README.md                      Update README.md                             3 months ago
Sandbox2.RData                 experimenting for switch from RCurl to httr  3 months ago
Sandbox2.Rhistory              experimenting for switch from RCurl to httr  3 months ago
StemDic.rds                    major updates and changes                    3 months ago
WordEmbedVec.R                 fix stop word bug, create sample datasets    11 days ago
app.R                          fix stop word bug, create sample datasets    11 days ago
caesar.csv                     fix stop word bug, create sample datasets    11 days ago
catullus.csv                   fix stop word bug, create sample datasets    11 days ago
copyright.md                   update description                           3 months ago
corpus.rds                     update                                       4 months ago
dataentry.md                   update description                           3 months ago
home.md                        Update home.md                               3 months ago
message-handler.js             major updates and changes                    3 months ago
morphologicalnormalisation.md  update description                           3 months ago
phi0972.phi001Parsed.82xf      implement 82XF                               3 months ago
preliminary.md                 update description                           3 months ago
sandbox.R                      update                                       4 months ago
sandbox2.R                     implement 82XF                               3 months ago
settingtmvalues.md             update description                           3 months ago
temp_vectors.bin               fix stop word bug, create sample datasets    11 days ago
treebank.xml                   update                                       4 months ago

 
