User talk:Robert Ullmann
XiTsonga Wiktionary (ts.wiktionary)
Your name was given to me by Jako Olivier of salanguages.com. He is a ts.wiktionary user and had entered some of the initial words into ts.wiktionary. I contacted him to get some background on the Xitsonga users and to ask some initial questions about the community. The objective of my query is quite simple. My father is Marius Chapatte (see ts.wiktionary user). He has put together a Xitsonga-English dictionary of some 21,000 words and, at 83, has decided it is time to make this work available to as many people as possible. He has started to enter the data into ts.wiktionary, but as you can imagine this is a time-consuming process. His initial work was done in MS Word, and I have extracted all the words into an MS Access database and created a field with the script that needs to be entered into Wiktionary. This greatly simplifies the data entry, but it's still a very slow process. I have read the documentation regarding the creation of a Wiktionary bot and would like to start an official request, but I am not sure where to start. There is also probably some need to expand on the Xitsonga community, etc., on ts.wiktionary in order to simplify the work others may bring to the process of improving on my father's initial input.
Your comments and suggestions would be appreciated.
Pierre Chapatte 12:43, 6 March 2007 (UTC)
- This sounds excellent! Writing bot code to add entries here and on the English wikt would be very useful.
The things that need to be looked at are:
- Getting the data out of MS Access (MS makes it as painful as possible to use their formats if you aren't using their code ;-). Can it be dumped into an ordinary text file? Then I can take it apart with Python.
- For the English wikt, we need some simple things (a ts-noun template, etc, etc.). Then we can add a few entries and ask others to look at them.
- For the wikt here, we need to do some format definition for entries. (See en:Wiktionary:Entry layout explained or rw:Wiktionary:Entry layout).
- Doing a bot run here isn't controversial, as long as it is done well. (And correctable!)
- But: we are going to have to look at the copyright(s); the work may be too much of a derivation of Sasavona's copyrighted work. This takes some looking at; we don't want to do a lot of work and then have it removed because it can't be under GFDL. It does sound like a very serious independent compilation, and one is allowed to refer to other dictionaries when doing that. (We look at OED all the time, we just don't copy from it ;-)
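As a rough sketch of the first and fourth points above (getting the data into an ordinary file, and keeping a bot run correctable), something like the following might work. It assumes the Access table has been exported to CSV; the column names `headword`, `pos` and `gloss`, the sample rows, and the entry layout are all placeholders of mine, not the real data or an agreed format. It only builds the page text in memory so it can be reviewed before anything is uploaded.

```python
import csv
import io

# Placeholder data standing in for a CSV export of the Access table.
# Column names (headword, pos, gloss) are assumptions, not the real schema.
SAMPLE_EXPORT = """headword,pos,gloss
word1,noun,first gloss
word2,verb,second gloss
word1,noun,another gloss
"""


def read_entries(csv_file):
    """Group glosses by headword and part of speech, keeping input order."""
    entries = {}
    for row in csv.DictReader(csv_file):
        word = row["headword"].strip()
        pos = row["pos"].strip()
        gloss = row["gloss"].strip()
        entries.setdefault(word, {}).setdefault(pos, []).append(gloss)
    return entries


def format_page(senses):
    """Render one page as wikitext. This layout is a stand-in only."""
    lines = ["==Xitsonga=="]
    for pos, glosses in senses.items():
        lines.append("===" + pos.capitalize() + "===")
        lines.extend("# " + gloss for gloss in glosses)
    return "\n".join(lines) + "\n"


def build_pages(csv_file):
    """Return {page title: wikitext} for review before any upload."""
    return {word: format_page(senses)
            for word, senses in read_entries(csv_file).items()}


pages = build_pages(io.StringIO(SAMPLE_EXPORT))
for title, text in pages.items():
    print("---", title, "---")
    print(text)
```

Keeping the generated text separate from the upload step means a sample of pages can be checked by others first, and the whole run can be regenerated if the entry format changes.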
One neat advantage of adding these to the English wikt is that I have code that creates entries in the Ikinyarwanda and Kiswahili wikts from the en.wikt; so they will get a lot of new entries over time.
Can I see more of what the data looks like? This is very interesting! Robert Ullmann 12:26, 7 March 2007 (UTC)