Using OmegaWiki as a resource

From Open Progress

OmegaWiki is a project that aims to provide lexical, terminological and ontological information in a wiki environment, covering the words and concepts of all languages. The information is presented through a user interface that supports multiple languages.

History of the project

When multiple language versions of Wiktionary were created, several of these projects aimed to collaborate. This worked up to a point; the problem was in maintaining information once it had been entered into another Wiktionary. It did, however, show us that most data is language independent: only the labels need changing. The conclusion was that when the data is available in a relational database, the same data can serve people from many languages.

Starting with an analysis of the data that exists in the many Wiktionaries, a functional design was created. The aim was to include all the information that exists in any of the Wiktionaries. Given the free-format nature of Wiktionary, it turned out that the information was lexical, terminological and ontological; consequently the design became a mash-up of these three fields.

The resulting design was posted on the Wikimedia Foundation's Meta, where it was open for discussion, and at some stage a developer was found who thought he could develop the required functionality. Initial funding was provided by Kennisnet. In the meantime the functional data design was shown to Alan K. Melby and Sue Ellen Wright, and it was considered a feasible design.

We were extremely fortunate to find in Knewco an organisation that had a need for the functionality we were building. They did, however, have additional requirements as well. This resulted in a major technological shift: from a relational database it became in essence an object-oriented database, as the notions of the “third manifesto” were implemented in the MediaWiki software to make OmegaWiki a reality.

About the project

As OmegaWiki is open source, we have been releasing our functionality as soon as we had it. The first data was provided in December 2005, when we re-published the GEMET data. The next functionality allowed for the editing of existing information, followed by the adding of new synonyms or translations. With improved functionality came more people who have shown an interest in the project. Some 860 people have created an account and some 490 have been given the privilege to edit. There are some 200,000 expressions and some 14,000 concepts.

OmegaWiki is different from Wikipedia in that it actively seeks collaboration with organisations. In OmegaWiki we aim to provide a resource that can be used successfully in as many situations as possible. Sometimes there will be a need to adapt the functionality or the data design. It helps when this is done in collaboration; the organisation that asks for a change is expected to help with the implementation. Many organisations have invested heavily in their lexical, terminological and ontological resources. It would be a waste not to leverage all this work and integrate it in OmegaWiki.

When existing resources are integrated in OmegaWiki, it does mean that the data will be adapted. Our initial GEMET data provides a good example. Much of this data has been changed in order to provide lexical information as well as the existing ontological information.

Some Technical background

The DefinedMeaning

A "DefinedMeaning" is the combination of one Expression and a Definition in one language. For this combination synonyms and translations can be found. The preferred procedure for adding information in another language is to first translate the definition as literally as possible and then find the Expressions that match the definition. When there is a perfect match, the Expression can be marked as having an "Identical meaning". When a translation is provided that is not identical, the Expression still serves translators; however, the inexact Expression needs a DefinedMeaning of its own that IS exact, and the two DefinedMeanings need to be related so that the differences can be understood.
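The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual OmegaWiki schema: the class names `Expression`, `Translation` and `DefinedMeaning`, their fields, and the sample words are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Expression:
    """A spelling in one language (hypothetical model, not the real schema)."""
    spelling: str
    language: str


@dataclass
class Translation:
    """An Expression attached to a DefinedMeaning, flagged exact or inexact."""
    expression: Expression
    identical_meaning: bool


@dataclass
class DefinedMeaning:
    """One defining Expression plus its Definition, with synonyms and translations."""
    expression: Expression
    definitions: Dict[str, str] = field(default_factory=dict)  # language -> text
    translations: List[Translation] = field(default_factory=list)

    def add_expression(self, spelling: str, language: str,
                       identical: bool = True) -> None:
        """Attach a synonym or translation to this meaning."""
        self.translations.append(
            Translation(Expression(spelling, language), identical))

    def expressions_in(self, language: str,
                       exact_only: bool = False) -> List[str]:
        """Spellings for one language, optionally only the exact matches."""
        return [t.expression.spelling for t in self.translations
                if t.expression.language == language
                and (t.identical_meaning or not exact_only)]


# Illustrative data: translate the definition first, then attach Expressions.
horse = DefinedMeaning(Expression("horse", "en"))
horse.definitions["en"] = "A large hoofed mammal that is used for riding."
horse.definitions["nl"] = "Een groot hoefdier dat gebruikt wordt om op te rijden."
horse.add_expression("paard", "nl")                 # identical meaning
horse.add_expression("ros", "nl", identical=False)  # useful, but not exact

print(horse.expressions_in("nl"))
print(horse.expressions_in("nl", exact_only=True))
```

In this sketch an inexact Expression is only flagged; giving it a DefinedMeaning of its own and relating the two meanings, as the text requires, would be a further step.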

Semantic relations

A Semantic Web is often understood as a "net of interdependent concepts where the dependencies are classified into distinct types with specific interpretations". The typical semantic web is in English, and consequently it is best understood by people who master both the English language and the subject involved.

In the implementation of relations, OmegaWiki allows for the inclusion of semantic webs. In OmegaWiki the elements of each triple are expressed as DefinedMeanings. The implication is that a semantic relation will be shown in OmegaWiki in the language selected in the user interface (or in English, the default).
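As a rough sketch of what this implies, the snippet below stores a triple as three DefinedMeaning identifiers and renders it in the selected user interface language, falling back to English. The identifiers, the `LABELS` table and the function names are assumptions made for illustration; they are not taken from the OmegaWiki software.

```python
# Hypothetical labels per DefinedMeaning, keyed by language code.
LABELS = {
    "dm:horse": {"en": "horse", "nl": "paard"},
    "dm:mammal": {"en": "mammal", "nl": "zoogdier"},
    "dm:kind-of": {"en": "is a kind of", "nl": "is een soort"},
}

# A triple is (subject, relation type, object); every part is a DefinedMeaning.
TRIPLES = [("dm:horse", "dm:kind-of", "dm:mammal")]


def label(dm_id: str, ui_language: str, default: str = "en") -> str:
    """Label of a DefinedMeaning in the UI language, or in the default language."""
    labels = LABELS[dm_id]
    return labels.get(ui_language, labels[default])


def render(triple, ui_language: str) -> str:
    """Render a whole relation in one language."""
    return " ".join(label(part, ui_language) for part in triple)


print(render(TRIPLES[0], "nl"))  # Dutch labels exist for all three parts
print(render(TRIPLES[0], "fr"))  # no French labels here, so English is used
```

Because the relation type is itself a DefinedMeaning, it is translated through the same lookup as the concepts it connects.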

Existing projects within OmegaWiki

Get some basic content

A resource that is to provide lexical information needs to cover the most used words. To provide these we have lists with the 10,000 most used words in a language. By defining them and by adding as many translations as we can find, we are working towards the tipping point where OmegaWiki starts to become useful for finding translations of the concepts involved.

For more extended lexical support, we need additional programming. This will provide us with conjugations and inflections. This support needs to be language specific.

OLPC

For the One Laptop Per Child project, we are working on some basic lexical support. We want to have a few hundred words in as many languages as we can get. Finding contributors proves to be relatively easy for the Western languages and hard for the languages where the OLPC will operate.

Destinazione Italia

We have a great working relationship with the University of Bamberg. They teach an advanced course in Italian to their students, and have been involved in creating the relevant terminology. By including it in OmegaWiki and by creating software that is to be used in the teaching material, we can build several types of training material that show either translations or definitions, for which the student has to enter the requested word. As OmegaWiki allows for synonyms, multiple answers are accepted.
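The answer check for such an exercise might look like the sketch below. It is a hypothetical illustration, not the software used in the course: the function name, the normalisation rules and the sample Italian words are all assumptions.

```python
def check_answer(answer: str, accepted: set) -> bool:
    """True when the answer matches any accepted synonym.

    Surrounding whitespace and letter case are ignored.
    """
    normalised = answer.strip().casefold()
    return normalised in {word.casefold() for word in accepted}


# The student is shown a definition and must enter a matching word.
prompt = "Luogo in cui si abita."
accepted_words = {"casa", "abitazione"}  # synonyms of one DefinedMeaning

print(check_answer(" Casa ", accepted_words))
print(check_answer("abitazione", accepted_words))
print(check_answer("strada", accepted_words))
```

Drawing the accepted set from all synonyms of a DefinedMeaning is what lets more than one answer count as correct.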

In the current stage of the project, the content is made ready to be used not only for German students but also for students of other languages. By extending the coverage of the languages supported, this material becomes useful for more students of different backgrounds.

Bio-Medical Wiki for Professionals

Much of the development of OmegaWiki has been driven by Knewco in order to provide a collaborative environment for the further development of bio-medical terminology and data. What was required was the ability not only to register the terminological and ontological data, but also to further annotate information. This requirement resulted from the work done to integrate UMLS and Swiss-Prot data. Much effort has gone into the creation of this data; the problem is keeping the information up to date. By publishing it in a collaborative environment, scientists will be able to extend and improve on the existing information.

Even though it is exciting to create an environment where the terminology of the bio-medical world can be edited, it does not necessarily provide a compelling working environment. Knewco has extended what OmegaWiki provides with a personal desktop where tools can be added that make use of the underlying terminological support. The functionality provided in this way has been the subject of an article in Nature, and a demo can be found at wikiprofessional.info.

We expect that this part of our project will be live at the time of the conference.

Feedback on the ISO-CD-639-6

We are preparing for the import of the ISO-639-6 data. In this standard, some 25,000 linguistic entities will be made available in a hierarchical structure. Given that everything from language families to spoken dialects will be included in this way, there will be many people who find issues with the data. For this reason, it makes sense to allow this information to be discussed on a wiki. There will be a need for all kinds of additional information on these languages. This is likely to drive a lot of lexical data to OmegaWiki, because the best way to convince people that a linguistic entity exists is to demonstrate the difference between closely related linguistic entities.

One big advantage for OmegaWiki is that it solves the problem of which linguistic entities we are to support. As we aim to support them all, it is only when a linguistic entity is NOT accepted into the standard that we will not accept it.

Outlines of possible future OmegaWiki projects

Integrating Wordnet

Princeton's Wordnet is one of the best examples of a quality free lexical resource. It has several things in common with OmegaWiki, and there are several differences. Wordnet is centred around synsets, and in a Wordnet grid words from all languages are brought together. The big difference is in the licensing of the data: unlike the original English content, many of the Wordnets in other languages are not available under a free license. This is one of the big issues in making use of these resources. Wordnet is created by a team of lexicologists; the result is a work of high quality, and it is only in subsequent work that it is used in other ways.

OmegaWiki can include the different Wordnets. I even think that it would be beneficial to do just that. It would be very much a signal that OmegaWiki respects the work that has been done elsewhere and is keen to integrate existing data. It would also, by extension, bring many people to OmegaWiki both as users and as editors.

Linking the Wikipedias into OmegaWiki

One use scenario of OmegaWiki is that people will want to look up initial information. Much information will be provided by OmegaWiki, but often there will be a need for more encyclopaedic information. When the Wikipedia articles are linked to the OmegaWiki concepts, a small icon will make this link obvious. Because of the way Wikipedia works, particularly its "wiki" and "interwiki" links, a lot of terminological information can be derived from it. There have been projects that deal with

The challenge will be to connect the Wikipedia articles to the OmegaWiki concepts. Given that both the Wikipedia articles and OmegaWiki have translations, there will be evident overlap. In many ways, OmegaWiki can augment the service provided by Wikipedia, as OmegaWiki can provide information even when the Wikipedia article does not exist yet.

Using OmegaWiki to tag images and video

The Wikimedia Foundation has in its Commons project one of the biggest repositories of images available under a Free/Open license. These pictures are tagged in English, and organisms are tagged with their Latin names. The consequence is that this resource is not as approachable as is desired.

When links to DefinedMeanings are used instead of plain-text tags, it is possible to provide information in the language selected in the user preferences. This also provides the basis for searching pictures; an extra benefit is that the disambiguation of homonyms can be dealt with prior to the actual search. This can be done by showing the definitions before the search is executed.
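The search flow described here can be sketched as follows. Everything in the snippet is an assumption made for illustration: the identifiers, the two tables, the file names and the function names do not come from Commons or from the OmegaWiki software.

```python
# Hypothetical DefinedMeanings: two homonyms of the English word "bank".
DEFINED_MEANINGS = {
    "dm:1": {
        "expressions": {"en": ["bank"], "nl": ["bank"]},
        "definitions": {"en": "A financial institution."},
    },
    "dm:2": {
        "expressions": {"en": ["bank"], "nl": ["oever"]},
        "definitions": {"en": "The land alongside a river."},
    },
}

# Images carry links to DefinedMeanings rather than plain-text tags.
IMAGE_TAGS = {
    "bank_building.jpg": {"dm:1"},
    "river_at_dawn.jpg": {"dm:2"},
}


def candidates(word: str, language: str):
    """Step 1: list (id, definition) pairs so the user can pick a meaning."""
    return [(dm_id, dm["definitions"].get(language, dm["definitions"]["en"]))
            for dm_id, dm in DEFINED_MEANINGS.items()
            if word in dm["expressions"].get(language, [])]


def images_for(dm_id: str):
    """Step 2: run the search on the chosen meaning, not on the raw word."""
    return sorted(name for name, tags in IMAGE_TAGS.items() if dm_id in tags)


def tag_labels(image: str, language: str):
    """Show an image's tags in the language from the user preferences."""
    labels = []
    for dm_id in sorted(IMAGE_TAGS[image]):
        expressions = DEFINED_MEANINGS[dm_id]["expressions"]
        labels.extend(expressions.get(language, expressions["en"]))
    return labels


for dm_id, definition in candidates("bank", "en"):
    print(dm_id, definition)
print(images_for("dm:2"))
print(tag_labels("river_at_dawn.jpg", "nl"))
```

Showing the definitions first means the homonym is resolved before any images are fetched, so a search for the river bank never returns pictures of financial institutions.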
