How to Apply Semantic Web Technologies in Enterprises

May 17th, 2011

Introduction

Over the last three years, Oracle/Sun participated in a research project called KiWi, which ended in March 2011.
KiWi is an open-source development platform for building metadata-driven Semantic Social Media and Social Networking applications and is partly funded by the European Union under the 7th Framework Programme.

We already had extensive experience implementing large-scale enterprise communities through our global social community framework, SunSpace, and our Community Equity system, which were used by over 30,000 Sun employees.

Oracle’s (formerly Sun’s) role in the KiWi project was to validate whether and how social semantic technologies can be used in large enterprises.

Credits

I would like to thank Josef Holy and Jiri Kopsa for their excellent technical contributions to this project. A large portion of the content of this blog post (or series of posts) is excerpted from KiWi project publications written by Josef, Jiri and myself.

Use Case: Enterprise Metadata Management

Our main goal for the Enterprise Metadata Management use case was to design, develop and deploy technologies and practices suitable for managing user-defined folksonomies and controlled vocabularies together.

Folksonomies and Taxonomies in the Enterprise

A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content. Simply put, it is a collection of tags created by users in the context of one or more content management systems. Folksonomies are usually associated with Web 2.0 services, which allow masses of users to create and annotate content (photos, videos, blog posts, etc.) freely and openly.

In the enterprise environment, a controlled vocabulary is a system of record for naming the things and concepts related to the company’s business. A typical example is a ‘Product vocabulary’ containing the official names of all products produced and sold by the company. Such a vocabulary is usually defined centrally, in a top-down manner, by a responsible department or group of individuals. Folksonomies, by contrast, are defined by ‘the wisdom of the crowd’, with nobody clearly responsible for their creation and maintenance.

If the products in the Product vocabulary were put into appropriate categories, they would form a simple example of a ‘Product taxonomy’. Compared to simple (flat) vocabularies, taxonomies are richer structures that allow hierarchical categorization of the things they contain. Such hierarchies form trees, each with a single root node representing the top-most (most general) concept in the hierarchy.
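The difference between a flat vocabulary and a taxonomy can be sketched in a few lines of Python. This is a minimal illustration assuming a simple parent/child model; the `Concept` class and the product names are hypothetical and are not taken from the KiWi data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A node in a taxonomy tree (hypothetical model for illustration only)."""
    label: str
    # Exclude the back-link from repr/eq to avoid infinite recursion.
    parent: Optional[Concept] = field(default=None, repr=False, compare=False)
    children: List[Concept] = field(default_factory=list)

    def add_child(self, label: str) -> Concept:
        child = Concept(label, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> List[str]:
        # Follow parent links up to the single root, then reverse the path.
        node: Optional[Concept] = self
        path: List[str] = []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return list(reversed(path))


# A flat controlled vocabulary: just the official names, no structure.
product_vocabulary = ["Server A", "Server B", "Database X"]

# A taxonomy: the same names placed under categories below one root node.
products = Concept("Products")  # root = most general concept
hardware = products.add_child("Hardware")
software = products.add_child("Software")
for name in ("Server A", "Server B"):
    hardware.add_child(name)
database_x = software.add_child("Database X")

print(" > ".join(database_x.path_from_root()))  # Products > Software > Database X
```

Because every concept has exactly one parent except the root, the path from any concept back to the root is unique, which is what makes a taxonomy a tree rather than a general graph.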

Managing folksonomies and taxonomies together means finding the right balance between two worlds: openness and freedom on one side, responsibility and control on the other.

Use Case Implementation

We focused our evaluation on a typical metadata lifecycle model composed of the following three states:

Apply Metadata

A regular user annotates a content item (e.g. a document) with a tag that does not yet exist. Such a tag is called a free tag: it has no codified meaning and thus belongs to the unstructured folksonomy. It can be freely reused by other users of the system. Tag reuse is encouraged by a tag recommender UI, which suggests already existing tags to users.

Manage Metadata
A user responsible for metadata management within the system evaluates the newly created free tag and, if appropriate, turns it into a controlled concept described with richer information: it is assigned various types of labels (e.g. synonyms in different languages) and placed in the appropriate position within one or more taxonomy hierarchies.

Exploit Metadata
The controlled tag now becomes part of the controlled metadata space, which is also used to enhance other content management system services, such as Personalized Semantic Search.
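The three lifecycle states can be modeled as a small state machine. The following Python sketch is illustrative only: the `Tag` class, the two states and the helper functions are hypothetical names, and the search in the last step is deliberately simplistic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class TagState(Enum):
    FREE = "free"              # user-created, no codified meaning (folksonomy)
    CONTROLLED = "controlled"  # curated concept placed in one or more taxonomies


@dataclass
class Tag:
    label: str
    state: TagState = TagState.FREE
    synonyms: List[str] = field(default_factory=list)  # e.g. labels in other languages
    broader: List[str] = field(default_factory=list)   # parent concepts in taxonomies


def apply_metadata(tags: Dict[str, Tag], index: Dict[str, Set[str]],
                   item: str, label: str) -> Tag:
    """Apply: reuse an existing tag, or create a new free tag."""
    if label not in tags:
        tags[label] = Tag(label)
    index.setdefault(label, set()).add(item)
    return tags[label]


def manage_metadata(tag: Tag, synonyms: List[str], broader: List[str]) -> None:
    """Manage: a metadata manager promotes a free tag to a controlled concept."""
    if tag.state is TagState.CONTROLLED:
        raise ValueError(f"{tag.label!r} is already a controlled concept")
    tag.synonyms.extend(synonyms)
    tag.broader.extend(broader)
    tag.state = TagState.CONTROLLED


def exploit_metadata(tags: Dict[str, Tag], index: Dict[str, Set[str]],
                     query: str) -> Set[str]:
    """Exploit: controlled tags make items findable via their synonyms too."""
    q = query.lower()
    hits: Set[str] = set()
    for label, items in index.items():
        tag = tags[label]
        names = {tag.label.lower()}
        if tag.state is TagState.CONTROLLED:
            names.update(s.lower() for s in tag.synonyms)
        if q in names:
            hits |= items
    return hits


tags: Dict[str, Tag] = {}
index: Dict[str, Set[str]] = {}
db = apply_metadata(tags, index, "doc-1", "db")               # 1. free tag
print(exploit_metadata(tags, index, "database"))               # set()
manage_metadata(db, ["database", "Datenbank"], ["Software"])   # 2. now controlled
print(exploit_metadata(tags, index, "Database"))               # {'doc-1'}
```

In this sketch promotion changes the tag in place, so items already annotated with the free tag immediately benefit from the synonyms added during management.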

Apply Metadata – Tag Recommender/Information Extraction

The goal of this use case was to allow users to apply both free and controlled tags within a single UI while providing them with advanced tag recommendation functionality. The KiWi platform was extended with a set of lightweight JSON web service endpoints serving a dedicated UI widget, which was designed and developed using standard web technologies (HTML, CSS, JavaScript).
The KiWi natural-language-based information extraction service was extended to recognize Oracle taxonomies (controlled tags). This allows the Information Extraction functionality to be invoked directly from the tagging UI. The invocation returns a list of free and controlled tags extracted from the document and lets the user apply them.
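The post does not show the JSON format of the KiWi endpoints, so the following Python sketch only illustrates the idea of one suggestion list that mixes free and controlled tags. The in-memory stores, field names and ranking (controlled concepts first, then free tags by usage count) are assumptions made for illustration.

```python
import json
from typing import Dict, List

# Hypothetical stores: free tag -> usage count, controlled tag -> taxonomy path.
FREE_TAGS: Dict[str, int] = {"roadmap": 12, "road trip": 1, "release": 7}
CONTROLLED_TAGS: Dict[str, List[str]] = {
    "Release Management": ["Processes", "Release Management"],
    "Roadmap": ["Documents", "Roadmap"],
}


def suggest(prefix: str, limit: int = 5) -> str:
    """Return prefix-matching tag suggestions as a JSON string."""
    p = prefix.lower()
    controlled = [
        {"label": label, "type": "controlled", "path": path}
        for label, path in CONTROLLED_TAGS.items()
        if label.lower().startswith(p)
    ]
    # Hide free tags that merely duplicate a controlled concept's label,
    # nudging users towards the controlled concept.
    taken = {c["label"].lower() for c in controlled}
    free = [
        {"label": label, "type": "free", "uses": uses}
        for label, uses in sorted(FREE_TAGS.items(), key=lambda kv: -kv[1])
        if label.lower().startswith(p) and label.lower() not in taken
    ]
    return json.dumps({"prefix": prefix, "suggestions": (controlled + free)[:limit]})


print(suggest("road"))
```

Returning the taxonomy path with each controlled suggestion is one way a widget could let users see, and navigate, where a concept sits in its hierarchy.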

The usability of the developed widget and tagging process was evaluated in an internal usability study with technical and non-technical users. The study also covered the information extraction functionality, which additionally went through a separate internal evaluation that resulted in several requests for enhancements.

Key Findings

The usability study showed that the concept of free and controlled tags is understandable for new users. Users highly valued the implemented tag recommender UI, which allowed them to navigate through taxonomy hierarchies easily.
More advanced taxonomy modeling can be implemented, for example system-wide business rules that define required taxonomies, and support for taxonomy prefixes.
Natural-language-based Information Extraction combined with semantic taxonomies is very promising, especially because the system’s “self-learning” capability continuously improves tag suggestions.

Manage Metadata – Concept Model Management

The KiWi platform was deployed and integrated with PoolParty, a commercial thesaurus management product, using a set of Linked Data interfaces.
Both systems were populated with data from internal legacy systems, and 19 Sun and Oracle taxonomies with almost 6,000 concepts in total were created.
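The post does not detail the Linked Data interfaces between KiWi and PoolParty. Assuming concepts are exchanged as SKOS, the W3C vocabulary that defines preferred, alternative and hidden labels, a single controlled concept could be serialized to Turtle roughly as in this Python sketch. The namespace `http://example.org/taxonomy/`, the concept IDs and the labels are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
TAX = "http://example.org/taxonomy/"  # hypothetical namespace


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a Turtle string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def concept_to_turtle(concept_id: str,
                      pref_labels: Dict[str, str],
                      alt_labels: Sequence[Tuple[str, str]] = (),
                      hidden_labels: Sequence[str] = (),
                      broader: Sequence[str] = ()) -> str:
    """Serialize one skos:Concept; concept_id must be a valid Turtle local name."""
    props = [f"skos:prefLabel {literal(text, lang)}" for lang, text in pref_labels.items()]
    props += [f"skos:altLabel {literal(text, lang)}" for text, lang in alt_labels]
    props += [f"skos:hiddenLabel {literal(text)}" for text in hidden_labels]
    props += [f"skos:broader tax:{parent}" for parent in broader]
    body = " ;\n    ".join(["a skos:Concept"] + props)
    return (f"@prefix skos: <{SKOS}> .\n"
            f"@prefix tax: <{TAX}> .\n\n"
            f"tax:{concept_id} {body} .\n")


print(concept_to_turtle(
    "DatabaseX",
    pref_labels={"en": "Database X"},
    alt_labels=[("DB X", "en"), ("Datenbank X", "de")],
    hidden_labels=["Databse X"],  # common misspelling: searchable, not displayed
    broader=["Software"],
))
```

In SKOS, `skos:hiddenLabel` is intended for labels that should be matched by text search but not shown to users, which suits misspellings; a production system would more likely use an RDF library than hand-built strings.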

The whole solution was evaluated in context by dedicated experts while they created the taxonomies mentioned above. It was also evaluated in a series of internal sessions with subject matter experts from various departments.

Key Findings

The envisioned goal to implement, deploy and test a solution for merging bottom-up and top-down metadata management practices was successfully met using the KiWi platform and the PoolParty taxonomy management tool
The resulting metadata structures (hierarchies) can be used to provide enhanced metadata suggestions to system users
Allowing users to navigate through individual taxonomies and to apply concepts from these taxonomies along with folksonomy tags helps to improve structure and consistency of metadata in the enterprise content management systems
Essential factors for implementing effective open metadata governance models within large enterprises are:
- Management support: successful implementation of metadata governance requires substantial changes to various content management processes within the organization. These changes are impossible without clear leadership and guidance from the responsible organizational leaders.
- Involvement of appropriate subject matter experts: to achieve one of its main goals, the proper structuring of organizational knowledge models, metadata governance needs the direct (community) involvement of appropriate subject matter experts.
- Measuring quality and relevance within open collaborative systems: in collaborative systems with a low barrier to participation (e.g. wikis), it is important to be able to measure quality and relevance. For that reason, the Community Equity system was successfully integrated with the KiWi platform.
Although it is difficult to calculate precisely, the cost of a newly created controlled tag (taxonomy concept) can be measured based on the time needed for:
- resolving the free tag’s meaning, which often includes the cost of communicating with the tag’s author or with one or more subject matter experts;
- placing the concept into the appropriate taxonomy.
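The cost model above amounts to a simple formula: the total curation time for one concept multiplied by an hourly rate. The Python sketch below assumes times are recorded in minutes and that a single blended hourly rate is appropriate; the field names and the rate are hypothetical, and the post itself gives no figures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CurationTime:
    """Minutes spent turning one free tag into a controlled concept (hypothetical)."""
    resolve_meaning: float  # working out what the free tag means
    communication: float    # contacting the tag author and/or subject matter experts
    placement: float        # placing the concept into the appropriate taxonomy

    def total_minutes(self) -> float:
        return self.resolve_meaning + self.communication + self.placement


def concept_cost(t: CurationTime, hourly_rate: float) -> float:
    """Cost of one controlled tag = total curation time in hours x hourly rate."""
    return t.total_minutes() / 60.0 * hourly_rate


def average_cost(samples: Sequence[CurationTime], hourly_rate: float) -> float:
    """Average cost per concept over a sample of curated tags."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(concept_cost(t, hourly_rate) for t in samples) / len(samples)


# Example with made-up numbers: 10 + 15 + 5 = 30 minutes at a rate of 60 per hour.
print(concept_cost(CurationTime(10, 15, 5), hourly_rate=60.0))  # 30.0
```

Tracking communication time separately makes visible the component the post singles out as often dominating: clarifying a tag's meaning with its author or with experts.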

Exploit Metadata – Search/Browse Use Case Summary

The goal was to verify the usability and accuracy of the search results. Subject matter experts performed a set of search queries and compared the results against those of the existing internal search.

Key Findings

Taxonomy-based synonym matching is efficient and improves the search results
Personalized search based on Social Analytics algorithms looks very promising
The faceted search functionality is highly configurable using RDF facets and is superior to the existing search functionality
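The two search findings can be illustrated with a minimal Python sketch: the query is first resolved through a label index in which preferred, alternative and hidden labels all point to the same concept, and the matching documents are then counted per facet value. The documents, concept IDs and the `type` facet are hypothetical, and plain dictionary fields stand in for the RDF facets used in KiWi.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical label index: every pref/alt/hidden label -> concept id.
LABEL_INDEX: Dict[str, str] = {
    "database x": "DatabaseX",
    "db x": "DatabaseX",       # alternative label
    "databse x": "DatabaseX",  # hidden label (common misspelling)
    "server a": "ServerA",
}

# Hypothetical documents annotated with concepts plus a 'type' facet.
DOCS: List[dict] = [
    {"id": "d1", "concepts": {"DatabaseX"}, "type": "Whitepaper"},
    {"id": "d2", "concepts": {"DatabaseX", "ServerA"}, "type": "Presentation"},
    {"id": "d3", "concepts": {"ServerA"}, "type": "Whitepaper"},
]


def search(query: str) -> List[dict]:
    """Synonym matching: map the query to a concept, then match on the concept."""
    concept = LABEL_INDEX.get(query.strip().lower())
    if concept is None:
        return []
    return [doc for doc in DOCS if concept in doc["concepts"]]


def facet_counts(results: List[dict], facet: str) -> Dict[str, int]:
    """Count search results per value of one facet."""
    return dict(Counter(doc[facet] for doc in results))


hits = search("Databse X")  # misspelled query still finds the concept
print([doc["id"] for doc in hits])  # ['d1', 'd2']
print(facet_counts(hits, "type"))   # {'Whitepaper': 1, 'Presentation': 1}
```

Resolving labels to concepts before matching is what lets a search for an alternative or misspelled label return the same documents as a search for the preferred label.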

Conclusion

Implementing the Oracle/Sun use cases in the KiWi/PoolParty system has been very useful. Developing the Metadata Management Process and applying it to an environment of controlled taxonomies and folksonomies was a significant step. We learned how to optimize our metadata management processes, how to implement such a service technically and how to improve user interaction. Applying natural language processing in combination with semantic technologies also improved the quality of metadata. Since people naturally use different labels for the same things, it is essential to relate multiple synonyms (implemented as alternative or hidden labels) to taxonomy concepts.

We also explored what the system requires in terms of organizational structure, resourcing and processes, and proved its viability in an existing enterprise. Furthermore, by implementing extensions for the Oracle use cases, we concluded that the system is sufficiently extensible. Specifically, a customized tagging user interface implementing the custom concept of taxonomy prefixes was integrated.