Over the past 15 years, along with the success of the "social" web, online communities have collaboratively produced massive amounts of user-generated content.

While some of these communities are highly structured and produce high-quality content (e.g., open-source software, Wikipedia), the quality of discussions found within less structured forums remains highly variable. Coupled with their explosive growth, the lower quality of online forums makes it hard to retrieve relevant and valuable answers to user search queries, and consequently diminishes the social and economic value of this content.

The objective of OCKTOPUS is to increase the potential social and economic benefit of this user-generated content, by transforming it into useful knowledge which can be shared and reused broadly.

One of the most visible and easily understandable outputs of the project is a demonstration platform which can be used to input a newly-formulated question, search online forums for similar already-answered questions, and display a single user-generated answer associated with these similar questions.

This demonstration platform is built around the idea that finding relevant high-quality answers can be broken down into two steps:

  1. Triage user-generated content to extract gold (knowledge structured as pairs of questions and answers) from ore (random discussions).
  2. Given a newly-formulated question, retrieve relevant similar questions within the gold.
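
The two steps above can be illustrated with a minimal sketch. It is not the OCKTOPUS implementation: the names `QAPair`, `triage`, and `retrieve` are hypothetical, the triage rule (first post ends with a question mark and has at least one reply) is a toy stand-in for the classification techniques the project investigates, and Jaccard token overlap is assumed here only as a simple similarity measure.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAPair:
    """Hypothetical container for one unit of 'gold': a question and its answer."""
    question: str
    answer: str


def tokens(text: str) -> set:
    """Lowercase a text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def triage(threads: list) -> list:
    """Step 1 (toy heuristic, an assumption): keep a thread as gold when its
    first post ends with '?' and it has at least one reply. The first reply
    is taken as the answer."""
    gold = []
    for posts in threads:
        if len(posts) >= 2 and posts[0].rstrip().endswith("?"):
            gold.append(QAPair(question=posts[0], answer=posts[1]))
    return gold


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(new_question: str, gold: list) -> Optional[QAPair]:
    """Step 2: return the gold pair whose question is most similar to the
    new question, or None when the gold is empty."""
    query = tokens(new_question)
    return max(gold, key=lambda p: jaccard(query, tokens(p.question)), default=None)


# Illustrative data only; each thread is a list of posts in posting order.
threads = [
    ["How do I reset my router password?", "Hold the reset button for ten seconds."],
    ["Nice weather today", "Indeed"],
    ["Why does my laptop battery drain so fast?", "Lower the screen brightness."],
]

gold = triage(threads)
best = retrieve("How can I reset the password on my router?", gold)
if best is not None:
    print(best.answer)
```

In this sketch the second thread is discarded as ore, and the new question is matched against the two remaining question-answer pairs. A real system would replace both heuristics with learned models.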

OCKTOPUS therefore mainly investigates the following question: to what extent can newer data mining techniques, based on the proper assessment of (1) the organizational traits of online communities, (2) the tree structure of online discussions, and (3) the temporal dynamics of large typed semantic user-user graphs, help improve the automatic classification and triage of unstructured online content?