Harnessing Manpower for Creating Semantics

Jakub Šimko

Doctoral thesis project supervised by prof. Mária Bieliková

Motivation and Goals

Effective information processing (e.g. search, organization) in heterogeneous information spaces (especially the Web) requires a metadata layer above the resources. However, acquiring precise descriptive resource metadata and domain models is still challenging. The number of resources that need metadata is high compared to the capacity of the individuals able to provide the metadata manually. Crowdsourcing has emerged as an alternative to expert-based and automated semantics acquisition approaches. In this paradigm, a crowd of many people performs a human intelligence task (a task that is hard for a computer), such as metadata acquisition. The crowd, composed of laypeople instead of experts, can perform tasks at a much greater scale. Wrong solutions of individuals in the crowd are filtered out collaboratively, preserving the quality of the crowd's output. Crowdsourcing uses different incentives for the crowd to participate: micro-payments, goodwill or alignment with personal goals. One more such incentive is the gaming experience in the so-called games with a purpose (GWAPs). In these games, the tasks for the crowd are incorporated into the game mechanics. GWAPs are the scope of this thesis. We recognize them as a part of the crowdsourcing family of approaches, and we analyze their role as tools for acquiring resource metadata and domain models, as well as their design aspects and issues.

We have identified several open issues:

  • There is still a lack of sufficient semantics for domain models, especially in specialized domains (as opposed to the well-established general domain models of linked data).
  • The ever-increasing number of multimedia resources (images, music) is not matched by sufficient creation of descriptive metadata (in both quantity and quality).
  • GWAP approaches have trouble solving more specific human intelligence tasks for which only a small group of sufficiently experienced players is available.
  • The high-quality solutions of expert players are often filtered out. Player expertise is neither detected nor used in GWAPs.
  • GWAP design and development is a non-trivial task, and there is little existing guidance on how to create these games. GWAPs are created ad hoc and have to deal with cold-start problems (they fail to provide feedback to the players according to the quality of the artifacts they are producing), popularity problems (the games look more or less like work) and player cheating problems (which not only hamper the fairness of the game but also damage the value of its "purposeful" output).

Based on the identified open problems, we have formulated our goals as follows:

  • Goal 1: Contribute to semantics acquisition with new, effective and functioning GWAP-based approaches, if possible for specific domains, where the lack of semantics is more severe and where only a limited number of players is available.
  • Goal 2: Improve the effectiveness of games with a purpose by developing design principles that are independent of the problem domain the GWAP deals with. In particular, we focus on the possibilities of reducing the cold-start problems of GWAPs, preventing malicious player behavior and taking advantage of players with more expertise and confidence for solving the game's purpose.


In the field of semantics acquisition, we reached the following results:

  • Game-based method for term relationship acquisition. We have devised two GWAPs, the Little Search Game and TermBlaster, which gather gameplay logs that are further used in assessing relationships between terms. Both games utilize the negative search principle (reducing the original result set by introducing negative search terms to the query). In the game, the player is presented with a query and has to write or pick negative search terms to add to the query. The player's task is to minimize the result count of the query. Using the decisions of the players, our method is then able to determine how the playing crowd perceives the relationships between terms. The primary contribution of this method is term relationship acquisition, which we demonstrated through live experiments. The method also contributes to general GWAP design theory with a unique single-player design (radically reducing the cold-start problem) and demonstrates the use of our a posteriori anti-cheating mechanism. We show the potential of the method within the general domain but also within a more specific domain (software engineering education), which is not usual for existing GWAPs.
  • Game-based method for image tag acquisition. For this approach, we devised the card game PexAce, in which the player annotates the images featured in it. The game is a modification of a popular board game called Concentration (or Pexeso), where the player's task is to uncover identical pairs of images from a set of concealed cards (by flipping pairs of cards). In this game, originally designed as a memory game, the player makes textual notes on what he or she has seen in the image, which helps the player finish the game (featuring the helper-artifact scheme). Player annotations (free texts) are then decomposed into terms and, after validation across other players, assigned to images as tags. The main contribution of PexAce is its working image tagging ability, which we evaluated through live experiments. The game is single-player and suffers from no cold-start problems. Using its logs, we demonstrated the feasibility of exploiting player expertise to improve the game's output quality. We also demonstrated the possible use of PexAce for the annotation of personal image archives, a specific environment where cross-player validation cannot be sufficiently used.
  • Game-based method for music metadata validation through the game CityLights. This approach sees the metadata acquisition process as a filtration of a larger, poor-quality metadata set rather than as the creation of new metadata. In the game, the player encounters a music track and several metadata sets, one of which was previously assigned to this track by other means. The player has to correctly guess this set to receive points. According to the behavior of the players during the game, the method is able to assess the relative quality of the tags assigned to the music tracks and to filter out wrong metadata or confirm right metadata. The contribution of CityLights is the validation of music metadata. Its principle can also be straightforwardly applied to other (multimedia) resource types. The game is single-player and suffers from no cold-start problems. The game also demonstrates the use of a betting mechanism for the explicit acquisition of player confidence, which can be used to improve the game's output.
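
The cross-player validation step described for PexAce (free-text annotations decomposed into terms, then confirmed across players) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `validated_tags`, the naive tokenization and the `min_players` agreement threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def validated_tags(annotations, min_players=2):
    """Return {image_id: sorted terms} confirmed by enough distinct players.

    annotations: iterable of (player_id, image_id, free_text) tuples.
    min_players: hypothetical agreement threshold, not taken from the thesis.
    """
    supporters = defaultdict(set)  # (image_id, term) -> ids of supporting players
    for player_id, image_id, text in annotations:
        # Naive decomposition into lowercase alphabetic tokens (an assumption).
        # A term repeated by the same player counts once, because we store a set.
        for term in re.findall(r"[a-z]+", text.lower()):
            supporters[(image_id, term)].add(player_id)

    tags = defaultdict(list)
    for (image_id, term), players in supporters.items():
        if len(players) >= min_players:
            tags[image_id].append(term)
    return {image_id: sorted(terms) for image_id, terms in tags.items()}


# Made-up gameplay log used only to exercise the sketch.
logs = [
    ("p1", "img1", "red car on street"),
    ("p2", "img1", "Car, red"),
    ("p3", "img1", "old car"),
    ("p1", "img2", "dog"),
]
print(validated_tags(logs))
```

With this made-up log, only "car" and "red" reach the threshold for img1, and img2 receives no tags because a single player described it. A real deployment would likely add stop-word removal and normalization of word forms, which the sketch leaves out.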

In the field of GWAP design, we reached the following results:

  • We introduced a system of GWAP classification according to the design features of the game, which we claim as our contribution to the field, since no system of such extent exists yet.
  • We introduced approaches for reducing the cold-start problems of games with a purpose: the GWAP artifact validation schemes of "helper artifacts" (featured in PexAce) and the "validation game" (featured in CityLights). These schemes enable a GWAP to be single-player, which reduces the problem of a low number of active players during the initial phases of GWAP deployment. We demonstrated these schemes in the specific environments of our games and also outlined suggestions for their general use in future GWAPs.
  • A universal a posteriori cheating-detection scheme used to detect GWAP players with malicious behavior. Our approach takes into account the quality of the artifacts produced by the tested player, measured according to other players, and the score gain of the tested player. The output of the approach is a list of suspicious players whose point gains do not correlate with the quality of the artifacts they "create" during the game. The approach does not depend on the actual semantics of the artifacts and is therefore universally applicable in any GWAP.
  • We proposed ways of acquiring information on player competence. In our experiments with PexAce game logs, we demonstrated the usability of information about players' competences (skills for delivering the desired value in the GWAP). The idea was to assign more importance to the more skilled players in the collaborative filtering of the artifacts they produced, in order to improve artifact quality, spare some redundant work and speed up the acquisition process.
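
The a posteriori cheating-detection idea above (flagging players whose point gains do not correlate with the quality of their artifacts) can be sketched as follows. This is only an illustration under stated assumptions: it takes one (score gain, artifact quality) pair per played game, it uses the Pearson correlation as the measure, and the names `suspicious_players`, `min_games` and `min_corr` are hypothetical rather than taken from the thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treated as "no correlation" (an assumption of this sketch)
    return cov / sqrt(var_x * var_y)


def suspicious_players(games, min_games=3, min_corr=0.3):
    """Return sorted ids of players whose score does not track artifact quality.

    games: {player_id: [(score_gain, artifact_quality), ...]}, one pair per game;
           artifact_quality is assumed to be derived from other players' outputs.
    min_games, min_corr: hypothetical thresholds chosen for illustration.
    """
    flagged = []
    for player_id, rounds in games.items():
        if len(rounds) < min_games:
            continue  # too little evidence to judge this player
        scores = [score for score, _ in rounds]
        quality = [q for _, q in rounds]
        if pearson(scores, quality) < min_corr:
            flagged.append(player_id)
    return sorted(flagged)


# Made-up data: the "cheater" scores highest when artifact quality is lowest.
games = {
    "honest": [(10, 0.2), (20, 0.5), (30, 0.9)],
    "cheater": [(30, 0.1), (20, 0.5), (10, 0.8)],
    "newcomer": [(50, 0.0)],
}
print(suspicious_players(games))
```

The sketch only reads scores and quality values, never the artifacts themselves, which mirrors the point that the scheme is independent of what the artifacts mean.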


Nowadays, a large family of approaches, crowdsourcing, has been able to address the issue of costs, but at the price of uncertain quality of the delivered results. Crowdsourcing approaches strive to motivate their workers by alternative motivational factors, to optimize the labor deployment and to acquire more specific metadata. For almost a decade, games with a purpose (within crowdsourcing) have been on the agenda of a constantly growing number of researchers.

In this thesis, we have reviewed GWAPs working in various domains, mainly (but not only) within semantics acquisition. We have also mapped the aspects of current GWAP design practices and identified common design issues that each GWAP must address. We consider the most severe of them to be the potential cold-start problem, which stems from the traditional multi-player validation schemes. To overcome this, we introduced alternative schemes such as helper artifacts and the validation game.

We also point out the open possibilities for expert player detection in GWAPs and for its subsequent use in the solution filtering and task assignment processes.

We were able to deliver several GWAPs for semantics acquisition, working in various domains. We also demonstrated the use of some of them in specific domains, and we claim that this is possible in general. To achieve this in other GWAP projects, we suggest introducing additional incentives for players or recruiting players with respect to their expertise.

Looking ahead, we see the future of the field (GWAPs and crowdsourcing in general) as optimistic. As anecdotally envisioned by Luis von Ahn, in the future every manual task will be done by machines. People would then be split into two classes: those who create the GWAPs and those who only "eat, sleep and play" (a phrase whose acronym gave the name to von Ahn's most renowned game, the ESP Game), solving the human intelligence tasks that machines are not able to solve. Although this is vastly exaggerated, a more realistic point can be made out of it. The overall technological advances of human civilization, the automation of manual tasks, etc. may relieve more and more human laborers. On the other hand, unless a major breakthrough is made in the field of practical artificial intelligence, there will still be an increasing demand for performing human intelligence tasks (e.g. programming the machines that perform the manual work). Hence, crowdsourcing will be a convenient option for easing this tension. This would come anyway, but crowdsourcing, with all its methods, will make it a more efficient endeavor.

The thesis extended abstract is available in the Bulletin of the ACM Slovakia.

Selected publications

Šimko, J., Tvarožek, M., Bieliková, M.
Semantics discovery via human computation games. Int. J. on Semantic Web and Information Systems, 7(3):23-45, 2011.
Šimko, J., Tvarožek, M., Bieliková, M.
Human Computation: Single-player Annotation Game for Image Metadata. Int. J. on Human-Computer Studies, 71(10):933-945, 2013.
Šimko, J., Tvarožek, M., Bieliková, M.
Little Search Game: Term Network Acquisition Via a Human Computation Game. In Proc. of the 22nd ACM Conf. on Hypertext and Hypermedia. ACM, New York, NY, USA, 57-62, 2011.
Šimko, J., Bieliková, M.
Games with a Purpose: User Generated Valid Metadata for Personal Archives. In Proc. of the 2011 Sixth Int. W. on Semantic Media Adaptation and Personalization (SMAP ’11). IEEE Computer Society, Washington, DC, USA, 45-50, 2011.
Šimko, J., Bieliková, M.
Personal Image Tagging: a Game-based Approach. In Proc. of the 8th Int. Conf. on Semantic Systems 2012, Graz, Austria. ACM, New York, NY, USA, 88-93, 2012.
Dulačka, J., Šimko, J., Bieliková, M.
Validation of Music Metadata via Game with a Purpose. In Proc. of the 8th Int. Conf. on Semantic Systems 2012, Graz, Austria. ACM, New York, NY, USA, 88-93, 2012.


Mária Bieliková bielik [at] fiit-dot-stuba-dot-sk