
Navigation Leads for Exploratory Search and Navigation in Digital Libraries

Róbert Móro

Doctoral thesis project supervised by prof. Mária Bieliková

Motivation and Goals

The prevalent search paradigm on the Web is keyword-based search. People describe their information needs to a search engine with a sequence of keywords and get a simple list of results in return. This approach works reasonably well for simple information retrieval tasks such as fact finding. However, when the information seeking problem at hand is more complex, e.g., when it requires query reformulation or exploring multiple sources to find relevant information, selecting the relevant links and navigating among the documents becomes difficult.

In our work, we focused on the domain of digital libraries and specifically on the task of researching a new domain, which novice researchers often face. This is an example of an exploratory search task: it is open-ended and iterative, requires the use of exploratory search strategies, and can span multiple search sessions.

Our main goal was to support novice researchers in this task. Specifically, we formulated two goals:

Goal 1: Explore the similarities and differences between the domain of digital libraries and the "wild" Web, and utilise the data characteristic of the former to improve the quality of the extracted keywords, which can be used in the process of exploratory search and navigation. Examples of additional metadata characteristic of the digital libraries domain that could be utilised are connections between documents based on co-authorship or co-citation.
Goal 2: Support iterative query formulation in the process of (exploratory) search, taking into account the users' previous information needs and their feedback, as well as the subsequent navigation in the information space of a digital library of research articles through a series of navigational steps, with the aim of improving domain sense-making and increasing the coverage and understanding of important concepts. This goal is motivated by the fact that the initial phase of query formulation is crucial for a successful search, especially if users start with ill-defined information needs. Therefore, instead of forcing users to verbalise their needs at the beginning of the search, the search system should allow iterative querying and provide leads (cues) on how to reformulate the initial query.

We addressed these goals in our work by examining the following three research questions:

Does the use of domain-specific metadata in the process of keyword extraction help to improve the overall quality of the extracted keyword set?
Does considering the navigation history help to identify relevant terms that are useful for exploratory search and navigation?
Does considering the navigational value of terms help to identify relevant terms that are useful for exploratory search and navigation?

Results

Our main outcomes and contributions addressing the thesis goals are:

A general model of exploratory search and navigation, which we extended with our proposed approach of navigation leads that serve as navigation starting points and as a means of query refinement.
A method of keyword extraction using citation and co-citation analysis, used to identify navigation lead candidates. Our contribution to keyword extraction lies in demonstrating that using the set of selected citations and co-citations (chosen by the proposed selection rules) improves the precision of the extracted keywords compared with the TF-IDF method and is capable of finding new keywords that would not otherwise have been extracted.
A method of navigation lead selection focusing on the assessment of the leads' navigational value, i.e., how well they convey information scent. We contributed to navigational value assessment for exploratory search and navigation by considering the users' navigation history and the corpus relevance of the candidate terms, employing topic clustering. In a quantitative user study, we showed that view navigation leads selected from the navigation history were valuable for the users, since they were selected more frequently than other terms. In a quantitative synthetic experiment aimed at evaluating corpus relevance, we showed that taking corpus relevance into account during the selection of document navigation leads improves the coverage of (relevant) documents in the domain, which can lead to a better understanding of the domain by the users.
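The following Python sketch illustrates the general idea behind the two methods above under simplifying assumptions; it is not the thesis implementation. Candidate keywords of a document are scored by TF-IDF computed over the document's own text plus down-weighted text of its selected cited and co-cited documents. The `related` mapping is a hypothetical stand-in for the proposed selection rules, which are not reproduced here, and `related_weight` and `history_boost` are hypothetical parameters. Navigation leads are then picked from the candidates with a boost for terms found in the user's navigation history; corpus relevance via topic clustering is not modelled.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokeniser: lowercase and strip basic punctuation. A real system
    # would also remove stop words and lemmatise.
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def idf_table(corpus):
    # Inverse document frequency of every term over the whole corpus.
    df = Counter()
    for text in corpus.values():
        df.update(set(tokenize(text)))
    return {t: math.log(len(corpus) / n) for t, n in df.items()}

def keyword_scores(doc_id, corpus, related, idf, related_weight=0.5):
    # Term counts of the document itself ...
    counts = Counter()
    for t in tokenize(corpus[doc_id]):
        counts[t] += 1.0
    # ... enriched with down-weighted counts from its selected cited and
    # co-cited documents (`related` is a hypothetical stand-in for the
    # thesis' selection rules).
    for other in related.get(doc_id, []):
        for t in tokenize(corpus[other]):
            counts[t] += related_weight
    total = sum(counts.values())
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}

def select_leads(scores, history_terms, k=5, history_boost=3.0):
    # Boost candidates the user already met during navigation (hypothetical
    # weighting), then keep the k best; ties are broken alphabetically.
    boosted = {t: s * (history_boost if t in history_terms else 1.0)
               for t, s in scores.items()}
    return sorted(boosted, key=lambda t: (-boosted[t], t))[:k]

corpus = {
    "d1": "exploratory search supports learning digital libraries",
    "d2": "citation analysis digital libraries",
    "d3": "co-citation analysis reveals related research",
    "d4": "keyword extraction tf-idf",
}
idf = idf_table(corpus)
plain = keyword_scores("d1", corpus, {}, idf)
enriched = keyword_scores("d1", corpus, {"d1": ["d2", "d3"]}, idf)
print("analysis" in plain, "analysis" in enriched)  # False True
print(select_leads(enriched, history_terms={"citation"})[0])  # citation
```

In this toy corpus, the citation-enriched scores contain terms such as "analysis" that plain TF-IDF over the document alone cannot produce, which mirrors the claim that citation data can surface new keywords.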

A significant amount of work on the dissertation was devoted to the development and maintenance of the bookmarking system Annota. We designed its A/B testing functionality, turning Annota into an evaluation platform capable of testing various scenarios and methods on different groups of Annota users based on their activity within the system. We also extended its core functionality; all this allowed us to collect data from more users and of better quality. Thus, Annota as an evaluation platform, together with the dataset collected over four years of its use, can be considered a partial outcome of this work, and both can potentially be used for evaluation in future work.
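One common way to split users into groups for A/B testing is sketched below; it is an assumption for illustration, not a description of how Annota implements it. Only users whose activity reaches a threshold are eligible, and each eligible user is deterministically assigned to a variant by hashing their identifier together with the experiment name, so the same user always sees the same method across sessions. The function name `assign_variant`, the `min_actions` threshold and the variant labels are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment, variants, actions, min_actions=10):
    # Hypothetical eligibility rule: users with too little activity
    # are left out of the experiment.
    if actions < min_actions:
        return None
    # Hash the experiment name and user ID together, so that assignment is
    # stable for a user but independent between experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]

variants = ["tfidf_leads", "citation_leads"]
print(assign_variant("user-42", "lead-extraction", variants, actions=3))  # None
print(assign_variant("user-42", "lead-extraction", variants, actions=25))
```

Hashing instead of storing a random draw keeps the assignment reproducible without extra state; a real platform would still log assignments so that collected interactions can be attributed to the right group.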

Conclusions

In our work, we examined human information behaviour, more specifically information seeking as one of its aspects, with a focus on exploratory search. We were interested in the scenario of a novice researcher who needs to become acquainted with a new domain with the help of resources available in digital libraries.

One of the issues we had to tackle was the evaluation of the proposed exploratory search approaches. Evaluating exploratory search remains an open research problem. In general, there is a lack of longitudinal studies that would examine the natural behaviour of users as they use a system over longer periods of time. In our work, we tried to address this by developing the bookmarking system Annota, which also serves as an evaluation platform.

Another problem with evaluation is the lack of standardised datasets, which leads to low reproducibility and replicability of research in exploratory search. Although many datasets are available, only a few contain enough contextual and task information to evaluate, e.g., the coverage of relevant documents during exploratory search. For this reason, we decided to publish our dataset from Annota, which contains (besides the domain model) also user interactions, such as the queries the users formulated and the documents they bookmarked; the bookmarks provide feedback suggesting the relevance of the documents.

Related to the lack of datasets is the fact that user studies demand a lot of resources in terms of time and participants. However, navigation in an information space can, to some extent, be modelled artificially; in our work, we explored this possibility, which we expect to become more widespread in exploratory search evaluation in the future.
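A minimal sketch of how such artificial navigation might be modelled is shown below. The modelling choices are illustrative assumptions, not the thesis' simulation: the information space is a graph in which each document exposes navigation leads to other documents, a simulated user starts from a seed document and follows leads for a fixed number of steps, preferring documents not yet seen, and each run is scored by the coverage of a set of relevant documents (for instance, documents bookmarked in a dataset such as Annota's). The graphs, the click policy and the name `simulate_coverage` are hypothetical.

```python
import random

def simulate_coverage(graph, start, relevant, steps, seed=0):
    # graph: document -> list of documents reachable via its navigation leads.
    rng = random.Random(seed)
    visited = {start}
    current = start
    for _ in range(steps):
        options = graph.get(current, [])
        if not options:
            break
        # Hypothetical click policy: prefer leads to documents not yet seen,
        # otherwise follow any available lead at random.
        unseen = [d for d in options if d not in visited]
        current = rng.choice(unseen or options)
        visited.add(current)
    # Coverage: fraction of relevant documents the simulated user reached.
    return len(visited & relevant) / len(relevant) if relevant else 0.0

# Two hypothetical lead configurations over the same four documents.
graph_a = {"d1": ["d2"], "d2": ["d1"], "d3": [], "d4": []}
graph_b = {"d1": ["d2", "d3"], "d2": ["d3"], "d3": ["d4"], "d4": ["d1"]}
relevant = {"d1", "d2", "d3", "d4"}
print(simulate_coverage(graph_a, "d1", relevant, steps=5))  # 0.5
print(simulate_coverage(graph_b, "d1", relevant, steps=5))  # 1.0
```

Comparing coverage across lead configurations in this way costs no participant time, which is the main appeal of synthetic evaluation; its results are only as credible as the assumed click policy.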

The extended abstract of the thesis is available in the Bulletin of the ACM Slovakia.

Selected publications