Navigation Leads for Exploratory Search and Navigation in Digital Libraries
Róbert Móro
Doctoral thesis project supervised by prof. Mária Bieliková
Motivation and Goals
The prevalent search paradigm on the Web is keyword-based search. People describe their
information needs to a search engine by a sequence of keywords and get a simple list of results
in return. This approach works reasonably well for simple information retrieval tasks such as fact finding.
However, when the information seeking problem at hand is more complex, e.g.,
requires query reformulation or exploring multiple sources to find relevant information,
the task of selecting the relevant links and navigating among the documents becomes difficult.
In our work, we focused on the domain of digital libraries and specifically on the task of
researching a new domain, which novice researchers often face. This is an example
of an exploratory search task: it requires the use of exploratory search strategies,
is open-ended and iterative, and can span multiple search sessions.
Our main goal was to support novice researchers in this task. Specifically,
we formulated two goals:
- Goal 1: Explore the similarities and differences between the domain of digital libraries and the "wild" Web
and utilise the data characteristic of the former to improve the quality of the extracted keywords,
which can be used in the process of exploratory search and navigation. Examples of additional metadata characteristic of the digital
libraries domain that could be utilised are connections between documents based on co-authorship or co-citation.
- Goal 2: Support iterative query formulation in the process of (exploratory) search,
considering the users' previous information needs and their feedback, as well as the subsequent navigation
in the information space of a digital library of research articles through a series of navigational
steps, with the goal of improving domain sense-making and increasing the coverage and understanding of important concepts.
This goal is based on the fact that the initial phase of query formulation is crucial for a successful search,
especially if the users start with ill-defined information needs. Therefore, instead of forcing them to verbalise
their needs at the beginning of the search, the search system should allow iterative querying and provide leads
(cues) on how to reformulate the initial query.
We addressed these goals in our work by examining the following three research questions:
- Does the use of domain-specific metadata in the process of keyword extraction help to
improve the overall quality of the extracted keyword set?
- Does considering the navigation history help to identify relevant terms that are useful for exploratory
search and navigation?
- Does considering the navigational value of terms help to identify relevant terms that are useful for
exploratory search and navigation?
Results
Addressing the thesis goals, our main outcomes and contributions are:
- A general model of exploratory search and navigation, which we extended with our proposed approach of navigation leads
that serve as navigation starting points and a means of query refinement.
- A method of keyword extraction using citation and co-citation analysis, used in the process of identifying
navigation lead candidates. We contributed to keyword extraction by demonstrating that
using the set of selected citations and co-citations (based on the proposed selection rules) improves the
precision of the extracted keywords when compared with the TF-IDF method and is capable of finding
new keywords that would not otherwise have been extracted.
- A method of navigation lead selection focusing on the assessment of the leads' navigational value,
which should convey their information scent. We contributed to navigational value assessment for
exploratory search and navigation by considering the navigation history of the users and
the corpus relevance of the candidate terms, employing topic clustering.
We showed in a quantitative user study that view navigation leads selected from the navigation history
were valuable for the users, since they were selected more frequently than other terms. In
a quantitative synthetic experiment aimed at evaluating corpus relevance, we showed that taking
corpus relevance into account during the selection of document navigation leads improves the coverage of
the (relevant) documents in the domain, which can lead to a better understanding of the domain by the users.
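The idea of enriching keyword extraction with citation data can be illustrated with a minimal sketch. It is not the method from the thesis: the selection rules for citations and co-citations are not described here, so the sketch simply appends all citing sentences to the text of the cited document and reuses plain TF-IDF. The function names, the stop-word list and the toy documents are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the thesis).
STOPWORDS = {"and", "the", "for", "with", "that", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs, doc_id, top_k=3):
    """Rank the terms of docs[doc_id] by TF-IDF computed over all docs."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # document frequency: count each term once per document
    tf = Counter(tokens[doc_id])
    total = len(tokens[doc_id])
    n = len(docs)
    scores = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def enrich_with_citations(docs, citing_sentences):
    """Append to each document the sentences in which other papers cite it."""
    return {d: " ".join([t] + citing_sentences.get(d, [])) for d, t in docs.items()}


# Hypothetical toy corpus and citing sentences.
docs = {
    "a": "exploratory search supports learning and investigation",
    "b": "keyword search returns ranked list",
    "c": "digital libraries store research articles",
}
citing_sentences = {
    "a": ["navigation leads help exploratory search", "navigation cues guide users"],
}

baseline = tfidf_keywords(docs, "a")
enriched = tfidf_keywords(enrich_with_citations(docs, citing_sentences), "a")
print(baseline)
print(enriched)
```

In this toy corpus, the term "navigation" never occurs in the text of document "a" itself, so it can only surface as a keyword once the citing sentences are taken into account.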
A significant amount of work on the dissertation was devoted to the development and maintenance of Annota.
We designed its A/B testing functionality, turning Annota into an evaluation platform capable of testing various scenarios
and methods on different groups of Annota users based on their activity within the system. We also extended
its core functionality; all this allowed us to collect data from more users and of better quality.
Thus, Annota as an evaluation platform, together with its related dataset
collected over four years of use, can be considered a partial outcome of this work
and has the potential to be used for evaluation in future work as well.
Conclusions
In our work, we examined human information behaviour and, more specifically, information
seeking as one of its aspects, with a focus on exploratory search. We were interested in the scenario
of a novice researcher who needs to get acquainted with a new domain with the help of resources
available in digital libraries.
One of the issues we had to tackle was the evaluation of the proposed exploratory search approaches.
This remains an open research problem. There is in general a lack of longitudinal studies that would examine
the natural behaviour of the users as they use the system over longer periods of time. In our
work, we tried to address that by developing the bookmarking system Annota which serves also
as an evaluation platform.
Another problem with the evaluation is the lack of standardised datasets, which leads to low reproducibility
and replicability of research in exploratory search. Although many datasets are
available, only a few contain enough contextual and task information to evaluate, e.g.,
the coverage of relevant documents during exploratory search. For this reason, we have decided to publish
our dataset from Annota, which contains (besides the domain model) also user interactions, such as the queries formulated by the
users and the documents they bookmarked, which provides feedback suggesting the relevance
of the documents.
Related to the lack of datasets is the fact that user studies demand a lot of resources in terms
of time and participants. However, navigation in an information space can to some
extent be modelled artificially; in our work, we explored this possibility, which we expect
to become more widespread in exploratory search evaluation in the future.
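A rough sketch of what modelling navigation artificially could look like follows. It is not the synthetic experiment from the thesis: it assumes a simulated user who follows a fixed number of leads (terms), that each term maps to the set of documents it leads to, and that coverage is the fraction of relevant documents reached. All names and the toy data are hypothetical.

```python
def simulate_coverage(term_index, relevant, choose_lead, steps=2):
    """Follow `steps` leads and return the fraction of relevant documents reached."""
    seen = set()
    remaining = set(term_index)
    for _ in range(min(steps, len(remaining))):
        lead = choose_lead(remaining, term_index, seen)
        remaining.discard(lead)
        seen |= term_index[lead]
    return len(seen & relevant) / len(relevant)


def first_alphabetical(remaining, term_index, seen):
    """Naive strategy: pick the alphabetically first unused term."""
    return min(remaining)


def most_new_documents(remaining, term_index, seen):
    """Coverage-aware strategy: pick the term that adds the most unseen documents."""
    return max(sorted(remaining), key=lambda t: len(term_index[t] - seen))


# Hypothetical mapping from candidate terms to the documents they lead to.
term_index = {
    "annotation": {1},
    "citation": {2, 3},
    "clustering": {1, 2},
    "navigation": {4, 5, 6},
}
relevant = {1, 2, 3, 4, 5, 6}

naive = simulate_coverage(term_index, relevant, first_alphabetical)
aware = simulate_coverage(term_index, relevant, most_new_documents)
print(naive, aware)
```

Comparing the two strategies on the same simulated sessions gives a cheap first indication of whether a selection method improves coverage before a user study is run.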
The thesis extended abstract is available in the Bulletin of the ACM Slovakia.
Selected publications
- M. Dragúňová, R. Móro, M. Bieliková
- Measuring Visual Search Ability on the Web. In IUI 2017: Proc. of the 22nd Int. Conf. on Intelligent User Interfaces Companion, ACM Press, pp. 97-100, 2017.
- T. Matlovič, P. Gašpar, R. Móro, J. Šimko, M. Bieliková
- Emotions Detection Using Facial Expressions Recognition and EEG. In SMAP 2016: Proc. of the 11th Int. Workshop on Semantic and Social Media Adaptation and Personalization, pages 18–23, IEEE CS, Los Alamitos, 2016.
- R. Móro, M. Vangel, M. Bieliková
- Identification of Navigation Lead Candidates Using Citation and Co-Citation Analysis. In SOFSEM 2016: Proc. 42nd Int. Conf. on Current Trends in Theory and Practice of Computer Science, LNCS 9587, pp. 556–568, Springer, Berlin, Heidelberg, 2016.
- R. Móro, M. Bieliková
- Navigation Leads Selection Considering Navigational Value of Keywords. In WWW 2015: Proc. of the 24th Int. Conf. on World Wide Web Companion, pages 79–80, IW3C2, Geneva, 2015.
- M. Bieliková, M. Šimko, M. Barla, J. Tvarožek, M. Labaj, R. Móro, I. Srba, J. Ševcech
- ALEF: from Application to Platform for Adaptive Collaborative Learning. In Recommender Systems for Technology Enhanced Learning: Research Trends & Applications, pp. 195-225. Springer, 2014.
- M. Holub, R. Móro, J. Ševcech, M. Lipták, M. Bieliková
- Annota: Towards Enriching Scientific Publications with Semantics and User Annotations. D-Lib Magazine, 20(11/12), 2014.
- R. Móro, M. Bieliková, R. Burger
- Facet Tree for Personalized Web Documents Organization. In WISE 2014: Proc. of the 15th Int. Conference on Web Information Systems Engineering, LNCS 8786, pages 372–387, Springer, 2014.
- J. Ševcech, R. Móro, M. Holub, M. Bieliková
- User Annotations as a Context for Related Document Search on the Web and Digital Libraries. Informatica, 38(1): 21–30, 2014.
- S. Molnár, R. Móro, M. Bieliková
- Trending Words in Digital Library for Term Cloud-based Navigation. In SMAP 2013: Proc. of the 8th Int. Workshop on Semantic and Social Media Adaptation and Personalization, pp. 53–58, IEEE CS, Los Alamitos, 2013.
- R. Móro, M. Bieliková
- Personalized Text Summarization Based on Important Terms Identification. In DEXA 2012: 23rd International Workshop on Database and Expert Systems Applications. Los Alamitos: IEEE Computer Society, pp. 131-135, 2012.
- R. Móro, I. Srba, M. Unčík, M. Bieliková, M. Šimko
- Towards Collaborative Metadata Enrichment for Adaptive Web-Based Learning. In Proceedings of IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - WI/IAT ’11, pages 106-109. IEEE, 2011.