Utilizing Lightweight Semantics for Search Context Acquisition in Personalized Search

Tomáš Kramár

Doctoral thesis project supervised by prof. Mária Bieliková


Motivation and Goals

To answer a query, a search engine has to find full-text matches in the underlying document corpus and then order the documents so that the more relevant results are placed higher in the list. An ideal ranking function should understand the user’s intent, that is, the goal expressed via the query keywords, and rank the results that match this intent higher. To understand the user’s intent, we need to understand the semantics of the queries and the documents. Various approaches leverage semantics, but they are heavyweight, require external knowledge bases, and are very hard to implement in a highly dynamic, open-corpus domain such as the Web.

Full-fledged semantic annotations contain rich data about the underlying document entities, including types and links to connected semantic information. It is this richness and connectedness that makes them so useful and that allows complex understanding of the document and inference of information. Lightweight semantics, on the other hand, is based on document keywords, which are not linked and have no attributes besides the textual label. Their main advantage is that they are freely available in any text, and even without additional attributes they still convey some semantics, because they describe the underlying document.

Based on this insight, we formulated the following thesis goals:

  • to narrow the gap between Semantic Search and Personalized Search in the open-corpus domain and to propose a context model that can utilize lightweight semantics for search personalization.
  • to analyze possible sources of search context for acquiring the immediate goals of the user and to devise methods that extend this model with additional information, utilizing the inherent lightweight semantics.

Results

The core contributions of our work are:

  • Search context model based on document metadata. We have designed a context model that captures lightweight semantics in the form of the ubiquitous document metadata. The main contribution of this model is its flexibility: it can be scoped to create new perspectives on the search context, and it can be extended to accommodate more information. The model supports a limited set of operations that make it possible to form linear combinations of various search context perspectives and thus to create more powerful context models.
  • Introducing implicit feedback. We have introduced implicit feedback as a measure of search result relevance and devised a method that estimates document relevance from various factors of user interaction. The idea of implicit feedback per se is not new, but we propose it as an integral part of the search context model. We have shown that document relevance as perceived through implicit feedback indicators is an important factor in search context acquisition, both when searching for similar users and when finding related searches for the current query.
  • Perspective on behavioral patterns in search. We have discussed behavioral patterns in search as another possible source for capturing the user’s goals in the search context model. We have described a study of a public search engine log and shown that behavioral patterns differ from what intuition might suggest: some users exhibit behavioral patterns, while an equally large group of users does not.
  • Method for segmenting search queries into sessions. As part of our focus on short-term goals, we have identified that it is important to know when the user has changed her search goal. We have proposed a method that detects this change and clusters queries with the same underlying search goal. The main contribution of this method is that it is based on the lightweight semantics captured by the user’s actions and, as we have shown experimentally, it can outperform existing non-semantic approaches. While this method is completely independent of the proposed search context model, it can be used to scope the context model and offer a perspective on short-term goals.
  • Method for expanding the search context with data from similar users. Using document metadata as the main signal in the search context depends heavily on metadata quality. Unfortunately, current natural language processing methods are not yet powerful enough to provide fully relevant metadata. To deal with this problem, we proposed a model expansion method that utilizes an artificial social network to gather more data. This way, the relevant metadata in the model have a chance to accumulate and outweigh the lower-quality metadata.
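
The session segmentation idea can be illustrated with a minimal sketch. It is not the method evaluated in the thesis, only one plausible reading of it under stated assumptions: each query is described by the keywords of its clicked documents, each click carries a hypothetical implicit-feedback relevance score in [0, 1], and a new session starts when the cosine similarity to the running session profile drops below an arbitrary threshold. The names query_profile and segment, and the threshold value, are illustrative only.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_profile(clicks):
    """Build a keyword vector for one query from its clicked documents.

    `clicks` is a list of (keywords, relevance) pairs; `relevance` is an
    assumed implicit-feedback score in [0, 1], not a value from the thesis.
    """
    profile = Counter()
    for keywords, relevance in clicks:
        for keyword in keywords:
            profile[keyword] += relevance
    return profile


def segment(queries, threshold=0.2):
    """Group consecutive queries into sessions.

    A new session starts when the current query has click evidence and its
    profile is dissimilar to the profile accumulated for the session so far.
    Queries without clicks stay in the current session.
    """
    sessions, current, session_profile = [], [], Counter()
    for text, clicks in queries:
        profile = query_profile(clicks)
        if current and profile and cosine(session_profile, profile) < threshold:
            sessions.append(current)
            current, session_profile = [], Counter()
        current.append(text)
        session_profile.update(profile)  # adds the weights key by key
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    log = [
        ("jaguar speed", [(["jaguar", "animal", "cat"], 0.9)]),
        ("jaguar habitat", [(["jaguar", "animal", "rainforest"], 0.8)]),
        ("python tutorial", [(["python", "programming"], 1.0)]),
    ]
    print(segment(log))
```

In this toy log the two jaguar queries share keywords and stay together, while the unrelated programming query opens a new session. A purely lexical comparison of the query strings would reach the same result here, but it would fail for queries that share a goal without sharing words, which is where keyword evidence from clicked documents could help.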

Conclusions

In this thesis we have provided a perspective on multiple sources of context, each of which tells us more about the underlying query goals. This is only a beginning, because human nature and the vast space of the Web offer many possibilities for other sources of search context.

Having several sources of search context raises even more questions. Which source of search context should we choose? Is there a single best source of search context? It is quite possible that under some circumstances one source is better than another. Multiple sources of search context can be combined as multiple pieces of evidence, as we have shown in the case of our search context model and its scopes. In general, the resulting model is a linear combination of multiple search context sources, each considered with a different weight. Finding the optimal combination of these sources is a challenge for future research.
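
As a minimal sketch of such a linear combination, assume that every context source is represented as a keyword-weight vector and that the weights are chosen by hand. The source names, the weights, and the helper functions combine and rerank are hypothetical and serve only as an illustration; they are not taken from the thesis.

```python
from collections import Counter


def combine(sources, weights):
    """Return the weighted sum of several keyword-weight vectors.

    `sources` maps a source name to its vector, `weights` maps the same
    names to scalar weights; sources without a weight contribute nothing.
    """
    combined = Counter()
    for name, vector in sources.items():
        weight = weights.get(name, 0.0)
        for keyword, value in vector.items():
            combined[keyword] += weight * value
    return combined


def rerank(results, context):
    """Order results by how strongly their keywords match the context.

    The sort is stable, so results with equal scores keep their
    original relative order.
    """
    def score(result):
        return sum(context[k] for k in result["keywords"])
    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    sources = {
        "short_term": Counter({"jaguar": 1.0, "animal": 1.0}),
        "similar_users": Counter({"jaguar": 0.5, "car": 1.0}),
    }
    weights = {"short_term": 0.7, "similar_users": 0.3}  # assumed values
    context = combine(sources, weights)
    results = [
        {"url": "cars.example", "keywords": ["jaguar", "car"]},
        {"url": "zoo.example", "keywords": ["jaguar", "animal"]},
    ]
    for result in rerank(results, context):
        print(result["url"])
```

With these assumed weights the short-term context dominates, so the result about the animal is ranked above the result about the car. Choosing the weights well is exactly the open question raised above.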

An extended abstract of the thesis is available in the Bulletin of the ACM Slovakia.

Selected publications

T. Kramár, M. Barla, M. Bieliková
Personalizing Search Using Socially Enhanced Interest Model, Built From the Stream of User’s Activity. Journal of Web Engineering, 12(1-2): 65–92, 2013.
T. Kramár, M. Barla, M. Bieliková
Disambiguating Search by Leveraging a Social Context Based on the Stream of User’s Activity. In Proceedings of UMAP 2010 – 18th international conference on User Modeling, Adaptation, and Personalization, pages 387–392. Springer-Verlag, 2010.
T. Kramár
Towards Contextual Search: Social Networks, Short Contexts and Multiple Personas. In Proceedings of UMAP 2011 – 19th international conference on User modeling, adaption, and personalization, pages 434–437. Springer-Verlag, 2011.
T. Kramár, M. Bieliková
Dynamically Selecting an Appropriate Context Type for Personalisation. In Proceedings of RecSys 2012 – 6th ACM conference on Recommender systems, pages 321–324. ACM, 2012.
T. Kramár, M. Bieliková
Context of Seasonality in Web Search. In Proceedings of ECIR 2014 – 36th European Conference on Information Retrieval, pages 644–649. Springer-Verlag, 2014.
T. Kramár, M. Bieliková
Detecting Search Sessions Using Document Metadata and Implicit Feedback. In WSCD 2012 Workshop on Web Search Click Data 2012. 2012.
T. Kramár, M. Bieliková
Analysing Temporal Dynamics in Search Intent. In Proceedings of SOFSEM 2013 – 39th Conference on Current Trends in Theory and Practice of Computer Science, pages 54–63. Institute of Computer Science AS CR, 2013.
