Utilizing Lightweight Semantics for Search Context Acquisition in Personalized Search
Tomáš Kramár
Doctoral thesis project supervised by prof. Mária Bieliková
Motivation and Goals
To answer a query, a search engine has to find full-text matches in the background document corpus and then order the documents so that the more relevant results are placed higher in the list. An ideal ranking function should understand the user’s intent, that is, the goal expressed via the query keywords, and rank the results that match this intent higher. To understand the user’s intent, we need to understand the semantics of the queries and the documents. There are various approaches that leverage semantics, but they are heavyweight, require external knowledge bases and are very hard to implement in a highly dynamic, open-corpus domain such as the Web.
Full-fledged semantic annotations contain rich
data about the underlying document entities, including
types and links to connected semantic information. It is
this richness and connectedness that make them so useful and
that allow complex understanding of the document and
inference of new information. On the other hand, the so-called lightweight semantics
is based on document keywords, which are not linked
and have no attributes besides the textual label.
Their main advantage is that they are freely available in
any text, and even without additional attributes they still
carry some semantics, because they describe the underlying
document.
Based on this insight, we formulated the following thesis goals:
- to narrow the gap between Semantic Search and Personalized Search in
an open-corpus domain and propose a context model that can utilize
lightweight semantics for search personalization.
- to analyze possible sources of search context for acquiring the
immediate goals of the user, and to devise methods that extend this model
with additional information, utilizing the inherent lightweight semantics.
Results
The core contributions of our work are:
- Search context model based on document metadata. We have designed a
context model that captures lightweight semantics in the form of ubiquitous
document metadata. The main contribution of this model is the flexibility that it
offers: it can be scoped to create new perspectives on the search context, and it can
be enhanced and extended to accommodate more information. The model supports
a limited set of operations that make it possible to create linear combinations of
various search context perspectives and thus to build more powerful context
models.
- Introducing implicit feedback. We have introduced implicit feedback as a measure
of search result relevance and devised a method that estimates document
relevance from various factors of user interaction. The idea of implicit feedback per
se is not new, but we propose it as an integral part of the search context
model. We have shown that document relevance, as indicated by the implicit
feedback indicators, is an important factor in search context acquisition, both
when searching for similar users and when finding related searches for the current
query.
- Perspective on behavioral patterns in search. We have discussed behavioral
patterns in search as another possible source for capturing the user’s goals in the search
context model. We have described a study of a public search engine log and
showed that the notion of behavioral patterns is different from what we might think
intuitively: there are users who exhibit behavioral patterns, and there is an
equally large group of users who do not.
- Method for segmenting search queries into sessions. As part of our focus
on short-term goals, we have identified that it is important to know when the user
changed her search goal. We have proposed a method that can detect that change
and cluster queries with the same underlying search goal. The main contribution
of this method is that it is based on the lightweight semantics captured
by the user’s actions and, as we have shown in an experiment, it can outperform
other existing non-semantic approaches.
While this method is completely independent of the proposed search context
model, it can be used to scope the context model and offer a perspective on short-term
goals.
- Method for expanding the search context with data from similar
users. Using document metadata as the main signal in the search context depends heavily
on the metadata quality. Unfortunately, today’s natural language processing
methods are not yet powerful enough to provide 100% relevant metadata.
To deal with this problem, we proposed a model expansion method
that utilizes an artificial social network to gather more data. This way, the relevant
metadata in the model have a chance of accumulating and outweighing the lower-quality
metadata.
Conclusions
In this thesis we have provided a perspective on multiple sources of
search context, each of which tells us more about the underlying query goals. This is only a
beginning, because human nature and the vast space of the Web offer many
possibilities for other sources of search context.
Having several sources of search context raises even more questions. Which
source of search context should we choose? Is there a single best source of search
context? It is quite possible that there are circumstances where one source is
better than another. Multiple sources of search context can be combined as multiple
pieces of evidence, as we have shown in the case of our search context model
and its scopes. In general, the resulting model is a linear combination of multiple
search context sources, each considered with a different weight. Finding the optimal
combination of these sources is a challenge for future research.
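The linear combination mentioned above can be illustrated with a minimal sketch. It assumes that each context source is represented as a map from keywords to scores; the function name `combine_contexts`, the source names and all numeric values are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def combine_contexts(sources, weights):
    """Return a weighted linear combination of keyword-score maps.

    sources: {source_name: {keyword: score}}  (hypothetical representation)
    weights: {source_name: weight}; sources without a weight count as 0.
    """
    combined = defaultdict(float)
    for name, keywords in sources.items():
        weight = weights.get(name, 0.0)
        for keyword, score in keywords.items():
            # Each source contributes its score scaled by the source weight.
            combined[keyword] += weight * score
    return dict(combined)


# Illustrative data only: two assumed context sources for one user.
sources = {
    "short_term": {"jaguar": 1.0, "car": 0.5},
    "similar_users": {"jaguar": 0.5, "dealer": 1.0},
}
weights = {"short_term": 0.75, "similar_users": 0.25}

combined = combine_contexts(sources, weights)
print(combined)
```

In this sketch the weights are fixed by hand; choosing them well is exactly the open question raised above.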
The thesis extended abstract is available in the Bulletin of the ACM Slovakia.
Selected publications
- T. Kramár, M. Barla, M. Bieliková: Personalizing Search
Using Socially Enhanced Interest Model, Built From the Stream of User’s
Activity. Journal of Web Engineering, 12(1-2): 65–92, 2013.
- T. Kramár, M. Barla, M. Bieliková: Disambiguating Search by
Leveraging a Social Context Based on the Stream of User’s
Activity. In Proceedings of UMAP 2010 – 18th international
conference on User Modeling, Adaptation, and Personalization,
pages 387–392. Springer-Verlag, 2010.
- T. Kramár: Towards Contextual Search: Social Networks, Short
Contexts and Multiple Personas. In Proceedings of UMAP 2011
– 19th international conference on User modeling, adaption, and
personalization, pages 434–437. Springer-Verlag, 2011.
- T. Kramár, M. Bieliková: Dynamically Selecting an Appropriate
Context Type for Personalisation. In Proceedings of RecSys 2012
– 6th ACM conference on Recommender systems, pages 321–324.
ACM, 2012.
- T. Kramár, M. Bieliková: Context
of Seasonality in Web Search. In
Proceedings of ECIR 2014 – 36th European Conference on
Information Retrieval, pages 644–649. Springer-Verlag, 2014.
- T. Kramár, M. Bieliková: Detecting Search Sessions Using Document
Metadata and Implicit Feedback. In WSCD 2012 – Workshop on
Web Search Click Data, 2012.
- T. Kramár, M. Bieliková: Analysing Temporal Dynamics in Search
Intent. In Proceedings of SOFSEM 2013 – 39th Conference on
Current Trends in Theory and Practice of Computer Science,
pages 54–63. Institute of Computer Science AS CR, 2013.