Motivation
In an age of information overload, it is not easy to find exactly what we are looking for. Although modern search engines try their best to select the most valuable results, they do not have enough information to show only the desired ones. This is mostly due to the simplicity of the search query: a searcher rarely puts much effort into making the first query specific and well-formed. An average query consists of 2-3 keywords that may describe the searcher's problem exactly, yet this may still not be enough. Because words are ambiguous by nature and can carry several meanings, the query as a whole can be interpreted in ways the searcher did not intend.
Method
We proposed a method that extends the query with a few keywords from the search activity context. We obtain the search activity context from the context of an application related to the query, where the application context is represented by the application's activity context. To realize this method, we first need to capture and model the actual activity context. Each application has its own purpose, so each application has a different activity context; we therefore need to model the activity context for each application separately. Secondly, we need to find an application that is related to the query. The context of this application is the same as, or very similar to, the search context. Thirdly, we need to choose the most relevant keywords from the search activity context to extend the query. Our method therefore consists of the following steps:
To capture the searcher's actions, we designed and developed an activity logger aimed at several activity types that we considered the most influential for our evaluation scenario: a novice researcher searching for relevant research resources. The logger consists of three independent parts, each monitoring a specific activity type at a particular level of detail: Tabber, for basic monitoring of standalone applications; Wordik, for more detailed monitoring of a specific application that we consider an important activity context for search in our scenario, namely a text editor (MS Word in our prototype); and Annota, for monitoring the searcher's activity in applications executed within a browser. The purpose of these three modules is to evaluate our proposed method. New modules can be added to capture activity from other applications at various levels of detail.
To determine which application is connected to the query, we propose two strategies that consider different types of connections between an application and a query: syntactic comparison and semantic comparison. Syntactic comparison is used to find out how often a keyword from the query occurs in the application context. We look for a connection between the query keywords and the application context by comparing both stemmed keywords and keywords in their natural-language form. Semantic comparison is used to find a connection between words that have the same or a similar meaning but a different lexical form. Such a connection specifies the meaning of the query. There are many types of semantic relations; in our research we focus on synonyms and hyponyms of the query. To get synonyms and hyponyms of the actual query we use Wordnet.API (wordnet.princeton.edu/). We consider an application related to a query only if we are able to find a connection using at least one of the strategies described above.
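The two connection strategies can be illustrated with a minimal sketch. It is not the project's implementation: the suffix-stripping `stem` function is a crude stand-in for a real stemmer, and the `RELATED_WORDS` dictionary is a small hand-written substitute for the synonym and hyponym lookups that the method obtains from WordNet. All function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Crude suffix stripper (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hand-written stand-in for WordNet: word -> synonyms and hyponyms.
RELATED_WORDS = {
    "car": {"automobile", "auto", "cab", "coupe"},
    "program": {"application", "software"},
}

def syntactic_hits(query, context):
    """Count context words that match a query keyword by form or by stem."""
    query_words = set(tokenize(query))
    query_stems = {stem(w) for w in query_words}
    return sum(
        1 for w in tokenize(context)
        if w in query_words or stem(w) in query_stems
    )

def semantic_hits(query, context):
    """Count context words that are synonyms or hyponyms of a query keyword."""
    related = set()
    for w in tokenize(query):
        related |= RELATED_WORDS.get(w, set())
    return sum(1 for w in tokenize(context) if w in related)

def is_related(query, context):
    """An application is related if either strategy finds a connection."""
    return syntactic_hits(query, context) > 0 or semantic_hits(query, context) > 0
```

For example, `is_related("car repair", "my automobile needs a new tire")` finds no syntactic match but succeeds through the semantic lookup, which is the case the second strategy is meant to cover.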
An application related to the query provides a larger context of what the searcher is trying to find, because the context of that application is the same as, or similar to, the search context. We extend the query with the few most relevant keywords from this extended search context. The application context consists of many words, and only a few of them are relevant, so we need to assign a weight to each keyword to be able to choose the most relevant ones. We proposed two types of weights: weight in general and weight related to a query. Weight in general reflects a keyword's relevance to the current application context. We need to assign a weight to each application activity content. Different homonyms have different semantic connections, which means that extending the query with a semantically connected keyword lessens search ambiguity; hence, such a keyword gets the maximum weight.
Evaluation
We hypothesize that the searcher's information need comes in many cases from a recently used application. To test this, we proposed an experiment that aims to find a connection between an application and a query and to evaluate the accuracy of that connection. We asked 9 students of a technical university to log their activity while using their own computers for normal activities. We also asked them to provide explicit feedback: when searching on the Web, they picked out the application related to their actual query. We modified the results page of a general search engine (we selected Google search) so that the searcher could select the application that triggered the search (if any) by clicking on its name in a list of possible applications. According to their selections, 88.14% of queries were related to an application they had recently used, i.e. the searcher's search need comes in most cases from a recently used application. To find the application related to a query, we first needed to find out whether there is a connection between the query and the application.
Syntactic comparison combined with semantic relations found a connection for 62.8% of queries. Most of the connections to the related application (58.7%) were found using syntactic comparison alone. We also evaluated searches that were not related to any application. For these queries the task was for our strategies to find no connection to an application, and we found no connection for 82% of these searches.
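Returning to the query extension step of the method, the following minimal sketch shows how weighted keywords from a related application's context could be used to extend a query. The weighting formula is an assumption made for illustration: the general weight is taken as a keyword's relative frequency in the context, and a keyword that is semantically connected to the query is given the maximum weight, as the method describes. The names `general_weights`, `extend_query`, `STOPWORDS` and `MAX_WEIGHT` are hypothetical.

```python
from collections import Counter

MAX_WEIGHT = 1.0  # assigned to keywords semantically connected to the query
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

def general_weights(context_words):
    """General weight (assumption): relative frequency in the application context."""
    counts = Counter(w for w in context_words if w not in STOPWORDS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}

def extend_query(query, context_words, related_words, k=2):
    """Append the k highest-weighted context keywords to the query.

    related_words holds words semantically connected to the query
    (for instance synonyms or hyponyms); they receive MAX_WEIGHT.
    """
    query_words = query.lower().split()
    weights = general_weights(context_words)
    for word in weights:
        if word in related_words:
            weights[word] = MAX_WEIGHT
    candidates = [w for w in weights if w not in query_words]
    # Highest weight first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda w: (-weights[w], w))
    return query_words + candidates[:k]
```

With the context words `"jaguar engine speed engine jaguar engine car repair".split()` and `related_words={"car"}`, the ambiguous query `"jaguar"` would be extended with `car` and `engine`, steering the search toward the automotive meaning.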
Publications
Project web page