Editors: Yannis Manolopoulos, Pavol Navrat
ABSTRACT: The extensible markup language (XML) is rapidly becoming a dominant technology in the area of data-intensive applications. Although several implementations are already offered in commercial products, especially DBMSs, there are still open research issues related to the efficiency of XML storage and retrieval. This paper introduces and analyses new index structures suitable for supporting regular expressions over character data combined with path expressions in XML queries. The performance of these structures is analyzed and compared with that of alternative approaches. In addition to the usual criteria of I/O and CPU performance, the possibility of implementing the new index structures within existing DBMS engines is considered.
ABSTRACT: Proximity queries have been shown to be very useful for semistructured databases in many applications. However, it is challenging to determine proximity even for semistructured databases of moderate size. This paper first summarizes our recent proposal for proximity determination of semistructured data. We then present optimization techniques that scale the proposed methodology to very large semistructured databases, for which a disk-based proximity index is probably the only viable solution. Finally, a performance analysis of the optimization scheme is presented, together with a discussion of practical considerations.
ABSTRACT: Data mining (DM) processes require data to be supplied in a single table or data file. Therefore, data stored in multiple relations of relational databases must be joined before submission to DM analysis. A problem faced during this preparation step is that, most of the time, the analyst does not have a clear idea of which portions of the data should be mined. This paper draws on the strong human ability to interpret data in graphical form to develop a process called "wagging", which visualizes data from multiple relations and helps the analyst prepare data for DM. The data obtained from the wagging process allow further processing to operate as if over multiple relations, making the join operations part of the data mining process. Experimental evaluation shows that the wagging process reduces the join cost significantly, making it possible to explore data from multiple tables visually and interactively.
ABSTRACT: The containment of conjunctive queries containing inequalities (called inequality queries in this paper) is a long-standing and thoroughly studied problem. In "Containment of conjunctive queries with built-in predicates with variables and constants over any ordered domain" (ADBIS'98), we offered an exact condition, along with a procedure, for testing containment of this type of query, based on the idea of canonical databases. In this work, we present a different approach that is considerably more efficient, based on testing query containment through the theory of subsumption of formulas.
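As background to the containment abstract above: for plain conjunctive queries (without inequalities), containment is classically tested by freezing one query's variables into constants, forming its canonical database, and searching for a homomorphism from the other query into it. A minimal sketch of that classical test is below; the representation and names are illustrative assumptions, not the paper's:

```python
from itertools import product

def q_vars(q):
    """Variables of a query; by convention, uppercase names are variables."""
    head, body = q
    return {a for a in head if a.isupper()} | \
           {a for _, args in body for a in args if a.isupper()}

def contains(q1, q2):
    """Test whether Q1 contains Q2 for conjunctive queries without
    inequalities: freeze Q2's variables into constants (the canonical
    database) and search for a homomorphism from Q1 into it."""
    head1, body1 = q1
    head2, body2 = q2
    freeze = {v: "c_" + v for v in q_vars(q2)}
    db = {(p, tuple(freeze.get(a, a) for a in args)) for p, args in body2}
    target_head = tuple(freeze.get(a, a) for a in head2)
    vars1 = sorted(q_vars(q1))
    consts = {c for _, args in db for c in args}
    # brute-force homomorphism search: every mapping of Q1's variables
    # to canonical constants that preserves the head and all body atoms
    for assignment in product(consts, repeat=len(vars1)):
        h = dict(zip(vars1, assignment))
        if tuple(h.get(a, a) for a in head1) != target_head:
            continue
        if all((p, tuple(h.get(a, a) for a in args)) in db for p, args in body1):
            return True
    return False
```

For example, Q1(X) :- r(X, Y) contains Q2(X) :- r(X, X), because Y can be mapped onto X; the converse does not hold. The exhaustive search is exponential, which reflects the NP-completeness of the problem rather than a flaw of the sketch.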
ABSTRACT: Performance is a critical issue in Data Warehouse systems (DWs), due to the large amounts of data manipulated and the type of analysis performed. A common technique for improving performance is the use of pre-computed aggregate data, but the use of aggregates must be transparent to DW users. In this work, we present MF-Retarget, a query retargeting mechanism that deals with both conventional star schemas and multiple fact table (MFT) schemas. This type of multidimensional schema is often used to implement a DW using distinct, but interrelated, Data Marts. The paper presents the retargeting algorithm and initial performance tests.
Burdakov@usa.net, Grigorev@iu5.bmstu.ru, Plutenko@amursu.ru
ABSTRACT: Due to the complexity of query execution in modern object-oriented database management systems (OODBMSs), it is rather hard for a system designer to predict the performance characteristics of an information system under development at the early design stages. This paper proposes a novel mathematical model and methods for evaluating query execution time in OODBMSs. These methods provide estimation equations for two basic n-ary algorithms employed in OODBMSs: Forward Join and Reverse Join. The proposed methods are based on Generating Functions and the Laplace-Stieltjes Transform, and allow arbitrary distribution functions to be used for defining the parameters of query execution algorithms and database objects (number of objects, predicate selectivity, index scan time, etc.). For some degenerate cases, corresponding corollaries with simplified equations are obtained. Database page organisation is addressed by equations which extend Yao's formula to arbitrary distributions.
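Yao's formula, which the abstract above extends to arbitrary distributions, estimates in its classical uniform form the expected number of pages touched when k of n uniformly placed records are selected. A minimal sketch of that classical baseline (the function name is illustrative, and this is the uniform case only, not the paper's generalization):

```python
from math import comb

def yao_pages(n, m, k):
    """Yao's formula (1977): expected number of the m pages touched when
    k of n records are selected, assuming the records are spread evenly
    over the pages (n/m records per page)."""
    per_page = n // m
    # a given page is missed iff all k selected records fall outside it
    p_hit = 1 - comb(n - per_page, k) / comb(n, k)
    return m * p_hit
```

For n = 100 records on m = 10 pages, selecting a single record touches exactly one page on average, and selecting all 100 records touches all 10 pages, as the formula confirms.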
ABSTRACT: The business rules approach is a modern methodology that can help improve both the qualitative and quantitative properties of traditional Information Systems (IS). In this paper, the basics of this approach are discussed and the best-known methods of classifying and modelling business rules are analysed. The main aspects of creating a business rules repository are reviewed and the business rules structuring process is discussed. Based on this analysis, a conceptual business rule repository model is proposed and the process of rule registration is described. The structuring of business rules is illustrated by an example. A conceptual model of a business rule manipulation mechanism, based on event interception and appropriate rule implementation, is presented.
ABSTRACT: XML has quickly emerged as a dominant standard for data representation and exchange on the World Wide Web, owing to features such as its well-formed structure and semantic support. Research on semistructured data over the last several years has focused on data models, query languages, and systems in which the database is modeled as some form of labeled, directed graph. Processing sophisticated queries on such semistructured data is not easy, because of the complexity of the graph structure and the lack of an associated schema. To deal with these problems, this paper proposes an approach to processing semistructured data with XML. Although there are many similarities between semistructured data and XML, there are also differences; a key one is that the current XML DOM supports only tree structures and does not directly support graph structures. To bridge this gap, two approaches are proposed that treat an XML document as a semantic graph and as a literal tree, which form the foundation for transforming semistructured data into XML documents for processing. To this end, several algorithms are designed to transform semistructured data into XML documents and an XML Schema document based on the schema tree extracted from the original semistructured data. To ensure that the semistructured data can be reconstructed from the XML documents, this transformation must be lossless. Finally, the paper presents an algorithm for reconstructing the semistructured data.
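One common way to fit a graph-structured database into XML's tree-only DOM, in the spirit of the transformation the abstract above describes, is to serialize a spanning tree as element nesting and turn the remaining (back and cross) edges into ID/IDREF references, which keeps the document a tree and the transformation lossless. A minimal illustrative sketch, in which the element and attribute names are assumptions, not the paper's:

```python
def graph_to_xml(root, edges):
    """Serialize a labeled directed graph as an XML string: edges of a
    DFS spanning tree become element nesting; any edge to an already
    visited node becomes a <ref idref="..."/> element instead, so the
    result stays a well-formed tree."""
    visited = set()

    def emit(node):
        if node in visited:
            # non-tree edge: reference the existing element by its id
            return f'<ref idref="{node}"/>'
        visited.add(node)
        children = "".join(emit(child) for child in edges.get(node, []))
        return f'<node id="{node}">{children}</node>'

    return emit(root)
```

A two-node cycle a → b → a, for instance, serializes as nested `<node>` elements with a single `<ref idref="a"/>` closing the loop; a reconstruction pass can recover the original graph by resolving each `idref` back to the element carrying the matching `id`.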
ABSTRACT: The development of spatiotemporal database systems is primarily motivated by applications which track and present mobile objects. An important aspect in this research area is the as-needed communication between moving objects and the database systems. Highly efficient communication of this kind is required by Internet applications as well as by location-based mobile applications. Another important trend is the evaluation of traveling patterns of moving objects and of traffic patterns affecting route planning. In this paper we present a communication model in the context of moving object databases using spatiotemporal triggers. We also present a trigger evaluation approach using continuous queries, with special treatment for high performance. Based upon our existing 4D (2D geography plus time and uncertainty) trajectory model for moving objects, we propose the use of PTL (past temporal logic) embedded with spatial operators to evaluate historical travel patterns of moving objects and traffic data. By incorporating historical patterns in the trigger condition, we extend the expressive power of spatiotemporal triggers.
ABSTRACT: We consider the gap between what analysts and decision makers dealing with complex analytic and decision problems need, and what kinds of knowledge discovery technologies are currently available to assist in decision-oriented knowledge discovery. Our new model, "IT preferences for Decision Making", was developed for this purpose. We applied this model to the evaluation of three existing knowledge discovery engines: the MDS Engine, the GUHA Engine and the DEX-HINT Engine. Finally, we demonstrate the power and flexibility of this model by using it to find a new combination of features (partly included in the MDS Engine, partly included in the GUHA Engine). Support of COST Action TARSKI (Theory and Applications of Relational Structures as Knowledge Instruments) is acknowledged. My thanks go also to Rozann and Thomas Saaty for their Expert Choice 2000 support.
ABSTRACT: The World Wide Web facilitates user access to knowledge-based decision support systems. Such web-enabled systems can provide users with advice about how decision-makers exercise discretion. GetAid, developed using the web-based shell environment WebShell, is an example of a web-based decision support system operating in a discretionary legal domain. This paper presents the Context, Criteria, Contingency evaluation framework for knowledge-based systems, general in design but geared towards the evaluation of legal knowledge-based systems. Central to this framework is a hierarchical model of evaluation criteria arranged in four quadrants: verification and validation, user credibility, technical infrastructure and the impact of the system upon its environment. The framework situates an evaluation in terms of both the system's context of use and the context of its evaluation, and includes guidelines for selecting appropriate evaluation criteria under differing contingencies. A case study is presented describing the use of this evaluation framework in planning the evaluation of the web-deployed GetAid system.
ABSTRACT: The paper provides a brief overview of the HyperMeData language specifically designed to support data interchange among heterogeneous information systems, and pays attention to meta-level transformation descriptions. The language is sufficiently complex and powerful to catch both intra- and inter- data schema relationships (i.e., to describe both data schemas and data transformations), nonetheless, its routine use requires employing a set of meta-level transformation rules to handle typical schematic differences among semantically similar database objects. The classification of schematic heterogeneities in multidatabases is used as the basis for proposing inter-attribute correspondences (meta-level transformation rules) and the respective translations to HyperMeData descriptions of data transformations (transformation rules).
ABSTRACT: The state of the art in the domain of knowledge discovery in databases (KDD) and data mining (DM) has reached the point where the existence of various languages is becoming highly desirable. This paper presents an XML-based language called DMSL (Data Mining Specification Language). Its purpose is to provide a framework for the platform-independent definition of the whole data mining process, and for the exchange and sharing of DM projects among different applications, possibly operating in heterogeneous environments. We assume that the reader is familiar with the notions of XML, knowledge discovery in databases, and data mining.
ABSTRACT: The information to be stored in databases is not always precise. A related issue is the handling of imperfect, flexible or vague queries. M. Kifer and V. S. Subrahmanian introduced generalized annotated logic programs (GAPs), which unify and generalize various results and treatments of the model-theoretic semantics of quantitative Datalog. In this paper we discuss the problem of an appropriate proof-theoretic data model for restricted annotated programs. We face several problems: the semantics is not continuous, the constraint-based computational model is not effective, and there is a problem with the definition of the natural join. We introduce a variant of annotated programs with continuous semantics, define a new effective computational procedure, and show a solution to the join problem. We use a connection to fuzzy Datalog and make use of an earlier model of fuzzy databases.
ABSTRACT: Using the terminology usual in databases, it is possible to view XML as a language for data modelling. To retrieve XML data from XML databases, several query languages have been proposed. The common feature of these languages is the use of regular path expressions, which allow users to navigate through arbitrarily long paths in the data. Several index structures for XML data have been developed in recent years to address this problem. This paper shows how UB-trees can index and retrieve XML documents efficiently. UB-trees were introduced by R. Bayer as a structure for indexing n-dimensional space. The basic idea of the UB-tree index is that indexing is performed at the lowest level of the given XML data.
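A UB-tree linearizes n-dimensional keys by their Z-address, the bit-interleaved Morton code, so that an ordinary B-tree can store and range-scan them. A minimal 2D sketch of the address calculation (illustrative of the general technique, not the paper's implementation):

```python
def z_address(x, y, bits=16):
    """Interleave the bits of x and y into a Morton (Z-order) code,
    the one-dimensional key a UB-tree stores in its underlying B-tree.
    Points close in 2D space tend to get close Z-addresses."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits on odd positions
    return z
```

Sorting points by `z_address` traces the recursive Z-shaped curve through the plane: (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3, and each quadrant is fully enumerated before the next, which is what lets a B-tree answer multidimensional range queries over the linearized keys.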
ABSTRACT: This paper introduces the concepts of closed entity-relationship diagrams (CERDs) and of partially and generalized commutative diagrams (which are particular types of CERDs). It briefly presents the elementary mathematical data model (EMDM), used to enable more accurate database scheme design, and provides an algorithm for assisting database designers in modeling CERDs. It is shown that every such CERD should be thoroughly scrutinized according to this algorithm, even if some of them might prove "uninteresting" in the end.
ABSTRACT: Workflow management systems (WFMSs) need to support dynamic process modifications. In current WFMSs the scope of dynamic modification is mainly limited to control flow, while other dynamic aspects are neglected. In this paper an approach to dynamic modification of workflow participant assignment (WPA) is presented. The approach extends the meaning of WPA as proposed by the Workflow Management Coalition; the extension covers dynamic aspects and expresses complex relationships between control, audit and relevant data. On the basis of the new definition, a WPA Language (WPAL) is proposed. WPAL is a programming interface which makes it possible to assign workflow participants dynamically. WPAL has been implemented in OfficeObjects® WorkFlow and deployed among several major customers of Rodan Systems. The paper also presents implementation results.
ABSTRACT: Industrial systems are heterogeneous. They contain resources provided by various manufacturers/vendors that define the resources' related data in proprietary formats, hence with specific descriptions and implementations. This fact directly influences the management part of an industrial heterogeneous system, significantly increasing its complexity. A solution to this problem is the definition of an industrial specific language able to describe data related to industrial heterogeneous resources. The language should represent a non-proprietary mechanism for data specification and implementation. This paper presents a proposal of such a language, named IndML - Industrial Markup Language, which is based on XML - the eXtensible Markup Language. Further, it provides an application example of IndML on a heterogeneous industrial system.
ABSTRACT: A query in Information Retrieval produces some number of relevant results. Consequently, there is a need for a qualitative classification of these results in a form the user is able to understand. In this article we introduce a new formal method of navigating through query results. This navigation method is based on the original idea of a concept order structure, which exploits concept lattice theory and fuzzy set theory. Until now, the user has had to provide a subjective factor: attribute scaling. Our method helps to uncover significant concepts without the need for user scaling.
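Concept lattice theory, on which the navigation method above builds, rests on two derivation operators over a binary object-attribute context: a formal concept is a pair (extent, intent) that the two operators map onto each other. A minimal crisp sketch of these operators (the fuzzy extension the abstract mentions is omitted, and the representation is an illustrative assumption):

```python
def derive_objects(objs, context):
    """A' : the attributes shared by every object in objs.
    context maps each object to its set of attributes."""
    attrs = set().union(*context.values())
    for o in objs:
        attrs &= context[o]
    return attrs

def derive_attrs(attrs, context):
    """B' : the objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def is_concept(objs, attrs, context):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return (derive_objects(objs, context) == attrs
            and derive_attrs(attrs, context) == objs)
```

In a document-term context, for example, the extent {d1, d2} of documents mentioning "xml" together with the intent {"xml"} forms a concept exactly when no further attribute is shared by both documents and no further document carries the attribute; enumerating such fixpoints yields the concept lattice a user can navigate.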
ABSTRACT: Starting with the csp2B specification facilities as the core, this paper extends Abstract Machine Notation (AMN) with sequential processes, an interrupt operator, timing, and other facilities specific to TCOZ. An approach to mapping these extensions of the core into AMN, and algorithms for their conversion into B machines, are defined. B technology provides the provable refinement technique required for compositional development. The capabilities of the extended notation are illustrated by an example showing how a refinement of a workflow process specification of requirements by a composition of pre-existing workflow processes can be formally justified.