RESEARCH COMMUNICATIONS

Editors: Yannis Manolopoulos, Pavol Navrat

ISBN 80-227-1744-4

Contents

Indexing XML to Support Path Expressions, PDF

Dmitry Barashev and Boris Novikov

University of St. Petersburg, Russia
db2@acm.org, borisnov@acm.org

ABSTRACT: The extensible markup language (XML) is rapidly becoming a dominant technology in the area of data-intensive applications. Although several implementations are already offered in commercial products, especially DBMSs, there are still open research issues related to the efficiency of XML storage and retrieval. This paper introduces and analyses new index structures suitable for supporting regular expressions over character data combined with path expressions in XML queries. The performance of these structures is analysed and compared with the performance of alternative approaches. In addition to the usual criteria of I/O and CPU performance, the possibility of implementing the new index structures within existing DBMS engines is considered.


Proximity Determination and its Optimization for Semistructured Data, PDF

Michael Barg and Raymond K. Wong

School of Computer Science & Engineering, University of New South Wales, Sydney, Australia

ABSTRACT: Proximity queries have been shown to be very useful for semistructured databases in many applications. However, it is challenging to determine proximity even for semistructured databases of moderate size. This paper first summarizes our recent proposal for proximity determination of semistructured data. We then present optimization techniques that scale the proposed methodology to very large semistructured databases, for which a disk-based proximity index is probably the only solution to consider. Finally, a performance analysis of the optimization scheme is presented, with a discussion of practical considerations.


Visually Mining on Multiple Relational Tables at Once, PDF

Maria Camila Barioni, Humberto Razente, Caetano Traina Jr, Agma Traina

Department of Computer Science and Statistics University of Sao Paulo at Sao Carlos, Brazil
mcamila@icmc.sc.usp.br, hlr@icmc.sc.usp.br, caetano@icmc.sc.usp.br, agma@icmc.sc.usp.br

ABSTRACT: Data mining (DM) processes require data to be supplied in only one table or data file. Therefore, data stored in multiple relations of relational databases must be joined before submission to DM analysis. A problem faced during this preparation step is that, most of the time, the analyst does not have a clear idea of what portions of the data should be mined. This paper relies on the strong human ability to interpret data in graphical format to develop a process called "wagging", which visualizes data from multiple relations and helps the analyst when preparing data for DM. The data obtained from the wagging process allow further processes to be executed as if they were operating over multiple relations, so that the join operations become part of the data mining process. Experimental evaluation shows that the wagging process reduces the join cost significantly, making it possible to visually explore data from multiple tables interactively.


Containment of inequality queries revisited, PDF

José M. Barja, Nieves R. Brisaboa, José R. Paramá, Miguel R. Penabad

Departamento de Computación, Universidade da Coruña, Spain
jmbarja@udc.es, brisaboa@udc.es, parama@udc.es, penabad@udc.es

ABSTRACT: The containment of conjunctive queries containing inequalities (denoted inequality queries in this paper) is a thoroughly studied and long-standing problem. In "Containment of conjunctive queries with built-in predicates with variables and constants over any ordered domain" (ADBIS'98), we offered an exact condition, along with a procedure, to test containment of queries of this type, using the idea of canonical databases. In this work, we present a different approach that is significantly more efficient, based on the idea of testing query containment by using the theory of subsumption of formulas.


MF-Retarget: Aggregate Awareness in Multiple Fact Table Schema Data Warehouses, PDF

Karin Becker, Duncan Dubugras Ruiz, Kellyne Santos

Faculdade de Informática - Pontifícia Universidade Católica do Rio Grande do Sul
kbecker@inf.pucrs.br, duncan@inf.pucrs.br, kellyne@ufs.br

ABSTRACT: Performance is a critical issue in Data Warehouse systems (DWs), due to the large amounts of data manipulated, and the type of analysis performed. A common technique used to improve performance is the use of pre-computed aggregate data, but the use of aggregates must be transparent for DW users. In this work, we present MF-Retarget, a query retargeting mechanism that deals with both conventional star schemas and multiple fact table (MFT) schemas. This type of multidimensional schema is often used to implement a DW using distinct, but interrelated Data Marts. The paper presents the retargeting algorithm and initial performance tests.


On the Estimation of Query Execution Time in Object-Oriented Databases at the Early Design Stages, PDF

Aleksey V. Burdakov1, Yuri A. Grigorev1, Andrey D. Ploutenko2

1 Moscow State Technical University, Moscow, Russia
2 Amur State University, Blagoveschensk, Russia

Burdakov@usa.net, Grigorev@iu5.bmstu.ru, Plutenko@amursu.ru

ABSTRACT: Due to the complexity of query execution processes in modern object-oriented database management systems (OODBMS), it is rather hard for a system designer to predict the performance characteristics of an information system under development at the early design stages. This paper proposes a novel mathematical model and methods for the evaluation of query execution time in OODBMS. These methods provide estimation equations for two basic n-ary algorithms employed in OODBMS: Forward Join and Reverse Join. The proposed methods are based on the apparatus of Generating Functions and the Laplace-Stieltjes Transform, and allow the use of arbitrary distribution functions for the definition of the parameters of query execution algorithms and database objects (number of objects, predicate selectivity, index scan time, etc.). For some degenerate cases, corresponding corollaries with simplified equations are obtained. Database page organisation is addressed by equations which extend Yao's formula to arbitrary distributions.


The Business Rules Repository for Information Systems Design, PDF

Rimantas Butleris and Kestutis Kapocius

Department of Information Systems, Kaunas University of Technology, Kaunas, Lithuania
rimbut@if.ktu.lt, kesta.s@takas.lt

ABSTRACT: The business rules approach is a modern methodology that could help to improve both the qualitative and quantitative properties of traditional Information Systems (IS). In this paper the basics of this approach are discussed and the best-known methods of classifying and modelling business rules are analysed. The main aspects of creating a business rules repository are reviewed and the business rules structuring process is discussed. Based on the analysis, a conceptual business rule repository model is proposed and the process of rule registration is described. Structuring of business rules is illustrated by an example. A conceptual model of a business rule manipulation mechanism, based on event interception and appropriate rule implementation, is presented.


Semistructured Data Store Mapping with XML and Its Reconstruction, PDF

Enhong Chen1, Gongqing Wu1, Gabriela Lindemann2, Mirjam Minor2

1 Department of Computer Science, University of Science and Technology of China
2 Humboldt University of Berlin, Department of Computer Science, Berlin, Germany

cheneh@ustc.edu.cn, wugq@mail.ustc.edu.cn, lindeman@informatik.hu-berlin.de

ABSTRACT: XML has been quickly emerging as a dominant standard for data representation and exchange on the World Wide Web because of its many good features, such as well-formed structure and semantic support. Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled in some form of a labeled, directed graph. Processing a sophisticated query on semistructured data is not easy because of the complexity of the graph structure and the lack of corresponding schemata associated with it. To deal with such problems, the paper proposes an approach to processing semistructured data with XML. Although there are many similarities between semistructured data and XML, there are some differences. A key difference is that the current XML DOM only supports tree structures and does not directly support graph structures. To deal with such differences, two approaches are proposed in this paper to treat an XML document as a semantic graph and as a literal tree, which are the foundation for transforming semistructured data into XML documents for processing. For this purpose, several algorithms are designed to transform semistructured data into XML documents and an XML-Schema document based on the schema tree extracted from the original semistructured data. To ensure that semistructured data can be reconstructed from XML documents, this transformation must be lossless. Finally, the paper also presents an algorithm for reconstructing semistructured data.


Spatial Temporal Queries and Triggers for Managing Moving Objects, PDF

Minglin Deng and Fengli Zhang

Department of Computer Science, University of Illinois at Chicago, USA
mdeng@cs.uic.edu, fzhang@cs.uic.edu

ABSTRACT: The development of spatiotemporal database systems is primarily motivated by applications which track and present mobile objects. An important aspect of this research area is the as-needed communication between moving objects and the database systems. Such communication, with high efficiency, is required by Internet applications as well as by location-based mobile applications. Another important trend is the evaluation of traveling patterns of moving objects and traffic patterns affecting route planning. In this paper we present a communication model in the context of moving object databases using spatial temporal triggers. We also present a trigger evaluation approach using continuous queries with special treatment for high performance. Based upon our existing work on a 4D (2D geography plus time and uncertainty) trajectory model for moving objects, we propose the use of PTL (past temporal logic) embedded with spatial operators to evaluate historical travel patterns of moving objects and traffic data. By incorporating historical patterns in the trigger condition, we have extended the knowledge power of the spatial temporal triggers.


Modelling of an Engine Based Approach to the Decision Support, PDF

Thomas Feglar

Prague, Czech Republic
feglar@czn.cz

ABSTRACT: We consider the problem of a gap between "What is needed for Analysts and Decision Makers dealing with complex analytic and decision problems" and "What kind of knowledge discovery technologies are available nowadays and could be used to assist in a decision oriented knowledge discovery". Our new model "IT preferences for a Decision Making" was developed for this purpose. We applied this model to an evaluation of three different knowledge discovery engines that currently exist: MDS Engine, GUHA Engine and DEX-HINT Engine. Finally, we demonstrate the power and flexibility of this model by using it to find a new combination of features (partly included in the MDS Engine, partly included in the GUHA Engine). Support of COST Action TARSKI (Theory and Applications of Relational Structures as Knowledge Instruments) is acknowledged. My thanks go also to Rozann and Thomas Saaty for their Expert Choice 2000 Support.


A Strategy for Evaluating Web-Based Discretionary Decision Support Systems, PDF

Maria Jean J. Hall1, Andrew Stranieri1 and John Zeleznikow2

1 Donald Berman Laboratory for Information Technology and Law Applied Computing Research Institute, La Trobe University Bundoora, Victoria, Australia
2 Joseph Bell Centre for Forensic Statistics and Legal Reasoning, Faculty of Law, University of Edinburgh, Scotland, UK

jean_hall@bigpond.com, stranier@cs.latrobe.edu.au, john.zeleznikow@ed.ac.uk

ABSTRACT: The World Wide Web facilitates user access to knowledge-based decision support systems. Such web-enabled systems can provide users with advice about how decision-makers exercise discretion. GetAid, developed using the web-based shell environment WebShell, is an example of a web-based decision support system operating in a discretionary legal domain. This paper presents the Context, Criteria, Contingency evaluation framework for knowledge-based systems, general in design but geared towards the evaluation of legal knowledge-based systems. Central to this framework is a hierarchical model of evaluation criteria arranged in four quadrants: verification and validation, user credibility, technical infrastructure and the impact of the system upon its environment. This framework frames an evaluation both in terms of the context of use of the system and the context of its evaluation and includes guidelines for the selection of appropriate evaluation criteria under differing contingencies. A case study is presented describing the use of this evaluation framework in planning the evaluation of the web-deployed GetAid system.


Meta-Level Transformations in Systems Integration, PDF

Jana Kohoutková

Masaryk University Brno, Institute of Computer Science, Czech Republic
kohoutkova@ics.muni.cz

ABSTRACT: The paper provides a brief overview of the HyperMeData language, specifically designed to support data interchange among heterogeneous information systems, and pays attention to meta-level transformation descriptions. The language is sufficiently complex and powerful to capture both intra- and inter-data-schema relationships (i.e., to describe both data schemas and data transformations). Nonetheless, its routine use requires employing a set of meta-level transformation rules to handle typical schematic differences among semantically similar database objects. The classification of schematic heterogeneities in multidatabases is used as the basis for proposing inter-attribute correspondences (meta-level transformation rules) and the respective translations to HyperMeData descriptions of data transformations (transformation rules).


Describing the Data Mining Process with DMSL, PDF

Petr Kotásek, Jaroslav Zendulka

Faculty of Information Technology, Brno University of Technology, Czech Republic
kotasekp@fit.vutbr.cz, zendulka@fit.vutbr.cz

ABSTRACT: The state of the art in the domain of knowledge discovery in databases (KDD) and data mining (DM) has reached the point where the existence of various languages is becoming highly desirable. This paper presents an XML-based language called DMSL (Data Mining Specification Language). Its purpose is to provide a framework for the platform-independent definition of the whole data mining process, and for the exchange and sharing of DM projects among different applications, possibly operating in heterogeneous environments. We assume that the reader is familiar with the notions of XML, knowledge discovery in databases, and data mining.


A data model for annotated programs, PDF

Stanislav Krajči1, Rastislav Lencses1, and Peter Vojtáš2,3

1 Department of Computer Science, Faculty of Science, P. J. Šafárik University
2 Institute of Computer Science, Academy of Sciences of the Czech Republic
3 Mathematical Institute, Slovak Academy of Science

ABSTRACT: The information to be stored in databases is not always precise. A related issue is the handling of imperfect, flexible or vague queries. M. Kifer and V. S. Subrahmanian introduced generalized annotated logic programs (GAP), which unify and generalize various results and treatments of quantitative Datalog-based model-theoretic semantics. In this paper we discuss the problem of an appropriate proof-theoretic data model for restricted annotated programs. We face several problems: the semantics is not continuous, the constraint-based computational model is not effective, and there is a problem with the definition of the natural join. We introduce a variant of annotated programs with continuous semantics, define a new effective computational procedure, and show a solution for the join problem. We use a connection to fuzzy Datalog and make use of an earlier model of fuzzy databases.


Indexing XML Data with UB-trees, PDF

Michal Krátký1, Jaroslav Pokorný2, and Václav Snášel1

1 Department of Computer Science, VŠB-Technical University of Ostrava, Czech Republic
2 Department of Software Engineering, Charles University, Prague, Czech Republic

ABSTRACT: Using the terminology usual in databases, it is possible to view XML as a language for data modelling. To retrieve XML data from XML databases, several query languages have been proposed. The common feature of these languages is the use of regular path expressions, which allow users to navigate through arbitrarily long paths in the data. Several index structures for XML data have been developed in recent years in order to address the problem. This paper shows how UB-trees can index and retrieve XML documents efficiently. UB-trees were introduced by R. Bayer as a structure for indexing n-dimensional space. The basic idea of the UB-tree index is that the indexing is performed at the lowest level of the given XML data.


On Modeling Closed Entity-Relationship Diagrams in an Elementary Mathematical Data Model, PDF

Christian Mancaş

"Ovidius" University, Computer Science Department, Constanţa, Romania
mdatasis@fx.ro

ABSTRACT: Introduced are the concepts of closed entity-relationship diagrams (CERDs), and of partially, and generalized commutative diagrams (which are particular types of CERDs). Briefly presented is the elementary mathematical data model (EMDM), used to enable more accurate database scheme design. Provided is an algorithm for assisting database designers in modeling CERDs. Shown is that every such CERD should be thoroughly scrutinized according to this algorithm, even if some of them might prove "uninteresting" in the end.


Dynamic Changes in Workflow Participant Assignment, PDF

Mariusz Momotko1 and Kazimierz Subieta2,3

1 Rodan Systems S.A., Warsaw, Poland
2 Institute of Computer Science PAS, Warsaw, Poland
3 Polish-Japanese Institute of Information Technology, Warsaw, Poland

ABSTRACT: Workflow management systems (WFMSs) need to adapt to dynamic process modifications. In current WFMSs the scope of dynamic modifications is mainly focused on control flow, while other dynamic aspects are neglected. In this paper an approach to handling dynamic modification in workflow participant assignment (WPA) is presented. The approach extends the meaning of WPA proposed by the Workflow Management Coalition. The extension covers dynamic aspects and expresses complex relationships between control, audit and relevant data. On the basis of the new definition, a WPA Language (WPAL) is proposed. WPAL is a programming interface which makes it possible to assign workflow participants dynamically. WPAL has been implemented in OfficeObjects® WorkFlow and deployed among several major customers of Rodan Systems. The paper also presents implementation results.


IndML - An Industrial Markup Language, PDF

Claudia Raibulet and Claudio Demartini

Dipartimento di Automatica ed Informatica, Politecnico di Torino, Turin, Italy
raibulet@athena.polito.it, demartini@polito.it

ABSTRACT: Industrial systems are heterogeneous. They contain resources provided by various manufacturers/vendors that define the resources' related data in proprietary formats, hence with specific descriptions and implementations. This fact directly influences the management part of an industrial heterogeneous system, significantly increasing its complexity. A solution to this problem is the definition of an industrial specific language able to describe data related to industrial heterogeneous resources. The language should represent a non-proprietary mechanism for data specification and implementation. This paper presents a proposal of such a language, named IndML - Industrial Markup Language, which is based on XML - the eXtensible Markup Language. Further, it provides an application example of IndML on a heterogeneous industrial system.


Navigation Through Query Result Using Concept Order, PDF

Václav Snášel, Tomáš Skopal, Daniela Ďuráková

Department of Computer Science, VŠB-Technical University Ostrava, Czech Republic
vaclav.snasel@vsb.cz, tomas.skopal@vsb.cz, daniela.durakova@vsb.cz

ABSTRACT: A query in Information Retrieval produces a number of relevant results. Consequently, there is a need for some qualitative classification of these results in a way that the user is able to understand. In this article we introduce a new formal method of navigation through a query result. This navigation method is based on an original idea of a concept order structure, which exploits concept lattice theory and fuzzy set theory. So far, the user has had to provide a subjective factor: attribute scaling. Our method helps to uncover significant concepts without the need for user scaling.


Applying CSP-like Workflow Process Specifications for their Refinement in AMN by Pre-existing Workflows, PDF

Sergey A. Stupnikov1, Leonid A. Kalinichenko1, Jin Song DONG2

1 Institute for Problems of Informatics, Russian Academy of Sciences
2 National University of Singapore

ssa@ipi.ac.ru, leonidk@ipi.ac.ru, dongjs@comp.nus.edu.sg

ABSTRACT: Starting with csp2B specification facilities as the core, this paper extends Abstract Machine Notation (AMN) further with the sequential processes, interrupt operator, timing and other facilities specific for TCOZ. An approach for mapping of extensions of the core into AMN and algorithms of their conversion into B machines are defined. B technology provides provable refinement technique required for compositional development. Capabilities of the extended notation are illustrated by an example showing how a refinement of the workflow process specification of requirements by a composition of the pre-existing workflow processes can be formally justified.


Copyright Slovak University of Technology