### Matthias Steinhorst

Location: Brisbane

Arrival: Apr 2nd 2012

Departure: Jun 29th 2012

###
Matthias Steinhorst

#### Mission statement

- Joint research with local QUT staff in the context of the NSS project
- Joint research with local QUT staff in the context of conceptual model analysis

#### Research results

On April 26, I gave a presentation on generic pattern matching in conceptual models. The presentation was held during the weekly BPM seminar. The work I presented was recently published in the EMISA 2011 workshop proceedings. An augmented version of the paper is to appear in the EMISA journal. As a result of my talk, we will look into implementing the matching approach in APROMORE, which is a repository of process models that is maintained by the BPM group at QUT. As the repository contains thousands of models, it will give us a great opportunity to further evaluate our matching approach in terms of runtime performance. In addition, implementing the algorithm in APROMORE will increase the visibility of the research we have been doing in Muenster. After my presentation some attendees pointed out that SQL might better be suited to retrieve patterns in conceptual models, provided the models are stored in an appropriate relational database. Although I doubt SQL has the same expressive power as our approach, using SQL in this manner will be an interesting research question to look into.

Abstract of the paper I presented: Identifying structural patterns in conceptual models serves a variety of purposes ranging from model comparison to model integration and exploration. Although there are a multitude of different approaches for particular modelling languages and application scenarios, the modelling community lacks an integrated approach suitable for conceptual models of arbitrary languages and domains. Therefore, a generic set-theory based pattern matching approach has recently been developed. To prove that this approach is beneficial in terms of performance, we conduct a statistically rigorous analysis of its runtime behaviour. We augment the original approach to include a caching mechanism that further increases performance. We are able to show that the original algorithm is able to identify arbitrary patterns within milliseconds. The caching extension further increases performance by up to fifty per cent given the model base and patterns we used.

In May, I followed up on the feedback I received during my talk. I explored various possibiltites of using SQL to query a graph database. The obvious drawback of using a relational database to store conceptual models is that you always assume a particular structure of the underlying database schema. In this case, however, this structure is rather straightfortward, as a model graph consists of nodes and edges that need to be stored in separate tables. The schema thus contains two tables, one for nodes and one for edges. Given this schema, it is possible to create SQL queries that represent the set-altering functions and operators offered by our pattern matching approach. However, in many application scenarios of pattern matching in IS, we are interested in finding substructures in a graph that are not exactly identical but similar to a given pattern graph. In our matching approach, this can be achieved with the Paths-functions that calculate all paths form a given start node to a given target node. These functions place no restrictions on the length of the paths, thus returning paths of arbitrary length. Paths of nodes can also be calculated using SQL by joining the node and edge tables. However, as demonstrated by Sakr (2009), the lenght of these paths is determined by the number of joins in the SQL query. If one wishes to find arbitrary paths (a.k.a. similar substructures), one has to find a sensible upper limit on the number of joins. As no path can be longer than the total number of nodes in the graph, the number of joins is limited by the number of nodes in the graph. Considering that large process models may easily contain hundreds of nodes, one would have to join the node and edge tables hundreds of time to calculate arbitrary paths. I strongly doubt that a corresponding SQL query will return results faster than our matching algorithm. My initial concerns about the expressive power of SQL have therefore proven true. It is not directly possible to determine arbitrary paths with SQL. It is only possible by guessing an upper limit of the number of joins. This, however, will increase runtime performance significantly. My stay at QUT has consequently provided my with a valuable opportunity to get some feedback to the research we are conducting at Muenster. This feedback has confirmed me in my belief that we are very much on the right track with the pattern matching approach we have developed.

In May and June, I started implementing the pattern matching approach in APROMORE. I have realized how closely related the approach is to the generic meta-model we used stating that any modelling language can be represented by the set of its element types. On model level, this corresponds to a model as the set of its elements. Hence, the set theoretical approach of the matching algorithms. As APROMORE is based on a different meta-model, the matching algorithm cannot be transferred to it as easily as I had emagined. The work will therefore continue beyond the course of the NSS project. If the matching algorithm will return results faster on the APROMORE database than on the database we have used so far also will be part of mid-term research.

On June 4, we submitted a paper to a BPM workshop on process model repositories. The workshop is held in conjunction with the international conference on business process management. The workshop proceedings are editted by Prof Marcello La Rosa of QUT. Marcello was kind enough to read our paper and provide some valuable feedback. Marcellos pointers greatly improved our paper and thus our chances of acceptance. The paper describes the results of an extensive runtime analysis of four graph theoretical algorithms for pattern matching. We applied these algorithms to a large set of process and data models. The runtime data suggests that all algorithms are capable of detecting patterns within fractions of a second. In this paper, we furthermore discuss the relative performance of these algorithms and their applicability in practice. My stay in Brisbane has provided us with the opportunity to discuss this work with some of the most renowned researchers in the field of BPM. This helped us a great deal in improving our work.

#### Blog

Talk of Dr. Stefan Seidel on Green IS

published on 06/19/2012 - 01:55

On May 31, Dr. Stefan Seidel gave a talk on a nearly finished piece of work on Green IS. The paper has been under review at MISQ for almost 1,5 years. Stefan shared a few of his insights on pushing a paper through the review rounds of one of the most highly ranked journals of our discipline. Although, neither the topic nor the research approach are similar to my work, it was nevertheless interesting to gain a few insights into Stefan's experiences.

Arrival in Brisbane and first day of work

published on 04/02/2012 - 07:19

Saturday, I arrived in Brisbane after 20 hours of flight. I got to our appartement by 9 o'clock in the morning where Ralf greeted me with a nice breakfast. Having rested for a little while, we went for a walk in Brisbane. The city is magnificent. Yesterday, we went to the Koala Sanctuary and did what every tourist does: spend a fortune on a picture of oneself with a Koala. Today is my first official work day. As I am still quite a bit jetlagged, I am not doing much. I just collected my staff ID card and got my QUT username and password.