Please note

I left my position at Goethe University in March 2015. This page is outdated. Current information (including downloads) can be found on my main page.

Old stuff

sore-inf

Introduction

sore-inf is a tool that learns deterministic single occurrence regular expressions (SOREs) from positive examples. This is an implementation of the learning algorithms from the paper that Timo Kötzing and I published at ICDT 2013 (paper, free preprint), containing bugfixes from the upcoming journal version (mail me for a preprint). For further explanation, see the paper. If you want to learn DTDs, use the dtd-inf tool.
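To make the term concrete: a regular expression is "single occurrence" if every alphabet symbol appears in it at most once. The following is a minimal sketch of that syntactic check, not code from sore-inf itself. The function name is_single_occurrence is hypothetical, and the sketch assumes that symbols are single letters or digits and that every other character is an operator or a parenthesis.

```python
from collections import Counter


def is_single_occurrence(expression: str) -> bool:
    """Return True if no letter or digit occurs more than once.

    Assumption (not from sore-inf): symbols are single alphanumeric
    characters; all other characters are treated as operators.
    """
    counts = Counter(ch for ch in expression if ch.isalnum())
    return all(n == 1 for n in counts.values())


if __name__ == "__main__":
    print(is_single_occurrence("a(b|c)*d"))  # every symbol occurs once
    print(is_single_occurrence("(a|b)*a"))   # the symbol a occurs twice
```

Under these assumptions, the first expression passes the check and the second does not, because a occurs twice.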

Installation

You need Python 3 on your computer (Python 2 will not work). Download the package and unpack it; you can then run python3 sore-inf.py --help. (Depending on your system, you can also give sore-inf.py executable rights and run it directly.)

Example usage

./sore-inf.py abc acb c
Computes a deterministic SORE for the sample consisting of the words abc, acb, and c. Note that the prettification algorithm uses character classes; for example, it writes [a-c] instead of (a|b|c).
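The character-class notation is only a shorthand. As a quick sanity check, the sketch below uses Python's standard re module (not sore-inf) to confirm that [a-c] and (a|b|c) accept the same words from a small test list. The helper name agree_on and the test words are illustrative assumptions.

```python
import re


def agree_on(pattern_a: str, pattern_b: str, words) -> bool:
    """Return True if both patterns accept exactly the same words.

    Hypothetical helper: it compares the two patterns only on the
    given words, so it is a spot check and not an equivalence proof.
    """
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    return all(
        bool(regex_a.fullmatch(w)) == bool(regex_b.fullmatch(w))
        for w in words
    )


if __name__ == "__main__":
    test_words = ["", "a", "b", "c", "d", "ab", "abc"]
    print(agree_on(r"(a|b|c)", r"[a-c]", test_words))
```

Both patterns accept a, b, and c and reject the other test words, so the check succeeds on this list.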

Implementation notes

Authors and license

The core inference algorithm was implemented by Dominik D. Freydenberger and uses this implementation of Tarjan's Algorithm by Dries Verdegem (which, to our knowledge, is in the public domain). The prettification algorithm is part of the M.O.D.O.D. library, which was designed (only for DREs) by Dominik D. Freydenberger and implemented by Christoph Burschka. The creation of the M.O.D.O.D. library was generously supported by the program "Nachwuchswissenschaftler/innen im Fokus" (Goethe University). Everything is released under the MIT License, and the source code is included in the package.