Algorithm Engineering
practical algorithms ▪ parallelism ▪ I/O efficiency ▪ randomization

Algorithm engineering is a development method that aims to turn theoretical algorithms and data structures into practical and efficient solutions with rigorous guarantees. While classical algorithm design often emphasizes asymptotic performance metrics, algorithm engineering is more concerned with the behavior of algorithms on actual machines, under practical constraints, and with real data.
A central tool is an iterative cycle of algorithmic design and analysis, interleaved with implementation and evaluation. In this setting, theoretical insights guide the implementation and experimental set-up to identify bottlenecks in practice. In turn, these empirical findings inform the next iteration to design improved algorithms based on more realistic metrics, models, or design assumptions.
Our group applies this method to a broad range of problems, including graph algorithms in Network Science and Cheminformatics, as well as scalable sampling processes. These projects often involve advanced models of computation to capture, for instance, parallelism and data locality.
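To illustrate the gap between asymptotic and machine-level performance that drives this cycle, here is a toy sketch of the kind of micro-benchmark used in the evaluation step (all function names are illustrative, not from any specific project): an O(log n) binary search can lose to an O(n) linear scan on small, cache-resident inputs.

```python
import random
import time

def linear_scan(xs, key):
    # O(n), but branch-friendly and sequential in memory
    for i, x in enumerate(xs):
        if x == key:
            return i
    return -1

def binary_search(xs, key):
    # O(log n) asymptotically, yet often slower for small n
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == key:
            return mid
        if xs[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def bench(algo, xs, keys):
    start = time.perf_counter()
    for k in keys:
        algo(xs, k)
    return time.perf_counter() - start

for n in (8, 64, 512, 4096):
    xs = sorted(random.sample(range(10 * n), n))
    keys = [random.choice(xs) for _ in range(10_000)]
    print(f"n={n:5d}  scan {bench(linear_scan, xs, keys):.3f}s  "
          f"bsearch {bench(binary_search, xs, keys):.3f}s")
```

Measurements like these feed back into the design iteration: the crossover point between the two strategies is a property of the machine, not of the asymptotic analysis.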
Differential Privacy
continual observation ▪ string indexing

Differential privacy is the de facto standard privacy definition in data analysis. In this setting, a trusted party collects data from users, analyzes it, and, instead of outputting the precise result, outputs a randomized approximation of the result such that the absence or presence of any single user is hidden. The main benefit of this definition is that it promises privacy even against adversaries with large computational power or additional background knowledge. The main challenge is to design algorithms that fulfill this privacy definition and at the same time provide, with high probability, good bounds on the error of the perturbed result.
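As a minimal illustration (the function names and toy data below are ours, not a specific library API), the classic Laplace mechanism answers a counting query with epsilon-differential privacy: a count has sensitivity 1, since adding or removing one user changes it by at most 1, so Laplace noise of scale 1/epsilon suffices.

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponentials is Laplace(0, scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    # Counting queries have sensitivity 1, so Laplace(1/epsilon) noise
    # yields epsilon-differential privacy for this single query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [23, 35, 41, 29, 57, 62, 18]           # toy data set
print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

The released value is close to the true count of 3, but its randomness hides whether any particular individual is in the data.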
In our group, there is work on differential privacy in the continual observation setting, where data evolves over time. Additionally, there is novel work on designing differentially private data structures for string indexing.
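A standard building block in the continual observation setting is the binary tree mechanism of Chan, Shi, and Song and of Dwork et al.; the sketch below is a compact, simplified rendering of it (class and parameter names are ours, and the noise calibration is deliberately coarse). Each released prefix sum aggregates only O(log T) noisy dyadic partial sums, which keeps the additive error polylogarithmic in the stream length.

```python
import math
import random

def laplace(scale):
    # Difference of two i.i.d. exponentials is Laplace(0, scale)
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

class BinaryMechanism:
    """Continual counting: after each bit x_t in {0,1}, release a noisy
    count of all items seen so far under epsilon-DP."""

    def __init__(self, T, epsilon):
        self.levels = max(1, math.ceil(math.log2(T))) + 1
        # Each item contributes to at most `levels` tree nodes, so
        # per-node Laplace(levels/epsilon) noise composes to epsilon-DP.
        self.scale = self.levels / epsilon
        self.alpha = [0.0] * self.levels   # exact partial sums per level
        self.noisy = [0.0] * self.levels   # their noisy counterparts
        self.t = 0

    def step(self, x):
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1   # lowest set bit of t
        self.alpha[i] = sum(self.alpha[:i]) + x   # merge finished intervals
        for j in range(i):                        # reset consumed levels
            self.alpha[j] = self.noisy[j] = 0.0
        self.noisy[i] = self.alpha[i] + laplace(self.scale)
        # The prefix sum at time t combines the set bits of t.
        return sum(self.noisy[j] for j in range(self.levels)
                   if (self.t >> j) & 1)

mech = BinaryMechanism(T=1000, epsilon=1.0)
stream = [random.randint(0, 1) for _ in range(1000)]
estimates = [mech.step(x) for x in stream]
```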
Online Algorithms
predictions ▪ advice complexity ▪ performance measures

Many real-world problems have an online component: some decisions must be made immediately, before all relevant data is available, and the focus is often on minimizing resource consumption. This relates to environmental protection and to resource management in general, facilitating cost-effective production and solutions. The field of online algorithms is devoted to designing such decision-making algorithms and rigorously analyzing their performance.
Online problems are optimization problems where the input arrives one request at a time, and each request must be processed without knowledge of future requests. As usual in Computer Science, we want to be able to determine which of several algorithms solves a problem best, and a popular technique in the context of online algorithms is competitive analysis: an algorithm is c-competitive if, on every request sequence, its cost is at most c times that of an optimal offline algorithm, up to an additive constant. The overall goal of a theoretical quality measure is to predict the behavior of algorithms in practice. In that respect, competitive analysis works well in some cases but, as pointed out by its inventors and others, fails to discriminate between good and bad algorithms in other cases.
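The classic ski-rental problem makes this concrete (prices and function names below are illustrative): renting costs 1 per day, buying costs B, and the season length is unknown. The break-even strategy, rent until the rent paid would reach B and then buy, pays at most 2 - 1/B times the offline optimum on every input.

```python
def online_cost(days, buy_price):
    # Break-even strategy: rent (1 per day) until total rent would
    # reach the buy price, then buy the skis.
    rent_days = min(days, buy_price - 1)
    cost = rent_days
    if days > rent_days:          # the season outlasted the renting phase
        cost += buy_price
    return cost

def opt_cost(days, buy_price):
    # The offline optimum knows the season length in advance.
    return min(days, buy_price)

B = 10
worst = max(online_cost(d, B) / opt_cost(d, B) for d in range(1, 100))
print(worst)   # 1.9, i.e. 2 - 1/B: the competitive ratio of the strategy
```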
Ever since its introduction, researchers have worked on refining competitive analysis, defining variants, or proposing measures based on other concepts to handle these hard cases. This has led us to develop new analytical tools such as the accommodating function, relative worst-order analysis, and online bounded analysis.
Advice complexity is an area of research where one measures how much knowledge of future requests is necessary for an online algorithm to achieve a given performance level, as defined by the competitive ratio. When this knowledge, the advice, is obtainable in practice, this leads to practical algorithms, called semi-online algorithms. On the other hand, each negative result establishes robust limitations that apply to a broad range of semi-online algorithms.
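Staying with ski rental as a toy example of an advice upper bound: a single advice bit, encoding whether the season will last at least B days, already lets a semi-online algorithm match the offline optimum (the sketch below is ours, for illustration only).

```python
def advice_cost(days, buy_price):
    # One advice bit: does the season reach the buy price?
    # Reading it, the algorithm either buys on day 1 or rents
    # throughout, so one bit of advice makes ski rental 1-competitive.
    long_season = days >= buy_price   # the advice bit
    return buy_price if long_season else days

assert all(advice_cost(d, 10) == min(d, 10) for d in range(1, 100))
```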
Extending the work on advice complexity, which can be viewed as studying online algorithms with predictions that are guaranteed to be accurate, we also consider predictions with uncertainty. Among other things, this opens the door to integrating machine-learning components with online algorithms. The overall focus is on establishing results that are good both with regard to consistency (the algorithm does well when the predictions are good) and robustness (the algorithm comes with reasonable guarantees even when the predictions are bad). The group has developed a complexity theory for online algorithms, with or without predictions, so that the relative hardness of online problems can be compared.
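A minimal sketch of the consistency/robustness trade-off, again for ski rental, following the well-known deterministic scheme of Purohit, Svitkina, and Kumar (NeurIPS 2018); the trust parameter lam and the function name are ours.

```python
import math

def prediction_cost(days, buy_price, predicted_days, lam=0.5):
    # lam in (0, 1]: smaller lam means trusting the prediction more.
    if predicted_days >= buy_price:
        buy_day = math.ceil(lam * buy_price)    # prediction: long season
    else:
        buy_day = math.ceil(buy_price / lam)    # prediction: short season
    # Rent until buy_day, then buy, if the season lasts that long.
    if days >= buy_day:
        return (buy_day - 1) + buy_price
    return days

# (1 + lam)-consistent: near-optimal when the prediction is accurate;
# (1 + 1/lam)-robust: bounded competitive ratio even when it is wrong.
print(prediction_cost(days=30, buy_price=10, predicted_days=40))
```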
String Algorithms
data structures ▪ indexing ▪ matching

A string is a sequence of elements from a predefined alphabet. The field of string algorithms and data structures is thoroughly studied and has a wide range of applications in computational biology and data analysis. Strings model all kinds of sequential data, ranging from large texts and DNA sequences to sequences of events. Since these data sets are very large, efficient algorithms for processing them are essential. Results from string algorithms have thus propagated to computational biology and other fields, making it possible to process ever larger amounts of data.
We work on variants of the string indexing problem, i.e., preprocessing strings into data structures that can answer pattern matching queries fast. While this is well studied for classic pattern matching problems, indexing for more complicated matching problems, for example matching involving several patterns or regular expressions, is an exciting and active research field.
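As a minimal sketch of classic string indexing (our own illustrative code, not a specific project's index), a suffix array sorts all suffixes of the text once; afterwards, every pattern query reduces to two binary searches, since the suffixes starting with the pattern form a contiguous range.

```python
def suffix_array(text):
    # O(n^2 log n) toy construction; production indexes use linear-time
    # algorithms such as SA-IS, but the queries below are identical.
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text, sa, pattern):
    # Two binary searches in O(|pattern| * log n) time locate the
    # contiguous SA range of suffixes that start with the pattern.
    n, m = len(sa), len(pattern)
    lo, hi = 0, n
    while lo < hi:                            # leftmost suffix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, n
    while lo < hi:                            # end of the matching range
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[left:lo])                # start positions of matches

text = "banana"
sa = suffix_array(text)
print(occurrences(text, sa, "ana"))           # [1, 3]
```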