Recoloring the Colored de Bruijn Graph

Indexed Dynamic Programming to boost Edit Distance and LCSS Computation

Adaptive Computation of the Discrete Frechet Distance

Compressed Range Minimum Queries

The most natural approach is to use string compression and construct a data structure for answering range minimum queries directly on the compressed string. We investigate this approach using grammar compression. We then consider an alternative approach. Even if S is not compressible, its cartesian tree necessarily is. Therefore, instead of compressing S using string compression, we compress the cartesian tree of S using tree compression. We show that this approach can be exponentially better than the former, and is never worse by more than an O(sigma) factor (i.e. for constant alphabets it is never asymptotically worse).

Maximal Motif Discovery in a Sliding Window

Faster Recovery of Approximate Periods over Edit Distance (short paper)

We improve time complexity of the fastest known algorithm for this problem of Amir et al. (TCS 2018) from O(n^{4/3}) to O(n log n). Our tool is a fast algorithm for Approximate Pattern Matching in Periodic Text. We consider only verification for the period recovery problem when the candidate approximate word-period P is explicitly given up to cyclic rotation; the algorithm of Amir et al. reduces the general problem in O(n) time to a logarithmic number of such more specific instances.

New structures to solve aggregated queries for trips over public transportation networks

In this paper, we present all that work and experimental results over synthetic trips generated from accurate schedules obtained from a real GTFS network description (from Madrid) to show the space/time trade-off of both approaches. We considered a wide range of different queries about the use of the transportation network such as counting-based/aggregate queries regarding the load of any line of the network at different times.

Faster and Smaller Two-Level Index for Network-based Trajectories

Fast and Effective Neural Networks for Translating Natural Language into Denotations

3DGraCT: A Grammar based Compresed representation of 3D Trajectories

This work presents a method for representing aircraft trajectories. The new method uses a compact data structure approach, and then, it is possible to directly query the compressed data without a previous decompression. In addition, in the same compressed space, the data structure includes several access methods to accelerate the searches.

Fast Wavelet Tree Construction in Practice

Compressed Communication Complexity of Longest Common Prefixes

Namely, if the longest common prefix has an LZ77 parse of z phrases, only O(lg z) rounds and O(lg l) total communication is necessary.

We extend the result to the natural case when Bob holds a set of strings B1, ..., Bk, and the goal is to find the length of the maximal longest prefix shared by A and any of B1, ..., Bk. Here, we give a protocol with O(log z) rounds and O(lg z lg k + lg l) total communication.

We present our result in the public-coin model of computation but by a standard technique our results generalize to the private-coin model. Furthermore, if we view the input strings as integers the problems are the greater-than problem and the predecessor problem.

Recovering, counting and enumerating strings from forward and backward suffix arrays

Early Commenting Features for Emotional Reactions Prediction

Towards a compact representation of temporal rasters

extends a compact data structure for raster data in order to manage time and thus, it is possible to query it directly in compressed form, instead of the classical approach that requires a complete decompression before any manipulation.

In addition, in the same compressed space, the new data structure includes two indexes: a spatial index and an index on the values of the cells, thus becoming a self-index for raster data.

[short paper] Block Palindromes: A New Generalization of Palindromes

Truncated DAWGs and their application to minimal absent word problem

On Extended Special Factors of a Word

Searching for a Modified Pattern in a Changing Text

There are real-world applications where the text is unchanged and the pattern is slightly modified at every query. However, in the current state- of-the-art, a new search is required if the pattern is modified. In this paper we present an algorithm that reduces this search time to be sublinear, for a variety of types of pattern modification - in addition to the insertion, deletion, and replacement of symbols, we allow copy-paste and delete substring operations.

We also make a step toward a fully dynamic pattern matching model by also supporting text changes, albeit much more modest than the pattern changes. We support dynamic pattern matching where symbols may also be either added or deleted to either the beginning or the end of the text. We show that we can support such a model in time O(log n) for every pattern modification or text change. We can then report all occ occurrences of P in the text in O(occ) time.

Optimal In-Place Suffix Sorting

Designing time/space-efficient suffix array construction algorithms has attracted significant attentions and considerable advances have been made for the past 20 years.

We obtain the \emph{first} in-place linear time suffix array construction algorithms that are optimal both in time and space for (read-only) integer alphabets.

Our algorithm settles the open problem posed by Franceschini and Muthukrishnan in ICALP 2007. The open problem asked to design in-place algorithms in $o(n\log n)$ time and ultimately, in $O(n)$ time for (read-only) integer alphabets with $|\Sigma| \leq n$.

Our result is in fact slightly stronger since we allow $|\Sigma|=O(n)$.

Besides, we provide an optimal in-place $O(n\log n)$ time suffix sorting algorithm for read-only general alphabets (i.e., only comparisons are allowed), recovering the result obtained by Franceschini and Muthukrishnan which was an open problem posed by Manzini and Ferragina in ESA 2002.

Trickier XBWT Tricks

and searching a dictionary. However, to build the trie structure for larger

collections of strings takes up a lot of memory. Since the eXtended

Burrows-Wheeler Transform (XBWT) is able to

compactly represent a labeled tree, it can naturally be used to succinctly

represent a trie. The XBWT also supports navigational operations on the trie,

but it does not support failure links. For example, the Aho-Corasick algorithm

for simultaneously searching for several patterns in a text

achieves its good worst-case time complexity only with the aid of failure links.

Manzini showed that a balanced parentheses sequence P can be

used to support failure links in constant time with only 2n+o(n) bits of

space, where n is the number of internal nodes in the trie. Besides

practical algorithms that construct the XBWT, he also provided two different

algorithms that construct P. In this paper, we suggest an alternative way

for constructing P that outperforms the previous algorithms.

Computing Burrows-Wheeler Similarity Distributions for String Collections

Longest Common Prefixes with k-Errors and Applications

Linear-Time Online Algorithm Inferring the Shortest Path from a Walk

It has been known that this problem is solvable in $O(n \log n)$ time when the targets are path or cycle graphs.

This paper presents an online algorithm for the problem of this restricted case that runs in $O(n)$ time, based on Manacher's algorithm for computing all the maximal palindromes in a string.

Longest Property-Preserved Common Factor

Efficient Computation of Sequence Mappability

Better heuristic algorithms for the Repetition Free LCS and other variants

In this paper we design and test several new approximation algorithms for RFLCS. The first algorithm, which we believe is the most important contribution, uses dynamic programming and on our tests performs noticeably better than the algorithms of Adi et al.. The second algorithm transforms the RFLCS instance into an instance of the Maximum Independent Set (MIS) problem with the same value of the optimum solution. Then, we apply known algorithms for the MIS problem. We also augment one of the approximation algorithms of Adi et al. and we prove that we achieve an approximation of factor $2\sqrt{\min\{|A|,|B|\}}$.

Finally, we introduce two variants of the LCS problem. For one of the problems, named \emph{Pinned Longest Common Subsequence (PLCS)} we present an exact polynomial time algorithm based on dynamic programming. The other variant, named \emph{Multiset Restricted Common Subsequence (MRCS)} is a generalization of RFLCS. We present an exact polynomial time algorithm for MRCS for constant size alphabet. Also, we show that MRCS admits a $2\sqrt{\min\{|A|,|B|\}}$ approximation.

The colored longest common prefix array computed via sequential scans

In this paper, we present a new conceptual data structure, the colored longest prefix array (cLCP), that allows to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-external memory. By using cLCP, we propose an efficient lightweight strategy to solve the multi-string ACS problem, that consists in the pairwise comparison of a single string against a collection of m strings simultaneously, in order to obtain m ACS induced distances. Experimental results confirm the effectiveness of our approach.