Opinion
Systems and Networking Last byte

String Wars

Posted
  1. Article
  2. Author
  3. Footnotes
String Wars, illustration

Suppose someone gives you two strings: X and Y. Your goal is to design a minimum-cost collection of smaller strings coll (X|Y) that match and cover every character of string X With order independence without matching any substring of string Y.

Let us first break down that last sentence:

The collection of strings in coll(X|Y) may have duplicates;

Matching and covering every character of string X means the strings in coll(X|Y) should tile string X without overlaps or gaps, and every tile should exactly match an underlying substring of X;

Not matching a substring of string Y means there should be no exact match of any string in coll(X|Y) to any substring of string Y; and

Order independence means no matter in which order the strings of coll(X|Y) is introduced and where they match, X will be tiled once the last string in coll(X|Y) is introduced.

When it satisfies all these properties, coll(X|Y) is called a “proper covering of X with respect to Y.” The cost of coll(X|Y) is the sum of the squares of each element in the collection, including the duplicates.

Here is a simple example to get started. If string X is aaaaaaaaaa and Y is bbbbbbbbbbbb, then coll(X|Y) consisting of 10 instances of “a” will be a proper covering. No instance of “a” will match any substring (letter, in this case) in Y. Coll(X|Y) is order-independent since the elements of Coll(X|Y) can be introduced in any order; all are just the single letter “a” after all. Further, the (total) cost is 10, because each “a” costs 1.

Warm-Up 1. Continuing with this example, suppose X were ababababab and Y were aaaaaaaaaa. What would be a proper covering of X with respect to Y?

Solution to first warm-up. Five strings that are “ab” yielding a total cost of 20.

Warm-Up 2. Suppose X were abaababab and Y were bbababba. What would be a proper covering for X with respect to Y?

Solution to second warm-up. coll(X|Y) = {abaa, babab}. Breaking up either of these strings into shorter strings would entail some matches with Y.

Challenge. Given the scenario of the first warm-up, what would be a minimum-cost collection coll(Y|X) for Y that would cover Y with respect to X?

Solution. Note that five instances of “aa” would not be an order-independent cover of Y with respect to X. The reason is that, for example, one “aa” might match the second and third letters of Y, thus preventing a tiling, because no element would cover the first letter of Y. In fact, only coll(Y|X) = “aaaaaaaaaa” would work. That would have a cost of 10 × 10 = 100.

We see that an inexpensive order-independent covering of X may not work when elements of the covering might match Y. This brings up the possibility that an adversary—perhaps nature in the motivating use case of molecular biology—might create a Y that would greatly increase the cost of covering X.

String War Challenge. With respect to X = abaabaabaaba, the red string in the figure here, can you design a string Y of length 12 that can beat X? That is, we seek a Y such that the minimum-cost proper covering of Y with respect to X costs less than the minimum-cost proper covering of X with respect to Y.

uf1.jpg
Figure. A minimum-cost proper covering of the red string of characters with respect to the blue string is aba, aba, aba, aba for a cost of 4 × 9 = 36. A minimum-cost proper covering of the blue string with respect to the red string is abba, bbba, bbab for a cost of 3 × 16 = 48. The red string thus “beats” the blue string. Can you find a string that beats the red string?

Solution. Y = bbbbbabbbaba. coll(Y|X) = {bbbbba, bbbaba} having cost 36 + 36 = 72. coll(X|Y) = {abaabaabaaba} having cost 144.

String War Upstart. Given an X, can you always design a Y of the same length as X such that Y beats X? If so, design an algorithm to do so. Can you also design an algorithm to give a maximal difference in cost?

Back to Top

Back to Top

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More