49
20
This is Fortnightly Challenge #6. Theme: Language Design
There's a chatroom for this challenge. Come and join us if you want to discuss ideas!
And now for something completely different...
This fortnight, we want to experiment with a new type of challenge. In this challenge, you will be designing a language! Pattern matching is a very common problem in programming, and often very useful for code golf. Regular expressions, for example, can be used to detect a pattern in a line of text. There are not, however, any established methods to describe and detect two-dimensional patterns.
The Challenge
You're to design a pattern-matching language, which allows the description of two-dimensional patterns in blocks of text. The mode of operation of your language will be similar to regular expressions (although your language doesn't have to have anything in common with regex otherwise):
- As input, you will receive a rectangular block of text. You may assume that the text consists only of printable ASCII characters (0x20 to 0x7E) as well as newlines (0x0A) to separate the rows of the grid.
- If a match, according to the pattern description, can be found as any subset of this block of text, this match should be returned or printed. If the match can be non-rectangular, it should be padded to a rectangular area with some reserved character. If there are several valid matches, you may decide how the returned match is chosen (largest, smallest, first, etc.).
For some applications it might be useful if your implementation can return the position of a match instead of the match itself, but this is not a requirement.
At the very least, your language should be able to match a pattern as a contiguous, rectangular subregion of its input.
Your answer should contain:
- A description of the language.
- A working implementation. This can be a program, or a set of functions/classes in a language of your choice.
- You should demonstrate your language by showing how it can be used to solve the examples provided below. Your language does not have to be capable of matching all of them, but you must be able to match at least 8 of these. If your language can do something fancy we didn't think of, feel free to include it as well.
If your answer builds on existing ideas, that is fine, but please give credit where it's due.
Extensions
The above describes the minimum that a valid submission has to satisfy. However, several generalisations could make such a pattern matching language even more useful, including but not limited to:
- Being able to anchor the pattern to one or more edges, so that one can check if the entire input region has a certain pattern.
- Producing all matches instead of just one. You may choose the semantics of overlapping matches.
- Taking non-rectangular text as input.
- Allowing patterns to specify non-rectangular matches. In such a case, the output should be padded to a rectangle with some reserved character.
- Allowing patterns to specify matches with holes.
- Allowing non-contiguous matches, like two characters appearing with a certain offset.
- Easy specification of rotations and reflections.
- Optionally treat the input cyclically as a cylinder or a torus, such that opposite edges are considered adjacent.
Scoring
The primary goal of this challenge is to produce an effective 2D pattern-matching language which could potentially be used in the future. As such, a scoring system like "shortest combined length to solve the examples" would lead to hard-coding of certain functionality at the expense of general usability. Therefore, we have decided that this challenge is best run as a popularity contest. The submission with the most net votes wins. While we can't force how people vote, here are a few guidelines for what voters should ideally look for:
- Expressiveness. Can the language solve a variety of problems, even beyond the examples presented in this question? Does it support any of the suggested extensions?
- Readability. How intuitive is the notation (at least to people who know the basic syntax)?
- Golfitude. This is still CodeGolf.SE. For the purposes of this site, it would of course be nice to have a matching language that needs very little code to describe a pattern.
Example Problems
The following Stack Snippet shows 16 example problems which a 2-D pattern matching language could be able to deal with. Each example contains a short problem description and is then usually followed by one input example where a match can be found and one example where no match can be found (if applicable).
As stated above, your language only needs to be able to solve 8 of these problems. Everything on top of that is optional, but should of course increase the number of votes you get.
body{font-family:'Helvetica Neue',Arial,sans-serif;color:#444;font-size:13px;width:500px;line-height:1.3}h3{font-size:16px!important;line-height:1.2em!important;margin-bottom:1.2em}code{white-space:pre-wrap;padding:1px 5px;font-family:'Droid Sans Mono',Consolas,Menlo,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New,monospace,serif;color:#222;background:#eee}p code{padding:1px 5px}pre{overflow:auto;width:auto;width:480px !ie7;max-height:600px;font-family:'Droid Sans Mono',Consolas,Menlo,Monaco,Lucida Console,Liberation Mono,DejaVu Sans Mono,Bitstream Vera Sans Mono,Courier New,monospace,serif;margin-bottom:10px;padding:5px;background:#eee;border-left:2px dotted #ccc;font-size:12px}pre code{white-space:inherit;padding:0;background:0 0}
<h3>Problem 1 - Finding Chessboards</h3><p>Match any rectangular subregion, at least 3 columns wide and 3 rows tall, which consists of alternating <code>#</code> and <code>_</code>. The top left corner may be either of those two characters.</p><p><strong>Match</strong></p><pre><code>~______~ ~##_#_#~ ~#_#_##~ ~##_#_#~ ~______~ </code></pre><p>(There's an alternating 4x3 region in the centre. The match could be either that or either of its two 3x3 subregions.)</p><p><strong>No match</strong></p><pre><code>#_## _#_# __#_ #_#_ #_#_ </code></pre><p>(Contains two 3x2 regions, but no alternating 3x3 region.)</p><h3>Problem 2 - Verifying Chessboards</h3><p>Match the entire input, provided all of it consists of alternating <code>#</code> and <code>_</code>. It may start with either of those two characters.</p><p><strong>Match</strong></p><pre><code>_#_#_#_# #_#_#_#_ _#_#_#_# </code></pre><p><strong>No match</strong></p><pre><code>_#_#_#__ __#_#_#_ _#_#_#__ </code></pre><h3>Problem 3 - Detect a Rectangle of Digits</h3><p>Match a rectangle (at least 2x2) consisting only of digits and no other characters.</p><p><strong>Match</strong></p><pre><code>hbrewvgr 18774gwe 84502vgv 19844f22 crfegc77 </code></pre><p>You should match either the 5 wide by 3 tall rectangle (or any subset thereof) or the 2 by 2 rectangle.</p><p><strong>No Match</strong></p><pre><code>uv88wn000 vgr88vg0w v888wrvg7 vvg88wv77 </code></pre><p>There are no rectangles of digits.</p><h3>Problem 4 - Finding a Word in a Word Search</h3><p>Match the smallest rectangular region containing containing the word "GOLF" in any orientation (horizontal, vertical, diagonal, forwards or backwards). If your language supports non-rectangular matches, you may also match diagonal words without the surrounding rectangle.</p><p><strong>Match</strong></p><pre><code>INOWCEF IFWNOPH VULUHGY GUYOIGI YTFUGYG FTGYIOO </code></pre><p>"GOLF" is found backwards along an upper-left to bottom-right diagonal. This is the region containing it:</p><pre><code>FWNO ULUH UYOI TFUG </code></pre><p><strong>No Match</strong></p><pre><code>BHTGIVUHSR BWEVYWHBWB BHTWBYTWYB </code></pre><p>("GOLF" cannot be found anywhere.)</p><h3>Problem 5 - Detect Square Inputs</h3><p>Match the entire input if it is a square block of characters.</p><p><strong>Match</strong></p><pre><code>qwerty asdfgh zx vbn uiop[] `1234 67890- </code></pre><p>There are six lines, each of which contains six characters, even though two characters are spaces.</p><p><strong>No Match</strong></p><pre><code> hello world </code></pre><p>The two lines don't each contain two characters.</p><h3>Problem 6 - Find Gliders in a Game of Life</h3><p>In Conway's <a href=http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life rel=nofollow>Game of Life</a> one of the most popular and simple patterns is the <a href=http://www.conwaylife.com/wiki/Glider rel=nofollow>glider</a>. There are two different stages in a glider's life cycle:</p><pre><code>## ## # # and ## # # </code></pre><p>Of course, the Game of Life is invariant under rotation and reflection, so in total there are 16 different shapes which resemble a glider at some stage.</p><p>Given input consisting only of <code>#</code> and spaces, match a 3x3 block containing a glider in any orientation. Spaces are significant! (Due to the conditions in the surrounding 5x5 layer, these matches might not be actual gliders, but don't worry about that.)</p><p><strong>Match</strong></p><pre><code>## # # ## # # # # # ## ### # # # # </code></pre><p>This contains three gliders - top right corner, bottom right corner and left centre.</p><p><strong>No match</strong></p><pre><code>## # # ### ## # # # # ## ### </code></pre><h3>Problem 7 - Match Nether Portals</h3><p>Based on <a href=http://codegolf.stackexchange.com/questions/45488/nether-portal-detection>this challenge</a> by Calvin's Hobbies.</p><p>In the game of Minecraft, players can construct inter-dimensional portals using blocks of obsidian arranged into a rectangular frame. Given a 2D slice of the world, match a Nether portal. A valid Nether portal is a rectangle of empty space (<code>.</code>) surrounded on all sides by obsidian (<code>X</code>). The rectangle of space can be from 2 wide by 3 tall to 22 wide by 22 tall. Corners don't matter.</p><p><strong>Match</strong></p><pre><code>....X...... .XXXXXX.XX. ...X...X... .X.X...XXX. ...X...X.X. .XXX...X.X. X..XXXXX.X. </code></pre><p>This is a 3 wide by 4 tall portal.</p><p><strong>No Match</strong></p><pre><code>XX..XXXX XX..X..X XX..X..X ..X.X..X .X..X.XX </code></pre><p>This is almost a 2 wide by 3 tall portal, but one obsidian block is missing from the bottom.</p><h3>Problem 8 - Minecraft Chest Placement</h3><p>Based on <a href=http://codegolf.stackexchange.com/q/45720/8478>this challenge</a> by Calvin's Hobbies.</p><p>For this problem, returning the position of the match might be more useful than returning the actual match. "Adjacent" refers only to neighbours in four orthogonal directions. You're given a grid of <code>.</code> (empty cell) and <code>C</code> (chest). Two adjacent chests are considered a "double chest". You should match valid positions for placing additional chests. An empty cell is valid as long as it not adjacent to a double chest or two normal chests. Taking the example from the linked challenge, if the input is</p><pre><code>.......C.. ...C..C... .........C .CC...CC.. .......... </code></pre><p>the pattern should match the cell marked by <code>*</code> in the following grid:</p><pre><code>******.C** ***C**C.** *..***..*C .CC.*.CC.* *..***..** </code></pre><h3>Problem 9 - Horizontal and Vertical Alignment</h3><p>Match a part of a column or row if it contains two or more <code>#</code> characters. As an extension, you could return the index of the column or row. If your language supports non-contiguous matches, match <em>only</em> the two <code>#</code>, not the row/column in between.</p><p><strong>Match</strong></p><pre><code>.,.,.,.#., ,.,#,.,.,. .,.,.,.,., ,.,.,.,.,. .,.#.,##., ,.,.,.,.,. </code></pre><p>There are 5 possible matches, three horizontal or two vertical. The horizontal matches are <code>#.,#</code>, <code>##</code>, and <code>#.,##</code>. The vertical matches are <code>#,.#</code> and <code>#.,.#</code>. The language can match any one or more of those.</p><p><strong>No Match</strong></p><pre><code>.,.#.,., ,.,.,.#. .,#,.,., ,.,.,.,# .#.,.,., ,.,.#.,. #,.,.,., ,.,.,#,. </code></pre><p>This is also a solution to the Eight Queens problem.</p><h3>Problem 10 - Collinear Points</h3><p>Match a set of three <code>#</code>s that are in a straight line, which can have any orientation, not necessarily horizontal or vertical (i.e. it could be diagonal are long knight's move steps). If your language supports non-contiguous matches, match <em>only</em> the three <code>#</code>, not the characters in between.</p><p><strong>Match</strong></p><pre><code>........ #..#..#. ...#.... #....... ...#.... </code></pre><p>There are three possible matches. They are: the second row, the fourth column, and a diagonal line across 7 columns and 3 rows.</p><p><strong>No Match</strong></p><pre><code>.#..# #..#. #.... ..#.# </code></pre><p>There are no collinear <code>#</code>s in this input.</p><h3>Problem 11 - Verify Prelude Syntax</h3><p><a href=http://esolangs.org/wiki/Prelude rel=nofollow>Prelude</a> is an esoteric language, which has very few, but unusual, restrictions on what constitutes a valid program. Any block of printable ASCII text is valid provided that:</p><ul><li>Every column of text contains at most one of <code>(</code> and <code>)</code>.</li><li>Ignoring their vertical position, the <code>(</code> and <code>)</code> are balanced. Each <code>(</code> and be paired with exactly one <code>)</code> to the right of it, and vice-versa.</li></ul><p>The pattern should match any valid Prelude program and fail to match invalid programs.</p><p><strong>Match</strong></p><pre><code>?1-(v #1)- 1 0v ^(# 0)(1+0)#)! (#) ^#1-(0 # </code></pre><p><strong>No match</strong></p><pre><code>#(#(##)##)##( )##(##(##)#)# </code></pre><p>For more examples, see <a href=http://codegolf.stackexchange.com/q/47239/8478>this challenge</a>.</p><h3>Problem 12 - Avoid the Letter Q</h3><p>Match any 4x4 square of characters that does not contain the letter <code>Q</code> or <code>q</code>.</p><p><strong>Match</strong></p><pre><code>bhtklkwt qlwQklqw vtvlwktv kQtwkvkl vtwlkvQk vnvevwvx </code></pre><p>There is one 4x4 square that does not contain any Qs. This is</p><pre><code>vlwk twkv wlkv vevw </code></pre><p><strong>No Match</strong></p><pre><code>zxvcmn xcvncn mnQxcv xcvmnx azvmne </code></pre><p>The single <code>Q</code> in the center prevents any matches from being formed.</p><h3>Problem 13 - Diamond Mining</h3><p>Locate a diamond in a field of characters. Diamonds are rotated squares formed by the characters <code>/\X</code>. Each corner of the diamond is formed by an <code>X</code>, and the corners are connected in lines formed by <code>\</code> or <code>/</code>, where each side is the same length. A diamond of size 0 has no <code>/\</code> sides. As an extension, you can return all of the diamonds in the field, return only the diamond's characters, and/or return the size of the diamond found.</p><p><strong>Match</strong></p><pre><code>...X......X.... ../.\..../.\... ./.X.\..X...X.. X.X.X.XX.\./.\. .\.X.//.\.X...X ..\./X...X.\./. .X.X..\./...X.. X.X....X....... .X............. </code></pre><p>There are 6 different diamonds in this field. Two are size 0, three are size 1, and one is size 2. Notice how diamonds can overlap parts or nest. You should be able to match diamonds of any size up to a reasonably high limit.</p><p><strong>No Match</strong></p><pre><code>.X......./.... .\....X....... ...X.\.\...X.. ..X.\...\.X.\. ...X.X...X.\.X ../X\...\...X. .X...\.\..X... ..\./.X....X.. ...X..../..... </code></pre><p>No diamonds are found here.</p><h3>Problem 14 - Matching Crosses</h3><p>We'll define a cross as a rectangular region, which has been sliced into a (not necessarily regular) grid of 3x3 subregions. The four corner regions are filled with <code>.</code> whereas the other five regions are filled with <code>#</code>.</p><p><strong>Match</strong></p><pre><code>....... .###... ######. ######. .###... .###... .###.#. ....### .....#. </code></pre><p>There is the small cross in the lower right corner, and the large cross yields 5 potential matches: the top and left arms will always have length 1, but the right and bottom arms can each either have length 1 or 2 - in addition the bottom arm can have length 3 if the right arm has length 1. Note that the full large cross cannot be matched because the top of the small cross would protrude into the lower right region.</p><p><strong>No Match</strong></p><pre><code>.######. ...##... ...##... ........ </code></pre><p>Each arm must have length of at least 1.</p><h3>Problem 15 - Match a Word in a Boggle Board</h3><p><a href=http://en.wikipedia.org/wiki/Boggle rel=nofollow>Boggle</a> is a bit like a word search, except that you change direction after each letter. Given a block of text, if it contains the word <code>panama</code> (<em>case-insensitive!</em>) according to the Boggle rules, match the smallest rectangle containing it. You may decide whether single letters can be reused or not. If your language can handle both, show it off! If you allow letters to be reused you can also choose if a letter can be used twice in a row (i.e. whether staying on the same cell is a valid move). If your language supports non-rectangular matches, match <em>only</em> the word.</p><p><strong>Match without reusing letters</strong></p><pre><code>EmYPhNuy AaaKVsjL onlGviCw DdOgFrRn ISeHZmqc zMUkBGJQ </code></pre><p>(There's a snaking <code>panamA</code> or <code>panAma</code> towards the upper left corner, depending on the order it's matched in.)</p><p><strong>Match with reusing letters</strong></p><pre><code>ExYPhNuy AfEKVsjL oblGviCw DdOgArRn ISepnmqc zMUkBGJQ </code></pre><p>(There's a <code>pAnAmA</code> towards the lower right corner, where all three <code>A</code> are the same.)</p><p><strong>No Match</strong></p><pre><code>BpbrzTHY mAJVRLuF jyXSPknK hoeGcsEl QCZagNMI dvUOqixt </code></pre><h3>Problem 16 - Wrap around the Edges</h3><p>Match a rectangle of <code>#</code>, at 3x3 in size, which may wrap around the edges of the input.</p><p><strong>Match</strong></p><pre><code>#..## #..## ..... #..## </code></pre><p>(The 9 <code>#</code> form exactly one 3x3 region if you consider the edges to be adjacent.)</p><p><strong>No Match</strong></p><pre><code>...## #..## #..## #..#. </code></pre>
(No, you don't need to read that HTML code. Hit the "Run code snippet" button to get it nicely rendered by your browser, which you can then also view full screen.)
Is there a general time limit on these problems? I'm very interested in solving this but I'm very busy, it could easily take me 2 weeks to do. – Devon Parsons – 2015-03-03T20:38:18.353
7@DevonParsons There is no entry deadline. – PhiNotPi – 2015-03-03T20:42:57.933
Looks interesting, especially since you created a new tag for this. I think there should be bonus points for creating a 2-D language for it. – mbomb007 – 2015-03-03T21:36:08.933
1@mbomb007 There are bonus points for creating a 2-D language. I'm pretty sure it would get a decent amount of upvotes. ;) – Martin Ender – 2015-03-03T21:42:04.417
@MartinBüttner I don't even know how to create a language, really. Could it be something as (simple?) as creating a Python program that takes a file of your new language's code (and interpreting/executing it based on your defined syntax) and producing output? – mbomb007 – 2015-03-03T21:44:14.473
@mbomb007 Yes, exactly that. You need to write a parser for the language and then process some input text block based on what you parsed. – Martin Ender – 2015-03-03T21:47:51.633
@MartinBüttner In that case, I'd like to give it a go. If I succeed or get close, I'll probably want help improving efficiency or with suggestions after the fact, since I'm only an intermediate-level programmer in Python. I tried to understand the Python interpreter for FlogScript, and I didn't get very far. – mbomb007 – 2015-03-03T21:56:08.643
Is the chatroom for this question still open/active? I've got some ideas I want to throw out, but it's not really complete. – luser droog – 2015-03-04T08:36:28.570
@luserdroog Yes it's a alive and well (I see you've already found it, but others might be interested, too).
– Martin Ender – 2015-03-04T13:06:38.630@MartinBüttner @PhiNotPi In the nether portal problem, can you clarify what part of the input should be matched (the obsidian? the enclosed space?) In the Boggle problem, can the same character be used twice in a row (e.g., does the input
A
contain the wordAA
?) – Ell – 2015-03-07T14:47:46.353@Ell For the nether portal, it's enough to match the interior, because the borders might overlap. If your language supports overlapping matches, feel free to include the borders. For the boggle problem, I'd leave that up to what your language can handle. – Martin Ender – 2015-03-07T14:53:10.227
@MartinBüttner @ PhiNotPi Will it be ok to post the spec first and add the implementation later? I'd rather not write a rushed implementation. – Ell – 2015-03-07T18:34:36.817
@Ell Without an implementation it seems hard to judge whether the design actually works as advertised (both for you and others). Since this question isn't likely to hit HNQ anyway at this point, I don't think you lose anything by holding off the post until you've got at least a working prototype. If you want to share the design to get feedback, you're welcome to do so in the chatroom linked at the top of the challenge. :) – Martin Ender – 2015-03-07T18:39:17.430
Bit late, but I feel like collinear points could do with a test case where the distance between hash A and hash B isn't a multiple of the distance between hash C and hash D, e.g.
#...#..#
. Also, it'd be interesting to have more than three hashes in a line :) – Sp3000 – 2015-04-04T16:48:05.300Late to the party, I'm looking for an API or interactive user interface for very lightweight python programmers, that they can be taught use instead of regular expressions. Would you particularly recommend any of this contest's implementations? the target audience are researchers who do not need to become programmers, but do not to work on extracting surface-form stuff from texts for some of their research. Regex is just too obtuse for such audience. – matanster – 2018-06-08T11:41:48.903