Implement named labels for assembly

Let's assume we've got an imaginary assembler. The assembler supports numerical labels. An infinite loop example:

:1
    jmp 1

Your task is to write a preprocessor for this assembler that supports named labels (instead of numerical ones, up to 8 characters long), so that each label name is replaced with its numerical value.

The assembler uses a dot (.) to express a character constant, for example:

mov dl, .A

This stores ASCII 65 ('A') in the register dl. The assembler also supports string constants in double quotes:

:20
db "Hello, world!"

To sum up, you need to handle character constants, and you must not perform any substitution inside quoted strings.

The label declaration and reference characters are up to you, but for the sake of completeness I will use $ to refer to a label and @ to declare a label.

A label's number may be any natural number other than zero.

You may assume that assembly on the input is perfectly valid.

Let's look at a few examples. Each input is followed by its expected output.

Input:

@loop
    jmp $loop

Output:

:1
    jmp 1

Input:

jmp $skip
mov dl, .@
@skip
cmp dl, .$

Output:

jmp 1
mov dl, .@
:1
cmp dl, .$

Input:

@x
db "$a @b"
@a
    jmp $a
    @b
        jmp $b

Output:

:1
db "$a @b"
:2
    jmp 2
    :3
        jmp 3

This is code-golf, so the shortest answer wins. Standard rules and loopholes apply.

You can safely assume that the input contains no numerical labels, and that label names consist only of the letters A-Za-z.

The label declaration and reference characters have to be different.
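For reference, the rules above fit into a short ungolfed preprocessor. This is a minimal Python sketch, not a golfed entry, assuming `@` declares and `$` references a label (the symbols used in the examples), that `;` starts a comment running to the end of the line (as clarified in the comments), and that labels are numbered from 1 in order of declaration. The names `TOKEN` and `preprocess` are hypothetical.

```python
import re

# Assumed syntax: "..." strings (no escapes, no newlines), .X character
# constants, ';' comments to end of line, @name declares, $name refers.
# Protected tokens come first in the alternation, so a '@' or '$' inside
# them is consumed as part of that token and never seen as a label.
TOKEN = re.compile(
    r'"[^"\n]*"'                             # string constant: copy verbatim
    r'|\.[^\n]'                              # character constant, e.g. .@ or .$
    r'|;[^\n]*'                              # comment up to end of line
    r'|(?P<kind>[@$])(?P<name>[A-Za-z]+)'    # label declaration or reference
)


def preprocess(src: str) -> str:
    """Replace @name declarations with :N and $name references with N."""
    numbers = {}

    # Pass 1: number the declarations in order of appearance. Scanning with
    # TOKEN skips any '@' inside strings, comments and character constants.
    for m in TOKEN.finditer(src):
        if m.group("kind") == "@":
            numbers.setdefault(m.group("name"), len(numbers) + 1)

    # Pass 2: rewrite labels and return protected tokens unchanged.
    def rewrite(m: re.Match) -> str:
        kind = m.group("kind")
        if kind is None:
            return m.group(0)
        n = numbers[m.group("name")]  # input is assumed valid (label declared)
        return f":{n}" if kind == "@" else str(n)

    return TOKEN.sub(rewrite, src)
```

Because strings, character constants and comments are matched by earlier alternatives of the same regex, each is consumed whole before any `@` or `$` inside it could be mistaken for a label.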

Krzysztof Szewczyk

Posted 2019-10-09T16:12:38.990

Reputation: 3 819

May the strings contain escaped double quotes? (e.g. db "this is \"bad\"") – Arnauld – 2019-10-09T16:20:13.103

@Arnauld nope, they can't. The example you've given is incorrect input so you don't need to deal with that. – Krzysztof Szewczyk – 2019-10-09T16:29:38.183

Can the named labels contain numbers? Besides a newline can they contain any ASCII character? Or even any character? – FryAmTheEggman – 2019-10-09T16:33:57.360

@Arnauld sorry, my mistake. – Krzysztof Szewczyk – 2019-10-09T16:50:54.147

@FryAmTheEggman you can assume just A-Za-z letters. – Krzysztof Szewczyk – 2019-10-09T16:51:10.217

@Arnauld no, they have to be different. – Krzysztof Szewczyk – 2019-10-09T16:51:40.410

May a single line contain both a string and a genuine reference to a label? – Arnauld – 2019-10-09T17:01:15.827

(Or maybe more generally: may a genuine reference to a label be followed by anything other than an end of line?) – Arnauld – 2019-10-09T17:11:37.970

@Arnauld yes, it can. – Krzysztof Szewczyk – 2019-10-09T17:27:54.867

You should specify what the behaviour of each of the examples is – Jo King – 2019-10-09T21:11:07.123

Can anything be on the same line as a label? – Jonathan Allan – 2019-10-09T23:07:54.200

Can newlines be inside quoted strings? – Jonathan Allan – 2019-10-09T23:09:25.903

Can quote characters appear anywhere for any other purpose? – Jonathan Allan – 2019-10-09T23:11:23.633

Answering the questions in order: yes, a comment starting with a semicolon and ending with a newline; no, they cannot; yes, you have to parse the syntax, not guess the semantics. – Krzysztof Szewczyk – 2019-10-10T05:14:13.410

Do we need to preserve comments unchanged? Also there is nothing about comments in the question itself. – Qwertiy – 2019-10-17T14:48:41.300

Answers

JavaScript (ES6), 96 bytes

Uses @label for a declaration and %label for a reference.

s=>s.replace(o=/^\s*@(.+)/gm,(_,s)=>':'+(o[s]=++n),n=0).replace(/".*?"|%(\w+)/g,(s,x)=>x?o[x]:s)

Try it online!

Commented

(alternate slash characters are used below to prevent the SE syntax highlighter from going mad)

s =>                  // s = input string
  s.replace(          // 1st pass:
    o =               //   assign the regex object to o; we'll use it to store the labels
    ∕^\s*@(.+)∕gm,    //   look for [start] + [optional whitespace] + '@' + [label]
    (_, s) =>         //   replace with:
      ':' +           //     ':' + label ID
      (o[s] = ++n),   //     increment n and store the label in o
    n = 0             //   start with n = 0
  )                   // end of replace()
  .replace(           // 2nd pass:
    ∕".*?"|%(\w+)∕g,  //   look for either ".*?" (non-greedily) or %label
    (s, x) =>         //   replace with:
      x ? o[x]        //     o[label] if x is defined,
        : s           //     or the original string otherwise
  )                   // end of replace()
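For readers who don't read golfed JavaScript, here is the same two-pass idea as an ungolfed Python sketch (the name `two_pass` is hypothetical). It deviates slightly from the answer, by choice: the indentation before a declaration is kept (the answer's `\s*` is part of the match and so gets replaced), and label names are matched as `[A-Za-z]+` rather than `.+`, so anything after a declaration on the same line survives.

```python
import re


def two_pass(src: str) -> str:
    """Ungolfed sketch of the two-pass idea: '@name' declares, '%name' refers."""
    labels = {}

    # Pass 1: number each declaration in order, keeping its indentation.
    def declare(m: re.Match) -> str:
        indent, name = m.groups()
        labels[name] = len(labels) + 1
        return f"{indent}:{labels[name]}"

    src = re.sub(r"^([ \t]*)@([A-Za-z]+)", declare, src, flags=re.M)

    # Pass 2: a quoted string matches the first alternative and is returned
    # unchanged, so a '%' inside it is never treated as a reference.
    def refer(m: re.Match) -> str:
        return m.group(0) if m.group(1) is None else str(labels[m.group(1)])

    return re.sub(r'".*?"|%([A-Za-z]+)', refer, src)
```

As in the original, strings need no special care in the first pass because a declaration is only recognized at the start of a line.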

Arnauld

Reputation: 111 334

Nice! Would you like to add an explanation? – Krzysztof Szewczyk – 2019-10-09T19:14:34.473

I might have stolen your RegEx idea (embarrassed face). – Night2 – 2019-10-09T19:32:37.267

@Night2 My RegEx! Where's my RegEx?! :p – Arnauld – 2019-10-09T19:41:59.363

/".*?"|%(\w+)/g, ported it to /".*?"|([@_]\w+)/ for my usage! – Night2 – 2019-10-09T19:43:30.877

Btw, invalidated: Doesn't work if there is whitespace at the start of the line. – Krzysztof Szewczyk – 2019-10-10T16:07:10.420

@KrzysztofSzewczyk Do you mean that there can be whitespace before a label declaration? – Arnauld – 2019-10-10T16:24:55.583

@Arnauld yes, quick fix - [ \t]* before the @. – Krzysztof Szewczyk – 2019-10-10T16:26:10.357

@KrzysztofSzewczyk This should be specified. – Arnauld – 2019-10-10T16:33:28.653

@Arnauld I thought it's implicit. Well, I'll state it in the question anyway. – Krzysztof Szewczyk – 2019-10-10T16:41:21.747

@KrzysztofSzewczyk FWIW, most assemblers I know of would throw a syntax error if there's some whitespace before a label declaration. – Arnauld – 2019-10-10T16:46:07.670

@Arnauld NASM doesn't for sure, GAS doesn't too. – Krzysztof Szewczyk – 2019-10-10T16:52:24.430

PHP (7.4), 143 bytes (previously 148)

-5 bytes by stealing RegEx idea from @Arnauld.

<?=preg_replace_callback('/".*?"|([@_]\w+)/',fn($m)=>$m[1]?[':'][$m[0][0]>A].($_GET[$w=substr($m[0],1)]?:$_GET[$w]=++$_GET[0]):$m[0],$argv[1]);

Try it online!

Define labels with @ and refer to them with _.

Captures any @<label> or _<label> which aren't inside double quotes ("...") and replaces them with a unique number for every unique label, starting from 1. Also adds the : when replacing labels that have @ before them.

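PHP's superglobal $_GET is used to store and access the last used ID and the unique ID of each label inside the arrow function (arrow functions capture outer variables by value, so a superglobal is needed to keep state between calls). $_GET[0] holds the last used ID and $_GET[<label>] holds the unique ID for <label>.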
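The single-pass strategy can be sketched ungolfed in Python (the name `one_pass` is hypothetical, and a plain dictionary stands in for `$_GET`). Since IDs are handed out on first sight, a label referenced before its declaration simply gets its number at the reference, and no second pass is needed.

```python
import re


def one_pass(src: str) -> str:
    """Sketch of a single-pass variant: '@name' declares, '_name' refers."""
    ids = {}  # label name -> ID; len(ids) plays the role of $_GET[0]

    def replace(m: re.Match) -> str:
        token = m.group(1)
        if token is None:             # quoted string: leave untouched
            return m.group(0)
        name = token[1:]
        if name not in ids:           # first sighting, declaration or reference
            ids[name] = len(ids) + 1
        prefix = ":" if token[0] == "@" else ""
        return f"{prefix}{ids[name]}"

    return re.sub(r'".*?"|([@_]\w+)', replace, src)
```

Numbers therefore follow the order of first appearance rather than the order of declaration (for example, `jmp _end` before `@start` and `@end` gives `end` the number 1). The challenge only asks for non-zero natural numbers, so either order is acceptable.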

Night2

Reputation: 5 484

Nice job! Quite small one. – Krzysztof Szewczyk – 2019-10-09T18:55:03.533

Retina 0.8.2, 97 bytes

ms(1`^@
:@
+`(.*^:(1*).*?^)@
$1:1$2@
^:1*
:$.&
\$(.+?)$(?<=(?=.*^:(\d+)@\1$).+)
$2
^(:\d+).*?$
$1

Try it online! Uses the provided symbols (could save 1 byte by using a different symbol). Explanation:

ms(

Run everything in both single-line and multiline mode, so that . matches newlines and ^ and $ match at the beginning and end of each line.

1`^@
:@

Prefix a : to the first label.

+`(.*^:(1*).*?^)@
$1:1$2@

Number the labels in unary.

^:1*
:$.&

Convert to decimal.

\$(.+?)$(?<=(?=.*^:(\d+)@\1$).+)
$2

Find all matching references and replace them with the number.

^(:\d+).*?$
$1

Delete the label names.

Neil

Reputation: 95 035

Python 3, 117 bytes (previously 136)

New and improved:

import re;d={}
f=lambda s:re.sub(r'(".+?")|([@$])(\w+)',lambda m:m[1]or d.setdefault(m[3],f":{m.end()}")[m[2]<='$':],s)

Try it online!

Uses the end position of a label's first match (m.end()) as its number. Matches never overlap, so every label gets a distinct, non-zero value.
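An ungolfed sketch of the position-based numbering in Python (the name `by_position` is hypothetical). It deviates from the golfed code in three small ways: the colon is chosen with an explicit `kind == "@"` test instead of slicing, empty strings are accepted by using `".*?"` in place of `".+?"`, and the dictionary is local to each call rather than global.

```python
import re


def by_position(src: str) -> str:
    """'@name' declares, '$name' refers; N is where the label first appears."""
    labels = {}  # label name -> ":N"

    def replace(m: re.Match) -> str:
        if m.group(1):                  # quoted string: copy unchanged
            return m.group(1)
        kind, name = m.group(2), m.group(3)
        # The end offset of the first occurrence is unique and never 0.
        tag = labels.setdefault(name, f":{m.end()}")
        return tag if kind == "@" else tag[1:]

    return re.sub(r'(".*?")|([@$])(\w+)', replace, src)
```

For `@loop` followed by `jmp $loop`, the declaration ends at offset 5, so both occurrences become 5. The numbers are large and not consecutive, which the challenge allows.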

Old code

import re;d={};n=[*range(9999)]
f=lambda s:re.sub(r'(".+?")|([@$])(\w+)',lambda m:m[1]or d.setdefault(m[3],f":{n.pop()}")[m[2]<='$':],s)

Try it online!

Labels start at 9998 and count downward, and some numbers may be skipped. It will handle about 5000 labels, which should be enough for any sane asm program; the range can be extended at a cost of 1 byte per order of magnitude.
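The skipped numbers come from the fact that Python evaluates the default argument of `dict.setdefault` before the call, so `n.pop()` runs for every occurrence of a label, not only the first. A tiny demonstration, with made-up label names:

```python
pool = list(range(9999))  # n=[*range(9999)] in the golfed version
d = {}

for name in ["loop", "loop", "end", "loop"]:
    # The default is built (and a number popped) on every call, even when
    # the key already exists, so the stored IDs are not consecutive.
    d.setdefault(name, f":{pool.pop()}")

print(d)          # {'loop': ':9998', 'end': ':9996'}
print(len(pool))  # 9995
```

This also explains the capacity estimate: if each label is declared once and referenced about once, every label consumes roughly two numbers, so 9999 numbers cover about 5000 labels.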

RootTwo

Reputation: 1 749

Perl 5 (-p), 57 bytes

s/^\s*\K@(\w+)/":".($$1=++$,)/mge;s/".*?"\K|\$(\w+)/$$1/g

Try it online!

  • $, : global variable used as counter
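  • s/^\s*\K@(\w+)/":".($$1=++$,)/mge : replaces each label declaration "@[label]" with ":[counter]", incrementing the counter and storing its value in a variable named after the label
  • ".*?"\K| : regex part used to skip strings between double quotes (\K makes the replaced part empty, so the string is kept as-is)
  • s/...\$(\w+)/$$1/g : replaces each reference with the value of the variable named after the label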

Nahuel Fouilleul

Reputation: 5 582

Hell yeah, that's what I've been looking for. Classy, write-only Perl snippet. – Krzysztof Szewczyk – 2019-10-17T14:19:45.483

I see nothing about ignoring comments starting with a semicolon. Or don't we need to preserve them? – Qwertiy – 2019-10-17T14:28:14.220

you don't need to preserve them. – Krzysztof Szewczyk – 2019-10-17T14:37:05.447

Also, I wrote a perl implementation too for this challenge, but it differs from the specification (as it's suited for my assembler), you might want to check it out if you want ;)

https://github.com/KrzysztofSzewczyk/asmbf/blob/master/labels.pl

– Krzysztof Szewczyk – 2019-10-17T14:37:59.927

Indeed, it's similar, using lbl instead of :. String handling just seems easier to maintain when the unwanted part is matched first in the alternation.

– Nahuel Fouilleul – 2019-10-17T19:38:43.830

JavaScript (ES6), 71 bytes

s=>s.replace(/(;.*|".*?")|:(:?)(\w+)/gm,(m,s,c,l)=>s||c+parseInt(l,36))

Conditions:

  • :: to introduce the label
  • : to use the label
  • Labels are case insensitive
  • ; starts comment till the end of the line
  • Quoted strings may not have " or linebreak inside
  • Character constants may not be followed by a colon or a letter, as in .:a or .::
  • Classic numeric labels starting with : are disallowed
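The base-36 trick can be checked quickly in Python, whose `int(text, 36)` reads digits 0-9 and letters a-z case-insensitively, as `parseInt(l, 36)` does in JavaScript for names made of letters. The function name `label_number` is hypothetical.

```python
def label_number(name: str) -> int:
    # Read the label as a base-36 numeral (a=10 ... z=35), ignoring case,
    # like JavaScript's parseInt(l, 36) does for a purely alphabetic label.
    return int(name, 36)


print(label_number("loop"), label_number("LOOP"))  # 1011769 1011769
print(label_number("a"))  # 10

# Every letter is worth at least 10, so an A-Za-z label never maps to 0,
# and an 8-letter label is below 36**8, far under 2**53 (exact as a JS double).
largest = label_number("zzzzzzzz")
print(largest, largest < 2**53)  # 2821109907455 True
```

Distinct labels therefore get distinct non-zero numbers, except for labels that differ only in case, which is why the answer declares labels case-insensitive.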

Test:

console.log((

s=>s.replace(/(;.*|".*?")|:(:?)(\w+)/gm,(m,s,c,l)=>s||c+parseInt(l,36))

)(`
::loop
    jmp :loop

jmp :skip
mov dl, .:
::skip
cmp dl, .:

::x
db ":a ::b"
::a
    jmp :a
    ::b
        jmp :b
`))

Qwertiy

Reputation: 2 697

Your assumptions about character constants make the submission invalid, but the example output shows that your submission is right. Please clarify your points. – Krzysztof Szewczyk – 2019-10-17T14:34:55.570

@KrzysztofSzewczyk, a character constant is . followed by any single char. .: is a character constant (and it works fine). I require it not to be followed by a letter or a colon, so .:a and .:: should not appear in the program. I don't see why that makes the submission invalid. – Qwertiy – 2019-10-17T14:47:56.947

You should use conjunction instead of alternative. – Krzysztof Szewczyk – 2019-10-17T15:38:59.613

@KrzysztofSzewczyk, don't understand you. – Qwertiy – 2019-10-17T15:55:35.070

You should have used AND instead of OR for the sentence to be true. – Krzysztof Szewczyk – 2019-10-17T16:32:29.943