Tips for storage in a golfing language

16

4

I am writing a golfing language.

Do you suggest variables, stack(s), tapes, registers, etc for storage in a code-golf language? What about implicit input?

Rough definitions:

  • A variable is simply a name (usually one character long in golfing languages) that a value can be assigned to, and later retrieved by that name.
  • A register is like a variable, but it has its own (usually single-byte) commands for setting/getting the value.
  • A stack is a variable-length array/list of values, where the most recently added values (the values "on top") are the ones that are being modified.
  • A queue is like a stack, except the values "on the bottom" are the ones that are being modified.
  • A tape is a static array/list of values where each value has an index. The main difference between a stack and a tape is that values on a tape are modified in-place.

I would appreciate to know the advantages and disadvantages of each option for different scenarios, and overall. Please avoid opinions, and backup statements with reasoning.

programmer5000

Posted 2017-04-29T17:40:24.590

Reputation: 7 828

1I think you should include a definition of these terms in your question – user41805 – 2017-04-29T17:41:51.277

How should the answers be? One tip per answer, or one generalised community-wiki answer for all the tips? – user41805 – 2017-04-29T17:44:14.923

IMHO implicit input belongs in a different question, since it's basically going to be the same no matter which storage method you use. – ETHproductions – 2017-04-29T17:49:24.860

This question is entirely dependent on what kind of language you're making. Too broad. – Fatalize – 2017-04-29T17:55:17.957

2@Fatalize Technically the storage method isn't dependent on what kind of golfing language you're making, the kind of golfing language is dependent on the storage method... – ETHproductions – 2017-04-29T17:56:29.273

1@ETHproductions They're totally interdependent. – Fatalize – 2017-04-29T18:00:27.427

I agree with @Fatalize, this is too broad. Even though I answered, I unfortunately can't make your golfing language for you. – NoOneIsHere – 2017-04-29T18:08:21.997

@Fatalize you say that "This question is entirely dependent on what kind of language you're making", but you also say "They're totally interdependent.". I don't understand. – programmer5000 – 2017-04-29T18:09:33.370

1I added some rough definitions of the various storage terms, feel free to edit or roll back if you don't like them. – ETHproductions – 2017-04-29T18:10:16.263

@ETHproductions thanks, those are great! – programmer5000 – 2017-04-29T18:10:58.933

2There is a very close relationship between the storage method and the type of language, but I think you have to look at both. For example, "imperative" languages (those which perform their instructions strictly left-to-right) can be stack-based (CJam, 05AB1E), tape-based (BrainF***), or something else entirely (V, which uses one big 2D string called the "buffer", along with a few registers). There's also prefix-based languages (Pyth), infix-based languages (Japt, Pip, and basically every mainstream lang), link-based languages (Jelly), etc. all of which hardly use any of the mentioned methods. – ETHproductions – 2017-04-29T18:14:50.990

@programmer5000 It's much better now, but it's still a bit broad. If you say, for example, "I would like to know the advantages and disadvantages of each", or something like that, then this would not be too broad. I don't want to put words in your mouth though, so I won't edit. – NoOneIsHere – 2017-04-29T18:15:14.820

1@programmer5000 Great, I will retract my close vote. – NoOneIsHere – 2017-04-29T18:17:30.890

This question is about starting to design a golfing language. When you're starting to design a golfing language, you pretty much have to decide on the best types of storage to use and the type of data flow through the code (e.g. left-to-right for CJam, BF, etc., right-to-left for prefix langs such as Pyth, a combination of both for langs such as Jelly...) at the same time because these two things are very closely related. Because of this, would you mind editing the question to include choosing the type of language? I can edit for you if you'd like. – ETHproductions – 2017-04-30T00:18:52.347

1Golfing languages are about reducing redundancy - being able to express a solution in terms of its logical components with as little boilerplate as possible. You should try to make as many things implicit as you can. Most likely, use of the queue would be implicit. Tapes tend to complicate things, and I can't see how they would be useful if you already have a stack, reqisters, etc as well as a good selection of data types and builtins on those data types. – Esolanging Fruit – 2017-05-01T09:10:10.250

Answers

4

All of the storage types involve storing something at one point and retrieving it later. To do this in only one operation, you should do either storing or retrieving automatically, and specify the position of the stored value in the other operation.

That is, for explicit storage, you could create an operator to retrieve the nth calculated value before this operation, or put back the current value after n operations. Alternatively, you could use the absolute position from the start of the program, or do more things such as removing some elements automatically after some operations (like in a stack). You could also make multiple operators, retrieving from different copies of the storage with or without these automatic operations. And you should try to make the maximum number needed to specify in the operations reasonably small, so you could assign one operator for each number.

But in most cases, you don't even need an operator and the language will do this implicitly. That is when you need to consider a more standardized model such as stacks or queues. The most successful for now seemed to be tacit programming, which doesn't even mention storage directly.

If you want to design a new such model, you could try expanding the evaluations as a dag, and try to think of a default dag if nothing else is specified. Most likely, the default is just a tree, except for that multiple leaves may be linked to the same input. You could for example use a queue for a balanced tree, or a stack for a deep tree where leaves are mostly constant, or something like Jelly for a deep tree where leaves are mostly copies of the input.

But note that, you could encode the shape of a binary tree in only 2 bits per operator. So, if your language has less than 64 operators, you may actually ignore the traditional models and just encode the complete tree in the spare bits (call them combine_parent and below_leaf flags). Even if there are more operators, you could make a fairly good default (such as the model of Jelly) and 3 modifiers to change it.

You could use the same model for implicit and explicit storage for convenience, but you don't have to. For example, you could use a stack for implicit storage, but don't pop elements in the explicit storage (or in another explicit storage in addition to the implicit one). It's likely it won't be called a stack in the final documentation, but you get the idea.

For reference, the size of the perfect encoding of a binary tree is the logarithm of the Catalan numbers. And the size of the perfect encoding of a "binary" dag is the logarithm of A082161, but obviously impractical. This assumes an operator with different argument order two different operators, adding another bit when it is not.

Sometimes you may still want variables for the loops. It might be possible to rewrite the loops in other ways. But if you really need it, don't use a 1-byte construct in addition to a name to define the variable. unless you are only using the preinitialized values, it's usually more efficient to use a 1-bit flag to specify whether you are reading or writing this variable.

jimmy23013

Posted 2017-04-29T17:40:24.590

Reputation: 34 042

7

I suggest all of them!

More seriously, they all come in handy some of the time, and the more the better! Implicit input is never bad, just have a flag to turn it off. Variables are helpful so they don't need to be found on a stack or tape; same with registers. Stacks are helpful to store data, and so are tapes. I recommend trying to implement multiple, say a stack and registers, or a stack and variables, like GolfScript. If you can make each function one byte, then your language will probably be effective at golfing, as you can use the advantages of each.

An example:

  • Say I want to take two numbers as inputs, and add their string lengths
  • Variables might be better for this (a stack might not)
  • Example code in GolfScript (with implicit input):

    ~,\,+
    ~ # Eval input (e.g. "1 2" > 1, 2)
    , # Take Length
    \ # Reverse items on the stack
    , # Take Length
    + # Add
      # Implicit output
    

    However, with variables (I know it's longer, it just doesn't have to swap places on the stack):

    ~,:a;,:b;ab+
    ~ # Eval input
    , # Get length
    :a# Store in "a"
    ; # Discard value left on stack
    , # Length
    :b# Store in "b"
    ; # Discard
    a # Recall a
    b # Recall b
    + # Add
      # Implicit output
    

Overloads

Another thing that can be helpful is overloads. For example, if you have a variable storing function, maybe it could be used as a monad (one-input function; I'm not sure of the term out side of J/K/APL) to add to the stack or tape.

Example:

c12 # Stores "1" into register 2
c1] # Pushes "1" onto the stack ("]" is used to denote the end of the monad)

NoOneIsHere

Posted 2017-04-29T17:40:24.590

Reputation: 1 916

Maybe if an argument is called with the wrong types, it gets added to a queue which is then used to fill in values if the stack is empty? – Esolanging Fruit – 2017-05-01T09:13:38.553

5

I'd suggest having some quickly usable storage (from the given - tape, queue, stack) and some permanent storage (variables, registers) for things to not get in the way while the program is doing something unrelated. I'd say that much more would rarely give anything and leave more characters free for more 1-byte instructions.

From the definitions given, the ones I think would work the best would be a stack and registers.
A stack, because a tape has to use functions just to store a new thing in it, while a stack should have simple push and pop functions (usually built in the commands). Registers, because they take less bytes usually compared to variables and if you need to store more than 2-4 different things for something, you're doing something wrong.

Don't limit the functions of them only to what their names or definitions suggest trough - some functions like put the 1st thing of the stack on top of the stack can definitely help.

dzaima

Posted 2017-04-29T17:40:24.590

Reputation: 19 048

5

There are basically two sorts of storage that need to be handled differently; access to recently generated values and/or the inputs; and long-term storage.

For long-term storage, variables seem to work best; anything with a limited number of options doesn't scale (although languages with this property, like Jelly, nonetheless can do fairly well even on medium-sized tasks). However, there's no need to provide variable names when storing the variable, most of the time; just have a command to store a value in the variable, and autogenerate the names according to a predictable pattern so that storing and retrieving the value can be one command each in simple cases. (For example, you could have commands to restore the most recently assigned variable, the second most recently, the third most recently, and so on, up to a small fixed number, plus a general command that took an argument.)

For short term storage, you want as much to be implicit as possible. Almost all golfing languages I've seen connect the output of one command to an input of the next, by default; the exact way in which this is done differs from language to language but normally comes to the same thing. However, the second input for a 2-input command is more interesting. Taking it from a stack works well in cases where values are only used once, but doesn't scale well when reusing values. (Try to avoid stack manipulation primitives; if you have to resort to using them, you're wasting a lot of bytes.) Alternatively, using user input or a recently assigned variable as the implicit second argument tends to give a few byte savings on simple programs, although you'll then need a parenthesis equivalent to make it possible to place nontrivial expressions on both sides of a binary operator.

In a golfing language I'm working on at the moment, I use a combination of a very cheap mechanism (a single bit) to specify the shape of the parse tree, automatically filling in missing arguments from user inputs by default, and a checkpointing approach that makes it possible to set the default for missing arguments to something else (plus plenty of special cases). At some point I'm likely going to add variables to communicate information greater distances along the program.

user62131

Posted 2017-04-29T17:40:24.590

Reputation:

0

I suggest a tape and a register.

I prefer tapes over stacks because tapes tend to have fewer elements, making manipulation of them easy. Also, being able to place elements in the tape in the register and vice versa makes for easy, short code.

user68614

Posted 2017-04-29T17:40:24.590

Reputation: