Tips for Creating/Maintaining a Golfing Language

23

8

Creating a golfing language can be hard. Let's help budding golfing language creators out and provide some helpful tips on how to create one.

I'm looking for tips on:

  1. The design process
  2. Implementation
  3. Community engagement

One tip per answer would be appreciated, and tips like "don't create a new language, use an existing one" probably aren't very helpful.

You don't need to have created a golfing language to answer this question. Feel free to answer from personal experiences or from general observations.

Lyxal

Posted 2020-01-24T09:11:07.947

Reputation: 5 253

1

The community engagement process is a dupe of this.

– None – 2020-01-24T10:45:30.493

@a'_', the launching part is, but I also am interested in seeing how to keep a language going after being launched – Lyxal – 2020-01-24T11:19:50.803

3@a'_' Oh wow, that first answer of DJMcMayhem♦ is almost exactly what I wrote in my CW answer below regarding point 3, except much more elaborated. Never seen that meta-answer before, so it's funny how similar it is. :D – Kevin Cruijssen – 2020-01-24T12:56:28.137

Answers

19

Here are some suggestions. Sorry that this partially overlaps with other answers, which have been posted as I was writing this.

Design process:

  1. One possibility (by no means the only one) to decide which features (functions, data types, etc.) your language L should have is to base it on another language B that you have been using for a long time. That way you already have a good idea which functions of B are used most often and should therefore be included in L, and you can assign shorter names in L to the functions most commonly used in B.

    Examples:

    • L = MATL: B = MATLAB/Octave.
    • L = Pyth: B = Python.
    • L = Japt: B = JavaScript.
    • L = Brachylog: B = Prolog.
    • L = Husk: B = Haskell.
    • L = ShortC: B = C.
    • L = V: B = Vim.
  2. Decide which "paradigm" your language will use. Some examples are

    • Stack-based (CJam, MATL, 05AB1E);
    • Tacit (Jelly);
    • Prefix notation (Pyth);
    • Infix notation (Pip, Japt);
    • Fixed arity (Pyth);
    • Variable arity (CJam, MATL).

    These decisions are quite independent from item 1. For example, MATL's function f has the same functionality as MATLAB's find, but is used differently in that it pops its inputs from the stack and pushes its outputs onto it, whereas MATLAB uses normal function arguments which can be stored in variables.

  3. Incorporate functions from other golfing languages that you find useful. Don't let item 1 (if applicable) limit language L's definition.

  4. Be prepared to add functions in the future. As you (or others) use the language, you will find things it would be nice to add. So reserve some of the "namespace" for future expansion. For example, don't use up all your single-character names at once.
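
To make the contrast in item 2 concrete, here is a minimal Python sketch of one operation exposed in the two styles. The names `find` and `stack_find` are made up for this illustration; this is not how MATL or MATLAB implement anything, only the calling convention is the point.

```python
def find(values):
    """Ordinary call style: the input is an argument, the output is returned."""
    # 1-based indices of the nonzero entries, as MATLAB's find reports them.
    return [i for i, v in enumerate(values, start=1) if v]


def stack_find(stack):
    """Stack style: pop the input from the stack, push the output back."""
    values = stack.pop()
    stack.append(find(values))


stack = [[0, 3, 0, 5]]
stack_find(stack)
print(find([0, 3, 0, 5]))  # [2, 4]
print(stack)               # [[2, 4]]
```

The stack version needs no variable names at the call site, which is where the byte savings come from.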

Implementation

  1. Write the compiler (or interpreter) for your language L in a language C that you know well (often C = B).
  2. The compiler is most often actually a transpiler into another language T (which can be the same as B or C). Language T should have a free implementation. That way it is viable to have an online compiler for your language. The compiler will be a program in C that takes source code in L to produce transpiled code in T, and then calls T's compiler/interpreter to run the transpiled code.

    A natural choice is C = T. Examples:

    • L = Japt: B = C = T = JavaScript.
    • L = Jelly: C = T = Python (and Jelly is inspired by J; but perhaps not so much as to claim that B = J).
    • L = MATL: B = C = T = MATLAB/Octave.
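
A rough sketch of that pipeline, assuming C = T = Python and a toy language L invented purely for this example (digits push themselves, `+` and `*` pop two values and push the result). The `transpile` and `run` names are hypothetical.

```python
# Code templates in T (Python) for each command of the toy language L.
TEMPLATES = {
    "+": "b = stack.pop(); a = stack.pop(); stack.append(a + b)",
    "*": "b = stack.pop(); a = stack.pop(); stack.append(a * b)",
}


def transpile(source):
    """Translate source code in L into source code in T."""
    lines = ["stack = []"]
    for ch in source:
        if ch in "0123456789":
            lines.append(f"stack.append({ch})")
        elif ch in TEMPLATES:
            lines.append(TEMPLATES[ch])
        else:
            raise SyntaxError(f"unknown command: {ch!r}")
    return "\n".join(lines)


def run(source):
    """Transpile, then hand the result to T's own interpreter."""
    namespace = {}
    exec(transpile(source), namespace)
    return namespace["stack"]


print(run("23+4*"))  # [20]
```

Because the heavy lifting is delegated to T, the program written in C stays small; the cost is that L's runtime behaviour is tied to whatever T does.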

Community engagement

  1. Host your compiler/interpreter in a public repository such as GitHub, so people can easily download it, create bug reports, suggest new features or even contribute with code.
  2. Write good documentation. That helps users understand your language better. Besides, I found that task more rewarding than I expected. I recommend writing the specification while you are designing the language, not at the end. That way consistency between language behaviour and its specification is ensured.
  3. Ideally the documentation should include some quick reference (table or summary), so experienced users don't have to go through the full documentation to find a function they know about but whose name or syntax they have forgotten.
  4. Create an esolangs page with basic information about your language.
  5. Create a chat room where people can ask questions about the language and discuss it. Visit it often.
  6. Answer questions in your language, with explanations about how the code works. You'll want to do that anyway (it's your language, you will find it fun to use), but this also helps get people curious about your language, and shows some of the language's properties, which might get people interested.
  7. If you chose to base your language L on a language B (see item 1) that is general-purpose and well known, users of B will find it easy to switch to L, which will allow them to provide short answers (in L) with minimal effort (coming from B).

Luis Mendo

Posted 2020-01-24T09:11:07.947

Reputation: 87 464

7

It strikes me that the most important design decision is what the underlying paradigm of the golfing language is.

Here are some possible types of language:

  • Stack based
  • Array based
  • Object based
  • Functional
  • Imperative
  • Declarative

Indeed you might even have a mixture of these, or something else entirely like a two dimensional language, automata, regex, machine language or a Turing machine and so on.

This is important because it will greatly affect the syntax of the language and in some ways how concisely code can be written.


Edit: Follow Up

I thought that I would show how the design and implementation phases of creating a language actually work in practice. I realise that this answer now contains more than one tip and that it's lengthy; I hope that's OK.

Design Considerations

I wanted the following things out of my new language:

  • Quick to implement (or prototype)
  • Flexible and concise syntax, perhaps with a view to Golfing
  • Can do useful things, not just a toy language

I plumped for some version of Forth, because it would satisfy the first two criteria. It would also have to be interpreted, at least for prototyping; compiling was out because it would take too much development time.

In terms of it doing useful things, it absolutely had to have the following traits:

  • Ability to call functions
  • Arithmetic and logic operations
  • Manipulation of numbers and strings
  • Loops and conditional structures
  • Stack manipulation

And because it's Forth I thought the following would be kind of cool:

  • Multiple data stacks
  • Ability to 'pass in' function literals to be executed inside functions (kind of functional style)

Implementation Details

Firstly, I would leverage my knowledge and use a language I know well for the interpreter: PHP - at least for the prototyping stage.

Next I needed the interpreter to be able to recognise (tokenise) these four things:

  • String Literals: 'hello world'
  • Numeric Literals: 3.1415
  • Labels (for function names): drop
  • Symbols (representing atomic actions): #@$^

The simplest solution for tokenising the program is to use regex. Also, because I wanted concise syntax, labels would be strictly alphabetic. I would also need some sort of separator to remove ambiguity in tokenising, so I left space and comma free for that.

So with all that in mind I could create a fat-free Forth syntax. For example:

1.5,2.7,3,4add add add;

would push four numbers onto the stack, call the function add three times, and then return with ;.

Once tokenised the interpreter can then consume tokens one by one and act accordingly.
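
For illustration, here is a minimal Python sketch of such a regex tokeniser (the prototype itself is in PHP). The exact patterns are my reconstruction from the four token kinds listed above and are an assumption, as are the names `TOKEN_RE` and `tokenise`.

```python
import re

# Assumed token grammar; the patterns in the actual prototype may differ.
TOKEN_RE = re.compile(r"""
      (?P<string>'[^']*')        # string literal: 'hello world'
    | (?P<number>\d+(?:\.\d+)?)  # numeric literal: 3.1415
    | (?P<label>[A-Za-z]+)       # strictly alphabetic label: drop
    | (?P<sep>[ ,]+)             # space and comma only separate tokens
    | (?P<symbol>\S)             # any other non-blank character is a symbol
""", re.VERBOSE)


def tokenise(source):
    """Return a list of (kind, text) pairs, dropping the separators."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        if kind != "sep":
            tokens.append((kind, match.group()))
    return tokens


for token in tokenise("1.5,2.7,3,4add add add;"):
    print(token)
```

Note that `4add` needs no separator because a number can never continue into a letter, while `add add` does need one; that is the only ambiguity the space and comma have to resolve.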

One consequence of using regex is that it's unable to handle nested syntax, so I would need to manage nested loops in some way. The way to do this is to look for the token that starts a loop, find the corresponding end token, and record in each of the two tokens where the other one is. That way the interpreter can jump around loops and conditionals very easily. A stack is needed to know which closing token matches which opening token.
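
The matching pass can be sketched in a few lines of Python. This assumes `(` and `[` open a block and `)` and `]` close one, and it stores the links in a separate dictionary rather than inside the tokens; `link_blocks` is a hypothetical name.

```python
OPENERS = {"(", "["}
CLOSERS = {")", "]"}


def link_blocks(tokens):
    """Map each opening token's index to its closing token's index and back."""
    partner = {}
    open_positions = []  # stack of indices of openers not yet closed
    for index, token in enumerate(tokens):
        if token in OPENERS:
            open_positions.append(index)
        elif token in CLOSERS:
            if not open_positions:
                raise SyntaxError(f"unmatched {token!r} at token {index}")
            start = open_positions.pop()
            partner[start] = index
            partner[index] = start
    if open_positions:
        raise SyntaxError(f"unclosed block at token {open_positions[-1]}")
    return partner


print(link_blocks(list("(a[b]c)")))
```

With this table, skipping a block or looping back is a single lookup: `pc = partner[pc]`.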

I would manage functions simply as labels, i.e. named positions in the token stream. When a function is executed, its position is looked up and interpretation continues from there. I would return from the function using a ; token. This also requires a function call stack to handle nested function calls and to return to the calling position in the token stream.
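
A stripped-down Python sketch of that execution loop, assuming the tokens are already split into plain strings, that a label followed by `:` starts a definition, and that `+` is the only built-in. It is an illustration of the call-stack idea, not the PHP prototype.

```python
def run(tokens):
    """Execute a flat token list and return the data stack."""
    # First pass: a label followed by ":" names the position after the ":".
    functions = {}
    for i in range(len(tokens) - 1):
        if tokens[i].isalpha() and tokens[i + 1] == ":":
            functions[tokens[i]] = i + 2

    data, calls = [], []  # data stack and return-address stack
    pc = 0
    while pc < len(tokens):
        token = tokens[pc]
        pc += 1
        if token == ";":
            if not calls:     # returning from the top level ends the program
                break
            pc = calls.pop()  # resume just after the call site
        elif token == "+":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif token in functions:
            calls.append(pc)  # remember where to come back to
            pc = functions[token]
        else:
            data.append(float(token))
    return data


program = ["1", "2", "3", "add", "add", ";", "add", ":", "+", ";"]
print(run(program))  # [6.0]
```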

For the fancy stuff, such as passing in function literals, the idea would be to push a string literal containing the code fragment on to the data stack e.g:

'2+;'apply;apply:`;

So to break the above down, '2+;' is a string literal containing the code fragment (push 2 and add to top item on data stack). A function called apply is then called. The function definition begins apply: and a backtick actually pops the string literal and executes it in its own brand new context. Once the fragment has been executed and returns, the function then continues.

The interpreter would handle this by separating out the parsing and tokenisation from the actual execution. That way the literal code fragment can be parsed when pulled off the stack, and that new context passed into the Executing function, using PHP's scoping to handle the new context's scope. The only fly in the ointment is that to be able to call a function from the code fragment, it would need to be able to access the parent's context. For example:

'dotproduct;'walkarray;walkarray:...`...;dotproduct:...;
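
Here is one way this could look, sketched in Python rather than PHP. Parsing is a separate step from execution, a backtick parses the popped fragment on the spot, and each context keeps a link to its parent so a fragment can call functions defined outside it. The tuple layout `(tokens, functions, parent)` and the names `parse`, `execute` and `run` are assumptions made for this sketch.

```python
import re


def parse(source, parent=None):
    """Tokenise source and index its function definitions ("name" then ":")."""
    tokens = re.findall(r"'[^']*'|\d+|[A-Za-z]+|[^\s,]", source)
    functions = {tokens[i]: i + 2
                 for i in range(len(tokens) - 1)
                 if tokens[i].isalpha() and tokens[i + 1] == ":"}
    return tokens, functions, parent


def execute(context, data, start=0):
    """Run one context from `start` until its outermost ';' returns."""
    tokens, functions, _parent = context
    calls, pc = [], start
    while pc < len(tokens):
        token = tokens[pc]
        pc += 1
        if token == ";":
            if not calls:
                return
            pc = calls.pop()
        elif token == "+":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif token == "`":
            # Parse the fragment only now, then run it in a fresh context
            # that remembers this one as its parent.
            execute(parse(data.pop(), parent=context), data)
        elif token.startswith("'"):
            data.append(token[1:-1])
        elif token.isdigit():
            data.append(int(token))
        else:
            # Function call: search this context, then the enclosing ones.
            owner = context
            while owner is not None and token not in owner[1]:
                owner = owner[2]
            if owner is None:
                raise NameError(token)
            if owner is context:
                calls.append(pc)
                pc = functions[token]
            else:
                execute(owner, data, start=owner[1][token])


def run(source):
    data = []
    execute(parse(source), data)
    return data


print(run("5'2+;'apply;apply:`;"))          # [7]
print(run("3'inc;'apply;apply:`;inc:1+;"))  # [4]
```

In the second program the fragment `inc;` defines nothing itself, so the call to `inc` is resolved through the parent link.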

Next, multiple data stacks: that would be easy. I would have just one main data stack where all the action happens and provide atomic actions that can push and pull to other named data stacks. That should greatly simplify operations with vectors or arrays of numbers.
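
As a small Python sketch of that arrangement (the class and method names `Stacks`, `stash` and `fetch` are invented here, and the prototype may organise this differently):

```python
from collections import defaultdict


class Stacks:
    """One main data stack plus auxiliary stacks addressed by name."""

    def __init__(self):
        self.main = []
        self.named = defaultdict(list)  # a named stack appears on first use

    def stash(self, name):
        """Atomic action: move the top of the main stack to a named stack."""
        self.named[name].append(self.main.pop())

    def fetch(self, name):
        """Atomic action: move the top of a named stack back to the main stack."""
        self.main.append(self.named[name].pop())


s = Stacks()
s.main.extend([1, 2, 3])
s.stash("v")
s.stash("v")
print(s.main, s.named["v"])  # [1] [3, 2]
```

All arithmetic still happens on the main stack, so the rest of the interpreter does not need to know the other stacks exist.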

I also wanted conditional loops with the condition either at the beginning or at the end, as well as plain infinite loops. So I chose [...] for an if condition, [...) and (...] for conditional loops, and (...) for an infinite loop.

Lastly, some features that are missing: general mathematical functions, extensive string handling, goto, and breaking out of loops and conditions. It is, however, possible to break out by returning from a function, using func:(...[;]...).

Anyway, here's the semi-golfed prototype, and hopefully semi readable, enjoy!

Try it online!

Guillermo Phillips

Posted 2020-01-24T09:11:07.947

Reputation: 561

1If you want a beginner-friendly golfing language, the stack-based paradigm will suit you most. (Stack-based languages are famously easy to write in.) Also it's the most common paradigm of golfing languages and the easiest paradigm to implement. – None – 2020-01-24T10:58:46.997

2Stack based is good because you don't have to have extra syntax to override precedence rules. But stack-based languages require a fair amount of pure stack manipulation, which adds overhead. – Guillermo Phillips – 2020-01-24T11:01:49.843

Stack-based is also the least original paradigm for golfing, and will therefore probably lead to less adoption by other people, unless it is truly groundbreaking compared to the plethora of other stack-based languages. – Fatalize – 2020-01-24T13:11:11.913

6

Remember to add implicit input/output

In my opinion, a golfing language (whether successful or not) should always have some kind of implicit input/output. This applies to Jelly, 05AB1E, and pretty much every competitive golfing language.

I'll take CJam as an example. CJam doesn't have an all-purpose implicit input; it has input-reading instructions like l and q, and if the input isn't just a string, these have to be followed by a ~. That's how Pyth gained the upper hand over CJam.

If you are trying to make a competitive golfing language, try to avoid making your input type-dependent, otherwise you'll be needing type conversion every time you take input (which is pretty wasteful).

Types of implicit input

For my purpose I could only think of two types of implicit input: Taking the whole input as an argument (e.g. GolfScript & Pyth), and cycling the implicit inputs for the operators of the program (e.g. 05AB1E and Jelly). The former is a basic form of implicit input, but still allows you to win some challenges. For the latter, you need to think carefully about how this system works, otherwise it would not help programmers.
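
As a rough Python sketch of the second kind: a stack that falls back to the program's inputs whenever an operator pops from it while it is empty. The cycling rule used here is invented for illustration and is not claimed to be the exact rule of 05AB1E or Jelly; `ImplicitStack` is a hypothetical name.

```python
class ImplicitStack:
    """A data stack that reads the next input when popped while empty."""

    def __init__(self, inputs):
        self.items = []
        self._inputs = list(inputs)
        self._next = 0  # index of the input to hand out next

    def push(self, value):
        self.items.append(value)

    def pop(self):
        if self.items:
            return self.items.pop()
        if not self._inputs:
            raise IndexError("pop from empty stack and no input available")
        value = self._inputs[self._next]
        self._next = (self._next + 1) % len(self._inputs)  # cycle around
        return value


# A one-command program "+" with inputs 3 and 4 needs no input instructions.
stack = ImplicitStack([3, 4])
stack.push(stack.pop() + stack.pop())
print(stack.items)  # [7]
```

The design question is what happens once the inputs run out: cycling, repeating the last one, or raising an error all lead to different golfing idioms.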

Implicit output

Implicit output is another very important design feature of a golfing language. For example, Element isn't very well-designed, as you need a backtick (output the whole stack) every time at the end of the program.

Currently there are only two types of implicit output: full implicit output (the one that Jelly and GolfScript/CJam use) and top implicit output (the one that Pyth and 05AB1E use). Both of them are perfectly competitive; however, in my very unscientific testing, top implicit output languages seem to require extra joins at the end.
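
The difference between the two is easy to see in a Python sketch. Both functions are hypothetical, and joining without a separator is just one plausible reading of "full" output; real languages differ in how they format the result.

```python
def full_output(stack):
    """Full implicit output: print everything left on the stack, joined."""
    return "".join(str(item) for item in stack)


def top_output(stack):
    """Top implicit output: print only the topmost item, if there is one."""
    return str(stack[-1]) if stack else ""


leftover = [1, "a", 2]
print(full_output(leftover))  # 1a2
print(top_output(leftover))   # 2
```

With full output, stray intermediate values must be cleaned up before the program ends; with top output, several results must be joined into one value first.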

user85052

Posted 2020-01-24T09:11:07.947

Reputation:

That's how Pyth got shorter than CJam You may be over-simplifying a bit. Pyth programs can be shorter or longer than CJam programs for many different reasons – Luis Mendo – 2020-01-24T12:47:57.513

I never used Jelly/GolfScript/CJam, but when I compare 05AB1E's implicit top output and MathGolf's implicit full joined output, I personally prefer top output tbh. Many times I have to clean up the stack if I only want to output a single item or single list in MathGolf. Here an example answer. Although I certainly see your point, and in some cases it's indeed useful. Maybe I'm just too used to output top after starting with MathGolf, but I personally prefer just implicit top in most cases.

– Kevin Cruijssen – 2020-01-24T13:10:56.210

5

I never created a language myself, but I can partially answer number 3 (I will make this a community wiki, so feel free to add more).

Some tips after the core part of your language is done:

  • Have a GitHub/GitLab project that you can link to, including a wiki page/README about your language in general (what it's made for, how to compile and run it locally, etc.) and a short description of each of its builtins.
  • Ask @Dennis in talk.tryitonline.net chat to add your language to the online compiler TIO.
  • Add your language to the Showcase of Languages to expose it to the community.
  • Possibly create a chat room so people have a place to report bugs, discuss improvements, or talk about golfing strategies in your language.
  • Create a page for your language, and perhaps post a few tips as answers to get it started.
  • Start answering challenges here on CGCC in your language, so people seeing it for the first time might become interested. Well-known challenges like Hello World; "Never Gonna Give You Up"; Default Quine; Prime checker; Fizz-Buzz sequence; Odd or Even; etc. are all great challenges with which to introduce your new language, and they let people who see it get a general feel for what's possible in your language.
    Apart from those and some other popular challenges, just answering any new challenge is also a great way to expose your language, since people tend to look at new challenges more often than existing ones.

Kevin Cruijssen

Posted 2020-01-24T09:11:07.947

Reputation: 67 575

You didn't have to make this CW. I realised that I should have added a part saying You don't necessarily have to have created a golfing language to answer this question. – Lyxal – 2020-01-24T09:44:03.250

1@Lyxal I made it a CW so everyone can edit it freely and add more points for no. 3. I don't think each of the tips above needs a separate answer, nor do any similar ones I might have forgotten. I am very curious about no. 1 and 2 of your question though. Some things that come to mind are: what is the underlying compiler/language used to compile/run your language in, and why did you choose that one? What did you do to ensure good performance? Etc. I'm curious what kind of answers we will get. I'll probably tag some people in chat who I know created a language (after a while). :) – Kevin Cruijssen – 2020-01-24T09:47:40.687

1I'm thinking of trying to ask Mego and DJMcMayhem for tips on Discord – Lyxal – 2020-01-24T09:50:05.873

2

Sounds like a good idea. I thought about tagging Adnan, although he isn't very active anymore unfortunately. He started with 05AB1E once (written in Python); then derived 2sable for more flexible implicit input; he later on added this implicit input to 05AB1E as well, making 2sable rather obsolete; and mid-2018 he 'released' a completely rewritten 05AB1E, with new and changed builtins and infinite lists support (written in Elixir) (last two are available on TIO). So lots of experience and history to contribute here.

– Kevin Cruijssen – 2020-01-24T09:57:47.400

2

This relates to the implementation phase

Decide Upon How the Language Will Be Executed

The two most common options to implement your language are a) interpretation and b) compilation.

Interpretation is where your language is executed step by step. In other words, you write all the processing logic behind the language.

Compilation or transpilation is when you turn your language into another language. This is like translating it into another language.

Compilation is much easier to implement than interpretation, but interpretation allows for more flexibility.
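
To show the two options side by side, here is a small Python sketch for a toy postfix language made up for this example (single digits and the operators `+`, `-`, `*`). Which approach is easier will depend on the language being implemented; the sketch only shows what each one looks like.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(source):
    """Interpretation: walk the program and carry out each step ourselves."""
    stack = []
    for ch in source:
        if ch in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[ch](a, b))
        else:
            stack.append(int(ch))
    return stack[-1]


def transpile(source):
    """Transpilation: build an expression in the target language instead."""
    stack = []
    for ch in source:
        if ch in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {ch} {b})")
        else:
            stack.append(ch)
    return stack[-1]


print(interpret("23+4*"))        # 20
print(transpile("23+4*"))        # ((2 + 3) * 4)
print(eval(transpile("23+4*")))  # 20
```

The interpreter owns every step of execution, so it can change behaviour at run time; the transpiler hands execution over to the target language and inherits its semantics.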

Lyxal

Posted 2020-01-24T09:11:07.947

Reputation: 5 253

4Can you give an example of interpretation allowing more flexibility? – None – 2020-01-24T11:38:27.380