I'm looking for tips for golfing in the R statistical language. R is perhaps an unconventional choice for Golf. However, it does certain things very compactly (sequences, randomness, vectors, and lists), many of the built-in functions have very short names, and it has an optional line terminator (;). What tips and tricks can you give to help solve code golf problems in R?

Ari B. Friedman

Posted 2011-11-30T12:40:07.743

Reputation: 1 013

14The answers to this question could double as an anti-styleguide for R, given that code golf is really the only time you should do a lot of these things :-) – Andrew Brēza – 2017-08-07T18:53:14.527

Answers

Some tips:

In R, it's recommended to use <- over =. For golfing, the opposite holds since = is shorter...
If you call a function more than once, it is often beneficial to define a short alias for it:
```
as.numeric(x)+as.numeric(y)

a=as.numeric;a(x)+a(y)
```
Partial matching can be your friend, especially when functions return lists which you only need one item of. Compare rle(x)$lengths to rle(x)$l
Many challenges require you to read input. scan is often a good fit for this (the user ends the input by entering an empty line).
```
scan()    # reads numbers into a vector
scan(,'') # reads strings into a vector
```
Coercion can be useful. t=1 is much shorter than t=TRUE. Alternatively, switch can save you precious characters as well, but you'll want to use 1,2 rather than 0,1.
```
if(length(x)) {} # TRUE if length != 0
sum(x<3)         # Adds all the TRUE:s (count TRUE)
```
If a function computes something complicated and you need various other types of calculations based on the same core value, it is often beneficial to either: a) break it up into smaller functions, b) return all the results you need as a list, or c) have it return different types of values depending on an argument to the function.
As in any language, know it well. R has thousands of functions, and there are probably some that can solve the problem in very few characters; the trick is to know which ones!

Some obscure but useful functions:

sequence
diff
rle
embed
gl # Like rep(seq(),each=...) but returns a factor

Some built-in data sets and symbols:

letters     # 'a','b','c'...
LETTERS     # 'A','B','C'...
month.abb   # 'Jan','Feb'...
month.name  # 'January','February'...
T           # TRUE
F           # FALSE
pi          # 3.14...

Tommy

Posted 2011-11-30T12:40:07.743

Reputation: 821

Instead of importing a package with library, grab the function from the package using ::. Compare the following:
```
library(splancs);inout(...)
splancs::inout(...)
```
Of course, it is only valid if one single function is used from the package.
This is trivial, but here is a rule of thumb for when to use @Tommy's trick of aliasing a function: if your function name has length m and is used n times, alias it only if m*n > m+n+3 (defining the alias costs m+3 bytes, and each use of the alias still costs 1 byte). An example:
```
nrow(a)+nrow(b)     # 4*2 < 4+3+2
n=nrow;n(a)+n(b)
length(a)+length(b) # 6*2 > 6+3+2
l=length;l(a)+l(b)
```
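The break-even rule can be checked mechanically. A minimal sketch, where alias_saves is a hypothetical helper name (not part of the answer) and the byte counts are taken with nchar from the snippets in the example:

```r
# Hypothetical helper: TRUE when a one-letter alias saves bytes.
# m = length of the function name, n = number of times it is called.
alias_saves <- function(m, n) m * n > m + n + 3

# Byte counts of the snippets from the example above.
plain_nrow   <- nchar("nrow(a)+nrow(b)")
alias_nrow   <- nchar("n=nrow;n(a)+n(b)")
plain_length <- nchar("length(a)+length(b)")
alias_length <- nchar("l=length;l(a)+l(b)")

nrow_rule   <- alias_saves(4, 2)  # nrow used twice: aliasing loses
length_rule <- alias_saves(6, 2)  # length used twice: aliasing wins
```

The rule and the measured lengths agree: the alias costs one extra byte for nrow and saves one byte for length.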
Coercion as side-effect of functions:
- instead of using as.integer, character strings can be coerced to integer using the : operator:
```
as.integer("19")
("19":1)[1] #Shorter version using force coercion.
```
- integer, numeric, etc. can be similarly coerced to character using paste instead of as.character:
```
as.character(19)
paste(19) #Shorter version using force coercion.
```
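To see that both coercion tricks give the same values as the explicit conversions, here is a small sketch (the variable names are only for illustration):

```r
# ":" coerces its operands to numbers, so a numeric string becomes a number.
long_int  <- as.integer("19")
short_int <- ("19":1)[1]       # first element of the sequence 19:1

# paste() coerces its arguments to character.
long_chr  <- as.character(19)
short_chr <- paste(19)
```

Note that the : trick only works when the string actually holds a number.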

plannapus

Posted 2011-11-30T12:40:07.743

Reputation: 8 610

8Re: 3rd tip, el("19":1) is even shorter by one byte. – JayCe – 2018-08-22T22:37:24.463

Some very specific golfing tips:

if you need to extract the length of a vector, sum(x|1) is shorter than length(x) as long as x is numeric, integer, complex or logical.
if you need to extract the last element of a vector, it may be cheaper (if possible) to initialise the vector backwards using rev() and then calling x[1] rather than x[length(x)] (or using the above tip, x[sum(x|1)]) (or tail(x,1) --- thanks Giuseppe!). A slight variation on this (where the second-last element was desired) can be seen here. Even if you can't initialise the vector backwards, rev(x)[1] is still shorter than x[sum(x|1)] (and it works for character vectors too). Sometimes you don't even need rev, for example using n:1 instead of 1:n.
(As seen here). If you want to coerce a data frame to a matrix, don't use as.matrix(x). Take the transpose of the transpose, t(t(x)).
if is a formal function. For example, "if"(x<y,2,3) is shorter than if(x<y)2 else 3 (though of course, 3-(x<y) is shorter than either). This only saves characters if you don't need an extra pair of braces to formulate it this way, which you often do.
For testing non-equality of numeric objects, if(x-y) is shorter than if(x!=y). Any nonzero numeric is regarded as TRUE. If you are testing equality, say, if(x==y)a else b then try if(x-y)b else a instead. Also see the previous point.
The function el is useful when you need to extract an item from a list. The most common example is probably strsplit: el(strsplit(x,"")) is one fewer byte than strsplit(x,"")[[1]].
(As used here) Vector extension can save you characters: if vector v has length n you can assign into v[n+1] without error. For example, if you wanted to print the first ten factorials you could do: v=1;for(i in 2:10)v[i]=v[i-1]*i rather than v=1:10;for(...) (though as always, there is another, better, way: cumprod(1:10))
Sometimes, for text based challenges (particularly 2-D ones), it's easier to plot the text rather than cat it. The argument pch= to plot controls which characters are plotted. This can be shortened to pc= (which will also give a warning) to save a byte. Example here.
To take the floor of a number, don't use floor(x). Use x%/%1 instead.
To test if the elements of a numeric or integer vector are all equal, you can often use sd rather than something verbose such as all.equal. If all the elements are the same, their standard deviation is zero (FALSE) else the standard deviation is positive (TRUE). Example here.
Some functions which you would expect to require integer input actually don't. For example, seq(3.5) will return 1 2 3 (the same is true for the : operator). This can avoid calls to floor and sometimes means you can use / instead of %/%.
The most common function for text output is cat. But if you needed to use print for some reason, then you might be able to save a character by using show instead (which in most circumstances just calls print anyway though you forego any extra arguments like digits)
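A few of the tips above, checked against their longer equivalents. This is only a sketch on made-up data; the variable names are not from the answer:

```r
x <- c(3.7, 1.2, 8.9, 4.4)

n    <- sum(x|1)    # same as length(x) for numeric input
last <- rev(x)[1]   # same as x[length(x)]
fl   <- x%/%1       # same as floor(x)

# sd() is zero (falsy) when all elements are equal, positive otherwise
sd_same <- sd(c(2, 2, 2))
sd_diff <- sd(c(2, 2, 3))

# t(t(d)) turns a data frame into a matrix, like as.matrix(d)
d <- data.frame(a = 1:2, b = 3:4)
m <- t(t(d))
```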

JDL

Posted 2011-11-30T12:40:07.743

Reputation: 1 135

1tail(v,1) is the same length as rev(v)[1] for the "last element of an array" golfing tip as well. – Giuseppe – 2018-05-01T19:58:53.110

read.csv(t="a,b,c",,F) is shorter than el(strsplit("a,b,c",",")). – J.Doe – 2018-09-24T15:10:21.113

1An equivalent to sum(x|1) is sum(1^x). When we have a shorthand for sum, this can be useful due to operator precedence, as in "!"=sum;!1^x. – Robin Ryder – 2020-02-04T22:30:32.133

Abuse the builtins T and F. By default, they evaluate to TRUE and FALSE, which can be automatically converted to numerics 1 and 0, and they can be re-defined at will. This means that you don't need to initialize a counter (e.g. i=0 ... i=i+1), you can just use T or F as needed (and jump straight to F=F+1 later).
Remember that functions return the last object called and do not need an explicit return() call.
Defining short aliases for commonly used functions is great, such as p=paste. If you use a function a lot, and with exactly two arguments, it is possible that an infixing alias will save you some bytes. Infixing aliases must be surrounded by %. For example:
```
`%p%`=paste
```
And subsequently x%p%y, which is 1 byte shorter than p(x,y). The infixing alias definition is 4 bytes longer than the non-infixing p=paste though, so you have to be sure it's worth it.
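A quick sketch comparing the two kinds of alias (the strings are arbitrary example inputs):

```r
`%p%` = paste     # infix alias: 11 bytes to define
p = paste         # ordinary alias: 7 bytes to define

a = "x"%p%"y"     # infix call
b = p("x","y")    # ordinary call, same result

# Per-call cost with single-letter variable names
infix_cost  = nchar("x%p%y")
prefix_cost = nchar("p(x,y)")
```

With a 4-byte penalty on the definition and 1 byte saved per call, the infix form only pays off after more than four calls.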

rturnbull

Posted 2011-11-30T12:40:07.743

Reputation: 3 689

11You can use primitive functions and save many bytes: `+`=paste; x+y – Masclins – 2017-05-04T09:53:05.233

Using if, ifelse, and `if`

There are several ways to do if-statements in R. Golf-optimal solutions can vary a lot.

The basics

if is for control flow. It is not vectorized, i.e. can only evaluate conditions of length 1. It requires else to (optionally) return an else value.
ifelse is a function. It is vectorized, and can return values of arbitrary length. Its third argument (the else value) is obligatory.*
`if` is a function, with the same syntax as ifelse. It is not vectorized, nor are any of the return arguments obligatory.

* It's not technically obligatory; ifelse(TRUE,x) works just fine, but it throws an error if the third argument is empty and the condition evaluates to FALSE. So it's only safe to use if you are sure that the condition is always TRUE, and if that's the case, why are you even bothering with an if-statement?

Examples

These are all equivalent:

if(x)y else z # 13 bytes
ifelse(x,y,z) # 13 bytes
`if`(x,y,z)   # 11 bytes

Note that the spaces around else are not required if you are using strings directly in the code:

if(x)"foo"else"bar"   # 19 bytes
ifelse(x,"foo","bar") # 21 bytes
`if`(x,"foo","bar")   # 19 bytes

So far, `if` looks to be the winner, as long as we don't have vectorized input. But what about cases where we don't care about the else condition? Say we only want to execute some code if the condition is TRUE. For one line of code alone, if is usually best:

if(x)z=f(y)         # 11 bytes
ifelse(x,z<-f(y),0) # 19 bytes
`if`(x,z<-f(y))     # 15 bytes

For multiple lines of code, if is still the winner:

if(x){z=f(y);a=g(y)}        # 20 bytes
ifelse(x,{z=f(y);a=g(y)},0) # 27 bytes
`if`(x,{z=f(y);a=g(y)})     # 23 bytes

There's also the possibility where we do care about the else condition, and where we want to execute arbitrary code rather than return a value. In these cases, if and `if` are equivalent in byte count.

if(x)a=b else z=b   # 17 bytes
ifelse(x,a<-b,z<-b) # 19 bytes
`if`(x,a<-b,z<-b)   # 17 bytes

if(x){z=y;a=b}else z=b   # 22 bytes
ifelse(x,{z=y;a=b},z<-b) # 24 bytes
`if`(x,{z=y;a=b},z<-b)   # 22 bytes

if(x)a=b else{z=b;a=y}   # 22 bytes
ifelse(x,a<-b,{z=b;a=y}) # 24 bytes
`if`(x,a<-b,{z=b;a=y})   # 22 bytes

if(x){z=y;a=b}else{z=b;a=y}   # 27 bytes
ifelse(x,{z=y;a=b},{z=b;a=y}) # 29 bytes
`if`(x,{z=y;a=b},{z=b;a=y})   # 27 bytes

Summary

Use ifelse when you have input of length > 1.
If you're returning a simple value rather than executing many lines of code, using the `if` function is probably shorter than a full if...else statement.
If you just want a single value when TRUE, use if.
For executing arbitrary code, `if` and if are usually the same in terms of byte count; I recommend if mainly because it's easier to read.
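A minimal sketch showing that the three forms agree for a length-1 condition, and that only ifelse handles longer ones (the values are placeholders):

```r
x <- TRUE; y <- "yes"; z <- "no"

r1 <- if(x)y else z    # control-flow form
r2 <- ifelse(x,y,z)    # vectorized function
r3 <- `if`(x,y,z)      # if called as a function

# Only ifelse accepts a condition of length > 1
v <- ifelse(c(TRUE,FALSE,TRUE),"a","b")
```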

rturnbull

Posted 2011-11-30T12:40:07.743

Reputation: 3 689

1Nice! Very good comparisons, +1! – Billywob – 2016-10-28T11:13:28.810

You can assign a variable to the current environment while simultaneously supplying it as an argument to a function:
```
sum(x <- 4, y <- 5)
x
y
```
If you are subsetting a data.frame and your condition depends on several of its columns, you can avoid repeating the data.frame name by using with (or subset).
```
d <- data.frame(a=letters[1:3], b=1:3, c=4:6, e=7:9)
with(d, d[a=='b' & b==2 & c==5 & e==8,])
```
instead of
```
d[d$a=='b' & d$b==2 & d$c==5 & d$e==8,]
```
Of course, this only saves characters if the length of your references to the data.frame exceeds the length of with(,)
if...else blocks can return the value of the final statement in whichever part of the block executes. For instance, instead of
```
a <- 3
if (a==1) y<-1 else
if (a==2) y<-2 else y<-3
```
you can write
```
y <- if (a==1) 1 else 
     if (a==2) 2 else 3
```

Matthew Plourde

Posted 2011-11-30T12:40:07.743

Reputation: 231

4Only caution about (1) is that when you do that you're passing it in by order not by named arguments. If f <- function(a,b) cat(a,b), then f(a <- 'A', b <- 'B') is not the same as f(b <- 'B', a <- 'A'). – Ari B. Friedman – 2013-10-15T13:11:02.873

Do-while loops in R

Occasionally, I find myself wishing R had a do-while loop, because:

 some_code
while(condition){
 some_code # repeated
}

is far too long and very un-golfy. However, we can recover this behavior and shave off some bytes with the power of the { function.

{ and ( are each .Primitive functions in R.

The documentation for them reads:

Effectively, ( is semantically equivalent to the identity function(x) x, whereas { is slightly more interesting, see examples.

and under Value,

For (, the result of evaluating the argument. This has visibility set, so will auto-print if used at top-level.

For {, the result of the last expression evaluated. This has the visibility of the last evaluation.

(emphasis added)

So, what does this mean? It means a do-while loop is as simple as

while({some_code;condition})0

because the expressions inside {} are each evaluated, and only the last one is returned by {, allowing us to evaluate some_code before entering the loop, and it runs each time condition is TRUE (or truthy). The 0 is one of the many 1-byte expressions that forms the "real" body of the while loop.
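To make the difference concrete, here is a small sketch with counters (the variable names are made up for illustration):

```r
# Ordinary while: the body never runs when the condition starts out FALSE.
runs_while <- 0
while(FALSE) runs_while <- runs_while + 1

# Do-while via {: the code inside { runs before the condition is tested.
runs_do <- 0
while({runs_do <- runs_do + 1; FALSE})0

# A counting example: increment i, then test it.
i <- 0
while({i <- i + 1; i < 5})0
```

After this, runs_while is still 0, runs_do is 1, and i has stopped at 5.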

Giuseppe

Posted 2011-11-30T12:40:07.743

Reputation: 21 077

Implicit type conversion

The functions as.character, as.numeric, and as.logical are too byte-heavy. Let's trim them down.

Conversion to logical from numeric (4 bytes)

Suppose x is a numeric vector. Using the logical not operator ! implicitly recasts the numeric to a logical vector, where 0 is FALSE and nonzero values are TRUE. ! then inverts that.

x=!x

x=0:3;x=!x returns TRUE FALSE FALSE FALSE.

Conversion to character from numeric or logical (7 bytes)

This is a fun one. (From this tweet.)

x[0]=''

R sees that you're updating the vector x with '', which is of class character. So it casts x into class character so it's compatible with the new data point. Next, it goes to put '' in the appropriate place... but the index 0 doesn't exist (this trick also works with Inf, NaN, NA, NULL, and so on). As a result, x is modified in class only.

x=1:3;x[0]='' returns "1" "2" "3", and x=c(TRUE,FALSE);x[0]='' returns "TRUE" "FALSE".

If you have a character object already defined in your workspace, you can use that instead of '' to save a byte. E.g., x[0]=y!

Conversion to character from numeric or logical under certain conditions (6 bytes)

J.Doe pointed out in the comments a six-byte solution:

c(x,"")

This works if x is atomic and if you intend to pass it to a function which requires an atomic vector. (The function may throw a warning about ignoring elements of the argument.)

Conversion to numeric from logical (4 bytes)

You can use the funky indexing trick from above (e.g. x[0]=3), but there's actually a quicker way:

x=+x

The positive operator implicitly recasts the vector as a numeric vector, so TRUE FALSE becomes 1 0.
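The three conversions side by side, as a sketch with example vectors of my own choosing:

```r
# numeric -> logical (note that ! also inverts the values)
lg <- 0:3
lg <- !lg

# numeric -> character: assigning '' at index 0 changes only the type
ch <- 1:3
ch[0] <- ''

# logical -> numeric via unary plus
nu <- c(TRUE, FALSE)
nu <- +nu
```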

rturnbull

Posted 2011-11-30T12:40:07.743

Reputation: 3 689

Your last trick could be x=+x to keep TRUE as 1. – Giuseppe – 2018-04-18T09:16:11.240

@Giuseppe Oh, duh, of course! Thanks, updated now. – rturnbull – 2018-04-19T07:37:45.457

Conversion from numeric or logical to character. You can use c(x,"") if x is atomic, provided that you're then going to use x in a function that only cares about the first element (it may complain). This is 1 byte cheaper than x[0]="";. – J.Doe – 2018-09-05T12:01:26.400

Save values in-line: Others have mentioned that you can pass values in-order and assign them for use elsewhere, i.e.
```
sum(x <- 1:10, y <- seq(10,1,-2))
```
However, you can also save values inline for use in the same line!

For instance
```
n=scan();(x=1:n)[abs(x-n/2)<4]
```
reads from stdin, creates a variable x=1:n, then indexes into x using that value of x. This can sometimes save bytes.
Alias for the empty vector: You can use {} as the empty vector c(), as they both return NULL.
Base conversion: For the integer digits of n in base 10, use n%/%10^(0:nchar(n))%%10. This will leave a trailing zero, so if that is important to you, use n%/%10^(1:nchar(n)-1)%%10 as it is shorter than array indexing. This can be adapted to other bases, using floor(log(n,b))+1 instead of nchar(n).
Using seq and the : operator: Rather than using 1:length(l) (or 1:sum(x|1)), you can use seq(l) as long as l is a list or vector of length greater than 1, as it defaults to seq_along(l). If l could potentially be length 1, seq(a=l) will do the trick.

Additionally, : will (with a warning) use the first element of its arguments.
Removing attributes: Using c() on an array (or matrix) will do the same as as.vector; it generally removes non-name attributes.
Factorial: Using gamma(n+1) is shorter than using factorial(n), and factorial is defined as gamma(n+1) anyway.
Coin flipping: When needing to do a random task 50% of the time, using rt(1,1)<0 is shorter than runif(1)<0.5 by three bytes.
Extracting/excluding elements: head and tail are often useful to extract the first/last few elements of an array; head(x,-1) extracts all but the last element and is shorter than using negative indexing, if you don't already know the length:
```
head(x,-1)
x[-length(x)]
x[-sum(x|1)]
```
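Several of these tips can be sanity-checked in a few lines. A sketch with arbitrary inputs (the variable names are only for illustration):

```r
n <- 1234
d1 <- n%/%10^(0:nchar(n))%%10      # 4 3 2 1 0 (digits reversed, trailing zero)
d2 <- n%/%10^(1:nchar(n)-1)%%10    # 4 3 2 1

f <- gamma(5+1)                    # same value as factorial(5)
h <- head(c(7,8,9),-1)             # all but the last element
s <- seq(c("a","b","c"))           # indices 1 2 3, like seq_along
e <- {}                            # NULL, like c()
```

The digits come out least significant first, so wrap the expression in rev() if the order matters.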

Giuseppe

Posted 2011-11-30T12:40:07.743

Reputation: 21 077

@J.Doe worthy of its own post, I think! Perhaps with a title of "alternatives to rep". Other tips questions have a restriction of one tip per answer, which I wholeheartedly endorse for this question, too! Also, 1:n*0 is shorter than Im(1:n) by two bytes, which means your second trick can be x+0*-n:n as well :-) – Giuseppe – 2018-09-07T17:42:26.903

1@J.Doe Or even better, !1:n is also an array of n zeros depending on use case; credit to the MATL/MATLAB tips question (probably Luis Mendo) for that one, though. – Giuseppe – 2018-09-07T17:43:34.387

Thanks, @Giuseppe! Can I suggest you create this post, as I don't want to take reputation from your good ideas. – J.Doe – 2018-09-07T17:48:21.383

@J.Doe oh, I don't mind. Always good to have other R golfers getting more visibility; I think it's fair to say I'm a pretty known entity at this point! You've been going around suggesting quite impressive improvements, so take the rep (pun not intended) and keep up the good work golfing :-) – Giuseppe – 2018-09-07T17:51:55.380

1not (log(i,b)%/%1):0) instead of floor(log(n,b))+1? – ASCII-only – 2019-02-24T02:10:49.180

@ASCII-only wow. Yeah, that'd work. I think even 0:log(i,b) would work, as : is really an alias for seq(a,b,by=1), so it will truncate the log(i,b). Sometimes, simpler is better :-( – Giuseppe – 2019-02-25T14:50:32.503

Yeah, but for reverse looks like intdiv is shortest – ASCII-only – 2019-02-25T22:48:50.500

Abuse outer to apply an arbitrary function to all the combinations of two lists. Imagine a matrix with i, j indexed by the first args, then you can define an arbitrary function(i,j) for each pair.
Use Map as a shortcut for mapply. My claim is that mapply is cheaper than a for loop in situations where you need to access the index. Abuse the list structure in R. unlist is expensive. methods::el allows you to cheaply unlist the first element. Try to use functions with list support natively.
Use do.call to generalize function calls with arbitrary inputs.
The accumulate argument of Reduce is extremely helpful for code golf.
Writing to the console line by line is cheaper with write(blah, 1) than with cat(blah, "\n"). Hard coded strings with "\n" may be cheaper in some situations.
If a function comes with default arguments, you can use function(,,n-arg) to specify the n-th argument directly. Example: seq(1, 10, , 101). In some functions, partial argument matching is supported. Example: seq(1, 10, l = 101).
If you see a challenge involving string manipulation, just press the back button and read the next question. strsplit is single-handedly responsible for ruining R golf.
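A short sketch of the functional helpers mentioned above, with toy inputs chosen only for illustration:

```r
o <- outer(1:3, 1:3, function(i,j) i*10+j)   # every (i,j) pair, as a matrix
m <- Map(function(v,i) v*i, c(5,6,7), 1:3)   # walk values and indices together
a <- Reduce(`+`, 1:5, accumulate=TRUE)       # running sums
d <- do.call(paste, list("a","b",sep="-"))   # call with arguments held in a list
s <- seq(1, 10, , 4)                         # skip 'by', set length.out by position
```

Map returns a list, so the result usually needs unlist or a function that accepts lists directly.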

Now for some newly discovered tips from 2018

A[cbind(i,j)] = z can be a good way to manipulate matrices. This operation is very byte efficient assuming you design i, j, z as vectors with correct lengths. You may save even more by calling the actual index/assign function directly: "[<-"(A,cbind(i,j),z). This way of calling returns the modified matrix.
Use a new line instead of \n for line breaks.
Squeezing down line counts can save you bytes. In-line assignment lapply(A<-1:10,function(y) blah) and function args assignment function(X, U = X^2, V = X^3) are ways of doing this.
So "[<-" is a function in R (and is related to my ancient question on SO)! It is the underlying function responsible for operations such as x[1:5] = rnorm(5). The neat property of calling the function by name is that it returns the modified vector. In other words, "[<-"(x, 1:5, rnorm(5)) does almost the same thing as the code above, except that it returns the modified x. The related "length<-", "names<-", "anything<-" all return the modified output.
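A minimal sketch of calling the replacement functions by name, using small made-up inputs:

```r
x <- 1:5
y <- "[<-"(x, 2:3, 0)                       # returns the modified vector

A <- matrix(0, 2, 2)
B <- "[<-"(A, cbind(1:2, 2:1), c(7, 9))     # sets [1,2] and [2,1] in one call

z <- "names<-"(1:2, c("a","b"))             # returns the named vector
```

Because the call returns the modified object, it can be nested directly inside another expression instead of needing its own statement.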

Vlo

Posted 2011-11-30T12:40:07.743

Reputation: 806

2I think using "[<-" is worthy of its own "Tips" answer, as it will return the modified array/matrix/whatever. – Giuseppe – 2018-03-15T15:05:11.833

Change the meaning of operators

R operators are just functions that get special treatment by the parser. For example < is actually a function of two variables. These two lines of code do the same thing:

x < 3
`<`(x, 3)

You can reassign another function to an operator, and the parser will still do its thing, including respecting operator precedence, but the final function call will be the new one rather than the original. For example:

`<`=rep

now means these two lines of code do the same thing:

rep("a", 3)
"a"<3

and precedence is respected, resulting in things like

"a"<3+2
#[1] "a" "a" "a" "a" "a"

See for example this answer, and also the operator precedence page. As a side effect, your code will become as cryptic as one written in a golf language.

Some operators like + and - can accept either one or two parameters, so you can even do things like:

`-`=sample
set.seed(1)
-5  # means sample(5)
#[1] 2 5 4 3 1
5-2 # means sample(5, 2)
#[1] 5 4

See for example this answer.

See also this answer for using [ as a two-byte, three-argument operator.

JayCe

Posted 2011-11-30T12:40:07.743

Reputation: 2 655

This is a comment on rturnbull's tips but I think we need to start enforcing a "one tip per answer" rule because it's so freakin' hard to find the one I need when I come here.

– Giuseppe – 2018-05-16T16:24:43.603

1also depending on the precedence of the operators, you can do some funky stuff that might help; like < has lower precedence than +, but * has higher precedence than + so you could potentially chain them together! – Giuseppe – 2018-05-16T16:27:32.820

1@Giuseppe you know what I tried to find before posting and couldn't find it. Thanks for pointing it out. I'm planning to add more details on operator precedence with examples as I start using this trick more and more. – JayCe – 2018-05-16T16:28:36.230

3Here's a fun one: if you bind ? to paste or some other function that can take two arguments, the precedence order means you can still use inline assignments via a<-b?d<-e. – J.Doe – 2018-10-02T09:40:37.020

You should add [ as a three-element alias (that's two bytes); I often find it helpful for things like outer (and consistently forget about it!), although of course you need to ensure you don't actually need to use [. It would also likely be helpful to link to the operator precedence page to help with alias selection.

– Giuseppe – 2019-05-02T19:48:50.487

@Giuseppe good point! I wish I had more time to golf... golfing made me a better programmer by forcing me to fully understand the impact of any character I type... and I'm not even joking! – JayCe – 2019-05-03T00:02:23.263

Some basic concepts but should be somewhat useful:

In control flow statements you can abuse that any number not equal to zero will be evaluated as TRUE, e.g.: if(x) is equivalent to if(x!=0). Conversely, if(!x) is equivalent to if(x==0).
When generating sequences using : (e.g. 1:5), one can abuse the fact that the exponentiation operator ^ has precedence over the : operator (unlike the binary operators +-*/, which are evaluated after it).
```
1:2^2 => 1 2 3 4 
```
which saves you two bytes on the parentheses that you would normally have to use in case you wanted to e.g. loop over the elements of an n x n matrix (1:n^2) or any other integer that can be expressed in a shorter manner using exponential notation (1:10^6).
A related trick can of course be used with the vectorized operations +-*/ as well, although it is most commonly applicable to + and -:
```
for(i in 1:(n+1)) can instead be written as for(i in 0:n+1)
```
This works because +1 is vectorized and adds 1 to each element of 0:n, resulting in the vector 1 2 ... n+1. Similarly, 0:(n+1) == -1:n+1 saves you one byte as well.
When writing short functions (that can be expressed on one line), one can abuse variable assignment to save two bytes on the enclosing curly brackets {...}:
```
f=function(n,l=length(n))for(i in 1:l)cat(i*l,"\n")
f=function(n){l=length(n);for(i in 1:l)cat(i*l,"\n")}
```
Note that this might not always comply with the rules of certain challenges.
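The precedence tricks can be verified directly. A sketch with n chosen arbitrarily:

```r
n <- 3
a <- 1:n^2     # ^ binds tighter than :, so this is 1:(n^2)
b <- 0:n+1     # : binds tighter than +, so this is (0:n)+1
d <- -1:n+1    # unary minus binds tighter than :, so this is ((-1):n)+1

# Default-argument trick: l is computed from n without needing braces
f <- function(n,l=length(n))l*2
g <- f(c(4,5,6))
```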

Billywob

Posted 2011-11-30T12:40:07.743

Reputation: 3 363

Just a little correction: ^ is vectorized, it's just that it has precedence over : (i. e. it's executed before : unless brackets explicitly indicate the opposite, see ?Syntax for the exact order of precedence of binary and unary operators). Same goes for the binary +-/* which have lower precedence than : hence your trick n°3. – plannapus – 2016-10-21T11:18:00.667

@plannapus Thanks for clarifying. Updated the wording. – Billywob – 2016-10-21T11:21:16.227

Scenarios where you can avoid `paste(...,collapse="")` and `strsplit`

These are a pain in the usual string challenges. There are some workarounds.

Reduce(paste0,letters) for -5 bytes from paste0(letters,collapse="")
A 2-byte golf where you have a list containing two vectors c(1,2,3) and c(4,5,6) and want to concatenate them element-wise to a string "142536". Operator abuse gives you p=paste0;"^"=Reduce;p^p^r which saves two bytes on the usual paste0 call.
Instead of paste0("(.{",n,"})") to construct (eg) a regex for 20 bytes, consider a regex in a regex: sub(0,n,"(.{0})") for 17 bytes, which replaces the placeholder 0 in the template with n.
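A quick check of the first and third workaround. This is a sketch; n is an arbitrary example value, and the sub call relies on the argument order sub(pattern, replacement, x):

```r
# Collapsing a character vector into one string
j1 <- paste0(letters, collapse="")
j2 <- Reduce(paste0, letters)

# Building a regex around a number n
n <- 5
r1 <- paste0("(.{", n, "})")
r2 <- sub(0, n, "(.{0})")    # swap the placeholder 0 for n
```

The placeholder approach only works if the template contains no earlier 0 that would be replaced instead.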

Sometimes (quite often, actually) you'll need to iterate through a vector of characters or strings, or split a word into letters. There are two common use cases: one where you need to take a vector of characters as input to a function or program, and one where you know the vector in advance and need to store it in your code somewhere.

a. Where you need to take a string as input and split it into either words or characters.

If you need words (including characters as a special case):
- If a newline 0x0A (ASCII 10) separating the words is OK, x=scan(,"") is preferred to wrapping your code in function(s,x=el(strsplit(s," "))).
- If the words can be separated by any other whitespace, including multiple spaces, tabs, newlines etc, you can use @ngm's double scan trick: x=scan(,"",t=scan(,"")). This gives the scanned in string to scan as the text arg and separates it by whitespace.
- The second argument in scan can be any string so if you have created one, you can recycle it to save a byte.
If you need to turn an input string into a vector of characters:
- x=el(strsplit(s,"")) is the shortest general solution. The split argument works on anything of length zero including c(), {} etc so if you happen to have created a zero length variable, you could use it to save a byte.
- If you can work with the ASCII character codes, consider utf8ToInt, since utf8ToInt(x) is shorter than the strsplit call. To paste them back together, intToUtf8(utf8ToInt(x)) is shorter than Reduce(paste0,el(strsplit(x,""))).
- If you need to split arbitrary strings of numbers like "31415926535" as input, you can use utf8ToInt(s)-48 to save 3 bytes on el(strsplit(s,"")), provided you can use the integer digits instead of the characters, as is often the case. This is also shorter than the usual recipe for splitting numbers into decimal digits.
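The splitting options from this list, side by side. A sketch on example strings; scan is given its input through the text argument here so that the snippet does not need stdin:

```r
s <- "31415"
chars  <- strsplit(s, "")[[1]]     # one string per character
codes  <- utf8ToInt(s)             # character codes
digits <- utf8ToInt(s) - 48        # 48 is the code of "0"
back   <- intToUtf8(codes)         # codes joined back into one string

# Whitespace-separated words; quiet=TRUE only silences the "Read n items" message
words <- scan(text = "foo bar  baz", what = "", quiet = TRUE)
```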

b. Where you need a fixed vector of either words or characters in advance.

If you need a vector of single characters that have some regular pattern or are in alphabetic order, look at using intToUtf8 or chartr applied to a sequence via a:b or on the built in letters sets letters or LETTERS. The pattern language built into chartr is especially powerful.
For 1 to 3 characters or words, c("a","b","c") is the shortest general solution.
If you need a fixed vector of between 4 and 10 non whitespace characters or words, use scan with stdin as the file arg:

f(x=scan(,""))
q
w
e
r
t
y
u

If scan from stdin isn't possible, for 6 or more non whitespace characters or words, use scan with the text argument scan(,"",t="a b c d e f").
If you need a vector of (a) 6 or more characters of any type or (b) 10 or more non-whitespace characters , strsplit via x=el(strsplit("qwertyuiop","")) is probably the way to go.
You may be able to get away with the following quote trick: quote(Q(W,E,R,T,Y)), which creates that expression. Some functions, like strrep and grep, will coerce this to a vector of strings! If you do, this is good for any length of word or character vector from 3 to 11.
There's no good reason to use strsplit on words via x=el(strsplit("q w e r t y"," ")). It always loses to x=scan(,"",t="q w e r t y") by a fixed overhead of 6 bytes.

Here's a table of the byte counts used by each approach to read in a vector of single characters of length n. The relative ordering within each row is valid for characters or words, except for strsplit on "" which only works on characters.

| n  | c(...) | scan + stdin | scan + text | strsplit on "" (chars only) | quote hack |
|----|--------|--------------|-------------|-----------------------------|------------|
| 1  | 3      | 11   | 15   | 20       | 8     |
| 2  | 10     | 13   | 17   | 21       | 11    |
| 3  | 14     | 15   | 19   | 22       | 13    |
| 4  | 18     | 17   | 21   | 23       | 15    |
| 5  | 22     | 19   | 23   | 24       | 17    |
| 6  | 26     | 21   | 25   | 25       | 19    |
| 7  | 30     | 23   | 27   | 26       | 21    |
| 8  | 34     | 25   | 29   | 27       | 23    |
| 9  | 38     | 27   | 31   | 28       | 25    |
| 10 | 42     | 29   | 33   | 29       | 27    |
| 11 | 46     | 31   | 35   | 30       | 29    |
| 12 | 50     | 33   | 37   | 31       | 31    |

c. If you need to input text as a character matrix, a few recipes that seem short are

s="hello\nworld\n foo"

# 43 bytes, returns "" padded data frame
# If a line after the fifth is longer than all of the first five, it wraps around and causes an error
read.csv(t=gsub("(?<=.)(?=.)",",",s,,T),,F)

# 54 bytes with readLines(), "" padded matrix
sapply(p<-readLines(),substring,p<-1:max(nchar(p)),p)

# plyr not available on TIO
# 58 bytes, returns NA padded matrix, all words split by whitespace
plyr::rbind.fill.matrix(Map(t,strsplit(scan(,"",t=s),"")))
# 61 bytes, returns NA padded matrix
plyr::rbind.fill.matrix(Map(t,(a=strsplit)(el(a(s,"\n")),"")))

J.Doe

Posted 2011-11-30T12:40:07.743

Reputation: 2 379

1scan has a text argument, which is more competitive than el(strsplit(x," ")) if you only need strings! Try it online! As opposed to your last suggestion of read.csv. – Giuseppe – 2018-10-05T18:13:16.853

If you just want characters, your call of scan is better up to 5 characters, el(strsplit(x,"")) is more competitive than scan for 6 or more. Try it online! I haven't yet found a good use for read.csv, but maybe it would be useful if you needed a data table for some reason?

– J.Doe – 2018-10-05T18:20:13.177

I've never found a use for a data.frame but maybe we need to find / create a challenge where it would be helpful! Maybe a dplyr style group_by() and summarize() type of manipulation? IDK. – Giuseppe – 2018-10-05T18:26:38.960

And for reading in strings scan(,"") still seems better? Try it online!

– J.Doe – 2018-10-05T18:26:41.880

Yeah for sure, although if you interpret an input format strictly as ngm does here then double scan is handy.

– Giuseppe – 2018-10-05T18:28:26.737

When you do need to use a function, use pryr::f() instead of function().

Example:

function(x,y){x+y}

is equivalent to

pryr::f(x,y,x+y)

or, even better,

pryr::f(x+y)

This works because, if there is only one argument, the formals are guessed from the code.

BLT

Posted 2011-11-30T12:40:07.743

Reputation: 931

Unless you can get it down to one argument (like in the third example), this isn't a golf, for function(x,y){x+y} can be written as function(x,y)x+y for the same bytecount as pryr::f(x,y,x+y) but with more readability. – Khuldraeseth na'Barya – 2018-03-01T23:20:41.737

pryr::f(x+y) does work though – qwr – 2019-11-09T19:49:57.087

Alternatives to `rep()`

Sometimes rep() can be avoided with the colon operator : and R's vector recycling.

For repeating n zeroes, where n>0, 0*1:n is 3 bytes shorter than rep(0,n) and !1:n, an array of FALSE, is 4 bytes shorter, if the use case allows it.
To repeat x n times, x+!1:n is 2 bytes shorter than rep(x,n). For n ones, use !!1:n if you can use an array of TRUE.
To repeat x 2n+1 times, where n>=0, x+0*-n:n is 4 bytes shorter than rep(x,2*n+1).
The statement !-n:n will give a TRUE flanked on both sides by n FALSE. This can be used to generate even numbers of characters in calls to intToUtf8() if you remember that a zero is ignored.

Modular arithmetic can be useful. rep statements with the each argument can sometimes be avoided using integer division.

To generate the vector c(-1,-1,-1,0,0,0,1,1,1), -3:5%/%3 is 5 bytes shorter than rep(-1:1,e=3).
To generate the vector c(0,1,2,0,1,2,0,1,2), 0:8%%3 saves 4 bytes on rep(0:2,3).
Sometimes nonlinear transformations can shorten sequence arithmetic. To map i in 1:15 to c(1,1,3,1,1,3,1,1,3,1,1,3,1,1,3) inside a compound statement, the obvious golfy answer is 1+2*(!i%%3) for 11 bytes. However, 3/(i%%3+1) is 10 bytes, and will floor to the same sequence, so it can be used if you need the sequence for array indexing.
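
As a quick sanity check, the identities above can be tried in a session. This is only a sketch; the values of `n` and `x` and the variable names are arbitrary choices for illustration:

```r
n <- 4
x <- 7

zeros   <- 0 * 1:n         # same values as rep(0, n)
falses  <- !1:n            # n FALSE values
xs      <- x + !1:n        # same values as rep(x, n)
xs_odd  <- x + 0 * -n:n    # same values as rep(x, 2*n+1)
flanked <- !-n:n           # one TRUE with n FALSE on each side

thirds <- -3:5 %/% 3       # same values as rep(-1:1, each = 3)
cycle  <- 0:8 %% 3         # same values as rep(0:2, 3)

i      <- 1:15
steps  <- 3 / (i %% 3 + 1)           # 1.5 1 3 1.5 1 3 ...
picked <- c("a", "b", "c")[steps]    # fractional indices are truncated
```

These rely on `:` binding tighter than `%%`, `%/%` and `*`, and on unary minus binding tighter than `:`, which is why no parentheses are needed.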

J.Doe

Posted 2011-11-30T12:40:07.743

Reputation: 2 379

Some ways to find the first non-zero element of an array.

If it has a name x:

x[!!x][1]

Returns NA if no non-zero elements (including when x is empty, but not NULL which errors.)

Anonymously:

Find(c, c(0,0,0,1:3))

Returns NULL if there are no non-zero elements, or if the input is empty or NULL.
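
A small side-by-side sketch of the two approaches; the vectors `x` and `z` and the result names are made up for illustration:

```r
x <- c(0, 0, 5, 2)   # has non-zero elements
z <- c(0, 0, 0)      # all zero

first_idx  <- x[!!x][1]   # indexing version: first non-zero value
first_find <- Find(c, x)  # Find version: c() acts as the identity predicate

none_idx   <- z[!!z][1]   # no match: indexing past the end gives NA
none_find  <- Find(c, z)  # no match: Find gives NULL
```

Which failure value is more convenient depends on what happens to the result next, as the comments below discuss.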

ngm

Posted 2011-11-30T12:40:07.743

Reputation: 3 974

This will return NA if all elements of x are zero, I believe, so use it with caution! – Giuseppe – 2018-08-22T23:40:16.500

Find(c,x) has the same byte count, with the advantage that you don't need to repeat (or define) x, and a different behavior if there is no match. TIO – JayCe – 2018-08-23T20:24:39.480

Find is also a little safer as it works on NULL, as long as nothing else needs to happen to the result, in which case I'm not sure if returning NA or NULL is safer. – ngm – 2018-08-23T20:36:26.023

oh that's right. the issue with returning NULL is errors... in the version comparison question I first tried sign(Find(c,w)) which caused errors - had to do Find(c,sign(w)) to get it not to error. I think both ways have their uses. – JayCe – 2018-08-23T20:42:10.763

Surviving challenges involving strings

As mentioned in another answer, unlist(strsplit(x,split="")) and paste(...,collapse="") can be depressing. But don't just walk away from these, there are workarounds!

utf8ToInt converts a string to a vector, intToUtf8 does the reverse operation. You're getting a vector of int, not a vector of char, but sometimes this is what you're looking for. For instance, to generate a string of - characters, intToUtf8(rep(45,34)) is better than paste(rep("-",34),collapse="")
gsub is more useful than the other functions of the grep family when operating on a single string. The two approaches above can be combined as in this answer which benefited from the advice of ovs, Giuseppe and ngm.
Choose a convenient I/O format as in this answer taking input as lines of text (without quotes) or this one taking a vector of chars. Check with the OP when in doubt.
As pointed out in the comments, < compares strings lexicographically as one would expect.
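
A short sketch of these string tricks; the variable names are only illustrative. The `multiple = TRUE` form of intToUtf8 returns one string per code point instead of a single string:

```r
codes  <- utf8ToInt("abc")                   # integer code points 97 98 99
dashes <- intToUtf8(rep(45, 34))             # one string of 34 "-"
pasted <- paste(rep("-", 34), collapse = "") # the longer equivalent

# arithmetic on code points avoids splitting and pasting
upper  <- intToUtf8(utf8ToInt("abc") - 32)   # shift lowercase ASCII to uppercase

chars  <- intToUtf8(c(72, 105), TRUE)        # multiple = TRUE: "H" "i"
cmp    <- "apple" < "banana"                 # lexicographic comparison
```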

JayCe

Posted 2011-11-30T12:40:07.743

Reputation: 2 655

intToUtf8 also has a second argument multiple = FALSE which will convert from ints to individual characters (length-one strings) rather than a single string if set to TRUE. – Giuseppe – 2018-10-17T16:58:21.743

Also, starting in 3.5.0, there's a third argument allow_surrogate_pairs = FALSE , but I don't know what it does; the docs say something about reading two-bytes as a UTF-16 but I barely know what UTF-8 is so I'll just ignore it until someone else finds a way to golf with it. – Giuseppe – 2018-10-17T16:59:42.110

Tips for restricted source challenges :

Characters in R literal constants can be replaced by hex, octal and Unicode escapes.

e.g. the string "abcd" can be written :
```
    # in octal codes
    "\141\142\143\144"

    # in hex codes
    "\x61\x62\x63\x64"

    # in unicodes
    "\u61\u62\u63\u64"
    # or
    "\U61\U62\U63\U64"
```
We can also mix plain characters with octal/hex/Unicode escapes, and use octal and hex codes together, as long as Unicode escapes are not mixed with octal/hex, e.g.:
```
    # Valid
    "a\142\x63\x64"

    # Valid
    "ab\u63\U64"

    # Error: mixing Unicode and octal/hex escapes in a string is not allowed
    "\141\142\x63\u64"
```
See the end of this section for further details.
Since functions can be written using string literals, e.g. cat() can be written alternatively :
```
'cat'()
"cat"()
`cat`()
```
we can use octal codes, hex codes and unicode for function names as well :
```
# all equal to cat()
"\143\141\164"()
`\x63\x61\x74`()
'\u63\u61\u74'()
"ca\u74"()
```
with the only exception that unicode sequences are not supported inside backticks ``
Round brackets can be avoided by abusing operators, e.g.:
```
cat('hello')

# can be written as
`+`=cat;+'hello'
```

An application of all three tricks can be found in this answer.
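
The three tricks can be checked together in a short sketch. The choice of `sum` and `toupper` is arbitrary, and the operator redefinition is wrapped in `local()` here only so that it does not leak into the rest of the session:

```r
# Escapes inside string literals: all three spell "abcd"
o <- "\141\142\143\144"   # octal
h <- "\x61\x62\x63\x64"   # hex
u <- "\u61\u62\u63\u64"   # Unicode

# A function called through an escaped name: "\163\165\155" is "sum"
total <- "\163\165\155"(1:3)

# Unary + rebound to a function, so the call needs no parentheses
shout <- local({`+` = toupper; +'hello'})

# Hex number literal, as noted in the comment below
eleven <- 0xB
```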

digEmAll

Posted 2011-11-30T12:40:07.743

Reputation: 4 599

Also, numbers can be written in hexadecimal: 0xB and 0xb return 11 (no need for backticks or quotes). – Robin Ryder – 2019-09-16T07:14:31.633

Do computation in default arguments

This can save on braces if the function can be reduced to one statement. A silly example:

function(x,y,a=x*x+y*y,b=a+x+y)a*b+a+b
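
To see the saving, here is the same function next to a braced version; the names `f` and `g` are only for illustration. Default arguments are evaluated lazily inside the function, so `a` and `b` can refer to `x`, `y` and each other:

```r
# intermediate values computed in default arguments, no braces needed
f <- function(x,y,a=x*x+y*y,b=a+x+y)a*b+a+b

# the same computation with a braced body
g <- function(x,y){a=x*x+y*y;b=a+x+y;a*b+a+b}

r1 <- f(1, 2)   # a = 5, b = 8, so 5*8 + 5 + 8
r2 <- g(1, 2)
```

By my count the default-argument form is 38 bytes against 40 for the braced one. The trick assumes the caller only ever supplies `x` and `y`.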

qwr

Posted 2011-11-30T12:40:07.743

Reputation: 8 929

Tips for golfing in R

Answers

Using if, ifelse, and `if`

The basics

Examples

Summary

Do-while loops in R

Implicit type conversion

Conversion to logical from numeric (4 bytes)

Conversion to character from numeric or logical (7 bytes)

Conversion to character from numeric or logical under certain conditions (6 bytes)

Conversion to numeric from logical (4 bytes)

Change the meaning of operators

Scenarios where you can avoid paste(...,collapse="") and strsplit

Alternatives to rep()

Surviving challenges involving strings

Tips for restricted source challenges :

Do computation in default arguments
