Randomly shuffle rows in a large text file

11

2

I have a ~1 GB text file with about 6k rows (each row is very long), and I need to shuffle its rows randomly. Is this possible, perhaps with awk?

ddmichael

Posted 2014-05-30T16:05:01.457

Reputation: 337

Answers

19

You can use the shuf command from GNU coreutils. It is quite fast and should take less than a minute to shuffle a 1 GB file.

The command below shuffles the file in place. This is safe because shuf reads all of its input before it opens the output file:

$ shuf -o File.txt < File.txt
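For comparison, here is a minimal Python sketch of the same "read everything first, then overwrite" idea. The function name shuffle_file_in_place and its seed parameter are hypothetical, not part of any tool mentioned here. It also appends a newline to a final line that lacks one, so that line is not glued onto whichever line ends up after it.

```python
import random


def shuffle_file_in_place(path, seed=None):
    """Hypothetical helper: shuffle the lines of `path` and write them back.

    Like `shuf -o File.txt < File.txt`, all input is read before the file
    is reopened for writing, so truncating it cannot lose data.
    """
    with open(path, "rb") as f:
        lines = f.readlines()  # whole file in memory (~1 GB in the question)

    # A last line without a trailing newline would otherwise be joined
    # to the line that follows it after shuffling.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"

    random.Random(seed).shuffle(lines)

    # Only now is the file truncated and rewritten.
    with open(path, "wb") as f:
        f.writelines(lines)
```

Usage: shuffle_file_in_place("File.txt"). Binary mode is used so that line contents and encodings pass through untouched.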

Suraj Biyani

Posted 2014-05-30T16:05:01.457

Reputation: 316

Thanks. I forgot to mention that I am on OS X; are there any equivalents? – ddmichael – 2014-05-30T16:59:04.030

@ddmichael Run brew install coreutils and use /usr/local/bin/gshuf. – Lri – 2014-05-30T17:13:54.243

@ddmichael Alternatively, on OS X you can use this Perl one-liner. I got it from an old blog post; I did a quick test and it works.

perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < myfile

I am not sure how fast it would run, though. – Suraj Biyani – 2014-05-30T18:23:22.493

4

Python one-liner:

python3 -c 'import sys, random; L = sys.stdin.readlines(); random.shuffle(L); sys.stdout.write("".join(L))'

Reads all the lines from standard input, shuffles them in place, then writes them out with sys.stdout.write, which (unlike print) does not add an extra newline at the end.
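Reading every line keeps the whole ~1 GB file in memory. Since the file has only about 6k rows, an alternative is to remember just where each line starts and how long it is, shuffle those few thousand (offset, length) pairs, and copy the lines to a new file one at a time. This is a sketch under those assumptions; shuffle_lines_by_offset is a hypothetical name, and the random seeks may be slower than a sequential read on spinning disks.

```python
import random


def shuffle_lines_by_offset(src_path, dst_path, seed=None):
    """Hypothetical helper: write the lines of src_path to dst_path in random order.

    Only (start, length) for each line is held in memory; each line is
    re-read from disk when it is written. dst_path must differ from src_path.
    """
    spans = []
    with open(src_path, "rb") as src:
        pos = 0
        for line in src:  # binary iteration yields b"\n"-terminated lines
            spans.append((pos, len(line)))
            pos += len(line)

    random.Random(seed).shuffle(spans)

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for start, length in spans:
            src.seek(start)
            line = src.read(length)
            if not line.endswith(b"\n"):
                line += b"\n"  # the input's last line had no newline
            dst.write(line)
```

Peak memory is then roughly one line plus the list of offsets, instead of the whole file.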

Cristian Ciupitu

Posted 2014-05-30T16:05:01.457

Reputation: 4 515

2

On OS X, Homebrew's coreutils installs the binary as gshuf:

brew install coreutils
gshuf -o File.txt < File.txt
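If the same script has to run on both Linux (shuf) and macOS with Homebrew coreutils (gshuf), one option is to look up whichever name is on PATH. A minimal sketch, assuming one of the two is installed; find_shuf and shuffle_with_shuf are hypothetical helper names.

```python
import shutil
import subprocess


def find_shuf():
    """Return the path of shuf (Linux) or gshuf (Homebrew on macOS), or None."""
    return shutil.which("shuf") or shutil.which("gshuf")


def shuffle_with_shuf(path):
    """Shuffle `path` in place, equivalent to: shuf -o path < path."""
    shuf = find_shuf()
    if shuf is None:
        raise FileNotFoundError("neither shuf nor gshuf is on PATH")
    with open(path, "rb") as src:
        # shuf reads all of stdin before opening the -o file, so reading
        # and writing the same path is safe here.
        subprocess.run([shuf, "-o", path], stdin=src, check=True)
```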

ishandutta2007

Posted 2014-05-30T16:05:01.457

Reputation: 121

1

If, like me, you came here looking for an alternative to shuf on macOS, use randomize-lines.

Install the randomize-lines Homebrew package, which provides an rl command with functionality similar to shuf:

brew install randomize-lines

Usage: rl [OPTION]... [FILE]...
Randomize the lines of a file (or stdin).

  -c, --count=N  select N lines from the file
  -r, --reselect lines may be selected multiple times
  -o, --output=FILE
                 send output to file
  -d, --delimiter=DELIM
                 specify line delimiter (one character)
  -0, --null     set line delimiter to null character
                 (useful with find -print0)
  -n, --line-number
                 print line number with output lines
  -q, --quiet, --silent
                 do not output any errors or warnings
  -h, --help     display this help and exit
  -V, --version  output version information and exit
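The -c and -r options above correspond to two kinds of random selection: picking N distinct lines (sampling without replacement) and picking N lines where the same line may come up more than once (sampling with replacement). The Python sketch below shows the analogous standard-library calls; it is not how rl is implemented, pick_lines is a hypothetical name, and capping count at the number of lines is an assumption of this sketch.

```python
import random


def pick_lines(lines, count, reselect=False, rng=random):
    """Hypothetical helper: pick `count` random lines.

    reselect=False: each line at most once, in the spirit of `rl -c N`.
    reselect=True:  lines may repeat, in the spirit of `rl -c N -r`.
    """
    if reselect:
        return rng.choices(lines, k=count)
    # random.sample raises ValueError if count > len(lines), so cap it.
    return rng.sample(lines, min(count, len(lines)))
```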

Ahmad Awais

Posted 2014-05-30T16:05:01.457

Reputation: 111

0

I forgot where I found this, but here's the shuffle.pl that I use:

#!/usr/bin/perl -w
use strict;

# @(#) randomize Effectively _unsort_ a text file into random order.
# 96.02.26 / drl.
# Based on Programming Perl, p 245, "Selecting random element ..."

# Set the random seed, PP, p 188.
# (Perl 5.004 and later seed automatically, so this line is optional.)
srand(time|$$);

# Suck in everything in the file.
my @a = <>;

# Get random lines, write 'em out, mark 'em done.
while (@a) {
    my $choice = splice(@a, rand @a, 1);
    print $choice;
}
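The script above repeatedly removes a random remaining element. Removing from the middle of an array shifts everything after it, so the loop is O(n^2) overall; with ~6k lines that is negligible, but it grows quickly for files with millions of lines. The Fisher-Yates shuffle produces the same uniform distribution in O(n) by swapping the chosen element to the end instead of removing it, and CPython's random.shuffle uses this loop. Both functions below are illustrative sketches with hypothetical names.

```python
import random


def splice_shuffle(items, rng=random):
    """Python port of the Perl script: pop a random element until empty.

    Each pop from the middle of a list is O(n), so this is O(n^2).
    """
    items = list(items)
    out = []
    while items:
        out.append(items.pop(rng.randrange(len(items))))
    return out


def fisher_yates(items, rng=random):
    """Same uniform distribution in O(n): swap instead of removing."""
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items
```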

Icydog

Posted 2014-05-30T16:05:01.457

Reputation: 1 127

0

At least on Ubuntu, there's a program called shuf:

shuf file.txt

Gonzo

Posted 2014-05-30T16:05:01.457

Reputation: 147

That program is part of coreutils, as mentioned by Suraj Biyani.

– Cristian Ciupitu – 2014-05-30T17:27:50.160