11
2
I have a text file of ~1GB with about 6k rows (each row is very long) and I need to randomly shuffle its rows. Is it possible? Possibly with awk?
19
You can use the shuf command from GNU coreutils. The utility is pretty fast and would take less than a minute to shuffle a 1 GB file.
The command below might just work in your case because shuf will read the complete input before opening the output file:
$ shuf -o File.txt < File.txt
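If you would rather not depend on when the output file gets opened, the same read-everything-then-write idea can be sketched in Python. This is only a rough sketch, not how shuf is implemented: the function name shuffle_file_in_place and the seed parameter are made up for illustration, and it assumes the whole file fits in memory.

```python
import os
import random
import tempfile


def shuffle_file_in_place(path, seed=None):
    """Shuffle the lines of `path`; the original is replaced only after a full write."""
    rng = random.Random(seed)  # hypothetical seed parameter, handy for repeatable runs
    # Read the complete input first (the whole file must fit in memory).
    with open(path, "rb") as src:
        lines = src.readlines()
    rng.shuffle(lines)
    # Write to a temporary file in the same directory, then rename it over the original.
    # Note: the temporary file gets mkstemp's default permissions, not the original's.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Calling shuffle_file_in_place("File.txt") would then play the role of the shuf command above; if the write fails halfway, the original file is left untouched.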
4
Python one-liner:
python3 -c 'import sys, random; L = sys.stdin.readlines(); random.shuffle(L); sys.stdout.writelines(L)'
Reads all the lines from standard input, shuffles them in place, then writes them back without adding an extra trailing newline. (The original Python 2 form was print "".join(L), where the trailing comma suppressed that newline.)
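One thing to watch with this approach: if the last row of the file has no trailing newline, it can end up glued to whichever row lands after it. A small sketch that guards against that, assuming text input; shuffle_lines and its seed parameter are names invented here, not part of the answer above:

```python
import random


def shuffle_lines(lines, seed=None):
    """Return the lines in random order, each terminated by a newline."""
    # Only the final input line can lack a newline; add one so that it
    # cannot fuse with its new neighbour after shuffling.
    fixed = [ln if ln.endswith("\n") else ln + "\n" for ln in lines]
    random.Random(seed).shuffle(fixed)
    return fixed
```

In a script you would feed it sys.stdin.readlines() and pass the result to sys.stdout.writelines(), the same way the one-liner does.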
2
For OS X the binary is called gshuf.
brew install coreutils
gshuf -o File.txt < File.txt
1
If, like me, you came here looking for an alternative to shuf for macOS, use randomize-lines.
Install the randomize-lines (Homebrew) package, which provides an rl command with functionality similar to shuf.
brew install randomize-lines
Usage: rl [OPTION]... [FILE]...
Randomize the lines of a file (or stdin).
  -c, --count=N            select N lines from the file
  -r, --reselect           lines may be selected multiple times
  -o, --output=FILE        send output to file
  -d, --delimiter=DELIM    specify line delimiter (one character)
  -0, --null               set line delimiter to null character
                           (useful with find -print0)
  -n, --line-number        print line number with output lines
  -q, --quiet, --silent    do not output any errors or warnings
  -h, --help               display this help and exit
  -V, --version            output version information and exit
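Going only by the help text above, --count picks N lines and --reselect allows the same line to be picked more than once. A rough Python sketch of that distinction, sampling without versus with replacement; pick_lines and its parameters are hypothetical, and what rl itself does when N exceeds the number of lines is not stated in the help, so the sketch simply caps N in that case:

```python
import random


def pick_lines(lines, count, reselect=False, seed=None):
    """Pick `count` lines at random, with or without replacement."""
    rng = random.Random(seed)
    if not lines:
        return []
    if reselect:
        # With replacement: the same line may come back several times.
        return rng.choices(lines, k=count)
    # Without replacement: each line appears at most once (count is capped).
    return rng.sample(lines, min(count, len(lines)))
```

With reselect=False and count equal to the number of lines, this is just a full shuffle.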
0
I forgot where I found this, but here's the shuffle.pl that I use:
#!/usr/bin/perl -w
# @(#) randomize Effectively _unsort_ a text file into random order.
# 96.02.26 / drl.
# Based on Programming Perl, p 245, "Selecting random element ..."
# Set the random seed, PP, p 188
srand(time|$$);
# Suck in everything in the file.
@a = <>;
# Get random lines, write 'em out, mark 'em done.
while ( @a ) {
    $choice = splice(@a, rand @a, 1);
    print $choice;
}
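The loop above pulls one random element out of the array on each pass. Removing from the middle of an array means the remaining elements have to be shifted, which does not matter for 6k rows but adds up on files with many lines. The usual alternative is a Fisher-Yates shuffle, which swaps in place instead of removing. A minimal sketch in Python (not a translation of the script; fisher_yates is a name chosen here):

```python
import random


def fisher_yates(items, rng=random):
    """Shuffle `items` in place with one swap per element and return it."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # pick j with 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items
```

Python's own random.shuffle does the same job, so in practice you would only write this out to see how it works.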
0
At least in Ubuntu, there's a program called shuf:
shuf file.txt
That program is part of coreutils, as mentioned by Suraj Biyani. – Cristian Ciupitu – 2014-05-30T17:27:50.160
Thanks, I forgot to mention I am on OSX, any equivalents? – ddmichael – 2014-05-30T16:59:04.030
5 @ddmichael Run brew install coreutils and use /usr/local/bin/gshuf. – Lri – 2014-05-30T17:13:54.243
2 @ddmichael Alternatively, for OS X you can use this Perl one-liner. I got it from an old blog; a quick test showed that it works.
cat myfile | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);'
I am not sure how fast it would run, though. – Suraj Biyani – 2014-05-30T18:23:22.493