How to apply regex to large text file?

I have large text files (some of them megabytes in size; example) and I would like to know if there is a more efficient way to apply regex to them than PyCharm or Sublime Text 2 on Mac OS X.

Thanks.

Comment: I want to replace text, not only search for it. An example would be welcome.

Alexis Benoist

Posted 2014-11-25T15:17:39.823

Reputation: 123

Answers

The most efficient way to search is grep, or perhaps ag (The Silver Searcher), like this:

grep -E "pattern" files
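A slightly more concrete version of the command above, as a sketch: the file name sample.txt and the date pattern are made-up placeholders, not taken from the question. -E enables extended regular expressions, -n prefixes each matching line with its line number, and -c only counts matching lines.

```shell
# Hypothetical sample input; substitute your own large file.
printf 'id,date,name\n1,2014-11-25,alpha\n2,n/a,beta\n3,2013-01-07,gamma\n' > sample.txt

# Print lines containing an ISO-style date (YYYY-MM-DD), with line numbers.
grep -nE '[0-9]{4}-[0-9]{2}-[0-9]{2}' sample.txt

# Only count the matching lines instead of printing them.
grep -cE '[0-9]{4}-[0-9]{2}-[0-9]{2}' sample.txt
```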

The most efficient way to replace is sed, like this:

sed -e "s/pattern/replacement/g" <input.txt >output.txt
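Since the question asks for a replacement example, here is a sketch that assumes a hypothetical input.txt containing dates in YYYY-MM-DD form and rewrites them as DD/MM/YYYY. The capture groups \1, \2, \3 refer back to the parenthesized parts of the pattern, and # is used as the s command delimiter so the / in the replacement needs no escaping.

```shell
# Hypothetical input file; replace with your own.
printf 'id,date,name\n1,2014-11-25,alpha\n2,n/a,beta\n3,2013-01-07,gamma\n' > input.txt

# -E: extended regex, so ( ) groups and {n} intervals need no backslashes.
# Rewrite YYYY-MM-DD as DD/MM/YYYY on every line, every occurrence (g).
sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#g' < input.txt > output.txt

cat output.txt
```

sed processes its input one line at a time, so memory use stays flat regardless of file size. Writing to a separate output.txt also sidesteps the difference in in-place editing between BSD sed on Mac OS X (-i '') and GNU sed (-i).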

However, these methods require you to use the command line rather than a fancy GUI.

UPDATE

After looking into the file you linked, I realize that using grep or any other text-based utility is actually the wrong approach: this file is 150MB uncompressed and is actually CSV data. Instead, I recommend importing this CSV data into some kind of database. For your purposes, I think SQLite would work best, but you can also use bigger databases like PostgreSQL or MySQL. The key to getting very fast searches is to create indexes on the field(s) being searched.
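A minimal sketch of the SQLite route using the sqlite3 command-line shell, assuming a hypothetical data.csv with a header row and a name column to search on (the real file's columns are not known here). Because the table records does not exist yet, .import in CSV mode uses the first row as column names.

```shell
# Hypothetical CSV with a header row; replace with the real file.
printf 'id,date,name\n1,2014-11-25,alpha\n2,2014-11-26,beta\n3,2013-01-07,gamma\n' > data.csv
rm -f data.db

# Import the CSV into a new table, index the searched column, then query it.
sqlite3 data.db <<'EOF'
.mode csv
.import data.csv records
CREATE INDEX idx_records_name ON records(name);
SELECT id, date FROM records WHERE name = 'beta';
EOF
```

The index lets an equality lookup on name use a B-tree search instead of scanning every row. A LIKE pattern with a leading wildcard still scans the whole table, and the REGEXP operator has no default implementation in the core SQLite library.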

mvp

Posted 2014-11-25T15:17:39.823

Reputation: 3 705

a more efficient way to apply regex to large text files?

The most efficient way I know of is grep search_expression hugefile
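To try this on something large, the sketch below generates a hypothetical one-million-line hugefile with awk and runs grep on it. The second search shows two commonly used options that can make grep faster: -F treats the search expression as a fixed string rather than a regex, and LC_ALL=C avoids multibyte locale handling. How much either helps depends on the grep version and the data.

```shell
# Build a hypothetical "hugefile" of one million numbered lines.
awk 'BEGIN { for (i = 1; i <= 1000000; i++) print "line " i }' > hugefile

# Regex search: count lines ending in 999990 through 999999.
grep -c 'line 99999[0-9]$' hugefile

# Literal search: -F skips regex matching, LC_ALL=C uses byte-wise matching.
LC_ALL=C grep -cF 'line 999999' hugefile
```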

than PyCharm or SublimText 2 on Mac OSX

Those are text editors, and not all text editors are optimized for searching large text files. It isn't their main job. They might perform a lot of parsing (e.g. for syntax detection and highlighting) and other work that isn't relevant to your task.

It is often the case that small specialized tools can outperform more general purpose tools.

RedGrittyBrick

Posted 2014-11-25T15:17:39.823

Reputation: 70 632