Regexp-based editing of content in large CSV files

0

I want to remove content matching a regex from a very large CSV file (100,000+ records). How can I do this from the Windows command line? I also have sed and awk installed and available there.

The file hangs every spreadsheet program and text editor I open it in (including Notepad++).

suuser

Posted 2014-05-09T17:58:24.300

Reputation: 127

Question was closed 2014-05-12T09:47:10.967

No, I want to process and EDIT large files based on a regexp – suuser – 2014-05-09T18:05:24.527

@techie007 Updated question – suuser – 2014-05-09T18:08:23.047

Many of the editors mentioned support Regex search/replace. If you're determined to do it from the command-line, then what have you tried already? Did you try using findstr? What were the results? – Ƭᴇcʜιᴇ007 – 2014-05-09T18:19:50.160

I tried various editors, but I don't know about findstr – suuser – 2014-05-09T18:21:39.630

What have you tried with sed/awk? Have you tried the usual recommended editors like notepad++? – m4573r – 2014-05-09T18:51:55.640

@Ali - FINDSTR is a non-starter. It has crippled regex capability, it has many bugs, and it cannot replace content. – dbenham – 2014-05-09T19:45:19.820

I did try Notepad++, but it crashes @m4573r – suuser – 2014-05-10T04:20:20.103

Answers

2

You might want to try Google Refine (since renamed OpenRefine).

It can do complex restructuring of CSV files using several methods, including regex.

I've used it to cleanse and manipulate very large user databases in CSV form.
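
If a GUI tool is not an option, the same kind of cleanup can be done from the command line by streaming the file one line at a time, so it is never loaded into memory or an editor all at once. The following is only a minimal sketch in Python, not something from the original question: the function name `strip_pattern`, the file names, and the example pattern are all hypothetical, and it assumes that no CSV field contains an embedded line break and that the pattern never needs to match across lines.

```python
import re


def strip_pattern(src_path, dst_path, pattern, replacement="", encoding="utf-8"):
    """Copy src_path to dst_path, replacing every regex match on each line.

    Returns the total number of replacements made.
    Hypothetical helper for illustration; works line by line, so a pattern
    cannot match across line boundaries.
    """
    regex = re.compile(pattern)
    replaced = 0
    # newline="" keeps the original line endings (e.g. \r\n) untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:  # only one line is held in memory at a time
            new_line, count = regex.subn(replacement, line)
            replaced += count
            dst.write(new_line)
    return replaced
```

A call might look like `strip_pattern("big.csv", "big_clean.csv", r"\[tmp\d+\]\s*")`, where both file names and the pattern are placeholders. Writing to a separate output file leaves the original intact in case the regex removes more than intended.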

Julian Knight

Posted 2014-05-09T17:58:24.300

Reputation: 13 389