Split one file into multiple files

2

1

I have one file in JSON format, like below:

    {
        "sources":[{
        "field1":1000,
        "field2":"winevent_log",
        "field3":"winevent_log",
        "field4":"os_security",
        "field5":true,
        "field6":false,
      },{
        "field1":1001,
        "field2":"winperf_cpu",
        "field3":"winperf_cpu",
        "field4":"os_perf",
        "field5":false,
        "field6":false,
      },{
        "field1":1002,
        "field2":"winperf_disk",
        "field3":"winperf_disk",
        "field4":"os_perf",
        "field5":false,
        "field6":false,
      },{
        "field1":1003,
        "field2":"winperf_mem",
        "field3":"winperf_mem",
        "field4":"OS_perf",
        "field5":false,
        "field6":false,
      }
    }

I'm trying to split it into different files based on the delimiter. I would like to see 4 different files like below:

file 1:

    {
        "field1":1000,
        "field2":"winevent_log",
        "field3":"winevent_log",
        "field4":"os_security",
        "field5":true,
        "field6":false,
    }

file 2:

    {
        "field1":1001,
        "field2":"winperf_cpu",
        "field3":"winperf_cpu",
        "field4":"os_perf",
        "field5":false,
        "field6":false,
    }

And so on and so forth.

I tried using the csplit and awk commands:

    csplit input_file '/"id"/' '{*}'
    awk '/,{/{n++}{print >"out" n ".json" }' input_file

But I haven't gotten the output files the way I expected, because the delimiter is spread across multiple lines and starts in the middle of a line.

Does anyone know how to use awk or csplit so that the start delimiter is "{ newline "field1"" and the end delimiter is "},"?

Arun

Posted 2016-07-14T21:10:35.877

Reputation: 61

You can do this with standard GNU utilities, and I'm sure you'll get a bunch of options soon, but take a look at this project. It is an awk-like tool for JSON files. It's very effective at what you are trying to do. https://github.com/micha/jsawk

– Argonauts – 2016-07-14T21:51:17.397

You forgot to mention that you do not want to split the file on some delimiter, you want to extract a random part of the (tree-structured) file and split that on some delimiter. – Michael Vehrs – 2016-07-15T08:56:45.727

I would use jq --compact-output to flatten the JSON then split by line – Neil McGuigan – 2016-08-11T20:47:23.637

Answers

1

Assuming the delimiter in your case is },{, you can use the ex editor (part of Vim) to split the file, for example:

    ex +%j +'%s/},{/},\r{/g' +'g/./exe ".w! file".line(".").".txt"' -scq! -V1 file.txt

This joins all the lines (%j), substitutes (%s) every },{ with }, followed by a line break (\r) and {, then writes each non-empty line to a separate file named after its line number. The downside is that each part ends up on a single line, but you can split it again at the commas. For more details, check: How to write each line into separate file?
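
If ex is not available, the same join-then-split idea can be sketched in plain awk. This is only a rough equivalent under assumptions, not the command above: the sample input is shortened, and the output names file1.txt, file2.txt, ... simply mirror the naming used by the ex command.

```shell
#!/bin/sh
# Shortened sample in the question's layout (assumed for illustration).
cat > file.txt <<'EOF'
{
    "sources":[{
    "field1":1000,
  },{
    "field1":1001,
  },{
    "field1":1002,
  }
}
EOF

awk '
    # Join all lines into one buffer, dropping leading whitespace
    # (roughly what %j does in ex).
    { sub(/^[ \t]+/, ""); buf = (NR == 1 ? $0 : buf " " $0) }
    END {
        # Split on the "},{" delimiter, then put "}," and "{" back
        # around each piece, as the %s substitution does.
        n = split(buf, part, /\},\{/)
        for (i = 1; i <= n; i++) {
            line = (i > 1 ? "{" : "") part[i] (i < n ? "}," : "")
            out = "file" i ".txt"
            print line > out
            close(out)
        }
    }
' file.txt
```

As with the ex version, the first file still carries the leading "sources" wrapper and the last one the closing brace of the outer object, so you may want to trim those afterwards.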

kenorb

Posted 2016-07-14T21:10:35.877

Reputation: 16 795

1

Use a range pattern (as in sed) and put the { and } back with sprintf:

    awk '/field1/,/field6/ {if ($0 ~ /field1/) {i++;$0=sprintf("    {\n%s",$0)}; if ($0 ~ /field6/) {$0=sprintf("%s\n    }",$0)}; print > ("file" i)}' input_file

Some of the strings are hard-coded; if they change, you can replace them with regular expressions.
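
As a minimal sketch of that generalization, the variant below keys on the brace lines instead of the field names. It assumes the layout from the question: each record starts on a line ending in `[{` or consisting of `},{`, and ends on a line starting with `}`. The sample input is shortened, and the out1.json, out2.json, ... names are only illustrative.

```shell
#!/bin/sh
# Shortened sample in the question's layout (assumed for illustration).
cat > input_file <<'EOF'
{
    "sources":[{
    "field1":1000,
    "field2":"winevent_log",
  },{
    "field1":1001,
    "field2":"winperf_cpu",
  }
}
EOF

awk '
    # A line ending in "[{" or consisting of "},{" starts a new record;
    # if a record is already open, close it first.
    /\[\{[ \t]*$/ || /^[ \t]*\},\{[ \t]*$/ {
        if (out != "") { print "}" > out; close(out) }
        out = "out" (++n) ".json"
        print "{" > out
        next
    }
    # A line starting with "}" ends the record currently being written.
    /^[ \t]*\}/ {
        if (out != "") { print "}" > out; close(out); out = "" }
        next
    }
    # Everything else inside a record is copied through unchanged.
    out != "" { print > out }
' input_file
```

The field lines are copied verbatim, so the trailing comma after the last field is kept, just as in the expected output shown in the question.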

Paulo

Posted 2016-07-14T21:10:35.877

Reputation: 606