Using sed to compact JSON arrays

I have JSON output in the following format:

{
  "DaysCfg": {
    "Range": {
      "lowerDate": "2017-07-28T00:00:00.000-04:00",
      "upperDate": "2017-08-04T00:00:00.000-04:00"
    },
    "DaysInPeriod": 8,
    "DaysToSchedule": [
      0,
      1,
      2,
      3,
      4,
      5,
      6
    ]
  },
  "DepartmentsID": [
    138837,
    139734,
    141934,
    142436,
    149687,
    151049
  ],
  "EmployeesID": [
    5039,
    5170,
    5889,
    6051,
    6236,
    7208,
    7281,
    8776,
    8781,
    8936,
    9261
  ],
  "EndDate": "2017-08-03T23:59:00.000-04:00",
  "IntervalSize": 15,
  "IsActivitiesEnabled": true,
  "ModifyExisting": false,
  "OrignId": 134721,
  "PrimaryOption": 0,
  "SchoolDays": [],
  "ScChanges": [],
  "StartDate": "2017-07-28T00:00:00.000-04:00",
  "ZonesToSchedule": [
    5,
    4,
    6,
    3,
    3,
    3,
    2,
    14
  ]
}

Since I can't change the program that outputs it, I have to compact the JSON arrays myself with sed (or awk). The desired output would be:

{
  "DaysCfg": {
    "Range": {
      "lowerDate": "2017-07-28T00:00:00.000-04:00",
      "upperDate": "2017-08-04T00:00:00.000-04:00"
    },
    "DaysInPeriod": 8,
    "DaysToSchedule": [0, 1, 2, 3, 4, 5, 6]
  },
  "DepartmentsID": [138837, 139734, 141934, 142436, 149687, 151049],
  "EmployeesID": [5039, 5170, 5889, 6051, 6236, 7208, 7281, 8776, 8781, 8936, 9261],
  "EndDate": "2017-08-03T23:59:00.000-04:00",
  "IntervalSize": 15,
  "IsActivitiesEnabled": true,
  "ModifyExisting": false,
  "OrignId": 134721,
  "PrimaryOption": 0,
  "SchoolDays": [],
  "ScChanges": [],
  "StartDate": "2017-07-28T00:00:00.000-04:00",
  "ZonesToSchedule": [5, 4, 6, 3, 3, 3, 2, 14]
}

I have tried to come up with a sed script myself, but it is only half-baked and doesn't fully work:

sed -r -e :a -e '/^ *[]}],*$/!N; /": \[/s/\n +//; ta' -e 'P;D'

Please help. Thx.

xpt

Posted 2017-07-28T17:01:37.580

I think sed isn't the best tool for this problem. You should try a JSON parser/formatter, for example jq (https://stedolan.github.io/jq/). You may also want to check https://stackoverflow.com/questions/9105031/how-to-beautify-json-in-python-or-through-command-line and https://stackoverflow.com/questions/352098/how-can-i-pretty-print-json-in-unix-shell-script

– uzsolt – 2017-07-28T19:46:24.373

FYI, the output I want to work on is produced by jq itself, a C program, so I'm not interested in any Python solutions, @uzsolt. What you see here is only a selected excerpt of the 4~6M of output that I have. If you don't understand the above sed command, then it is inappropriate for you to judge whether sed is the best tool or not. – xpt – 2017-07-28T20:14:11.807

So, what exactly do you want? You want to delete '\n' (after a comma) if we are inside brackets ([ and ]). If you want to do it with sed, just do it! It isn't impossible, but "FYI" there are better tools. I'm curious about the sed-way solution. Go for it! (Someone downvoted your question; FYI, not me.) – uzsolt – 2017-07-29T12:34:34.677

Thanks for the input, @uzsolt. OK, gotcha. I'll forget about sed and do it in awk then. – xpt – 2017-07-29T13:44:41.520

The downvoting shows nothing but that there are narrow-minded and mean people out in the wild. I'll do it in awk and post back. – xpt – 2017-07-29T13:48:03.823

I think awk is better in this case :) – uzsolt – 2017-07-29T19:54:34.970
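uzsolt's comment describes the idea (delete the newline after a comma while inside brackets), and the OP planned to try awk. A minimal awk sketch of that idea follows, assuming jq's layout where each multi-line array opens on a line ending in "[" and closes on a line whose first non-blank character is "]". The function name compact_arrays_awk is hypothetical, and nested arrays or objects inside an array are not handled.

```shell
# Hypothetical helper (not from the thread): joins each multi-line JSON array
# onto the line that opens it, with ", " between elements.
# Assumption: jq-style layout, i.e. an array opens on a line ending in "[" and
# closes on a line whose first non-blank character is "]".
# Nested arrays/objects inside an array are NOT handled.
compact_arrays_awk() {
  awk '
    buf != "" {                        # inside an array: collect its elements
      line = $0
      sub(/^[ \t]+/, "", line)         # strip indentation
      if (substr(line, 1, 1) == "]") { # closing bracket: print the joined line
        print buf line
        buf = ""
      } else {
        buf = buf line
        if (line ~ /,$/)               # a separating comma gets one space after it
          buf = buf " "
      }
      next
    }
    /\[$/ { buf = $0; next }           # array opens: start collecting
    { print }                          # every other line passes through unchanged
  '
}

# Usage: compact_arrays_awk < output.json
```

Because the whole file is streamed line by line and only one array is buffered at a time, this should cope with multi-megabyte output like the OP's.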

Answers

I edited your sed; I hope this helps.

sed -r '/\[$/ {:a;N;s/\]/&/;Ta;s/\n +//g}'

sed -r '

# sed applies the commands between "{}" only to lines that match the address
# /\[$/, i.e. lines ending with "[".
/\[$/ {

# Define a label named "a".
:a

# The N command appends a newline to the pattern space, then reads the next
# line of input (file or stdin) and appends it to the pattern space.
N

# Replace "]" with itself: a no-op whose only purpose is to record whether a
# "]" is now in the pattern space. If the substitution is not made, the T
# command jumps back to label "a".
# This is the loop that puts several lines (or all lines of a file) on one line.
# While there is no "]" in the pattern space (the "]" line is the last one the
# OP wants joined), sed appends "\n<next line>" to the pattern space.
s/\]/&/
Ta

# Once the substitution succeeds, T does not branch, so sed leaves the loop and
# applies the remaining command: replace every occurrence (g flag) of a newline
# character followed by spaces with nothing.
s/\n +//g
}'
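The command above joins the elements without a space after each comma (e.g. [0,1,2,3,4,5,6]), whereas the desired output in the question uses [0, 1, 2, 3, 4, 5, 6]. A minimal variant, assuming GNU sed and the same jq layout, removes the newline after "[" and before "]" and turns every other newline plus indentation into a single space. The wrapper function names are hypothetical, added only to make the commands easy to reuse and compare.

```shell
# GNU sed is required: -r and the T command are GNU extensions.
# The function names are hypothetical wrappers, not part of the answer.

# The answer's command as posted: elements joined with no space after ",".
compact_arrays() {
  sed -r '/\[$/ {:a;N;s/\]/&/;Ta;s/\n +//g}'
}

# Variant: delete the newline (and indentation) right after "[" and right
# before "]", then turn each remaining newline plus indentation into one space.
compact_arrays_spaced() {
  sed -r '/\[$/ {:a;N;s/\]/&/;Ta;s/\[\n */[/;s/\n *\]/]/;s/\n +/ /g}'
}

sample='{\n  "a": [\n    1,\n    2\n  ],\n  "b": []\n}\n'
printf '%b' "$sample" | compact_arrays          # the "a" line becomes:   "a": [1,2],
printf '%b' "$sample" | compact_arrays_spaced   # the "a" line becomes:   "a": [1, 2],
```

Using " *" rather than " +" around the brackets is a small robustness choice so that a closing "]" in column 0 is also pulled up; lines such as "SchoolDays": [] do not end with "[" and pass through untouched in both versions.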

Paulo

Posted 2017-07-28T17:01:37.580

OMG! That's amazing. People had talked me into giving up on sed, but deep down I still believed sed could do it, and ... BANG! Here is your unbelievably simple solution, which works flawlessly!!! Please take my +50 points in total as a warm welcome to Super User! – xpt – 2017-07-31T02:53:16.233

I think I understand almost everything except how the loop terminates. Would you elaborate, please? – xpt – 2017-07-31T02:54:11.230

Now I wish I'd had more downvotes for this question; obviously they downvoted because they didn't believe there could be a sed solution, even in their wildest imaginations. To all those people: this excellent answer shows, right in your face, how narrow-minded you are. – xpt – 2017-07-31T03:03:53.410

I can only award my +50 bounty in 23 hours, so take my +25 points for now. – xpt – 2017-07-31T03:08:35.770

I figured it out myself from the sed man page entry for T label: "If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label" – xpt – 2017-07-31T03:20:44.767

Excuse me, I don't know much about how the "points stuff" works, but thank you very much for the points. Finally I have enough points to comment :) Thanks. Yes, you're right: the 'T' command jumps back to the 'a' label if there isn't any ']' in the pattern space yet. All the commands are applied only when sed finds a line matching the address '/\[$/'; other lines are printed without any editing. – Paulo – 2017-07-31T03:21:24.180

Nice answer, but I suggest you add an explanation section with a breakdown of the command in order to illustrate how it works. It appears you are using some sed functionality that not everybody is familiar with. – simlev – 2017-07-31T07:25:36.823

While this may answer the question, it would be a better answer if you could provide some explanation why it does so. – DavidPostill – 2017-07-31T07:31:58.697

Edited to add an explanation. See 'info sed' for the GNU sed commands. PS: excuse my English errors. – Paulo – 2017-07-31T12:48:29.307
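The loop-termination question can be illustrated with a tiny demonstration of the T rule quoted from the man page. This is a sketch for GNU sed only (T is a GNU extension), and the function name demo_T is hypothetical: a failed substitution makes T branch to its label, while a successful one makes T fall through to the next command.

```shell
# GNU sed only. For each input line: try s/x/X/; if it FAILED, T jumps to the
# label "end" (skipping the next command); if it SUCCEEDED, T does not branch,
# so the second substitution runs.
demo_T() {
  sed 's/x/X/;Tend;s/$/ (substituted)/;:end'
}

printf 'abc\nxyz\n' | demo_T
# abc
# Xyz (substituted)
```

In the answer's loop the same rule applies: as long as s/\]/&/ fails, Ta jumps back to :a and N reads another line; the first time a "]" appears, the substitution succeeds, T falls through, and the loop ends.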