I have a file that starts like this:
## CONFIG-PARAMS-START ##
##
## text1 text2 NNNNNNNNN (arbitrary_comment) ##
## text1 text2 NNNNNNNNN (arbitrary_comment) ##
## text1 text2 NNNNNNNNN (arbitrary_comment) ##
##
## CONFIG-PARAMS-END ##
<arbitrary rest of file>
Output:
I'd like to validate the file with awk or perl, to check that it starts this way.
If it does, output just the data lines (not the start/end markers, the "bare" lines, or anything after this section); if it doesn't, return a nonzero rc ($?) or some other easily testable condition, such as an empty string.
File spec:
In modern (PCRE) regex terms, the data line format is:
^##[[:space:]]* - starts with ## and optional spaces
(([a-zA-Z0-9_-]+\.)+) - >=1 repetition of [text_string][dot] (no spaces)
[[:space:]]+ - spaces
([^[:space:]]+) - block of non-spaces
[[:space:]]+ - spaces
([0-9]+) - block of digits
[[:space:]]+ - spaces
\(.* - '(' + any text
##[[:space:]]*$ - 2 hashes, optional spaces + line end
(So a typical line might be: ## abc.3ef. w;4o8c-uy3tu!ae 9938 (good luck!)##)
There mustn't be any other lines (including empty/whitespace lines) before the first line, or anywhere else in the data block. Within each line, consecutive white space effectively acts as a single delimiter. White space after the first ## and before+after the last ## are all optional. There will typically be <15 lines in the section so size/speed/efficiency will be negligible considerations.
(The greedy capture on the second-to-last line of the spec isn't an issue; it will backtrack just far enough to match the '##' required by the final line.)
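As an illustration of the spec above, here is a minimal sketch in awk, wrapped in a POSIX shell function. Several things in it are assumptions rather than part of the original post: the function name extract_params is hypothetical; bare ## lines are accepted anywhere inside the block; the START/END marker lines tolerate the same optional whitespace as data lines; every whitespace run between fields is treated as one or more characters; and [ \t] stands in for [[:space:]], because some older awk builds (mawk 1.3.3, for example) are reported not to support POSIX character classes. Data lines are buffered and printed only once the END marker is seen, so an invalid block produces no output at all.

```shell
#!/bin/sh
# extract_params FILE  (hypothetical helper name, not from the original post)
# Prints the data lines of a valid header block and returns 0.
# Prints nothing and returns 1 if the file does not start with a valid block.
extract_params() {
    awk '
        # Line 1 must be the start marker, otherwise the file is invalid.
        NR == 1 {
            if ($0 !~ /^##[ \t]*CONFIG-PARAMS-START[ \t]*##[ \t]*$/) { bad = 1; exit }
            next
        }
        # End marker: the block is complete, so stop reading the file here.
        /^##[ \t]*CONFIG-PARAMS-END[ \t]*##[ \t]*$/ { done = 1; exit }
        # Bare "##" lines are permitted (assumption: anywhere) but never output.
        /^##[ \t]*$/ { next }
        # Data lines are buffered so nothing is printed if a later line is bad.
        /^##[ \t]*([a-zA-Z0-9_-]+\.)+[ \t]+[^ \t]+[ \t]+[0-9]+[ \t]+\(.*##[ \t]*$/ {
            data[++n] = $0
            next
        }
        # Anything else inside the block, including blank lines, is an error.
        { bad = 1; exit }
        END {
            # Fail on an invalid line or if the end marker was never reached.
            if (bad || !done) exit 1
            for (i = 1; i <= n; i++) print data[i]
        }
    ' "$1"
}
```

A caller could then test the result with something like `if params=$(extract_params "$file"); then ...; fi`, where `$file` is whatever path is being checked; on failure the captured string is empty and the status is nonzero, so either condition can be used.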
Compatibility:
Wide compatibility is important, as the code will eventually need to run on default/standard builds of different Linux distributions, FreeBSD and other BSDs, and maybe even other modern *nix platforms. (It's part of a patch for a widely used open-source package.) Perhaps basic POSIX would provide a level playing field, rather than assuming some specific awk/perl variant? Maintainability/ease of understanding is also useful for the same reason. (This originally ended with "Hoping greatly to avoid perl ;-)"; I've removed that, see comments.)
I haven't got the hang of any text-processing tool for this sort of forward-and-backward referencing and checking, and I have even less idea how to manage compatibility and the slight differences between implementations.
Awk/perl skills would be appreciated to get a working version of this snippet!
Judging from how competently you expose the problem, it seems you already are knowledgeable enough to solve this. Please tell us exactly where you got stuck and post what you have so far. – simlev – 2019-05-10T09:01:38.687
-1 for "Hoping greatly to avoid perl ;-)". Perl is more consistent than, say, AWK between e.g. Linux and FreeBSD. For this kind of job, I'd go with Perl any day (or Python, or PHP, or any language that you are comfortable with and that provides solid built-in PCRE support while allowing you to write clear code). – simlev – 2019-05-10T09:08:14.013
@simlev - I'm competent with PCRE regex, and I understand the problem + its requirements. But I've never used awk or perl in my life, and have zero knowledge - literally - of either. (Which is kinda where I'm stuck, to answer your question). I'm happy to accept your advice on perl, but the syntax appears incomprehensible in examples. But maybe I prejudged - I guess regex must have seemed that way, once, long ago. So scrap that concern, and thank you. Maybe this is where I first dabble in perl? But the question remains, how do I solve this problem? – Stilez – 2019-05-10T15:30:14.273
I also upvoted your comment, I think on reflection you're right to haul me up if there's an appropriate and widely used tool that through ignorance and newness, I've excluded from my thinking. Question edited. – Stilez – 2019-05-10T15:37:45.567