How to search for a string starting from the second column

I have a file that contains comma-separated strings. The strings may contain dots (i.e. they are not purely alphanumeric). This is an example:

site1.com,Level1.2
site2.com,Level1.1,Level1.0,Level1.2
site3.com,Level1.2
site4.com,Level1.2,Level1.1,Level1.0,Levelv3
siteLevel1.2,Levelv2
Level1.2,Levelv2

I need to search for site names. Please note that the site name has no specific format (for example, it does not always end with .com), so I cannot rely on what the first column looks like.

I need the sites whose lines contain ONLY a specific string: in this example, Level1.2 exclusively (without Level1.1, Level1.0 or Level3, either before or after it). The lines that match the condition (contain only Level1.2) should then be written to a new file. So the search should start from the second column (I do not want results where the pattern matches in the site name).

So if I'm searching for Level1.2, the new file should contain:

site1.com,Level1.2
site3.com,Level1.2

But my command results in:

site1.com,Level1.2
site3.com,Level1.2
siteLevel1.2,Levelv2
Level1.2,Levelv2

If there is a site that contains Level1.2 in its name, it should not be counted as I do not care about the first column.

I tried this command and it works for me. The only thing is that I need the searching to ignore the occurrence of the search string in the first column.

awk '/Level1.2/ && !/Level1.1/ && !/Level1.0/ && !/Level3/' myfile.txt > result.txt

user9371654

Posted 2018-06-06T13:36:29.833

Reputation: 647

Can't you just do grep 'Level1\.2$' myfile.txt? – Arkadiusz Drabczyk – 2018-06-06T13:44:18.490

What is the $ for? And with grep, if it finds the string in a line with other levels, such as Level1.1, I do not want it to count. I want to count it only if it is the only one in the line (after the site name), with no others. – user9371654 – 2018-06-06T13:49:52.623

$ means end of line. How about: grep -E '^site[0-9]+\.com,Level1\.2$' myfile.txt? – Arkadiusz Drabczyk – 2018-06-06T13:52:28.710

But end of line does not mean Level1.2 is the only one in the line. It can be preceded by others. In this case I do not want it. What I want is to make sure that Level1.2 is the only one from column 2 onwards, i.e. I need to exclude the occurrence of any other level from column 2 onwards. – user9371654 – 2018-06-06T13:54:57.373

Try the second grep command I posted. – Arkadiusz Drabczyk – 2018-06-06T13:55:50.590

Sorry, it does not serve the purpose. The first column does not have a fixed format, so I cannot use ^site[0-9]+\.com. My command is correct; please just point out how to search starting from the second column, if you know. Thanks. – user9371654 – 2018-06-06T14:02:49.757

Answers

You can try this awk:

awk -F, '$2=="Level1.2" && NF==2' myfile.txt

The -F, option sets the input field separator to a comma. The command prints only the lines that have exactly two fields (NF==2) and whose second field is exactly Level1.2.
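If the pattern-matching style from the question is preferred, the same idea can probably be expressed by removing the first field before matching. The following is only a sketch under a few assumptions: the sample data from the question is recreated in myfile.txt, and rest is a hypothetical variable name holding the line without its first column.

```shell
# Recreate the sample data from the question.
cat > myfile.txt <<'EOF'
site1.com,Level1.2
site2.com,Level1.1,Level1.0,Level1.2
site3.com,Level1.2
site4.com,Level1.2,Level1.1,Level1.0,Levelv3
siteLevel1.2,Levelv2
Level1.2,Levelv2
EOF

# First rule: copy the line into "rest" and strip everything up to and
# including the first comma, so the site name is never searched.
# Second rule: print the original line when "rest" passes the question's tests.
awk '{ rest = $0; sub(/^[^,]*,/, "", rest) }
     rest ~ /Level1\.2/ && rest !~ /Level1\.1/ && rest !~ /Level1\.0/ && rest !~ /Level3/' \
    myfile.txt > result.txt

cat result.txt
```

Unlike the NF==2 test, this variant only rejects the levels that are listed explicitly, so a line with an unlisted extra level after the first column would still be printed.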

oliv

Posted 2018-06-06T13:36:29.833

Reputation: 321

I need to use my command. I corrected a typo now. It serves the purpose by excluding any other string. I just need to start searching from the second column (in comma-separated columns, i.e. after the first comma), because the search string (Level1.2) can occur in the first column and I do not care about the first column. Can you help me with this? – user9371654 – 2018-06-06T13:57:38.527

@user9371654 Please update your question with an example that includes all possible cases you want to target. – oliv – 2018-06-06T14:01:59.263

Updated. Please just show me how to start searching from the second column (i.e. I want to ignore occurrences of the search string in the first column). – user9371654 – 2018-06-06T14:07:26.667

Even if you have another solution, I am comfortable with my pattern, as I understand it. I just need to know how to search from the 2nd column. – user9371654 – 2018-06-06T14:08:17.123

@user9371654 My script still works with your updated example... – oliv – 2018-06-06T14:19:59.610

The following works:

grep '^[^,]*,Level1\.2' myfile.txt | grep -v ',Level.*Level'

This skips the first field and its trailing comma, then looks for a match with Level1.2; the result is then filtered by ignoring all records with a subsequent Level (any Level in the first field will not have a preceding comma).

I have assumed that other text can be appended to Level1.2, provided it does not contain a Level string. If this is not true, then you can use the simpler:

grep '^[^,]*,Level1\.2$' myfile.txt
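To illustrate the difference between the two variants, here is a small sketch run against the question's sample data plus one extra line. The line site5.com,Level1.2beta and the file names sample.txt, loose.txt and strict.txt are made up for this illustration and do not come from the question.

```shell
# Sample data from the question, plus one hypothetical line (site5) whose
# second field has extra text appended to Level1.2.
cat > sample.txt <<'EOF'
site1.com,Level1.2
site2.com,Level1.1,Level1.0,Level1.2
site3.com,Level1.2
site4.com,Level1.2,Level1.1,Level1.0,Levelv3
siteLevel1.2,Levelv2
Level1.2,Levelv2
site5.com,Level1.2beta
EOF

# Two-stage pipeline: the second field must start with Level1.2, and no
# further Level may follow it.
grep '^[^,]*,Level1\.2' sample.txt | grep -v ',Level.*Level' > loose.txt

# Anchored form: the second field must be exactly Level1.2 and end the line.
grep '^[^,]*,Level1\.2$' sample.txt > strict.txt

printf 'pipeline:\n'; cat loose.txt
printf 'anchored:\n'; cat strict.txt
```

The pipeline keeps the site5 line because nothing after Level1.2 contains another Level, while the anchored form drops it.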

AFH

Posted 2018-06-06T13:36:29.833

Reputation: 15 470

(My answer crossed with your acceptance of oliv's answer, but I am leaving it, as it offers an alternative approach and it may be more adaptable to related search problems.) – AFH – 2018-06-06T14:33:47.817