What does "null string" really mean?


I'm sure most of us know that when we want a regex to match a string at the beginning of the line, we must use "^".

But I'm trying to understand what "empty string at the beginning of the line" really means.

I know that echo "Hello World" | grep ^H matches.

So please take a look at the output of these commands:

[sergio@localhost ~]$ dd if=/dev/zero of=/tmp/texto  count=1 bs=1 2>/dev/null
[sergio@localhost ~]$ od -ta /tmp/texto          
0000000 nul
0000001

So far everything is as expected. Next:

[sergio@localhost ~]$ echo  "Hello" >> /tmp/texto
[sergio@localhost ~]$ grep -a "^Hello" /tmp/texto 

Well, I must confess I didn't expect this: there is a null character before Hello, so why is it not matching?

OK, let's use grep with Perl-style regexes:

[sergio@localhost ~]$ grep -a -P "\x00Hello" /tmp/texto 
Hello

OK, it matches.
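For readers who want to reproduce the experiment without dd, here is a minimal Python sketch. It writes to a temporary file rather than /tmp/texto (an assumption made so the sketch is self-contained), and it uses Python's re module as a stand-in for grep. The two regex engines differ in general, but they behave the same way on these two patterns.

```python
import os
import re
import tempfile

# Temporary file standing in for /tmp/texto (assumption: any writable path works).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00")      # what `dd if=/dev/zero ... count=1 bs=1` writes: one NUL byte
with open(path, "ab") as f:
    f.write(b"Hello\n")   # what `echo "Hello" >>` appends

with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(data)  # b'\x00Hello\n'

# Like `grep -a "^Hello"`: the line starts with NUL, not "H", so there is no match.
anchored = re.search(rb"^Hello", data, re.MULTILINE)
print(anchored)  # None

# Like `grep -a -P "\x00Hello"`: the NUL byte is matched explicitly.
explicit = re.search(rb"\x00Hello", data, re.MULTILINE)
print(explicit is not None)  # True
```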

But what I don't understand (perhaps I have some misconception) is why grep -a "^Hello" does not match.

Could you help me?

Thanks in advance!

sebelk

Posted 2013-09-04T12:38:14.243

Reputation: 217

Sorry, I was confusing null with the empty string! – sebelk – 2013-09-04T12:44:57.870

Answers


You are confusing the null character (binary value 0) with the empty string.

The "empty string at the beginning of the line" is simply the non-content (which does exist) before the first character of the line. Likewise, the empty string at the end of the line is the non-content after the last character of the line. An empty string can be thought of as consisting of "empty string, empty string", whereas a string with some content can be thought of as "empty string, Hello world, empty string".
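The idea that anchors match an empty string at a position can be checked with a small Python sketch (Python's re module is used here only as an illustration; grep's engine is a different implementation). Each anchor's match has zero length, and only a line with no characters at all satisfies "^$".

```python
import re

line = "Hello World"

# ^ and $ are zero-width: each matches an empty string at one position.
start = re.search(r"^", line)
end = re.search(r"$", line)
print(start.span(), end.span())  # (0, 0) (11, 11)

# An empty line consists of nothing but those two boundaries...
empty_ok = re.fullmatch(r"^$", "") is not None
# ...whereas a line holding a single NUL character is not empty.
nul_ok = re.fullmatch(r"^$", "\x00") is not None
print(empty_ok, nul_ok)  # True False
```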

A string in C is represented as a series of non-0 bytes followed by a 0 byte that marks the end of the string, but that is completely separate from "the empty string" in regular expression parlance, and is largely an internal choice of the C programming language and its standard library. Few other languages do it that way, but they can still represent empty strings.
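To see the difference between C-style termination and a length-counted string, here is a hedged Python sketch. Python bytes objects store their length separately, so a NUL is ordinary data; the standard ctypes module can read the same bytes the way C would, stopping at the first 0 byte.

```python
import ctypes

data = b"\x00Hello"

# Python keeps the length separately, so the NUL byte is just data.
print(len(data))  # 6

# A C-style view of the same bytes stops at the first NUL,
# so to C this "string" is empty.
c_view = ctypes.c_char_p(data).value
print(c_view, len(c_view))  # b'' 0

# A C buffer initialised from the bytes gets one extra terminating 0 byte.
buf = ctypes.create_string_buffer(data)
print(buf.raw)  # b'\x00Hello\x00'
```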

a CVn

Posted 2013-09-04T12:38:14.243

Reputation: 26 553


A "null string" is not the same as a "null character". A null string is the empty string, "". The null character is a character with all bits set to 0; in a C-style double-quoted string it can be written as \0. So the result of your dd command was "\0", and the append then made it "\0Hello" (plus the newline that echo adds). This was not a null string. The pattern "^Hello" only matches lines that begin with "Hello", which yours did not, because it began with "\0" instead of "H".
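The distinction can be shown in a few lines of Python (used here purely for illustration). The variable names are hypothetical; the contents mirror what the dd and echo commands produced.

```python
null_string = ""   # the empty ("null") string: no characters at all
null_char = "\0"   # one character whose code is 0

print(len(null_string), len(null_char))  # 0 1
print(null_string == null_char)          # False

# Roughly the file contents after dd + echo (echo also appends "\n").
contents = null_char + "Hello\n"
print(repr(contents[0]))             # '\x00'
print(contents.startswith("Hello"))  # False, which is why "^Hello" cannot match
```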

wingedsubmariner

Posted 2013-09-04T12:38:14.243

Reputation: 1 432


An imaginary string of five characters, followed by two actual ones:

^  _ _ _ _ _  $
^  H e l l o  $
^ \0 H e l l  $
  • The dollar sign and the circumflex do not match any characters, they match the boundaries of a string.
  • null (\0) is a real character and takes up space, just like a, b, c, d, ...

So "^H" would not match "\0abcd" (just as "^Z" would not match "abcd"), because "\0abcd" starts with the null character and not "H".
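The diagram's two points (anchors take no space, \0 takes one position) can be checked with a short Python sketch, using Python's re module as an illustrative regex engine rather than grep itself.

```python
import re

s = "\x00abcd"

# The anchor occupies no characters: its match is empty.
anchor_span = re.search(r"^", s).span()
# The NUL character occupies one position, just as "a" would.
nul_span = re.search(r"\x00", s).span()
print(anchor_span, nul_span)  # (0, 0) (0, 1)

# Hence "^H" fails here for the same reason "^Z" fails on "abcd".
print(re.search(r"^H", s), re.search(r"^Z", "abcd"))  # None None
```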

Ярослав Рахматуллин

Posted 2013-09-04T12:38:14.243

Reputation: 9 076