C
My homework assignment is to take a string and split it into pieces at every new line. I have no idea what to do! Please help!
Tricky problem for a beginning C programming class! First you have to understand a few basics about this complicated subject.
A string is a sequence made up of only characters. This means that in order for programmers to indicate an "invisible" thing (that isn't a space, which counts as a character), you have to use a special sequence of characters somehow to mean that invisible thing.
On Windows, the new line is a sequence of two characters in the string: backslash and n (or the string "\n")
On Linux or OS/X Macs, it is a sequence of four characters: backslash, n, backslash, and then r: (or "\n\r").
(Interesting historical note: on older Macintoshes it was a different sequence of four characters: "\r\n"... totally backwards from how Unix did things! History takes strange roads.)
It may seem that Linux is more wasteful than Windows, but it's actually a better idea to use a longer sequence. Because Windows uses such a short sequence the C language runtime cannot print out the actual letters \n without using special system calls. You can usually do it on Linux without a system call (it can even print \n\ or \n\q ... anything but \n\r). But since C is meant to be cross platform it enforces the lowest common-denominator. So you'll always be seeing \n in your book.
(Note: If you're wondering how we're talking about \n without getting newlines every time we do, StackOverflow is written almost entirely in HTML...not C. So it's a lot more modern. Many of these old aspects of C are being addressed by things you might have heard about, like CLANG and LLVM.)
But back to what we're working on. Let's imagine a string with three pieces and two newlines, like:
"foo\nbaz\nbar"
You can see the length of that string is 3 + 2 + 3 + 2 + 3 = 13. So you have to make a buffer of length 13 for it, and C programmers always add one to the size of their arrays to be safe. So make your buffer and copy the string into it:
/* REMEMBER: always add one to your array sizes in C, for safety! */
char buffer[14];
strcpy(buffer, "foo\nbaz\nbar");
Now what you have to do is look for that two-character pattern that represents the newline. You aren't allowed to look for just a backslash. Because C is used for string splitting quite a lot, it will give you an error if you try. You can see this if you try writing:
char pattern[2];
strcpy(pattern, "\");
(Note: There is a setting in the compiler for if you are writing a program that just looks for backslashes. But that's extremely uncommon; backslashes are very rarely used, which is why they were chosen for this purpose. We won't turn that switch on.)
So let's make the pattern we really want, like this:
char pattern[3];
strcpy(pattern, "\n");
When we want to compare two strings which are of a certain length, we use strncmp. It compares a certain number of characters of a potentially larger string, and tells you whether they match or not. So strncmp("\nA", "\nB", 2) returns 1 (true). This is even though the strings aren't entirely equal over the length of three... but because only two characters are needed to be.
So let's step through our buffer, one character at a time, looking for the two character match to our pattern. Each time we find a two-character sequence of a backslash followed by an n, we'll use the very special system call (or "syscall") putc to put out a special kind of character: ASCII code 10, to get a physical newline.
#include "stdio.h"
#include "string.h"
char buffer[14]; /* actual length 13 */
char pattern[3]; /* actual length 2 */
int i = 0;
int main(int argc, char* argv[]) {
    strcpy(buffer, "foo\nbaz\nbar");
    strcpy(pattern, "\n");
    while (i < strlen(buffer)) {
        if (1 == strncmp(buffer + i, pattern, 2)) {
            /* We matched a backslash char followed by n */
            /* Use syscall for output ASCII 10 */
            putc(10, stdout);
            /* bump index by 2 to skip both backslash and n */
            i += 2;
        } else {
            /* This position didn't match the pattern for a newline */
            /* Print character with printf */
            printf("%c", buffer[i]);
            /* bump index by 1 to go to next matchable position */
            i += 1;
        }
    }
    /* final newline and return 1 for success! */
    putc(10, stdout);
    return 1;
}
The output of this program is the desired result...the string split!
foo
baz
bar
\t is for \trolling...
Absolutely incorrect from the top to the bottom. Yet filled with plausible-sounding nonsense that has scrambled information like what's in the textbook or Wikipedia. Program logic appears transparent in the context of the misinformation, but is completely misleading. Even global variables and returning an error code, for good measure...
...
Of course, there's only one character in the C string representation of the two-character source literal sequence \n. But making a buffer larger is harmless, as long as strlen() is used to get the actual length to operate on.
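A minimal sketch of that point, for anyone who wants to check it: `bytes_needed` is a hypothetical helper (not part of the troll program) that reports how much storage a string really needs, terminator included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store s, including the '\0' terminator. */
size_t bytes_needed(const char *s) {
    assert(s != NULL);
    /* strlen counts characters up to, but not including, the terminator */
    return strlen(s) + 1;
}
```

Under this sketch, bytes_needed("foo\nbaz\nbar") comes out to 12 (3 + 1 + 3 + 1 + 3 characters plus the terminator), so the 14-byte buffer is merely oversized, and strlen("\n") is 1, not 2.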
...
We try to convince the reader that strncmp is a boolean operation that either matches (1) or doesn't (0). But it actually has a three-way result: negative if the first string sorts lower, 0 if they are equal, positive if it sorts higher (and the standard only promises the sign, not exactly -1 or 1). Our two character "pattern" being compared is not [\, n], but rather [\n, \0]...picking up the implicit null terminator. As that sequence slides through the string, the result is never the 0 that would signal a match unless the input string ends in a newline, and whatever nonzero value comes back is not guaranteed to be 1.
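For contrast, here is a small sketch of how a prefix comparison is normally written. `prefix_matches` is a hypothetical wrapper introduced only for illustration; the one thing it relies on is that strncmp returns 0 when the compared characters are equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n characters of a and b
   are equal, 0 otherwise. Only the "== 0" test is meaningful; the sign
   of a nonzero strncmp result gives ordering, not a match. */
int prefix_matches(const char *a, const char *b, size_t n) {
    assert(a != NULL && b != NULL);
    return strncmp(a, b, n) == 0;
}
```

With this helper, "\nA" and "\nB" match over one character but not over two, which is the opposite of what the troll text claims.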
...
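So on an implementation where strncmp returns the difference between the first mismatched characters, all this does is loop through the string and print it one character at a time; the top branch never runs for ordinary text. (Though you could get it to if your string had a character exactly one code above \n in it, say vertical tab...which could be used to mysteriously omit characters from the output :-P)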
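For anyone who landed here actually wanting to split a string, a rough non-troll sketch follows. It is only one of several reasonable approaches: `split_lines` is a hypothetical helper that assumes the caller owns a writable buffer, and it cuts the text in place by overwriting each real newline character with a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text in place at each '\n'.
   Stores a pointer to the start of each piece in pieces[] and returns
   how many pieces were stored (at most max_pieces). If the limit is
   reached, the last piece keeps the rest of the text unsplit. */
size_t split_lines(char *text, char **pieces, size_t max_pieces) {
    size_t count = 0;
    assert(text != NULL && pieces != NULL);
    while (count < max_pieces) {
        pieces[count++] = text;
        /* strchr finds the single newline character, not a backslash and an n */
        char *newline = strchr(text, '\n');
        if (newline == NULL || count == max_pieces) {
            break;
        }
        *newline = '\0';      /* end the current piece here */
        text = newline + 1;   /* next piece starts after the newline */
    }
    return count;
}
```

Called on a writable copy of "foo\nbaz\nbar" with room for four pieces, this should yield the three pieces foo, baz and bar. Modifying the buffer in place avoids allocation, at the cost of destroying the original string.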
Is this a [tag:popularity-contest] or [tag:code-golf]? – osvein – 2013-12-30T22:55:53.457
@user1981338, neither, read the wiki of the code-trolling tag. – Turion – 2013-12-31T14:32:47.150
7
Here's a valuable resource I found regarding string splitting... I hope you find it useful!
http://bit.ly/1dSklhO
Code-trolling is in the process of being removed, as per the official stance. This post received over 75% "delete" votes on the poll. It does have a large amount of votes on the question and the answers, but it is over 3 months old and no reputation will be lost. Therefore, I am closing this and will delete it in 24 hours. Note that since this is an outlier in that it has a large amount of votes, I'll be happy to undelete and lock given a convincing argument on meta. – Doorknob – 2014-05-12T12:22:19.003
@Doorknob, this is not a question to be deleted according to your accepted answer in the linked official stance. It has 44 answers and 21 votes, which is quite popular. As for the poll, I wasn't even aware of such a poll existing until now. I'm not going into spending time on writing another answer on meta pro code-trolling since it's obvious that exactly the meta-users are opposed to code-trolling whereas a sizeable part of codegolf users isn't. Closing this question is an excellent idea, but deleting it is in my opinion unnecessary and unhelpful. – Turion – 2014-05-12T12:54:39.377
Thanks for your input, @Turion! As I said, this post is... different - it does have many votes, but the community wants (in the poll and as expressed in chat) it to be deleted, as it is very underspecified. If you'd like, you can join us in chat now to discuss this. – Doorknob – 2014-05-12T12:58:07.530
After further consideration, according to input in chat and on meta, this post is being locked instead. – Doorknob – 2014-05-13T12:16:11.753