Why are special characters such as "carriage return" represented as "^M"?

93

12

Why is ^M used to represent a carriage return in VIM and other contexts?

My guess is that M is the 13th letter of the Latin alphabet and a carriage return is \x0D or decimal 13. Is this the reason? Is this representation documented anywhere?

I notice that Tab is represented by ^I, which is the ninth letter of the Latin alphabet. Conversely, Tab is \x09 or decimal 9, which supports my theory stated above. However, where might this be documented as fact?

dotancohen

Posted 2014-06-05T08:31:31.843

Reputation: 9 798

1Also keep in mind that dos/windows use "0x0d 0x0a", also noted as "CR LF". But unix/linux use only "0x0a" or "LF". So when you open a windows document in linux it detects extra "CR", and when you open a linux document in windows it doesn't detect new lines. – LatinSuD – 2014-06-05T08:47:03.667

3@LatinSuD caret notation (and corresponding use of the Ctrl-key) relates to the C0 control set (historically part of ASCII) directly and not whether and how a given operating system or program uses part of that set in representing new lines, or anything else. Similarly, whether ^H deletes a character or allows overprinting (such as n^H~ as an obsolete way to produce ñ) or any other actual use of the control character is separate from the caret notation. – Jon Hanna – 2014-06-05T12:05:28.250

11old one ... I can't remember the original code, but ctrl-G rings a bell! – Brian Drummond – 2014-06-05T13:28:57.913

The ^M you see in Linux (which uses "0x0a" (LF)) is probably from a file made on Windows (which uses "0x0d 0x0a" (CR LF)). Thus, at the end of each line, you see the extra "0x0d" (CR). The 0x0a is interpreted as a newline and is not shown in vi (well, it is: the next line will have a "~" if the previous line didn't end with a newline). So the ^M is not exactly a "carriage return", it's part of what a carriage return is in Windows. The answer tells why it's represented that way (using caret notation, ^@ = 0x00, ^A = 0x01, etc., ^M = 0x0d, ...) – Olivier Dulac – 2014-06-05T14:47:58.107

3@OlivierDulac no, the ^M is exactly a carriage return, just like ^J is exactly a line-feed. While different OSs have had different views as to whether line-feed and/or carriage return or something else (like the Newline character used by some IBM characters but not part of ASCII and so not part of the historical heritage of some other OSs) should represent a new line in a text file, and while some programs have then overridden that in different ways, U+000D itself is still a carriage return, whatever later operating systems like Unix or DOS decided to do with it. (Of course, calling it... – Jon Hanna – 2014-06-05T21:43:42.940

1@OlivierDulac ... U+000D is proleptic, since that name came with Unicode in the 1990s, but that does quite definitely reference the code as it existed in ASCII in 1963, and through that as it existed in Murray's modified Baudot code in 1901. Murray was solving problems related to moving paper around, with the same tools used in the concept of "text file" many decades later. Hammer a screw into something like a nail, and it's still a screw. Use LF and/or CR to represent the end of a line in a text file, and they're still line-feeds and carriage returns. – Jon Hanna – 2014-06-05T21:47:39.003

@JonHanna: apologies, i mixed in my comment carriage return and newlines. – Olivier Dulac – 2014-06-06T07:43:11.753

Because Control-M was the ASR-33 TTY keyboard combination to get the character. (And yes, Brian, Ctrl-G does ring a bell.) – Daniel R Hicks – 2014-06-06T18:30:07.070

Has nothing to do with "letter of the alphabet", other than when the ASCII table was laid out the alpha characters were assigned sequentially, starting from 0x41. – Daniel R Hicks – 2014-06-06T23:05:06.073

I knew you could actually use ctrl+i as tab (I use it on connectbot on my phone in vim) I didn't realize that ^M works the same way, and they work basically everywhere. Cool! – Wayne Werner – 2014-06-09T19:36:57.357

Answers

115

I believe that what OP was actually asking about is called Caret Notation.

Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code that has the numerical value equivalent to the letter's numerical value. For example, the EOT character with a value of 4 is represented as ^D because D is the 4th letter in the alphabet. The NUL character with a value of 0 is represented as ^@ (@ is the ASCII character before A). The DEL character with the value 127 is usually represented as ^?, because the ASCII '?' is before '@' and -1 is the same as 127 if masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit of the ASCII code.
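The "invert the 7th bit" rule above can be sketched in a few lines of Python. This is only an illustration; the function name `caret_notation` is made up for this example and does not come from any library.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"not an ASCII control code: {code}")
    # Flipping the 0x40 bit maps 0..31 onto '@', 'A'..'Z', '[', '\', ']', '^', '_'
    # and maps 127 (DEL, 0x7F) onto '?' (0x3F).
    return "^" + chr(code ^ 0x40)


print(caret_notation(13))   # carriage return
print(caret_notation(9))    # horizontal tab
print(caret_notation(127))  # DEL
```

The same XOR covers the letter cases (13 becomes M, 9 becomes I) and the two special cases (0 becomes @, 127 becomes ?), which is why no lookup table is needed.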

The full list of ASCII control characters along with caret notation can be found here

Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
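As a rough sketch of that behaviour (not how vim is actually implemented), the following Python splits a byte string on LF only and prints any remaining control byte in caret notation. The function name `render_like_lf_editor` is hypothetical, the input is assumed to be plain ASCII, and real editors treat some control characters (such as tab) differently.

```python
def render_like_lf_editor(data: bytes) -> list:
    """Split on LF only and show leftover control bytes in caret notation."""
    lines = data.split(b"\n")
    # A trailing LF produces one empty final chunk; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    rendered = []
    for line in lines:
        out = []
        for byte in line:
            if byte < 32 or byte == 127:
                out.append("^" + chr(byte ^ 0x40))  # e.g. 0x0D -> ^M
            else:
                out.append(chr(byte))
        rendered.append("".join(out))
    return rendered


print(render_like_lf_editor(b"one\r\ntwo\r\n"))  # CRLF input: each line keeps a CR
print(render_like_lf_editor(b"one\ntwo\n"))      # LF input: nothing left over
```

With CRLF input, the LF is consumed as the line break and the 0x0D before it is left at the end of each line, which is where the ^M comes from.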

Art Gertner

Posted 2014-06-05T08:31:31.843

Reputation: 6 417

1

@keshlam It turns out that the uparrow was actually part of ASCII itself :-) The caret replaced the uparrow (and the underscore replaced the leftarrow) later on. Found this out here via Wikipedia.

– Abbafei – 2016-08-15T05:43:44.473

1That is correct, @abbafei. I started programming on ASR33 teletypes which had the older characters. – keshlam – 2016-08-15T08:14:54.117

Perfect, thank you. This is exactly what I was looking for. – dotancohen – 2014-06-05T09:17:17.570

1I always wondered what that thing was called... – smci – 2014-06-05T12:08:27.613

5This convention goes back at least to the 1970's; I first saw it on the TOPS-10 operating system but it may well have existed earlier. For what it's worth, on older ASCII terminals the character now shown as a caret was actually an upward-pointing arrow, so this originated as "uparrow notation". – keshlam – 2014-06-05T12:56:50.677

15This is explicitly built into the ASCII design so that the Ctrl key just toggles bit 7. – OrangeDog – 2014-06-05T20:19:36.647

2It's not used only with letters. I would not define it as the control character with "the letter's numeric value" but rather as "xor 64". In other words, ^A is 0x41 xor 0x40, or 0x01 and ^? is 0x3F xor 0x40, or 0x7F. – R.. GitHub STOP HELPING ICE – 2014-06-06T06:24:25.350

It's also not used just with ASCII characters anymore. Windows, for example, allows you to detect and act on Ctrl-Del (hold Ctrl down and press the Del key). The Del key (or Delete) has no ASCII value, yet we sometimes see it written as ^Del. – rossmcm – 2014-06-06T06:49:18.397

@rossmcm - Actually, ASCII 0x7F is "DEL". Of course, what Windows regards as a valid key combo likely has no relation to reality. – Daniel R Hicks – 2014-06-07T12:27:48.617

1Ascii DEL (^?) has nothing to do with the delete key. It's actually the standard code generated by the <--- key (also, confusingly, called backspace) on VT100-like terminals. – R.. GitHub STOP HELPING ICE – 2014-06-09T22:32:21.050

The DEL code is significant (and is called DEL for "delete") because if you over-punch a paper tape with DEL (all ones) you erase the character. – Daniel R Hicks – 2014-06-11T00:47:08.270

22

That is exactly the reason.

ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):

   Oct   Dec   Hex   Char                       
   ─────────────────────────────────────────────
   000   0     00    NUL '\0'                    
   001   1     01    SOH (start of heading)     
   002   2     02    STX (start of text)         
   003   3     03    ETX (end of text)           
   004   4     04    EOT (end of transmission)   
   005   5     05    ENQ (enquiry)               
   006   6     06    ACK (acknowledge)           
   007   7     07    BEL '\a' (bell)             
   010   8     08    BS  '\b' (backspace)       
   011   9     09    HT  '\t' (horizontal tab)  
   012   10    0A    LF  '\n' (new line)        
   013   11    0B    VT  '\v' (vertical tab)    
   014   12    0C    FF  '\f' (form feed)       
   015   13    0D    CR  '\r' (carriage ret)    

Conventionally these characters are generated with Control and the letter relating to the character required. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.

The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.

Flup

Posted 2014-06-05T08:31:31.843

Reputation: 3 151

4Thank you. Though informative, this answer does not contain the answer to the question. – dotancohen – 2014-06-05T09:01:01.643

1The answer is hidden in the second paragraph: ^M is shorthand for Control-M. On the terminal you would press the Control key together with the M key to send the ASCII code 0x0D, also known as a carriage return. – Martin Liversage – 2014-06-06T04:02:27.387

14

The notation goes back to the earliest ASCII Teletypes (ca 1963). There was a CTRL key that toggled the 0x40 bit so that CTRL-M (carriage return) would be 0D instead of 4D, CTRL-G (bell) would be 07 instead of 47, CTRL-L (form feed) would be 0C instead of 4C.
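The key-to-code direction described above can be sketched as follows. This is a model of the arithmetic only, not of the Teletype hardware; the function name `ctrl` is invented for the example, and the CTRL key is modelled as clearing the 0x40 bit, which for codes 0x40 to 0x5F gives the same result as toggling it.

```python
def ctrl(key: str) -> int:
    """Return the code sent by CTRL plus the given key (@, A-Z, [, \\, ], ^, _)."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control code for key {key!r}")
    # Clearing the 0x40 bit turns 0x4D ('M') into 0x0D (carriage return).
    return code & ~0x40


for key in "MGL":
    print(f"CTRL-{key}: {ord(key):02X} -> {ctrl(key):02X}")
```

The loop prints the three pairs mentioned in the answer: 4D and 0D, 47 and 07, 4C and 0C.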

There was no "design" in assigning particular letters to particular functions, it was just chance that, when the dust settled from assigning ASCII codes, the M key was one bit different from carriage return and hence carriage return became CTRL-M.

Here is the best shot I can find of an ASR33 keyboard. As you can see the control character names are printed in small letters on the corresponding alpha keys.

Teletype Model 33 ASR with paper tape punch/reader

Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons

The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.

Daniel R Hicks

Posted 2014-06-05T08:31:31.843

Reputation: 5 783

2On some level the extent to which we are still bound by design choices made for what now seem like ancient systems is quite surprising - I guess on reflection that (a) it's not that long ago, it's just that the pace of change in the interim has been astonishing, and (b) if enough design decisions are made, some of them (especially the ones that don't cause people enough problems) are bound to stick around long after the reasons for them disappear into memory. Still an odd feeling to look back at the history of some of these things though. – Stuart Golodetz – 2014-06-07T16:15:22.967

2@StuartGolodetz - Actually, I find it strangely reassuring. But then I remember when Teletypes were "advanced technology". (The Teletype ASR-33, by the way, was remarkable for its elegant simplicity. I only wish that "modern" computer systems were as well-designed.) – Daniel R Hicks – 2014-06-07T17:29:10.310

1This is fascinating but what I don't understand is.. why of all things did they decide this typewriter needed a bell? – CaptainCodeman – 2014-06-08T00:08:23.473

4@CaptainCodeman - When you transmitted an important message you'd ring the bell to get the attention of the operator on the other end. – Daniel R Hicks – 2014-06-08T01:05:59.153

1@DanielRHicks - I guess the thought it makes me have is that perhaps the gap between what we consider "modern" and "ancient" technology isn't nearly as large as one might think it is. Indeed, much supposedly modern technology incorporates things with very old roots, although each generation thinks they're doing everything from scratch. Those young'n's :) – Stuart Golodetz – 2014-06-08T10:27:59.733

2It is interesting to note that the Ctrl key survives to this day on PC keyboards. – Daniel R Hicks – 2014-06-08T11:38:05.557

I don't see a dedicated "RETURN" key, but I do see a LineFeed key. Is that what you mean? – dotancohen – 2014-06-09T12:35:05.137

@dotancohen - Second row, far right, next to LINE FEED. – Daniel R Hicks – 2014-06-09T15:16:01.403

Thanks, I did not even recognize what was written there on two lines! – dotancohen – 2014-06-09T15:29:19.870

@dotancohen - I can probably still find it in my sleep. – Daniel R Hicks – 2014-06-11T00:40:33.417

@DanielRHicks: I understand that you're still wearing T-shirts from the mid 70's! – dotancohen – 2014-06-11T05:32:07.607

@dotancohen - Yeah, and my wife is really after me to take it off and wash it. – Daniel R Hicks – 2014-06-11T11:46:04.817

@DanielRHicks: I'll get off your lawn now! – dotancohen – 2014-06-11T11:48:13.303

3

The caret (^) is just shorthand for "hold the Control (CTRL) key down".

In the good old days you could type these codes (see above) in directly: Ctrl + G (^G) would make the terminal go "ding".

When you want to add a CR in Vim you use Ctrl + M; likewise, a tab is Ctrl + I.

Don

Posted 2014-06-05T08:31:31.843

Reputation: 31

The term you are looking for is digraph, which means two characters that represent one character. Specifically, digraphs and trigraphs are used to represent nonprintable characters. Historically they have also been used for characters that do not appear on a keyboard, although with modern GUIs and keyboards this is less of an issue so this use is more archaic. – None – 2014-06-06T15:08:39.140

"In the good old days" is still today, with ^C and ^D being perfectly functional. The only reason that ^G doesn't make the terminal ding anymore is that most terminal emulators have that response turned off. – SevenSidedDie – 2014-06-09T17:45:29.897

2

The need for some visual manner of displaying what are by definition non-printable characters.

So, someone in the early 1970s (or maybe earlier) (I remember seeing it on CP/M, and someone else has already mentioned TOPS) decided that "caret plus letter" would be the symbol for the 26 unprintable ASCII control characters with values 1 thru 26. Value 0 is/was printed as ^@, and value 127 as ^?.

RonJohn

Posted 2014-06-05T08:31:31.843

Reputation: 165

1

As for where it is documented: this page lists every control character, along with how to enter or represent it with the Control key (though it gives no Control-key representation for the first one, ASCII character 0, and has nothing for character 127). It also provides sources at the bottom.

https://www.cs.tut.fi/~jkorpela/chars/c0.html

One might wonder, given that there are 33 control characters (ASCII characters 0-31, so 32 characters, plus character 127, making 33), how they can all be represented when there are only 26 letters in the alphabet. Well, it uses Ctrl-A for ASCII character 1 through Ctrl-Z for ASCII character 26, and once it has passed Ctrl-Z, it uses [ \ ] ^ _
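A short Python sketch can confirm that count and show which symbols follow Z. The variable names are made up for this example. It uses the usual caret-notation convention from the other answers, which also assigns ^@ to character 0 and ^? to character 127, even though the linked page leaves those two without a Control-key entry.

```python
# The 33 ASCII control characters: codes 0-31 plus 127 (DEL).
control_codes = list(range(32)) + [127]

# Caret notation for each one, obtained by flipping the 0x40 bit.
table = {code: "^" + chr(code ^ 0x40) for code in control_codes}

# Codes 27-31 come after Ctrl-Z (26) and use the ASCII symbols after 'Z'.
after_z = [table[code] for code in range(27, 32)]

print(len(table))
print(table[1], table[26])
print(" ".join(after_z))
```

Because '[', '\', ']', '^' and '_' are the five ASCII characters immediately after 'Z' (0x5B to 0x5F), the same bit flip that gives the letters also gives these.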

It lists Ctrl-Z as SUB, though in DOS and the cmd prompt it's EOF, and as a techie user you use it when doing copy con a.a, where a.a is your file. You enter the text and terminate it with Ctrl-Z, which funnily enough doesn't enter an EOF marker but does tell CMD that's the end of the file, so CMD writes it.

That cs.tut.fi webpage gives this as a source
http://www.wps.com/texts/codes/X3.4-1963/index.html

That link is broken, but the document is available on archive.org in the form of JPGs:

American Standard Code for Information Interchange
ASA standard X3.4-1963

https://web.archive.org/web/20010430085116/http://www.wps.com/texts/codes/X3.4-1963/index.html

barlop

Posted 2014-06-05T08:31:31.843

Reputation: 18 677

Most of the control characters are meaningless, but even some of those with meaning like Ctrl-I i'm not sure where you can just do Ctrl-I and get a tab. – barlop – 2014-06-05T18:27:39.817

1none of the control characters are meaningless. Many of them are unused in many contexts, but every single one has at least one meaning. – Jon Hanna – 2014-06-05T21:53:42.100

@JonHanna Of course I don't mean they were meaningless (past tense), but they are now, and have been for decades; i.e. they had their original meanings from eons ago, on tech that no longer runs, and most of them are meaningless today with current and even slightly old tech. If any are being put to modern uses, it's not many. There's a list here http://en.wikipedia.org/wiki/Control_character of ones in common use: 0, 7, 8, 9, 10, 11, 12, 13, 127. That's 9/33, so the others (24 of them) you would see either very rarely or not at all, as they are as dead as the antique machinery, out of use for decades, that they were used on.

– barlop – 2014-06-05T23:25:33.407

Associated Press still use ANPA-1312, which uses 1–4; 6 & 16 are used to start every TCP/IP connection. Modern printers (among other things) still use 17 & 19. Together with those you mention, we've quite a percentage of them covered without really trying. I'll grant you they aren't in heavy use, but they ain't dead either. – Jon Hanna – 2014-06-05T23:47:23.230

1@barlop You can do ^I for a tab in standard bash: type ls ~/^I^I and you should see all the folders in your home directory. – wchargin – 2014-06-06T02:11:46.940

@JonHanna In the case of TCP, it uses SYN and ACK but not with those ascii codes of SYN-0x16(^V) and ACK-0x6(^F). TCP doesn't use that ASCII, it uses a single bit for SYN 0x002 and a single bit for ACK 0x010 And so any values with those bit set would indicate SYN and/or ACK. As for Printers DC1,DC3 and Associated Press and the AP-1312 that is an interesting case I see mentioned here too http://en.wikipedia.org/wiki/C0_and_C1_control_codes I suppose that counts but I wonder to what extent they are control characters if you can't make them with Ctrl - Maybe back in the day you could?

– barlop – 2014-06-06T08:54:47.460

0

You can see the Control-key mapping for all of the non-printable ASCII characters in this table.

Ofir Luzon

Posted 2014-06-05T08:31:31.843

Reputation: 216

5

Whilst this may theoretically answer the question, it would be preferable to include the essential parts of the answer here, and provide the link for reference. That way, should the linked page ever change or become invalid for any reason, the answer will still be useful to visitors to Super User.

– a CVn – 2014-06-05T08:56:16.620

3Thank you. Though informative, this answer does not contain the answer to the question. – dotancohen – 2014-06-05T09:01:32.320