What happens when assembly code is translated into Object code?

2

I am interested in system software development. I have been analyzing how a compiler works for a few days. An assembly instruction generated by a compiler, say clc, has the opcode f8, and I am sure that the assembler, on assembling this mnemonic, substitutes the opcode f8 in its place.

What is bothering me is the aftermath of this stage (I'm aware of the linking stage in between).

I mean, what exactly happens after this stage? Say the final executable is a raw binary file. Does that mean the opcode f8 is converted into binary data 1111 1000 and stored in the file?

If that is the case, why am I not able to view the binary contents of a binary file using a normal text editor (say Notepad)? After all, it's '0's and '1's, right?

Panther Coder

Posted 2017-02-16T04:48:39.720

Reputation: 133

"I'm aware of the Linking stage in-between" -- Incorrect, the linking stage would be after assembly. "What exactly happens after this stage" -- Depends on whether the assembly produces relocatable object code (which could be linked with other object files), or absolute object code. "after all it's '0's and '1's right" -- Yes, but a text editor always treats that binary data as codes for text (e.g. ASCII), whereas a disassembler will treat the data as machine code, and display opcodes and operands. – sawdust – 2017-02-16T06:43:54.910

You are missing a key point: f8 doesn't need to be "converted", it already is 1111 1000. They are just different representations of the exact same thing. One is shown as hex, the other as binary. Hex has the benefit of being slightly more human readable and has the neat side effect of mapping each group of four bits to a single digit, in this case f = 1111 and 8 = 1000. The basic unit used by the CPU is binary digits, but humans tend to use the hex representations. – Mokubai – 2017-02-16T07:26:26.263

Answers

2

First, always use the right tool for the job. Viewing binary files in a text editor is like driving nails with a knife. Use a hex viewer/editor for such tasks or, better, a tool that understands the internals of the binary file in question. If we are talking about CPU opcodes, then something like IDA Pro Free or OllyDbg would be useful for analyzing the internals of executable files.

Does that mean the opcode f8 is converted into binary data 1111 1000 and stored in the file?

As @Mokubai correctly pointed out, 0xF8 is the same number as 1111 1000: the first is written in hexadecimal notation and the second in binary. Both are the same as the number 248 in decimal.

If you build executable code by hand from CPU opcodes (or assemble assembly source code), an i386 CPU will recognize 0xF8 (or 0b11111000, or 248; it is all the same) as the CLC instruction.
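To make the "same number, different notation" point concrete, here is a minimal Python sketch. The variable names are purely illustrative; the only fact taken from the discussion above is that 0xF8, 1111 1000, and 248 are one and the same value.

```python
# Three literal notations for the one-byte x86 opcode of CLC.
hex_form = 0xF8
bin_form = 0b11111000
dec_form = 248

# All three literals denote the same integer.
assert hex_form == bin_form == dec_form

# Formatting changes only how the value is displayed, not the value.
print(format(dec_form, "02x"))  # f8
print(format(dec_form, "08b"))  # 11111000

# Each hex digit corresponds to one group of four bits.
high_nibble, low_nibble = dec_form >> 4, dec_form & 0x0F
print(format(high_nibble, "04b"), format(low_nibble, "04b"))  # 1111 1000

# Stored in a file, the value occupies a single byte.
opcode_byte = bytes([dec_form])
print(len(opcode_byte))  # 1
```

No conversion step is needed between the notations; what ends up in the file is that one byte.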

An assembly instruction generated by a compiler, say clc, has the opcode f8, and I am sure that the assembler, on assembling this mnemonic, substitutes the opcode f8 in its place.

That's true, except for the phrase "An Assembly code generated by a Compiler". I just want to be sure you correctly understand the difference between assembly code and opcodes. Opcodes are the exact language the CPU understands; they are just numbers (and that is how we programmed the first computers, when translators from CPU mnemonics, a.k.a. assemblers, were still a dream).
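As a rough illustration of what "translating mnemonics into numbers" means, here is a toy sketch in Python. The `OPCODES` table and the `assemble` function are hypothetical names invented for this example, and the table covers only a handful of one-byte x86 instructions that take no operands. A real assembler also handles operands, labels, relocations, and object-file headers.

```python
# Hypothetical lookup table for a few one-byte, operand-free x86 instructions.
OPCODES = {
    "clc": 0xF8,  # clear carry flag
    "stc": 0xF9,  # set carry flag
    "cld": 0xFC,  # clear direction flag
    "nop": 0x90,  # no operation
    "ret": 0xC3,  # near return
}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into raw machine-code bytes."""
    out = bytearray()
    for line in source.splitlines():
        # Drop anything after ';' (a comment) and normalize the mnemonic.
        mnemonic = line.split(";")[0].strip().lower()
        if not mnemonic:
            continue  # skip blank and comment-only lines
        out.append(OPCODES[mnemonic])  # raises KeyError for unknown mnemonics
    return bytes(out)


machine_code = assemble("clc ; clear carry\nnop\nret")
print(machine_code.hex(" "))  # f8 90 c3
```

The mnemonic clc exists only in the source text; the output contains just the byte 0xF8.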

Nowadays, we mostly use "direct" compilation from a high-level programming language to executable binaries, with compilers for languages such as C, C++, and Go that produce CPU opcodes.
(Calling it "direct compilation" is not strictly true: under the hood, compilers go through multiple steps before producing an executable binary. For the end user, though, it looks direct, much as we can drive a car without needing to know how gasoline is converted into motion.)

As @sawdust correctly mentioned in a comment, compilers for higher-level programming languages can use different strategies to create CPU opcodes. With gcc, for example, you can see how it arrives at the opcodes by telling it to emit the assembly code from which the opcodes (object code) would be generated:

 gcc -S -o myprogram.asm myprogram.c

If that is the case, why am I not able to view the binary contents of a binary file using a normal text editor (say Notepad)? After all, it's '0's and '1's, right?

Notepad speaks another language. It understands its own "opcodes", character codes such as ASCII; anything else is Greek to Notepad.
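The difference between the two kinds of viewer can be sketched in a few lines of Python. The three sample bytes (CLC, NOP, RET) are just an assumed example, and Latin-1 stands in for whatever single-byte encoding a text editor might apply; it is used here only because it assigns a character to every byte value.

```python
# Three sample bytes: the x86 opcodes for CLC, NOP, and RET.
data = bytes([0xF8, 0x90, 0xC3])

# What a hex viewer shows: every byte written as two hex digits.
hex_view = data.hex(" ")
print(hex_view)  # f8 90 c3

# What a viewer that prints raw bits would show.
bit_view = " ".join(format(b, "08b") for b in data)
print(bit_view)  # 11111000 10010000 11000011

# What a text editor does instead: map each byte to one character.
text_view = data.decode("latin-1")
print(len(text_view))              # 3
print(text_view[0] == "\u00f8")    # True (0xF8 becomes the letter o with stroke)

# The character '1' in a text file is itself stored as the byte 0x31.
print("1".encode("ascii").hex())  # 31
```

So a text editor would only display '0's and '1's if the file actually contained the bytes 0x30 and 0x31, which is not how the opcode is stored.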

Alex

Posted 2017-02-16T04:48:39.720

Reputation: 5 606

"Compilers such C/C++/GoLang don't create "assembly code", but generate opcodes directly," -- Maybe you can come up with an exception, but this is not true in general. I've used at least three C compilers, and all three generated assembly source before generating object code. I know this because I had to report optimizing compiler bugs, and used the assembler output to prove that the compiler was generating bad code from C. – sawdust – 2017-02-16T06:52:53.093

You are misleading OP by agreeing that f8 gets "converted" to binary. No actual conversion happens at all nor needs to, they are simply different representations of the exact same thing. f8 is simply a more human readable representation of 1111 1000. – Mokubai – 2017-02-16T07:33:18.943

@Mokubai No, I didn't say that 'f8' -> binary; I said the clc mnemonic -> f8, which is 11111000. f8 and 11111000 are just different representations of the same number – Alex – 2017-02-16T07:35:58.443

@sawdust You are right, I meant gcc -o binexe source.c. I'm going to correct my answer – Alex – 2017-02-16T07:40:09.700

He asks a specific question which you quoted, and then answered in the affirmative, then clarified. I don't dispute the correctness of the following sentence, only that giving a direct "yes" as the first part of your sentence answering his question may give him the impression that his assumption was correct and that there is some extra conversion step going on. Removing the "yes" would be enough. – Mokubai – 2017-02-16T07:40:45.340

@Mokubai Ohh, I see now what you mean. Thanks for the help! – Alex – 2017-02-16T08:05:22.160

*"it is how we programmed first computers when compilers from CPU mnemonics aka assembler was a dream"* -- Are you claiming to be that old (to have used UNIVAC)??!! Compilers are not assemblers, and should not be conflated. I doubt that you've been programming longer than me (i.e. since 1967). What computer were you using that didn't have an assembler? FWIW I have written in machine code, but only for patches to firmware. – sawdust – 2017-02-16T08:40:45.180

@sawdust A UNIVAC-like computer is what we programmed with paper cards at university, but it didn't grab me. My passion for the computer world started with the Intel 8080, on a custom "computer" that was engineered and built from hundreds of SN74LS74, SN74LS00... chips, incompatible with the rest of the world, where the "operating system" fit in 8 KB of EPROM. You can imagine how much "fun" it was to program the first versions of that computer by keying opcodes into a hardware programmer and burning them to EPROM.
Later we wrote an assembler for that computer, and it was one of the happiest moments of my life :)
– Alex – 2017-02-16T09:35:16.277

@sawdust You are right about the correct term for an assembler; I corrected it, compilers -> translators. I probably should avoid long explanations with my English. I appreciate that you helped me correct my answer. – Alex – 2017-02-16T09:39:06.803

I'm pretty sure Intel had an assembler available; you just had to pay for it. But you probably did not have the peripherals to use it anyway. – sawdust – 2017-02-16T18:13:12.860

"we mostly using direct compilation from high level programming language directly to executable binaries with compilers" -- That's still a false statement. Just because a compiler has intermediate steps (such as generation of assembly language from HLL) that are not visible, that does not mean that there's a direct code generation. You're also ignoring the linking step. The typical executable the OP refers to is not a binary image, but probably a relocatable executable file that requires dynamic linking with shared libraries. – sawdust – 2017-02-16T18:15:55.147

@sawdust I don't think that we need to start with deep explanations of "How compilers work" when the question is about "Why opcodes can't be seen in Notepad", but I really appreciate your comments, which helped improve my post! I added a clarification regarding "direct compilation". – Alex – 2017-02-16T23:24:37.550