HTML From Text File

1

Why is it that copying the 'view source' data from a webpage, pasting it into a Word document, and opening that document in a browser shows a text file instead of the rendered HTML? For example, say I grab some HTML from superuser.com, make a slight modification, and try to open it. Is there something that will prevent me from seeing the page?

user347132

Posted 2014-08-23T23:26:59.790

Reputation: 17

It should be viewable as HTML, minus any CSS/scripts that were referenced relative to the original web page. What browser did you use? Maybe you have the wrong extension on the file? – Logman – 2014-08-23T23:33:37.827

@Tyson You should make this an answer. – krowe – 2014-08-24T02:55:10.487

Answers

1

Pages from any web site are just HTML files that are transferred from a remote server to your computer so your browser can render them. (It is a lot more complex than that: most of the time they are generated on the fly, separate image files are transferred along with the HTML file so the browser can place images on the page, JavaScript describes behaviour, and CSS describes appearance. But for the purposes of this answer we can simplify it to what I stated above.)

HTML files are just plain text files. They must have specific tags in them (the HTML tags, enclosed by <>), but other than that they are plain text files just like any .txt. If you "view the source" in any browser, you are seeing the contents of the HTML file the browser received before rendering it on your screen.
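To make the "plain text" point concrete, here is a minimal Python sketch. The file name `example.html` and the tiny page in `html_source` are made up for illustration; the point is only that saving HTML stores the characters you wrote and nothing else.

```python
from pathlib import Path
import tempfile

# A tiny, made-up page used only for illustration.
html_source = """<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body><p>Hello, <b>world</b>!</p></body>
</html>
"""


def save_as_plain_text(source: str, path: Path) -> Path:
    """Write the markup to disk exactly like any other text file."""
    path.write_text(source, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    page = save_as_plain_text(html_source, Path(tmp) / "example.html")
    # Reading the file back gives the same characters that were written.
    round_trip = page.read_text(encoding="utf-8")
    print(round_trip == html_source)
```

Any plain text editor produces the same kind of file; the tags are ordinary characters until a browser interprets them.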

Now, your browser doesn't care where the HTML file comes from. It can come from a website, or from a folder on your computer. You can even drag a .html file into a browser window and it will try to render it (it might look broken and weird for lack of images, JavaScript and CSS, but it will have some content at least).
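As a rough sketch of the same idea in Python: a local file has a `file://` address that a browser can open much like a web address. The helper names `local_page_url` and `open_local_page` are hypothetical, and `page.html` is a placeholder path.

```python
from pathlib import Path
import webbrowser


def local_page_url(path: str) -> str:
    """Turn a path on disk into the file:// URL a browser understands."""
    return Path(path).resolve().as_uri()


def open_local_page(path: str) -> bool:
    """Ask the default browser to open a local HTML file."""
    return webbrowser.open(local_page_url(path))


# Only build the URL here; call open_local_page("page.html") to launch a browser.
print(local_page_url("page.html"))
```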

When you view the source, copy it, paste it into Word, and save it as plain text, you are just creating a new HTML file on your computer. That file may be missing the images, JavaScript and CSS that the original page referenced with relative addresses, but other than that it will be a perfectly valid HTML file. What you will see on your screen is the browser's best attempt at rendering it properly.

To illustrate what I mean, I opened the page for this very question, pasted the source into Notepad, saved it in a folder and opened it. Here is the result (note that I don't see a single HTML tag, just text!):

[Screenshot: the saved copy of this page rendered in the browser]

Depending on the browser, if you just change the file extension to .txt it will display the source code of the file, HTML tags and all, instead of rendering it. Firefox 31 on Windows 7 does that at least.
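The extension-to-type idea can be sketched with Python's standard `mimetypes` module. This is Python's own lookup table, not the browser's, so treat it as an analogy: the helper `guessed_type` is hypothetical, and real browsers may apply additional rules when deciding whether to render a file or show it as text.

```python
import mimetypes


def guessed_type(filename: str) -> str:
    """Guess a content type from the file extension alone."""
    content_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return content_type or "application/octet-stream"


for name in ("page.html", "page.htm", "page.txt"):
    print(name, "->", guessed_type(name))
```

The same bytes are labelled as HTML or as plain text purely on the strength of the file name, which matches the behaviour described above for a renamed file.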

Note that if you paste the HTML into Word and save it as a .doc or .docx file, and then open it in your browser, all you will see are garbled characters, because browsers aren't meant to render Word files.
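One way to see why the result is not readable as HTML: a .docx file is a ZIP container rather than plain text (the older .doc is a different binary format). The Python sketch below uses a small in-memory ZIP archive as a stand-in, since it is not a real Word document; `looks_like_zip_container`, `make_stand_in_archive` and the member name `stand-in.xml` are all hypothetical.

```python
import io
import zipfile

# Every ZIP archive that begins with a file entry starts with these four bytes.
ZIP_SIGNATURE = b"PK\x03\x04"


def looks_like_zip_container(data: bytes) -> bool:
    """Return True if the bytes start like a ZIP archive."""
    return data.startswith(ZIP_SIGNATURE)


def make_stand_in_archive(html: str) -> bytes:
    """Wrap the markup in a ZIP archive (a stand-in, not a real .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("stand-in.xml", html)
    return buffer.getvalue()


html = "<p>Hello</p>"
plain = html.encode("utf-8")
packed = make_stand_in_archive(html)

print(looks_like_zip_container(plain))   # plain text starts with the markup itself
print(looks_like_zip_container(packed))  # the container starts with the ZIP signature
```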

That Brazilian Guy

Posted 2014-08-23T23:26:59.790

Reputation: 5 880

-2

Save it as a text file with Notepad (not Word), rename it to .html or .htm, then open it in a browser. You can combine the save and the rename into one operation if you understand when the extension is actually changing and when it's not. The difference is that Word also writes unseen information into the file; Notepad does not.
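The rename step could be scripted roughly as follows. This is only a sketch: `rename_to_html` is a hypothetical helper and `page.txt` is a placeholder name. It assumes the file was saved as plain text, so only the extension changes and the contents stay the same.

```python
from pathlib import Path
import tempfile


def rename_to_html(text_file: Path) -> Path:
    """Swap the file's final extension for .html and return the new path."""
    target = text_file.with_suffix(".html")
    text_file.rename(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "page.txt"
    saved.write_text("<p>Hello</p>", encoding="utf-8")

    renamed = rename_to_html(saved)
    renamed_name = renamed.name
    # The rename touches only the name, not the bytes inside the file.
    contents_unchanged = renamed.read_text(encoding="utf-8") == "<p>Hello</p>"
    print(renamed_name, contents_unchanged)
```

Note that `with_suffix` replaces only the last extension, so a file that was really saved as `page.html.txt` would need its trailing `.txt` removed instead.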

Tyson

Posted 2014-08-23T23:26:59.790

Reputation: 1 304