TL;DR> MAC addresses are a low-level component of an Ethernet network (and of some similar standards, such as WiFi). They allow a device to communicate with another machine on the local physical network (LAN), but they cannot be routed across the Internet, because physical hardware might in theory be plugged in anywhere in the world.
By contrast, IP addresses cover the whole internet, and routers use them to figure out where to send data even if it needs multiple hops to reach its destination – but they aren't helpful in interfacing with the physical hardware on your local network.
If we ever found a better standard than Ethernet, it might not use MAC addresses but IP traffic from the internet could still flow across it, even if other people on the internet had never heard of it.
If we ever found a better standard than IP (for example IPv6 if all the IPv4 addresses ran out), most Ethernet hardware could carry the new kind of traffic without modification – and a simple software/firmware update would fix most of the rest.
MAC addresses are required to make a local Ethernet (or wifi) network function. They allow a network device to attract the attention of a single directly connected device, even though the physical connection is shared. This can be important when thousands of devices are connected together within a single organisation. They serve no function on the wider internet.
To really understand the answer to this question, you need to understand the OSI (sometimes known as the 7-layer) model.
For communication to take place between 2 applications running on separate machines which don't have a direct physical connection, a lot of work needs to take place.
In the olden days, each application would know exactly which machine code instructions needed to be run in order to produce an appropriate signal that would reach, and could be decoded by, the application at the far end. All communication was effectively point-to-point, and software had to be written to suit the exact situation in which it was to be deployed. Obviously, that was unsustainable.
Instead of this, the problem of networking was split into layers, and each layer knew how to speak to the matching layer on a remote machine, and how to communicate with the layer beneath (and sometimes above) it on its local machine. It knew nothing at all about any other layers in place – so your web browser doesn't need to care whether it is running on a machine that uses a token ring, ethernet or wifi network – and definitely doesn't need to know what hardware the remote machine uses.
To make this work, the 7 layer model uses a system rather like nested envelopes; the application creates its data and wraps it in an envelope for the Operating System to deliver. The OS wraps this in another envelope and passes it to the Network driver. The Network driver wraps this in yet another envelope and puts it onto the physical cable. And so on.
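As a rough illustration of the nested-envelope idea, here is a minimal Python sketch. The `Envelope` class, the layer names and the header fields are all made up for this example; real protocol headers carry far more than a single address.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    layer: str       # which layer added this envelope
    header: dict     # addressing information written on the envelope
    payload: object  # either another Envelope or the raw application data


def unwrap_all(envelope):
    """Open each envelope in turn, outermost first, until the raw data is reached."""
    layers = []
    current = envelope
    while isinstance(current, Envelope):
        layers.append(current.layer)
        current = current.payload
    return layers, current


# Each layer wraps whatever it was handed by the layer above.
app_data = b"GET / HTTP/1.1"
segment = Envelope("transport", {"dst_port": 80}, app_data)
packet = Envelope("network", {"dst_ip": "8.8.8.8"}, segment)
frame = Envelope("data link", {"dst_mac": "aa:bb:cc:dd:ee:ff"}, packet)

layers, data = unwrap_all(frame)
print(layers, data)
```

The receiving side opens the envelopes in the opposite order to the one in which they were added, and the application only ever sees the innermost contents.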
The bottom layer, layer 1, is the physical layer. This is the layer of wires and transistors and radio waves, and at this layer, communication is mostly just a stream of ones and noughts. The data goes everywhere that is physically connected. You plug your computer's network port into your switch using a CAT-5 cable.
Layer 2 is the Data link layer. This provides some structure to the ones and noughts, some error detection (and, in some technologies, correction) capabilities, and some indication about which physically connected device (physical connections here can actually be over wifi) should pay attention to the message. This is the layer at which MAC addresses come into play, and we'll come back to it later. But MAC addresses aren't the only possibility at this layer. Token ring networks, for example, need a different data link implementation.
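To give a feel for the error detection part, here is a small sketch that appends a checksum to a message and verifies it on receipt. It uses the CRC-32 from Python's `zlib` module as a stand-in for the check that Ethernet appends to each frame; the function names and the byte layout are simplifications made up for this example, not the real frame format.

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (illustrative layout only)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the trailer."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


frame = add_checksum(b"hello")
# Flip a single bit in the first byte to simulate noise on the wire.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]

print(is_intact(frame), is_intact(damaged))
```

A receiver that finds a mismatch simply discards the message; getting it resent is left to a higher layer.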
Layer 3 is the Network layer. This is the layer that IP works at (though it isn't the only network layer protocol either), and it is this that allows computers to send a message that can get to any machine anywhere on the "network". There does not need to be a direct connection between the machines in question.
Layers 4-7 are higher level protocols. They get ever further away from the hardware and closer to the application. TCP, for example, sits on top of IP, and provides mechanisms that automatically resend messages when they go missing.
So MAC addresses work at Layer 2, and permit 2 machines that are physically connected to one another to send messages that will be ignored by other machines which share the same physical connection.
Suppose I have an application that wants to send some data to the machine with IP address 8.8.8.8
Layer 3 wraps up the data in an envelope that contains, amongst other things, the IP address 8.8.8.8 and then hands this to layer 2.
Layer 2 looks at this IP address and decides which of the machines it is directly connected to is able to deal with this message. It will have a lookup table of a selection of the directly connected IP addresses together with the corresponding MAC address of the network card in each machine. This lookup table is constructed using a protocol called ARP, which lets a network card ask questions of the other directly connected devices. To make that possible, Ethernet reserves a special MAC address, FF:FF:FF:FF:FF:FF, which lets a device talk to all physically connected devices at once.
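The ARP exchange can be simulated in a few lines. This is only a toy model: the `Device` class and its methods are invented for this example, the "broadcast" is just a loop over every device on the segment, and real ARP has timeouts and other details that are ignored here.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"  # the address every device listens to


class Device:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}  # IP address -> MAC address learned so far

    def handle_arp_request(self, wanted_ip):
        """Answer with our MAC only if the question is about our own IP."""
        return self.mac if wanted_ip == self.ip else None

    def resolve(self, wanted_ip, segment):
        """Return the MAC for wanted_ip, asking everyone if it is not cached."""
        if wanted_ip in self.arp_cache:
            return self.arp_cache[wanted_ip]
        # A request sent to BROADCAST_MAC is seen by every device on the segment.
        for device in segment:
            if device is self:
                continue
            reply = device.handle_arp_request(wanted_ip)
            if reply is not None:
                self.arp_cache[wanted_ip] = reply
                return reply
        return None  # nobody on the local network owns that IP


laptop = Device("192.168.1.10", "00:00:00:00:00:0a")
printer = Device("192.168.1.20", "00:00:00:00:00:14")
segment = [laptop, printer]

printer_mac = laptop.resolve("192.168.1.20", segment)
missing_mac = laptop.resolve("192.168.1.99", segment)
print(printer_mac, missing_mac)
```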
If the IP address is in the table (or can be resolved through ARP), it will wrap the Layer 3 envelope in a Layer 2 envelope with the MAC address in the new header, and then pass the whole bundle to the hardware at Layer 1. The network card with the matching MAC address will receive the message and the network driver will open the Layer 2 envelope and pass the contents up to whichever part of the operating system is expecting to receive messages at the specific IP address.
Alternatively, if the IP address isn't on the local network, the new envelope will have the MAC address of the default gateway (i.e. Router) configured for this network interface, and the hardware will transport the packet to the router.
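The choice between "deliver directly" and "hand it to the router" can be sketched with Python's standard `ipaddress` module. The function name, the example addresses and the pre-filled lookup table are assumptions made for illustration.

```python
import ipaddress


def next_hop_mac(dst_ip, local_network, gateway_ip, arp_table):
    """Pick the MAC address to write on the Layer 2 envelope.

    If the destination is on the local network, address the envelope to that
    machine; otherwise address it to the default gateway. Returns None when
    the chosen IP is not in the table yet (an ARP query would be needed).
    """
    destination = ipaddress.ip_address(dst_ip)
    network = ipaddress.ip_network(local_network)
    hop_ip = dst_ip if destination in network else gateway_ip
    return arp_table.get(hop_ip)


arp_table = {
    "192.168.1.1": "00:11:22:33:44:55",   # the router (default gateway)
    "192.168.1.20": "66:77:88:99:aa:bb",  # another machine on the LAN
}

remote = next_hop_mac("8.8.8.8", "192.168.1.0/24", "192.168.1.1", arp_table)
local = next_hop_mac("192.168.1.20", "192.168.1.0/24", "192.168.1.1", arp_table)
print(remote, local)
```

Note that in both cases the IP address inside the Layer 3 envelope is still the final destination; only the MAC address on the outer envelope differs.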
The router notices its own MAC address on the layer 2 envelope, and opens it. It looks at the IP address on the layer 3 envelope, and works out where the message needs to go next, which is probably going to be the router at your ISP. If the router uses NAT (or similar), it may even modify the layer 3 envelope at this point, to keep your internal IP addresses private. It will then wrap the layer 3 envelope in a new layer 2 envelope that is addressed to the ISP's router's MAC address, and send the message there.
This process of removing the outer envelope and wrapping the contents in a new envelope addressed to the next step in the chain will continue until the message reaches the destination machine.
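The hop-by-hop re-wrapping can be sketched as follows. The function and the MAC addresses are made up for the example, and NAT and everything else a real router does to the packet are left out; the point is only that the inner envelope is carried along unchanged while the outer one is replaced at every hop.

```python
def send_along_path(packet, path_macs):
    """Build the Layer 2 envelope used on each hop of the journey.

    path_macs lists the MAC addresses of the sender, each router in turn,
    and the final destination. The Layer 3 packet is reused as-is.
    """
    frames = []
    for src_mac, dst_mac in zip(path_macs, path_macs[1:]):
        frames.append({"src_mac": src_mac, "dst_mac": dst_mac, "payload": packet})
    return frames


packet = {"src_ip": "192.168.1.10", "dst_ip": "8.8.8.8", "data": b"hello"}
path = [
    "00:00:00:00:00:0a",  # my machine
    "00:00:00:00:00:01",  # my router
    "00:00:00:00:00:02",  # the ISP's router
    "00:00:00:00:00:03",  # the destination machine
]

frames = send_along_path(packet, path)
for frame in frames:
    print(frame["dst_mac"], "carries a packet for", frame["payload"]["dst_ip"])
```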
The envelopes will then continue being ripped off as the message walks back up the layers until it finally reaches its intended recipient, which will be an application somewhere which, hopefully, will know what to do with the message – but will have no idea how the message got there nor indeed all the steps required to get the response back to the original machine.
But it all works, almost like magic!
Note that network switches can use MAC addresses to optimise the flow of network traffic. An ethernet hub simply forwards all incoming traffic to all of its ports, whereas a switch can forward traffic only to the single port that the packet's destination MAC address is connected to. This increases the effective bandwidth of the network; by targeting specific ports, the switch avoids forwarding traffic onto unnecessary segments of the network. The switch identifies which devices are connected to which port by inspecting the source MAC address of the traffic arriving on each port. Switches completely ignore whatever is wrapped inside the Layer 2 envelope.
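A toy model of that behaviour might look like the sketch below. The `LearningSwitch` class is invented for this example; one common approach, assumed here, is for the switch to note the source MAC address of everything it receives and to fall back to sending out of every other port when it doesn't yet know where the destination lives.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    def __init__(self, port_count):
        self.ports = list(range(port_count))
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the incoming traffic is forwarded to."""
        # Learn: the sender must be reachable through the port it arrived on.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST_MAC or out_port is None:
            # Unknown or broadcast destination: behave like a hub.
            return [port for port in self.ports if port != in_port]
        if out_port == in_port:
            return []  # destination is on the same port; nothing to forward
        return [out_port]


switch = LearningSwitch(port_count=4)
first = switch.receive(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
second = switch.receive(1, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
third = switch.receive(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
print(first, second, third)
```

The first message has to go everywhere because the switch has not seen the destination yet; once both machines have sent something, traffic between them only touches their own two ports.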
@BillMichell, Wow this is amazing! Just some quick questions: 1) What do you mean by "it might not use MAC addresses but IP traffic from the internet could still flow across it"? Do you mean that we build another "IP-like" layer below MAC layer? and 2) Seriously this is where it gets interesting, "most Ethernet hardware could carry the new kind of traffic without modification", please elaborate exactly what kind of traffic are you describing? – Pacerier – 2015-04-06T15:37:53.570
@Pacerier Those questions would seem to be worthy of a follow-up question. Basically, I'm saying the system is built to be flexible, and parts of it can be swapped out freely. In particular, I don't care what hardware you have, as long as you can cope with TCP traffic, and my Switch doesn't care if someone comes up with a new application format – it can still route the Ethernet traffic. – Bill Michell – 2015-04-07T08:59:29.097
Hi! thanks for the answer. As far as I've read, your answer is the best. It would be awesome if you could include some more concepts like ARP and NAT within your scenario. – Vishnu Vivek – 2013-07-26T01:09:41.300
Added reference to ARP and network Switches. I don't think NAT has anything to do with MAC addresses, being a layer 3 function... – Bill Michell – 2013-07-26T08:39:27.830
@BillMichell: In IPv6 the MAC or other local ('hardware') ID can be used to compose the IP. – Luciano – 2013-07-26T15:37:42.723
The answer is community Wiki. You can probably edit it to include this additional information if you think it will help answer the OP's question. – Bill Michell – 2013-07-26T15:48:50.133
This needs a TL;DR. – AJMansfield – 2013-07-26T16:33:23.920