Antivirus detection is a feature extraction and a classification problem.
A great analogy is the game of 20 questions, where the goal is to identify an arbitrary object by asking 20 seemingly unrelated yes/no questions. The idea behind the game is that each answer eliminates half of the remaining objects, so it is theoretically possible to distinguish 2^20 (1,048,576) objects with only 20 binary features.
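The arithmetic behind the analogy is easy to check: n yes/no answers distinguish at most 2^n objects, and an object's answers pack neatly into a single integer signature. A minimal sketch (the function names here are just for illustration):

```python
# Each yes/no answer halves the candidate set, so n binary
# features can distinguish at most 2**n distinct objects.
def distinguishable(n_questions: int) -> int:
    return 2 ** n_questions

# A concrete object is then just its tuple of answers:
# 20 bits pack into a single integer "signature".
def signature(answers: list) -> int:
    sig = 0
    for bit in answers:
        sig = (sig << 1) | int(bit)
    return sig
```

With 20 questions, `distinguishable(20)` gives exactly the 1,048,576 objects mentioned above.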
A different analogy is how the visual cortex processes visual information. The brain has very simple and fast hardware for detecting and classifying an essentially unlimited variety of images. Only six layers of neurons (estimated at 140 million neurons) are used to extract progressively more complex features and pass them on to the next layer. The layers interact back and forth with each other to produce abstract notions that can be verified against memory.
Antivirus engines store many features of known malware in the definition file, and when they scan a new file they optimize the extraction and classification (matching) of those features. Storing features also makes detection more robust, so that small changes in a piece of malware won't thwart detection. Feature extraction is also done in parallel so that resources are fully utilized.
Most features are designed by humans, but some make no sense by themselves, like having a null byte at the end of the file or the ratio between file size and printable text size. Those nonsensical or unintuitive features are randomly generated and tested by data mining vast quantities of files. In the end the file is described and classified by the combination of its features. As a side note, the best predictor for questions being closed on Stack Exchange is whether the first letter of the question is lower case.
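To make the two example features concrete, here is a minimal sketch of how they might be computed (the function names and the notion of "printable" via Python's `string.printable` are my assumptions, not how any particular engine does it):

```python
import string

PRINTABLE = set(string.printable.encode())

# Ratio of printable bytes to total size: a feature that means
# little to a human analyst on its own, but can still carry
# predictive signal when combined with many other features.
def printable_ratio(data: bytes) -> float:
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)

# The other example feature: does the file end in a null byte?
def ends_with_null(data: bytes) -> bool:
    return data.endswith(b"\x00")
```

Individually these tell you almost nothing; the point is that a classifier trained on millions of files can find predictive combinations of hundreds of such features.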
So when a new file is scanned, it is quickly classified into finer and finer categories and then matched against a small set of signatures. Each step excludes a large number of clean files and dictates which other features should be extracted next. The first steps cost very little in computing resources, but they decide which more expensive steps are taken later.
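The cascade can be sketched as a chain of cheap early exits; everything here (the stages, the thresholds, the placeholder signature list) is hypothetical, but it shows how most files fall out after a few inexpensive checks:

```python
# A hypothetical scanning cascade: cheap checks first, each stage
# either clears the file or selects the next, more expensive check.
def scan(data: bytes) -> str:
    # Stage 1: a few header bytes are enough to guess the file type.
    if not data.startswith((b"MZ", b"PK", b"%PDF")):
        return "clean"          # unrecognized type: nothing to match against
    # Stage 2: cheap structural features narrow the category further.
    if len(data) < 64:
        return "clean"          # too small to be a working sample
    # Stage 3: only now pay for the expensive signature match.
    known_signatures = [b"EVIL_PAYLOAD"]   # stand-in for the definition file
    if any(sig in data for sig in known_signatures):
        return "malware"
    return "clean"
```

A random text file never gets past stage 1, which is why the average scan touches only a handful of bytes.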
Using only a few disk reads and CPU cycles, the engine can determine the file type. Let's say it is a JAR file. Using this information, it starts collecting features of the JAR file. If it's signed, the scan is aborted. If it doesn't import any functions, the scan is aborted (I'm oversimplifying here). Is it using any tricky functionality? Then more features should be extracted. Does it use known vulnerable functions? Then it should be thoroughly checked against known Java exploit signatures.
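The "few disk reads" step is just magic-byte matching. A JAR file is a ZIP archive, so it starts with the ZIP local file header signature `PK\x03\x04`; a minimal sketch (the table below covers only a few common formats):

```python
# Map of leading "magic" bytes to file types. A JAR is a ZIP
# archive, so it shares the ZIP signature PK\x03\x04.
MAGIC = {
    b"PK\x03\x04": "zip/jar",
    b"MZ": "pe",        # Windows executable
    b"%PDF": "pdf",
    b"\x7fELF": "elf",  # Linux executable
}

def file_type(header: bytes) -> str:
    # Only the first few bytes of the file are needed.
    for magic, ftype in MAGIC.items():
        if header.startswith(magic):
            return ftype
    return "unknown"
```

Knowing the type up front lets the engine route the file to the right feature extractors, Java-specific ones in the JAR case, instead of running every check on every file.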
On-access scanning is based on the same principle, but it also works like a gatekeeper: each action (usually an API call) taken by a process is checked and either allowed or denied. Similarly, each suspicious action triggers more filters and more checks. During the checks the thread performing the operation waits for it to complete, and sometimes the whole process is actively suspended. This might look like significant overhead, but once a specific action is verified, the verdict is cached and the check is later performed very quickly or not at all. The result is performance degradation similar to running on a machine a couple of percentage points slower. Check the PCMark scores for 20 AV products here.
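The gatekeeper-plus-cache idea can be sketched as follows; the class, the toy policy, and the `(process, action)` cache key are all illustrative assumptions, not how any real driver is structured:

```python
# A hypothetical on-access gatekeeper: the first time a
# (process, action) pair is seen it pays for a full check;
# the verdict is cached so identical later calls are
# allowed or denied almost instantly.
class Gatekeeper:
    def __init__(self):
        self.cache = {}
        self.full_checks = 0            # counts the expensive path

    def expensive_check(self, process: str, action: str) -> bool:
        self.full_checks += 1           # stands in for heavy analysis
        return action != "write_to_system32"   # toy policy

    def allow(self, process: str, action: str) -> bool:
        key = (process, action)
        if key not in self.cache:
            self.cache[key] = self.expensive_check(process, action)
        return self.cache[key]
```

Only the first occurrence of each action pays the full cost, which is why steady-state overhead stays in the low single digits.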
So the speed optimization comes from very little work being performed on clean-looking files, which constitute the overwhelming majority of scanned files. The heavy lifting is done only on suspicious, malware-looking files, for which the AV might even take seconds to emulate the process or send it to the cloud for analysis.
The magic is in the progressive classification.