Mixed passive content, sometimes called mixed display content, covers images, audio, video, and any other content that can't alter the DOM (hence "passive", as you mention yourself) when it is served over unencrypted HTTP to a document that was itself requested over encrypted HTTPS. Such content is open to attacks that replace the HTTP-served resources with inappropriate or misleading information. For example, a man-in-the-middle (MitM) could swap in content that leads the user to believe they are expected to take some action, or that otherwise misdirects them. The difference from mixed active content is that the attacker can't affect the rest of the page, only the content loaded over unencrypted HTTP.
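To make "passive content requested over HTTP from an HTTPS page" concrete, here is a minimal Python sketch that lists such resources in a page's markup. The tag list, the function name find_mixed_passive_content, and the example.com URLs are my own assumptions for illustration; real browsers apply more detailed rules than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed (incomplete) set of tags that load passive content,
# mapped to the attribute that holds the resource URL.
PASSIVE_TAGS = {"img": "src", "audio": "src", "video": "src", "source": "src"}


class MixedPassiveContentFinder(HTMLParser):
    """Collects passive resources an HTTPS page would fetch over plain HTTP."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure_urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = PASSIVE_TAGS.get(tag)
        if attr_name is None:
            return  # not a passive-content tag (e.g. <script> is active content)
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative and protocol-relative URLs against the page.
                absolute = urljoin(self.page_url, value)
                if urlparse(absolute).scheme == "http":
                    self.insecure_urls.append(absolute)


def find_mixed_passive_content(page_url, html):
    """Hypothetical helper: return the HTTP URLs of passive resources."""
    if urlparse(page_url).scheme != "https":
        return []  # mixed content only exists on HTTPS documents
    finder = MixedPassiveContentFinder(page_url)
    finder.feed(html)
    return finder.insecure_urls
```

Each URL it returns is a resource an on-path attacker could replace without touching the rest of the page.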
Additionally, an attacker could track users by inferring their browsing activity from the HTTP-loaded content served to them. If a resource is displayed only on specific pages, the request for it tells the attacker which page the user was visiting.
The attacker can intercept the HTTP headers sent over the unsecured protocol, redirect requests to another server, or change anything in the HTTP response (including its headers, and therefore cookies). The request information includes the user agent string and the cookies associated with the domain the HTTP content is served from. The attacker could change any of this at will to make tracking user activity easier, or to mislead the user with false information.
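As a rough illustration of how much an unencrypted request gives away, the sketch below parses a raw HTTP/1.1 request the way anyone on the network path could. The function name parse_plaintext_request and the sample request are made up for this example; it only demonstrates that the path, user agent, and cookies travel in clear text.

```python
def parse_plaintext_request(raw: bytes) -> dict:
    """Hypothetical helper: extract what an on-path observer can read."""
    # Everything before the blank line is the request line plus headers.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "path": path,  # can hint at which page embedded the resource
        "host": headers.get("host"),
        "user_agent": headers.get("user-agent"),
        "cookie": headers.get("cookie"),  # cookies sent without protection
    }


# Made-up request for an image embedded in an HTTPS page.
sample = (
    b"GET /banners/checkout.png HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"Cookie: session=abc123\r\n"
    b"\r\n"
)
observed = parse_plaintext_request(sample)
```

Since the same observer sits between the browser and the server, they can rewrite the response just as easily as they read the request.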
If this content is served from the same domain as the main page requesting it, the protection the user assumes they get from opening an HTTPS page is weakened even further: the attacker can read, from the request headers of the linked HTTP content, any of the user's cookies that don't carry the ;secure attribute. That attribute tells the user agent (browser) to include the cookie only when an encrypted HTTPS channel is used, and this applies to additional, linked content requests as well.
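Here is a small Python sketch of the cookie side of this, using the standard library's http.cookies module. The cookie name session and the helpers build_session_cookie and sent_over are assumptions for illustration, and sent_over is a deliberately simplified model of the browser's rule, not its actual implementation.

```python
from http.cookies import SimpleCookie


def build_session_cookie(value: str, secure: bool) -> str:
    """Hypothetical helper: build the value of a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie["session"] = value
    cookie["session"]["path"] = "/"
    if secure:
        # Marks the cookie as HTTPS-only.
        cookie["session"]["secure"] = True
    return cookie["session"].OutputString()


def sent_over(scheme: str, set_cookie_value: str) -> bool:
    """Simplified model: would the browser attach this cookie to a request
    made over the given scheme ("http" or "https")?"""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    morsel = parsed["session"]
    # A cookie with the secure attribute is withheld from plain HTTP requests.
    return scheme == "https" or not morsel["secure"]


plain = build_session_cookie("abc123", secure=False)
protected = build_session_cookie("abc123", secure=True)
```

With these helpers, sent_over("http", plain) is true, so an image fetched over HTTP from the same domain would carry the session cookie in clear text, while sent_over("http", protected) is false.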