12

Does Python's pip package manager cryptographically validate its payload's authenticity and integrity for all packages after downloading them and before installing them?

I see a lot of guides whose installation steps ask the user to install Python dependencies with pip install .... I usually don't do this, as I trust my OS package manager (i.e., apt) to actually validate the origin/trust and integrity of the package before installing it.

Does pip provide cryptographic authentication and integrity checks for all items downloaded before installing them by default?

Note: Transport validation via X.509 does not count as a valid auth/integrity check.

Michael Altfield
  • 1
    Interesting question. Could you explain why TLS doesn't count? If your threat model is modification by MITM, it should count, right? So is your threat model that someone has been able to replace the real package with a malicious one on pips server? – Anders Jul 02 '20 at 10:49
  • TLS doesn't count as cryptographic authentication because (without cert pinning, which is rarely used and usually impractical), using https will trust any cert that has been signed by a trusted CA for the given domain. There are thousands of CAs in popular "trusted" root stores, including many that are controlled by government agencies whose governments have a known history of committing cyber attacks, including active MITM attacks. See https://security.stackexchange.com/questions/234052/where-can-i-find-a-list-of-all-government-agencies-with-cas-in-pki-root-stores – Michael Altfield Jul 02 '20 at 11:49
  • What "counts" as cryptographic authentication depends on your threat model. Maybe not relevant for this question anymore, but for the future it helps if you specify what your concerns are in the question so that people know what to address in their answers. Again, as someone who runs pip on a daily basis I found this very interesting. Thanks! – Anders Jul 02 '20 at 11:58

3 Answers

5

Sort of...

Firstly, PyPI includes a hash of the file being downloaded, so that any modifications/errors between server and client will be spotted.

Secondly, pip has support for a hash-checking mode where you can specify the required hash for the requested package in requirements.txt in the form:

Foo==1.2.3 --hash=sha256:xxxxxxx

pip will then verify that the downloaded package hashes to this value, and errors if it doesn't. https://pip.pypa.io/en/stable/reference/pip_install/#hash-checking-mode
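As a rough illustration of what that check amounts to, here is a minimal Python sketch. The helper names (sha256_hex, matches_pinned_hash, requirement_line) are made up for this example and are not pip's internals; it only assumes the sha256:<hex> pin format shown above.

```python
import hashlib
import hmac


def sha256_hex(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned):
    """Check a file against a pin written as 'sha256:<hex digest>'."""
    algorithm, _, expected = pinned.partition(":")
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256 pins")
    return hmac.compare_digest(sha256_hex(path), expected.lower())


def requirement_line(name, version, path):
    """Format a requirements.txt entry that pins the digest of `path`."""
    return f"{name}=={version} --hash=sha256:{sha256_hex(path)}"
```

Note that the pin is only as trustworthy as the channel you obtained it from: a hash copied from the same server that serves the file adds nothing against a compromise of that server.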

Thirdly, PyPI has a mechanism where a signature can be uploaded along with a package. twine has support for this.

You can then download the signature alongside the package and verify it. The signature file is found at the same URL, but with .asc appended - e.g. https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl and its signature in https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl.asc

You can manually do the verification by downloading both files and running e.g.

gpg --verify mypackage.whl.asc mypackage.whl

However, at present there isn't a mechanism built into the pip tools to do this step automatically on your behalf, though there has been discussion of this as a much-needed feature on several occasions recently among developers.
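If you want to script that manual step, a minimal Python sketch might look like the following. It simply shells out to the gpg command shown above; the function names are hypothetical, and it assumes gpg is installed and the signer's public key is already in your keyring.

```python
import shutil
import subprocess


def gpg_verify_command(signature_path, package_path, gpg="gpg"):
    """Build the argument list for `gpg --verify <signature> <package>`."""
    return [gpg, "--verify", signature_path, package_path]


def signature_is_valid(signature_path, package_path):
    """Run gpg and report whether it exited successfully.

    A zero exit status only means the signature checks out against a key
    already in the local keyring; deciding whether that key really belongs
    to the package maintainer is a separate, out-of-band step.
    """
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg executable not found on PATH")
    result = subprocess.run(
        gpg_verify_command(signature_path, package_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, signature_is_valid("mypackage.whl.asc", "mypackage.whl") runs the same command as above and returns a boolean instead of printing gpg's report.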

match
  • 159
  • 1
  • 1
    "Pypi includes a hash of the file being downloaded" Unless the hash is provided out-of-band (which I don't believe pip does for you), then this is entirely useless as the malicious actor who's changing the package's contents can trivially change the hashes too. Moreover, the hash also doesn't provide any security unless it's cryptographically signed (again, the public key for the signature would need to be obtained out-of-band). – Michael Altfield Jul 01 '20 at 23:39
  • 1
    "https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl" How did you fetch this URL? When I visit the PyPI website, the download URL sends me to an object in S3. Merely appending '.asc' to this object does not work. So how can I find which packages have sigs and where I can download those sigs? https://pypi.org/project/pip/#files – Michael Altfield Jul 02 '20 at 00:56
  • [out-of-band](https://csrc.nist.gov/glossary/term/out_of_band) – djvg Feb 01 '22 at 15:51
  • Are you sure you want to manually verify tens of dependencies? It is very risky to use pip to install production code if pip does not verify metadata and signatures. PyPI is a hacker's paradise – Wang May 18 '22 at 10:55
5

The short answer is: pip always uses TLS, which is actually fairly useful here. It means that as long as no-one's managed to compromise PyPI itself or steal the site certificate, then you can be certain that the packages you download are the ones that the PyPI admins think are correct. And it's hard to do better than that: after all, the PyPI admins are the only ones who know which users are allowed to upload which packages, so you kind of have to trust them.

As match mentions, there also used to be a way to upload PGP signatures for packages. However, that's been removed, since it was basically just security theater – complicated and makes it feel like you're being secure, but doesn't actually improve security. One of PyPI's main admins has an old post about this: https://caremad.io/posts/2013/07/packaging-signing-not-holy-grail/

What would be better is to use a framework like TUF that can provide guarantees like: "the person who uploaded this was trusted by the PyPI admins at the time of upload, and if PyPI is compromised afterwards then the attacker can't go back and change anything that happened before the compromise". TUF is roughly similar to the package signing used by Linux distributions, but a bit more powerful. The PyPI maintainers got a grant to implement this and work is in progress now: https://wiki.python.org/psf/PackagingWG#Warehouse:_Facebook_gift

One challenge is that to bootstrap a cryptographic system like this, you need a key signing ceremony, which was going to take place in-person at PyCon this year... but, well. Please be patient :-)

In the meantime, you can get a similar effect locally by putting package hashes into your requirements.txt: this guarantees that if an attacker somehow sneaks in a fake package after you recorded the hashes, it will be rejected. Some dependency management tools, like pipenv or poetry, will do this for you automatically.

  • 2
    "the PyPI admins think are correct" "the person who uploaded this was trusted by the PyPI admins at the time of upload" This is nonsense. **Anybody** can upload a package to PyPI – there are no admins checking the package. Your first link even says so: "All this said, we have not addressed whether it is safe to install this package. I could register a malicious package called “hackme” and sign it using any of the above methods and if you install it, even with the valid signature, you have decided to accept the consequences of running my code." – idmean Jun 07 '20 at 07:29
  • 3
    Anyone can upload a package to PyPI under a new name, but the only people who are allowed to upload a new version of an existing package are the ones who have user accounts in the PyPI database that are marked as maintainers of that package. You can't go uploading a new version of "django" or "requests" because those aren't your project, and PyPI will reject your upload. The point is that package signing is useless if you don't know who is *supposed* to release the package, and you're trusting the PyPI admins to maintain that database for you. – Nathaniel J. Smith Jun 08 '20 at 07:02
  • "pip always uses TLS...as long as no-one's managed to compromise PyPI itself or steal the site certificate, then you can be certain that the packages you download are the ones that the PyPI" Are you implying that pip does TLS certificate pinning? If so, that's great. But pinning certs is rarely done. The default would be to trust any cert in the root store, which makes your statement invalid. – Michael Altfield Jul 01 '20 at 23:10
  • update: found this ticket from 2013 that suggests that PyPI never implemented cert pinning. Please update your answer to note that TLS doesn't provide much use here since there's thousands of CAs trusted by root stores, which includes CAs owned by government agencies that are known to commit cyber attacks including active MITM attacks. https://github.com/pypa/pip/issues/1168#issuecomment-464979351 – Michael Altfield Jul 01 '20 at 23:37
  • Yes, that's the "steal a site certificate" threat. Saying that this "doesn't provide much" is silly though. The CA system is far from perfect, but security is about economics, not perfection, and only a tiny fraction of potential attackers have the resources to pwn a CA and then mount an active MITM attack. – Nathaniel J. Smith Jul 03 '20 at 00:59
3

No. As of 2020, pip does not provide cryptographic integrity or authentication on any of the packages it downloads from PyPI.

In 2013, PEP 458 was presented as a solution to fix this.

In 2019, the Python Software Foundation announced that Facebook donated funding to implement it.

As of mid-2020, this is still a work in progress that can be tracked via this GitHub Milestone for the Python Packaging Authority.

In the meantime, the best you can do is hope that the developer of the Python module either:

  1. Uses a tool like twine and optionally chooses to sign and upload their package with gpg. Pip has no built-in mechanism to do this validation, but you can do it manually.

  2. Releases a cryptographically signed document including the checksums of their releases, which you can then pass to pip with the --hash option in your requirements.txt
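For the second option, once you have verified the signature on the checksum document yourself, turning it into pinned requirements can be scripted. The sketch below is only an illustration: it assumes the document uses sha256sum-style lines ("<hex digest>  <filename>"), which is a guess about the developer's format, and the function names are hypothetical.

```python
def parse_checksums(text):
    """Map filename -> sha256 hex digest from sha256sum-style lines."""
    digests = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything not in 'digest name' form
        digest, filename = parts
        # sha256sum marks binary mode with a leading '*' on the filename
        digests[filename.lstrip("*")] = digest.lower()
    return digests


def pinned_requirement(name, version, digests):
    """Build one requirements.txt entry with a --hash option per digest."""
    options = " ".join(f"--hash=sha256:{d}" for d in sorted(set(digests)))
    return f"{name}=={version} {options}"
```

Listing one --hash per released file (sdist and wheels) lets pip accept whichever artifact it picks for your platform. Installing with pip install --require-hashes -r requirements.txt then makes pip refuse any requirement that lacks a pin.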

Michael Altfield