34

PyPI is a third-party software repository for Python packages. Anybody can upload packages to it (see The Python Package Index (PyPI)).

  • How does PyPI prevent people from uploading malware?
  • When I am searching for software, how can I be (more) sure that it is not malware?
  • What can I, as a developer of packages, do to make others feel safer using my packages?
  • Are there "historic" examples of malware in the repositories? How much harm did they do?

I've asked the question for PyPI, but I'm also interested in similar repositories like npm (JavaScript) or composer (PHP).

I have asked this question for CTAN (TeX) in the tex.SE chat. The answer was that there are no security measures. They trust people / developers not to upload malware.

MWB
Martin Thoma

4 Answers

15

I actually wrote a blog post on this topic recently.

To answer your points:

How does PyPI prevent people from uploading malware?

It doesn't. Any Python code can be uploaded. Arbitrary code execution is possible during package installation. Audit your dependencies carefully.

When I am searching for software, how can I be (more) sure that it is not malware?

Download the tarball and look at the code.
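
A minimal sketch of looking inside a downloaded source tarball without installing or executing anything, using only Python's standard library. The function name `find_suspicious_lines` and the pattern list are hypothetical choices for illustration. A crude text search like this will miss anything deliberately obfuscated, so treat it as a first pass and not as a replacement for reading the code.

```python
import tarfile

# Hypothetical, deliberately naive list of strings that deserve a closer look.
SUSPICIOUS_PATTERNS = ("http://", "https://", "exec(", "eval(", "subprocess", "base64")


def find_suspicious_lines(sdist_path):
    """Scan the .py files inside an sdist tarball without extracting or running them.

    Returns a list of (member_name, line_number, line_text) tuples.
    """
    hits = []
    with tarfile.open(sdist_path, "r:*") as archive:
        for member in archive.getmembers():
            # Only regular Python source files are inspected in this sketch.
            if not (member.isfile() and member.name.endswith(".py")):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if any(pattern in line for pattern in SUSPICIOUS_PATTERNS):
                    hits.append((member.name, number, line.strip()))
    return hits
```

Every hit still needs a human to judge it; plenty of legitimate packages contain URLs or call `subprocess`.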

What can I, as a developer of packages, do to make others feel safer using my packages?

If your users trust you, PyPI works perfectly fine. Packages are served over HTTPS and checksums are available to verify against.
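
As a rough sketch of the checksum step, assuming you already have the expected SHA-256 value from somewhere you trust: the helper names below are hypothetical, and only `hashlib` and `hmac` from the standard library are used. Note that a checksum fetched from the same server as the file only detects corruption in transit, not a compromised index.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare the file's digest with a checksum obtained from a trusted source."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```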

Are there "historic" examples of malware in the repositories? How much harm did they do?

I'm not aware of any historic examples, but I could easily upload a demo package. PyPI doesn't do anything to prevent this. It acts as a simple index of Python packages; it is up to the users to decide which developers they trust.

  • 7
    Downloading the source code and checking it is not realistic. Think of numpy / scipy. They are HUGE. It would take me weeks to check them. I could, of course, run "easy" malware checks like grepping for 'http' to see if they call websites, but I guess (hope) there are more sophisticated automatic ways to check packages. – Martin Thoma Jan 16 '15 at 13:31
  • 2
    @moose There isn't. It's either audit your dependencies or place your trust on the developers. –  Jan 16 '15 at 13:32
  • 3
    @TerryChia and trust the developers' own operational security and the security of the people running the repos, and the background of new developers on the project. . . – Rory McCune Jan 16 '15 at 13:33
  • @RoryMcCune Well, yeah. Computers are broken. Nothing new there. –  Jan 16 '15 at 13:34
  • @TerryChia although what's new'ish is those computers being used to run everything from your power meter to your healthcare devices to your banking to your tax payments :) – Rory McCune Jan 16 '15 at 13:39
  • 1
    The link is broken. – Martin Thoma Jan 12 '17 at 13:02
  • If you think you can use a computer without trusting *someone*, read "Reflections on Trusting Trust" – Ian D. Scott Feb 01 '17 at 18:33
  • "PyPI works perfectly fine. Packages are served over HTTPS" sorry, but this is flawed. X.509 does not provide authentication of the owner of the website/domain. It only provides assurance that the cert was issued by one of hundreds of trusted CAs (including their subordinate CAs), some of which are controlled by known-untrustworthy States. See https://security.stackexchange.com/q/234052/213165 and https://security.stackexchange.com/a/234098/213165 – Michael Altfield Aug 10 '20 at 07:42
13

There are two threat models here:

  1. Malicious developer uploading malicious packages
  2. Malicious attacker uploading malicious packages that appear to belong to legitimate developers

PyPI does not make any attempt to resolve #1. Auditing code before installing and only installing packages from reputable developers are the only "protection" you have against this. If you find a malicious package, you can report it to the PyPI maintainers and the package will probably get removed. But effectively, there's no protection against it, as PyPI packages are not vetted before they become available for installation.

On the second threat model, there are a number of security measures. Newer versions of pip download packages from PyPI over HTTPS, and packages can optionally be GPG signed. There are proposals to implement The Update Framework (TUF), though I don't know how far they have progressed.

Installing to your user packages or a virtualenv can limit the damage a malicious package can do, because the installer does not need to run with sudo. But don't rely on it too much.
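
One way to sketch the "no sudo" advice with the standard library's `venv` module. The function names `running_as_root` and `create_isolated_env` are hypothetical, and this is not a sandbox: code run during installation still has all the privileges of your own user account.

```python
import os
import venv


def running_as_root():
    """Return True when the current process has root privileges (Unix only)."""
    # os.geteuid does not exist on Windows; treat that case as "not root" here.
    return hasattr(os, "geteuid") and os.geteuid() == 0


def create_isolated_env(env_dir, with_pip=True):
    """Create a virtual environment and return the directory holding its executables."""
    # with_pip=True asks venv to bootstrap pip into the environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(env_dir, bin_dir)
```

A caller could check `running_as_root()` first and refuse to continue, then invoke the environment's own `pip` from the returned directory so nothing is written outside the user's files.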

Lie Ryan
  • 1
    3. malicious attacker grabbing a package name similar to a very popular one (i.e. adding/removing a dash from the package name, or feasting on common spelling mistakes... a package that installs in the hundred-thousands could yield hundreds of infections...) –  Nov 04 '16 at 02:15
  • Another threat model: MITM the https connection with a cert signed by a CA controlled by a malicious third party (the State) or careless third party (DigiNotar, Symantec, etc). See https://security.stackexchange.com/q/234052/213165 – Michael Altfield Aug 10 '20 at 07:49
7

As far as I'm aware, there are in general very limited assurances around the code available from online code repositories (e.g. rubygems, npm, nuget, PyPI). In a lot of cases they don't support or enforce things like signed code or other integrity-based security measures, and authentication for publishing to these sites is, in some cases, just a username/password combination. So you're reliant not only on the developers not deliberately placing malware in their code, but also on those developers having good operational security practices.

An even larger problem is that the libraries you install often have dependencies, in some cases a large number of them, so you are also relying on the developers of those dependencies not being malicious and having good security practices.

As Terry says, with many code libraries there are hooks to execute arbitrary code on installation (before the library is even used), so just the act of installing can have bad consequences, especially if done as a privileged user.

In terms of historical instances of compromise, well there was the Rubygems compromise in 2013 as an example of a repository being attacked.

In terms of fixing the problem, well there's the Update Framework initiative to try and improve the situation.

Rory McCune
  • TUF ok, but is there a way to install TUF and all its dependencies in a secure way? Secure meaning: the payload's integrity can be validated with a cryptographic signature providing authentication of the publisher to a single trusted key? This doesn't seem possible for TUF https://github.com/BusKill/buskill-app/issues/6#issuecomment-671087395 Are there any alternatives to TUF that can actually be installed in a secure way so that untrusted code doesn't have to be added to the cold-storage machine where the dev's private keys live? – Michael Altfield Aug 10 '20 at 07:58
  • How does TUF help in this regard? I want to implement this for NPM – Nathan Aw Oct 25 '21 at 04:06
3

As of 2020, pip does not provide cryptographic integrity or authentication on any of the packages it downloads from PyPI.

In 2013, PEP 458 was presented as a solution to fix this.

In 2019, the Python Software Foundation announced that Facebook donated funding to implement it.

As of mid-2020, this is still a work in progress that can be tracked via this GitHub milestone for the Python Packaging Authority.

In the meantime, the best you can do is hope that the developer of the Python module either:

  1. Uses a tool like twine and optionally chooses to sign and upload their package with gpg. Pip has no built-in mechanism to do this validation, but you can do it manually.

  2. Releases a cryptographically signed document including the checksums of their releases, which you can then pass to pip with the --hash option in a requirements file
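
For the second option, a small sketch of turning checksums taken from such a signed document into a requirements-file entry in pip's `--hash=sha256:<digest>` format. The helper name `pinned_requirement` is hypothetical, and verifying the signature on the document itself is assumed to have happened beforehand.

```python
import string


def pinned_requirement(name, version, sha256_digests):
    """Build a requirements.txt entry that pins a release to known file hashes."""
    parts = [f"{name}=={version}"]
    for digest in sha256_digests:
        digest = digest.strip().lower()
        # A SHA-256 digest is exactly 64 hexadecimal characters.
        if len(digest) != 64 or any(ch not in string.hexdigits for ch in digest):
            raise ValueError(f"not a SHA-256 hex digest: {digest!r}")
        parts.append(f"--hash=sha256:{digest}")
    # One hash per continuation line, as is conventional in requirements files.
    return " \\\n    ".join(parts)
```

Writing the returned text to `requirements.txt` and running `pip install --require-hashes -r requirements.txt` makes pip reject any downloaded file whose hash is not listed.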

Michael Altfield