
It is well known that PyPI does not prevent the upload of malicious code.

Unfortunately, automated tools often cannot distinguish between features of a program and malicious code.

In the case of Linux distributions, there is at least the package maintainer who might look at the source code occasionally.

Basically, the security of software repositories like PyPI boils down to the idea that somebody would notice malicious code if enough people looked at the source code. So, if I would like to be one of the people occasionally looking at the source code, what should I look out for?

Reading every line of code before installing a Python package is infeasible.

For a programmer (not a security researcher), what are easy checks or best practices to identify obviously malicious code fragments?

Some obvious things to do are:

  • grep for import and see whether any module imports something it should not. In particular, look for sys, os, http, etc. These modules have many legitimate uses, but also a lot of power to do unsafe things.
  • grep for eval and the like.
  • open a random file and see if it looks reasonable.
  • Pay particular attention to setup.py
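The first two checks could be sketched with the standard-library `ast` module instead of plain grep, so that matches inside comments and string literals are ignored. This is only a rough sketch: the watchlists `SUSPICIOUS_MODULES` and `SUSPICIOUS_CALLS` and the helper names `scan_source` and `scan_tree` are assumptions made up for this example, not an established tool, and a hit only means "look here", not "this is malicious".

```python
import ast
from pathlib import Path

# Assumed watchlists for illustration; adjust them to the package under review.
SUSPICIOUS_MODULES = {"os", "sys", "subprocess", "socket", "http", "urllib", "ctypes", "base64"}
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def scan_source(source, filename="<unknown>"):
    """Return sorted (line, description) findings for one source string."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" is reported if its top-level package is on the list.
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, "import " + alias.name))
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            module = node.module or ""
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, "from " + module + " import ..."))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are caught, e.g. eval(...).
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, "call to " + node.func.id + "()"))
    return sorted(findings)


def scan_tree(root):
    """Scan every .py file under root; setup.py is scanned like any other file."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = scan_source(text, str(path))
        except (SyntaxError, ValueError):
            # A file that does not parse deserves a manual look as well.
            hits = [(0, "could not be parsed")]
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree` on the unpacked source distribution (not on an installed package, since installing already runs `setup.py`) gives a short list of places to read first. It catches only the naive cases.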

What is the quickest way to have the highest chance of detecting malicious code in Python scripts?

Anders
TheEspinosa
    Short of reviewing the full source/binary you can never know for certain. If you are concerned perhaps a better approach is running the application in a sandboxed environment that only allows access to data / APIs that it should use. – Hector Nov 08 '17 at 12:24
    @Hector: I'd suggest posting this as an answer. If there were an automated way of examining code to ensure it didn't contain malware, someone would be selling it right now and making a metric crapton of money doing it. – baldPrussian Nov 08 '17 at 19:22

1 Answer

Short of reviewing the full source or binary, you can never know for certain without executing the code, and by the time you realize something is wrong it may be too late. Sure, a string manipulation library calling "import http" might be easy to grep for, but there are endless ways for a malicious developer to obfuscate it.
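As a harmless illustration of the obfuscation point, the sketch below loads a module and reaches `eval` without either name appearing literally as an import or call in the source. The standard-library `json` module is just a benign stand-in chosen for this example, and the variable names are made up; real malicious code would hide something like a network or process module the same way.

```python
import base64
import builtins
import importlib

# "anNvbg==" is the base64 encoding of "json"; a grep for "import json" finds nothing.
encoded_name = "anNvbg=="
hidden_module = importlib.import_module(base64.b64decode(encoded_name).decode())

# The module name can also be assembled from string pieces at runtime.
also_hidden = __import__("js" + "on")

# eval can be looked up indirectly, so no literal "eval(" call appears in the source.
indirect_eval = getattr(builtins, "ev" + "al")

print(hidden_module.__name__)
print(indirect_eval("1 + 1"))
```

None of these lines would be flagged by a search for `import json` or `eval(`, which is why pattern matching alone cannot give certainty.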

If you are concerned, the best approach is to sandbox the execution environment so that the package can only access the data and APIs it would be expected to use.

With regard to PyPI packages in general, you need to perform some risk assessment before using a package. If it's a major package from a well-regarded developer, the risk is low, and the chance that someone would already have detected anything malicious is high. If you are one of 10 people installing the only package posted by a developer, you may want to take a closer look at it.

Hector