19

I'm learning about basic x86 overflows in C, but normally I code in Python. Is there any way that programs written in higher-level languages can suffer from buffer/heap overflows?

Maarten Bodewes
blank
  • Is this question specific to Python? Every language implements its own behaviors and syntactic sugar. This specific [answer](https://security.stackexchange.com/a/261173/64787) is not a problem for PHP nor JS, and will simply behave as desired. – MonkeyZeus Apr 14 '22 at 17:00
  • @CaffeineAddiction That's an answer, but if you posted it I would have voted it down, because a memory leak or simply allocating too much memory is not the same thing as accessing memory out of bounds, which is the general meaning of a heap / buffer overflow. – Maarten Bodewes Apr 14 '22 at 22:01

6 Answers

39

Overflows don't occur in a language, they occur in a process. Specifically, a "buffer overflow" occurs when memory is allocated on the stack and the program writes outside that memory and into following memory.

Even in a language like Python or C#, such things could happen in theory. However, the runtimes those languages are built on ensure that most of these scenarios don't happen. Consider the following Python code:

cars = ["Ford", "Volvo", "BMW"]
cars[3] = "Mazda"

This will print the following error:

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    cars[3] = "Mazda"
IndexError: list assignment index out of range

So instead of just overwriting some memory, the runtime caught that cars only had three elements and writing to a fourth element is therefore forbidden.
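Because the bounds check surfaces as an ordinary exception, a program can observe it and carry on. The following is a minimal sketch of that behaviour; the helper name `try_store` is made up for illustration and is not part of any library.

```python
def try_store(seq, index, value):
    """Attempt seq[index] = value and report whether the runtime allowed it."""
    try:
        seq[index] = value
    except IndexError:
        # The runtime refused the write; no neighbouring memory was touched.
        return False
    return True


cars = ["Ford", "Volvo", "BMW"]
stored_in_range = try_store(cars, 2, "Mazda")      # index 2 exists, so this succeeds
stored_out_of_range = try_store(cars, 3, "Mazda")  # index 3 does not exist

# The same check applies to raw byte storage such as bytearray.
raw = bytearray(4)
raw_out_of_range = try_store(raw, 4, 0xFF)
```

In both out-of-range cases the container keeps its original length, which is the opposite of what an unchecked write in C would do.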

That makes it seem like overflows are impossible, right? Well, not exactly. The Python runtime itself is just a process and thus susceptible to all kinds of vulnerabilities, including buffer overflows.

For example, CVE-2021-3177 was found in 2021 and has the following summary:

Python 3.x through 3.9.1 has a buffer overflow in PyCArg_repr in _ctypes/callproc.c, which may lead to remote code execution in certain Python applications that accept floating-point numbers as untrusted input, as demonstrated by a 1e300 argument to c_double.from_param. This occurs because sprintf is used unsafely.
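As a rough sketch, the call named in that summary looks like this from the Python side. The version guard is an assumption added here for caution: it treats 3.9.2 and later as patched and simply refuses to run the call on anything older, even though some older maintenance releases may also carry the fix. The helper name `describe_param` is hypothetical.

```python
import sys
from ctypes import c_double

# Assumption: interpreters at or above 3.9.2 contain the fix for CVE-2021-3177.
PATCHED = sys.version_info >= (3, 9, 2)


def describe_param(value):
    """Return the repr of the ctypes parameter object built from value."""
    if not PATCHED:
        raise RuntimeError("refusing to exercise this code path on an old interpreter")
    # On affected versions, building this repr is where the C-level sprintf overflowed.
    return repr(c_double.from_param(value))


text = describe_param(1e300) if PATCHED else None
```

Nothing in the Python code looks dangerous, which is the point: the overflow lived entirely in the C implementation behind `repr`.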

Now, how to interpret this is a matter of semantics. One could say Python is vulnerable to overflows, because you could write a Python program that causes a buffer overflow. Or you could say it's not vulnerable to overflows, because the overflow itself actually occurred in a C program, which just happens to be interpreting Python code.


The short answer is this:

High-level languages generally guard against such vulnerabilities, but the underlying runtimes of these programs are still vulnerable.

  • 6
    TL;DR usually appears at the *start* of the answer; most people are scanning long answers looking for a summary at the end. – chepner Apr 14 '22 at 12:00
  • 5
    Your TL;DR defines "buffer overflow" too narrowly. Not all buffer overruns have to be stack-smashing attacks, overrunning into global variables or heap allocations is also possible and while less useful for remote code execution, they may create other vulnerabilities. – Ben Voigt Apr 14 '22 at 18:28
  • It's worth understanding that Python is a Turing-complete language. As such, you can use Python to simulate an entire computer, simulate C running on it, and so simulate a buffer overflow. The Python language protects you from buffer overflows when you use its abstractions that are designed to prevent them. There will always be a way to reach past those abstractions and shoot yourself in the foot, mostly by creating your own abstractions. – candied_orange Apr 14 '22 at 20:28
  • 3
    I think that there are some issues with the terminology in the answer. Not sure if you can call an implementation of a language the same thing as a language. Buffer overflows can happen on a machine, e.g. in the kernel, before a higher level concept such as a "process" is defined. Similarly, I would not say that a library implemented in C is a "program". Of course I do agree with the general gist, so enough for an upvote :) – Maarten Bodewes Apr 14 '22 at 22:05
  • @MaartenBodewes Ultimately, the question boils down to semantics. But I think the answer as is gives a "good enough" insight to be helpful. –  Apr 14 '22 at 23:44
  • The way I navigate the semantics is to say that the relevant difference here between Python and C is that if there is a buffer overflow in your Python program, then it is down to a bug in Python, or in some library like numpy that includes C code or whatever. Whereas in C, a bug *in your own code* can cause buffer overflow. So there is a meaningful sense in which C offers the programmer the opportunity to overflow buffers whereas "Python as designed" does not. A particular bug in CPython, however, might mean that Python as implemented in fact does. – Steve Jessop Apr 16 '22 at 12:49
  • And of course you can always write a Python program that mishandles an `IndexError` and does something bone-headed, which you might reasonably describe as a "buffer overflow bug" even though in the low-level, C or operating system sense, no buffer overflow actually occurred. And as Peter Green points out, `ctypes` is part of Python and also offers the programmer this C-style facility of overrunning buffers any time you like. So Python is not so closed by design as I described it, it's just that if you're using `ctypes`, the clue is in the name that this isn't typical Python code any more. – Steve Jessop Apr 16 '22 at 12:51
  • There are (at least) two other examples where higher-level languages do not necessarily protect you against overflows. If you implement a circular queue in Python and handle overflow incorrectly, you may overwrite data and thus have a buffer overflow error. You won't necessarily overwrite random memory (Python prevents this), but it may still be used by an attacker depending on circumstances. The second is if you reuse an I/O buffer between users. For example, if you try to save on reallocating lists and reuse the same buffer over and over, it may contain data from other users that may be exposed... (TBC) – Maciej Piechotka Apr 16 '22 at 23:24
  • in a heartbleed fashion. Again - you cannot access random memory but you can access the array in an unexpected way. – Maciej Piechotka Apr 16 '22 at 23:25
14

The "level" of a programming language is not a particularly well-defined concept.

C++, for example, would generally be regarded as a higher-level language than C, but it still leaves the user open to the same memory safety problems, including buffer overflows, that C suffers from.

Python, on the other hand, does try to protect its users from such mistakes. The regular Python programmer never sees a raw pointer, only reference-counted object references, and the standard Python collection objects are protected against overflows.

Still, there are ways to create buffer overflows in Python. In particular, the standard library includes a module called "ctypes". Its intended use is to allow interoperability with C code, but to do so it must provide mechanisms for working with C-style raw pointers, which cannot have bounds checking applied.

For example, I was able to produce a segmentation fault with the following Python code:

from ctypes import *
pointer(c_char(b'a'))[10000000]

That is a contrived example, but because Python is a relatively slow language, most Python code calls into code written in other languages, most commonly C and C++, to do the "heavy lifting". Buffer overflows can happen either in the C and C++ libraries themselves or in the glue code (which may be written in either C or Python) that interfaces between Python and C.

In an extreme case, hastily written glue code could even return something like a ctypes pointer object to the end user's Python code.
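For contrast, here is a small sketch of the part of ctypes that does keep its bounds: a ctypes array knows its own length, so indexing and assignment are still checked. The helper name `checked_read` is invented for this illustration. The sketch deliberately does not dereference a raw pointer out of range, since that is the unchecked operation that crashed above.

```python
import ctypes

# create_string_buffer(4) returns an array of 4 c_char, zero-filled.
buf = ctypes.create_string_buffer(4)


def checked_read(array, index):
    """Read array[index], returning None if ctypes rejects the index."""
    try:
        return array[index]
    except IndexError:
        return None


in_range = checked_read(buf, 3)       # last valid element
out_of_range = checked_read(buf, 10)  # rejected, because the array knows its length

# Assigning more bytes than the array holds is rejected as well.
try:
    buf.value = b"way too long"
    too_long_rejected = False
except ValueError:
    too_long_rejected = True

# By contrast, ctypes.pointer(...)[n] carries no length information,
# so ctypes cannot check n; that is the case shown in the answer.
```

Glue code that hands out arrays rather than bare pointers keeps at least this much checking in place.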

Peter Green
  • 4
    Even then, C used to be considered a High-Level language as opposed to writing code in assembler directly (which was not uncommon back then when C was developed). –  Apr 14 '22 at 12:12
2

tl;dr: (most) high-level languages specifically protect you from that, but a very rare bug could make those protections fail.

Long version:

High-level languages are (generally) designed so as not to allow intentional or unintentional buffer/heap overflows (among other things that could represent vulnerabilities or could lead a programmer to introduce a bug). But in the end there is always some low-level language involved, so the only thing protecting you is the high-level language's design.

This is because high-level languages ultimately compile, transpile, or translate your code into a lower-level language that does allow buffer/heap overflows. Alternatively, or in addition, your code (as is, or after being compiled, transpiled, or translated to another high-level language that also disallows buffer/heap overflows) ends up running inside another program (such as the JVM for Java) that is written in a language that does allow them. At some point, something is running at a low enough level to allow this.

We typically use very well-tested tools for 99.9% of what we do with 99.9% of high-level languages, but nobody can guarantee that there is not a 0-day vulnerability in one of these tools that could allow you or a malicious actor to create a buffer/heap overflow, against the language's specific design and intent. This risk increases if you use less-tested tools (for whatever reason, though in my experience this is VERY uncommon), but it shouldn't be significant.

Note: I wrote this as if tools like the JVM or the Java compiler were a single tool that's part of the Java language (and analogously for other high-level languages). This is technically not true: there are many distributions of these tools, and none of them is a part of the language itself, but they are a part of the language ecosystem and they are (almost always) needed for the language to be used and useful. A vulnerability in the Java compiler and/or the JVM could lead to a buffer/heap overflow, and technically it wouldn't be in the Java language itself, but I think you were asking in general, not caring about the specific distinction between the language and the tools that make it useful.

Blueriver
2

The property you are looking for is called "memory safety", meaning that all memory access is well-typed and within bounds.

Most high-level programming languages are specified to provide memory safety. Failure to live up to this promise would be a bug in the language implementation.

Obviously this guarantee does not extend to code written in unsafe languages, even if you load and invoke that code from a memory-safe language.

Also, note that "well-typed" is defined with respect to the type system of the programming language. If you have constraints beyond that, you must check those yourself. For instance, if you were to write a class "heap" backed by one huge array where programmers can store "logical objects", the runtime would check accesses with respect to the bounds of that array, not the bounds of the "logical object" therein.
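A minimal sketch of that last point, using a made-up `Arena` class (not from any library) that packs fixed-size "logical objects" into one `bytearray`. The runtime only knows about the backing array, so an oversized write into one slot silently spills into the next unless the class checks the logical bound itself.

```python
class Arena:
    """A toy "heap": fixed-size logical objects packed into one bytearray."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.data = bytearray(slot_size * slot_count)

    def write_unchecked(self, slot, payload):
        # Only the runtime's check applies here: it guards the ends of
        # self.data, not the boundary between one slot and the next.
        start = slot * self.slot_size
        for offset, byte in enumerate(payload):
            self.data[start + offset] = byte

    def write_checked(self, slot, payload):
        # The logical bound has to be enforced by our own code.
        if len(payload) > self.slot_size:
            raise ValueError("payload does not fit in one slot")
        self.write_unchecked(slot, payload)

    def read(self, slot):
        start = slot * self.slot_size
        return bytes(self.data[start:start + self.slot_size])


arena = Arena(slot_size=4, slot_count=2)
arena.write_checked(1, b"KEY!")

# Six bytes into a four-byte slot: no exception, but slot 1 is corrupted.
arena.write_unchecked(0, b"AAAAXX")
neighbour = arena.read(1)

try:
    arena.write_checked(0, b"AAAAXX")
    rejected = False
except ValueError:
    rejected = True
```

After the unchecked write, slot 1 no longer holds what was stored in it, even though every individual access was within the bounds the runtime could see.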

meriton
0

For C and C++, most release builds omit bounds checking on indexes and pointers to produce faster code (some debug builds and checked library modes do check for this), which makes it possible to access just about anything within a process's address space.

Back in the 1970s, there was a version of APL called APL/SV (shared variables) that ran in supervisor mode on an IBM mainframe. There were several ways to glitch a user's work area so that the data for one array included the control information for a second array, allowing the second array size to be set to the entire address space of the mainframe, allowing for read and write to any location on the mainframe. When not being abused, APL/SV allowed variables to be shared between users (similar to shared memory between processes on modern systems), which allowed for multiplayer games like Battleship or some type of space war game.

rcgldr
  • Not sure if APL can be categorized as a "modern language" (which is what the question is asking about) – belkarx Apr 15 '22 at 07:11
  • @belkarx - Dyalog still releases versions of APL, but my post was specific to APL/SV which isn't used anymore. Are there any languages other than C | C++ that turn off bounds checking on release builds for performance reasons (possibly Fortran?). – rcgldr Apr 15 '22 at 08:02
-1

I mean, yes, they can overflow, I guess. It rarely happens, though. You need to work quite hard to trigger such runtime errors, compared to what it takes to cause the same disruption in C. The language can protect you from those, but you can always find a sufficiently non-obvious way to cause infinite recursion or something similar that overflows the runtime itself. The moral is that no language is completely safe from overflows: Python (including frameworks such as Django), C#, Java, JavaScript... all of those are pretty safe, but not 100% of the time.

user276955
  • 2
    I feel like this answer would benefit from an example. Saying "It's possible" isn't nearly as good as showing *how* it's possible. –  Apr 15 '22 at 00:10
  • @MechMK1 Good suggestion, might include it, thanks for the feedback. – user276955 Apr 15 '22 at 00:23