
Are there any significant threats posed by files that are uploaded and read but never saved to disk? I've read countless articles about file uploads that cover storing and retrieving the file itself, but I haven't found any information on simply keeping the file in memory. My specific situation is as follows:

I have an HTTP endpoint which will accept CSV files from users. For a file to be accepted, it must be in a predetermined format and encoding. The file bytes will be read and converted to a CSV OOP model, which will then verify valid characters and persist the values (not the CSV file) to a database via parameterized queries (I realize that this is a form of persistence to disk but in this scenario, the data would be sanitized/verified by this point). Any invalid character (markup, code, invalid whitespace, etc) in the byte stream will cancel the entire process and return an error. Any invalid/unknown CSV field will cause the same cancellation/error.

It seems that most of the risks with file uploads are associated with the file's permissions on disk, surreptitious contents intended for execution, and the like, all of which involve the file being persisted as-is (or approximately so) and then handled by the OS/runtime in some exploitable way.

If a file is only read into memory from the network and then garbage collected, what vectors might an attacker exploit?

UPDATE 2019-03-07: This question states (erroneously on my part) that the parsing would occur before checking characters against the whitelist. In the actual scenario, characters are checked before parsing. I'm leaving the original scenario as-is to maintain the context addressed by Euphrasius.

user2864874

2 Answers


Malicious code in a file can be run no matter where the file resides. But if the file is not an executable or a script (plain text, for example), something else must already have infiltrated the system to execute its content.

Advanced attacks can store the file pretty much anywhere, run code from there, and then remove the initial file. Doing that in memory leaves far fewer traces than saving it to a drive.

If the file is in any type of executable format, then things are easier since nothing else is required to activate the content.

If you protect yourself against CSV injection (note: make sure no cell begins with an equals sign (“=”), plus (“+”), minus (“-”), or at sign (“@”)), you should be fine on that side. That would not, however, prevent someone from putting content in a .CSV that looks harmless but, once correctly assembled, makes up a script that is run later. So you should also make sure nothing is already present on the system that could interpret or run further data. If there is no way for anyone to access anything except that CSV upload, then this should not be a vulnerability.
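The cell-prefix check described above could be sketched roughly as follows. This is only an illustration: the function name `find_formula_cells` is hypothetical, and it assumes the upload has already been decoded to text and uses the default comma-delimited dialect.

```python
import csv
import io

# Characters that a spreadsheet application may treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def find_formula_cells(csv_text):
    """Return (row, column, value) for every cell that starts with a formula prefix."""
    flagged = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_index, row in enumerate(reader):
        for col_index, cell in enumerate(row):
            # str.startswith accepts a tuple and matches any of its members.
            if cell.startswith(FORMULA_PREFIXES):
                flagged.append((row_index, col_index, cell))
    return flagged
```

An upload would be rejected whenever the returned list is non-empty. Note that this simple rule also flags legitimate negative numbers such as `-5`, so a format that needs them would have to handle numeric columns separately.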

Overmind

If parsing and storing fragments in the database is the only thing you do, I can think of the following attack scenarios:

  • Attacking the parser: An attacker could use unexpected ASCII/Unicode characters or bytes to try to derail your flow of execution during parsing.

  • Attacking the DBMS with file content: An attacker could place values in the CSV that, once parsed, cause SQL injection upon insertion into the database.

However, the defense is quite similar in both cases:

Input Validation: The received file has to be treated as malicious input. Your parser has to validate strictly while failing gracefully if something unexpected shows up in the CSV. A whitelist of allowed characters, restricted to the absolute minimum, is a good idea.
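As a rough sketch of that approach (not the asker's actual code), the following checks every byte against a whitelist before parsing and then inserts the parsed values through placeholders. Everything specific here is an assumption made for illustration: the allowed byte set, the two-column `name,quantity` format, the size cap, the `items` table, and the use of SQLite.

```python
import csv
import io
import sqlite3

# Hypothetical whitelist: ASCII letters, digits, and a few separators.
ALLOWED_BYTES = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789,._ \n"
)
EXPECTED_HEADER = ["name", "quantity"]  # hypothetical predetermined format
MAX_UPLOAD_BYTES = 64 * 1024            # hypothetical size cap


class RejectedUpload(ValueError):
    """Raised when the upload fails any validation step."""


def validate_and_parse(raw):
    if len(raw) > MAX_UPLOAD_BYTES:
        raise RejectedUpload("upload too large")
    # Check every byte against the whitelist before any parsing happens.
    for offset, byte in enumerate(raw):
        if byte not in ALLOWED_BYTES:
            raise RejectedUpload(f"disallowed byte at offset {offset}")
    rows = list(csv.reader(io.StringIO(raw.decode("ascii"))))
    if not rows or rows[0] != EXPECTED_HEADER:
        raise RejectedUpload("unexpected header")
    data = rows[1:]
    for row in data:
        # Any unknown or malformed field cancels the whole upload.
        if len(row) != len(EXPECTED_HEADER) or not row[1].isdigit():
            raise RejectedUpload("malformed row")
    return [(name, int(quantity)) for name, quantity in data]


def store(conn, records):
    # Placeholders keep the values out of the SQL text entirely.
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO items (name, quantity) VALUES (?, ?)", records
        )
```

The endpoint would catch `RejectedUpload` and return an error response, so a single bad byte or field cancels the whole upload without anything reaching the database.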

  • I see what you mean by attacking the parser but I'm not sure I understand the database file content scenario. In my case, the files will be small enough that I could inspect each byte before even sending to the parser. In addition, I don't plan to store any files themselves in the database (I realize I didn't make this clear in my question - will update now). I'll only be storing the values from the CSV after whitelisting/parsing. Would this harden against the database file issue you mentioned or does that involve something else? – user2864874 Mar 08 '19 at 03:21
  • @user2864874 I've updated my response to further clarify my points. – Euphrasius von der Hummelwiese Mar 08 '19 at 06:12