For context: I have a web application that lets users upload a PDF file, which the app parses to extract certain information. The app then sends this information to another server for further processing.
The web app is based on Python (Django & FastAPI) and runs on a Linux-based operating system inside a Docker container (which has root privileges).
The PDF file is never stored on the server; it is handled only in memory. It arrives at an endpoint as a regular HTTP request with the file in the form data (multipart/form-data), is converted to HTML, and is then parsed. The resulting data are sent to another server for storage in an SQL database.
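To make the flow concrete, here is a minimal, standard-library-only sketch of the in-memory handling described above. It is not the actual application code: `handle_upload`, `extract_fields`, and `MAX_UPLOAD_BYTES` are hypothetical names, the size limit is an assumed value, and the real PDF-to-HTML conversion is replaced by a stand-in that only reads the PDF header line.

```python
import io
import json

# Assumed upload limit for illustration; not part of the setup described above.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def extract_fields(buffer: io.BytesIO) -> dict:
    """Hypothetical stand-in for the PDF -> HTML -> parse step.

    It only reads the first line of the file (e.g. b"%PDF-1.7") and
    returns the version string found after the "%PDF-" marker.
    """
    header = buffer.readline().strip()
    version = header[len(b"%PDF-"):].decode("ascii", errors="replace")
    return {"pdf_version": version}


def handle_upload(raw: bytes) -> str:
    """Process the uploaded bytes entirely in memory and return a JSON payload."""
    if len(raw) > MAX_UPLOAD_BYTES:
        raise ValueError("upload too large")
    if not raw.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    buffer = io.BytesIO(raw)  # in-memory file object; nothing is written to disk
    fields = extract_fields(buffer)
    # In the real app this payload would be sent to the other server.
    return json.dumps(fields)
```

In the real app, the bytes passed to `handle_upload` would come from the multipart/form-data request body, and `extract_fields` would be the actual conversion and parsing code.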
My questions are as follows:
- Is parsing the file in an interpreted language such as Python considered to be 'executing' it?
- Does handling this file in this manner pose any risk to the server if the file contains malware?