
Before uploading a photo or image to a forum, I typically strip its metadata with exiftool to remove identifying material. The thing is, the Linux file system itself seems to attach some metadata to a file:

cardamom@pluto ~ $ ls -la
total 1156736
drwx------ 145 cardamom cardamom      20480 Mar 16 08:58  .
drwxr-xr-x   9 root     root           4096 Apr 21  2021  ..
-rw-r--r--   1 cardamom cardamom     123624 May 24  2018  IMG_20200627_215609.jpg

So I feel tempted to change the user and group of a file as well. Is that a good idea? There is always a user called nobody and a group called nogroup who look like they were almost made for the purpose.

Is that everything, or does Linux leave more metadata on its files?

Andy Lester
cardamom
  • "There is always a user called `nobody` and a group called `nogroup` who look like they were almost made for the purpose." – They are made for the *exact opposite purpose*. *No* file or directory should be owned by them. Their purpose is that you can run a server or program under those IDs and be 100% sure that it cannot access anything in the filesystem, because nothing there is owned by it. (More precisely, it can only access what is already world-readable.) – Jörg W Mittag Mar 17 '22 at 06:46
  • systemd-homed stores files on disk as nobody:nogroup and maps them to your user ID when you log in, using namespaces. So it's not that you shouldn't store files owned by nobody. You can't depend on user IDs alone for that sort of protection. – Ananth Mar 18 '22 at 02:45
  • @Ananth: Wow, if that's true it's a huge vuln - your files would be accessible to other users and to compromised daemons running as nobody. Do you have a citation for the claim? – R.. GitHub STOP HELPING ICE Mar 18 '22 at 14:54
  • Ok @JörgWMittag, then I won't bother with `sudo chown nobody:nogroup IMG_20200627_215609.jpg` before uploading it, since by the sounds of it 'cardamom' will not be uploaded with the photo anyway. – cardamom Mar 18 '22 at 20:25
  • @R..GitHubSTOPHELPINGICE the files are stored in an encrypted LUKS container first, and projected as the logged-in user using user namespaces. My larger point is that with user namespaces it's quite safe to have nobody:nogroup own files, as long as your architecture is sound. – Ananth Mar 19 '22 at 15:52
  • You can also look at it this way: there can be metadata stored _inside_ the file data itself and metadata stored externally to the file data. For example, the _name_ of your file itself contains metadata, namely the date and time when the photo was taken. `exiftool` (generally) only manipulates the contents of the file, not the name, owner, timestamp, permissions, size, etc of the file (the standard unix metadata). Whether any external metadata is transferred when the file is uploaded is a property of the file transfer program and its various constraints. – jrw32982 Mar 21 '22 at 21:06
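The "metadata inside the file data" half of that distinction can be sketched in a few lines. In a JPEG, EXIF (and XMP) data lives in APP1 segments (marker `FF E1`), so the Python function below copies a JPEG while dropping every APP1 segment. This is an illustration of in-file stripping, not a replacement for exiftool: it assumes a well-formed baseline JPEG, ignores other places metadata can hide (for example APP13/IPTC or COM comment segments), and, as the comment says, does nothing to the name, owner or timestamps that the file system keeps. The function name is hypothetical.

```python
def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG with every APP1 (EXIF/XMP) segment removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])  # keep SOI
    i = 2
    while i + 1 < len(jpeg):
        if jpeg[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = jpeg[i + 1]
        if marker == 0xFF:
            # 0xFF fill byte before a marker: skip it.
            i += 1
            continue
        if marker in (0xDA, 0xD9):
            # SOS (start of scan) or EOI: compressed image data follows,
            # so copy everything from here on unchanged.
            out += jpeg[i:]
            return bytes(out)
        # Other markers before SOS carry a 2-byte big-endian length that
        # counts the length field itself but not the 2-byte marker.
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        end = i + 2 + length
        if marker != 0xE1:  # keep every segment that is not APP1
            out += jpeg[i:end]
        i = end
    raise ValueError("truncated JPEG: no SOS marker found")
```

With `from pathlib import Path`, usage would look like `Path("clean.jpg").write_bytes(strip_app1(Path("IMG_20200627_215609.jpg").read_bytes()))`; the output file gets a fresh owner and timestamps from the file system, independent of what was stripped.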

2 Answers


the linux file system itself seems to leave some metadata on a file

User, group, etc. are metadata stored in the file system. They are not part of the file's content and thus will not be included when uploading the file in the browser.
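A minimal Python sketch of that distinction, assuming a Linux/Unix system: owner, group, permissions and timestamps come from `stat()` on the file system entry, while the content is just the bytes read from the file. Changing the owner needs root, so the demo uses `chmod` as a stand-in for `chown`; either way the bytes do not change. The helper names are illustrative, not from any library.

```python
import os
import stat
import tempfile

def fs_metadata(path: str) -> dict:
    """Filesystem-level metadata (what `ls -l` shows), read via stat()."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "uid": st.st_uid,                   # owner; ls maps it to a user name
        "gid": st.st_gid,                   # group; ls maps it to a group name
        "size": st.st_size,
        "mtime": st.st_mtime,
    }

def file_content(path: str) -> bytes:
    """The file's content: the bytes an upload form sends as the file body."""
    with open(path, "rb") as f:
        return f.read()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "photo.jpg")
        with open(path, "wb") as f:
            f.write(b"image bytes")
        before = file_content(path)
        os.chmod(path, 0o600)  # changes filesystem metadata only
        print(fs_metadata(path)["mode"], file_content(path) == before)  # -rw------- True
```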

Other data transfer methods can behave differently, though. When copying or moving files between local file systems or to remote file systems (NFS, SMB, ...), information like user, group and permissions might be transferred. It might also be included when storing the file in archives: some formats, like tar or cpio, include permissions and user and group IDs, or even user and group names.
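The archive case can be sketched with Python's standard `tarfile` module (not GNU tar itself, which may behave differently in detail). The code below archives a file, reads back the ownership fields stored for it, and shows one way to blank them using the `filter` callback of `TarFile.add`. The function names are hypothetical.

```python
import os
import tarfile
import tempfile

def scrub_owner(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Creation filter for TarFile.add: blank the owner before the member is written."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info

def make_tar(src: str, tar_path: str, scrub: bool = False) -> None:
    """Archive a single file under its base name, optionally without ownership info."""
    with tarfile.open(tar_path, "w") as tf:
        tf.add(src, arcname=os.path.basename(src), filter=scrub_owner if scrub else None)

def owner_fields(tar_path: str) -> list:
    """(name, uid, gid, uname, gname) for every member, as stored in the archive."""
    with tarfile.open(tar_path) as tf:
        return [(m.name, m.uid, m.gid, m.uname, m.gname) for m in tf.getmembers()]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        photo = os.path.join(d, "photo.jpg")
        with open(photo, "wb") as f:
            f.write(b"image bytes")
        make_tar(photo, os.path.join(d, "plain.tar"))
        make_tar(photo, os.path.join(d, "scrubbed.tar"), scrub=True)
        print(owner_fields(os.path.join(d, "plain.tar")))     # your uid/gid and, if resolvable, names
        print(owner_fields(os.path.join(d, "scrubbed.tar")))  # [('photo.jpg', 0, 0, '', '')]
```

Blanking the IDs and names this way is roughly comparable to GNU tar's `--owner=0 --group=0 --numeric-owner`. Note that the permission bits and modification time are still recorded in both archives.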

Steffen Ullrich
  • Unless you use an upload protocol that can transfer this metadata (like rsync, sftp, SMB, WebDAV, AFP, or NFS to some extent). I think JS in a web browser does not have that ability. – eckes Mar 17 '22 at 07:25
  • @eckes: I think what you describe is not commonly referred to as "uploading" but as a transfer of data between file systems (i.e. copy, move, ...). In this case meta information at the file system level like ownership and permissions might be transferred and this might even be the expected behavior. – Steffen Ullrich Mar 17 '22 at 07:31
  • I know many people who upload files to their web server with sftp, but sure, if you don't use such protocols you don't need to worry about them. – eckes Mar 17 '22 at 07:32
  • @eckes: I've extended the answer to make it clearer where this information gets included and where it doesn't. – Steffen Ullrich Mar 17 '22 at 09:08
  • Tar and cpio both store uid/gid information. user and group names are not stored in the archive. – doneal24 Mar 17 '22 at 15:02
  • @doneal24: thanks, you are right. And this leads to confusion if a tar file is created on one system and extracted on another with different mappings between uid and name. I've corrected the answer. – Steffen Ullrich Mar 17 '22 at 15:36
  • Tar actually **does** store user name in the archive. You need to use the flag `--numeric-owner` if you want it to store just the UID/GID. Run `mkdir foo && tar c foo | hexdump -C` and you'll see your user and group names stored in the archive as ASCII. Tested on GNU tar 1.30. – forest Mar 17 '22 at 22:37
  • The basic/portable/standard zip header does not include uid/gid, but an optional 'extra' field does, and the (widely used) Info-ZIP implementation on Unix writes it unless given `-X/--no-extra`. @forest: as correctly explained in the Wikipedia link, original/basic tar stores numeric uid/gid; the now-almost-universal POSIX version 'UStar' adds names, which gtar --numeric-owner suppresses. – dave_thompson_085 Mar 18 '22 at 01:42
  • @dave_thompson_085: thanks for the details, I've corrected the answer. – Steffen Ullrich Mar 18 '22 at 05:22
  • @forest I stand corrected. The original `tar` format did not contain usernames. The newer `UStar`, used by most tars now, does have the username. – doneal24 Mar 18 '22 at 14:58
  • Ok, so when uploading _just_ the photo from a **browser**, don't worry about cleansing group and user; just rename it to something generic and purge the EXIF data with exiftool. Only worry if including it in an archive (which is more for emailing to someone trusted than for uploading, so less relevant). Possibly group and user 'metadata' go along when using some kind of terminal uploader like `scp` or `curl --upload-file`, but that is really not the use case I meant; I just meant a browser and a mouse, uploading to some kind of website. – cardamom Mar 18 '22 at 20:34
  • @cardamom: file upload with curl will not send user information and permissions either, it's the same as in the browser. – Steffen Ullrich Mar 19 '22 at 05:58
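The zip remark in the comments above can be checked from Python too. The sketch below walks each member's extra field with the standard `zipfile` module and decodes the Info-ZIP "new Unix" extra field (header ID 0x7875), which carries a UID and GID. The payload layout assumed here (a version byte of 1, then a size-prefixed little-endian UID and a size-prefixed little-endian GID) is my reading of Info-ZIP's extra-field notes; verify it against the spec before relying on it. As far as I know Python's `zipfile` never writes this field itself, so the demo archive attaches it by hand with a made-up UID/GID of 1000. All function names are hypothetical.

```python
import io
import struct
import zipfile

UNIX_UX_ID = 0x7875  # Info-ZIP "new Unix" extra field (UID/GID)

def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from a zip entry's extra-field bytes."""
    i = 0
    while i + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, i)
        yield header_id, extra[i + 4:i + 4 + size]
        i += 4 + size

def parse_ux(data: bytes):
    """Decode a 0x7875 payload into (uid, gid)."""
    if not data or data[0] != 1:
        raise ValueError("unsupported 0x7875 version")
    uid_size = data[1]
    uid = int.from_bytes(data[2:2 + uid_size], "little")
    pos = 2 + uid_size
    gid_size = data[pos]
    gid = int.from_bytes(data[pos + 1:pos + 1 + gid_size], "little")
    return uid, gid

def owners_in_zip(zip_bytes: bytes) -> dict:
    """Map each member name to the (uid, gid) it carries, if any."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            for header_id, data in iter_extra_fields(info.extra):
                if header_id == UNIX_UX_ID:
                    found[info.filename] = parse_ux(data)
    return found

def make_demo_zip(uid: int = 1000, gid: int = 1000) -> bytes:
    """Build a zip where one member carries a hand-made 0x7875 field."""
    payload = bytes([1, 4]) + uid.to_bytes(4, "little") + bytes([4]) + gid.to_bytes(4, "little")
    info = zipfile.ZipInfo("photo.jpg")
    info.extra = struct.pack("<HH", UNIX_UX_ID, len(payload)) + payload
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"image bytes")
        zf.writestr("notes.txt", b"no extra field here")
    return buf.getvalue()

if __name__ == "__main__":
    print(owners_in_zip(make_demo_zip()))  # {'photo.jpg': (1000, 1000)}
```

Run against a zip made by Info-ZIP's `zip` on Unix, `owners_in_zip` should show whether the uid/gid made it into the archive; per the comment above, `-X` is the switch that leaves it out.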

An important piece of metadata you seem to have forgotten is the file name: it is accessible to JavaScript in the browser, and from a name like IMG_20200627_215609.jpg one can deduce when the photo was taken, even if you remove the EXIF data.
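To show how little effort that deduction takes, here is a Python sketch that recovers the capture time from the example name. The `IMG_YYYYMMDD_HHMMSS` pattern is an assumption based on this one file name; other cameras and phone apps use other naming schemes.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed pattern, taken from the example name IMG_20200627_215609.jpg.
NAME_PATTERN = re.compile(r"IMG_(\d{8})_(\d{6})")

def taken_at(filename: str) -> Optional[datetime]:
    """Return the timestamp encoded in a camera-style file name, or None."""
    m = NAME_PATTERN.search(filename)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")

if __name__ == "__main__":
    print(taken_at("IMG_20200627_215609.jpg"))  # 2020-06-27 21:56:09
    print(taken_at("image1.jpg"))               # None
```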

If you don't trust the website you're uploading the photos to, you should consider renaming your files to something like image1.jpg before uploading.
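A sketch of that renaming step in Python, assuming you upload a copy rather than renaming the original. `shutil.copyfile` copies only the contents (unlike `shutil.copy2`, it does not carry over permission bits or timestamps), so the copy gets a neutral name and fresh filesystem metadata. The `image<N>` scheme and the function name are hypothetical, and this does nothing about EXIF inside the file, which still needs exiftool or similar.

```python
import shutil
import tempfile
from pathlib import Path

def neutral_copy(src: Path, dest_dir: Path, index: int) -> Path:
    """Copy src's contents to dest_dir/image<index><suffix>, leaving src untouched."""
    dest = dest_dir / f"image{index}{src.suffix.lower()}"
    shutil.copyfile(src, dest)  # contents only: no mode bits, no timestamps
    return dest

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "IMG_20200627_215609.jpg"
        src.write_bytes(b"image bytes")
        print(neutral_copy(src, Path(d), 1).name)  # image1.jpg
```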

Dmitry Grigoryev
  • It's also accessible directly on the server once the file is uploaded, not just in JavaScript. In fact, the protocol supports giving the full path to the file, but modern browsers provide a fake path instead, which is always the same. – TRiG Mar 21 '22 at 16:37
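Both answers can be made concrete by looking at what a form upload actually sends. A browser form upload (and `curl -F 'photo=@IMG_20200627_215609.jpg'`) uses a `multipart/form-data` body; the Python sketch below hand-assembles a single-file body in that layout (RFC 7578). The field name `photo` and the boundary string are hypothetical, filename escaping is ignored, and a real browser would normally pick a more specific `Content-Type` such as `image/jpeg`. The only per-file metadata in the body is the file name and content type: there is no owner, group, permission or timestamp field.

```python
def multipart_upload_body(field: str, filename: str, content: bytes, boundary: str) -> bytes:
    """Assemble a single-file multipart/form-data body (sketch; no escaping)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail  # the boundary must not occur inside content

if __name__ == "__main__":
    body = multipart_upload_body("photo", "IMG_20200627_215609.jpg", b"<jpeg bytes>", "sketch-boundary")
    print(body.decode("utf-8"))
```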