
I'm running a Django site on a Debian 6 system, with a gunicorn server and nginx 0.7.67 handling static files. The filesystem locale is set to sv_SE.UTF-8.

I ran into a problem after another user uploaded a file whose filename contains Unicode characters: the server returned a 404 when trying to serve the uploaded file. When I uploaded the same file from my own system, the site served it correctly. However, the old file was not replaced, even though the two appear to be identical in every respect. Below is the current directory listing.

-rwxr-xr-x 1 www-data www-data 1188260 25 jan 22.53 Läxa 15_geometri.pdf
-rwxr-xr-x 1 www-data www-data 1188260 27 jan 10.45 Läxa 15_geometri.pdf

How can there be two identical files with the same (apparent) name? What could have caused the 404 in the first place, i.e. what's wrong with the first upload? The URL is the same as before, only now it doesn't return a 404.

Kenny Rasschaert
Samuel Linde

2 Answers


The two filenames probably use different encodings. What is the output when you run this in that directory?

$ file -i *
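To compare the names themselves rather than the file contents, one option is to list the directory entries as raw bytes. The following is only a minimal sketch, assuming Python 3.8 or later is available on the server; `raw_names` and `describe` are hypothetical helper names, not part of any existing tool.

```python
import os


def raw_names(directory="."):
    """Return the directory entries as raw bytes, sorted."""
    # Passing a bytes path makes os.listdir return bytes names,
    # so nothing is decoded or altered on the way.
    return sorted(os.listdir(os.fsencode(directory)))


def describe(name):
    """Format a raw filename as its repr followed by a hex dump."""
    return f"{name!r}  [{name.hex(' ')}]"


if __name__ == "__main__":
    for name in raw_names("."):
        print(describe(name))
```

If the two entries print different hex sequences, the names are different byte strings even though they render identically in a terminal.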
iElectric

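Unicode strings that look identical may still differ at the byte level because of Unicode normalization: for example, ä can be stored as one precomposed code point or as a plain a followed by a combining diaeresis. You can check whether the names differ by running ls > a.txt and inspecting the resulting file as binary data, byte by byte.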

PS. I hope ls doesn't perform any Unicode normalization itself and delivers filenames "as is"...
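As an illustration of how normalization could produce two distinct names that look the same, here is a small sketch using Python's standard `unicodedata` module. The two strings are constructed for the example and are not taken from the actual server, so whether this is what happened there remains an assumption; `same_after_normalization` is a hypothetical helper name.

```python
import unicodedata

# Two spellings of the same visible name (illustrative values only).
composed = "L\u00e4xa 15_geometri.pdf"     # "ä" as one precomposed code point
decomposed = "La\u0308xa 15_geometri.pdf"  # "a" followed by a combining diaeresis


def same_after_normalization(a, b):
    """Return True if both names are equal once normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # The raw strings differ, and so do their UTF-8 byte sequences.
    print(composed == decomposed)
    print(composed.encode("utf-8").hex(" "))
    print(decomposed.encode("utf-8").hex(" "))
    # After normalization they compare equal.
    print(same_after_normalization(composed, decomposed))
```

Because a typical Linux filesystem treats names as opaque byte strings, two such names can coexist in one directory, and a URL that decodes to one form will not match a file stored under the other.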