This is related to this Stack Overflow post:
glob() can't find file names with multibyte characters on Windows?
I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case:
print_r(scandir('./uploads/'));
print_r(glob('./uploads/*'));
Correct Output on remote UNIX server:
Array
(
[0] => .
[1] => ..
[2] => filename-äöü.jpg
[3] => filename.jpg
[4] => test이test.jpg
[5] => имя файла.jpg
[6] => פילענאַמע.jpg
[7] => 文件名.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
[2] => ./uploads/test이test.jpg
[3] => ./uploads/имя файла.jpg
[4] => ./uploads/פילענאַמע.jpg
[5] => ./uploads/文件名.jpg
)
Incorrect Output locally on Windows:
Array
(
[0] => .
[1] => ..
[2] => ??? ?????.jpg
[3] => ???.jpg
[4] => ?????????.jpg
[5] => filename-äöü.jpg
[6] => filename.jpg
[7] => test?test.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
)
Here's a relevant excerpt from the answer I chose to accept (which actually is a quote from an article that was posted online over 2 years ago):
From the comments on this article: http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php
The output from your PHP installation on Windows is easy to explain : you installed the wrong version of PHP, and used a version not compiled to use the Unicode version of the Win32 API. For this reason, the filesystem calls used by PHP will use the legacy "ANSI" API and so the C/C++ libraries linked with this version of PHP will first try to convert yout UTF-8-encoded PHP string into the local "ANSI" codepage selected in the running environment (see the CHCP command before starting PHP from a command line window)
Your version of Windows is MOST PROBABLY NOT responsible of this weird thing. Actually, this is YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API (for compatibility with the legacy 16-bit versions of Windows 95/98 whose filesystem support in the kernel actually had no direct support for Unicode, but used an internal conversion layer to convert Unicode to the local ANSI codepage before using the actual ANSI version of the API).
Recompile PHP using the compiler option to use the UNICODE version of the Win32 API (which should be the default today, and anyway always the default for PHP installed on a server that will NEVER be Windows 95 or Windows 98...)
I can't confirm whether this is my problem or not. I used phpinfo()
and did not find anything interesting, but I wasn't sure what to look for. I've been using XAMPP for easy installations, so I'm really not sure exactly how it was installed.
I'm using Windows 7, 64 bit - so forgive my ignorance, but I'm not even sure if "Win32" is relevant here. How can I check if my current version of PHP was compiled with the configuration mentioned above?
- PHP Version: 5.3.8
- System: Windows NT WES-PC 6.1 build 7601 (Windows 7 Home Premium Edition Service Pack 1) i586
- Build Date: Aug 23 2011 11:47:20
- Compiler: MSVC9 (Visual C++ 2008)
- Architecture: x86
- Configure Command:
cscript /nologo configure.js "--enable-snapshot-build" "--disable-isapi" "--enable-debug-pack" "--disable-isapi" "--without-mssql" "--without-pdo-mssql" "--without-pi3web" "--with-pdo-oci=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8-11g=D:\php-sdk\oracle\instantclient11\sdk,shared" "--enable-object-out-dir=../obj/" "--enable-com-dotnet" "--with-mcrypt=static" "--disable-static-analyze"
In case it's relevant or reveals any useful information, here's a screen shot of my phpinfo()
(mbstring section):
How can I find out if my PHP install was "compiled with the UNICODE version of the Win32 API"? (and does that actually make any sense?)