10

This is related to this Stack Overflow post:

glob() can't find file names with multibyte characters on Windows?

I'm having issues with PHP and files that have multibyte characters on Windows. Here's my test case:

print_r(scandir('./uploads/')); 
print_r(glob('./uploads/*'));

Correct Output on remote UNIX server:

Array
(
    [0] => .
    [1] => ..
    [2] => filename-äöü.jpg
    [3] => filename.jpg
    [4] => test이test.jpg
    [5] => имя файла.jpg
    [6] => פילענאַמע.jpg
    [7] => 文件名.jpg
)
Array
(
    [0] => ./uploads/filename-äöü.jpg
    [1] => ./uploads/filename.jpg
    [2] => ./uploads/test이test.jpg
    [3] => ./uploads/имя файла.jpg
    [4] => ./uploads/פילענאַמע.jpg
    [5] => ./uploads/文件名.jpg
)

Incorrect Output locally on Windows:

Array
(
    [0] => .
    [1] => ..
    [2] => ??? ?????.jpg
    [3] => ???.jpg
    [4] => ?????????.jpg
    [5] => filename-äöü.jpg
    [6] => filename.jpg
    [7] => test?test.jpg
)
Array
(
    [0] => ./uploads/filename-äöü.jpg
    [1] => ./uploads/filename.jpg
)
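To narrow down exactly which entries glob() drops, the two listings from the test case can be diffed. This is only a rough sketch: the function name `entriesMissingFromGlob` is made up for illustration, and it assumes the directory is readable. Note that `glob('*')` never returns dotfiles, so those will show up in the result as well.

```php
<?php
// Hypothetical helper: list names that scandir() reports but glob() does not.
function entriesMissingFromGlob($dir)
{
    $dir = rtrim($dir, '/\\');
    if (!is_dir($dir)) {
        return array();
    }

    // scandir() always includes "." and "..", which glob('*') never returns.
    $scanned = array_diff(scandir($dir), array('.', '..'));

    $matches = glob($dir . '/*');
    if ($matches === false) {
        $matches = array();
    }

    // Strip the "dir/" prefix so both lists hold bare file names.
    $prefixLength = strlen($dir) + 1;
    $globbed = array();
    foreach ($matches as $path) {
        $globbed[] = substr($path, $prefixLength);
    }

    return array_values(array_diff($scanned, $globbed));
}

// Example: print_r(entriesMissingFromGlob('./uploads'));
```

On the Windows setup described above, I would expect this to return the entries that scandir() shows with question marks, but I have not verified that on that exact build.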

Here's a relevant excerpt from the answer I chose to accept (which actually is a quote from an article that was posted online over 2 years ago):

From the comments on this article: http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php

The output from your PHP installation on Windows is easy to explain : you installed the wrong version of PHP, and used a version not compiled to use the Unicode version of the Win32 API. For this reason, the filesystem calls used by PHP will use the legacy "ANSI" API and so the C/C++ libraries linked with this version of PHP will first try to convert your UTF-8-encoded PHP string into the local "ANSI" codepage selected in the running environment (see the CHCP command before starting PHP from a command line window).

Your version of Windows is MOST PROBABLY NOT responsible of this weird thing. Actually, this is YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API (for compatibility with the legacy 16-bit versions of Windows 95/98 whose filesystem support in the kernel actually had no direct support for Unicode, but used an internal conversion layer to convert Unicode to the local ANSI codepage before using the actual ANSI version of the API).

Recompile PHP using the compiler option to use the UNICODE version of the Win32 API (which should be the default today, and anyway always the default for PHP installed on a server that will NEVER be Windows 95 or Windows 98...)

I can't confirm whether this is my problem or not. I used phpinfo() and did not find anything interesting, but I wasn't sure what to look for. I've been using XAMPP for easy installations, so I'm really not sure exactly how it was installed.

I'm using Windows 7, 64 bit - so forgive my ignorance, but I'm not even sure if "Win32" is relevant here. How can I check if my current version of PHP was compiled with the configuration mentioned above?

  • PHP Version: 5.3.8
  • System: Windows NT WES-PC 6.1 build 7601 (Windows 7 Home Premium Edition Service Pack 1) i586
  • Build Date: Aug 23 2011 11:47:20
  • Compiler: MSVC9 (Visual C++ 2008)
  • Architecture: x86
  • Configure Command: cscript /nologo configure.js "--enable-snapshot-build" "--disable-isapi" "--enable-debug-pack" "--disable-isapi" "--without-mssql" "--without-pdo-mssql" "--without-pi3web" "--with-pdo-oci=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8=D:\php-sdk\oracle\instantclient10\sdk,shared" "--with-oci8-11g=D:\php-sdk\oracle\instantclient11\sdk,shared" "--enable-object-out-dir=../obj/" "--enable-com-dotnet" "--with-mcrypt=static" "--disable-static-analyze"

In case it's relevant or reveals any useful information, here's a screen shot of my phpinfo() (mbstring section):

phpinfo screen shot

How can I find out if my PHP install was "compiled with the UNICODE version of the Win32 API"? (and does that actually make any sense?)

Wesley Murch
  • Upvoted because Wesleys have to watch out for each other. – Wesley Mar 31 '12 at 02:53
  • Have you done anything in your script with regards to encoding? I had the opposite of this problem with my win7-64 install! PHP would read the umlauts & all that & the crap legacy program I was communicating with breaks when it gets those. – Krista K Jun 10 '12 at 23:46
  • Sorry to bail on this question, I just didn't get the quick and dirty working answer I was hoping for, and eventually stopped developing this project on Windows. I'll be installing PHP 5.4 soon locally (on Windows), so the question may no longer be valuable to me. If anyone wants to suggest an accepted answer, I'm all ears. In the meantime, upvotes and thanks all around. – Wesley Murch Aug 08 '12 at 05:06

4 Answers

3

I think you should download an official binary from the PHP Windows repository and install it (take note of the installation path).

After that you will need to configure Apache to use the new binary instead of the one it shipped with by default. It is simple:

  • Find your httpd.conf file in the WAMP folder (something like C:\wamp\bin\apache\ApacheXXX\conf\httpd.conf). It may also be possible to open it through the tray icon.

  • Now that you have found it, locate the line matching LoadModule php5_module

  • Replace this line so that it points to your new php5_module, which is probably in c:/php/php5apache2_2.dll (you saved the installation path!). The result should look something like LoadModule php5_module "c:/php/php5apache2_2.dll"

Voila. Restart the WAMP server and test your application with the latest version of PHP built specifically for Windows.
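After the restart it is worth confirming that Apache really picked up the new binary and not the bundled one. A minimal sketch, assuming you drop it into a page served by that Apache instance (the function name `describePhpRuntime` is just for illustration):

```php
<?php
// Hypothetical helper: report which PHP runtime is actually serving the request.
function describePhpRuntime()
{
    return array(
        'version'  => PHP_VERSION,            // should match the build you just installed
        'sapi'     => php_sapi_name(),        // e.g. the Apache handler vs. cli
        'ini_file' => php_ini_loaded_file(),  // path of the php.ini in use, or false if none
        'os'       => PHP_OS,
    );
}

print_r(describePhpRuntime());
```

If the version or the php.ini path still points at the old installation, the LoadModule change has not taken effect.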

I'm not sure this will solve your problem, but it is a reasonable thing to try. If you have problems with the PHP setup, read this article.

Good luck!

2

Here is some code I worked on to handle a mbstring problem I was running into. I ended up iterating through every combination of encodings and options until one of them presented the output I needed. I have the feeling this kind of procedure might help you find the answer you're seeking.

Do not rely upon documentation: in my case, the results were not what I thought the options and encodings would do. I recall that in my testing I would get the rectangles, ?s, and things like A~. My testing was exactly like yours: print_r the info. In my case, my script is importing customer and sales info into Quickbooks, which cannot handle UTF-8 (either QB itself can't or the QODBC driver can't). Tildes, graves, and umlauts are out of the question.

setlocale(LC_CTYPE, 'en_US.UTF-8');
$xmlstr=file_get_contents($file);           
// convert character encoding to get rid of accents, etc
// see http://www.php.net/manual/en/function.mb-detect-encoding.php#89915
// note that ASCII//TRANSLIT and ASCII//TRANSLIT//IGNORE, unlike ASCII//IGNORE,
// do not work in Windows 7.
$xmlstr=iconv('UTF-8', 'ASCII//IGNORE', $xmlstr);   

That link above is http://www.php.net/manual/en/function.mb-detect-encoding.php#89915 and if Google finds you here, definitely go read that.
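The "try every combination" approach described above could look roughly like this. It is only a sketch: the helper name `tryIconvTargets` and the list of candidate targets are my own picks for illustration, and which targets work (or fail) depends on the iconv implementation PHP was built against, so no particular output is promised.

```php
<?php
// Hypothetical helper: run one UTF-8 string through several iconv targets
// and collect the results side by side for comparison.
function tryIconvTargets($utf8, array $targets)
{
    $results = array();
    foreach ($targets as $target) {
        // @ silences the notice/warning iconv raises on input it cannot
        // convert; in that case the entry is simply false.
        $results[$target] = @iconv('UTF-8', $target, $utf8);
    }
    return $results;
}

// Candidate targets chosen for illustration only.
$candidates = array(
    'ASCII//IGNORE',
    'ASCII//TRANSLIT',
    'ASCII//TRANSLIT//IGNORE',
    'ISO-8859-1//TRANSLIT',
);

foreach (tryIconvTargets('filename-äöü.jpg', $candidates) as $target => $converted) {
    echo $target, ' => ', var_export($converted, true), "\n";
}
```

Comparing the lines printed on the Windows box with those from the UNIX server shows quickly which target, if any, behaves the same on both.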

Krista K
2

It seems this question has been out there for a while, and whether or not PHP was compiled with Unicode flags does not affect its Unicode support. However, if you need to determine whether a given PE image was likely compiled against the Unicode version of the Windows API, you can use dumpbin to examine the kernel32.dll imports it uses. This is not exactly something I would do pragmatically, but in a pinch it could work for diagnostics.

For example, a Unicode executable could list:

               4C CreateFileMappingW
               45 CreateDirectoryW
               33 CompareStringW
              12E GetCurrentDirectoryW
               AF ExpandEnvironmentStringsW
              2F0 SetFileAttributesW

noting the number of functions ending in W, aka Wide for unicode characters.

For an ANSI executable or DLL, you may see something closer to:

              30A SetCurrentDirectoryA
              15E GetFileAttributesA
              171 GetLastError
               4B CreateDirectoryA
              319 SetFileAttributesA

with most of the functions ending in A, we can see the executable was most likely compiled with ANSI flags.
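The listings above are the kind of output `dumpbin /imports` prints for a binary. If dumpbin is not at hand, a very rough stand-in is to scan the raw bytes of the DLL for the same name strings and count the A and W variants. The sketch below is an assumption-heavy heuristic, not a PE parser: the function name and the list of Win32 calls are picked for illustration, and a name string can appear in a binary without being a real import.

```php
<?php
// Hypothetical helper: count ANSI (...A) vs. wide (...W) variants of a few
// file-related Win32 function names found as NUL-terminated strings in $bytes.
function countAnsiVsWideImports($bytes)
{
    // Illustrative subset of calls that exist in both A and W flavours.
    $names = 'CreateFile|CreateDirectory|GetFileAttributes|SetFileAttributes'
           . '|FindFirstFile|DeleteFile|MoveFile';

    // Import names are stored as ASCII strings terminated by a NUL byte.
    preg_match_all('/(?:' . $names . ')([AW])\x00/', $bytes, $matches);

    $counts = array('A' => 0, 'W' => 0);
    foreach ($matches[1] as $suffix) {
        $counts[$suffix]++;
    }
    return $counts;
}

// Example (the path is a guess, adjust it to your install):
// print_r(countAnsiVsWideImports(file_get_contents('C:/php/php5ts.dll')));
```

A result dominated by `A` points the same way as the second dumpbin listing, but treat it as a hint only and confirm with dumpbin when possible.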

Mitch
1

I believe you'll want to check to see if PHP was compiled with mbstring (or has the mbstring module installed and enabled if you're using modules). Having that extension enabled should solve your issues. This page should tell you everything you need to know to get it working.
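A quick way to run that check from a script, as a minimal sketch (the function name `mbstringStatus` is made up). It only tells you whether the extension is loaded and which internal encoding it uses; it says nothing about how filesystem functions such as glob() treat file names.

```php
<?php
// Hypothetical helper: report whether mbstring is available in this PHP runtime.
function mbstringStatus()
{
    $loaded = extension_loaded('mbstring');
    return array(
        'loaded'            => $loaded,
        // Only call mbstring functions when the extension is really there.
        'internal_encoding' => $loaded ? mb_internal_encoding() : null,
    );
}

print_r(mbstringStatus());
```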

Aaron
  • Thanks for the suggestion, but I believe mbstring is installed correctly. I added a little info regarding this to the end of my post. I'm more interested in learning about the comments I cited from the article *"YOUR version of PHP which is not compiled correctly, and that uses the legacy ANSI version of the Win32 API"*, how to find out if this is the case, and whether or not this is relevant. – Wesley Murch May 01 '12 at 19:27
  • I don't think unicode support in PHP has much to do with unicode support in the API that PHP uses to do its business. I suspect the latter is the issue rather than the former. (Sorry that I don't have an answer to the problem though; I'm disgusted by how completely awful PHP is after trying sane languages so I don't have as much experience with it). – gparent May 01 '12 at 19:38