We process files that our clients generate on their local Windows machines, which use the CP-1252 character set. Occasionally, while processing one of these files in our backend (Java, running on CentOS), we get runtime errors (RuntimeExceptions). If we remote in to the server, rename the file (using UTF-8), and re-run it, the file processes perfectly fine.
Is there any way to "add" CP-1252 to CentOS's available character sets so that this stops happening?
Can you post the Java run-time exception that you receive? And call stack? Is the issue that there is a CP-1252 character in the file name that is being processed by a Java program? – HeatfanJohn – 2012-08-20T21:40:35.627
@HeatfanJohn - I will need a few hours before I can get access to the appropriate logs to get the exact stacktrace, but yes, you nailed it. It happens when there is a CP-1252 character in the file name and the system chokes. Simply SSHing in to the server, renaming it and re-processing the file fixes it, but is a sub-optimal (manual!) solution. – pnongrata – 2012-08-20T21:42:48.980
Do you have any control over the code that creates the file, or over the source code of the Java back-end that processes it? – HeatfanJohn – 2012-08-20T21:56:54.963
Only the backend but not the (client-side) file generator. But the Java backend is 100% under our control. – pnongrata – 2012-08-20T22:01:25.593
How come you can't fix the Java program to read the data as bytes and then pass it through a decoder? – Ignacio Vazquez-Abrams – 2012-08-21T05:20:11.510
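A minimal sketch of the "read as bytes, then decode" idea from the comment above, assuming the backend can get at the raw bytes of the name as the client wrote them (for example from upload metadata; how those bytes are obtained is not covered here). The class and method names (`Cp1252Names`, `decodeCp1252`, `toUtf8`) are hypothetical, and `windows-1252` is assumed to be available in the JVM, which is typical but not guaranteed by the Java specification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Cp1252Names {
    // Assumption: the JVM ships the windows-1252 charset (true for common JDKs).
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes raw CP-1252 bytes strictly; throws if a byte has no mapping. */
    static String decodeCp1252(byte[] raw) throws CharacterCodingException {
        CharsetDecoder decoder = CP1252.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    /** Re-encodes a CP-1252 byte sequence as UTF-8 bytes. */
    static byte[] toUtf8(byte[] raw) throws CharacterCodingException {
        return decodeCp1252(raw).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // "resum" + 0xE9 (e with acute accent in CP-1252) + ".txt"
        byte[] raw = {'r', 'e', 's', 'u', 'm', (byte) 0xE9, '.', 't', 'x', 't'};
        String name = decodeCp1252(raw);
        System.out.println(name.length());       // number of decoded characters
        System.out.println(toUtf8(raw).length);  // UTF-8 needs one extra byte here
    }
}
```

Using `CodingErrorAction.REPORT` makes an undecodable name fail loudly at a known point instead of being silently replaced, which may be easier to handle than an exception surfacing later in processing.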