I have seen a number of corrupt data files recently - all from a single customer - that have what looks like garbage at the end.
The files (including the corrupted data) are an EXACT multiple of 16384 bytes long (the latest was 114688 bytes).
I feel there should be a simple explanation that would point directly to the problem - something to do with file allocation cluster sizes and disk caching.
The data added to the end of the file is generally a repeated chunk of data from earlier in the same file.
Back in my DOS 3 days I would have said the file was not being closed properly, but this is happening to different files generated by different processes on (I think) different servers.
There may be a common factor such as a particular hard drive or server but at the moment suggesting "it's a hardware problem" would not be an acceptable answer.
OS - Not sure, could be a variety of OSes.
Process - Could be a file copying issue but if the file is copied again, the same corruption occurs.
Language - So far all tools generating the data files are written in Java.
If the files were recovered in a disc check, the end-of-file mark would not be recovered but would instead be set to the end of the disc allocation, so the file would include the whole allocated space, including whatever random data lay after the end-of-file mark. – AFH – 2014-07-17T10:20:34.897
@AFH - Good call - but the files were copied directly off the drives where they were output to. As far as I know there is no recovery process involved. – OldCurmudgeon – 2014-07-17T11:26:10.013
You mention all tools are written in java. Are all these tools in house / written by the same developer(s) or are some of them third party tools? – Jason C – 2014-07-17T14:33:27.550
Also is the occurrence of the problem correlated with the JRE version? – Jason C – 2014-07-17T14:34:52.987
(Where I'm going is, my gut hunch is failure to properly close buffered output streams in the Java tools, or an error in another software buffering implementation. It is telling that Java is common, and that the data at the end is often from within that file rather than random old hard drive junk. All tools being developed by the same dev would be evidence for a common programming error. Same JRE would be evidence for a programming error that only manifests on certain JRE versions, hence being originally unnoticed.) – Jason C – 2014-07-17T14:38:38.890
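That hunch can be sketched in a few lines. The class and method names below are illustrative (not from any of the tools discussed), and the chunk size is arbitrary; the point is that if the outermost `BufferedOutputStream` is never closed, only whole buffer-fulls ever reach the disk, so the file length comes out as an exact multiple of the buffer size - matching the 16384-byte multiples observed, though not by itself explaining the appended data.

```java
import java.io.*;

public class LostBufferDemo {
    // Writes `total` bytes in 1024-byte chunks through a 16384-byte
    // BufferedOutputStream, then closes only the underlying
    // FileOutputStream. Whatever is still sitting in the buffer is
    // silently discarded.
    static long writeWithoutClosingBuffer(File f, int total) throws IOException {
        FileOutputStream fos = new FileOutputStream(f);
        BufferedOutputStream bos = new BufferedOutputStream(fos, 16384);
        byte[] chunk = new byte[1024];
        for (int written = 0; written < total; written += chunk.length) {
            bos.write(chunk);
        }
        // bos.close() deliberately omitted: the final partial buffer is lost.
        fos.close();
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".dat");
        f.deleteOnExit();
        // 20480 bytes were "written", but only one full buffer made it out.
        System.out.println(writeWithoutClosingBuffer(f, 20480)); // prints 16384
    }
}
```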
@JasonC - Tools exhibiting the problem were all written in-house, but the main culprit was last updated 03/03/2014 and first installed in 2009; the newest was installed just days ago. The problem has only shown itself in the last week. No known correlation with JVM version - sorry to be vague here. – OldCurmudgeon – 2014-07-17T14:46:41.617
@JasonC - I know for a fact that the newest tool exhibiting the problem certainly closes its FileOutputStream - I am looking at the code right now. The old tool generates hundreds of files per night and has done so for years; just six files have been corrupt in this way in the past week. – OldCurmudgeon – 2014-07-17T14:49:37.843
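One subtlety worth checking in that code: closing the `FileOutputStream` is not sufficient if a `BufferedOutputStream` (or similar wrapper) sits on top of it - the outermost stream must be the one closed, since that flushes the buffer before closing the chain. A minimal sketch of the safe idiom (names and sizes here are illustrative, not taken from the tool in question):

```java
import java.io.*;

public class SafeWriteDemo {
    // Writes `total` bytes in 1024-byte chunks. try-with-resources
    // closes the outermost BufferedOutputStream, which flushes its
    // buffer and closes the wrapped FileOutputStream for us.
    static long writeAll(File f, int total) throws IOException {
        try (BufferedOutputStream bos =
                 new BufferedOutputStream(new FileOutputStream(f), 16384)) {
            byte[] chunk = new byte[1024];
            for (int written = 0; written < total; written += chunk.length) {
                bos.write(chunk);
            }
        }
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("safe", ".dat");
        f.deleteOnExit();
        System.out.println(writeAll(f, 20480)); // prints 20480: nothing lost
    }
}
```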
You say the files may be generated by different servers. Are they stored on the originating servers when the problem is observed or are they all being stored in some common central location, or are they being transferred to some other location besides the originator (even if it's not central)? This is going to be a tough problem to diagnose unless we can find something common about the issue (in addition to the currently strong correlation with Java, which may be a red herring); no matter how minor / unlikely it is -- especially if we want to avoid a "hardware problem" blanket answer. – Jason C – 2014-07-17T14:59:42.380
@JasonC - I will pose some questions to their technical team. – OldCurmudgeon – 2014-07-17T15:31:48.153