How do I verify that a conversion from DOCX to PDF is correct?


I can't manually check thousands of PDF files against their source DOCX files to see whether the content is identical.

Is there a way to do this automatically?


Posted 2018-08-15T14:31:46.973


How are you defining "correct"? Do you want a 100% identical graphical representation? Or do you need a textual comparison? How are the documents exported to pdf? What options are used? What have you tried so far? – Mokubai – 2018-08-15T16:37:59.933

100% correctness is not possible, because even two PDF files, one created with an Adobe printer and one with an Adobe DC conversion, differ visually. I tried printing both files, DOCX and PDF, as images and then comparing the images. There were of course differences, but they were minimal, and I am not looking for small pixel differences. What I need is a check that the conversion was free of bigger errors such as lost images, lost letters, or heavily changed formatting. A PDF validation (veraPDF) says nothing about the content of the file. – None – 2018-08-16T07:23:33.950
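
One way to check for "bigger differences" such as lost text is to compare the words in each DOCX with the words extracted from its PDF and flag pairs where too many source words are missing. The following is only a minimal sketch under several assumptions: Poppler's `pdftotext` command is installed and on the PATH, only the main document body (`word/document.xml`) is read (headers, footers and footnotes are ignored), and the function names and the 2% threshold are hypothetical choices for illustration. It does not detect lost images or changed formatting.

```python
import re
import subprocess
import zipfile
from collections import Counter
from xml.etree import ElementTree

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the text of the main body of a DOCX file (path or file object)."""
    with zipfile.ZipFile(path) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    parts = []
    for element in root.iter():
        if element.tag == W_NS + "t":
            parts.append(element.text or "")
        elif element.tag in (W_NS + "p", W_NS + "tab", W_NS + "br"):
            # Separate paragraphs, tabs and line breaks so words do not fuse.
            parts.append(" ")
    return "".join(parts)


def pdf_text(path):
    """Return the text of a PDF. Assumes Poppler's pdftotext is on the PATH."""
    # The "-" argument makes pdftotext write the text to standard output.
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def missing_ratio(source_text, converted_text):
    """Fraction of source words that have no counterpart in the converted text."""
    source = Counter(words(source_text))
    converted = Counter(words(converted_text))
    total = sum(source.values())
    if total == 0:
        return 0.0
    # Counter subtraction keeps only words that occur more often in the source.
    return sum((source - converted).values()) / total


def suspect_conversions(pairs, threshold=0.02):
    """Yield (docx, pdf, ratio) for pairs that exceed the hypothetical threshold."""
    for docx_path, pdf_path in pairs:
        ratio = missing_ratio(docx_text(docx_path), pdf_text(pdf_path))
        if ratio > threshold:
            yield docx_path, pdf_path, ratio
```

Comparing word counts rather than exact character sequences keeps the check insensitive to line breaks and page layout, at the cost of some false positives, for example when the PDF hyphenates words at line ends. Calling `suspect_conversions` with a list of `(docx_path, pdf_path)` tuples would then yield only the pairs worth inspecting by hand.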

No answers