How do you remove a specific line/key of metadata from a PDF?

0

I have read in multiple forums that the "Universal PDF" portion of the metadata in PDF books is malformed and causes errors when trying to read it. How do you remove a specific key and its value from a PDF, and will doing so corrupt the data?

Here's the data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.6
Linearized                      : No
Universal                       : PDF
The                             : process
Code Mantra                     : Acrobat
Author                          : ModDate
LLC                             : http://www.codemantra.com
Create Date                     : 2004:08:26 09:42:01+05:30
EBX PUBLISHER                   : University of Toronto Press
Page Layout                     : SinglePage
Page Count                      : 419
Page Mode                       : UseOutlines
Has XFA                         : No
XMP Toolkit                     : 3.1-702
Code Mantra 002 C0020 LLC       : http://www.codemantra.com
Universal 0020 PDF              : The process that creates this PDF constitutes a trade secret of codeMantra, LLC and is protected by the copyright laws of the United States
Modify Date                     : 2012:09:11 15:27:50+05:30
Metadata Date                   : 2012:09:11 15:27:50+05:30
Creator Tool                    : Acrobat 5.0 Paper Capture Plug-in for Windows
Document ID                     : uuid:ccee9833-967a-4d92-b5fa-12faa7d620c4
Instance ID                     : uuid:51e5148e-3afa-45df-82b8-26d43c7e6ffc
Format                          : application/pdf
Title                           : 
Creator                         : .

Any help would be appreciated.

digitaluniverse

Posted 2019-10-16T10:36:06.417

Reputation: 1

Another direct option would be to use PDFtk Free, which comes with a GUI and a command-line program for Windows that can edit the metadata.

– StarGeek – 2019-10-18T16:31:48.437

Answers

0

This answer assumes you want to use exiftool for this. Other tools may do the job better for PDFs, especially if you want to remove individual items rather than all of them.

First, you need to determine the tag name (see exiftool FAQ #2). The output you show lists the tag descriptions, not the tag names. Run this command to list the tags by name:
exiftool -s File.PDF
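
If the listing is long, it can help to reduce it to just the names. The sketch below assumes each line of `exiftool -s` output has the form `TagName : value`; `tag_names` is a hypothetical helper, not part of exiftool.

```shell
#!/bin/sh
# Hypothetical helper: reads `exiftool -s` output on stdin
# and prints only the tag names, one per line.
tag_names() {
    # Each line looks like "TagName   : value".
    # Split on the first colon and trim the padding after the name.
    awk -F':' '{ sub(/[ \t]+$/, "", $1); print $1 }'
}
```

It would be used as `exiftool -s File.PDF | tag_names`, which makes odd names stand out from standard ones such as PageCount.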

Once you have the tag names you want to remove, your command would be
exiftool -TAG= <FileOrDir>
You can clear multiple tags and list multiple files and directories in that command.
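
As a rough sketch of clearing several tags at once, the hypothetical wrapper below turns a list of tag names into `-TAG=` arguments. `clear_tags` and the `DRY_RUN` variable are assumptions made for illustration. Only the `-TAG=` syntax comes from exiftool.

```shell
#!/bin/sh
# Hypothetical helper: clear_tags FILE_OR_DIR TAG...
# With DRY_RUN=1 it prints the exiftool command instead of running it.
clear_tags() {
    file=$1
    shift
    # Replace each remaining argument TAG with the assignment -TAG=
    # (the list iterated over is fixed when the loop starts).
    for tag in "$@"; do
        set -- "$@" "-$tag="
        shift
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo exiftool "$@" "$file"
    else
        exiftool "$@" "$file"
    fi
}
```

For example, `clear_tags File.PDF Title Author` would run `exiftool -Title= -Author= File.PDF`. By default exiftool keeps a backup of the unmodified file with `_original` appended to its name.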

If you get an error along the lines of Warning: Tag 'xxx' is not defined, then you have a custom tag, which exiftool cannot remove individually. From the output you posted, this is probably the case. You can instead use exiftool to remove all the embedded metadata with
exiftool -All:All= <FileOrDir>

You might still have problems due to the way exiftool edits PDF files (see the exiftool PDF page): it appends an incremental update, so the old metadata stays in the file and the edit is reversible. You may need to re-linearize the file to complete the process. That can be done with QPDF with the command
qpdf --linearize in.pdf out.pdf
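
The steps above can be chained roughly as follows. This is a minimal sketch, assuming both exiftool and qpdf are on the PATH; `strip_pdf_metadata`, `run`, and `DRY_RUN` are hypothetical names introduced here, and the temporary file name is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print a command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: strip_pdf_metadata SRC.pdf DST.pdf
strip_pdf_metadata() {
    src=$1
    dst=$2
    # Work on a copy so the source file is never modified.
    run cp -- "$src" "$dst.tmp" || return 1
    # Remove all embedded metadata; -overwrite_original skips
    # exiftool's "_original" backup of the temporary copy.
    run exiftool -all:all= -overwrite_original "$dst.tmp" || return 1
    # Rewrite the file with qpdf so the data left behind by the
    # incremental update should no longer be carried along.
    run qpdf --linearize "$dst.tmp" "$dst" || return 1
    run rm -f -- "$dst.tmp"
}
```

Running `exiftool -s` on the output file afterwards is a simple way to check whether the unwanted tags are really gone.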

StarGeek

Posted 2019-10-16T10:36:06.417

Reputation: 782