How to change internal page numbers in the meta data of a PDF?

42

31

I have a pdf document I created through non-Acrobat means (printing to pdf, then merging a bunch of pdfs), but I'd like to manually change the page numbers (i.e. the first several pages are simply title pages, the page that is labeled "page 1" is really the 7th sheet of the pdf). What's the simplest (and ideally, free) way to do this?

To be clear, I am not trying to change the numbers on the pages themselves, but the page numbers in the "metadata" that the pdf stores (the pages themselves are already numbered correctly; I just want "go to page 1" to go to the page labeled 1, which could be sheet 7).

For what it's worth, I'm on Windows, though I have access to Macs as well.

YGA

Posted 2011-01-13T03:31:03.573

Reputation: 1 489

I'm not sure if I understand your description+requirement fully. Can you provide a link to a sample PDF you want to modify? – Kurt Pfeifle – 2011-01-14T14:17:42.537

is there a command line tool to do that, e.g. on a big pdf file without actually opening the txt file? – jj_p – 2013-09-20T13:50:02.830

like e.g. pdftk? – jj_p – 2013-09-23T07:01:32.637

Answers

44

What you want are called page labels, and they can be added directly in the PDF's source code. Change the file extension from pdf to txt and open the file in a text editor (this can be slow for large files, so be patient). The information about page labels is stored in an object called the document catalog, which looks something like this:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj

It may contain additional entries, but this is the basic structure. There is only one catalog, so in a large file you can search for the object that contains /Catalog. Now you can make your desired changes by inserting the /PageLabels entry:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
   /PageLabels << /Nums [ 0 << /P (cover) >>
                          % labels 1st page with the string "cover"
                          1 << /S /r >>
                          % numbers pages 2-6 in small roman numerals
                          6 << /S /D >>
                          % numbers pages 7-x in decimal arabic numerals
                        ]
               >>
>>
endobj

Three of the lines start with a number; these numbers are page indices. Page 1 has the index 0, page 2 the index 1, and so forth. Each index starts a range, so the line with 1 <<...>> applies to all pages from index 1 to 5, and the line with 6 <<...>> applies to all pages from index 6 up to the last page. A label for 0 <<...>> must always be defined.
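If the range rule is hard to picture, here is a minimal Python sketch of it. The RANGES list is just a hand-written mirror of the /Nums array above, and range_for is a name made up for this illustration; a PDF viewer does the equivalent lookup internally.

```python
from bisect import bisect_right

# Hand-written mirror of the /Nums array above: (start index, description).
RANGES = [
    (0, 'prefix "cover", no number'),
    (1, "lowercase roman numerals"),
    (6, "decimal arabic numerals"),
]

def range_for(page_index, ranges=RANGES):
    """Return the entry with the largest start index that is <= page_index."""
    starts = [start for start, _ in ranges]
    pos = bisect_right(starts, page_index) - 1
    if pos < 0:
        raise ValueError("no range covers this page; index 0 must be defined")
    return ranges[pos]

# Page 6 of the file has index 5, so it still falls in the range starting at 1.
print(range_for(5))
# Page 7 of the file has index 6 and starts the decimal range.
print(range_for(6))
```

Every page after index 6 keeps resolving to the last entry, which is why the final range needs no explicit end.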

You can find more information about page labels and PDF source code in the PDF standard or in a wiki on PDF standards.

Dane Jacob Hampton

Posted 2011-01-13T03:31:03.573

Reputation: 543


Great information. Here is a link to another useful source: Specifying consistent page numbering for PDF documents from the W3C.

– Adam Mackler – 2015-03-18T14:52:33.327

Are you sure it works just like this? From looking at the raw content of some PDF files, it seemed that some index numbers pointing to positions in the file after the catalog would have to be updated if the length of the preceding content changes. – O. R. Mapper – 2015-10-14T23:02:00.573

Don't use Sublime Text! It breaks the document even if you don't make any changes. Using Notepad++ worked for me. – Draex_ – 2016-12-01T14:26:45.873

A tip to those who use Notepad++: start it with -noPlugin to be sure that plugins like editorconfig won't break anything on save. – thorn – 2017-10-11T08:42:09.450

@O.R.Mapper is right. This method will break the PDF's xref ("cross-reference table"), which should contain the byte position of each PDF "object", to use the terminology from the PDF standard. The second problem is that if your text editor mixes Unix and Windows line endings, it can break some readers, e.g. the veraPDF PDF/A validator. To fix the xref issue: in my case, processing the PDF with Ghostscript (gs) rebuilt the xref. Supposedly opening it in Acrobat might also rebuild the xref.

– n611x007 – 2018-05-15T23:46:05.760

Marvellous! This is the only place on the web I have found such direct and useful information. We don't all have Acrobat Reader, after all. – Noldorin – 2012-07-22T00:23:12.087

User, take care to remember adding /PageLabels << /Nums [, ] and also the >> at the end! I've tricked myself at least twice! :) Forever gratitude for the answer. – n611x007 – 2013-06-23T17:16:40.150

With /St 8 or /St 2, for example, you set the starting point for the displayed label; any number >= 1 can take the place of 8 (or 2). For example, 1 << /S /r /St 12 >> will number (actual) pages 2-6 as (displayed) xii-xvi, because 12 corresponds to 'xii'. – n611x007 – 2013-06-23T17:30:22.947

Thanks for the answer, but in my experience this method sometimes works and sometimes doesn't; also, I happened to find more than one Catalog: how do you explain that? – jj_p – 2013-09-28T17:45:06.240

I've also noticed that this method doesn't work sometimes, but couldn't pinpoint the reason yet. Every document should have only one /Catalog node, because it is the root of a document’s object hierarchy. Maybe merging multiple PDFs leads to more than one /Catalog node. I tested inserting two nodes in a document and it worked, but only the second one was recognized. – Dane Jacob Hampton – 2013-10-26T12:53:17.680

6

If I understand you correctly, here is how it should work:

gs \
  -o modified-pagelabels-50pages.pdf \
  -sDEVICE=pdfwrite \
  -c "[ /Page 1 /Label (i)     /PAGELABEL pdfmark" \
  -c "[ /Page 2 /Label (ii)    /PAGELABEL pdfmark" \
  -c "[ /Page 3 /Label (III)   /PAGELABEL pdfmark" \
  -c "[ /Page 4 /Label (four)  /PAGELABEL pdfmark" \
  -c "[ /Page 5 /Label (v)     /PAGELABEL pdfmark" \
  -c "[ /Page 6 /Label (|||||) /PAGELABEL pdfmark" \
  -f 50pages.pdf

However, I seem to remember that this didn't work reliably or fully the last time I tried it (about 2 years ago).

UPDATE: My memory wasn't failing me. I now tried again and filed a bug report for Ghostscript (bug 691889) concerning this. Follow the link to the bug report to see the details.

Kurt Pfeifle

Posted 2011-01-13T03:31:03.573

Reputation: 10 024

5

NOTE 1: The accepted answer is still mostly correct, but has some gaps: many PDF files are not directly editable as text, and even when they are, such editing can damage the PDF and make it unreadable. One solution that works on both Unix and Microsoft Windows is qpdf, which can translate PDF files into "QDF", a text-editable form that is still a valid PDF file. The qpdf package comes with fix-qdf, which recalculates offsets after a QDF file has been edited, correcting any damage.
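To see why a hand edit can cause damage, here is a small Python sketch. It assumes object headers sit at the start of a line in the form "N G obj", and the two byte strings are tiny made-up fragments, not complete PDFs. The function object_offsets is a name invented for this illustration; it only lists the kind of byte positions that the cross-reference table is supposed to record.

```python
import re

def object_offsets(pdf_bytes):
    """Map each object number to the byte offset of its 'N G obj' header."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(?m)^(\d+) (\d+) obj\b", pdf_bytes)
    }

# Made-up fragment: a catalog (object 3) followed by another object (4).
before = b"3 0 obj\n<< /Type /Catalog >>\nendobj\n4 0 obj\n<< >>\nendobj\n"

# Simulate a hand edit that adds an entry to the catalog.
after = before.replace(b"/Type /Catalog", b"/Type /Catalog /PageLabels << >>", 1)

# Object 4 now starts later in the file than the old table would claim.
shift = object_offsets(after)[4] - object_offsets(before)[4]
print("object 4 moved by", shift, "bytes")
```

Everything after the edited object moves by the number of bytes inserted, so an unchanged cross-reference table points at the wrong places. That is the damage fix-qdf is meant to repair.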

NOTE 2: Uncomfortable with text editors? Try using a GUI editor such as jpdftweak first. Sometimes the GUI pdf editors work, in which case, yay, you're done. However, when they fail, as has often been the case for me, you can try this more robust alternative. Either way, please do not down vote my answer for being less than elegant.


HOW TO Edit PDF Page Numbers Using Qpdf

Summary:

  1. qpdf -qdf foo.pdf foo.qdf
  2. edit foo.qdf

     0 << >>           % No label on first pages
     6 << /S /D >>     % Start numbering from 7th page.
    
  3. fix-qdf foo.qdf >bar.qdf
  4. test bar.qdf
  5. qpdf bar.qdf bar.pdf

Detailed steps

Step 1.

Convert the document to the easily editable QDF format. Run qpdf from the command line like so:

qpdf -qdf foo.pdf foo.qdf

Note: If you do not have qpdf installed already, Microsoft Windows executables can be downloaded from https://github.com/qpdf/qpdf/releases. On Unix systems such as Ubuntu and Debian GNU/Linux, you can install it by typing apt install qpdf.

Step 2.

Edit the QDF document using a text editor such as notepad++, emacs, or gedit. Search for the word /Catalog and note the <<angle brackets>> it is inside. Nearby, you'll find the current /PageLabels (if any).

We'll be adding each section that should be differently numbered to the /PageLabels. The format is start-page << style >>. Note that white-space does not matter and that the first page of the document is 0. Unless otherwise specified, a new section always starts out numbering pages from 1.

Examples

Here is a full example of what PageLabels may look like, with comments added:

/Type /Catalog
/PageLabels <<
  /Nums [
    0           % From the first page of the document,
      <<
        /S /r   % ...use the lowercase roman numeral style.
      >>
    6           % From seventh page onward,
      <<
        /S /D   % ...use ordinary digits (arabic numerals)
      >>
  ]
>>

If the file has no PageLabels, add them after /Type /Catalog. For example, one might change,

1 0 obj
<<
  …
  /Type /Catalog
>>
endobj

into,

1 0 obj
<<
  … 
  /Type /Catalog
  /PageLabels <<
    /Nums [
      0 << >>                 % No label for cover
      1 << /S /r >>           % i, ii for index
      3 << /S /D /St 15 >>    % 15, 16, 17, ... for article
      31 << /S /D /P (A-) >>  % A-1, A-2, A-3... for appendix
    ]
  >>
>>
endobj
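If you'd rather not type the entry by hand, the insertion can be sketched in Python. This is only an illustration under two assumptions: the catalog has no /PageLabels yet, and the key is written exactly as "/Type /Catalog", as in the examples above. The function name add_page_labels is made up here, and the result still has to go through fix-qdf (Step 3).

```python
def add_page_labels(qdf_bytes, nums_entries):
    """Insert a /PageLabels entry right after the first '/Type /Catalog'.

    nums_entries is a list of (page index, style string) pairs, for
    example [(0, ""), (6, "/S /D")].
    """
    marker = b"/Type /Catalog"
    if marker not in qdf_bytes:
        raise ValueError("no '/Type /Catalog' found")
    if b"/PageLabels" in qdf_bytes:
        raise ValueError("file already has /PageLabels; edit it by hand")
    lines = [b"  /PageLabels <<", b"    /Nums ["]
    for start, style in nums_entries:
        lines.append(b"      %d << %s >>" % (start, style.encode("ascii")))
    lines += [b"    ]", b"  >>"]
    block = marker + b"\n" + b"\n".join(lines)
    # Replace only the first occurrence, which is assumed to be the catalog.
    return qdf_bytes.replace(marker, block, 1)

# Made-up fragment standing in for the catalog object of a QDF file.
sample = b"1 0 obj\n<<\n  /Pages 2 0 R\n  /Type /Catalog\n>>\nendobj\n"
print(add_page_labels(sample, [(0, ""), (6, "/S /D")]).decode("ascii"))
```

For a real file you would read foo.qdf with open("foo.qdf", "rb"), pass the bytes through the function, and write the result back in binary mode so that any stream data is left untouched.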

OPTIONAL: STARTING FROM A DIFFERENT NUMBER WITH /St

Each section restarts numbering at 1 unless you tell it otherwise using /St. Notice how in the above example, the fourth page starts at 15.

OPTIONAL: USING A DIFFERENT STYLE WITH /S

The /S entry takes a name that picks the numbering style:

  • /D digits (1, 2, 3...)
  • /R uppercase Roman (I, II, III...)
  • /r lowercase Roman (i, ii, iii...)
  • /A uppercase letters (A, B, C, ..., X, Y, Z, then AA, BB, CC, ..., ZZ, then AAA, ...)
  • /a lowercase letters (a, b, c, ..., x, y, z, then aa, bb, cc, ..., zz, then aaa, ...)

If the /S entry is omitted, that section of pages will have no numbering. For example:

0 << >>         % No label for cover

OPTIONAL: ADDING A PREFIX TO EACH PAGE WITH /P

You can show any string of text before the page number by specifying a word in parentheses after /P:

  31
  <<
    /S /D
    /P (A-)     % label appendix pages A-1, A-2, A-3
  >>

Specifying a prefix without a style (/S) will give you pages that have only the word, without any number. This can be useful, for example, if you'd like a cover page to simply have the label "Cover".

     0 << /P (Cover) >>        % No number, just "Cover"
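To check your understanding of how /S, /P and /St combine, here is a rough Python sketch of the label a viewer would be expected to show. The names page_label and to_roman are made up for this illustration, and only the /D, /R and /r styles are handled to keep it short.

```python
ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
         (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
         (5, "v"), (4, "iv"), (1, "i")]

def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    out = ""
    for value, symbol in ROMAN:
        while n >= value:
            out += symbol
            n -= value
    return out

def page_label(offset, style=None, prefix="", start=1):
    """Label of the page `offset` pages into a range (0 = first page of it).

    style mirrors /S, prefix mirrors /P and start mirrors /St.
    """
    number = start + offset
    if style is None:
        return prefix  # no /S: the label is the prefix alone
    if style == "D":
        return prefix + str(number)
    if style == "r":
        return prefix + to_roman(number)
    if style == "R":
        return prefix + to_roman(number).upper()
    raise ValueError("style not handled in this sketch: %r" % style)

print(page_label(0, style="D", start=15))    # first page of the article range
print(page_label(2, style="D", prefix="A-")) # third page of the appendix range
print(page_label(0, prefix="Cover"))         # prefix only
```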

Step 3.

Run fix-qdf to make your edits valid PDF and put the output in bar.qdf.

fix-qdf foo.qdf > bar.qdf

Step 4.

Open bar.qdf in your PDF viewing program and check that it is numbered correctly.

Step 5.

Convert the QDF file back into a normal PDF, like so:

qpdf bar.qdf bar.pdf

Ta da. You're done. You now have a document with correctly labeled page numbers in bar.pdf.
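If you do this often, the command-line steps can be wrapped in a short Python script. This is just a sketch: it assumes qpdf and fix-qdf are on your PATH, the function names are made up, and the file names are the ones used in this answer. The manual edit (Step 2) and the visual check (Step 4) still happen in between.

```python
import subprocess

def qpdf_steps(src_pdf, work_qdf, fixed_qdf, out_pdf):
    """Return (argv, stdout file or None) for Steps 1, 3 and 5."""
    return [
        # Step 1: --qdf is the spelling used in qpdf's own documentation.
        (["qpdf", "--qdf", src_pdf, work_qdf], None),
        # Step 3: fix-qdf writes the repaired file to standard output.
        (["fix-qdf", work_qdf], fixed_qdf),
        # Step 5: convert the repaired QDF back into a normal PDF.
        (["qpdf", fixed_qdf, out_pdf], None),
    ]

def run_step(argv, stdout_path=None):
    """Run one command, optionally redirecting its output to a file."""
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "wb") as out:
            subprocess.run(argv, check=True, stdout=out)

steps = qpdf_steps("foo.pdf", "foo.qdf", "bar.qdf", "bar.pdf")
for argv, target in steps:
    print(" ".join(argv) + (" > " + target if target else ""))
```

Call run_step(*steps[0]) first, edit foo.qdf, then run the remaining two steps. Nothing is executed until run_step is called, so printing the commands is a safe way to check them first.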

hackerb9

Posted 2011-01-13T03:31:03.573

Reputation: 579

4

There is a little Python script that can do the job: https://github.com/lovasoa/pagelabels-py

In your case call something like:

./addpagelabels.py --delete file.pdf
./addpagelabels.py --startpage 1 --type 'roman lowercase' file.pdf
./addpagelabels.py --startpage 7 --type arabic file.pdf

DG'

Posted 2011-01-13T03:31:03.573

Reputation: 329

This did the job exactly as I needed. Thanks! – telotortium – 2019-04-15T16:49:47.537

3

jPdf Tweak is an Open Source graphical utility that lets you edit page labels in PDF files. The documentation page provides step-by-step instructions.

CherryBerry

Posted 2011-01-13T03:31:03.573

Reputation: 81

I used this to add my custom page labels as "empty" format with text as prefix. Worked well! – Matt Sephton – 2017-08-10T21:53:20.757

This is a way better answer than text editing things by hand – endolith – 2019-05-10T14:46:16.030

Please add the step-by-step instructions here instead of relying on an external link. Thanks! – hackerb9 – 2019-05-15T00:51:36.813

1

I found direct editing of the file (as uncompressed by pdftk) not to work if there are already '/titles' set in the '/outlines' region. The direct-editing technique described in a post above is demonstrated on Youtube: https://www.youtube.com/watch?v=zoH1Z_hSpak

But the 'update' feature of pdftk may be more intuitive (and more reliable when '/titles' already exist in the '/outlines' region of the PDF file) via editing the 'doc_data.txt' file used here: https://www.pdflabs.com/blog/export-and-import-pdf-bookmarks/

Bob

Posted 2011-01-13T03:31:03.573

Reputation: 11

Hi @Bob, link-only answers are low quality. They will be useless if the target site moves or disappears. Please edit your answer and quote the relevant part of the solution here. – C0deDaedalus – 2018-05-27T19:18:00.910

1

For removing the old ones, probably the easiest cross-platform way is just to crop the old ones off. You could do this, for example, with BRISS.

Adding the new ones using free tools is trickier. Personally I'd probably do it with pdflatex, as in this StackExchange answer, though that might be a rather involved solution unless you have other uses for pdflatex.

I think it can be done with jPdfTweak instead, however.

frabjous

Posted 2011-01-13T03:31:03.573

Reputation: 9 044

1

The method given by Dane H. does work with Acrobat Reader (or, to be precise, the current version of Adobe Reader). One minor point to note: the field at the top will only accept 8 characters so you can't enter something like 'subject index' into it if such a label has been used. But you can instead use menu item View > Page Navigation > Go to..., or the key equivalent.

Another tip: the PDF specification always assigns page numbers consecutively, so in a document produced by scanning pairs of pages the two sets of numbers get out of step (unless you laboriously number each page individually). But with little effort you can set up your document so that the convention 'go to page n gets you to pages 2n and 2n+1' applies.

user308637

Posted 2011-01-13T03:31:03.573

Reputation: 11

1

Dane's answer is the best, but the format has changed a little since then. This example might be helpful:

%PDF-1.6

29241 0 obj

<</Metadata 1685 0 R/Outlines 29461 0 R/PageLabels<</Nums[0<</S/D>>3<</S/D/St 6>>4<</S/D/St 10>>5<</S/D/St 12>>15<</S/D/St 70>>16<</S/D/St 72>>17<</S/D/St 80>>18<</S/D/St 82>>19<</S/D/St 90>>23<</S/D/St 96>>25<</S/D/St 99>>29<</S/D/St 110>>31<</S/D/St 130>>32<</S/D/St 133>>35<</S/D/St 137>>36<</S/D/St 140>>37<</S/D/St 145>>39<</S/D/St 150>>40<</S/D/St 152>>42<</S/D/St 155>>43<</S/D/St 160>>46<</S/D/St 165>>47<</S/D/St 167>>48<</S/D/St 170>>49<</S/D/St 180>>50<</S/D/St 190>>52<</S/D/St 300>>53<</S/D/St 305>>54<</S/D/St 319>>56<</S/D/St 380>>57<</S/D/St 390>>58<</S/D/St 500>>67<</S/D/St 515>>68<</S/D/St 525>>70<</S/D/St 550>>71<</S/D/St 553>>72<</S/D/St 560>>73<</S/D/St 600>>76<</S/D/St 620>>78<</S/D/St 650>>82<</S/D/St 670>>85<</S/D/St 700>>95<</S/D/St 714>>117<</S/D/St 900>>162<</S/D/St 1000>>178<</S/D/St 1200>>209<</S/D/St 1500>>263<</S/D/St 1555>>270<</S/D/St 1563>>389<</S/D/St 1681>>522<</S/D/St 1813>>]>> /PageMode/UseOutlines/Pages 29177 0 R/Type/Catalog>>

endobj
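A compact catalog like this is hard to read by eye. Here is a rough Python sketch that splits the inside of a /Nums [...] array into (page index, entries) pairs. It is regex-based and assumes simple label dictionaries like the ones above (no nested dictionaries, no parentheses inside a /P prefix), so treat it as a reading aid rather than a real PDF parser. The name parse_nums is made up.

```python
import re

# One array element: a page index followed by a << ... >> dictionary.
ENTRY = re.compile(rb"(\d+)\s*<<(.*?)>>", re.S)

def parse_nums(nums_body):
    """Turn the inside of a /Nums [...] array into (index, dict) pairs."""
    entries = []
    for m in ENTRY.finditer(nums_body):
        body = m.group(2)
        label = {}
        style = re.search(rb"/S\s*/(\w+)", body)
        start = re.search(rb"/St\s*(\d+)", body)
        prefix = re.search(rb"/P\s*\((.*?)\)", body)
        if style:
            label["S"] = style.group(1).decode("ascii")
        if start:
            label["St"] = int(start.group(1))
        if prefix:
            label["P"] = prefix.group(1).decode("latin-1")
        entries.append((int(m.group(1)), label))
    return entries

# The first three entries of the array shown above.
for index, label in parse_nums(b"0<</S/D>>3<</S/D/St 6>>4<</S/D/St 10>>"):
    print(index, label)
```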

daniel

Posted 2011-01-13T03:31:03.573

Reputation: 673

0

BeCyPDFMetaEdit http://www.becyhome.de/becypdfmetaedit/description_eng.htm

You can add, remove, or change the internal page numbering scheme in the "pages" tab of this freeware tool.

A word of caution: PDF-XChange Viewer doesn't show the page numbering scheme, while Foxit Reader displays it correctly. I have not tested Acrobat Reader.

Sulisu

Posted 2011-01-13T03:31:03.573

Reputation: 51