Questions tagged [pdf]

PDF (Portable Document Format) is a file format that represents a document. It was originally a proprietary format developed by Adobe, but became an open standard (ISO 32000-1) in 2008.

PDF is widely used to store and transfer documents that are primarily meant to be viewed and archived (e.g. invoices, quotes, documentation). However, the PDF standard also supports editable fields, so PDF documents are sometimes used as forms for requesting and submitting data. The growth of such interactive functionality has led to some serious security vulnerabilities in PDF readers, where a malformed PDF document can be used to attack a system running a vulnerable reader.

Systems that automatically generate PDFs from a set of data are common, although building this functionality on top of the available libraries can be challenging. Similarly, systems that automatically ingest and parse PDF documents can be difficult to develop, given the large and varied PDF specification.
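
To give a feel for what generating a PDF involves at the file level, here is a minimal sketch in Python that assembles a one-page document by hand: a header, five numbered objects, a cross-reference table of byte offsets, and a trailer. The function name `build_minimal_pdf` and the page layout are illustrative assumptions, not part of any library, and real generators deal with far more (fonts, compression, encodings, incremental updates).

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF by hand and return its bytes (illustrative only)."""
    # Parentheses and backslashes must be escaped inside PDF string literals.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: show the text in 24 pt Helvetica near the top-left corner.
    # Assumption: the text is plain ASCII/Latin-1; anything else raises here.
    stream = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    # Object bodies, numbered 1..5 in this order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object,
    # preceded by the mandatory free entry for object 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer points at the catalog (object 1) and at the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result with `open("hello.pdf", "wb").write(build_minimal_pdf())` should give a file that a PDF viewer can open. Because every cross-reference entry is a byte offset, any change to an object's length shifts the offsets that follow it, which is one reason hand-editing or naively parsing PDFs tends to be fragile.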

More details about the format can be found on Adobe's website.

88 questions
43
votes
9 answers

People think a "hidden" save file dialog box means the computer is frozen

I have had reports of my remote workstation freezing for several months, and it turns out that this is happening: User goes to print something to PDF (or save it). The file dialog box comes up asking where they want the file to go. They click…
Eli
  • 741
  • 2
  • 8
  • 16
17
votes
9 answers

Open-source PDF reader for Windows as an alternative to Adobe Reader

With the latest JavaScript vulnerabilities in Adobe Reader and the bloat it has acquired over the years, I've been thinking of moving the network I'm in charge of to a different product for PDF reading on Windows. The ideal PDF reader should be something…
Tom Feiner
  • 16,758
  • 8
  • 29
  • 24
16
votes
6 answers

Reasonable automatic HTML to PDF conversion (in UNIX/Linux environment)

Is there a way to generate PDF documents from HTML files automatically in Linux where the PDF offers some kind of reasonable level of resemblance to the input file? A command-line tool - as opposed to an interactive GUI of some kind - is key. I have…
Alex Balashov
  • 907
  • 2
  • 9
  • 16
11
votes
4 answers

Fast PDF to JPG conversion on Linux wanted

I am currently using ImageMagick to convert PDFs to JPEG raster images. It is painfully slow and uses up a lot of memory. The command I used was: convert -geometry 1024x768 -density 200 -colorspace RGB foo.pdf bar%02d.jpg I guess that it's slow…
mat3001
  • 305
  • 2
  • 3
  • 8
10
votes
3 answers

Any tools to automate OCR of scanned PDF files in a manner similar to Acrobat's OCR feature?

Open source preferred, but not necessary. I've got Adobe Acrobat 8, and really like the OCR feature which can essentially put an invisible layer of OCR'd text on top of a scanned document. Thus what you see on screen is the original scanned…
Boden
  • 4,948
  • 12
  • 48
  • 70
8
votes
3 answers

Is nginx suited for serving PDFs?

This is a dummy question. I have to give public access to PDFs, let's say 8 MB / file. It seems to me that nginx will serve any kind of files, as long as they are static. But someone tells me nginx isn't suited for this. Can you provide me some…
Elvex
  • 217
  • 2
  • 9
7
votes
11 answers

How to convert a really big HTML file to PDF in Windows

We have a few really large HTML files (60-100 MB) that we cannot convert to PDF with any reliability. Adobe Acrobat 9 crashes - hits the 2GB limit for applications. Open Office converts, but removes some of the anchors (). ActivePDF webgrabber…
PeterStrange
  • 101
  • 1
  • 1
  • 3
7
votes
1 answer

Why are we seeing Apache 206 partial responses for PDF downloads?

When looking in our Apache access log, when users are downloading PDF files from our server, the following often (but not always) happens. The URL is first requested and delivered with a status 200 (ok) and the full reply size, then immediately…
James Tolchard
7
votes
1 answer

How do I configure nginx to serve PDF files?

I am starting with a default installation of nginx. The only modification I've made to my enabled-sites/default file is: root /home/ubuntu/www ...where I have a web site and a /pdf folder that contains my pdf files. If I click a link to a pdf file,…
Seth
  • 423
  • 1
  • 4
  • 8
6
votes
7 answers

Text-file based document preparation system

I am looking for a system to prepare internal technical documents that has the following basic features: source files should be human-readable text files, so they play well with revision control; supports basic formatting (e.g. images, tables,…
pdg137
  • 160
  • 5
5
votes
3 answers

Export SharePoint Wiki to PDF from the Command Line

We use a SharePoint wiki* at the office to serve as a knowledgebase for our IT operations. Recently we went through a disaster recovery exercise where we realized we had a key hole in our plans: how do you restore the services if your instruction…
Wyatt Barnett
  • 725
  • 5
  • 14
5
votes
3 answers

Indexing PDF files on Ubuntu

I'm looking for a solution in Ubuntu that indexes PDF (and ps?) files for searching later. The criteria would be: Compatibility: Often extracting text varies, depending on what software was used to create the PDF. Some PDFs can also be "locked",…
pufferfish
  • 2,660
  • 9
  • 37
  • 40
5
votes
6 answers

Capacity Optimization / Deduplication Options for Primary Storage

I'm exploring options for making more efficient use of our primary storage. Our current NAS is an HP ProLiant DL380 G5 with an HP Storageworks MSA20, and one other disk shelf which I'm not sure what it is. The vast majority of our files are PDF…
4
votes
2 answers

pdftk: edit PDF file in-place

Using PDFtk Server, I want to rotate a PDF file 90° and save it in-place, to overwrite the input file. I tried the following, but it fails, probably because it starts writing before the file is finished reading. pdftk in.pdf cat 1-endright output -…
Elliott B
  • 200
  • 2
  • 9
4
votes
2 answers

Printing PDF: 50 MB sent to the printer from a 1 MB PDF file

I have a PDF file that is just 1 MB (30 pages). When I send it to the printer (HP 1320), I see that the computer sends almost 50 MB to the printer. How is that possible? I know that PDF is a compressed format, but when I try a command-line program…
Kubber
  • 165
  • 1
  • 3