We have a report in our web application that is shown in tabular format in HTML. The report can be downloaded as a PDF by clicking the "Download as PDF" button. My question is about how this PDF download is being implemented. I've been told to write a back-end service that converts a raw HTML string into a downloadable PDF. To convert the HTML into PDF we are using the Flying Saucer library for Java. The way this is supposed to work is that I will get the raw HTML content as a string, such as:
<table id="new-table">
<thead>
<tr>
<th class="model">Column 1</th>
<th class="description">Column 2</th>
<th class="quantity">Column-3</th>
<th class="listDollars">Column-4</th>
<th class="payout">Column-5</th>
</tr>
</thead>
<tbody>
<tr id="row-H2285" style="background: #FFFFFF;" class="modelRow">
<td class="model">H2285</td>
<td class="description">F125</td>
<td>16</td>
<td class="list"></td>
<td class="Percent">... and so on
This string arrives from the front end in the request parameters, and I have to convert it using Flying Saucer and return a PDF file. My question is: is there a way for an attacker to inject malicious code into this HTML content and send it to the back-end service, so that the resulting PDF is harmful to anyone who opens it?
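To make the concern concrete: because the string comes from the client, it can contain anything, not just the table markup our own page generates. Below is a minimal sketch, using only the JDK, of the kind of pre-check I could put in front of the renderer. It parses the string as XML with DOCTYPE declarations disallowed and rejects any element or attribute outside a small allow-list. The class name `ReportHtmlCheck`, the method `isAcceptable`, and the allow-lists are my own assumptions based on the sample above; none of this is a Flying Saucer feature, and I don't know whether it would be sufficient.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-check for the report HTML; names and allow-lists are assumptions.
final class ReportHtmlCheck {

    // Only the elements that appear in the sample report table.
    private static final Set<String> ALLOWED_ELEMENTS =
            Set.of("table", "thead", "tbody", "tr", "th", "td");

    // Only the attributes that appear in the sample report table.
    private static final Set<String> ALLOWED_ATTRIBUTES =
            Set.of("id", "class", "style");

    static boolean isAcceptable(String html) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse any DOCTYPE, which also rules out entity declarations.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);

            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(html)));

            // "*" matches every element in the document, including the root.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                if (!ALLOWED_ELEMENTS.contains(element.getTagName())) {
                    return false;
                }
                NamedNodeMap attributes = element.getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    Node attribute = attributes.item(j);
                    if (!ALLOWED_ATTRIBUTES.contains(attribute.getNodeName())) {
                        return false;
                    }
                    // Naive check only: reject inline styles that reference a URL.
                    if ("style".equals(attribute.getNodeName())
                            && attribute.getNodeValue().toLowerCase().contains("url(")) {
                        return false;
                    }
                }
            }
            return true;
        } catch (Exception e) {
            // Malformed XML, a rejected DOCTYPE, or a parser configuration problem.
            return false;
        }
    }
}
```

With this sketch, a string such as `<table><tr><td><script>alert(1)</script></td></tr></table>` would be rejected before it ever reaches the PDF renderer, while well-formed table markup like the sample would pass.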
I have searched for known security issues in the Flying Saucer library but couldn't find anything. I did, however, find two related questions on this site: "How to inject malicious code in pdf or jpeg" and "Detecting malicious javascript in PDF".