
XML Parsing Error: not well-formed
Location: https://awstats.example.org/reports/www.example.org/2011/06/awstats.www.example.org.xml
Line Number 603, Column 34:

<tr><td class="aws">- Toile du Qu\uffffbec</td><td>363</td><td>363</td></tr>

The above isn't quite how it renders, thanks to Markdown weirdness; instead you get the Unicode fail-box for FFFF. I'm not sure why this is a problem, as vim renders it fine, and the document itself declares:

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Is there a setting in awstats I need to enable to properly handle non-US characters?

jldugger

2 Answers


You can switch AWStats to build plain HTML 4.01 Transitional pages instead of XHTML with the following setting:

(from awstats.model.conf)

# If you prefer having the report output pages be built as XML compliant pages
# instead of simple HTML pages, you can set this to 'xhtml' (May not work
# properly with old browsers).
# Change : Effective immediatly
# Possible values: html or xhtml
# Default: html
#
BuildReportFormat=html
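
If you manage several AWStats config files, the change can be scripted. This is a minimal sketch that assumes the directive sits on its own line as `BuildReportFormat=...`, as in the model file above; the function name `set_build_report_format` is hypothetical, not part of AWStats.

```python
import re

# Matches an uncommented BuildReportFormat line (hypothetical helper, not AWStats code).
_DIRECTIVE = re.compile(r"^[ \t]*BuildReportFormat[ \t]*=[^\r\n]*", re.MULTILINE)


def set_build_report_format(conf_text: str, value: str = "html") -> str:
    """Return conf_text with BuildReportFormat set to value.

    Comment lines such as '# Default: html' are left alone because they
    start with '#'. If no directive exists, one is appended at the end.
    """
    if value not in ("html", "xhtml"):
        raise ValueError("BuildReportFormat must be 'html' or 'xhtml'")
    new_text, count = _DIRECTIVE.subn(f"BuildReportFormat={value}", conf_text)
    if count == 0:
        # Make sure the appended directive starts on its own line.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"BuildReportFormat={value}\n"
    return new_text


sample = "# Default: html\n#\nBuildReportFormat=xhtml\n"
print(set_build_report_format(sample))
```

To apply it to a real file you could read and rewrite it with `pathlib.Path.read_text` and `write_text`; the path of your site's config (for example something like `/etc/awstats/awstats.www.example.org.conf`) is an assumption that depends on your installation.
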
emsearcy

That DOCTYPE declares the page as XHTML 1.0 Strict, and the browser is parsing the report as XML (hence the "XML Parsing Error"). XML requires a parser to stop with a fatal error on any well-formedness problem instead of degrading gracefully, which is one major reason why hardly anyone really uses XHTML (Strict). XHTML is an XML vocabulary, so any valid XHTML document must also be a well-formed XML document. The XML specification does not allow the Unicode noncharacters U+FFFE and U+FFFF (nor the surrogate code points U+D800 to U+DFFF) anywhere in a document. See here.
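
To see exactly which characters trip the parser, you can check the text against the XML 1.0 `Char` production directly. This is a minimal sketch; `is_xml_char` and `find_invalid_xml_chars` are hypothetical helper names, and columns are counted in code points, which may not match how a particular browser counts them.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text: str) -> list:
    """Return (line, column, 'U+XXXX') tuples, 1-based, for forbidden characters."""
    problems = []
    # Split only on '\n': str.splitlines() would also treat some forbidden
    # control characters (e.g. U+000B) as line breaks and silently drop them.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if not is_xml_char(ord(ch)):
                problems.append((line_no, col_no, f"U+{ord(ch):04X}"))
    return problems


row = '<tr><td class="aws">- Toile du Qu\uffffbec</td><td>363</td><td>363</td></tr>'
print(find_invalid_xml_chars(row))  # → [(1, 34, 'U+FFFF')]
```

The reported column 34 lines up with "Column 34" in the error message, assuming that table row starts at the very beginning of line 603 in the report file.
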

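I don't know whether there is anything you can do to fix AWStats itself. As a test, you could change the DOCTYPE to something other than XHTML Strict, such as HTML 4.01 or HTML5; the page also has to be served as text/html rather than as XML for the browser to fall back to its forgiving HTML parser. Then, instead of nothing but an error, the browser may still show something. Replacing the 0xFFFF character with a numeric character reference (&#xFFFF;) will not help, because XML forbids references to that code point too; replace it with the intended character or a reference to it instead. At any rate, I wonder why you have 0xFFFF there at all. It looks like it is supposed to be an accented e (é, U+00E9, as in "Québec"), which is certainly not 0xFFFF.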
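
One way to check which replacement an XML parser will accept is to feed the variants to Python's expat-based `xml.etree.ElementTree`. This sketch assumes Python's parser applies the same character rules as the browser here; `parses` and `ROW` are illustrative names.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed according to ElementTree (expat)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


ROW = '<tr><td class="aws">- Toile du Qu{}bec</td><td>363</td><td>363</td></tr>'

raw_ffff = ROW.format("\uffff")    # the literal character found in the report
ref_ffff = ROW.format("&#xFFFF;")  # a numeric reference to the same code point
ref_e9 = ROW.format("&#xE9;")      # a reference to é (U+00E9), the likely intended letter

for label, doc in [("raw U+FFFF", raw_ffff), ("&#xFFFF;", ref_ffff), ("&#xE9;", ref_e9)]:
    print(label, parses(doc))
```

Only the `&#xE9;` variant should parse; both forms of U+FFFF are rejected, which is why the fix has to restore the intended character rather than just escape the bad one.
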

dsh