Seeker Issue: PDF Parsing

Name: PDF Parsing
ID: 1
Project: Seeker
Type: Bug
Area: Code
Severity: Normal
Status: Open
Related URL:
Creator: Daniel Schmid
Created: 07/07/08 10:54 AM
Updated: 04/08/09 10:10 AM
Description: On certain PDFs (I could not determine what sets them apart), parsing during indexing fails with the error message
"An error occured while Parsing an XML document." at line 43 of pdf.cfc.
History: Created by ananda (Daniel Schmid) : 07/07/08 10:54 AM

Comment by cfjedimaster (Raymond Camden) : 07/07/08 11:05 AM
Can you list the names of a few of the bad PDFs? What OS are you on?

Comment by phipps_73 (Dave Phipps) : 04/08/09 10:05 AM
Have also seen this error. I haven't been able to track down which PDF file(s) caused it: the temp file is empty (perhaps that is the clue), and without the temp file content it is difficult to work out which file was being processed. I simply modified pdf.cfc to wrap the xmlParse call in a try/catch block, and in the catch I create an empty struct called myxml. This lets the indexing continue and ignore any "broken" files.
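
The workaround described above might look roughly like the sketch below. It is not the actual change made to pdf.cfc: the variable names rawXml (the text handed to xmlParse) and pdfPath (the file being indexed) are hypothetical, and so is the log file name "seeker". The cflog line is an addition that goes beyond the described workaround; it is there only so that a failing file can be identified later.

```cfml
<!--- Assumed names: rawXml is the XML text to parse, pdfPath is the PDF being indexed. --->
<cftry>
    <cfset myxml = xmlParse(rawXml)>
    <cfcatch type="any">
        <!--- Not part of the original workaround: record which file failed. --->
        <cflog file="seeker" type="warning" text="Could not parse XML for #pdfPath#: #cfcatch.message#">
        <!--- Fall back to an empty struct so indexing can continue. --->
        <cfset myxml = structNew()>
    </cfcatch>
</cftry>
```

Any code after this block that reads keys from myxml would need to tolerate the empty struct, for example by checking structKeyExists first.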

Comment by cfjedimaster (Raymond Camden) : 04/08/09 10:10 AM
What I think I'll do is simply update the pdf in there with the latest from pdfUtils. That may fix it.
