Prince PDF Errors exporting a book imported from Lumen/OpenStax


#1

Pressbooks Version: 5.2
Server: RHEL7
PHP: 7.1
PHP MAX EXEC time: 600
PHP upload_max_filesize = 200M
PHP post_max_size = 200M
WordPress: 4.9.5
All themes at latest versions
Pressbooks PDF Export Method: Prince 11.3
Book Theme: McLuhan

Greetings:
We have a book that was imported from Lumen WXR XML into Pressbooks.
Prior to our update, this book exported to PDF successfully (at Pressbooks version 4.9).

We receive a “could not load input file” error from Prince when we process the book through Pressbooks.

I ran comparisons on the XML and CSS for the book, comparing them to other similar books that export successfully. I found no differences.

I debugged the XML and found that Prince did not like a Unicode Line Separator character (U+2028) in the markup. I stripped that out and also removed the LF characters. Strangely, other books we publish have these same characters in their XML.
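For anyone hitting the same character, the stripping step can be sketched in a few lines of Python. This is only a minimal sketch, not the exact procedure used on this book: the function names and file paths are hypothetical, the file is assumed to be UTF-8, and only U+2028 (LINE SEPARATOR) is removed, so ordinary line feeds are left alone.

```python
LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR


def strip_line_separators(text: str) -> str:
    """Return text with every U+2028 character removed."""
    return text.replace(LINE_SEPARATOR, "")


def clean_file(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst and return how many characters were removed.

    Assumes the export is UTF-8. newline="" keeps the existing line endings untouched.
    """
    with open(src, encoding="utf-8", newline="") as handle:
        original = handle.read()
    cleaned = strip_line_separators(original)
    with open(dst, "w", encoding="utf-8", newline="") as handle:
        handle.write(cleaned)
    return len(original) - len(cleaned)
```

Calling `clean_file("book.xml", "book-clean.xml")` (both names are placeholders) writes a copy rather than editing in place, so the original export stays available for comparison.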

That error disappeared, but we still received ‘failed to load input file’ errors.

I tried to process the XML via Prince command line (both with and without the stylesheet), and received more ‘robust’ errors:

[root@prodliblinux rawprincetests]# prince -s mcluhan.css IntroductiontoPsychologyPartIItabsremoved2.xml -o IntroductiontoPsychologyPartIItabsremoved2.pdf
prince: IntroductiontoPsychologyPartIItabsremoved2.xml:1: error: attributes construct error
prince: IntroductiontoPsychologyPartIItabsremoved2.xml:1: error: Couldn't find end of Start Tag rss line 1
prince: IntroductiontoPsychologyPartIItabsremoved2.xml:1: error: Extra content at the end of the document
prince: IntroductiontoPsychologyPartIItabsremoved2.xml: error: could not load input file
prince: error: failed to load all input documents
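One way to double-check errors like these outside Prince is to run the same file through another XML parser and see where it stops. The sketch below uses Python's standard library parser, so the wording of the message will differ from Prince's output; the function name is hypothetical, and it only reports the first well-formedness error it meets.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def first_xml_error(data: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None.

    The line is 1-based; the column is reported as the parser gives it.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

Reading the export with `open(path, "rb").read()` and passing the bytes in lets the parser honour the encoding declared in the file. A result pointing at line 1 would agree with Prince's complaint about the opening `rss` tag.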

I can’t upload the XML file here, but I’d be happy to post or send it for ‘another pair of eyes’. It looks like the problem lies in the RSS markup at the end of the XML file. Any thoughts?


#2

@rootl Can you email me the file? ned@pressbooks.com. Thanks.


#3

Hi @rootl, I can reproduce this with the file you sent me. Can you visit the book URL while logged into your network, and append the following: /format/xhtml

So, if your book URL is https://mypressbooksnetwork.org/mybook you should visit https://mypressbooksnetwork.org/mybook/format/xhtml.

This should load an XHTML file of the book as it is sent to Prince for conversion. In my local development environment, I get a 504 Gateway Time-out (likely because of the size of the book). I’m trying to boost the PHP max execution time and FastCGI read timeout on my development environment (I’m using nginx) to see if I can get it to work. Basically it looks like Prince is trying to load the document and timing out. If you get an error visiting that URL as well, it’s a good starting point for further troubleshooting.
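If it helps to repeat this check from a script rather than a browser, here is a rough Python sketch. The helper names are made up for illustration, and the `cookie` argument is an assumption about how a logged-in session could be passed along; without a valid session the request may simply be redirected or refused rather than reproduce the timeout.

```python
import urllib.error
import urllib.request
from typing import Optional


def xhtml_export_url(book_url: str) -> str:
    """Append /format/xhtml to a book URL, tolerating a trailing slash."""
    return book_url.rstrip("/") + "/format/xhtml"


def probe_export(book_url: str, timeout_seconds: float,
                 cookie: Optional[str] = None) -> int:
    """Request the XHTML export and return the HTTP status code.

    `cookie` is a hypothetical Cookie header value from a logged-in session.
    A client-side timeout is not caught here and will raise.
    """
    request = urllib.request.Request(xhtml_export_url(book_url))
    if cookie:
        request.add_header("Cookie", cookie)
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Server-side failures such as a 504 from the gateway end up here.
        return err.code
```

A returned 504 would match the gateway time-out described above, while a 200 after a long wait suggests the export itself works and only the timeouts are too short.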


#4

Just as a follow-up, I increased max execution time from 180s to 300s and was able to load the /format/xhtml URL. Then I added a parameter to Prince to increase Prince’s timeout value to match PHP’s timeout value (see https://www.princexml.com/doc/command-line/#cmd-network, the --http-timeout option). I’m going to open an issue so that we match Prince’s timeout to PHP’s timeout, which should resolve this in a future release.
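For testing the same idea by hand, a command line along these lines could be assembled. This is a sketch under assumptions, not how Pressbooks builds the call internally: the helper name is hypothetical, `-s` and `-o` are used as in the command earlier in this thread, and the option is assumed to take a value in seconds in the form `--http-timeout=SECONDS`, as described on the linked Prince page.

```python
from typing import List, Optional


def build_prince_command(source: str, output: str, timeout_seconds: int,
                         stylesheet: Optional[str] = None) -> List[str]:
    """Build a Prince argument list whose HTTP timeout matches the PHP limit.

    `source` may be a local file or the /format/xhtml URL of the book.
    """
    command = ["prince", f"--http-timeout={timeout_seconds}"]
    if stylesheet:
        command += ["-s", stylesheet]
    command += [source, "-o", output]
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Keeping the timeout as a single argument makes it easy to pass in whatever value PHP's max execution time is set to (300 in the test described above).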


#5

@rootl pull request: https://github.com/pressbooks/pressbooks/pull/1248