Friday, June 26, 2009

Converting RTF to PDF in an Oracle Database

There are a number of commercial products out there which converts RTF to PDF, but there is also a free alternative called Ted. Actually, calling this a converter is a gross understatement, it is a fully fledged RTF editor which also can convert RTF to PDF.

I will describe how to install Ted, and how to use it with RTF-documents stored in the database. This combined with generating dynamic RTF-documents can give a rich document solution for your Oracle Apex applications.

This is a Linux/Unix specific solution. So if you are stuck with some other operating system, go do something about it ;-)

Downloading and Installing Ted
Ted runs on Unix/Linux, and you can download the software here. Ted is released under GNU Public License, and restrictions may apply.

The easiest way of installing Ted is to use RPM package file.
  • Download RPM package file to /tmp
  • Install RPM-package as user root (or system user with sufficient privileges)
    rpm -ivh .rpm
If you get an error stating that packages are missing, do not despair, simply download and install the package indicated (or get it from your installation CD). The installation is pretty much the same as described in the bullets above.

Preparing os for RTF conversion
Download Ted the rtf2pdf.sh script which converts rtf-files to pdf-files.
  • Create an os-directory:
    mkdir /u01/app/ted
  • Copy the Ted conversion script to the directory
    cp /u01/stage/ted/rtf2pdf.sh /u01/app/ted/.
  • Grant execute privileges on the script
    chmod +x rtf2pdf.sh
I have done the above operations with user oracle of group oinstall, and in this demonstration user oracle will be used to execute os-operations. This is probably not the best production solution, but will serve for my demonstration purposes.

Creating directory in the Oracle database
Defining a directory in Oracle is quite easy:
  • Connect as user SYS
  • Create directory:
    create directory RTF2PDF as '/u01/app/ted';
  • Grant privileges to RTF2PFDF
    grant all on directory RTF2PDF to <schema_user>
How to execute os-commands from the Oracle database
There are basicly three alternatives; dbms_scheduler, Java and C. The three alternatives and use of dbms_scheduler is described by Steven Feuerstein here. He also has a good description of the Java and C approaches.

dbms_scheduler is the superior alternative to me, it's easy to use, and as opposed to Java, is compatible with Oracle XE (even though there are some issues with this release). The only beef with dbms_scheduler are the implicit commits, I have not investigated when or why any further as I have barely scratched the surface of dbms_scheduler. I'll get around to it some rainy day.

What actually made me revisit dbms_scheduler was this recent blog post by Tobias Arnold, following feeds like orna.info and apexblogs.info has proven invaluable for a monkey like me :-)

Dr. Tim Hall has an excellent site called oracle-base.com with loads of examples and articles on the Oracle database. My dbms_scheduler code and setup is based on his article Scheduler Enhancements in Oracle Database 11g Release 1.

Preparing for use of dbms_scheduler
Before your schema user can take advantage of dbms_scheduler and execute host commands, there are a few steps that must be done:
  • Connect as user sys
  • Grant create job to schema user:
    grant create job to <schema_user>;
  • Grant the ability to execute host commands/external jobs to schema user:
    grant create external job to <schema_user>
  • Create credential (this is new in 11g, and saves a lot of work):
begin
-- basic credential.
dbms_scheduler.create_credential(
credential_name => 'LOCALORACLE',
username => '',
password => '');
end;
  • Grant credential to schema user:
    grant execute on LOCALORACLE to <schema_user>
The os-user specified must have execution privileges on rtf2pdf.sh, and r+w on the directory where the script resides for this example.

PL/SQL code to invoke conversion
I have written a sample package for converting a rtf-document to a pdf-document. Sadly, I can't use Oracle's hosted environment on apex.oracle.com to showcase the functionality. I do not have the privileges to perform the required steps described above, and rightly so. Feel free to download and test it yourself. I have uploaded the package to my sample application at apex.oracle.com, you can download the sample code here.

The code is for demonstration purposes only, so my usual disclaimer apply: I take no responsibility what so ever for what might happen when you use it and there is no warranty of any kind expressed or implied. Phew... I still hope for world domination when you compile the package though ;-)

The package contains procedures/functions for file handling and invoking rtf2pdf.sh, but can easily be adapted for a more generic use.

Converting RTF to PDF
Lets say you have a table called RTF_DOCUMENTS that looks like this:
create table rtf_document
(
rtf_document_id number not null,
file_name varchar2 (255) not null,
rtf_file clob,
pdf_file blob,
constraint rtf_document_pk primary key (rtf_document_id)
);

And you want to create a procedure to convert the contents of column RTF_FILE to a pdf-document in column PDF_DOCUMENT, you can use the following procedure:
create or replace procedure convert_rtf_doc (
p_rtf_document_id in rtf_document.rtf_document_id%type
, p_dir_name in varchar2 default 'RTF2PDF'
) is
cursor c_doc
is
select rtdo.rtf_document_id
, rtdo.file_name
, rtdo.rtf_file
, rtdo.pdf_file
from rtf_document rtdo
where rtdo.rtf_document_id = p_rtf_document_id;
l_pdf_file blob;
begin
for r_doc in c_doc
loop
l_pdf_file := rtf2pdf.rtf2pdf(p_rtf_file_name => r_doc.file_name
,p_rtf_file => r_doc.rtf_file
,p_dir_name => p_dir_name);
update rtf_document rtdo
set rtdo.pdf_file = l_pdf_file
where rtdo.rtf_document_id = p_rtf_document_id;
end loop;
end;
/

Beware of the implicit commit by dbms_scheduler! A select for update in c_doc cursor would result in "fetch out of sequence"-error.

Alternatives to Ted
There are always alternatives, this is not different. There are some Open Source projects out there, and this guy (that's sexist, but I still assume cfSearching is a he) has done some research on the topic. His angle was different from mine though, and he went the ooxml-way, which is not a stupid idea.

Open Office could be an alternative, and has a more active community surrounding it. The CLI surrounding the OO-libraries in server mode will probably do the job.

In conclusion
In all honesty, I have not yet investigated to which extent the rendering of pdf-files is correct based on the more complex elements of rtf. This, of course, I should have done first. For me the journey is the most fun when (as in this case) there are no expectations attached to the end solution.

Tightening the PL/SQl-code referenced here for production goes without saying!

No comments:

Post a Comment