
Friday, November 5, 2010

Paranoia 101 - or protecting your investment

In corporate application development, a versioning and backup strategy is pretty mandatory. But what about all those small personal projects? The only thing you can be really certain of when using a laptop/desktop for your pet projects, is that eventually your machine is going to die. How much work gets lost when it does?

I use SVN for version control as a rule. I also use the APEX export and splitter utility to capture my local Oracle APEX applications into subversion repositories. Sometimes I even remember to put some of my more important repositories on Amazon S3 for safe storage. The APEX export is (very) loosely based on a blog post by John Scott back in 2006.

A word of caution: This is just my naive approach, so use it at your own risk! Be sure to test that a restore actually works; that is just as important as the backup itself :-)

The Start of an "Automated Approach"
While the manual steps described above will get you there, it is also more work than I appreciate. In this post I will describe a more automated approach to ease your local APEX application backups (along with anything else you keep in your svn repository). It takes a bit of pain to set up, but once there, it is all plain sailing.

I do my development mainly on Windows, so all OS specifics described below are set in that context. The approach is viable for your favorite *nix OS as well; a bit of tweaking of the batch scripts should do it.

Subversion
You need a command-line tool to manipulate subversion repositories; download Subversion from http://subversion.apache.org/packages.html. I currently use the one from CollabNet, but it should not matter much which one you choose. There are a number of online resources on how to install svn and create repositories (yes, even for Windows...).

Add the Subversion bin directory to your PATH environment variable. After installation you should be able to execute svn commands from a cmd window:
svn help
svnadmin help

APEX Export Utility
This comes with the APEX installation files, and has the ability to extract single APEX applications (or whole instances) from a command line. An APEX splitter tool is also included, which together with a subversion repository enables you to track all the tiny changes in your applications.

In Oracle APEX 4.0, the utility accepts modern Oracle JDBC libraries; in APEX 3.2 you had to locate classes12.jar (which predates even the JDBC libraries that ship with Oracle XE). You can download all JDBC libraries here: http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html

Since technetwork is constantly shifting about: If the link does not work, try to locate the Software Download Index, and search for JDBC.

Amazon S3
You need to set up an Amazon account before you can use Amazon Simple Storage Service (S3). If you have ever ordered books from Amazon, you probably already have an account, but it may be necessary to confirm payment methods.

Storage prices are not that high (at the moment). My reasoning is that I will suffer more if I don't back up my stuff to safe storage than a tiny monthly bill could ever make me.

In order for this to work, you have to obtain an access key:
  • Log on to http://aws.amazon.com/account/
  • Go to Security Credentials
  • Go to Access Credentials
  • If you do not have any previously generated access keys, click Create a new Access Key
  • Copy both Access Key ID, and Secret Access Key

Be sure to keep this information to yourself.

Create a bucket where you would like your backups to be stored, and take a note of the name.

The thing with Amazon S3, is its interfaces (REST style HTTP or SOAP) and storage objects organized in buckets. This basically means that you will require a third party tool (or do some serious programming yourself) to easily use the service.

JetS3t - Synchronize
I settled for JetS3t. I will not vouch for this tool in any way. I do not know who develops it, I do not know if it sends my credentials to other people or companies. I basically just hope and pray.

When credentials can potentially be exploited in a very direct manner to get at your most precious data, or even to conduct illegal activities on your behalf, it is time to draw a deep breath. Maybe uploading your files manually wasn't such a bad idea anyway? As I said, I just hope and pray. I hope "someone" will scan the source code, and "someone" will scream if anomalies appear. I hope.

I use a part of JetS3t called Synchronize. It has a very easy CLI that suits my need for an automated backup process. Installation is just unzipping, but you have to tweak the properties a bit for it to work.

These are the changes I made in jets3t\configs\jets3t.properties-file:
s3service.default-bucket-location=EU (well, because it's the easiest for me)

And these are the changes in jets3t\configs\synchronize.properties
accesskey=<your accesskey>
secretkey=<your secret accesskey>

You can also point Synchronize to an encrypted file containing your access keys using the --credentials switch.

7-Zip
To zip my svn repository dumps, I use 7-Zip: a very easy, straightforward open-source zipping tool with a command-line interface. Make sure you can reach 7z from a command prompt before proceeding; otherwise you have to modify the batch scripts below and add the path.

Java
Make sure you have Java 1.5 or higher installed on your system, and that you can run java -version from a command prompt.

Batch scripts
This is where it gets interesting.

I start off by creating a directory where my backup scripts will reside. It will also serve as a staging area before files are uploaded to Amazon S3. This is the current structure I have:
  • backup
    • Apex (I need this directory for temporary check out/in)
    • oracle (this is where I copy the export/splitter utilities from <apex install dir>/apex/utilities/oracle)
      • apex
        • APEXExport.class
        • APEXExportSplitter.class
    • classes12.jar (jdbc drivers)

I then place four backup scripts directly into the backup directory.

backup_apex_app.bat
This script is the one responsible for checking out the APEX files from svn, extracting the current version of the APEX application from the database, and checking it into svn again. It takes one parameter (%1) containing the APEX application id. All other necessary parameters are set in master.bat below.
REM ******** Start Backup APEX apps ************
set APEX_APP_ID=%1

REM ** svn checkout
svn checkout file:///%APEX_REPOS_PATH% %APEX_SPLITTER_DIR%

REM ** Export APEX applications
set CLASSPATH=%CLASSPATH%;%APEX_BACKUP_DIR%\classes12.jar;%APEX_BACKUP_DIR%
java oracle.apex.APEXExport -db %DB_JDBC_CONN_STR% -user %DB_USER% -password %DB_USER_PWD% -applicationid %APEX_APP_ID%
java oracle.apex.APEXExportSplitter f%APEX_APP_ID%.sql

REM ** Copy files to svn directory
xcopy %APEX_BACKUP_DIR%\f%APEX_APP_ID%.sql %APEX_SPLITTER_DIR% /Y
xcopy %APEX_BACKUP_DIR%\f%APEX_APP_ID% %APEX_SPLITTER_DIR%\f%APEX_APP_ID% /S /E /Y

REM ** Remove superfluous files
rmdir %APEX_BACKUP_DIR%\f%APEX_APP_ID%\ /s /q
del %APEX_BACKUP_DIR%\f%APEX_APP_ID%.sql

REM ** Add and check in files to repository
cd %APEX_SPLITTER_DIR%
svn add * --force
svn ci -m "Automated backup"
cd %APEX_BACKUP_DIR%

backup_svn_repos.bat
This is the script responsible for dumping the svn repository and zipping the dump file. I suppose this could have been done with HOTCOPY as well. Old habits and such... You can include filters here to extract only the project you are interested in. All necessary parameters will be set from the master.bat script below.
REM ******** Start dumping and zipping repository ************

REM ** Removing old zip file
del %APEX_BACKUP_DIR%\%APEX_REPOS_DUMPFILE%.zip

REM ** Dumping svn repository
svnadmin dump %APEX_REPOS% > %APEX_BACKUP_DIR%\%APEX_REPOS_DUMPFILE%.dump

REM ** zipping repository dump
7z.exe a %APEX_REPOS_DUMPFILE%.zip %APEX_REPOS_DUMPFILE%.dump

REM ** Removing old dump file
del %APEX_BACKUP_DIR%\%APEX_REPOS_DUMPFILE%.dump
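For a *nix port of this dump-and-zip step, the same logic can be sketched with Python's standard library (a sketch under assumptions: svnadmin is on your PATH, zipfile stands in for 7-Zip, and all function and file names are my own):

```python
import subprocess
import zipfile

def dump_repository(repos_path, dump_path):
    """Equivalent of: svnadmin dump %APEX_REPOS% > dumpfile.dump.
    Assumes the svnadmin binary is on the PATH."""
    with open(dump_path, "wb") as out:
        subprocess.run(["svnadmin", "dump", repos_path], stdout=out, check=True)

def zip_dump(dump_path, zip_path):
    """Equivalent of: 7z.exe a dumpfile.zip dumpfile.dump,
    using Python's zipfile module instead of 7-Zip."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(dump_path)
    return zip_path
```

Deleting the uncompressed dump afterwards, as the batch script does, is left to the caller.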

backup_svn_to_s3.bat
This script is responsible for sending the zipped repository dump to Amazon S3.
REM ******** Backup SVN repository to S3 ************

REM ** Synchronizing svn repository dumpfile
%JETS3T_PATH%\bin\synchronize.bat UP %BUCKET_NAME%/%BUCKET_PATH% %APEX_BACKUP_DIR%\%APEX_REPOS_DUMPFILE%.zip

master.bat
This is the script which ties it all together. If you want to test the separate scripts, just comment out the scripts (rem) you don't want to call. Chances are there is going to be some issues with the parameters the first run, so testing in small stages is key when debugging (it produces quite a bit of output during a normal run).

In this script it is necessary for you to adjust all parameters to your own environment. Example values are included in the sample below.

rem *********** Setting environment parameters *******************

rem ** backup directories
set APEX_BACKUP_DIR=<your backup dir, like c:\development\backup>
set APEX_SPLITTER_DIR=%APEX_BACKUP_DIR%\apex

rem ** path to apex application directory inside svn repository
set APEX_REPOS=<the actual path of your svn repository, like c:\SVN\myapprepos>
set APEX_REPOS_PATH=<Full path to your APEX folder inside your svn repository, note the forward slashes, like c:/SVN/myapprepos/trunk/apex>
set APEX_REPOS_DUMPFILE=<name of your repository dump file, like svn_repos, without post-fix/file type>

rem ** connect string format host:port:sid
set DB_JDBC_CONN_STR=<jdbc connect string, like localhost:1521:orcl3>
set DB_USER=<username, like scott>
set DB_USER_PWD=<password, like tiger>

rem ** S3
set JETS3T_PATH=<full path to your jets3t unzip dir, like C:\development\backup\jets3t>
set BUCKET_NAME=<name of the Amazon S3 bucket you created earlier, like mysafebackup.backup.svn>
set BUCKET_PATH=<name of path inside bucket, like svn_repos>

rem ************ Calling batch files ************************
call backup_apex_app.bat <application id>
call backup_apex_app.bat <other application id>
call backup_svn_repos.bat
call backup_svn_to_s3.bat

pause

Good to Go!?!
The only thing remaining for fully automated backups is to schedule a regular execution of master.bat through the Windows Task Scheduler.

I know I felt a whole deal better after I started using this backup strategy :-)

Tuesday, March 2, 2010

DOCX Part II: How to Index Document Content With Oracle Text

Here I will demonstrate how to index content of CLOB data using Oracle Text. Other than the database objects used in the example, the post addresses the use of Oracle Text in a general way, and will work for any CLOB column. There is no ground breaking code here, but it completes the example on how to handle Microsoft Open Office XML Format (DOCX) documents. The examples and content here are based on my previous post.

The code here is tested with Oracle database release 11.1.0.6, but should work with releases down to 10g. When I say "tested", remember I am a developer, so it basically means it does compile and it did give the expected result on one run through...

See it in Action
I have included a search page in my demo application. Upload a DOCX document, and try it for yourself: Go to demo application.

About Oracle Text
Oracle Text (formerly Oracle interMedia) is Oracle's full-text retrieval technology, and part of the Standard, Enterprise and even XE editions of the database. See Oracle Text on oracle.com for more information.

In my case I will use it to index text content stored in a CLOB column, and utilize its search capabilities to enable users to retrieve relevant results based on search phrases.

Indexing CLOBs
Or BLOBs, or BFILEs, for that matter. It is very easy to get up and running with a text index. The following code is all that is needed to create one against the TEXT_CONTENT column of the SAMPLES_DOCX table:
create index samples_docx_ctx on samples_docx
(text_content)
indextype is ctxsys.context
/
That's it! If you want to get fancy, there are a number of options and ways to enhance the indexing, look at the documentation for more information on the subject (lexers, sections, stopword lists, etc.).

Searching with Oracle Text
With the text index in place, the next step is to use the index to retrieve search results. In my example I will use a technique called "Progressive Relaxation" as described by Roger Ford here. It uses a query template to implement a progressive relaxation of the search tokens.

The following script creates a package to handle and use the new text index:
create or replace package samples_docx_search_p
as
   -- result types
   type t_result_rec is record (
      score         number,
      filename      samples_docx.filename%type,
      snippet       varchar2(4000));
   type t_result_tab is table of t_result_rec;
   -- synchronize search index
   procedure sync_search_index (
      p_ctx_index_name     in          varchar2 default 'SAMPLES_DOCX_CTX'
   );
   -- search docx content
   function search_text (
      p_tokens    in          varchar2
   ) return t_result_tab pipelined;
end;
/

create or replace package body samples_docx_search_p
as
   -- synchronize search index
   procedure sync_search_index (
      p_ctx_index_name     in          varchar2 default 'SAMPLES_DOCX_CTX'
   ) is
   begin
      ctx_ddl.sync_index(p_ctx_index_name, '2M');
   end;
   -- search docx content
   function search_text (
      p_tokens    in          varchar2
   ) return t_result_tab pipelined
   as 
      l_max_rows         integer := 10;
      l_counter          integer := 0;
      l_ret_rec          t_result_rec;
      l_tokens           varchar2(4000) := lower(p_tokens); 
      l_query_template   varchar2(32000):='<query>'
                              ||chr(10)||'   <textquery> heregoesthetokens'
                              ||chr(10)||'     <progression>'
                              ||chr(10)||'       <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq>'
                              ||chr(10)||'       <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>'
                              ||chr(10)||'       <seq><rewrite>transform((TOKENS, "?{", "}", "AND"))</rewrite></seq>'
                              ||chr(10)||'       <seq><rewrite>transform((TOKENS, "{", "}", "OR"))</rewrite></seq>'
                              ||chr(10)||'       <seq><rewrite>transform((TOKENS, "?{", "}", "OR"))</rewrite></seq>'
                              ||chr(10)||'     </progression>'
                              ||chr(10)||'   </textquery>'
                              ||chr(10)||'</query>';
      l_query            varchar2(32000);
   begin
      if l_tokens is null
      then
         return;
      end if;
      -- create query template based on tokens
      l_query_template := replace(l_query_template, 'heregoesthetokens', l_tokens);
      -- restructure tokens for use with snippet
      l_tokens := replace(l_tokens, ' ', ' | ');
      for c in (select rowid
                     , filename 
                  from samples_docx 
                 where contains (text_content, l_query_template, 1) > 0)
      loop
         -- crude custom score, just listing the results as they come
         l_ret_rec.score := l_counter;
         l_ret_rec.filename := c.filename;
         -- create snippet (with tokenbased highlight) of content to return to the user
         ctx_doc.set_key_type('ROWID');
         l_ret_rec.snippet :=  ctx_doc.snippet('SAMPLES_DOCX_CTX', c.rowid, l_tokens, '<b>', '</b>', false);
         l_counter := l_counter + 1;
         -- return row
         pipe row (l_ret_rec);
         -- exit when max number of rows exceeded
         exit when l_counter >= l_max_rows;
      end loop;
   end;
end;
/
Some parts of the code are more cryptic than others.

First up is the "SYNC_SEARCH_INDEX" procedure. Its sole purpose is to synchronize the text index with the TEXT_CONTENT column. When you perform DML on the TEXT_CONTENT column, the changes do not automatically propagate to the text index (unless the index is explicitly told to do so). So, in this example, if you insert a new row, its TEXT_CONTENT will not show up in Oracle Text searches until you have told it to synchronize the index. Indexing can be a resource-demanding operation, so it makes sense to separate the table DML from the indexing job.

In short: Make sure you synchronize the text index after DML on the indexed column has been performed.

Next is the "SEARCH_TEXT" function, particularly the query template. In the past I have used Oracle Text in its most simple form, but query templates add a new dimension to controlling your searches. Basically you define how you want to use the search tokens (search phrase) to find matches in the text index. The template is written in XML form, and the <seq> tags state the order in which you want to process your criteria. For more information on query templates, take a look at the documentation.

A short explanation on the query template:
  • <rewrite>transform((TOKENS, "{", "}", " "))</rewrite>: Matches when whole phrase (tokens in order) is present within document
  • <rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite>: Matches when each token is present within document
  • <rewrite>transform((TOKENS, "?{", "}", "AND"))</rewrite>: Matches when each token is present within the document, allowing typos (fuzzy search)
  • <rewrite>transform((TOKENS, "{", "}", "OR"))</rewrite>: Matches when any of the tokens are present within document
  • <rewrite>transform((TOKENS, "?{", "}", "OR"))</rewrite>: Matches when any of the tokens are present within the document, allowing typos (fuzzy search)
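To make the template mechanics concrete, here is the token substitution sketched in Python (illustration only; the function name build_query_template is my own, and the real work happens in the PL/SQL package above):

```python
# Progressive relaxation template: each <seq> step is tried in order,
# from exact phrase down to fuzzy OR matching.
QUERY_TEMPLATE = """<query>
   <textquery> {tokens}
     <progression>
       <seq><rewrite>transform((TOKENS, "{{", "}}", " "))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{{", "}}", "AND"))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "?{{", "}}", "AND"))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{{", "}}", "OR"))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "?{{", "}}", "OR"))</rewrite></seq>
     </progression>
   </textquery>
</query>"""

def build_query_template(search_phrase):
    """Mirror of the PL/SQL replace(l_query_template, 'heregoesthetokens',
    l_tokens): lowercase the phrase and drop it into the template."""
    return QUERY_TEMPLATE.format(tokens=search_phrase.lower())
```

The resulting string is what gets passed as the second argument to CONTAINS.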

Although it will return matches according to the sequence, the "score" within each sequence step appears random (as far as I could see, at least), so the "score" column in the example above needs a bit of work. I also learned the hard way that the structure of the XML in the query template is not validated; it swallows typos without a word.

Test the Code
You can test the code directly from SQL*Plus or your favorite weapon of choice. The following code will search for "test":
select filename
,      snippet
from   table(samples_docx_search_p.search_text('test'))
order  by score
/

Wildcards
My query template does not support searching for parts of words. There are some reasons for this behavior:
  • Wildcard searches require more resources
  • The results returned will be unpredictable. Imagine searching for "ora*", both "oracle" and "orange" will match.
  • Wildcard searches are more suitable for look-ups in structured data
What did I mean by that last statement? As I see it, users today use a wide range of search engines and site searches every day, and wildcard matching is in most cases available only through the "advanced search" link (which is rarely used, if ever). People simply do not expect this kind of behavior anymore. If you want to help the user, add ajax autocomplete to your search field, and use the Oracle Text index. In the traditional back-office application it is another matter (look-ups and such).

Create a Search Page in Oracle APEX
I will not do a detailed explanation here, but creating a report based on a query will do the trick. If you choose "Yes" for "Enable search" in the wizard, it will save you some work by creating the search dialog on the page. Clean up the query in the report region so that your query looks something like this:
select filename
,      snippet
from   table(samples_docx_search_p.search_text(:PXX_REPORT_SEARCH))
order  by score
By keeping the order by, I assume that you will let the score column determine the result order. For good measure you can create a row template for the result representation in the APEX report, so you can have the heading and snippet spread over two rows.

Update Sample Upload Page
If you created the upload page described in my previous post, include a new page process (After Submit, and after the "extract_text"-process). The process should be of PL/SQL-type, and the code should look something like this:
samples_docx_search_p.sync_search_index;
It does not necessarily have to be a page process, but it is important to synchronize the text index after DML.

How to Handle DOC and DOCX
...And PDF, XLS, PPT, etc. Well, DOCX is the exception here. Oracle Text does not understand the format yet (unless you are running release 11.1.0.7 or higher). If you create an Oracle Text index on a BLOB column containing other file types it does understand, they will be indexed automagically. If you want to index the content of a DOCX document in the same index, use the technique described in my last post, convert the output (plain text) into a BLOB, and insert the BLOB into the table with the other documents. Or, said in a different way: you accept and handle DOCX documents differently than the known document types.

Querying an Oracle Text index works the same for BLOB and CLOB columns.

There is definitely room for improvement in the code, but it serves well enough for demonstration purposes.

Enjoy :-)

Monday, March 1, 2010

DOCX Part I: How to Extract Document Content as Plain Text

In this post I will demonstrate how to extract plain text from Microsoft Open Office XML Format (DOCX) documents stored as a BLOB data type in an Oracle database. It is part one of two posts concerning how to extract and index unstructured content stored in DOCX files. I found snippets of solutions several places, but no complete copy-paste solution, so perhaps this will save some of you a bit of work.

The solution is tested on Oracle database Enterprise Edition release 11.1.0.6 with Apex 3.2.1 running on Windows XP, but should work down to 10g Standard Edition (not Oracle XE, since part of the code is in Java).

See it In Action
I have updated my demo application with a couple of pages so you can try it out yourself: Go to demo application. There is limited space available at apex.oracle.com, so please take care when uploading (and it is very easy to test in your own environment).


When will You Ever Need This?
If you have an Oracle database version prior to 11.1.0.7, Oracle Text is not able to index DOCX documents (as it does DOC documents). From version 11.1.0.7 and on, Oracle uses technology from Stellent, and DOCX is indexed as the other formats. So if you want to index DOCX text content in Oracle Text in a version prior to 11.1.0.7, then this could be a way to do it.

Oracle is working on a back-port to 10.x, but I have no status on when (or if) this will be available. Microsoft Office 2007 has been around since... take a wild guess! So these things obviously take some time.

About Microsoft Open Office XML Format
As seen through the eyes of a PL/SQL developer, that is. First of all, XML sounds promising; you can do a ton of things with valid XML in an Oracle database. You get the first nasty surprise when opening a DOCX document in your favorite text editor: it is a zip archive! The next comes when you realize that utl_compress can't help you uncompress it either.

So Google is next, and after realizing this cannot easily be done in pure PL/SQL, Google yields this gem from Ted Neward. It is a DOCX (Open XML) walkthrough as seen through the eyes of a Java developer. Very educational.
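You can verify the zip-archive observation yourself outside the database; in Python, for instance, the standard zipfile module opens a DOCX directly (a quick illustration only, not part of the PL/SQL solution; the function name is my own):

```python
import zipfile

def list_docx_entries(path):
    """A DOCX file is an ordinary zip archive; list its members.
    The text content lives in the 'word/document.xml' entry."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```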

How to Unzip in PL/SQL
You probably can, but that would also involve a lot of work. The easy way is to take hold of a Java method that already does what you want. After searching the net I came up with this post from peterv6i, which does exactly what I want (and more; it can also add files (BLOBs) to an existing zip archive).

My short version (as I only need to extract content) follows below. The script creates a Java class with a method to extract a file from a zip archive, and a PL/SQL wrapper for the getFileFromZip method. The database user must have the JAVA_DEPLOY and JAVAUSERPRIV (I think) roles.
create or replace java source named "ZIPImpl" 
AS 
import java.io.*; 
import java.util.zip.*; 
import java.sql.*; 
import oracle.sql.*; 
public class ZIPImpl 
{ 
public static void getFileFromZip(oracle.sql.BLOB srcBlob, oracle.sql.BLOB dstBlob[], java.lang.String name) { 
try { 
   OutputStream outBuffer = dstBlob[0].getBinaryOutputStream(); 
   InputStream inBuffer = srcBlob.getBinaryStream(); 
   ZipInputStream zip = new ZipInputStream(inBuffer); 
   ZipEntry entry; 
   byte[] tmpBuffer = new byte[2048]; 
   while((entry = zip.getNextEntry()) != null) { 
      if (entry.getName().compareTo(name)==0) { 
         int n; 
         while ((n = zip.read(tmpBuffer)) >= 0) 
           outBuffer.write(tmpBuffer, 0, n); 
      } 
   } 
   outBuffer.close(); 
 } 
 catch (SQLException e) { 
   System.err.println(e); 
 } 
 catch (IOException e) { 
   System.err.println(e); 
 } 
} 
}; 
/

alter java source "ZIPImpl" compile 
/      

create or replace package zip as 
   procedure unzip( 
      p_src       in          blob
   ,  p_dst       in out      blob
   ,  p_filename  in          varchar2); 
end; 
/

create or replace package body zip as 
   procedure unzip( 
      p_src       in          blob
   ,  p_dst       in out      blob
   ,  p_filename  in          varchar2) 
   as language java 
   name 'ZIPImpl.getFileFromZip(oracle.sql.BLOB, oracle.sql.BLOB[], java.lang.String)'; 
end; 
/ 
How to Extract Plain Text Content from DOCX
So, now you have a procedure to extract a file from a zip archive; next is to attack the content of the file. The DOCX file consists of several files, but the text content of the document resides in "word/document.xml". As this file is an XML document, I will perform an XSL transformation on it to be left with only plain text.
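As a sanity check outside the database, the same extraction idea can be sketched with Python's ElementTree instead of XSL: gather the w:t text nodes per w:p paragraph (the function name is hypothetical; the actual solution uses the XSL transformation in the package below):

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, same as the w: prefix in the XSL.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_xml_to_text(document_xml):
    """Collect the w:t text nodes per w:p paragraph, joined with newlines,
    mirroring what the XSL does with its linebreak hack."""
    root = ET.fromstring(document_xml)
    lines = []
    for para in root.iter(W_NS + "p"):
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)
```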

The script below creates a table SAMPLES_DOCX which will hold the original DOCX file stored as a BLOB, and a CLOB containing the plain text. It also creates a package with three procedures:
  • Store the DOCX (as uploaded into APEX_APPLICATION_FILES)
  • Download the original file (strictly not necessary for this exercise, but nice to have all the same)
  • Extract and store plain text from DOCX

The XSL used to transform the document.xml file is an (extremely) abbreviated version of the XSL posted here in the Oracle Technical Forums.
create table samples_docx (
   filename       varchar2(255) not null
,  mime_type      varchar2(255)
,  orig_file      blob
,  text_content   clob)
/

alter table samples_docx add constraint samples_docx_pk primary key (filename)
/

create or replace package sample_docx_p as
   -- get file from apex_application_files and store it in samples_docx table
   procedure store_docx (
      p_file_name    in       varchar2);
   -- download procedure for original docx file
   procedure retrieve_docx (
      p_file_name    in       varchar2);
   -- extract plain text from docx file (this is the meat)
   procedure store_text (
      p_file_name    in       varchar2);
end;
/

create or replace package body sample_docx_p as
   -- get file from apex_application_files and store it in samples_docx table
   procedure store_docx (
      p_file_name    in       varchar2)
   is
      l_file         blob;
      l_mime_type    apex_application_files.mime_type%type;
      l_name         apex_application_files.name%type;
   begin
      -- get file from apex files
      select  name
            , mime_type
            , blob_content
       into   l_name
            , l_mime_type
            , l_file
       from   apex_application_files
      where   name = p_file_name;
      -- insert record into samples table
      insert into samples_docx ( filename
                               , mime_type
                               , orig_file)
       values   (   l_name
                  , l_mime_type
                  , l_file);
      -- delete file from apex files when done
      delete from   apex_application_files
           where   name = p_file_name;
   end store_docx;
   -- download procedure for original docx file
   procedure retrieve_docx (
      p_file_name    in       varchar2)
   is
      l_file         blob;
      l_mime_type    apex_application_files.mime_type%type;
      l_name         apex_application_files.name%type;
      l_size         number;
   begin
      -- get file from apex files
      select  filename
            , mime_type
            , orig_file
            , dbms_lob.getlength(orig_file)
       into   l_name
            , l_mime_type
            , l_file
            , l_size
       from samples_docx   
      where   filename = p_file_name;
      -- return file
      owa_util.mime_header( nvl(l_mime_type,'application/octet-stream'), false);
      htp.p('Content-length: ' || l_size);
      htp.p('Content-Disposition:  attachment; filename="'||replace(replace(substr(l_name,instr(l_name,'/')+1),chr(10),null),chr(13),null)|| '"');
      owa_util.http_header_close;
      wpg_docload.download_file(l_file);
   end retrieve_docx;
   -- perform xsl tranformation of document.xml
   function get_text (
      p_docx_xml     in       xmltype
   ) return clob
   is
   l_clob   clob;
   -- xsl monkeyed from http://forums.oracle.com/forums/thread.jspa?messageID=3368284
   -- abbreviated quite a bit, check out original posting by "user304344" for the original
   l_xsl    xmltype := xmltype('<?xml version="1.0" encoding="utf-8"?>'
                     ||chr(10)||'<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
                     ||chr(10)||'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
                     ||chr(10)||'xmlns:v="urn:schemas-microsoft-com:vml"'
                     ||chr(10)||'exclude-result-prefixes="w v">'
                     ||chr(10)||'<xsl:output method="text" indent="no" encoding="UTF-8" version="1.0"/>'
                     ||chr(10)||'<!-- document root -->'
                     ||chr(10)||'<xsl:template match="/">'
                     ||chr(10)||'<!-- root element in document --> '
                     ||chr(10)||'<xsl:apply-templates select="w:document"/> '
                     ||chr(10)||'</xsl:template>'
                     ||chr(10)||'<!-- ****************************start document**************************** -->'
                     ||chr(10)||'<xsl:template match="w:document">'
                     ||chr(10)||'<xsl:for-each select="//w:p">'
                     ||chr(10)||'<xsl:apply-templates select="*/w:t"/> '
                     ||chr(10)||'<xsl:text>|¤¤</xsl:text> '
                     ||chr(10)||'</xsl:for-each> '
                     ||chr(10)||'</xsl:template>'
                     ||chr(10)||'<!-- get all text nodes within a para -->'
                     ||chr(10)||'<xsl:template match="*/w:t">'
                     ||chr(10)||'<xsl:value-of select="."/>'
                     ||chr(10)||'</xsl:template>'
                     ||chr(10)||'<!-- **************************** end document**************************** -->'
                     ||chr(10)||'</xsl:stylesheet>');
   begin
      -- "|¤¤" is just a hack to get linebreaks, should be an easier way to achieve this
      select replace(xmltransform(p_docx_xml, l_xsl).GetClobVal(), '|¤¤', chr(10))
        into l_clob
        from dual;
      return l_clob;
   end;   
   -- extract plain text from docx file (this is the meat)
   procedure store_text (
      p_file_name    in       varchar2)
   is
      l_docx            blob;
      l_docx_unzip      blob;
      l_doc             clob;
      l_dest_offset     integer := 1;
      l_src_offset      integer := 1;
      l_lang_context    integer := dbms_lob.default_lang_ctx;
      l_warning         integer;
   begin
      -- get original file
      select orig_file
        into l_docx
        from samples_docx
       where filename = p_file_name;
      -- create lob locators
      dbms_lob.createtemporary(l_docx_unzip,false);
      dbms_lob.createtemporary(l_doc,false);
      -- use java to unzip the docx file and retrieve document.xml
      zip.unzip(l_docx, l_docx_unzip, 'word/document.xml');
      -- convert blob to clob
      dbms_lob.converttoclob
               ( dest_lob => l_doc
               , src_blob => l_docx_unzip
               , amount => dbms_lob.lobmaxsize
               , dest_offset => l_dest_offset
               , src_offset => l_src_offset
               , blob_csid => nls_charset_id('AL32UTF8') --in my case, it is stored as UTF8
               , lang_context => l_lang_context
               , warning => l_warning
               );
      -- transform clob via xsl to get clean text
      l_doc := get_text(xmltype(l_doc));
      -- update the column containing document text
      update samples_docx
         set text_content = l_doc
       where filename = p_file_name;
      -- clean lob locators, should be repeated in exception block
      dbms_lob.freetemporary(l_docx_unzip);
      dbms_lob.freetemporary(l_doc);
   end;
end;
/
Bringing it Together in Oracle APEX
To test the code, you can create a simple upload page in APEX:
  • Create new page
  • Form
  • Form on a procedure
  • <schema name> where you installed the application
  • Choose "sample_docx_p.store_docx" as stored procedure name
  • Accept defaults (or change to your liking)
  • Choose desired tab options
  • Leave Invoking Page/Button Label blank if you do not want an invoking page
  • Choose branch pages (can be the same as the one you are creating)
  • Accept defaults
  • Click Finish
  • Click Edit Page
  • Choose PXX_FILE_NAME, and change "Display as" from "Text item" to "File Browse..."
  • Apply changes

To add post processing to extract DOCX file content:
  • Click Create under Processes
  • Choose PL/SQL
  • Give the process a name like "extract_text"
  • In the "Enter PL/SQL Page Process" dialog, paste "sample_docx_p.store_text(:PXX_FILE_NAME);" (XX being your page number)
  • Create process (and ensure the process comes after the "Run Stored Procedure" process created by the page wizard)

The last process could be implemented as a trigger on the samples_docx table.
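A sketch of that trigger variant follows. Note that store_text as written above both reads and updates samples_docx, which would raise a mutating-table error if called from a row trigger on the same table. The sketch therefore assumes the unzip/convert/transform logic were refactored into a hypothetical function sample_docx_p.extract_text that takes the original blob and returns the plain-text clob, so the trigger can set :new.text_content directly:

```sql
-- Sketch only: assumes a hypothetical sample_docx_p.extract_text function
-- (blob in, clob out) refactored from store_text
create or replace trigger trg_samples_docx_biu
   before insert or update of orig_file
   on samples_docx
   for each row
begin
   -- setting :new directly avoids the mutating-table error a separate
   -- update from an after-row trigger would cause
   :new.text_content := sample_docx_p.extract_text(:new.orig_file);
end;
/
```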

Run the page and test by uploading a DOCX document (or test it in my demo application).

Building More
If you have a need to manipulate DOCX documents, this could probably be achieved by using the complete "ZIPImpl" as posted in this article, and using substitution variables in the same manner as I have described with RTF documents in an earlier post.

The zip/unzip may also come in handy when handling other office files (like Open Office ODF).

This is only a proof of concept, and the code is not production ready. Really! I'm not just saying that out of habit.

The next (and final) part of the DOCX "series" will address how to index the plain text content using Oracle Text.

Thursday, February 4, 2010

Nice URL in a Public Facing Oracle APEX Application

UPDATE: Try out Morten Bråten's excellent post on creating a REST web service with PL/SQL for the simplest answer to beautifying APEX URLs. The main difference between our solutions is that he uses a PL/SQL procedure to actually do the rewrite, while I use an Apache map file. If only I had known... Ah, well, you live, you learn.

The Oracle APEX URL construct can be a bit cryptic for the uninitiated, people and spiders alike. Here is a (a, not THE) technique to alleviate this. It is loosely based on a similar solution I made for an ancient Oracle Portal solution quite a few years back. You need an Apache (OHS/OAS or regular Apache) in front for this to work. The API is tested on Apache 2.2 and Oracle XE/11g, but it should run equally well on OHS/OAS (Apache 1.3-2.0).

Why Do You Need Nice URLs?
This is the place to start: why would you consider rewriting the native APEX URL? There are a few good reasons why, but generalizations won't work here. The question is why do YOU need this for YOUR application? Consider it, and reconsider it, because there are no silver bullets here. Any path you choose will inevitably cause more work, add complexity and even more gray hairs when your carefully thought out assumptions suddenly change. Not to mention adding overhead to your webserver when processing requests. Perhaps you will even start losing hair for all I know. The more sophisticated the application, the more work is added for beautifying the URLs.

For an intranet environment I can see no good reasons to do this. None.

The Alternatives
There are already some designs out there to help you on your way. Patrick Wolf has a list of the more popular ones. Kris Rice's post Better APEX URLs also uses a map file. The <iframe> solution is also worth considering, depending on your need. There are also some very creative Apache rewrite solutions described in the OTN APEX-forum.

My Attempt
Let's just call it that, an attempt. It did not turn out as streamlined as I had hoped for, but for my need (and the effort I was prepared to put into it), it suffices.

I wanted to create content based nice URLs, and the content in question here resides in an Oracle database table. I also wanted to control the output from the database to reduce the number of moving parts.

This is how it works:
  • You can define a "page type" based on a table with a primary key, and create mapping between APEX URL and a nice URL based on that
  • You can manually create mapping between APEX URL and a nice URL to fully control both APEX and nice URL
  • The APEX to nice URL mapping will be written to an Apache map file
  • Apache checks the content of the map file for a potential rewrite
  • If the map file changes, Apache will cache the new version automagically
See examples below the first script to get a feel for it. The rewrites will not get in the way of the normal APEX URLs, so it will not break existing functionality.

The Database Code
The following script will create two tables, one to hold the mapping between source table (where content resides) and the APEX page(s) where content will be displayed. The other table is where the actual nice to APEX URL-mapping resides, the content of this table is what gets written to the Apache rewrite map file.

The script will also create a directory object where the URL map file will be written, and a package to handle all the in-betweens. The schema user must have the appropriate privileges to create these objects for the script to succeed. Replace the path for the APACHECONF directory with your desired location.
-- change path to suit your environment
create or replace directory APACHECONF as 'C:\Oracle\OraXE\Apache\conf'
/

create table app_page_type (
   page_id              number
,  source_table         varchar2(32)
,  source_table_pk_col  varchar2(32)
,  url_path             varchar2(255)
,  pk_page_parameter    varchar2(255)
,  name_column          varchar2(32)
)
/

create unique index app_page_type_uk
   on app_page_type (page_id, source_table)
/   

create table app_page_url_mapping
(
   page_id             number not null
 , source_table        varchar2(32)
 , source_table_id     number
 , apex_url            varchar2(4000) not null
 , nice_url            varchar2(4000) not null
 , nice_title          varchar2(255) 
)
/

create unique index app_page_url_mapping_uk
   on app_page_url_mapping (page_id, source_table, source_table_id)
/   

create or replace package app_page_p
as
   -- create url-friendly construct, 255 chars long
   -- based on a name.
   function get_nice_name (
      p_name               in     varchar2
   ,  p_source_table_id    in     varchar2 default null
   ,  p_url_end            in     varchar2 default '.html'
   ) return varchar2;
   -- get nice url based on source table, can be used
   -- directly from APEX application
   function get_nice_url (
      p_page_id            in    number
   ,  p_source_table       in    varchar2
   ,  p_source_table_id    in    varchar2
   ) return varchar2;
   -- get nice url from APEX page_id, will only return
   -- mappings without source table defined
   function get_nice_url (
      p_page_id            in    number
   ) return varchar2;
   -- create mapping based on page type and source table
   procedure create_mapping (
      p_source_table       in       varchar2
   ,  p_source_table_id    in       varchar2
   ,  p_name               in       varchar2 default null);
   -- create complete manual mapping  
   procedure create_manual_mapping (
      p_page_id            in       number
   ,  p_apex_url           in       varchar2
   ,  p_nice_url           in       varchar2
   ,  p_nice_title         in       varchar2 default null);
   --remove mapping when source is removed  
   procedure remove_mapping (
      p_source_table       in       varchar2
   ,  p_source_table_id    in       number
   );
   -- remove manual mapping to page_id
   procedure remove_manual_mapping (
      p_page_id            in       number
   );
   -- synchronize map file with mapping table
   procedure synchronize_mapfile;
end;
/
          
create or replace package body app_page_p
as
   -- global variables
   -- change variables to suit your environment
   g_app_id       number        := 1000; --APEX app_id for public application
   g_app_name     varchar2(255) := 'My test app'; --descriptive name of application
   g_dad_path     varchar2(255) := '/pls/apex'; --dad or location
   g_map_file     varchar2(255) := 'map.txt'; --name of map file
   g_tmp_map_file varchar2(255) := 'map_tmp.txt'; --name of temporary map file
   g_map_dir      varchar2(255) := 'APACHECONF'; --name of directory
   -- get nice url from source_table
   function get_nice_url (
      p_page_id            in    number
   ,  p_source_table       in    varchar2
   ,  p_source_table_id    in    varchar2
   ) return varchar2 is
      l_ret    varchar2(4000); -- sized to match nice_url
   begin
      select tab.nice_url
        into l_ret
        from app_page_url_mapping tab
       where tab.page_id = p_page_id
         and tab.source_table = p_source_table
         and tab.source_table_id = p_source_table_id;
      return l_ret;
   exception
      when no_data_found
      then
         return null;
   end;
   -- get nice url from page_id, manually created
   function get_nice_url (
      p_page_id            in    number
   ) return varchar2 is
      l_ret    varchar2(4000); -- sized to match nice_url
   begin
      for r in (select tab.nice_url
                  from app_page_url_mapping tab
                 where tab.page_id = p_page_id
                   and tab.source_table_id is null)
      loop
         l_ret := r.nice_url;
         exit;
      end loop;
      return l_ret;
   exception
      when no_data_found
      then
         return null;
   end;
   -- create url-friendly construct, 255 chars long
   function get_nice_name (
      p_name               in     varchar2
   ,  p_source_table_id    in     varchar2 default null
   ,  p_url_end            in     varchar2 default '.html'
   ) return varchar2 is
      l_ret    varchar2(4000) := p_name;
      l_sub    number;
   begin
      -- special national translation (excerpt)
      l_ret := translate(lower(l_ret), 'æøå ', 'aoa_');
      l_ret := regexp_replace(l_ret, '([^[:alnum:]|_])', '', 1, 0, 'i');
      l_sub := nvl(length(to_char(p_source_table_id)||p_url_end),0);
      l_ret := substr(l_ret, 1, 255-l_sub);
      l_ret := l_ret||p_source_table_id||p_url_end;
      return l_ret;
   end;
   -- lookup column value for source_table and source_table_id  
   function get_source_name (
      p_source_table       in       varchar2
   ,  p_source_table_id    in       number
   ,  p_source_pk_col      in       varchar2
   ,  p_source_name_col    in       varchar2
   ) return varchar2
   is
      l_sql    varchar2(4000) :=   'select '||p_source_name_col||' pk_col'||chr(10)
                                 ||'  from '||p_source_table||chr(10)
                                 ||' where '||p_source_pk_col||' = :1';
      l_ret    varchar2(4000);
   begin
      execute immediate l_sql into l_ret using in p_source_table_id;
      return l_ret;
   end;
   -- create the actual mapping
   procedure create_mapping (
      p_source_table       in       varchar2
   ,  p_source_table_id    in       varchar2
   ,  p_name               in       varchar2 default null
   ) is
      l_orig_name     varchar2(4000);
      l_nice_name     varchar2(255);
      l_apex_url      varchar2(4000);
      l_nice_url      varchar2(4000); 
      l_nice_title    varchar2(4000);
   begin
      -- delete previous entries
      delete from app_page_url_mapping
       where source_table = p_source_table
         and source_table_id = p_source_table_id;  
      for r in (select pt.*
                  from app_page_type pt
                 where pt.source_table = p_source_table)
      loop
         if p_name is null
         then
            l_orig_name := get_source_name(r.source_table, p_source_table_id, r.source_table_pk_col, r.name_column);
         else
            l_orig_name := p_name;
         end if; 
         l_nice_name := get_nice_name(l_orig_name, p_source_table_id, '.html');
         l_apex_url  := g_dad_path||'/f?p='||g_app_id||':'||r.page_id||':0::::'||r.pk_page_parameter||':'||p_source_table_id;
         l_nice_url  := r.url_path||'/'||l_nice_name;
         l_nice_title := substr(g_app_name||' - '||l_orig_name, 1, 255);
         -- insert new entry
         insert into app_page_url_mapping (
            page_id
         ,  source_table
         ,  source_table_id
         ,  apex_url
         ,  nice_url
         ,  nice_title)
         values (
            r.page_id
         ,  r.source_table
         ,  p_source_table_id
         ,  l_apex_url
         ,  l_nice_url
         ,  l_nice_title);
      end loop; 
      -- synchronize url-mapping file
      synchronize_mapfile;      
   end;
   -- create manual mapping  
   procedure create_manual_mapping (
      p_page_id            in       number
   ,  p_apex_url           in       varchar2
   ,  p_nice_url           in       varchar2
   ,  p_nice_title         in       varchar2 default null  
   ) is
   begin
      -- educated guess on delete
      delete from app_page_url_mapping
       where page_id = p_page_id
         and source_table is null;
      -- insert new entry
      insert into app_page_url_mapping (
         page_id
      ,  apex_url
      ,  nice_url
      ,  nice_title)
      values (
         p_page_id
      ,  p_apex_url
      ,  p_nice_url
      ,  p_nice_title);
      -- synchronize map file
      synchronize_mapfile;
   end;
   procedure remove_mapping (
      p_source_table       in       varchar2
   ,  p_source_table_id    in       number
   ) is
   begin
      delete from app_page_url_mapping
       where source_table = p_source_table
         and source_table_id = p_source_table_id;
      synchronize_mapfile;
   end;
   procedure remove_manual_mapping (
      p_page_id            in       number
   ) is
   begin
      delete from app_page_url_mapping
       where page_id = p_page_id
         and source_table is null;
      synchronize_mapfile;
   end;
   -- normal write to file procedure
   procedure write_to_file (
      p_file_name       in          varchar2
   ,  p_directory       in          varchar2
   ,  p_content         in          clob
   ) is 
      l_file          utl_file.file_type;
      l_buffer        raw(32000);
      l_amount        binary_integer := 32000;
      l_pos           integer := 1;
      l_blob          blob;
      l_blob_left     number;
      l_blob_length   number;
      l_file_content  blob;
      l_src_offset    integer := 1;
      l_dest_offset   integer := 1;
      l_lang_ctx      integer := dbms_lob.default_lang_ctx;
      l_warn          integer;
   begin
      -- converting from clob to blob
      dbms_lob.createtemporary(l_file_content, false);
      dbms_lob.converttoblob(l_file_content, p_content, dbms_lob.getlength(p_content), l_dest_offset, l_src_offset, 1, l_lang_ctx, l_warn);
      l_blob_length := dbms_lob.getlength(l_file_content);
      l_blob_left := l_blob_length;
      -- open the destination file.
      l_file := utl_file.fopen(p_directory,p_file_name,'WB', 32760);
      -- if small enough for a single write
      if l_blob_length < 32760 then
         utl_file.put_raw(l_file,l_file_content);
         utl_file.fflush(l_file);
      else -- write in pieces
         l_pos := 1;
         while l_pos < l_blob_length
         loop
            dbms_lob.read(l_file_content,l_amount,l_pos,l_buffer);
            utl_file.put_raw(l_file,l_buffer);
            utl_file.fflush(l_file);
            -- set the start position for the next cut
            l_pos := l_pos + l_amount;
            -- set the end position if less than 32000 bytes
            l_blob_left := l_blob_left - l_amount;
            if l_blob_left < 32000 then
               l_amount := l_blob_left;
            end if;
         end loop;
      end if;
      utl_file.fclose(l_file);
      dbms_lob.freetemporary(l_file_content);
   exception
     when others then
       dbms_lob.freetemporary(l_file_content);
       -- close the file if something goes wrong.
       if utl_file.is_open(l_file) then
         utl_file.fclose(l_file);
       end if;
       raise;
   end;
   -- synchronize map file
   procedure synchronize_mapfile
   is
      l_map    clob;
   begin
      l_map :=          '#       Generated file, manual changes will suddenly disappear'||chr(10);
      l_map := l_map || '#       Last generated: '||to_char(sysdate, 'dd.mm.yyyy hh24:mi:ss')||chr(10);
      for r in (select *
                  from app_page_url_mapping)
      loop
         l_map := l_map||r.nice_url||' '||r.apex_url||chr(10);
      end loop;
      write_to_file(g_tmp_map_file, g_map_dir, l_map);
      -- if everything goes well, we rename and overwrite
      utl_file.frename(g_map_dir, g_tmp_map_file, g_map_dir, g_map_file, true); 
   end; 
end;
/

Configuring Apache
Apache will handle the actual rewrites based on the generated map file. Include the following in your httpd.conf (after backing it up)
## Rewrite URLs that exist in map.txt
  RewriteMap nicetoapex txt:c:/oracle/oraxe/apache/conf/map.txt
  RewriteCond %{REQUEST_URI} /(([^/]+)(/.*)*)$
  RewriteCond %2 =articles.html [OR]
  RewriteCond %2 =article [OR]
  RewriteCond %2 =albums.html [OR]
  RewriteCond %2 =album
  RewriteCond ${nicetoapex:%{REQUEST_URI}|NOT_FOUND} !NOT_FOUND
  RewriteRule ^(/.*) ${nicetoapex:$1}%{QUERY_STRING} [P]
Note the forward slashes in the map file path (even for Windows installs). The code must be included directly into your VirtualHost-directive (if you have one), and not your default APEX Location-directive or some such. The reason is that the rewrite expects the nice-URL, not something that looks like '/pls/apex'.

Also note that I have included some criteria for when the map file will be checked, this is to exclude any number of requests not related directly to APEX pages. This is a very manual step, but can be minimized if you decide to prefix all your nice URLs with a common string. If you prefix your URLs you also will not force Apache to go through all the conditions before the attempted rewrite. A prefix kind of defeated the purpose of nice URLs for me, so I will update and reload Apache when changes are needed. A bit masochistic, I know...
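For illustration, a prefix-based setup could look like the following sketch, where all nice URLs are assumed to share a hypothetical common prefix /site, collapsing the per-page conditions into a single check:

```apache
## Sketch only: assumes a hypothetical /site prefix on all nice URLs
RewriteMap nicetoapex txt:c:/oracle/oraxe/apache/conf/map.txt
RewriteCond %{REQUEST_URI} ^/site/
RewriteCond ${nicetoapex:%{REQUEST_URI}|NOT_FOUND} !NOT_FOUND
RewriteRule ^(/.*) ${nicetoapex:$1}%{QUERY_STRING} [P]
```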

The very cryptic "!NOT_FOUND" line above simply states that if the URL is not part of the map file, then don't rewrite (${nicetoapex:$1} will then have no value). There are probably easier ways to achieve this, but my knowledge of Apache is limited to what Google serves me on the first page of the search results...

Apache will cache a copy of the map file for every start/restart/reload, and if the file is updated (mtime changed) it will re-read the file automagically. That is a nice feature, and saves you a ton of hassle.

You are not home free yet: APEX submits pages with relative paths (which is quite natural), but that will break your nice URL hierarchy. To fix this, add:
## Handle calls to wwv_flow.accept
  RewriteCond %{REQUEST_URI} ^/wwv_flow.accept
  RewriteRule ^/(.*) /pls/apex/$1 [P]
This will rewrite all requests to wwv_flow.accept and point them to your dad location. Depending on your application, it may be necessary to include more rewrite conditions for this RewriteRule (like wwv_flow.show).
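For instance, GET navigation through wwv_flow.show could be handled with a similar condition (a sketch, following the same pattern as the accept rule):

```apache
## Sketch: same treatment for wwv_flow.show, if your application uses it
RewriteCond %{REQUEST_URI} ^/wwv_flow\.show
RewriteRule ^/(.*) /pls/apex/$1 [P]
```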

For good measure you can also include a rewrite to handle the default "start page" of your site with the following:
## Redirect server home page to Apex application
  RewriteRule ^/$ http://localhost:8080/pls/apex/f?p=1000:1:0 [P,L]
  RewriteRule ^/index.html$ http://localhost:8080/pls/apex/f?p=1000:1:0 [P,L]
Restart Apache for every change of httpd.conf for the changes to take effect.


Mapping URLs From Table Content
First off you have to define a relationship between the APEX pages showing the content and the database table where the content resides. The table APP_PAGE_TYPE holds this information.
insert into app_page_type (
     page_id
   , source_table
   , source_table_pk_col
   , url_path
   , pk_page_parameter
   , name_column)
  values   (
     3                -- APEX page_id
   , 'ARTICLES'       -- Source Table
   , 'ARTICLES_ID'    -- Source table primary key
   , '/article'       -- Desired URL-prefix
   , 'P3_ARTICLE_ID'  -- APEX page item id that holds the primary key
    , 'TITLE')         -- Source table column that holds the content title
/
commit
/
In the example above I want to generate a nice URL for articles in the ARTICLES-table. APEX page_id=3 will show the content, based on the value of page item P3_ARTICLE_ID. This bit you only have to do once for each page type/source table association.

When I insert/update a row in the ARTICLES-table with ARTICLE_ID=0, I can generate a nice URL with the following statements:
begin
   app_page_p.create_mapping( 'ARTICLES' -- source table name
                            , 0);        -- source table id
end;
/
commit
/
This is the code you normally would run following a publication of an article. If the TITLE-column has value 'APEX 4.0 - It is finally here!', your map-file should now contain the following:
#       Generated file, manual changes will suddenly disappear
#       Last generated: 03.02.2010 22:19:22
/article/apex_40__it_is_finally_here0.html /pls/apex/f?p=1000:3:0::::P3_ARTICLE_ID:0
You can have more than one page type associated with the same database table, and URLs for all pages will be generated from the statement above. If all works now, you should be able to point your browser to http://<yoursite>/article/apex_40__it_is_finally_here0.html

Manually Mapping URLs
Some APEX pages will probably not be directly linked to a database table. To accommodate this, the API offers the possibility to manually create page mappings with the following statements (note, there is no need for a page type to be defined first, as there is no content table to associate with that APEX page):
begin
   app_page_p.create_manual_mapping(2                          -- APEX page_id
                                   , '/pls/apex/f?p=1000:2:0'  -- APEX URL
                                   , '/articles.html'          -- Nice URL
                                   , 'Articles');              -- Nice name
end;
/
commit
/
If you ran this after the previous example, your map file should look like this:
#       Generated file, manual changes will suddenly disappear
#       Last generated: 03.02.2010 22:21:01
/articles.html /pls/apex/f?p=1000:2:0
/article/apex_40__it_is_finally_here0.html /pls/apex/f?p=1000:3:0::::P3_ARTICLE_ID:0
And likewise you should be able to see http://<yourhostname>/articles.html in a browser.

Pitfalls, Improvements and Considerations
What is the first thing you noticed about the API? No APP_ID-parameters! This version does not support more than one public application, but extending the code to accommodate this should be fairly trivial if the need should arise.

The second observation is the lack of support for more than one primary key column. The API must be modified to support this.

The third observation is the lack of APEX parameter support. Including RP or other such parameters will probably get onto your wish-list pretty soon. The API can be altered to achieve this, either by generating more mappings, or by attaching parameters in a normal form to the nice URL and rewriting them to the correct APEX syntax in Apache.

In the current version, there is no support for a URL hierarchy. If you want to incorporate a URL path based on a hierarchy defined by the mapping API, you have to extend the API, or maintain it yourself through APP_PAGE_TYPE and APP_PAGE_URL_MAPPING.

You may also consider using the Apache httxt2dbm utility to create a binary DBM version of the map file, which is considerably faster than mapping with a txt file. Adding a dbms_scheduler job to execute an external program should do the trick. The Apache rewrite syntax is almost identical; you just have to tell it the map type has changed.
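As a sketch (the paths are examples; adjust to your install), the conversion command and the matching httpd.conf change might look like:

```text
httxt2dbm -i c:/oracle/oraxe/apache/conf/map.txt -o c:/oracle/oraxe/apache/conf/map

## in httpd.conf, change the RewriteMap type from txt to dbm
RewriteMap nicetoapex dbm:c:/oracle/oraxe/apache/conf/map
```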

I seriously wish there was an easy way to just delete a line from an existing file. I, for one, could not conjure up a good way to do this from PL/SQL, hence the complete rewrite of the file. Maybe Perl could do a better job of it? Anyhow, this portion of the API is well suited for a more asynchronous execution. In a normal production environment, Apache is located in a DMZ, and not on the same server as the database. In that case the map file has to be moved from the database server to the webserver. The API does not support moving files through FTP/SFTP, but you can check out http://www.orafaq.com/scripts/plsql/ftpclient.txt for a plain PL/SQL FTP solution, or be creative with an external procedure.

A Small Bonus
If you want to set the APEX page title according to the source table, you can modify the page template and include a function that returns the nice_title column of the APP_PAGE_URL_MAPPING table. The function will then be common for all pages in your application. The title of your webpage continually crops up as one of the more important tags for certain search engines.
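A minimal sketch of such a function follows. The function name and the fallback title are my own inventions; for pages mapped from a source table there can be several rows per page_id, so a real version should also filter on the primary-key item value:

```sql
-- Sketch only: returns the stored nice_title for a page, or a default
create or replace function get_page_title (
   p_page_id   in   number
) return varchar2
is
   l_title   app_page_url_mapping.nice_title%type;
begin
   select nice_title
     into l_title
     from app_page_url_mapping
    where page_id = p_page_id
      and rownum = 1;
   return l_title;
exception
   when no_data_found
   then
      return 'My test app'; -- fallback, matching g_app_name
end;
/
```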

In Conclusion
If you decide to use the API, then I suggest you start thinking about how to design your applications to use it, especially considering branching, tabs and other native APEX components that gravitate towards the normal APEX URL construct. If your application branches out with the old URL, there would be little point in attempting to rewrite the URLs. As I stated earlier, there are no silver bullets, and there will be more work than just using native APEX URLs.

If you are aiming at more sophisticated public applications, I would seriously consider NOT using it. This API (in its present state at least) is aimed at simpler applications.

Oh, and did I mention you use this code at your own risk? Well, you do!

Enjoy :-)

Thursday, January 7, 2010

Oracle Spatial, APEX and Google Maps

In this post I will demonstrate how to use Oracle Spatial to store shapes, and utilize Oracle APEX and Google Maps API to visualize and manipulate the shapes.


The code is tested for Oracle 11g, but should work equally well for Oracle XE (no advanced Spatial operations), although I have noticed that Oracle XE is a bit more picky when it comes to valid shapes.

Demo Application
If there is too much text for you to read through, check out my demo Application on apex.oracle.com to see it in action :-)

The demo application is tested on Firefox ONLY, so expect issues when using other browsers!

Database objects
The code below will create a simple table containing an sdo_geometry column called shape, and a supporting package to mediate between APEX/Google Maps and the database table:
create table sample_polygon
(
   sample_polygon_id     number not null
 , name                  varchar2 (80) not null
 , description           varchar2 (4000)
 , zoom_level            number
 , shape                 mdsys.sdo_geometry
)
/

alter table sample_polygon add (
  constraint sample_polygon_pk
 primary key
 (sample_polygon_id))
/ 

create sequence sample_polygon_pk_seq
/

create or replace trigger trg_sample_polygon_bi
   before insert
   on sample_polygon
   referencing new as new old as old
   for each row
declare
   l_new_id       number;
begin
   if :new.sample_polygon_id is null
   then
      select   sample_polygon_pk_seq.nextval
        into   l_new_id
        from   dual;
      :new.sample_polygon_id   := l_new_id;
   end if;
exception
   when others
   then
      raise;
end trg_sample_polygon_bi;
/

create or replace package sample_polygon_p
as
   -- return string with coordinates
   function get_ordinates (
      p_shape                 in    sample_polygon.shape%type
   ) return varchar2;
   -- update table with shape
   procedure create_geometry (
       p_sample_polygon_id    in    sample_polygon.sample_polygon_id%type
   ,   p_name                 in    sample_polygon.name%type
   ,   p_description          in    sample_polygon.description%type
   ,   p_zoom_level           in    sample_polygon.zoom_level%type
   ,   p_shape                in    varchar2);
end;   
/

create or replace package body sample_polygon_p
as
   function get_ordinates (
      p_shape                 in    sample_polygon.shape%type
   ) return varchar2 is       
      l_poly   mdsys.sdo_ordinate_array;
      l_coords varchar2(4000);
      l_point  varchar2(20);
   begin
      l_poly := p_shape.sdo_ordinates;
      if l_poly.exists(1)
      then
         for i in l_poly.first..l_poly.last
         loop
            l_coords := l_coords ||','||replace(l_poly(i), ',', '.'); 
         end loop;
         l_coords := substr(l_coords, 2, length(l_coords));
      end if;
      return l_coords;
   end;
   procedure create_geometry (
       p_sample_polygon_id    in    sample_polygon.sample_polygon_id%type
   ,   p_name                 in    sample_polygon.name%type
   ,   p_description          in    sample_polygon.description%type
   ,   p_zoom_level           in    sample_polygon.zoom_level%type
   ,   p_shape                in    varchar2
   ) is
      l_sql       varchar2(32000);
      l_sample_polygon_id     sample_polygon.sample_polygon_id%type := p_sample_polygon_id;
   begin
      if l_sample_polygon_id is null
      then
         insert into sample_polygon
         (  name
         ,  description
         ,  zoom_level
         ) values
         (  p_name
         ,  p_description
         ,  p_zoom_level) returning sample_polygon_id into l_sample_polygon_id;
      else
         update sample_polygon
         set    name = p_name
         ,      description = p_description
         ,      zoom_level = p_zoom_level
         where  sample_polygon_id = l_sample_polygon_id;
      end if;
      -- 2003: Two-dimensional polygon
      -- 4326: SRID
      -- 1,1003,1: one polygon (exterior polygon ring)
      l_sql := 'update sample_polygon'||chr(10)||
               'set    shape = sdo_geometry (2003'||chr(10)||
               '                            , ''4326'''||chr(10)||
               '                            , null'||chr(10)||
               '                            , sdo_elem_info_array(1,1003,1)'||chr(10)||
               '                            , sdo_ordinate_array( '|| p_shape ||')'||chr(10)||
               '                            )'||chr(10)||
               'where  sample_polygon_id = :1'; 
      execute immediate l_sql using in l_sample_polygon_id;
   end;
end;
/
You must have Oracle Spatial installed in your database (it is part of the default installation, including XE), and the schema must have the appropriate privileges to invoke Spatial functions. If the above script produces no error messages, you should be OK.

Why Spatial?
Well, why not? You already paid for it in your license, and quite frankly I can see no easier way to store your shapes. There are also a number of valuable functions that come with Spatial, such as:
  • Location inside shape
  • Distance to nearest part of shape from location
  • Conversion between coordinate types
  • The ability to view shapes in other client tools like ESRI
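To give a feel for it, the first two bullets can be expressed against the SAMPLE_POLYGON table like this (a sketch; the point coordinates and the id are example values, and the tolerance of 0.05 is a common choice for geodetic data):

```sql
-- Sketch: is the example point inside the stored shape, and how far
-- is it from the shape? (distance in meters, geodetic SRID 4326)
select sdo_geom.relate(
          p.shape, 'determine',
          sdo_geometry(2001, 4326, sdo_point_type(10.75, 59.91, null), null, null),
          0.05) as relation
     , sdo_geom.sdo_distance(
          p.shape,
          sdo_geometry(2001, 4326, sdo_point_type(10.75, 59.91, null), null, null),
          0.05, 'unit=M') as distance_m
  from sample_polygon p
 where p.sample_polygon_id = 1;
```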

I have very limited experience with Oracle Spatial, and the syntax for creating shapes as an SDO_GEOMETRY data type was (is) a bit daunting.

Lessons learned are:
Take a close look at which coordinate system you are storing the points in. I have used the most common one, 4326 (WGS 84), but if I understood correctly, Google actually uses 3857 or 900913. This requires some further investigation.
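For the record, 3857 and 900913 are the same thing: the Web Mercator projection Google uses for its map tiles, while the Maps API itself takes plain WGS 84 (4326) latitude/longitude. The projection math is simple enough to sketch; this is my own illustration, not part of the demo application:

```javascript
// Convert WGS 84 (SRID 4326) lat/lng degrees to Web Mercator
// (SRID 3857, a.k.a. 900913) meters.
// ORIGIN_SHIFT is half the projected circumference of the earth
// (spherical radius 6378137 m), roughly 20037508.34 m.
const ORIGIN_SHIFT = 2 * Math.PI * 6378137 / 2;

function toWebMercator(lat, lng) {
  const x = lng * ORIGIN_SHIFT / 180;
  const y = Math.log(Math.tan((90 + lat) * Math.PI / 360)) / (Math.PI / 180)
          * ORIGIN_SHIFT / 180;
  return { x, y };
}
```

So a point stored as 4326 can be reprojected on the fly if a 3857 consumer ever needs it; for the Google Maps API itself, plain lat/lng is fine.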

The Google Maps API
There are already some examples out there on how to integrate Oracle APEX and Google Maps, and likewise many examples on how to interact with Google Maps using the Google Maps API; a web search on the subject will turn up several good resources.

The first thing to do when trying out this code, is to sign up and get a Google Maps API Key!

A tiny bit of Javascript
The meaning of "tiny" being somewhat stretched here. Even if you are not familiar with Javascript, it is surprisingly easy to pick up. This script might not be the best place to start, but I encourage you to look at the source code from my demo application in detail to see what it does.

I will not do a full listing here, but highlight some of the more important functions and give a brief top-down view of what goes on.

On page rendering:
  • Javascript function buildMap is called
  • Invokes Google Maps API
  • Draws a map
  • Adds onClick-listener to the map to enable the user to create shapes

On click in map:
  • Draws a new marker on the map
  • Adds a new point to points array (this is what gets stored in the database)
  • Redraws the polygon on the map

On click on a marker in the map
  • Removes the marker
  • Removes the point from the points array
  • Redraws the map

Edit mode:
  • Redraws map to ensure completeness of polygon
  • Removes markers
  • Makes polygon editable
  • Removes onClick-listener on Map (Adding new points/markers in the map is no longer possible)

Exit edit mode:
  • Repopulates points array from polygon
  • Makes polygon uneditable (is that even a word?)
  • Redraws polygon on map with markers

Create new shape:
  • Removes all overlays from map (markers and polygon)
  • Resets all arrays and variables
  • Reattaches onClick-listener to map

For more details, you have to dig into the source code (view source in the demo application).
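The bookkeeping behind the "on click in map" and "on click on a marker" steps above boils down to maintaining the points array. A rough sketch (my own illustration, not the demo's actual code; the Google Maps drawing calls are stubbed out as comments):

```javascript
// The points array is what eventually gets serialized and stored
// in the database.
const points = [];

// Called when the user clicks an empty spot on the map.
function onMapClick(latLng) {
  points.push(latLng);
  // drawMarker(latLng);        // hypothetical Google Maps API wrapper
  // redrawPolygon(points);     // hypothetical Google Maps API wrapper
}

// Called when the user clicks an existing marker: remove that point.
function onMarkerClick(latLng) {
  const i = points.findIndex(p => p.lat === latLng.lat && p.lng === latLng.lng);
  if (i !== -1) points.splice(i, 1);
  // removeMarker(latLng);
  // redrawPolygon(points);
}
```

Everything else (edit mode, redraws) is variations on reading and rebuilding this array.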

Merging Oracle Spatial and Google Maps
This is not advanced mathematics; I have two functions in the database:
  • One which extracts ordinates (as they are called, don't ask me why) from the stored SDO_GEOMETRY-object, and return the ordinates as a comma-separated string
  • One which accepts a shape as a comma-separated string and creates a SDO_GEOMETRY-object

I also have two javascript functions:
  • One which accepts a string of comma-separated coordinates and splits it into an array of "Google points"
  • One which accepts an array of "Google points", and returns a comma separated string (well, not exactly, it "returns" by setting an APEX item...)
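The two JavaScript sides of that handshake can be sketched like this (a minimal version with names of my own choosing; plain {lat, lng} objects stand in for google.maps.LatLng instances, and I assume the SDO convention of x, i.e. longitude, first):

```javascript
// 'lng1,lat1,lng2,lat2,...' -> [{lat, lng}, ...]
function stringToPoints(ordinates) {
  const parts = ordinates.split(',').map(Number);
  const points = [];
  for (let i = 0; i < parts.length; i += 2) {
    points.push({ lng: parts[i], lat: parts[i + 1] });
  }
  return points;
}

// [{lat, lng}, ...] -> 'lng1,lat1,lng2,lat2,...'
function pointsToString(points) {
  return points.map(p => p.lng + ',' + p.lat).join(',');
}
```

In the real application, pointsToString would not return the string but set an APEX item with it, as noted above.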

Bringing it All Together
Oracle Apex is the glue between Google Maps and Oracle Spatial, but surprisingly little actual APEX coding is needed to achieve this.

In my demo application I have:
  • Region containing a DIV and all necessary JavaScript to show and interact with Google Maps
  • Region based on a stored procedure to store shapes
  • Region containing a report of stored shapes

That's it, enjoy :-)

Friday, November 13, 2009

Interactive Reports Not Working Properly on New Install

This is just a quick note to self, but may be useful to others experiencing similar problems. I would not recommend the Fix Two solution for production environments before conducting further research though...

Preface
Recently I installed Oracle Apex 3.2 in an existing database. Plain vanilla install on an Oracle 9.2.0.5 database with an Oracle iAS 9.2.0.3 in front. Installation went smoothly, and the demo application works. Shift to a different site: same plain install, but this time on a 10.2.0.4 database and a 10.1.2 application server.

The Demo Effect
What is the first thing you show off when demonstrating Apex? The interactive reports, that is a no-brainer :-) But at both of these installations, something goes wrong. When clicking a report column, I just get the spinning wheel, no other response. What is amiss?

Dissecting the Problem
At the first location, I had no time and no Firebug. At the second location I had both, and two failed installations creating some sort of consistency.

In the Firebug console, I can see that a JavaScript error pops up when a column is clicked. Sometimes it throws a "string not terminated" error, sometimes some other cryptic message, but always from the same JavaScript function. Examining the response in Firebug shows something odd; the response is cut short. Depending on the distinct values of the column I clicked, the response might be cut inside a string (the "string not terminated" error) or in between strings. When clicking numeric columns, it works. Hm... Special characters? NLS?

Fix One
Patching the Apex installation to 3.2.1 worked for the installation on the 10g system. IRs started working once the patch was applied.

One down, one to go...

Fix Two (The Dirty Fix)
Examining dads.conf for the 9i installation, I see the previously configured DADs have a different setting of PlsqlNLSLanguage. Both installs go against databases with NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 (don't ask me why, it seems like a popular choice for older systems in Norway).

Changing PlsqlNLSLanguage from AMERICAN_AMERICA.AL32UTF8 to AMERICAN_AMERICA.WE8MSWIN1252 did the trick; IRs are now working as expected. I have not noticed anything else breaking (yet), but I am not at all comfortable with the workaround.

The documentation clearly states:
"The character set portion of the PlsqlNLSLanguage value must be set to AL32UTF8, regardless of whether or not the database character set is AL32UTF8."
or for older versions of iAS:
"The character set portion of the nls_lang value must always be set to AL32UTF8, regardless of whether or not the database character set is AL32UTF8."

Luckily, the 9i installation will not go to production in its current state. Phew...

Tuesday, November 3, 2009

PL/SQL and Gmail (or UTL_SMTP with SSL)

In this post I will describe how to send mail from an Oracle database using UTL_SMTP over SSL using Stunnel. I conducted the test on Windows XP with Oracle database 11gR1, but it should work for nix-operating systems and database versions 9.2 and up. To be quite frank, this is nothing new, but it might be of use anyway.

Preface
I wanted to send emails from my database when some data changes. It was not a corporate solution with access to an internal SMTP host; a simple, accessible, ISP-agnostic SMTP server would do. In my case, Gmail fitted the bill. The only problem was that Gmail requires SSL, which UTL_SMTP does not support. I am up for a challenge (meaning: I am good at complicating (ing, not ed :-)) things), so here goes...

Stunnel
Since UTL_SMTP does not support SSL, I will use a third party tool to "wrap" my connection. There are probably any number of tools which can do this, but Stunnel is quite often referred to, and very easy to install and configure. For nix systems, I suggest checking the Examples-page on stunnel.org, this is a Windows-specific explanation. This part of the post is based on a thread on ez.no.

Installing and configuring Stunnel
  • Go to stunnel.org and download the latest Windows binaries
  • Install Stunnel (take note of the installation path), in my example it is c:\stunnel
  • Edit the file stunnel.conf located in the installation folder (just back up the original, and replace all the original text with the text below):
; Use it for client mode
client = yes

[ssmtp]
accept  = 1925
connect = smtp.gmail.com:465
Here I use port 1925 on my localhost (unused as far as I know) to connect to smtp.gmail.com.

Start Stunnel.exe, and test the configuration:
  • Start cmd
  • Write: telnet localhost 1925
  • You should then see something like "220 mx.google.com ESMTP 5sm18031572eyh.34"
  • Write: quit

Troubleshooting: If you cannot reach smtp.gmail.com, any number of things may have gone wrong.
  • Try a normal ping to smtp.gmail.com
  • Check that stunnel.exe is allowed through all firewalls (Windows native and other software firewalls)

Once stunnel is working, and if you are familiar with UTL_SMTP, don't bother reading on. This is the same as UTL_SMTP with any other smtp-host requiring authentication.

Setting up ACL (11g only)
This is more or less monkeyed from Arup Nanda's 11g series.

To create an access control list for your application user and enable it to connect to localhost on port 1925, do the following:
-- create acl
begin
   dbms_network_acl_admin.create_acl (
      acl         => 'gmail.xml',
      description => 'Normal Access',
      principal   => 'CONNECT',
      is_grant    => true,
      privilege   => 'connect',
      start_date  => null,
      end_date    => null
   );
end;
/
-- add privilege to acl
begin
   dbms_network_acl_admin.add_privilege (
      acl        => 'gmail.xml',
      principal  => '<YOUR SCHEMA USER>',
      is_grant   => true,
      privilege  => 'connect',
      start_date => null,
      end_date   => null
   );
end;
/
-- assign host, port to acl
begin
   dbms_network_acl_admin.assign_acl (
      acl        => 'gmail.xml',
      host       => 'localhost',
      lower_port => 1925,
      upper_port => 1925
   );
end;
/
And you are ready to use UTL_SMTP against smtp.gmail.com.

Wrapping UTL_SMTP
I have created a small test-package based on the old UTL_MAIL example from Oracle. Your schema user must have execute privileges on UTL_SMTP and UTL_ENCODE for this to work:
create or replace package apex_mail_p
is
   g_smtp_host      varchar2 (256)     := 'localhost';
   g_smtp_port      pls_integer        := 1925;
   g_smtp_domain    varchar2 (256)     := 'gmail.com';
   g_mailer_id constant varchar2 (256) := 'Mailer by Oracle UTL_SMTP';
   -- send mail using UTL_SMTP
   procedure mail (
      p_sender in varchar2
    , p_recipient in varchar2
    , p_subject in varchar2
    , p_message in varchar2
   );
end;
/
create or replace package body apex_mail_p
is
   -- Write a MIME header
   procedure write_mime_header (
      p_conn in out nocopy utl_smtp.connection
    , p_name in varchar2
    , p_value in varchar2
   )
   is
   begin
      utl_smtp.write_data ( p_conn
                          , p_name || ': ' || p_value || utl_tcp.crlf
      );
   end;
   procedure mail (
      p_sender in varchar2
    , p_recipient in varchar2
    , p_subject in varchar2
    , p_message in varchar2
   )
   is
      l_conn           utl_smtp.connection;
      nls_charset    varchar2(255);
   begin
      -- get characterset
      select value
      into   nls_charset
      from   nls_database_parameters
      where  parameter = 'NLS_CHARACTERSET';
      -- establish connection and authenticate
      l_conn   := utl_smtp.open_connection (g_smtp_host, g_smtp_port);
      utl_smtp.ehlo(l_conn, g_smtp_domain);  
      utl_smtp.command(l_conn, 'auth login');
      utl_smtp.command(l_conn,utl_encode.text_encode('<your gmail account including @gmail.com>', nls_charset, 1));
      utl_smtp.command(l_conn, utl_encode.text_encode('<your gmail account password>', nls_charset, 1));
      -- set from/recipient
      utl_smtp.command(l_conn, 'MAIL FROM: <'||p_sender||'>');
      utl_smtp.command(l_conn, 'RCPT TO: <'||p_recipient||'>');
      -- write mime headers
      utl_smtp.open_data (l_conn);
      write_mime_header (l_conn, 'From', p_sender);
      write_mime_header (l_conn, 'To', p_recipient);
      write_mime_header (l_conn, 'Subject', p_subject);
      write_mime_header (l_conn, 'Content-Type', 'text/plain');
      write_mime_header (l_conn, 'X-Mailer', g_mailer_id);
      utl_smtp.write_data (l_conn, utl_tcp.crlf);
      -- write message body
      utl_smtp.write_data (l_conn, p_message);
      utl_smtp.close_data (l_conn);
      -- end connection
      utl_smtp.quit (l_conn);
   exception
      when others
      then
         begin
           utl_smtp.quit(l_conn);
         exception
           when others then
             null;
         end;
         raise_application_error(-20000,'Failed to send mail due to the following error: ' || sqlerrm);   
   end;
end;
/
This is NOT production-ready code: first of all, you do not want your credentials in the open; at the very least, obfuscate the package body.

Some notes on the package:
  • Parameters sender and recipient must contain e-mail addresses only, use the get_address function in the original Oracle example for more sophisticated use (you can also look at how to add attachments if you have the need).
  • I had some trouble encoding my account name and password. My initial thought was to use utl_raw.cast_to_raw and utl_encode.base64_encode, but this did not work, so I ended up using utl_encode.text_encode
  • Mime-type is set to "text/plain", set it to "text/html; charset=<something appropriate>" to enhance visual layout
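On the encoding point above: the SMTP AUTH LOGIN dialogue simply expects the username and the password each as a base64-encoded line, which is what the utl_encode.text_encode calls (with encoding 1) produce. A quick way to eyeball the expected values, sketched here in Node.js (the function name is mine):

```javascript
// Produce the base64 line the server expects for one AUTH LOGIN
// credential (username or password).
function authLoginEncode(credential) {
  return Buffer.from(credential, 'utf8').toString('base64');
}

// For example:
// authLoginEncode('user@gmail.com') -> 'dXNlckBnbWFpbC5jb20='
```

If the value your package sends differs from what this produces for the same input, the encoding routine (or character set) is the place to look.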

Sending an E-mail
To test it all, try:
begin
   apex_mail_p.mail('<your gmail address>', '<recipient address>', '<Subject>', '<message body>');
end;
/
And you are done!

Well, if you don't get any error messages, that is. If you encounter any exceptions, first of all check your Gmail credentials. Next, check where (in the PL/SQL code) it fails, and use your favorite search engine to do a combined search on the SMTP command sent and the SMTP error received. Chances are others have worked through the same problems (even if they are not Oracle-related). The last resort is to use telnet and type the commands manually; a bit cumbersome, but it gives full control.

Happy coding :-)

Wednesday, October 7, 2009

Storing images outside Oracle XE with Apex

Although Oracle XE gives great value for money, 4Gb is not that much in this multimedia age. In this post I will describe how to store, resize and fetch images using Oracle XE and Apex, without having to worry about images eating into the precious 4Gb storage limit. If you don't run XE, you can use the ORDImage data type and its methods in the database to achieve the same result.

Parts of the solution are OS-specific (due to third-party image processing applications), and in this case based on a Windows operating system. Adapting the implementation for your favorite nix system should not be that different.

I will try to be structured in my explanation, but I am a developer after all... My pedagogical skills (or lack thereof) lead me to explain this from the bottom up, breaking it into uneven pieces:
  • Resize images with IrfanView using CLI
  • Create OS script for resizing images
  • Create PL/SQL procedure to execute OS script from XE
  • Create PL/SQL procedure to write blob to OS file system
  • Create PL/SQL procedure for custom upload from APEX
  • Create Apex page to upload image
You can download the full source code from my demo application.

Pre-requisites
To follow the example, you must have the following installed and working correctly:
  • Oracle XE
  • Oracle Apex 3.2
  • IrfanView v4.1 or later
  • Apache HTTP Server running as reverse proxy in front of EPG (if you want to view the images after uploading them...)
Some notes on the configuration: The process is only tested with Oracle Apex 3.2, but I see no reason why it should not work for older versions. You can change from Apache to your favorite web server, but some of the steps in this post will be different. This also applies to IrfanView; any other image processing application with a command line interface will probably do the job just as well.

Resizing images with IrfanView
As mentioned earlier, I will use IrfanView in this example; it sports a CLI which is fast and covers my needs. The CLI is also documented with examples, which helps... I am sure there are alternatives to IrfanView for Linux with the same features, but I have not looked for any (yet).

The syntax for resizing images is quite straightforward:
<installDir>\i_view32.exe <originalImage> /resize=(x,y) /convert=<outputImage>
  • installDir is the full path to where IrfanView was installed
  • originalImage is full path and file name of image to be resized
  • outputImage is full path and file name of resized image
  • /resize=(x,y) - set pixels for resized image
I also add some other switches for good measure:
  • /aspectratio - keep aspect ratio
  • /jpgq=90 - drop quality to 90% of original
  • /resample - for better quality
The command line now becomes:
<installDir>\i_view32.exe <originalImage> /resize=(1024,768) /aspectratio /jpgq=90 /resample /convert=<outputImage>

Try it out on an existing image to make sure it works.

Creating OS script for resizing images
Next up is packing the command lines into a .bat file. The reason for this is twofold. First, I want to simplify the call from the database, which will be executing the script later on. Second, I will resize each image more than once to get both thumbnails and "web friendly" versions of the images. Digital cameras today capture an extraordinary number of pixels, way too much just for displaying an image on a web page.

I will also create a folder structure to store the images in, this is the folder that will be used by Apache to serve the images. I will work with the following directory structure:
  • c:\www\ttr\images\orig
  • c:\www\ttr\images\thumb
  • c:\www\ttr\images\web
Folder "orig" will contain the original images, "web" contains images resized to a web-friendly 1024x768, and "thumb" will contain 150x150 thumbnails.

Create a script that looks like this:

set PATH=%PATH%;C:\WINDOWS\system32;
echo Start %date% %time% >> C:\Oracle\OraXE\images\test.txt
C:\Programfiler\IrfanView\i_view32.exe C:\www\ttr\images\orig\%1 /resize=(1024,768) /aspectratio /resample /jpgq=90 /convert=C:\www\ttr\images\web\%1
C:\Programfiler\IrfanView\i_view32.exe C:\www\ttr\images\orig\%1 /resize=(150,150) /aspectratio /resample /jpgq=90 /convert=C:\www\ttr\images\thumb\%1
echo End %date% %time% >> C:\Oracle\OraXE\images\test.txt

and save it as imgres.bat in the images folder (this is the file name the database procedure will call later). Remember to replace the physical path names with your own structure. I have added the PATH environment variable setting to ensure the script can locate any additional files in the system32 folder, and echo Start/End lines to log how long the conversion takes.

Place an image in the c:\www\ttr\images\orig folder, and run imgres.bat to check that it works.

Execute OS-script from XE
I will cheat! In fact, I will cheat twice while I am at it.

First of all, I will be using dbms_scheduler to execute host commands. In order to avoid any hassle setting up the OracleXEClrAgent and user rights, I will create a procedure in the SYS schema to create and execute the job. Not using invoker's rights here avoids the whole user rights shebang. See how easy it is shooting yourself in the foot?

Second, I will leave the script wide open to any kind of host script. As an afterthought, I will grant the schema user anything to get it going too! Feel comfortable putting this in production? If you do, go take a cold shower!

Log in as SYS and create a procedure to run arbitrary host scripts (...and such a big gun too...):

create or replace procedure resize_image (
   p_script_name in varchar2
 , p_image_name  in varchar2
 , p_directory   in varchar2
) as
begin
   dbms_scheduler.create_job(
      job_name            => 'imgres'
    , job_type            => 'executable'
    , job_action          => 'C:\WINDOWS\system32\cmd.exe'
    , number_of_arguments => 4
    , enabled             => false
   );
   dbms_scheduler.set_job_argument_value('imgres', 1, '/q');
   dbms_scheduler.set_job_argument_value('imgres', 2, '/c');
   dbms_scheduler.set_job_argument_value('imgres', 3, p_directory||p_script_name);
   dbms_scheduler.set_job_argument_value('imgres', 4, p_image_name);
   dbms_scheduler.enable('imgres');
end;
/
grant execute on sys.resize_image to <app_schema>
/

To test the procedure, make sure the web and thumb OS directories are empty, and an image is placed in the orig directory.

Log in as your <app_schema> user and execute the following script in SQL*Plus:

begin
sys.resize_image('imgres.bat', '<myImageFile>', 'C:\www\ttr\images\');
end;
/

If it comes up with a user rights error, try granting "create job", "create external job" and/or "manage scheduler" to your schema user. If it still does not work, check dbms_scheduler_job_log for any error messages. If you are unable to correct your job, search the web; there are plenty of people who have run into the same issue.

When this piece works, the rest is a breeze.

Writing BLOB to OS-file
There are a ton of examples on how to do this, see Dr. Tim Hall's FTP package or the Extract BLOB demo in Morgan's Library.
My version is in between those two, but will do the job:

create or replace directory IMAGES as 'C:\www\ttr\images\orig\'
/

create or replace procedure write_to_file (
   p_file_name in varchar2
 , p_directory in varchar2
 , p_content   in blob
) is
   l_file        utl_file.file_type;
   l_buffer      raw(32000);
   l_amount      binary_integer := 32000;
   l_pos         integer := 1;
   l_blob_left   number;
   l_blob_length number;
begin
   l_blob_length := dbms_lob.getlength(p_content);
   l_blob_left   := l_blob_length;
   -- open the destination file
   l_file := utl_file.fopen(p_directory, p_file_name, 'WB', 32760);
   if l_blob_length < 32760 then
      -- small enough for a single write (dbms_lob.substr returns raw)
      utl_file.put_raw(l_file, dbms_lob.substr(p_content, l_blob_length, 1));
      utl_file.fflush(l_file);
   else
      -- read chunks of the blob and write them to the file until complete
      l_pos := 1;
      while l_pos < l_blob_length
      loop
         dbms_lob.read(p_content, l_amount, l_pos, l_buffer);
         utl_file.put_raw(l_file, l_buffer);
         utl_file.fflush(l_file);
         -- set the start position for the next chunk
         l_pos := l_pos + l_amount;
         -- shrink the last chunk if less than 32000 bytes remain
         l_blob_left := l_blob_left - l_amount;
         if l_blob_left < 32000 then
            l_amount := l_blob_left;
         end if;
      end loop;
   end if;
   utl_file.fclose(l_file);
exception
   when others then
      -- close the file if something goes wrong
      if utl_file.is_open(l_file) then
         utl_file.fclose(l_file);
      end if;
      raise;
end;
/

To test the procedure, you can run the following as schema user:

declare
   l_file        blob;
   l_content     clob := 'This is soon to be a blob';
   l_src_offset  integer := 1;
   l_dest_offset integer := 1;
   l_lang_ctx    integer := dbms_lob.default_lang_ctx;
   l_warn        integer;
begin
   dbms_lob.createtemporary(l_file, false);
   dbms_lob.converttoblob(l_file, l_content, dbms_lob.getlength(l_content), l_dest_offset, l_src_offset, 1, l_lang_ctx, l_warn);
   write_to_file('testfile.txt', 'IMAGES', l_file);
   dbms_lob.freetemporary(l_file);
exception
   when others
   then
      dbms_lob.freetemporary(l_file);
      raise;
end;
/

After running this, you should see a file called "testfile.txt" in the os-directory where the images will be placed later.

Note to self: this would probably be more elegant using BFILE.

Creating a custom upload procedure
Apex is goodhearted enough to take care of all the tedious bits of code needed to bring the image from your client into the database. If you want to do something more with it, you must code it yourself. Luckily, it is not that hard. There are also some good examples of how to do this out there, including one from Oracle in the official documentation.

Here I will bring the pieces together; the procedure below calls both the write_to_file procedure and the sys.resize_image procedure after inserting the image into a custom table. My table is called MY_IMAGE; take care to adapt this and other bits to your implementation.

create or replace procedure store_image (
   p_file_name   in varchar2
 , p_description in varchar2
)
is
   l_image_id  my_image.my_image_id%type;
   l_file      blob;
   l_mime_type apex_application_files.mime_type%type;
   l_name      apex_application_files.name%type;
   l_file_ext  varchar2(255) := regexp_substr(p_file_name, '\..*$');
begin
   -- get file from apex files
   select name
   ,      mime_type
   ,      blob_content
   into   l_name
   ,      l_mime_type
   ,      l_file
   from   apex_application_files
   where  name = p_file_name;
   -- insert record into images table
   insert into my_image
   (  filename
   ,  mime_type
   ,  description
   ) values
   (  l_name
   ,  l_mime_type
   ,  p_description
   ) returning my_image_id into l_image_id;
   -- write file to os, use table pk as file name
   write_to_file(l_image_id||l_file_ext, 'IMAGES', l_file);
   -- resize image, could check for mime-type here
   sys.resize_image('imgres.bat', l_image_id||l_file_ext, 'C:\www\ttr\images\');
   -- delete file from apex files when done
   delete from apex_application_files
   where  name = p_file_name;
end store_image;
/

In order to test this, you need to create an Apex page with a file upload form region. In the following example, make sure your Apex application uses the same parsing schema as above, or has been granted appropriate rights to execute the procedures.
  • Create a new empty page
  • Create a form based on a procedure
  • Choose procedure STORE_IMAGE
  • Display PXX_FILE_NAME as a Browse item and PXX_DESCRIPTION as textarea
  • Next-Next-Create (or something close to it)
And you are done.

When you run the page, choose an image and click Submit; three versions of the image should now appear in the three image folders (original, web friendly and thumbnail).

Viewing the images
For this, you need the Apache web server. First of all, you must edit httpd.conf (after backing it up, of course!), and include the lines:

Alias /ttrimg "C:/www/ttr/images"
<Directory "C:/www/ttr/images">
Options Indexes FollowSymLinks MultiViews ExecCGI
AllowOverride All
Order allow,deny
Allow from all
</Directory>

somewhere at the bottom of the file. If you use VirtualHost directives, be sure to include it inside the directive.

This is just a suggestion; options and access rights must be adapted to your needs (the Alias above is very public), and the same goes for the location of the images directory. You must restart Apache for the changes to take effect.

The images can now be reached with the following URL construct:
  • Original image:
    http://<yourserver>:<port>/ttrimg/orig/<my_image.file_name>
  • Web friendly image:
    http://<yourserver>:<port>/ttrimg/web/<my_image.file_name>
  • Thumbnail:
    http://<yourserver>:<port>/ttrimg/thumb/<my_image.file_name>

Cool?
Well, isn't it?!? Not breaking native Apex upload functionality, and not eating into the precious 4Gb. Me like :-)

The whole source code including create user, grants, .bat, apex app, etc. can be downloaded from my demo application.

PS:
This was just an example, there are things I would do before moving on:
  • Waiting for the image to be resized: Scheduler jobs are by nature asynchronous (unless you use the "use_current_session"-thingy), so in order to view your images immediately after upload, you must code your own "pause"-procedure (and probably check dbms_scheduler_job_log)
  • The naming of the OS files does not say much; there is room for improvement here. This also means the my_image table must be updated accordingly. BFILE again?
  • Directory structure should be reconsidered if you expect a large amount of images, sub folders can be useful
  • Extract and retrieve EXIF information back into the database?