Thursday, May 27, 2010

Using Oracle to search text documents

We can use Oracle Text to search through large amounts of text stored in documents such as MS Word, MS Excel (xls), PDF, XML, HTML, RTF or plain text files.
Oracle Text (formerly known as interMedia Text and ConText) is a full-text indexing technology that lets you index large text columns and efficiently run free-text queries against them.

Oracle Text has several index types. However, to search large amounts of text, we need to use the CONTEXT index.

To make the documents searchable, I store them in a BLOB column. If you only store plain text documents, a CLOB is preferable. Let's assume our table is named "USER_DOCUMENTS" and has a BLOB column "DOC" that stores the actual file. To create a CONTEXT index on the "DOC" column, we run:

CREATE INDEX user_documents_index ON user_documents(doc) INDEXTYPE IS CTXSYS.CONTEXT;

To perform a free-text search on the documents, we use the CONTAINS operator. Its basic syntax is:

CONTAINS(
[schema.]column,
text_query VARCHAR2
[,label NUMBER])
RETURN NUMBER;

[schema.]column:
Specify the text column to be searched on. This column must have a Text index associated with it.

text_query:
The query expression that defines what to search for in column.

label:
Optionally specify the label that identifies the score generated by the CONTAINS operator.

Returns:
For each selected row, CONTAINS returns a number between 0 and 100 that indicates how relevant the document in that row is to the query. A score of 0 means that Oracle Text found no matches in the row.

Note:
The CONTAINS operator must be followed by an expression such as > 0, which specifies that the score value calculated must be greater than zero for the row to be selected.

For example, to search for 'oracle' in all the documents in the user_documents table, we run:

SELECT SCORE(1), ud.* from user_documents ud WHERE CONTAINS(doc, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;

The query returns all documents containing the keyword 'oracle', sorted by relevance (highest score first).

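Also, remember that a CONTEXT index is not updated automatically when you insert or update rows: newly added documents will not be found until the index is synchronized (for example with CTX_DDL.SYNC_INDEX('user_documents_index')) or rebuilt. During development and testing, do this every time you add documents to the table. For production systems, decide on a synchronization schedule based on your load and usage.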

Hope this helps!

Friday, May 21, 2010

Java Keytool - Self-Signed SSL Certificate

Keytool is a key and certificate management utility.
In this post, I list a few useful commands that will help you generate self-signed certificates for development purposes. For production systems, do not use self-signed certificates generated with keytool; use certificates issued by a CA such as VeriSign or thawte. Browsers warn users about self-signed certificates, which makes for a poor user experience every time they visit your site.

Definitions:
Keystore - A keystore is a database (usually a file) that can contain trusted certificates and combinations of private keys with their corresponding certificates.
Alias - All keystore entries (key and trusted certificate entries) are accessed via unique aliases.
cacerts - The "cacerts" file represents a system-wide keystore with CA certificates. It resides in the security properties directory, $JAVA_HOME/jre/lib/security.
Certificate - A certificate (also known as a public-key certificate) is a digitally signed statement from one entity (the issuer), saying that the public key (and some other information) of another entity (the subject) has some specific value.

Prerequisites:
-> JDK 1.3+ installed and JAVA_HOME set to the directory where you have installed the JDK

Notes:
-> For this example, let's call our alias "my_alias"
-> For this example, let's call our certificate "my_cert.crt"

Go to the $JAVA_HOME/bin directory (or make sure it is on your PATH).

# Generate the keystore file (the following command will ask for a keystore password and a few questions; at the end it will generate a .keystore file in your home directory. changeit is the key password here; you can use whatever you want, just don't forget it :))
> keytool -genkey -alias my_alias -keypass changeit -keyalg RSA

# Export the certificate from the .keystore file (the following command will ask for the keystore password and then generate a my_cert.crt file)
> keytool -export -alias my_alias -file my_cert.crt

At this stage the certificate file is ready, and we can point our server's trusted-certificate file setting to it. However, for certain services such as CAS, the certificate needs to be imported into the JDK's trusted certificate file, cacerts.

# Import the certificate file into the cacerts file (the following command will ask for the cacerts password, which is changeit by default, and ask you to confirm that you trust the certificate you are importing)
> keytool -import -alias my_alias -file my_cert.crt -keystore $JAVA_HOME/jre/lib/security/cacerts

Other useful keytool commands

# List all .keystore certificates
> keytool -list -v

# List one .keystore certificate
> keytool -list -v -alias my_alias

# List all certificates in a specific keystore (replace <keystore_file> with the path to that keystore)
> keytool -list -keystore <keystore_file>

# Remove certificate from cacerts file
> keytool -delete -alias my_alias -keystore $JAVA_HOME/jre/lib/security/cacerts

# Remove a certificate from the default .keystore
> keytool -delete -alias my_alias

As always, there is "man" help available (man keytool)!

Hope this helps!

Thursday, May 20, 2010

Javascript Cookies

In one of my earlier posts, I handled cookies in JSP. Here are a few methods to handle cookies using JavaScript. I have written three methods that do everything you need to manage cookies:
1. Write/create a cookie
2. Read a cookie
3. Delete a cookie (write the cookie again with an expiry date in the past)
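The three methods can be sketched as below. This is a minimal sketch, not necessarily the exact code from the post: the names buildCookie, parseCookie, writeCookie, readCookie and deleteCookie are hypothetical, and it assumes the cookie should be visible across the whole site (path=/) and that names and values are URI-encoded. Building and parsing the cookie string are kept in separate helpers so they can be tried outside a browser.

```javascript
// Hypothetical helper: builds the string to assign to document.cookie.
// If days is omitted, no expiry is set, so the browser treats it as a session cookie.
function buildCookie(name, value, days) {
  let cookie = encodeURIComponent(name) + "=" + encodeURIComponent(value) + "; path=/";
  if (typeof days === "number") {
    const expires = new Date(Date.now() + days * 24 * 60 * 60 * 1000);
    cookie += "; expires=" + expires.toUTCString();
  }
  return cookie;
}

// Hypothetical helper: finds one cookie in a "a=1; b=2" string,
// which is the format document.cookie returns when read.
function parseCookie(cookieString, name) {
  const key = encodeURIComponent(name) + "=";
  for (const part of cookieString.split(";")) {
    const trimmed = part.trim();
    if (trimmed.startsWith(key)) {
      return decodeURIComponent(trimmed.substring(key.length));
    }
  }
  return null; // cookie not found
}

// 1. Write/create a cookie (browser only: uses document.cookie)
function writeCookie(name, value, days) {
  document.cookie = buildCookie(name, value, days);
}

// 2. Read a cookie (browser only)
function readCookie(name) {
  return parseCookie(document.cookie, name);
}

// 3. Delete a cookie by writing it again with an expiry date one day in the past
function deleteCookie(name) {
  writeCookie(name, "", -1);
}
```

In a page, writeCookie("theme", "dark", 7) stores the cookie for a week, readCookie("theme") returns "dark", and deleteCookie("theme") removes it. Deleting goes through the same buildCookie helper as writing so that the path matches; a browser only overwrites (and so expires) a cookie when the name and path are the same.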

Hopefully, this helps.