Adding External Analyzers

Below are the contents of an example configuration file that adds a couple of external analyzers to Wilma. The analyzers must be console-based programs that can be run from the command line with arguments for the source and destination files. The example given is for a Mac; only the path names would differ on Windows or Linux (remember that the path separator on Windows is a backslash). This file must be placed in the directory where your Wilma application data lives:

OS X - /Users/yourname/Library/Application Support/Wilma/externanalyzers.txt
Windows - C:\Documents and Settings\yourname\Application Data\Wilma\externanalyzers.txt
Linux - /home/yourname/.wilma/externanalyzers.txt

However, the file can also be edited from inside Wilma by choosing "External Analyzers" from the Index menu. If the file does not exist, the example below will be used as a template.

Most of the file consists of comments which explain the instructions it contains.


# this file lists any external programs that can be run to process a file into
# something Wilma can read.
# A # character means the rest of the line is a comment, unless it appears
# doubled ## in which case it is treated as a single #
#
# Each entry has three lines. Blank lines or lines starting with a comment
# character are ignored.  The three lines making up an entry must be, in order:
#
# 1 - the name used to refer to the analyzer
#
# 2 - the extension to be used for the temporary file, which will determine the
#     analyzer to be used on it. If this is three dots "..." then the existing
#     extension and dot will be stripped and any preceding extension will be
#     used.
#
# 3 - the command to be issued to the system, to run the analyzer program.  The
#     symbol %i will be replaced by the path of the input file and %o will be
#     replaced with the path of a temporary file that will be used by the
#     analyzer determined by field 2.  It will usually be a good idea to
#     enclose both the %i and %o in quotes so paths with spaces are handled
#     correctly.  The temporary file will be deleted when Wilma is done with
#     it, so the command should not delete the original input file (e.g. use -k
#     with something like bzip2).

# pdftotext does a much better job than the internal analyzer at figuring out
# pdf files. Note each analyzer must have the three separate lines discussed
# above. Note that all the examples are commented out and you would need to
# remove the # character from the beginning of the lines for them to have
# effect.

#pdftotext
#txt
#/usr/local/bin/pdftotext "%i" "%o"

# bzip2 assumes an extension like xxx.tar.bz, which would become a temporary file
# ending in xxx.tar. The -k flag is vital or the indexing process will destroy
# the files.

#bzip2
#...
#/usr/bin/bzip2 -k -d -c "%i" >"%o"

# the examples below were submitted by a user running Ubuntu Linux.  Windows users
# might be able to find some of these by using the Cygwin (www.cygwin.com) tools
# that mimic many Unix functions

#ps2txt
#txt
#/usr/bin/ps2ascii "%i" >"%o"

#pdf2txt
#txt
#/usr/bin/pdftotext "%i" "%o"

#doc2txt
#txt
#/usr/bin/catdoc -w "%i" >"%o"

#rtf2txt
#txt
#/usr/bin/catdoc -w "%i" >"%o"

#wri2txt
#txt
#/usr/bin/catdoc -w "%i" >"%o"

#ppt2txt
#txt
#/usr/bin/catppt "%i" >"%o"

#xls2csv
#txt
#/usr/bin/xls2csv "%i" >"%o"

# the last command must have a newline at the end.
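
The rules described in the comments above can be illustrated with a short Python sketch. This is not Wilma's own code; it is only one possible reading of the file format, written to make the comment, three-line entry, "..." extension, and %i/%o rules concrete. The function names (strip_comment, parse_entries, temp_name, expand_command) are hypothetical, and the sketch assumes that a trailing comment may follow a value on the same line and that the temporary file keeps the base name of the input file. Where Wilma actually creates its temporary files is not stated here, so temp_name returns a bare file name only.

```python
import re
from pathlib import Path


def strip_comment(line):
    """Drop everything after a single '#'; a doubled '##' yields a literal '#'."""
    out = []
    i = 0
    while i < len(line):
        if line[i] == "#":
            if line[i + 1:i + 2] == "#":
                out.append("#")  # '##' is treated as one literal '#'
                i += 2
                continue
            break  # a single '#' starts a comment
        out.append(line[i])
        i += 1
    return "".join(out).strip()


def parse_entries(text):
    """Group the non-blank, non-comment lines into three-line entries."""
    fields = [s for s in (strip_comment(l) for l in text.splitlines()) if s]
    if len(fields) % 3:
        raise ValueError("each entry needs exactly three lines")
    return [
        {"name": fields[i], "extension": fields[i + 1], "command": fields[i + 2]}
        for i in range(0, len(fields), 3)
    ]


def temp_name(input_path, extension):
    """Hypothetical temporary file name derived from the extension field."""
    stem = Path(input_path).stem  # removes the last extension and its dot
    if extension == "...":
        return stem  # e.g. xxx.tar.bz becomes xxx.tar
    return stem + "." + extension


def expand_command(command, input_path, output_path):
    """Replace %i and %o in a single pass so the paths are never re-scanned."""
    paths = {"%i": input_path, "%o": output_path}
    return re.sub(r"%[io]", lambda m: paths[m.group(0)], command)


if __name__ == "__main__":
    sample = 'bzip2\n...\n/usr/bin/bzip2 -k -d -c "%i" >"%o"\n'
    for entry in parse_entries(sample):
        out = "/tmp/" + temp_name("/data/xxx.tar.bz", entry["extension"])
        print(expand_command(entry["command"], "/data/xxx.tar.bz", out))
```

Because the sample command keeps the quotes around %i and %o, the expanded command line still works when either path contains spaces, which is the reason the comments above recommend quoting both symbols.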