Index Options


The options page lets you specify additional settings for the indexing operation:

[Image: includedlg.gif, the options dialog, showing the Zip Encryption Options, Language Options, General Options and Include/Exclude files areas]

General Options

  • Treat Zip Files as directories - check this option if you want files inside Zip archives to be indexed. These files can then be searched and viewed just like any other files. Note that you should not add the zip files themselves to the include files list.
  • Include Word Counts - check this box if you want Wilbur to keep track of how many times each word appears in each file. This value is displayed as one of the columns in the file list pane and the results can be sorted by this ranking. To save space, only the first 256 occurrences are counted (one byte’s worth). If the size of the index is an issue, you can clear this box to save some space.
  • Track All Files - when this box is checked Wilbur will index information on all files in all directories it visits, not just the files in the include list.  Files not in the include list won't have their contents indexed, but file name, file folder, size, date modified and attribute information will be indexed and can be searched for.

Wilbur only records file information for folders that are in the include list or are subdirectories of those folders. If you want to track all files on your machine without indexing their contents, you can use a dummy include such as c:\*.xxx to force Wilbur into all folders. If you are already using something like c:\*.doc, this is not necessary.
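The rule above can be pictured with a small Python sketch. This is not Wilbur's code: the function name classify, its parameters and the three return values are hypothetical, and the sketch assumes a single include made of a root folder plus a file name pattern.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


def classify(path, include_root, include_pattern, track_all):
    """Return 'contents', 'metadata' or 'skipped' for one file (hypothetical model)."""
    p = PureWindowsPath(path)
    root = PureWindowsPath(include_root)
    # Folders outside the include root are never visited, so nothing is recorded.
    if root not in p.parents:
        return "skipped"
    # Files matching the include pattern have their contents indexed.
    if fnmatchcase(p.name.lower(), include_pattern.lower()):
        return "contents"
    # Other files in visited folders are tracked only when Track All Files is on.
    return "metadata" if track_all else "skipped"


print(classify(r"c:\docs\notes.txt", "c:\\", "*.xxx", track_all=True))
print(classify(r"d:\docs\notes.txt", "c:\\", "*.xxx", track_all=True))
```

In this model the dummy include c:\*.xxx matches no real file, so nothing has its contents indexed, yet every file under c:\ is still tracked by name, folder, size, date and attributes.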

  • Minimum Word Length - this is the length of the shortest word that Wilbur will index. The default value is three, but it can be increased to cut down on the number of nonsense words that Wilbur indexes and hence reduce the size of the index. This would mean, however, that searches for short words such as IBM would no longer be possible. You can also make this value smaller, but you then risk indexing a lot of meaningless character sequences from binary files such as word processing documents.
  • Maximum Word Length - this is the length of the longest word that Wilbur will index. Like the minimum, this can be modified as appropriate for the material you are indexing. For example, programmers indexing source code would probably want a fairly large value, since variable and routine names can often be quite long. A value of zero has a special meaning: it causes Wilbur to use a limit of 100 characters for material that appears to be pure text and a limit of 20 for files that appear to contain binary information. This was the behaviour of Wilbur versions prior to 1.5.
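As an illustration of how the two length limits interact, here is a minimal Python sketch. It is not Wilbur's implementation: the function names and the looks_binary flag are hypothetical, words are taken to be runs of ASCII letters, and the sketch assumes that words outside the limits are skipped rather than truncated.

```python
import re


def effective_max_length(max_len, looks_binary):
    """Resolve the special value 0 to 100 (pure text) or 20 (binary-looking files)."""
    if max_len == 0:
        return 20 if looks_binary else 100
    return max_len


def indexable_words(text, max_len, min_len=3, looks_binary=False):
    """Return the words whose length lies between the minimum and maximum."""
    limit = effective_max_length(max_len, looks_binary)
    words = re.findall(r"[A-Za-z]+", text)  # assumption: letters only
    return [w for w in words if min_len <= len(w) <= limit]


print(indexable_words("IBM is a big company", max_len=20))
# ['IBM', 'big', 'company']
print(indexable_words("IBM is a big company", max_len=20, min_len=4))
# ['company']
```

The second call shows the trade-off described above: raising the minimum to four drops IBM from the index along with the short noise words.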

Additional Characters to Index

For more control over the characters considered significant the following options are provided:

Numbers - the options available are:

  • No Numbers - the default
  • Trailing numbers only - despite the name, digits can appear anywhere in a word as long as the word does not start with a digit.
  • All numbers - number characters are just as significant as alphabetic characters. Of course in some material this will greatly increase the number of unique words indexed.
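The three number settings can be compared with a short Python sketch. The mode names and the split_words function are hypothetical, and the sketch assumes that a digit which cannot be part of a word simply acts as a separator; the text above does not say how Wilbur actually treats such digits.

```python
import re

# One regular expression per setting (assumption: ASCII letters and digits only).
PATTERNS = {
    "none": r"[A-Za-z]+",                # No Numbers
    "trailing": r"[A-Za-z][A-Za-z0-9]*", # digits allowed, but not as first character
    "all": r"[A-Za-z0-9]+",              # All numbers
}


def split_words(text, numbers="none"):
    """Split text into words under the chosen number setting."""
    return re.findall(PATTERNS[numbers], text)


sample = "mp3 3com r2d2"
print(split_words(sample, "none"))      # ['mp', 'com', 'r', 'd']
print(split_words(sample, "trailing"))  # ['mp3', 'com', 'r2d2']
print(split_words(sample, "all"))       # ['mp3', '3com', 'r2d2']
```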

Other Characters

You can explicitly specify characters that are to be considered valid. If your language is not among the few listed above, just enter the additional characters required here.

Characters placed in the ‘Others anywhere’ box are valid anywhere in a word while characters in the ‘Others not starting word’ box are not valid as the first character in a word.

For instance, someone who wanted to search for the term C++ in resume files could accomplish this by placing a single plus sign in the ‘Others not starting word’ box. You would not want to do this if you were indexing program source code, since there the plus sign often marks the end of a variable name.
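The effect of the two boxes can be sketched in Python as follows. The function build_word_pattern and its parameter names are hypothetical stand-ins for the ‘Others anywhere’ and ‘Others not starting word’ boxes, and words are otherwise assumed to consist of ASCII letters; this only illustrates the idea and is not Wilbur's own word splitter.

```python
import re


def build_word_pattern(anywhere="", not_starting=""):
    """Build a word pattern from the two 'other characters' settings."""
    # Characters allowed in the first position of a word.
    start = "A-Za-z" + re.escape(anywhere)
    # Characters allowed in every later position.
    rest = start + re.escape(not_starting)
    return re.compile(f"[{start}][{rest}]*")


text = "Skills: C++, Java, a+b"

plain = build_word_pattern()
print(plain.findall(text))      # ['Skills', 'C', 'Java', 'a', 'b']

with_plus = build_word_pattern(not_starting="+")
print(with_plus.findall(text))  # ['Skills', 'C++', 'Java', 'a+b']
```

With the plus sign enabled, C++ becomes a single searchable word, but the expression a+b is also glued into one word, which is the drawback for source code mentioned above.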

Note that if you include characters such as * or ? which have special meaning in searches, they will lose their special meaning and be treated just like other characters.


Copyright © 1999 RedTree Development Inc. All rights reserved.
Information in this document is subject to change without notice.
Other products and companies referred to herein are trademarks or registered trademarks of their respective companies or mark holders.