Character sets
SiteSearch Indexer supports the ANSI character set (usually the standard in Windows), as well as ISO-8859-1 (Latin 1) and Windows-1252. This version of SiteSearch Indexer supports Unicode-encoded web pages in web-crawler mode, but in local-file mode it will interpret Unicode as garbage characters. This is because web-crawler mode uses the Microsoft® Internet Explorer web browser to conduct a crawler session, whereas local HTML files are opened as plain text.
Utilities are available to convert from one format to another. For instance, if you open a Unicode-encoded HTML page in WordPad and try to save it, you will notice that the "Save as type" field is set to "Unicode Text Document". You can change that to "Text Document" and save your file as ANSI.
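If you have many files to convert, the same conversion can be scripted. The following is a minimal Python sketch, not part of SiteSearch Indexer. It assumes the source files are UTF-16 with a byte order mark, or UTF-8 with or without one, and that "ANSI" on your system means Windows-1252. The function names to_ansi and convert_file are made up for this illustration.

```python
import codecs


def to_ansi(raw, errors="replace"):
    """Re-encode Unicode bytes as Windows-1252 (assumed here to be "ANSI").

    Assumption: input is UTF-16 with a byte order mark, or UTF-8
    (with or without a byte order mark). Other encodings are not detected.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # the codec consumes the byte order mark
    else:
        text = raw.decode("utf-8-sig")   # strips a UTF-8 byte order mark if present
    # Characters with no Windows-1252 equivalent become "?" by default.
    return text.encode("cp1252", errors=errors)


def convert_file(src_path, dst_path):
    """Read a Unicode HTML file and write a Windows-1252 copy."""
    with open(src_path, "rb") as src:
        converted = to_ansi(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Characters that Windows-1252 cannot represent are replaced with a question mark, so check the converted pages if they contain text outside Western European languages.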
Blocking page content from being indexed
If you want a page to be indexed, but want to block some of its content from being indexed, use the <NOINDEX> tag in your HTML. Text and HTML between the opening and closing NOINDEX tags will not be indexed for search. This is ideal for navigation and other repetitive text that may detract from the effectiveness of a full-text search. The NOINDEX tag will not alter the appearance or functionality of your page.
Usage:
<NOINDEX>Here is some text or html that I do not want indexed for search</NOINDEX>
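To preview which parts of a page remain searchable, you can strip the NOINDEX blocks yourself. The Python sketch below only illustrates the idea and is not how SiteSearch Indexer is implemented. The name strip_noindex is hypothetical, and the pattern assumes NOINDEX blocks are not nested.

```python
import re

# Matches an opening NOINDEX tag, everything up to the first closing tag,
# and the closing tag itself. Case-insensitive; spans multiple lines.
NOINDEX_RE = re.compile(
    r"<noindex\b[^>]*>.*?</noindex\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html):
    """Return the page source with every NOINDEX block replaced by a space."""
    return NOINDEX_RE.sub(" ", html)


if __name__ == "__main__":
    page = '<p>Fresh bagels daily</p><NOINDEX><a href="/">Home</a></NOINDEX>'
    print(strip_noindex(page))
```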
The Keywords META tag
The only META tag that SiteSearch Indexer makes use of is the Keywords meta tag. The content of your Keywords meta tag will be indexed for search, but will not appear in your search results page. So, it is a good place to hide searchable content.
Usage:
<meta name="Keywords" content="donuts, bagels, breakfast sandwiches">
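To check what a page's Keywords meta tag contributes, you can read it out with a short script. This Python sketch uses the standard library's html.parser module. The class and function names are made up for this example, and splitting the content on commas is an assumption about how you list your keywords, not a statement about how SiteSearch Indexer tokenizes them.

```python
from html.parser import HTMLParser


class KeywordsParser(HTMLParser):
    """Collects the comma-separated entries of any Keywords meta tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(
                item.strip() for item in content.split(",") if item.strip()
            )


def extract_keywords(html):
    """Return the list of keywords found in the page source."""
    parser = KeywordsParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


if __name__ == "__main__":
    tag = '<meta name="Keywords" content="donuts, bagels, breakfast sandwiches">'
    print(extract_keywords(tag))
```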
Optimizing a web-crawler session
This tip is especially useful for users who wish to index a website over a dial-up connection (modem).
Since SiteSearch Indexer uses Microsoft® Internet Explorer to conduct a crawler session, you can use some of the browser's settings to optimize your crawler session for speed.
Before you begin your crawler session, launch Microsoft® Internet Explorer, click the Tools menu, then click Internet Options, and the dialog pictured below will appear. Click the Advanced tab, and un-check the following settings:
[Image: Internet Options dialog, Advanced tab]
Working with Adobe® Acrobat®
It is always a good idea to optimize Acrobat PDF files before release, as optimized files normally load faster and take a little less disk space. In addition, unoptimized PDF files may contain "garbage characters" that may have negative effects on your search. To optimize, simply select File/Batch Processing/Fast Web View (in Acrobat 5). You will be prompted to select a folder on which to perform the batch process.
Be sure your Security Options are set to No Security. Applying Document Security encrypts the characters in your PDF file and may cause your search to malfunction.
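Before indexing a large folder, you may want a quick check of which PDF files were saved with Fast Web View and which have security applied. The Python sketch below is a rough heuristic only, not a PDF parser and not part of SiteSearch Indexer. It relies on two markers in the raw file: Fast Web View (linearized) files carry a /Linearized entry near the start of the file, and encrypted files carry an /Encrypt entry. The function names are hypothetical, and a file whose text happens to contain these strings could be misreported.

```python
def pdf_flags(data):
    """Return rough flags for a PDF, given its raw bytes.

    Heuristic: looks for marker names in the raw bytes rather than
    parsing the file structure.
    """
    return {
        "is_pdf": data.startswith(b"%PDF-"),
        # The linearization dictionary sits within the first 1024 bytes.
        "fast_web_view": b"/Linearized" in data[:1024],
        # Encrypted files reference an encryption dictionary via /Encrypt.
        "encrypted": b"/Encrypt" in data,
    }


def check_pdf(path):
    """Read a PDF file from disk and return its flags."""
    with open(path, "rb") as f:
        return pdf_flags(f.read())
```

A file reported as encrypted, or as not saved with Fast Web View, is a candidate for re-saving in Acrobat with the settings described above.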