Friday, July 25, 2008
 
   
 
O-SE: how do I know if content has been indexed?
Posted by: Xepient Solutions 3/5/2008
The search engine log may show that a page was indexed, yet a search on a keyword returns no results. How can you verify that the content has really been indexed?

Visual Inspection:

To make sure that the words are contained in the index that the engine produces, you can visually inspect the index files.

To do that, go to the <dnn root>\DesktopModules\XSSearchInput\index directory, where <dnn root> is the folder in which DNN is installed.

In that directory you will see a series of subdirectories, named after the URLs that you are spidering, with names ending in .out.

Those directories are where the indexes are stored. The indexes are a series of text files; just open them in Notepad and look for the keywords you expect to find. The same applies to Word (MS Office) and .pdf documents, which are converted to plain text and stored in these indexes.
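
If there are many index files, opening each one by hand gets tedious. The following is a minimal sketch in Python of the same check, assuming only what is described above: the index directory contains subdirectories whose names end in .out, and the files inside them can be read as text. The function name find_keyword and its parameters are made up for this illustration and are not part of O-SE or DNN.

```python
import os


def find_keyword(index_root, keyword):
    """Return the sorted paths of files inside *.out directories that contain keyword.

    The match is a case-insensitive byte search, so it works best for plain
    ASCII keywords. index_root is the O-SE index directory (an assumption:
    you pass in the real path on your own server).
    """
    needle = keyword.lower().encode("utf-8")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(index_root):
        # Skip anything that is not inside a directory whose name ends in ".out".
        rel_parts = os.path.relpath(dirpath, index_root).split(os.sep)
        if not any(part.endswith(".out") for part in rel_parts):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                # Unreadable file (locked, permissions): ignore and move on.
                continue
            if needle in data.lower():
                hits.append(path)
    return sorted(hits)
```

Calling find_keyword with the path of the index directory and the keyword you expect returns the list of index files that mention it. An empty list suggests the keyword did not make it into the index.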

If the keywords are not there, the content was not indexed. Because O-SE uses Lucene as its indexing engine, O-SE itself has little or no control over which words make it into the index and which do not.

The result may also depend on the HTML parsing routines we use to clean the contents of the spidered pages, but if a page is well formed, all of its words should pass the filtering.

 
