The search engine log may show that the page was indexed, but when you search on a keyword, there are no results. How can you verify that the content has really been indexed?
Visual Inspection:
To confirm that the words are in the index the engine produces, you can inspect the index files directly.
To do that, open the <dnn root>\DesktopModules\XSSearchInput\index directory, where <dnn root> is the folder in which DNN is installed.
In that directory you will see a series of sub-directories, named after the URLs that you are spidering, with names ending in .out.
Those directories are where the indexes are stored. The indexes are a series of text files; just open them in Notepad and look for the keywords you expect to find. The same applies to Word (MS Office) and .pdf documents, which are converted to plain text and stored in these indexes.
If the keywords are not there, they were not indexed. Because O-SE uses Lucene as its indexing engine, it has little or no control over which words make it into the index and which do not.
The result may also depend on the HTML parsing routines we use to clean the contents of the spidered pages, but if the page is well formed, all words should pass the filtering.
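If there are many index files, the manual check described above can be scripted. The following is only a minimal sketch in Python, not part of O-SE: the function name find_keyword is hypothetical, the index path must be supplied by you, and it assumes the index files can be read as plain text, as described above. It simply reports which files under the .out directories contain the keyword.

```python
import os
import sys


def find_keyword(index_root, keyword):
    """Return the sorted paths of files under *.out directories that contain keyword.

    Hypothetical helper: assumes the index files hold readable text.
    The match is a raw byte search, case-insensitive for ASCII letters only.
    """
    needle = keyword.lower().encode("utf-8")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(index_root):
        # Only consider files located inside a directory whose name ends in ".out".
        parts = os.path.relpath(dirpath, index_root).split(os.sep)
        if not any(part.lower().endswith(".out") for part in parts):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                # Skip files that are locked or unreadable.
                continue
            if needle in data.lower():
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_keyword.py <path to the index directory> <keyword>
    if len(sys.argv) != 3:
        sys.exit("usage: find_keyword.py <index directory> <keyword>")
    matches = find_keyword(sys.argv[1], sys.argv[2])
    for match in matches:
        print(match)
    if not matches:
        print("keyword not found in any index file")
```

An empty result means the same thing as not finding the word in Notepad: the keyword did not make it into the index.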