Thursday, August 7, 2008

Indexing PDF documents with Foxit

Foxit PDF IFilter is designed to help users index large numbers of PDF documents and then quickly find text within them. The PDF documents can be files, email attachments, or database records.

Steps (fixing the icon: REQUIRED on all Web Front Ends):

1. Add the new file type extension:
Central Administration -> Search Settings -> File Types -> New File Type (add the extension 'pdf', without the dot)

2. Find a PDF icon file (for example, by searching on Google) and download it.

3. Copy the GIF or PNG file that you downloaded for the icon to the following folder on the server:

Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images

4. Navigate to Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\XML

5. Look for the DOCICON.xml file.

6. Open the DOCICON.xml file in your preferred editor (e.g. Notepad).

7. Add an entry for the .pdf extension inside the <ByExtension> element:

<Mapping Key="pdf" Value="NameofIconFile.gif" />

8. Save the DOCICON.xml file.

9. Run iisreset.
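
The DOCICON.xml edit above can also be scripted. The following is only a rough sketch, assuming the file has the usual <DocIcons> root with a <ByExtension> child. The function name add_icon_mapping is made up for illustration and is not part of SharePoint. It works on the XML text, so you still have to read and write the file yourself, and you should back up the original first.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(docicon_xml: str, extension: str, icon_file: str) -> str:
    """Return DOCICON.xml text with a <Mapping> for `extension` in <ByExtension>.

    Hypothetical helper: assumes <ByExtension> is a direct child of the root.
    """
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    # Update an existing entry (case-insensitive) instead of adding a duplicate.
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", icon_file)
            break
    else:
        ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": icon_file})

    # Note: ElementTree drops XML comments and the XML declaration on output.
    return ET.tostring(root, encoding="unicode")
```

Call it with the contents of DOCICON.xml, "pdf", and the name of the icon file you copied into the Images folder, then write the result back and run iisreset as in the last step.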

Steps (fixing the search: only on the Index Server):

1. Download Adobe Reader (which includes a built-in IFilter) or the standalone IFilter.

For 64-bit, here's the download link.

2. Install Adobe Reader (v8 or v9) or the IFilter on the Index Server only.

3. Modify the following Registry keys by changing their "Default" value to the new CLSID of the Adobe IFilter: {987f8d1a-26e6-4554-b007-6b20e2680632}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {987f8d1a-26e6-4554-b007-6b20e2680632}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {987f8d1a-26e6-4554-b007-6b20e2680632}

4. Add the Adobe Reader installation directory to the system PATH.

For example, if Reader is installed under "C:\Program Files\Adobe", then add

"C:\Program Files\Adobe\Reader 8.0\Reader" (for v8) or
"C:\Program Files\Adobe\Reader 9.0\Reader" (for v9) to the system PATH variable:

Right-click My Computer -> Properties -> Advanced -> Environment Variables -> Path (under System Variables) -> Edit (add "Drive:\Program Files\Adobe\Reader 8.0\Reader" or "Drive:\Program Files\Adobe\Reader 9.0\Reader").

This tells the Adobe IFilter where to find the DLLs it depends on.

5. Recycle the search service:

net stop osearch
net start osearch

6. Now we can crawl and search PDF documents with v.8/9 Reader.
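
For anyone repeating the Index Server steps on more than one machine, steps 3 to 5 could be scripted roughly as below. This is only a sketch. The helper names (build_reg_file, append_to_path, recycle_service) are made up for illustration. The CLSID and key paths are copied from step 3. Nothing here touches the registry or PATH directly: the first helper only produces .reg file text and the second only computes a new PATH string.

```python
import subprocess

# CLSID and key paths copied from step 3 of the post.
ADOBE_IFILTER_CLSID = "{987f8d1a-26e6-4554-b007-6b20e2680632}"
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(clsid: str = ADOBE_IFILTER_CLSID) -> str:
    """Build .reg file text that sets the (Default) value of both keys."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in FILTER_KEYS:
        lines.append(f"[{key}]")
        lines.append(f'@="{clsid}"')  # "@" is the key's default value
        lines.append("")
    return "\n".join(lines)


def append_to_path(path_value: str, new_dir: str) -> str:
    """Return a ';'-separated PATH with new_dir appended if it is missing."""

    def norm(entry: str) -> str:
        # Compare case-insensitively, ignoring quotes and a trailing backslash.
        return entry.strip().strip('"').rstrip("\\").lower()

    entries = [e for e in path_value.split(";") if e.strip()]
    if norm(new_dir) in {norm(e) for e in entries}:
        return ";".join(entries)
    return ";".join(entries + [new_dir])


def recycle_service(name: str = "osearch", run=subprocess.run) -> None:
    """Stop and start a Windows service with `net`, as in step 5."""
    for action in ("stop", "start"):
        run(["net", action, name], check=True)
```

Save the output of build_reg_file() to a .reg file and import it with regedit after exporting a backup of the two keys. Paste the result of append_to_path into the Environment Variables dialog from step 4. recycle_service() has to run from an elevated prompt on the Index Server.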

Update :
New PDF 64 bit Ifilter