Tuesday, July 29, 2008

Indexing PDF documents with Adobe Reader v8/v9 and MOSS 2007

Versions 8 and 9 of Adobe Reader bring some significant architectural changes (for the better, of course), including a built-in IFilter for indexing PDF documents. Previously the Adobe IFilter was available as a separate download. This change in architecture broke the ability to search PDF documents from within MOSS 2007, although the PDF filter works fine with WDS 3.0. Many consultants recommend using v.6 of the Adobe IFilter to index PDF documents through MOSS 2007, and v.8 or higher of Adobe Reader to index them through WDS 3.0 or higher. But what if we want to index PDF documents using both WDS and MOSS 2007? Here's how you can use MOSS 2007 with Adobe Reader v.8, the version that currently works with WDS 3.0.

Steps (fixing the icon: REQUIRED on all Web Front Ends):

1. Add the new filter extension:
Central Administration -> Search Settings -> File Types -> New File Type (add the extension 'pdf', without the dot)

2. Search for a PDF icon file on Google and download it.

3. Copy the GIF or PNG file that you downloaded for the icon to the following folder on the server:

Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images

4. Navigate to Drive:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\XML

5. Look for the DOCICON.xml file.

6. Open the DOCICON.xml file in your preferred editor (e.g. Notepad).

7. Add an entry for the .pdf extension:

<Mapping Key="pdf" Value="NameofIconFile.gif" />

8. Save the DOCICON.xml file.

9. Run IISReset.
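
If you would rather script the DOCICON.xml edit than do it by hand, here is a minimal Python sketch of the idea. It assumes the Mapping entries sit under a <ByExtension> element, and the function name add_icon_mapping and the sample XML are made up for illustration. It works on a string, so nothing on the server is touched until you write the result back yourself. Note that ElementTree drops XML comments and the XML declaration, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(docicon_xml: str, extension: str, icon_file: str) -> str:
    """Return DOCICON.xml content with a Mapping for `extension` added or updated.

    Assumption: Mapping elements live under a <ByExtension> element.
    """
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    # Update an existing entry (case-insensitive match) instead of duplicating it.
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", icon_file)
            break
    else:
        ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": icon_file})

    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for the real file (hypothetical content, for illustration only).
sample = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif" />
  </ByExtension>
</DocIcons>"""

updated = add_icon_mapping(sample, "pdf", "NameofIconFile.gif")
print(updated)
```

Running it a second time with the same extension only changes the Value of the existing entry, so it is safe to repeat on every Web Front End.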

Steps (fixing the search: only on the Index Server):

1. Download Adobe Reader (which has the IFilter built in) or the standalone IFilter.

For 64-bit servers, here's the Download Link

2. Install Adobe Reader (v8 or v9) or the IFilter on the Index Server only.

3. Modify the following Registry keys by changing their "Default" value to the new CLSID of the Adobe IFilter: {E8978DA6-047F-4E3D-9C78-CDBE46041603}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {E8978DA6-047F-4E3D-9C78-CDBE46041603}


HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
Default -> {E8978DA6-047F-4E3D-9C78-CDBE46041603}


4. Add the installation directory of Adobe Reader (v.8 or v.9) to the system PATH.

For example, if Reader is installed in "C:\Program Files\Adobe", then add

"C:\Program Files\Adobe\Reader 8.0\Reader"
or
"C:\Program Files\Adobe\Reader 9.0\Reader" to the system PATH variable:

Right-click My Computer -> Properties -> Advanced -> Environment Variables -> Path (under System Variables) -> Edit (add "Drive:\Program Files\Adobe\Reader 8.0\Reader" or "Drive:\Program Files\Adobe\Reader 9.0\Reader").

This effectively tells the Adobe IFilter where to pick up the dependent DLLs.

5. Recycle the search service:

net stop osearch
net start osearch

6. Now we can crawl and search PDF documents with v.8/9 Reader.
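
The registry and PATH changes above can also be prepared with a small script. The Python sketch below is one way to do it, not the only one: it builds the text of a .reg file that sets the "Default" value of the two .pdf filter keys to the Adobe IFilter CLSID, and it works out the new PATH string without adding the Reader folder twice. The names build_reg_file and append_to_path are hypothetical helpers, and the script only produces strings. Importing the .reg file and saving the PATH are still up to you.

```python
ADOBE_IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"

# The two keys listed in the registry step above.
FILTER_KEYS = [
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf",
]


def build_reg_file(keys, clsid):
    """Return .reg file text that sets each key's default value ("@") to clsid."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        lines.append(f"[{key}]")
        lines.append(f'@="{clsid}"')
        lines.append("")
    return "\r\n".join(lines)


def append_to_path(path_value, directory):
    """Return path_value with directory appended, unless it is already listed."""
    entries = [e for e in path_value.split(";") if e]
    # Compare without quotes, trailing backslashes, or case differences.
    known = {e.strip('"').rstrip("\\").lower() for e in entries}
    if directory.rstrip("\\").lower() not in known:
        entries.append(directory)
    return ";".join(entries)


reg_text = build_reg_file(FILTER_KEYS, ADOBE_IFILTER_CLSID)
print(reg_text)

# Example PATH value (made up); substitute the real one from the server.
new_path = append_to_path(
    r"C:\Windows;C:\Windows\system32",
    r"C:\Program Files\Adobe\Reader 8.0\Reader",
)
print(new_path)
```

Save the printed registry text to a .reg file and review it before importing it on the Index Server, then recycle the search service as in step 5.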

Articles:
Which IFilter

3 comments:

vicky November 23, 2009 at 7:27 PM  

Hey Thanks Sandeep!!!That worked like a charm for me.

Vicky
vicky.malhar@hotmail.com
