Choose the Content to Index

This section explains how to configure the connector so that it retrieves the content that you want to index, and nothing else.

The content in Microsoft Exchange is organized in the following structure. There is one message store for each user.

Message Store 
  |- Folder
       |- Folder
       |    |- Item(s)
       |- Folder
       |    |- Item(s)
       |- Item(s)

You can restrict the content to retrieve by setting the configuration parameters FolderMustHaveRegex, FolderCantHaveRegex, FolderMustHaveRegexToCrawl, and FolderCantHaveRegexToCrawl.

The two pairs of parameters (those with the ToCrawl suffix and those without it) differ in how the connector behaves when a folder is excluded.

  • When a folder is excluded by FolderMustHaveRegexToCrawl or FolderCantHaveRegexToCrawl, that folder and all of its subfolders are ignored.
  • When a folder is excluded by FolderMustHaveRegex or FolderCantHaveRegex, the folder itself is ignored but its subfolders are still processed. Each subfolder is retrieved unless its own path also fails to pass the filter.

This means that if you want to exclude a large part of a structure, it can be more efficient to use FolderMustHaveRegexToCrawl and FolderCantHaveRegexToCrawl. For example, consider the following folder structure:

   /
   /Inbox
   /Inbox/folder1
   /Inbox/folder2
   /Sent Items
   /Sent Items/folder1
   /Sent Items/folder2

To configure the connector to retrieve content only from /Inbox and its subfolders, you could use the following configuration:

FolderMustHaveRegex=.*Inbox.*

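However, with this configuration the connector still has to process every folder, including /Sent Items and its subfolders, and attempt to match each path to the regular expression. The following configuration is more efficient because the connector only has to process the root folder and folders that have paths starting with /Inbox. The /Sent Items folder fails the FolderMustHaveRegexToCrawl filter, so its subfolders are never examined: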

FolderMustHaveRegex=.*Inbox.*
FolderMustHaveRegexToCrawl=/(Inbox.*)?
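
The difference between the two configurations above can be illustrated with a small Python simulation. This is only a sketch of the behavior described in this section, not the connector's implementation. The TREE dictionary and the crawl function are hypothetical names introduced for illustration, and the sketch assumes that each regular expression must match the whole folder path.

```python
import re

# Hypothetical in-memory copy of the example folder structure (parent -> subfolders).
TREE = {
    "/": ["/Inbox", "/Sent Items"],
    "/Inbox": ["/Inbox/folder1", "/Inbox/folder2"],
    "/Sent Items": ["/Sent Items/folder1", "/Sent Items/folder2"],
}


def crawl(must_have=None, must_have_to_crawl=None):
    """Return (processed, retrieved) folder paths for the given filters.

    Assumption: a regular expression must match the entire path.
    """
    processed, retrieved = [], []
    stack = ["/"]
    while stack:
        path = stack.pop()
        # ToCrawl filter: a failing folder is skipped together with its subfolders.
        if must_have_to_crawl and not re.fullmatch(must_have_to_crawl, path):
            continue
        processed.append(path)
        # Plain filter: a failing folder is ignored, but its subfolders are still visited.
        if must_have is None or re.fullmatch(must_have, path):
            retrieved.append(path)
        stack.extend(reversed(TREE.get(path, [])))
    return processed, retrieved


# First configuration: FolderMustHaveRegex only.
processed_a, retrieved_a = crawl(must_have=r".*Inbox.*")
# Second configuration: FolderMustHaveRegex plus FolderMustHaveRegexToCrawl.
processed_b, retrieved_b = crawl(must_have=r".*Inbox.*",
                                 must_have_to_crawl=r"/(Inbox.*)?")

print(len(processed_a), retrieved_a)  # 7 folders processed
print(len(processed_b), retrieved_b)  # 4 folders processed, same folders retrieved
```

In this sketch both configurations retrieve /Inbox, /Inbox/folder1, and /Inbox/folder2, but the second never descends into /Sent Items.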