The DocumentsXML File

The data to extract from the Salesforce repository is specified through an XML file that defines the objects to index and the metadata and content to associate with those objects.

The schema for the XML is described in salesforceconnector.documents.xsd, which is located in your connector’s installation directory.

For each type of object that you want to index, this file must include a <document> tag specifying the type name in the basetype attribute:

   <document basetype="Document"/>

You can use the optional where attribute to filter the objects. This attribute uses the Salesforce Object Query Language (SOQL) syntax. For example, the following XML instructs the connector to retrieve only those documents that are contained in the MyFolder folder.

   <document basetype="Document" where="Folder.Name='MyFolder'" />

You can also use the optional with attribute to filter the objects. For example:

   <document basetype="StringFieldType" with="RecordVisibilityContext (maxDescriptorPerRecord=100, supportsDomains=true, supportsDelegates=true)" />
   <document basetype="Document" where="Folder.Name='MyFolder'" with="DATA CATEGORY a ABOVE b" />
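
As a further sketch of the where attribute, the following entry combines two SOQL conditions. It assumes that the connector passes the attribute value to Salesforce as the condition of a SOQL query, and that your organization's Case type has the standard IsClosed and Priority fields with a 'High' priority value. Adjust the field names and values to match your own schema.

   <!-- Illustrative only: index open, high-priority cases -->
   <document basetype="Case" where="IsClosed=false AND Priority='High'" />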

To index binary file data associated with the type, include the <file> tag and specify the field that contains the data:

   <document basetype="Document">
      <file query="Body"/>
   </document>

If the type does not have binary data but does have textual content, you can index this using the <content> tag:

   <document basetype="Case">
      <content query="Description"/>
   </document>

Specify the individual fields to index using the <field> tag. The name attribute sets the resulting IDOL field name:

   <document basetype="Document">
      <file query="Body"/>
      <field name="Document_Name" query="Name"/>  
   </document>
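
A <document> entry can list several <field> tags, one for each field to index. The sketch below assumes the standard Subject, Status, and Priority fields on the Case type. The IDOL field names on the left are hypothetical and can be anything you choose.

   <document basetype="Case">
      <content query="Description"/>
      <!-- Each field tag maps one Salesforce field to one IDOL field -->
      <field name="Case_Subject" query="Subject"/>
      <field name="Case_Status" query="Status"/>
      <field name="Case_Priority" query="Priority"/>
   </document>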

To index all fields for the type, use the special query value “*”. An IDOL field is generated for each field returned. The “*” in the name attribute is replaced with the field name:

   <document basetype="Document">
      <file query="Body"/>
      <field name="Document_*" query="*"/>
   </document>

In the example above, the Body field is excluded from the wildcard field query.

Relationships between objects can be traversed (with some limitations) to retrieve fields from parent or child objects. To extract a parent or child field, set the query attribute to the relationship name followed by the field name, separated by “.”, as in Author.Name or Attachments.Id below.

Using a child-to-parent relationship (Author):

   <document basetype="Document">
      <file query="Body"/>
      <field name="Document_Author_Name" query="Author.Name"/>
      <field name="Document_AuthorId" query="AuthorId"/>
      <field name="Document_*" query="*"/>
   </document>

Using a parent-to-child relationship (Attachments):

   <document basetype="Case">
      <content query="Description"/>
      <field name="Case_Attachment_Id" query="Attachments.Id"/>
      <field name="Case_*" query="*"/>
   </document>

In the examples above, the Body, AuthorId, and Description fields are not included by the wildcard field query. The wildcard always ignores fields that are used by a <file> or <content> tag, and it also ignores the ID field of any relationship that another query references.
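
Both kinds of relationship can appear in the same entry. The following sketch assumes that the Case type in your organization has the standard Account parent relationship (with its AccountId field) and the Attachments child relationship. The IDOL field names are hypothetical.

   <document basetype="Case">
      <content query="Description"/>
      <!-- Child-to-parent: a field from the parent Account -->
      <field name="Case_Account_Name" query="Account.Name"/>
      <!-- The wildcard skips AccountId because Account is referenced, so request it explicitly -->
      <field name="Case_AccountId" query="AccountId"/>
      <!-- Parent-to-child: a field from each child Attachment -->
      <field name="Case_Attachment_Name" query="Attachments.Name"/>
      <field name="Case_*" query="*"/>
   </document>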

Using the * character with a child relationship is the same as using a sub-query. For example:

   <field name="Contact_*" query="Contacts.*" />

...retrieves the same metadata as the following sub-query:

   <subquery childrelationship="Contacts">
      <field name="Contact_*" query="*" />
   </subquery>

When you include the subquery element, the attribute childrelationship is required.

You can optionally use the where and limit attributes in sub-queries. The following example retrieves all fields from the attachments and cases associated with an account. It then uses the where and limit attributes with a sub-query to retrieve all fields from the first five contacts whose last name is "Smith". If you include the limit attribute, its value must be numeric.

   <document basetype="Account"> 
      <content query="Description" /> 
      <field name="Account_Attachment_*" query="Attachments.*" /> 
      <field name="Account_Case_*" query="Cases.*" /> 
      <field name="Account_*" query="*" /> 
      <subquery childrelationship="Contacts" where="LastName='Smith'" limit="5"> 
         <field name="Contact_*" query="*" /> 
      </subquery> 
   </document>
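
The same attributes can be applied to other child relationships. The sketch below limits the cases retrieved for each account to ten open cases. It assumes that a subquery accepts named fields in the same way as a <document> entry (the example above shows only the wildcard form), and that the standard IsClosed, Subject, and Status fields exist on the Case type.

   <document basetype="Account">
      <content query="Description" />
      <field name="Account_*" query="*" />
      <!-- Illustrative only: at most 10 open cases per account -->
      <subquery childrelationship="Cases" where="IsClosed=false" limit="10">
         <field name="Case_Subject" query="Subject" />
         <field name="Case_Status" query="Status" />
      </subquery>
   </document>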

Relationship Query Restrictions

Currently, there are some restrictions on the queries that you can use with the connector:

  • Parent-to-child relationships can only be used on the base type. (Child.Id and Child.Parent.Id are valid; Parent.Child.Id and Child.Child.Id are not valid.)
  • Child-to-parent relationships that can refer to more than one type of object do not support “*” notation fields and can only extract fields common to all of those types. (Parent.Id is always valid, Parent.* is not valid, and any other field name might result in errors during processing if there is a type mismatch.)
  • salesforce.com enforces limits on query size and complexity.
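
The restrictions above can be illustrated with a Case entry. This is a sketch only: it assumes the standard Attachments child relationship, the Owner relationship on Attachment, and an Owner relationship on Case that can refer to more than one type of object. Queries that the restrictions rule out are shown inside comments.

   <document basetype="Case">
      <content query="Description"/>
      <!-- Valid: child relationship on the base type (Child.Id) -->
      <field name="Case_Attachment_Id" query="Attachments.Id"/>
      <!-- Valid: child, then that child's parent (Child.Parent.Id) -->
      <field name="Case_Attachment_Owner_Id" query="Attachments.Owner.Id"/>
      <!-- Valid: Id is common to every type that Owner can refer to -->
      <field name="Case_Owner_Id" query="Owner.Id"/>
      <!-- Not valid (Parent.Child.Id): query="Account.Contacts.Id" -->
      <!-- Not valid (wildcard on a multi-type parent): query="Owner.*" -->
   </document>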

Generate the DocumentsXML

If you start the Salesforce Connector and run the synchronize fetch action without setting the DocumentsXML configuration parameter, the connector attempts to connect to Salesforce and construct a DocumentsXML file. The file is named documents.xml and contains all of the document types available through salesforce.com.

The synchronize action will terminate immediately after generating the file. The generated file is for reference and is not intended to be used without modification.

Some of the document types cannot be queried and are therefore commented out. Each of the remaining document types has an entry resembling:

   <document basetype="Account">
        <content query="Description" />
        <field name="Account_*" query="*" />
        <!--
        field query: Id (id)
        field query: IsDeleted (boolean)
        field query: MasterRecordId (reference) references
           MasterRecord.* (Account)
        ...
        references AccountContactRoles.* (AccountContactRole)
        references Feeds.* (AccountFeed)
        references Histories.* (AccountHistory)
        ...
        -->
   </document>

The lines beginning “field query:” list the fields for the type that can be queried, along with the data type of each field. Where a field contains a reference to a parent object, the line also shows the relationship name and parent types.

The lines beginning “references” list the child relationships and types.

For the type description above, the following queries are all valid:

   query="*"  
   query="Id"
   query="MasterRecordId"
   query="MasterRecord.*"
   query="MasterRecord.Id"
   query="Feeds.*"
   query="Feeds.Id"
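
As a sketch of how these queries might be used, the generated Account entry could be edited into something like the following. The IDOL field names are hypothetical, and only queries from the list above are used.

   <document basetype="Account">
      <content query="Description" />
      <!-- Parent field reached through the MasterRecord relationship -->
      <field name="Account_MasterRecord_Id" query="MasterRecord.Id" />
      <!-- All fields from the child Feeds relationship -->
      <field name="Account_Feed_*" query="Feeds.*" />
      <!-- All remaining fields on the Account itself -->
      <field name="Account_*" query="*" />
   </document>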