The DocumentsXML File
The data to extract from the Salesforce repository is specified through an XML file that defines the objects to index and the metadata and content to associate with those objects.
The schema for the XML is described in salesforceconnector.documents.xsd, located in your connector’s installation directory.
For each type of object that you want to index, this file must include a <document> tag specifying the type name in the basetype attribute:
<document basetype="Document"/>
You can use the optional where attribute to filter the objects. This attribute uses the Salesforce Object Query Language (SOQL) syntax. For example, the following XML instructs the connector to retrieve only those documents that are contained in the MyFolder folder:
<document basetype="Document" where="Folder.Name='MyFolder'" />
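Because the where value sits inside an XML attribute, any SOQL operator that contains the < character must be written as an entity reference. The following sketch assumes that the standard Salesforce CreatedDate field is available on the type; the cutoff date is purely illustrative.

```xml
<!-- Illustrative only: documents in MyFolder created before 2020.
     "&lt;" is the XML escape for the less-than sign in an attribute value. -->
<document basetype="Document"
          where="Folder.Name='MyFolder' AND CreatedDate &lt; 2020-01-01T00:00:00Z" />
```

The XML parser converts the entity back to < before the value is used, so the SOQL condition itself is unchanged.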
You can also use the optional with attribute to filter the objects. For example:
<document basetype="StringFieldType" with="RecordVisibilityContext (maxDescriptorPerRecord=100, supportsDomains=true, supportsDelegates=true)" />
<document basetype="Document" where="Folder.Name='MyFolder'" with="DATA CATEGORY a ABOVE b" />
To index binary file data associated with the type, include the <file> tag and specify the field that contains the data:
<document basetype="Document">
    <file query="Body"/>
</document>
If the type does not have binary data but does have textual content, you can index this using the <content> tag:
<document basetype="Case">
    <content query="Description"/>
</document>
Specify the individual fields to index using the <field> tag. The name attribute sets the resulting IDOL field name:
<document basetype="Document">
    <file query="Body"/>
    <field name="Document_Name" query="Name"/>
</document>
To index all fields for the type, use the special query value “*”. An IDOL field is generated for each field returned. The “*” in the name attribute is replaced with the field name:
<document basetype="Document">
    <file query="Body"/>
    <field name="Document_*" query="*"/>
</document>
In the example above, the Body field is excluded from the wildcard field query.
Relationships between objects can be traversed (with some limitations) to retrieve fields from parent or child objects. To extract a parent or child field, the query attribute should specify the relationship name and the field name to extract, separated by “.”, as in Author.Name or Attachments.Id below.
Using a child-to-parent relationship (Author):
<document basetype="Document">
    <file query="Body"/>
    <field name="Document_Author_Name" query="Author.Name"/>
    <field name="Document_AuthorId" query="AuthorId"/>
    <field name="Document_*" query="*"/>
</document>
Using a parent-to-child relationship (Attachments):
<document basetype="Case">
    <content query="Description"/>
    <field name="Case_Attachment_Id" query="Attachments.Id"/>
    <field name="Case_*" query="*"/>
</document>
In the examples above, the Body, AuthorId, and Description fields are not included by the wildcard field query. File and content fields are always ignored, and IDs associated with a referenced relationship are also ignored.
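The pieces described so far can be combined in a single entry. The following is a minimal sketch, not a recommended configuration: it assumes the standard Salesforce Case fields IsClosed and AccountId and the Account relationship, and the IDOL field names are made up for illustration.

```xml
<!-- Sketch: index open cases only, using the case description as content. -->
<document basetype="Case" where="IsClosed=false">
    <content query="Description"/>
    <!-- Child-to-parent: name of the account that the case belongs to. -->
    <field name="Case_Account_Name" query="Account.Name"/>
    <!-- AccountId is listed explicitly because the wildcard below skips
         IDs that belong to a referenced relationship. -->
    <field name="Case_AccountId" query="AccountId"/>
    <!-- Parent-to-child: IDs of the attachments on the case. -->
    <field name="Case_Attachment_Id" query="Attachments.Id"/>
    <!-- All remaining fields; Description is skipped as the content field. -->
    <field name="Case_*" query="*"/>
</document>
```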
Using the * character is the same as using a sub-query. For example:
<field name="Contact_*" query="Contacts.*" />
...retrieves the same metadata as the following sub-query:
<subquery childrelationship="Contacts">
    <field name="Contact_*" query="*" />
</subquery>
When you include the subquery element, the childrelationship attribute is required.
You can optionally use the where and limit attributes in sub-queries. The following example retrieves all fields from the attachments and cases associated with an account. It then uses the where and limit attributes with a sub-query to retrieve all fields from the first five contacts whose last name is "Smith". If you include the limit attribute, its value must be numeric.
<document basetype="Account">
    <content query="Description" />
    <field name="Account_Attachment_*" query="Attachments.*" />
    <field name="Account_Case_*" query="Cases.*" />
    <field name="Account_*" query="*" />
    <subquery childrelationship="Contacts" where="LastName='Smith'" limit="5">
        <field name="Contact_*" query="*" />
    </subquery>
</document>
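Because where and limit are each optional, a sub-query can carry only one of them. The sketch below filters the child records without capping their number; it assumes the standard Salesforce IsClosed field on Case, and the IDOL field names are illustrative.

```xml
<!-- Sketch: all fields from every open case on each account (no limit). -->
<document basetype="Account">
    <content query="Description" />
    <field name="Account_*" query="*" />
    <subquery childrelationship="Cases" where="IsClosed=false">
        <field name="Case_*" query="*" />
    </subquery>
</document>
```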
Relationship Query Restrictions
Currently, there are some restrictions on the queries that you can use with the connector:
- Parent-to-child relationships can only be used on the base type. (Child.Id and Child.Parent.Id are valid; Parent.Child.Id and Child.Child.Id are not valid.)
- Child-to-parent relationships to multiple types of object do not support “*” notation fields and can only extract fields common to all types. (Parent.Id is always valid, Parent.* is not valid, and any other field name may result in errors during processing if there is a type mismatch.)
- There are query size and complexity limits enforced by salesforce.com.
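The first restriction can be illustrated with concrete relationship names. This sketch assumes the standard Salesforce relationships Cases and Parent on Account, and Contact and Attachments on Case; none of these names come from the connector documentation itself, and the IDOL field names are hypothetical.

```xml
<document basetype="Account">
    <!-- Child.Id: parent-to-child relationship on the base type (valid). -->
    <field name="Account_Case_Id" query="Cases.Id" />
    <!-- Child.Parent.Id: child of the base type, then up to its parent (valid). -->
    <field name="Account_Case_Contact_Id" query="Cases.Contact.Id" />
    <!-- Not valid, shown for contrast only:
         query="Parent.Cases.Id"       (Parent.Child.Id)
         query="Cases.Attachments.Id"  (Child.Child.Id) -->
</document>
```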
Generate the DocumentsXML
If you start the Salesforce Connector and run the synchronize fetch action without setting the DocumentsXML configuration parameter, the connector attempts to connect to Salesforce and construct a DocumentsXML file. This file is named documents.xml and contains all of the document types available through salesforce.com.
The synchronize action terminates immediately after generating the file. The generated file is for reference and is not intended to be used without modification.
Some of the document types cannot be queried and are therefore commented out. Each remaining document type has an entry resembling:
<document basetype="Account">
    <content query="Description" />
    <field name="Account_*" query="*" />
    <!--
    field query: Id (id)
    field query: IsDeleted (boolean)
    field query: MasterRecordId (reference) references MasterRecord.* (Account)
    ...
    references AccountContactRoles.* (AccountContactRole)
    references Feeds.* (AccountFeed)
    references Histories.* (AccountHistory)
    ...
    -->
</document>
The lines beginning “field query:” list the fields for the type that can be queried along with the data type of the field. Where a field contains a reference to a parent object, it also shows the relationship name and parent types.
The lines beginning “references” list the child relationships and types.
For the type description above, the following queries are all valid:
query="*"
query="Id"
query="MasterRecordId"
query="MasterRecord.*"
query="MasterRecord.Id"
query="Feeds.*"
query="Feeds.Id"
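Since the generated file is meant to be edited before use, one way to turn the Account entry into a working entry is to replace the reference comment with explicit fields chosen from the queries listed above. The IDOL field names in this sketch are illustrative, not names that the connector generates.

```xml
<!-- Sketch: an edited Account entry that uses only queries listed above. -->
<document basetype="Account">
    <content query="Description" />
    <field name="Account_Id" query="Id" />
    <!-- Child-to-parent: ID of the master record. -->
    <field name="Account_MasterRecord_Id" query="MasterRecord.Id" />
    <!-- Parent-to-child: IDs of the feed items on the account. -->
    <field name="Account_Feed_Id" query="Feeds.Id" />
</document>
```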