SharePoint Fast Search Planning Design Content Collection
Identify the sources of content that you want to crawl
FAST Search Server 2010 for SharePoint uses different indexing connectors for different content sources. The choice of indexing connector depends on the kind of content that you want to crawl, on personal preference and on the specific needs of your organization.
Most content sources can be crawled by using the various indexing connectors offered through Microsoft SharePoint Server 2010. The collection of these indexing connectors is also known as the FAST Search connector. Be aware, however, that the FAST Search connector is not a single, separate indexing connector, but rather a collection of connectors. The FAST Search connector is associated with one or more content sources (and therefore indexing connectors) through the FAST Search Content Search Service Application (Content SSA). The Content SSA also connects the Microsoft SharePoint Server 2010 front-end with the FAST Search Server 2010 for SharePoint back-end.
When you install FAST Search Server 2010 for SharePoint, you also have access to three indexing connectors that are specific to FAST Search Server 2010 for SharePoint. These connectors can feed Web, database and Lotus Notes content to the index. The following table summarizes the available indexing connectors and their recommended use cases.
| Type of content | Indexing connector | Recommended use case |
|---|---|---|
| SharePoint | SharePoint indexing connector | Use in all use cases. |
| File shares | File share indexing connector | Use in all use cases. |
| Exchange | Exchange indexing connector | Use in all use cases. |
| People profiles | People profiles indexing connector | Use in all use cases. Note that this kind of content is crawled through the FAST Search Query Search Service Application. |
| Web sites | Web site indexing connector | Use when you have a limited number of Web sites to crawl, without dynamic content. |
| | FAST Search Web crawler | Use when you have many Web sites to crawl. Use when the Web site content contains dynamic data, including JavaScript. Use when the organization needs access to advanced Web crawling, configuration and scheduling options. Use when you want to crawl RSS Web content. Use when the Web site content uses advanced logon options. |
| Database | Business Data Catalog-based indexing connectors | Use when the preferred configuration method is Microsoft SharePoint Designer 2010. Use when you want to use time-stamp-based change detection for incremental database crawls. Use when the preferred operation method is the Microsoft SharePoint Server 2010 Central Administration. Use when you want to enable crawling based on the change log, which you can achieve by directly modifying the connector model file and creating a stored procedure in the database. |
| | FAST Search database connector | Use when the preferred configuration method is SQL queries. Use when you want advanced data joining options through SQL queries. Use when you want advanced incremental update features: the FAST Search database connector uses checksum-based change detection for incremental crawls if no update information is available, and it also supports time-stamp-based change detection and change detection based on update and delete flags. |
| Lotus Notes | Lotus Notes indexing connector | Use when the preferred operation method is the Microsoft SharePoint Server 2010 Central Administration. |
| | FAST Search Lotus Notes connector | Use when full Lotus Notes security support is required, including support for Lotus Notes roles. Use when you want to crawl Lotus Notes databases as attachments. |
| Line of Business Data | Business Data Catalog-based indexing connectors | Use when your content source contains data from line-of-business applications. Use when you want to enable crawling based on the change log, which you can achieve by directly modifying the connector model file and creating a stored procedure in the database. |
About crawling and indexing content
When a crawl succeeds, the indexing connector has accessed and read the individual files or pieces of content that you want to make available to search queries. Crawling the content creates a set of crawled properties for those items. These crawled properties are mapped to managed properties, which are stored in the search index (also known as the index).
About the integrated indexing connectors
Most content sources can be crawled using the integrated indexing connectors in SharePoint Server 2010. You use the SharePoint Server 2010 Central Administration for most configuration and operation tasks.
These indexing connectors are set up by configuring a FAST Search connector Content Search Service Application (Content SSA). Among other things, this Content SSA enables communication with the FAST Search Server 2010 for SharePoint back-end. You configure one Content SSA per content collection; in the Content SSA you specify the location of the content sources, the crawl schedule and other information. There is a default content collection called sp. If you create an additional content collection, you also have to set up a new Content SSA to feed it.
The FAST Search connector crawls:
- SharePoint sites
- Web sites
- File shares that contain content such as Microsoft Office documents
- Exchange public folders
- Line of business data, for example content from databases
- Custom repositories, accessed with a custom built connector
Building connector models on the connector framework
To crawl certain repositories, for example databases or Web services, you need the SharePoint Server 2010 Connector Framework. This framework enables you to use Business Connectivity Services (BCS) models to crawl external data sources. These models define the connection details and the structure of the external content source that you plan to crawl. You import the BCS models into Business Connectivity Services, and you point to a model when you set up a content source of the Line of Business Data type.
There are several predesigned BCS models you can use for database content, Web services (WCF) and .NET custom code. It is also possible to create your own, custom BCS model. In addition, you can create your own, custom connector using the connector framework and BCS models.
To build on the SharePoint Server 2010 Connector Framework, you must use either SharePoint Designer or Microsoft Visual Studio 2010, depending on your specific requirements and goals.
Use SharePoint Designer to:
- Create the BCS models that are needed to crawl external content sources that are supported out of the box, such as databases and Web services
- Import and export models between BCS applications
Use Microsoft Visual Studio to:
- Implement methods for a .NET BCS connector
- Write a custom connector for your repository
Multiple content sources can all pull from the same Business Connectivity Services (BCS) service application, and you can point different Search Service Applications to the same model in a shared BCS service application.
Crawling Lotus Notes content with the Lotus Notes indexing connector
Crawling Lotus Notes content with the Lotus Notes indexing connector requires additional prerequisites and configuration, mainly related to Lotus Domino settings.
About the FAST Search Server 2010 for SharePoint indexing connectors
In addition to the integrated indexing connectors, FAST Search Server 2010 for SharePoint offers additional content indexing connectors for Web, Lotus Notes and database content.
You configure these indexing connectors mainly by editing XML files and by using Windows PowerShell cmdlets, and you operate them from the command line.
About the FAST Search Web crawler
The FAST Search Web crawler is a highly customizable indexing connector used to crawl Web site content. The FAST Search Web crawler can scale to large environments, for example when your organization is crawling many external Web sites. In addition, the FAST Search Web crawler can crawl dynamic Web content, such as Web sites that contain JavaScript.
The FAST Search Web crawler collects content from a set of defined Web sites, which can be internal or external. The configuration of the FAST Search Web crawler is done by editing a copy of an XML file. You can operate the FAST Search Web crawler through several command line tools.
The FAST Search Web crawler is typically a component inside a FAST Search Server 2010 for SharePoint installation. Internally, the FAST Search Web crawler is organized as a collection of processes and logical entities, which in most cases run on a single server. When the number of Web sites or the total number of pages to crawl is large, you can scale out the FAST Search Web crawler by distributing these processes across multiple hosts. This requires additional configuration.
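When crawl work is spread over several hosts, a common way to divide it is by Web site, so that all pages of one site are fetched from the same host and per-site limits can be enforced locally. The following Python sketch shows that generic idea with a stable hash of the host name. The `assign_node` function and the hashing scheme are assumptions for illustration; they do not describe how the FAST Search Web crawler actually distributes its processes, which is set through its configuration.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, node_count: int) -> int:
    """Return the index (0..node_count-1) of the hypothetical crawler node
    that owns the Web site of `url`.

    Hashing the host name rather than the full URL keeps every page of a
    site on the same node.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    host = urlsplit(url).hostname or ""
    # SHA-256 gives the same result in every process, unlike Python's hash().
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


for url in ["http://www.example.com/a.html",
            "http://www.example.com/b/c.html",
            "https://news.example.org/"]:
    print(url, "->", assign_node(url, node_count=3))
```

The two www.example.com URLs always land on the same node, whatever that node turns out to be.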
The FAST Search Web crawler can crawl HTTP, HTTPS and FTP content and supports NTLM version 1 (and, to a limited extent, version 2), Digest, Basic and forms-based logon authentication. RSS scheduling is supported, and you can tag the documents that a feed links to.
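As an illustration of what "documents linked from a feed" means, the following Python sketch reads an RSS 2.0 feed with the standard library and pairs every item link with a tag. Using the channel title as the tag and the `linked_documents` function are hypothetical choices for this example; the FAST Search Web crawler's own RSS options are set in its XML configuration file.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def linked_documents(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (link, tag) pairs for every <item><link> in an RSS 2.0 feed.

    The tag is the feed's channel title (a hypothetical tagging rule).
    """
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return []
    tag = (channel.findtext("title") or "").strip()
    pairs = []
    for item in channel.findall("item"):
        link = (item.findtext("link") or "").strip()
        if link:  # skip items without a link
            pairs.append((link, tag))
    return pairs


FEED = """<rss version="2.0"><channel><title>Product news</title>
<item><title>Release</title><link>http://www.example.com/news/1</link></item>
<item><title>Update</title><link>http://www.example.com/news/2</link></item>
</channel></rss>"""

print(linked_documents(FEED))
```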
About the FAST Search database connector
The FAST Search database connector is a specialized indexing connector that collects content from database content sources.
The indexing connector is configured by using an XML template. You operate the connector by using the command-line options from the jdbcconnector.bat file. After running the configured connector, you map crawled properties to managed properties in the SharePoint Server 2010 Central Administration to enable and customize search on the content collected by the connector.
The connector runs an SQL statement against the database that you crawl, and this statement is completely customizable. For incremental crawls, the FAST Search database connector uses checksum-based change detection if no update information is available. The connector also supports time-stamp-based change detection and change detection based on update and delete flags. In addition, you can specify pre-crawl and post-crawl procedures to run against the database, which can be an advantage in certain use cases.
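As an illustration of checksum-based change detection, the following Python sketch computes a checksum for each row returned by the SQL statement and compares it with the checksums saved after the previous crawl. Keeping the saved state in a dictionary, using SHA-256 over `repr()` of the row values, and the function names are assumptions for this example; they do not describe how the FAST Search database connector stores its state.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple

# A row as returned by the (hypothetical) SQL statement: primary key + values.
Row = Tuple[object, Tuple[object, ...]]


def row_checksum(values: Tuple[object, ...]) -> str:
    """Checksum of one row's column values (repr-based, for this sketch only)."""
    return hashlib.sha256(repr(values).encode("utf-8")).hexdigest()


def detect_changes(rows: Iterable[Row], previous: Dict[object, str]
                   ) -> Tuple[Set[object], Set[object], Set[object], Dict[object, str]]:
    """Classify rows against the checksums saved by the previous crawl.

    Returns (added, changed, deleted, state); save `state` for the next crawl.
    """
    state = {key: row_checksum(values) for key, values in rows}
    added = {key for key in state if key not in previous}
    changed = {key for key in state
               if key in previous and state[key] != previous[key]}
    deleted = set(previous) - set(state)
    return added, changed, deleted, state


# First crawl: everything is new.
_, _, _, saved = detect_changes([(1, ("Alpha", 10)), (2, ("Beta", 20))], {})

# Incremental crawl: row 2 changed, row 3 is new, nothing was deleted.
added, changed, deleted, saved = detect_changes(
    [(1, ("Alpha", 10)), (2, ("Beta", 25)), (3, ("Gamma", 30))], saved)
print(sorted(added), sorted(changed), sorted(deleted))  # [3] [2] []
```

Only the added and changed rows would need to be sent to the index, and the deleted keys removed from it. This is why checksums help when the database itself offers no time stamps or update flags.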
About the FAST Search Lotus Notes connector
The FAST Search Lotus Notes connector is a specialized indexing connector that consists of two parts: a user directory connector and a content connector. The content connector collects content from a Lotus Notes content source. The user directory connector ensures that end users can search only the Lotus Notes content that they have access to. It maps Active Directory user accounts to Lotus Notes user accounts and is closely integrated with FAST Search Authorization.
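The following Python sketch illustrates the idea behind the user directory connector: once Active Directory accounts are mapped to Lotus Notes names, groups and roles, results can be limited to the documents that the user is allowed to read. The account names, the data structures and `visible_documents` are hypothetical. In the product, this check is handled together with FAST Search Authorization, not by code like this. The sketch also ignores database access control lists and documents that have no reader restrictions.

```python
from typing import Dict, List, Set

# Hypothetical result of the user directory step: each Active Directory
# account mapped to the Lotus Notes names, groups and roles it holds.
AD_TO_NOTES: Dict[str, Set[str]] = {
    r"CONTOSO\ann": {"Ann Lee/Contoso", "Sales", "[Editor]"},
    r"CONTOSO\bob": {"Bob Ray/Contoso"},
}

# Hypothetical indexed documents: document id -> Notes principals allowed to read.
DOCUMENT_READERS: Dict[str, Set[str]] = {
    "d1": {"Sales"},
    "d2": {"[Editor]", "Managers"},
    "d3": {"Bob Ray/Contoso"},
}


def visible_documents(ad_account: str,
                      documents: Dict[str, Set[str]]) -> List[str]:
    """Return the ids of documents that the AD account may read.

    A document is visible when the user holds at least one of its reader
    principals; accounts without a mapping see nothing.
    """
    principals = AD_TO_NOTES.get(ad_account, set())
    return [doc_id for doc_id, readers in documents.items()
            if principals & readers]


print(visible_documents(r"CONTOSO\ann", DOCUMENT_READERS))  # ['d1', 'd2']
print(visible_documents(r"CONTOSO\bob", DOCUMENT_READERS))  # ['d3']
```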
The connector is configured by using two XML templates, one for the user directory connector and one for the content connector. You operate the connector by using the command-line options from the lotusnotesconnector.bat and lotusnotessecurity.bat files. After running the configured content connector, you map crawled properties to managed properties in the SharePoint Server 2010 Central Administration to enable and customize search on the content collected by the content connector.
The FAST Search Lotus Notes connector supports Lotus Notes version 6.5.6, 7.x and 8.x and Lotus Domino version 6.5, 7.x and 8.x.
The connector fully supports Lotus Notes security, including roles, and can index Lotus Notes databases as attachments.
Limiting the content that you want to crawl
When you use the integrated indexing connectors to crawl content, you can use the user interface of the SharePoint Server 2010 Central Administration to indicate which content to exclude from the crawl. Each of the connectors specific to FAST Search Server 2010 for SharePoint has parameters in its configuration file for include and exclude rules.
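Include and exclude rules can be pictured as a filter that every candidate URL passes through before it is fetched. The following Python sketch assumes regular-expression patterns in which an exclude match always overrides an include match. The actual rule types and precedence are defined by each connector's configuration file, so treat `is_in_scope` and the patterns as hypothetical.

```python
import re
from typing import Sequence


def is_in_scope(url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Hypothetical rule semantics: a URL is crawled only when it matches at
    least one include pattern and no exclude pattern (exclude rules win).
    An empty include list therefore puts nothing in scope."""
    if not any(re.search(pattern, url) for pattern in include):
        return False
    return not any(re.search(pattern, url) for pattern in exclude)


INCLUDE = [r"^https?://intranet\.example\.com/"]
EXCLUDE = [r"\.(zip|exe)$", r"/archive/"]

for url in ["http://intranet.example.com/docs/plan.html",
            "http://intranet.example.com/tools/setup.exe",
            "http://intranet.example.com/archive/2009.html",
            "http://www.example.org/"]:
    print(url, is_in_scope(url, INCLUDE, EXCLUDE))
```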

For content within your organization that other administrators are crawling, you can coordinate with those administrators to set crawler impact rules based on the performance and capacity of the servers. For most external sites, this coordination is not possible. If your crawls request too much content from external servers, or request it too frequently, they can use so many resources or so much bandwidth that the administrators of those sites limit your future access. Therefore, the best practice is to crawl more slowly. In this manner, you reduce the risk of losing access to the relevant content.
With the FAST Search Web crawler, you can control the crawl rate by setting a request delay, by setting a maximum number of concurrent requests to the same Web site, and by enabling or disabling concurrent crawling of an IP address that hosts multiple sites. You can also limit the bandwidth that the FAST Search Web crawler uses by limiting the number of Web sites that it crawls at the same time.
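To see how a request delay and a per-site concurrency limit interact, the following Python sketch computes when each request could start. The rules it enforces are simplifying assumptions: a minimum delay between request starts to the same site, a maximum number of in-flight requests per site, a fixed fetch duration, and sites that do not constrain each other. `plan_requests` is hypothetical and does not reproduce the FAST Search Web crawler's scheduler or its configuration parameters.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


def plan_requests(urls: List[str], request_delay: float,
                  max_concurrent_per_site: int,
                  fetch_duration: float) -> List[Tuple[float, str]]:
    """Return (start_time_in_seconds, url) pairs sorted by start time."""
    if max_concurrent_per_site < 1:
        raise ValueError("max_concurrent_per_site must be at least 1")

    # Group the URLs by site (host name), keeping their original order.
    by_site: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        by_site[urlsplit(url).hostname or ""].append(url)

    plan: List[Tuple[float, str]] = []
    for site_urls in by_site.values():
        in_flight: List[float] = []  # min-heap of finish times
        last_start: Optional[float] = None
        for url in site_urls:
            # Respect the request delay since the previous start on this site.
            start = 0.0 if last_start is None else last_start + request_delay
            # Forget requests that have already finished at that moment.
            while in_flight and in_flight[0] <= start:
                heapq.heappop(in_flight)
            # Every slot is busy: wait for the earliest request to finish.
            if len(in_flight) >= max_concurrent_per_site:
                start = heapq.heappop(in_flight)
            heapq.heappush(in_flight, start + fetch_duration)
            last_start = start
            plan.append((start, url))

    plan.sort()
    return plan


URLS = ["http://a.example.com/1", "http://a.example.com/2",
        "http://a.example.com/3", "http://b.example.com/1"]
for start, url in plan_requests(URLS, request_delay=2.0,
                                max_concurrent_per_site=1, fetch_duration=5.0):
    print(f"{start:5.1f}s  {url}")
```

With only one request at a time per site, the 5-second fetches rather than the 2-second delay set the pace for a.example.com (starts at 0, 5 and 10 seconds), while b.example.com is crawled in parallel from the start.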
Setting crawl schedules
When you use the integrated indexing connectors to crawl content, you can use the user interface of the SharePoint Server 2010 Central Administration to indicate when to crawl content. The FAST Search Lotus Notes connector and the FAST Search database connector use the Windows Task Scheduler to schedule crawls. To schedule crawls for the FAST Search Web crawler, you set parameters in its XML configuration file.