Replication agents are the basic components of SQL Server 2000 replication.
Before you settle on the type of replication you want in your application architecture, it is important to understand how each type of replication works and what agents they use. Since the different types of replication use different combinations of replication agents, understanding what those agents do helps clarify the resulting SQL Agent jobs when you install replication. For example, snapshot, transactional, and merge replication all use the Snapshot Agent to initialize the subscriber's data. Then they employ other agents to implement the replication process. In this article, you'll learn about the roles of the replication agents and learn how to inspect the properties and profiles of the Snapshot and other agents.
Replication Basics: a Review
As you learned in the previous Replication 101 article, "Terminology, Types, and Configuration" (CoDe Magazine, Nov/Dec 2003), SQL Server 2000 replication uses a publish/subscribe metaphor. A publisher server contains data that will be distributed to subscribers. The specification of the data you want to replicate is called a publication, and a publisher server may contain many publications. Each publication consists of one or more articles, which are the database objects including tables, views, stored procedures, and user-defined functions that the publisher server will publish to the subscribers.
A push subscription executes a SQL Agent job on the distributor or publisher and sends the data to the subscriber, while a pull subscription executes a SQL Agent job on the subscriber to retrieve the data.
A subscriber is usually another server that receives data from the publisher and optionally sends updates back to it. You create a subscription to the publication using replication, which SQL Server implements using SQL Agent jobs that execute on a scheduled basis. A distributor is another server that you may use to distribute data.
A push subscription executes a SQL Agent job on the distributor or publisher and sends the data to the subscriber, while a pull subscription executes a SQL Agent job on the subscriber to retrieve the data.
When you configure replication, you define which servers will be the publisher and distributor, and optionally, which servers will be the subscribers.
After you have configured replication, you define a publication on the publisher server that consists of articles of data. When you define a publication, you must choose what type of replication the publication has: snapshot, transactional, or merge. Once you've defined the publication, you can set up one or more subscriptions to the publication from one or more servers. The steps you take in defining a publication vary depending on the type of replication you choose.
Types of Replication: Overview
SQL Server 2000 offers three types of replication: snapshot, transactional, and merge. Each type of replication uses various combinations of replication agents. A replication agent is an executable that a SQL Agent job calls. Usually you will set up replication using the GUI dialogs in SQL Server 2000's Enterprise Manager, and they will automatically create the jobs that execute the agents periodically with the appropriate parameters. You could write your own application to call the agents directly outside of SQL Agent, but then you would lose the monitoring support built into Enterprise Manager. It's much more common to use Enterprise Manager's built-in support to set up and monitor replication.
Snapshot Replication
Use snapshot replication to make a point-in-time distribution of a publication's articles, with all the defined objects and data, to one or more subscribers. The distributed data is a copy of those database's objects as they exist at a certain point in time, and hence the term 'snapshot.' Each time the snapshot replication process runs, a new snapshot containing the objects and data is sent to the subscriber.
Think of snapshot replication as a kind of discrete operation. You can send the objects and their data once or on a periodic schedule, but each time the subscriber is fully reinitialized. You don't need to continuously monitor snapshot replication because it does not send data continuously.
Snapshot replication exports the schema and data of a publication to a shared file location, as you can see in Figure 1. The files consist of scripts to create the database objects defined in the publication: the table, view, stored procedure, and function schemas, along with supporting objects such as index and trigger schemas. Export files containing the data are also placed in the subdirectory. As you'll learn later, SQL Server's replication agents use these files to create the objects on the subscriber and load the data. Each time SQL Server generates a snapshot, a new file location is set up and then it recreates the entire publication schema and data files.
Any changes you make to the publication's articles are sent anew each time SQL Server generates the snapshot that contains the schema scripts and data files. You don't need to worry about changes in stored procedures, triggers, or functions. Because the snapshot generates everything from scratch and recreates the objects on the subscriber, if you've included them in the publication, changes to them will show up in the subscriber.
You'll use snapshot replication when you have relatively static data or you have small amounts of data. When you have static data, such as lookup tables, you may seldom need to refresh the data on subscribers. Because the snapshot process exports all the data for tables, snapshots of large amounts of data can take a long time. Also, if your subscribers can tolerate latency, that is, the data being only occasionally refreshed, snapshot replication may have a role.
Transactional Replication
Transactional replication uses snapshot replication in order to initialize the data objects on their subscribers. Normally you only need to generate the snapshot once, for initialization. From that point on, transactional replication sends transactions to the subscribers through the distribution process. Transactional replication uses the Log Reader Agent, to read the transaction log of the publisher database continuously and stores those transactions in the distribution database, as you can see in Figure 2. Transactional replication will send only the changes to the subscribers.
You'll find transactional replication very useful when you need to deliver data changes to other servers as quickly as possible and the subscribers are continuously connected to the distributor. SQL Server will deliver the transactions to the subscribers atomically?either all the data of a transaction will be applied, or none of it.
Note how similar snapshot and transactional replication are. Transactional replication has two additional agents but otherwise uses the same agents as snapshot replication.
Normally, transactional replication architectures keep the subscribers in a read-only state. You can change data on the subscribers because the databases are not in a read-only mode, but the normal transactional replication flow is one direction only (from publisher to subscribers).
If you must occasionally make changes on the subscriber that propagate back to the publisher, transactional replication provides two mechanisms: immediate updating subscribers and queued updating subscribers. These mechanisms are really meant for occasional updates to the subscriber. If you need to make frequent updates to the subscriber or subscribers, you should consider merge replication.
Transactional replication creates special system stored procedures on the subscriber to apply changes to the data. The distribution database queues up transactions so each subscriber receives changes in exactly the same order they occurred on the publisher. Then the Distribution Agent connects to the subscriber and executes the system stored procedures in order to apply the transactions to the subscriber's data. Transactional replication is the fastest form of replication?it can deliver publication changes to a subscriber very quickly.
You can also change the schema being published, within limits. You can add or drop a table's column using the appropriate replication dialogs in Enterprise Manager or through special replication system stored procedures. (See "Schema Changes on Publication Databases" in SQL Server 2000 Books Online for further details.)
Merge Replication
Merge replication also uses a snapshot to initialize the subscriber data. After initialization, data is exchanged between publisher and subscriber using a peer mechanism internal to the databases, and not transactions. Merge replication does not use a distribution database to distribute data. Instead, it uses the Merge Agent, as you can see in Figure 3.
You'll use merge replication when you have frequent updates on both publishers and subscribers. Also, merge replication is very tolerant of occasionally connected servers. A subscriber can be disconnected from a publisher for a period of time, and then synchronization will occur when the subscriber next connects to the publisher.
Merge replication makes very little use of the distribution database, so you don't need to have a separate distribution server. With merge replication, SQL Server tracks changes to data within each database, and SQL Server synchronizes the publications using the Merge Agent. In a push subscription, the Merge Agent job for each subscription will reside on the publisher/distributor. In a pull subscription, a Merge Agent job will be placed on the SQL Agent of each subscriber.
Because conflicts can occur in a peer-type relationship, merge replication supplies a conflict resolver mechanism. You can set certain priorities within the resolver and even write your own custom resolver.
In both transactional and merge replication, if the subscriber becomes significantly out of date, that is, it expires, SQL server will renew the subscription by generating a new snapshot for the subscriber. You can differentiate all three types of replication by the agents they use. Now let's take a look at replication agents.
Replication Agents: Overview
SQL Server 2000 comes with five major replication agents as well as a number of other miscellaneous agents for monitoring and cleanup. You can see the agents listed under the Replication Monitor/Agents node in Enterprise Manager. Each major replication agent is a separate executable located in the ..\Program Files\Microsoft SQL Server\80\COM folder. The major agents, along with the executables they call, include:
- Snapshot Agent (snapshot.exe)
- Distribution Agent (distrib.exe)
- Log Reader Agent (logread.exe)
- Queue Reader Agent (qrdrsvc.exe)
Miscellaneous agents call replication stored procedures located in the distribution database. SQL Server installs SQL Agent jobs that invoke the miscellaneous agents when you first configure replication, even before you define any publications. SQL Server's miscellaneous agents include:
- Agent history clean up: distribution (sp_MShistory_cleanup)
- Distribution clean up: distribution (sp_MSdistribution_cleanup)
- Reinitialize subscriptions having data validation failures (sp_MSreinit_failed_subscriptions)
- Replication agents checkup (sp_replication_agent_checkup)
SQL Server's three types of replication all use the miscellaneous agents. Table 1 shows you how the three types of replication use the major agents.
In both transactional and merge replication, if the subscriber becomes significantly out of date, that is, it expires, SQL server will renew the subscription by generating a new snapshot for the subscriber.
In addition to the executables, there is a set of ActiveX controls associated with each major agent that you can use to create custom replication applications. You can find out more about these controls and their object models in SQL Server Books Online under the topic, "Replication ActiveX Control Reference."
Table 1 shows that all three types of replication use the Snapshot Agent. Both snapshot replication and transactional replication use the Distribution Agent, but transactional replication also uses the Log Reader Agent to read transactions from the publisher's database. Merge replication only requires Merge Agents in addition to the Snapshot Agent.
Also, note how similar snapshot and transactional replication are: transactional replication has two additional agents, but otherwise uses the same agents as snapshot.
The Snapshot Agent
The Snapshot Agent, a three-step SQL Agent job, does the work of generating a snapshot. The agent generates the schema and data files for the publication and places them in a snapshot folder. The SQL Agent job step that calls the snapshot.exe executable does most of the work. The other two steps update replication history tables.
The Distribution Agent
Both snapshot and transactional replication use the Distribution Agent?a three-step SQL Agent job that accesses the snapshot folder and applies the schema and data to the subscribers if they have not been initialized or need re-initialization. If you choose a push type of snapshot or transactional subscription, then the Distribution Agent will run on the distributor. If you choose a pull type of subscription, the Distribution Agent resides on the subscriber.
The Log Reader Agent
Transactional replication is the only type of replication that uses the Log Reader Agent. The Log Reader Agent, also a three-step SQL Agent job, calls the logread.exe executable and updates some history tables as well. The logread.exe executable reads the transaction log of the publisher database for transactions that have been marked for replication.
The Queue Reader Agent
SQL Server only uses the Queue Reader Agent when you add queued updating subscribers to transactional replication. The Queue Reader Agent runs on the distributor and reads either a queue stored in a table on the subscriber, or a Microsoft Message Queue.
The Merge Agent
The Merge Agent resides on either the publisher or subscriber server in a merge replication implementation. The Merge Agent applies the snapshot of data to the subscribers, so merge replication does not need a Distribution Agent. The Merge Agent's most important role is to read the merge system tables and apply and reconcile the recorded changes between the publisher and its subscribers. In a push subscription, the Merge Agent resides on the publisher; in a pull subscription, the Merge Agent resides on the subscriber.
Merge replication is more complex than either snapshot or transactional replication. It has a number of features that the others lack such as dynamic filtering, which allows you to filter the replicated data based on information that the subscriber sends when its Merge Agent connects to the publisher. Accordingly, the Merge Agent executable has more parameters than the other agents.
Replication Agents in Action: Defining a Snapshot Publication
To see a replication agent at work, let's take a simple example of defining a snapshot publication on a single server. In this example, you'll define a publication consisting of the authors table in the pubs database. To begin, make sure that you have the pubs sample database with the authors table on your SQL Server. If you don't, you can substitute another table on another database accordingly in the following steps.
We'll first configure the server for replication. In Enterprise Manager, from the main menu choose Tools, then choose Replication, and then select the "Configure Publishing, Subscribers, and Distribution" option. In the Configure Publishing and Distribution wizard, make your server its own distribution server. Choose the default location for the snapshot folder. Use the default settings for publisher and distributor. Because you are replicating all on the same server, the wizard will automatically enable your server as a distributor, publisher, and subscriber.
Now let's set up a simple snapshot replication and examine the two resulting agents.
Defining the Publication
In Enterprise Manager, you define a publication by choosing the Tools menu, choosing Replication, and then selecting "Create and Manage Publications." In the opening dialog, choose the pubs database and click Create Publication. You'll next see Create Publication Wizard dialog box. Click Next and you'll see the Choose Publication Database dialog box as shown in Figure 4. Make sure you have selected the pubs database, then click Next. In the Select Publication Type dialog box, choose Snapshot replication and click Next. In the Specify Subscriber Types dialog box, choose SQL Server 2000 because this replication example takes place on one instance of SQL Server 2000.
The next dialog box, Specify Articles, lets you choose the tables, views, stored procedures, and functions that you want to replicate, as shown in Figure 5. Choose the authors table. Note that there are no user-defined functions in the pubs database, so you don't see them listed as an option in the left pane.
Now click the Browse button (the icon with three dots) to the right of the authors table. The resulting Article Properties dialog box has two tabs. On the General tab you could change the owner and table name on the subscriber database. Leave these as the defaults and click the Snapshot tab. Here you can see further options for what objects to send during the snapshot, as shown in Figure 6. By default, SQL Server will send the primary key and clustered index on the table, but not the other indexes, triggers, and referential integrity (foreign key) constraints. You can also send extended properties of the object, as well as the collations on individual columns. If you were expecting any of these objects and they didn't show up, you can return to this properties dialog after replication is set up and modify them.
Click OK to accept the default settings and then click Next. In the next series of dialogs, give the publication a name of PubsSnapshot, accept the default properties, and click the Finish button. You have now defined a publication on the publisher server.
Inspecting the Snapshot Agent
Now drill down in Enterprise Manager to the Replication Monitor, opening the tree control to the Agents node, and then select the Snapshot Agents node. You'll see the Snapshot "agent" in the right pane. Right-click on the agent and choose Agent Properties from the pop-up menu. You'll see the SQL Agent job that implements the Snapshot Agent. If you click the Steps tab, you'll see the Snapshot Agent's job steps, as shown in Figure 7.
Select the second job step, Run Agent, and click Edit. This will launch the Edit JobStep dialog box, shown in Figure 8. In the Command field you will see how SQL Agent calls the Snapshot Agent executable. Scroll the navigation bar to the right and you can see the parameters for the call to snapshot.exe. The snapshot.exe executable has three required and 24 optional parameters. SQL Server uses the optional parameters to specify any values other than the default values.
When SQL Server uses the Snapshot Agent for snapshot replication, the ReplicationType parameter will be blank. When SQL Server uses the Snapshot Agent in transactional replication, the parameter value of the ReplicationType will be 1; and in merge replication, the value of the ReplicationType parameter will be 2.
Cancel out of these dialog boxes and right-click again on the Snapshot Agent and choose Agent Profiles. You'll see the Agent Profiles dialog box that will show you the default profile for this Snapshot Agent. Click View Details to inspect the options for this. As you can see in Figure 9, some additional parameters for the snapshot are stored in the registry. These parameters control the bulk copy extract from the publisher database: the batch size, query timeout, and so on.
The Importance of Agents
You've now seen how defining snapshot replication results in one agent, the Snapshot Agent, and how SQL Server implements it as a SQL Agent job. When you subscribe to the publication, you'll notice the addition of a Distribution Agent. Using similar steps to those defined in the previous example, you can figure out how to view or modify any agent's properties and its profiles. You can modify properties and profiles to inspect, troubleshoot, and tune replication behavior. If you just keep in mind that when you choose any type of replication, the action taking place behind the scenes is actually done by a set of replication agents, you'll have mastered an important step in understanding how replication really works.