Re: Replication bandwidth comparison



My objection is to posting absolute numbers. Any numbers are HIGHLY
dependent on hardware, OS, OS configuration, NIC, NIC drivers, network
capacity, number of network hops, database structure, database
configuration, replication configuration, user activity on both the publisher
and the subscriber, processing capacity, transaction volume, transaction
pattern, and the list goes on and on and on. Unless you account for all of
those variables, any numbers posted are very misleading, and we've all dealt
with managers who will take any excuse to "grind an axe".

Entire classes of applications never perform a delete, while other entire
classes never update a row of data. I've dealt with healthcare applications
that never deleted a row of data (a logical delete was performed instead).
I've also dealt with others where updates were not allowed and every change
was posted as a new row of data. Every industry has applications like this,
and while not "routine", it is quite common, particularly in financial
services.
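To illustrate why the workload pattern matters more than any headline number, here is a toy Python model. It is an assumption-laden sketch, not how SQL Server actually implements either engine: the function names and workloads are hypothetical, the "transactional" side is modeled as one command per row-level change, and the "merge" side as one net change per row per sync cycle. It ignores column widths, metadata, batching, and conflict resolution, so it counts commands, not bytes on the wire.

```python
# Toy model only (hypothetical names): counts replicated row-level commands,
# not bytes, and does not reflect SQL Server's actual implementation.

Change = tuple[str, int]  # (operation, row_id); operation is "insert", "update" or "delete"


def transactional_commands(cycles: list[list[Change]]) -> int:
    """Assume one replicated command per row-level change, regardless of cycle."""
    return sum(len(cycle) for cycle in cycles)


def merge_commands(cycles: list[list[Change]]) -> int:
    """Assume one net change per row per sync cycle."""
    total = 0
    for cycle in cycles:
        pending: dict[int, str] = {}
        for op, row_id in cycle:
            if op == "update" and pending.get(row_id) == "insert":
                continue  # the pending insert will carry the latest row image
            pending[row_id] = op  # a later update/delete supersedes the earlier one
        total += len(pending)
    return total


# Hypothetical workloads, each a list of sync cycles.
workloads = {
    # Financial-style: every change is posted as a new row.
    "append_only": [[("insert", 1), ("insert", 2)], [("insert", 3)]],
    # Healthcare-style logical delete: the flag update lands in a later cycle.
    "logical_delete": [[("insert", 10), ("insert", 11)], [("update", 10)]],
    # Machine-generated data hammering one row between syncs.
    "hot_row": [[("update", 7)] * 5, [("update", 7)]],
}

for name, cycles in workloads.items():
    print(name, transactional_commands(cycles), merge_commands(cycles))
```

In this model the append-only and logical-delete streams produce the same count under both engines (3 and 3); only the hot-row stream differs (6 vs 2), which is the narrow case of repeated updates to one row between sync cycles.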

--
Mike
http://www.solidqualitylearning.com
Disclaimer: This communication is an original work and represents my sole
views on the subject. It does not represent the views of any other person
or entity either by inference or direct reference.

"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:%23Iq0qwEHGHA.1388@xxxxxxxxxxxxxxxxxxxxxxx
> I've been looking at the benchmark again, and while I still dispute some
> of what you are saying, I notice that the benchmarks mention nothing about
> the update patterns. For example as you point out, we don't know how many
> times an individual row was updated. So, you are correct that these
> benchmarks prove nothing.
>
> I don't have perfect knowledge of the ratio of inserts/updates/deletes in
> the world; my observation is that the relative frequency is
> updates > inserts > deletes. In other words, there are more updates than
> inserts, and more inserts than deletes, in the workflow that I see in my
> particular domain of influence. I concede that it could be wildly
> different elsewhere, but in my particular experience there are more
> updates than anything else. I have checked and confirmed this with my
> peers.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> "Michael Hotek" <mike@xxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:uTP39rzGGHA.140@xxxxxxxxxxxxxxxxxxxxxxx
>> Now just where did THAT accusation come from? I really don't appreciate
>> you sticking words in my mouth.
>>
>> You had also better go back and take a MUCH better look at that
>> benchmark, because your conclusions are fatally flawed.
>> 1. The standard workload specified (definitely NOT predominantly updates):
>>    a. 251,140 inserts, 640 deletes, and 1,920 updates at the publisher
>>    b. 400 inserts, 200 updates, and 50 deletes at each subscriber
>> 2. The high-volume workload specified (definitely NOT predominantly
>>    updates):
>>    a. 40,000 inserts, 10,000 deletes, and 50,000 updates downloaded
>>       to each subscriber
>>    b. 101,000 inserts uploaded from each subscriber
>>    c. No numbers for the publisher
>> 3. The tests measured snapshot generation and the number of changes
>>    processed, comparing SQL Server 2000 with SQL Server 2005
>> 4. The benchmark was also explicitly focused on filtered subscriptions
>>    (an area of particular performance issues in 2000)
>> 5. Nowhere did it specify ANY network bandwidth numbers
>> 6. Nowhere did it explicitly state what the update workload actually was
>> 7. Nowhere did it specify that multiple updates to the same row occurred
>>    between sync cycles
>> 8. The merge agent was DEFINITELY not configured with default settings
>> 9. The benchmark was designed to show performance improvements related
>>    to specific feature improvements
>> 10. Nowhere did that benchmark discuss or provide any data whatsoever
>>     relevant to this discussion or to the question that was asked.
>>
>> It doesn't matter if merge is configured to run continuously, every
>> microsecond, or once per millennium. The merge engine picks up everything
>> it needs to process during each cycle. The ONLY time that the merge
>> engine essentially ignores multiple updates to a row of data is when
>> those updates happen under a VERY explicit circumstance. Your
>> application must update the row more than once, and it must do so after
>> the merge engine has executed a synchronization cycle and before the next
>> cycle has executed. Furthermore, it only ignores an update when you have
>> updated exactly the same column within the same row of data more than
>> once between synchronization cycles. If you have updated more than one
>> column in a row, even across multiple update statements, between
>> synchronization cycles, the merge engine simply packages those changes
>> into a single update statement. It is extraordinarily rare in production
>> applications to rapid-fire update the same row of data over and over and
>> over again.
>>
>> Benchmarks are benchmarks. They are designed to show the theoretical
>> capacity of whatever is being tested.
>>
>> The only applications I have ever come across in a production
>> environment that even met the multiple-updates-between-cycles requirement
>> had virtually all of their data generated by a machine. Only one of those
>> was running merge replication. And on that one application, the merge
>> engine issued fewer updates to the subscriber only because multiple
>> updates to the same row occurred between cycles of the replication
>> engine, and that happened in less than 10% of the total synchronization
>> cycles.
>>
>> If you are going to quote numbers, provide benchmarks, and specify what
>> something does, you had better be VERY explicit in the exact
>> configuration you are basing it on with the exact options that are set.
>>
>> The only case in which the transactional engine and the merge engine can
>> differ is an application that makes multiple updates to the same row of
>> data between synchronization cycles. If you do not cause that very
>> specific situation to exist, there is VERY little difference in bandwidth
>> impact. The merge engine, by default, will only pick up those columns
>> that have changed, whereas the transactional engine, by default, will
>> pick up all columns in the row. But by changing a simple setting, the
>> merge engine can be made to send all of the columns (even those not
>> changed), while the transactional engine can be made to pick up just the
>> columns that have changed. Before you even get into the double write with
>> the transactional engine (publisher->distributor + distributor->subscriber),
>> you really need to spend a lot of time running various scripts and
>> measuring the number of packets shuttled across the network, comparing
>> the transactional engine sending its changes against the merge engine,
>> which must calculate the data differential between each
>> publisher/subscriber pair, send the actual changes, and run the conflict
>> resolution process.
>>
>> In terms of bandwidth consumed, both the transactional and the merge
>> engines can be configured to consume remarkably similar amounts of
>> network bandwidth.
>>
>> "Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
>> news:%23yEHqGLGGHA.3176@xxxxxxxxxxxxxxxxxxxxxxx
>>> It really depends on your DML. As far as I know there is no typical
>>> workload. Some workloads are insert-heavy, some are update-heavy, some
>>> are a mix of inserts and deletes, and some are a mix of all three.
>>>
>>> Have a look at the workload MS used here for benchmarking.
>>>
>>> http://www.microsoft.com/technet/prodtechnol/sql/2005/mergrepl.mspx
>>>
>>> Looks pretty update heavy to me! Are you perhaps insinuating that MS is
>>> cooking the books?
>>>
>>> The merge agent by default runs hourly. The distribution agent by
>>> default runs continuously. By default the merge agent only sends one
>>> delta per row during a sync, no matter how many updates have occurred
>>> since the last sync. With transactional there is no such consolidation.
>>>
>>> Merge is also tuned for low bandwidth lines.
>>>
>>> "Michael Hotek" <mike@xxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>> news:Oqpum4HGGHA.1388@xxxxxxxxxxxxxxxxxxxxxxx
>>>> That is only if the updates happen between cycles of the replication
>>>> engine, which is an extremely specific case and also incredibly hard
>>>> to actually produce in a production environment. Merge will only send
>>>> what it needs to send. It does this by picking up the ID of a row that
>>>> changed and then pulling the data directly from the row. If you happen
>>>> to be lucky enough to update a row multiple times quickly enough, merge
>>>> will only pick up the most recent change and send it. But that is a
>>>> very unlikely scenario.
>>>>
>>>> "Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
>>>> news:ujkqL0uFGHA.752@xxxxxxxxxxxxxxxxxxxxxxx
>>>>> This is not quite accurate. With transactional replication, all the
>>>>> logged changes go across the wire and are decomposed into individual
>>>>> singletons. So an update transaction affecting 100 rows would be
>>>>> decomposed into 100 separate singleton updates.
>>>>>
>>>>> Merge behaves the same way, but with merge replication it's the net
>>>>> change that goes across the wire. Suppose the same row is updated 100
>>>>> times. With transactional replication, 100 changes flow. With merge,
>>>>> only the net change, i.e. the final image of the row, will flow if the
>>>>> sync occurs after the 100 changes. This is why by default merge
>>>>> replication agents run each hour.
>>>>>
>>>>> "Z" <z@xxxxx> wrote in message
>>>>> news:u3BpwtuFGHA.2212@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>> Both. It has no bearing on inserts; you get essentially the same
>>>>>> amount of data. It also has no bearing on deletes, because the
>>>>>> transaction simply points at an ID and is basically the same size.
>>>>>> If you are using purely default settings, merge transfers less data.
>>>>>> You can change the default settings for transactional so that it
>>>>>> transfers just those columns that have changed.
>>>>>>
>>>>>> Both replication methods will transfer data across a 64K link without
>>>>>> any problems. The issue is how much data (number of transactions) you
>>>>>> plan on sending between the two.
>>>>>>
>>>>>> "Tom" <mcseman@xxxxxxxxxxx> wrote in message
>>>>>> news:OzH2p9iFGHA.2444@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>> I'm trying to find out which type of replication uses the least
>>>>>>> network bandwidth to transfer the same amount of data between two
>>>>>>> servers.
>>>>>>> I need to transfer new data from one SQL2005 server to another. I
>>>>>>> could use merge or transactional replication, or even create my own
>>>>>>> scripts that select the right data on one server and import it on
>>>>>>> the other, but my real concern is which method performs best on slow
>>>>>>> connections (64Kb/s range).
>>>>>>> I'm planning to test all methods but would still like to get some
>>>>>>> info from people with more experience in that area.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Tom