Thursday, March 29, 2012

[DPS Lab] Consistency and Replication

Replication

"Replication is the process of sharing information so as to ensure consistency between redundant resources, such as processes, storage devices, and systems. The same data is shared and/or stored on several devices. The replication itself should be transparent to the external user, and in a failure scenario, failover between replicas should be hidden as much as possible."


Types
  • Active: every replica processes the same request.
  • Passive: each request is processed by a single replica, which then transfers its resulting state to the other replicas (a sketch of both styles follows this list).
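
A minimal sketch of the two styles in Python; the Replica class and function names are illustrative assumptions, not a real replication library:

# Minimal sketch of the two replication styles. All names (Replica,
# apply, ...) are illustrative, not a real replication library.

class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value

def active_replication(replicas, key, value):
    # Active: every replica processes the same request itself.
    for r in replicas:
        r.apply(key, value)

def passive_replication(replicas, key, value):
    # Passive: only the primary processes the request ...
    primary, backups = replicas[0], replicas[1:]
    primary.apply(key, value)
    # ... and then its resulting state is copied to the backups.
    for b in backups:
        b.state = dict(primary.state)

replicas = [Replica() for _ in range(3)]
active_replication(replicas, "x", 1)
passive_replication(replicas, "y", 2)
print([r.state for r in replicas])  # all three agree: {'x': 1, 'y': 2}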

Why perform Replication?
  • Enhance reliability
    • If one replica is unavailable or crashes, another can be used.
    • Protect against corrupted data.
  • Improve performance
    • Scale with the size of the distributed system (e.g., replicated web servers).
    • Scale in geographically distributed systems.
  • Key issue: need to maintain consistency of replicated data.
Keeping replicas up to date consumes a lot of network bandwidth, so a high-speed connection between nodes is needed. Updates need to be atomic (transactional), and the replicas need to be synchronized, which is time-consuming. A sketch of an all-or-nothing update follows.
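
As a rough illustration of the atomic-update requirement, here is an all-or-nothing write across replicas in the spirit of two-phase commit; all names are hypothetical:

# Sketch of an all-or-nothing (atomic) update across replicas, in the
# spirit of two-phase commit. All names are illustrative assumptions.

class Replica:
    def __init__(self):
        self.state = {}
        self.pending = None

    def prepare(self, key, value):
        self.pending = (key, value)   # stage the update
        return True                   # a real replica might vote "no"

    def commit(self):
        key, value = self.pending
        self.state[key] = value
        self.pending = None

    def abort(self):
        self.pending = None

def atomic_update(replicas, key, value):
    # Phase 1: every replica stages the update and votes.
    if all(r.prepare(key, value) for r in replicas):
        # Phase 2: unanimous "yes", so every replica commits.
        for r in replicas:
            r.commit()
        return True
    for r in replicas:                # any "no": nobody applies it
        r.abort()
    return False

replicas = [Replica() for _ in range(3)]
print(atomic_update(replicas, "x", 42))   # True: all replicas updated, or none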

Consistency

Consistency is the process of keeping the same information in all the systems at all times.

All the replicas must be consistent: if one copy is modified, the others become inconsistent. Therefore:
  • Modifications have to be carried out on all copies.
  • Network performance becomes a problem.
  • Concurrency has to be handled (see the sketch after this list).
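
One minimal way to handle the concurrency issue is to serialize writers, so each modification reaches all copies before the next one starts. A Python sketch with illustrative names:

# Sketch: serializing concurrent writers with a lock so every
# modification reaches all copies without interleaving. Illustrative only.
import threading

copies = [{}, {}, {}]          # three replicated copies of the data
lock = threading.Lock()

def write(key, value):
    with lock:                 # handle concurrency: one writer at a time
        for copy in copies:    # carry out the modification on all copies
            copy[key] = value

threads = [threading.Thread(target=write, args=("x", i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert all(c == copies[0] for c in copies)   # every copy ends up identical
print(copies[0])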
Consistency models are used in distributed systems such as distributed shared memory systems or distributed data stores (filesystems, databases, optimistic replication systems, or Web caching). A system supports a given model if operations on memory follow its specific rules. The data consistency model specifies a contract between the programmer and the system: the system guarantees that if the programmer follows the rules, memory will be consistent and the results of memory operations will be predictable. The most common models are:
  • Causal Consistency: Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
  • FIFO Consistency: Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
  • Strict Consistency: Any read always returns the result of the most recent write; this implicitly assumes the presence of a global clock. A write is immediately visible to all processes.
  • Sequential Consistency: Weaker than strict consistency. Assumes all operations are executed in some sequential order and that each process issues operations in program order. Any valid interleaving is allowed, but all processes must agree on the same interleaving, and each process preserves its program order. Nothing is said about the "most recent write" (see the checker sketch after this list).
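
The gap between causal and sequential consistency can be made concrete with a small brute-force checker. In the illustrative history below, two processes observe two concurrent writes in opposite orders: causal consistency allows this, but no single interleaving explains it, so it is not sequentially consistent:

# Brute-force check for sequential consistency: a history is
# sequentially consistent if SOME single interleaving that preserves
# every process's program order explains all the reads.
# The history used here is an illustrative example.

def interleavings(histories):
    # Yield every merge of the per-process sequences, keeping each
    # process's program order intact.
    histories = {p: ops for p, ops in histories.items() if ops}
    if not histories:
        yield []
        return
    for p, ops in histories.items():
        rest = dict(histories)
        rest[p] = ops[1:]
        for tail in interleavings(rest):
            yield [ops[0]] + tail

def explains(order):
    # Replay one interleaving; every read must return the latest write.
    memory = {}
    for op, key, value in order:
        if op == "w":
            memory[key] = value
        elif memory.get(key) != value:
            return False
    return True

history = {
    "P1": [("w", "x", 1)],
    "P2": [("w", "x", 2)],
    "P3": [("r", "x", 1), ("r", "x", 2)],   # sees write 1 before write 2
    "P4": [("r", "x", 2), ("r", "x", 1)],   # sees write 2 before write 1
}

# No interleaving satisfies both P3 and P4, so the history is causally
# consistent (the writes are concurrent) but not sequentially consistent.
print(any(explains(o) for o in interleavings(history)))  # False
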
Consistency Model Comparison

  Model        Guarantee
  Strict       Every read returns the most recent write (assumes a global clock).
  Sequential   All processes agree on a single interleaving that preserves each process's program order.
  Causal       Causally related writes are seen in the same order by all processes; concurrent writes may differ.
  FIFO         Only the writes of a single process are guaranteed to be seen in issue order; writes from different processes may be seen in different orders.