Thursday, March 29, 2012

[DPS Lab] Consistency and Replication

Replication




"Replication is the process of sharing information so as to ensure consistency between redundant resources, such as processes, storage devices and systems.
The same data is shared and/or stored on several devices.
The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas is hidden as much as possible."


Types
  • Active: the same request is processed at every replica.
  • Passive: each request is processed on a single replica, and its resulting state is then transferred to the other replicas.
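To make the difference between the two types concrete, here is a minimal Python sketch. The `Replica` class, the `(key, value)` request format and both function names are hypothetical, chosen only for illustration; a real system would also need reliable, ordered delivery of requests between machines.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica holding key-value state (hypothetical, for illustration)."""
    name: str
    state: dict = field(default_factory=dict)

    def process(self, request):
        # A request is assumed to be a (key, value) pair; processing it updates local state.
        key, value = request
        self.state[key] = value


def active_replication(replicas, request):
    # Active: every replica processes the same request itself.
    for replica in replicas:
        replica.process(request)


def passive_replication(primary, backups, request):
    # Passive: only the primary processes the request...
    primary.process(request)
    # ...and then its resulting state is copied to every backup.
    for backup in backups:
        backup.state = dict(primary.state)


if __name__ == "__main__":
    group = [Replica("r1"), Replica("r2"), Replica("r3")]
    active_replication(group, ("x", 1))
    print([r.state for r in group])  # [{'x': 1}, {'x': 1}, {'x': 1}]

    primary, *backups = [Replica("p"), Replica("b1"), Replica("b2")]
    passive_replication(primary, backups, ("y", 2))
    print(primary.state, [b.state for b in backups])  # {'y': 2} [{'y': 2}, {'y': 2}]
```

Active replication only keeps the replicas identical if processing is deterministic and every replica receives the requests in the same order; passive replication avoids that requirement at the cost of shipping state from the primary to the backups.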

Why perform Replication?
  • Enhance reliability
    • If one replica is unavailable or crashes, use another.
    • Protect against corrupted data
  • Improve performance
    • Scale with size of the distributed system (replicated web servers).
    • Scale in geographically distributed systems
  • Key issue: need to maintain consistency of replicated data.
Keeping replicas up to date requires a lot of network traffic, so high-speed connections between nodes are needed. Updates need to be atomic (transactions), and the replicas need to be synchronized, which is time-consuming.
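One common way to make an update atomic across all replicas is two-phase commit, a technique not named in the original post: a coordinator first asks every replica to prepare the update, and only if all of them vote yes does it tell them to commit; otherwise every replica aborts. The sketch below is a simplified, in-memory illustration under that assumption. The class and function names are hypothetical, and it ignores crashes, timeouts and the persistent log that a real implementation needs.

```python
class Replica:
    """Toy replica that stages an update before committing it (hypothetical)."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available  # simulates whether the replica can accept updates
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        # Phase 1: stage the update and vote yes (True) or no (False).
        if not self.available:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2a: make the staged update visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        # Phase 2b: discard any staged update.
        self.pending = None


def atomic_update(replicas, key, value):
    """Apply key=value to all replicas or to none of them (two-phase commit sketch)."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


if __name__ == "__main__":
    rs = [Replica("a"), Replica("b"), Replica("c")]
    print(atomic_update(rs, "x", 1))  # True
    rs[1].available = False
    print(atomic_update(rs, "x", 2))  # False: one replica voted no
    print([r.data for r in rs])       # [{'x': 1}, {'x': 1}, {'x': 1}]
```

Because the second update is rejected by one replica, none of the replicas applies it, so all copies stay consistent.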

Consistency


Consistency is the property of keeping the same information in all the systems at all times.

All the replicas must be consistent: if one copy is modified, the others become inconsistent, so:
  • Modifications have to be carried out on all copies.
  • This can cause network performance problems.
  • Concurrency must be handled.
Consistency models are used in distributed systems such as distributed shared memory systems and distributed data stores (filesystems, databases, optimistic replication systems or Web caches). A system is said to support a given model if operations on memory follow that model's specific rules. The data consistency model specifies a contract between the programmer and the system: the system guarantees that if the programmer follows the rules, memory will be consistent and the results of memory operations will be predictable. The most common models are:
  • Causal Consistency: Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
  • FIFO Consistency: Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
  • Strict Consistency: Any read always returns the result of the most recent write; this implicitly assumes the presence of a global clock. A write is immediately visible to all processes.
  • Sequential Consistency: Weaker than strict consistency. It assumes all operations are executed in some sequential order and each process issues operations in program order. Any valid interleaving is allowed, but all processes agree on the same interleaving, and each process preserves its program order. Nothing is said about the “most recent write”.
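The definition of sequential consistency can be checked mechanically on a small history: the history is sequentially consistent if some interleaving of all the processes' operations, respecting each process's program order, makes every read return the value of the most recent write in that interleaving. Below is a brute-force sketch. The `("W" or "R", variable, value)` tuple format and the function names are assumptions for illustration, and because it enumerates every interleaving it is only practical for tiny histories.

```python
def interleavings(programs):
    """Yield every merge of the per-process operation lists that keeps each program order."""
    if all(not prog for prog in programs):
        yield []
        return
    for i, prog in enumerate(programs):
        if prog:
            # Take the next operation of process i, then interleave the rest.
            rest = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [prog[0]] + tail


def legal(sequence, initial=0):
    """True if every read returns the value of the most recent write in `sequence`."""
    memory = {}
    for op, var, val in sequence:
        if op == "W":
            memory[var] = val
        elif memory.get(var, initial) != val:
            return False
    return True


def sequentially_consistent(programs, initial=0):
    """A history is sequentially consistent if at least one legal interleaving exists."""
    return any(legal(seq, initial) for seq in interleavings(programs))


if __name__ == "__main__":
    # P1 writes a, P2 writes b; P3 and P4 both see b first, then a.
    ok = [[("W", "x", "a")], [("W", "x", "b")],
          [("R", "x", "b"), ("R", "x", "a")], [("R", "x", "b"), ("R", "x", "a")]]
    # P3 sees b then a, but P4 sees a then b: no single order satisfies both.
    bad = [[("W", "x", "a")], [("W", "x", "b")],
           [("R", "x", "b"), ("R", "x", "a")], [("R", "x", "a"), ("R", "x", "b")]]
    print(sequentially_consistent(ok))   # True
    print(sequentially_consistent(bad))  # False
```

The second history fails because P3 needs the write of b to come before the write of a while P4 needs the opposite, so the processes cannot agree on one interleaving.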
Consistency Model Comparison




Wednesday, March 28, 2012

[DPS Class] Contributions WEEK 8



Last week, the members of the cluster team held a working meeting. We talked about the importance of joining our efforts to build a single cluster instead of working separately in small groups.

We decided on the specifications for each computer in the cluster:
  • Operating System: Ubuntu 10.04 64bits
  • User: cluster
  • Password: (1)
We also drew up a schedule of the activities needed to configure the cluster:
  1. Install all the software and packages.
  2. Configure VPN.
  3. Configure a computing grid.
In a second stage, we are going to integrate a website where it will be possible to upload tasks, which will be validated, executed and processed by the cluster; at the end, the results can be downloaded, possibly as a compressed file.
Emmanuel will take care of this task, using tools such as:
  1. Apache, PHP
  2. Ruby on Rails

I think this was necessary. Now we are working together, and I hope the results will be even more significant than before.

Nominations
  • Cecy Urbina: Because she remained active during exam weeks
  • Roberto Martinez: Because he remained active during exam weeks
  • Rafael Lopez: Because he participated actively in the meeting, distributing tasks and clarifying doubts.

(1): See the Google Document posted on our Facebook page for this information. (Security :P)

Monday, March 12, 2012

[DPS Class] Contributions WEEK 6



This week, my partner Rafael Lopez and I worked on interconnecting two VPNs, one in each of our homes, using two different methods:
  • OpenVPN: I've updated this entry; check the new information at the bottom.
  • Hamachi: by Rafael Lopez
This will allow us, in the future, to connect two completely separate clusters and build a grid.

For the laboratory, I wrote the entry: Grid Computing & Cluster Computing

I was unable to log in to the wiki because the server is still down, so I wrote the entries on my blog and will transcribe them to the wiki later.

NOMINATIONS
  • Rafael Lopez: for the implementation of a Hamachi network, because he still helps me with all the work, and also because he called the group to the first meeting to check the progress of the project.
  • Cecilia Urbina: for the implementation of the parallel Pi calculation
  • Roberto Martinez: for the implementation of PseudoRMI in Python using Pyro

Saturday, March 10, 2012

[DPS Lab] Cluster Computing & Grid Computing

Grid computing focuses on the ability to support computation across administrative domains, which sets it apart from traditional computer clusters and traditional distributed computing. Grids offer a way of using information technology resources optimally inside an organization. In short, grid computing involves virtualizing computing resources. Grid computing is often confused with cluster computing. Functionally, grids can be classified into several types: computational grids (including CPU scavenging grids), which focus primarily on computationally intensive operations, and data grids, which provide controlled sharing and management of large amounts of distributed data.

Definition


There are many definitions of the term Grid computing:
  • A service for sharing computer power and data storage capacity over the Internet
  • An ambitious and exciting global effort to develop an environment in which individual users can access computers, databases and experimental facilities simply and transparently, without having to consider where those facilities are located.
  • Grid computing is a model that allows companies to use a large number of computing resources on demand, no matter where they are located.

Cluster Computing VS. Grid Computing


When two or more computers are used together to solve a problem, they form a computer cluster. There are several ways of implementing a cluster; Beowulf is perhaps the best known, but basically it is just cooperation between computers in order to solve a task or a problem. Cluster computing, then, is simply what you do when you use a computer cluster.

Grid computing is similar to cluster computing: it makes use of several computers connected in some way to solve a large problem. There is often some confusion about the difference between grid and cluster computing. The big difference is that a cluster is usually homogeneous while grids are heterogeneous. The computers that are part of a grid can run different operating systems and have different hardware, whereas cluster computers typically all have the same hardware and OS. A grid can make use of spare computing power on a desktop computer, while the machines in a cluster are dedicated to working as a single unit and nothing else.

Grids are inherently distributed over a LAN, a metropolitan area network or a WAN. The computers in a cluster, on the other hand, are normally contained in a single location or complex.

Another difference lies in the way resources are handled. In a cluster, the whole system (all nodes) behaves as a single system image, and resources are managed by a centralized resource manager. In a grid, every node is autonomous, i.e. it has its own resource manager and behaves like an independent entity.

Characteristics of Grid Computing

  • Loosely coupled (Decentralization)
  • Diversity and Dynamism
  • Distributed Job Management & scheduling

Characteristics of Cluster computing

  • Tightly coupled systems
  • Single system image
  • Centralized Job management & scheduling system
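The contrast between centralized and distributed job management can be sketched as a toy model in Python. Everything here is an assumption for illustration, not the API of any real resource manager: the cluster side uses a single central manager that assigns jobs round-robin to identical nodes, while each grid node applies its own hypothetical local policy (operating system and free CPUs) to decide whether to accept a job.

```python
def cluster_schedule(jobs, nodes):
    """Centralized: one manager assigns every job round-robin to identical nodes."""
    assignment = {node: [] for node in nodes}
    for i, job in enumerate(jobs):
        assignment[nodes[i % len(nodes)]].append(job)
    return assignment


class GridNode:
    """Autonomous node that decides locally whether to run a job (hypothetical)."""

    def __init__(self, name, os_name, free_cpus):
        self.name = name
        self.os_name = os_name
        self.free_cpus = free_cpus
        self.accepted = []

    def offer(self, job):
        # Local policy: the OS must match (None means any OS) and enough CPUs must be free.
        if job["os"] in (None, self.os_name) and job["cpus"] <= self.free_cpus:
            self.free_cpus -= job["cpus"]
            self.accepted.append(job["name"])
            return True
        return False


def grid_submit(job, nodes):
    """Decentralized: offer the job to each node until one of them accepts it."""
    for node in nodes:
        if node.offer(job):
            return node.name
    return None


if __name__ == "__main__":
    print(cluster_schedule(["j1", "j2", "j3"], ["n1", "n2"]))
    # {'n1': ['j1', 'j3'], 'n2': ['j2']}
    grid = [GridNode("lab-pc", "Linux", 2), GridNode("desktop", "Windows", 4)]
    print(grid_submit({"name": "render", "os": "Windows", "cpus": 3}, grid))  # desktop
    print(grid_submit({"name": "sim", "os": None, "cpus": 4}, grid))          # None
```

In the cluster version no node has a say in what it runs; in the grid version the heterogeneous nodes keep control of their own resources, so a job can go unscheduled if no node is willing to take it.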

Areas of Grid Computing and its applications for modeling and computing


  1. Predictive Modeling and Simulations
    Predictive modeling is done through extensive computer simulation experiments, which often involve large-scale computations to achieve the desired accuracy and turnaround time. It can also be called "Modeling the Future". Such numerical modeling requires state-of-the-art computing at speeds approaching 1 GFLOPS and beyond. In computational biology, it is the modeling and simulation of the self-organizing adaptive response of systems where spatial and proximal information is of paramount importance. Areas of research include:
    • Numerical Weather Forecasting
    • Flood Warning
    • Semiconductor Simulation
    • Oceanography
    • Astrophysics (Modeling of Black holes and Astronomical formations)
    • Sequencing of the human genome
    • Socio-economic and Government use
  2. Engineering Design and Automation
    • Finite-element analysis
    • Computational aerodynamics
    • Remote Sensing Applications
    • Artificial Intelligence and Automation
    These areas require parallel processing for the following intelligence functions:
    • Image Processing
    • Pattern Recognition
    • Computer Vision
  3. Energy Resources Exploration
    • Seismic Exploration
    • Reservoir Modeling
    • Plasma Fusion Power
    • Nuclear Reactor Safety
  4. Medical, Military and Basic Research
    • Medical Imaging
    • Quantum Mechanics problems
    • Polymer Chemistry
    • Nuclear Weapon Design
  5. Visualization
    • Computer-generated graphics, films and animations
    • Data Visualization

This is an excellent document that explains the differences and characteristics of cluster, grid and cloud computing: Cluster, Grid and Cloud Computing


Thursday, March 1, 2012

[DPS Class] Contributions WEEK 5



This week, my partner Rafael Lopez and I worked on installing and configuring a VPN server, one in each of our homes. This will allow us, in the future, to connect two completely separate clusters, which is known as GRID COMPUTING.

I was unable to log in to the wiki, so I wrote the entries on my blog; once I fix the problem with my account, I will transcribe them to the wiki.

NOMINATIONS
  • Rafael Lopez: This may sound repetitive, but I think he is one of the people who have participated most actively in, and contributed most to, the creation of the cluster, and who have produced the most results.