Relational Databases Management Systems (RDMBSs) – mainstay of business
SQL is good
Rich language
Easy to use
Rich tool set
Promise: ACID
Page 3/31
ACID
Atomicity: all included statements in a transaction are either executed or
the whole transaction is aborted without affecting the database.
Consistency: a database is in a consistent state before and after a
transaction.
Isolation: transactions can not see uncommitted changes in the database.
Durability: changes are written to a disk before a database commits a
transaction so that committed data cannot be lost through a power failure.
Page 4/31
Page 5/31
SQL Challenges
Web-based applications caused spikes.
Internet-scale data size
High read-write rates
Large data
RDBMS were not designed to be distributed.
Page 6/31
Page 7/31
What is NoSQL
Class of non-relational data storage systems.
All NoSQL offerings relax one or more of the ACID properties.
Social applications are not banks and they don't need the same level of ACID.
Page 8/31
NoSQL History
It was first used in 1998 by Carlo Strozzi to name his relational
database that did not expose the standard SQL interface.
The term was picked up again in 2009 when a Last.fm develper, Johan
Oskarsson, wanted to organize an event to discuss opensource
distributed databases.
The name attempted to label the emergence of a growing number of
nonrelational, distributed data stores that often did not attempt to
provide ACID
Page 9/31
Categories of NoSQL Databases
Key/Value stores
Dynamo, Scalaris, Berkeley DB, ...
Column-oriented databases
BigTable, Hbase, Cassandra, ...
Document databases
MongoDB, Terrastore, Riak, ...
Page 10/31
SQL vs. NoSQL
Page 11/31
Consistency
Strong consistency
Single storage image. Informally, after an update completes, any subsequent
access will return the updated value.
southern-sun.blogfa.com
Page 12/31
Consistency
Eventual consistency
The system does not guarantee that subsequent accesses will return the updated
value.
Inconsistency window
If no new updates are made to the object, eventually all accesses will return the last updated value.
Page 13/31
Quorum Model
Quorum: the minimum number of members of an assembly or society that must be present at any of its meetings to make the proceedings of that meeting valid.
N, R and W
user can choose some number W-of-N nodes required for a write to succeed
user can specify the number of nodes (R-of-N) to be contacted during a read.
To enforce strong consistency: R + W > N
N is rarely more than 3: keeping that amount of copy of data gets expensive!
Basho's Riak (N = 3, R = 2, W = 2 default)
Linkedin's Voldemort (N = 2 or 3, R = 1, W = 1 default)
Apache's Cassandra (N = 3, R = 1, W = 1 default)
Page 14/31
BASE Properties
Basically Available: possibilities of faults but not a fault of the whole system.
Soft state: copies of a data item may be inconsistent.
Eventually consistent: copies becomes consistent at some later time if there are no more updates to that data item.
Page 15/31
CAP Theorem
states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency: All clients always have the same view of the data.
Availability: Each client can always read and write.
Partition Tolerance: The system works well despite physical network partition.
* A network partition refers to the failure of a network device that causes a network to be split.
Page 16/31
CAP Theorem
Page 17/31
Page 18/31
Page 19/31
Riak
Riak is a distributed, scalable, open source key/value store
originally developed by Basho Technologies to run a Salesforce automation business
heavily influenced by Dr. Eric Brewer’s CAP Theorem and Amazon’s Dynamo.
The Riak APIs
Storage operations use HTTP PUTs or POSTs and fetches use HTTP GETs
fully-featured Protocol Buffers API: a simple binary protocol based on the library Google’s open source project of the same name.
Client Libraries
Ruby, Java, Erlang, Python, PHP, and C/C++
Page 20/31
Buckets, Keys, and Values
the only way to organize data inside of Riak.
Data is stored and referenced by bucket/key pairs.
Each key is attached to a unique value that can be any data type.
Page 21/31
Clustering
Central to any Riak cluster is a 160-bit integer space which is divided into equally-sized partitions.
Physical servers, referred to in the cluster as “nodes”, run a certain number of virtual nodes, or “vnodes”.
As a rule, each node in the cluster is responsible for 1/(total number of physical nodes) of the ring
#vnodes on each node = (number of partitions)/(number of nodes)
Nodes can be added and removed from the cluster dynamically and Riak will redistribute the data accordingly.
Page 22/31
Clustering
Page 23/31
Replication
Riak controls how many replicas of a datum are stored via a setting called “N value”
All nodes in the same cluster should agree on and use the same N value.
Riak uses a technique called ‘hinted handoff’ to compensate for failed nodes in a cluster: self-healing
Page 24/31
Reading, Writing, and Updating Data
Using the Riak APIs, Riak objects can be fetched directly if the client knows the bucket and key. This is the fastest way to get data out of Riak.
R Value: the number of Riak nodes which must return results for a read before the read is considered successful.
Read Failure Tolerance
R - N will tell you the number of down or laggy nodes a Riak cluster can tolerate before becoming unavailable for reads.
For example, an 8 node cluster with an N of 8 and a R of 1 will be able to tolerate up to 7 nodes being down before becoming unavailable for reads.
Page 25/31
Reading, Writing, and Updating Data
Vector Clocks: Each update to a Riak object is tracked by a vector clock.
Vector clocks allow Riak to determine causal ordering and detect conflicts in a distributed system.
Conflict Resolution:
The last update automatically “win”
Return both versions of the object to the client: client itself resolve the conflict
Page 26/31
Map Reduce
Allows to process data in real-time in parallel
MapReduce jobs are described in JSON using a set of nested hashes describing the inputs, phases , and timeout for a job.
A job can consist of an arbitrary number of Map and Reduce phases.
MapReduce in Riak can be thought of as a real-time “mini-Hadoop”
A job is submitted via HTTP and the results are returned in JSON-encoded form.
Page 27/31
What NOT Covered
Secondary Indexes
Links and Link Walking
Riak Search
Commit Hooks
...
Page 29/31
Thanks
Special thanks to: Amir H.Payberah
Page 30/31
References
[Presentation] NoSQL Databases - Amir H.Payberah, April 2012