Databases - Riak
سید احسان مختاری
SQL is Good
Relational Databases Management Systems (
) – mainstay of business
SQL is good
Easy to use
Rich tool set
: all included statements in a transaction are either executed or the whole transaction is aborted without affecting the database.
: a database is in a consistent state before and after a transaction.
: transactions can not see uncommitted changes in the database.
: changes are written to a disk before a database commits a transaction so that committed data cannot be lost through a power failure.
Web-based applications caused spikes.
Internet-scale data size
High read-write rates
designed to be
What is NoSQL
data storage systems.
All NoSQL offerings
one or more of the ACID properties.
Social applications are not
the same level of ACID.
It was first used in 1998 by Carlo Strozzi to name his relational database that did not expose the standard SQL interface.
The term was picked up again in 2009 when a Last.fm develper, Johan Oskarsson, wanted to organize an event to discuss opensource distributed databases.
The name attempted to label the emergence of a growing number of nonrelational, distributed data stores that often did not attempt to provide ACID
Categories of NoSQL Databases
Dynamo, Scalaris, Berkeley DB, ...
BigTable, Hbase, Cassandra, ...
SQL vs. NoSQL
Single storage image. Informally, after an update completes,
any subsequent access will return the updated value.
The system does
that subsequent accesses will return the
If no new updates are made to the object,
will return the last updated value.
: the minimum number of members of an assembly or society that must be present at any of its meetings to make the proceedings of that meeting valid.
user can choose some number
nodes required for a write to succeed
user can specify the number of nodes (
) to be contacted during a read.
To enforce strong consistency:
R + W
N is rarely more than 3
: keeping that amount of copy of data gets
Basho's Riak (N = 3, R = 2, W = 2 default)
Linkedin's Voldemort (N = 2 or 3, R = 1, W = 1 default)
Apache's Cassandra (N = 3, R = 1, W = 1 default)
vailable: possibilities of faults but not a fault of the whole system.
oft state: copies of a data item may be inconsistent.
ventually consistent: copies becomes consistent at some later time if
there are no more updates to that data item.
states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
: All clients always have the same view of the data.
: Each client can always read and write.
: The system works well despite physical network partition.
* A network partition refers to the failure of a network device that causes a network to be split.
Riak is a
, open source
originally developed by Basho Technologies to run a Salesforce automation business
heavily influenced by Dr. Eric Brewer’s CAP Theorem and Amazon’s Dynamo.
The Riak APIs
Storage operations use
HTTP PUTs or POSTs
and fetches use
Protocol Buffers API
: a simple binary protocol based on the library Google’s open source project of the same name.
Ruby, Java, Erlang, Python, PHP, and C/C++
Buckets, Keys, and Values
the only way to organize data inside of Riak.
Data is stored and referenced by bucket/key pairs.
Each key is attached to a unique value that can be
any data type
Central to any Riak cluster is a
160-bit integer space
which is divided into equally-sized partitions.
Physical servers, referred to in the cluster as
, run a certain number of virtual nodes, or
As a rule, each node in the cluster is responsible for 1/(total number of physical nodes) of the ring
#vnodes on each node = (number of partitions)/(number of nodes)
Nodes can be added and removed from the cluster dynamically and Riak will redistribute the data accordingly.
Riak controls how many replicas of a datum are stored via a setting called
All nodes in the same cluster should agree on and use the same N value.
Riak uses a technique called
to compensate for failed nodes in a cluster:
Reading, Writing, and Updating Data
Using the Riak APIs, Riak objects can be fetched directly if the client knows the bucket and key. This is the fastest way to get data out of Riak.
: the number of Riak nodes which must return results for a read before the read is considered successful.
Read Failure Tolerance
R - N
will tell you the number of down or laggy nodes a Riak cluster can tolerate before becoming unavailable for reads.
For example, an 8 node cluster with an N of 8 and a R of 1 will be able to tolerate up to 7 nodes being down before becoming unavailable for reads.
Reading, Writing, and Updating Data
: Each update to a Riak object is tracked by a vector clock.
Vector clocks allow Riak to determine causal ordering and detect conflicts in a distributed system.
The last update
Return both versions
of the object to the client: client itself resolve the conflict
Allows to process data in real-time in
MapReduce jobs are
described in JSON
using a set of nested hashes describing the
for a job.
A job can consist of an arbitrary number of Map and Reduce phases.
MapReduce in Riak can be thought of as a real-time
A job is
submitted via HTTP
and the results are
returned in JSON-encoded form
What NOT Covered
Links and Link Walking
Special thanks to:
[Presentation] NoSQL Databases - Amir H.Payberah, April 2012
[Book] Seven Databases in Seven Weeks - A Guide to Modern Databases and the NoSQL Movement; Eric Redmond and Jim R. Wilson