Column (data store)

Summary

A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key–value pair) consisting of three elements:

  • Unique name: Used to reference the column
  • Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
  • Timestamp: The system timestamp used to determine the valid content.
A column consists of a (unique) name, a value, and a timestamp.

Usage edit

A column is used as a store for the value and has a timestamp that is used to differentiate the valid content from stale ones. According to the CAP theorem, distributed data stores cannot guarantee consistency, as availability and partition tolerance are more important issues. Therefore, the data store or the application programmer will use the timestamp to find out which of the stored values in the backup nodes are up-to-date.

Some data stores, like Riak, may use the more sophisticated vector clock instead of the timestamp to resolve stale information.

Differences from a relational database edit

In relational databases, a column is a part of a relational table that can be seen in each row of the table. This is not the case in distributed data stores, where the concept of a table only vaguely exists. A column can be part of a ColumnFamily that resembles at most a relational row, but it may appear in one row and not in the others. Also, the number of columns may change from row to row, and new updates to the data store model may also modify the column number. So, all the work of keeping up with changes relies on the application programmer.

Examples edit

Three definitions of columns in JSON-like notation are given below:

{
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
}

See also edit

References edit