hashtable - What is the use of a hash range in a DynamoDB table?


I am new to DynamoDB (DDB). I was going through the documentation, and it says to add a hash key and a hash range key. The documentation also says that DDB creates an unsorted index on the hash key and a sorted index on the hash range.

What is the purpose of having these 2 keys rather than just 1 key? Is it because the first key is used like a hashtable that contains: key -> the range of keys for each value in the hash range,

and a 2nd hashtable that contains: hash range key -> the actual data value?

This would segregate the data and make lookups fast. But then why only 2 levels of hashmaps? We could do this for n layers and get even faster lookups.

Thanks in advance.

Q: "What is the purpose of having these 2 keys rather than just 1 key?"

In terms of the data model, the hash key allows you to uniquely identify a record in your table (together with the range key, when one is defined), and the range key can optionally be used to group and sort several records that are usually retrieved together. Example: if you are defining an aggregate to store order items, OrderId could be your hash key and OrderItemId the range key. Below is a formal definition of the use of these 2 keys:

"Composite Hash Key with Range Key allows the developer to create a primary key that is the composite of two attributes, a 'hash attribute' and a 'range attribute.' When querying against a composite key, the hash attribute needs to be uniquely matched but a range operation can be specified for the range attribute: e.g. all orders from Werner in the past 24 hours, or all games played by an individual player in the past 24 hours." [VOGELS]

So the range key adds a grouping capability to the data model. However, the use of these 2 keys also has an implication for the storage model:

"Dynamo uses consistent hashing to partition its key space across its replicas and to ensure uniform load distribution. A uniform key distribution can help us achieve uniform load distribution assuming the access distribution of keys is not highly skewed." [DDB-SOSP2007]

Not only does the hash key allow you to uniquely identify the record, it is also the mechanism that ensures load distribution. The range key (when used) helps to indicate which records will mostly be retrieved together, so the storage can also be optimized for that need.
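To make the two roles concrete, here is a minimal in-memory sketch of a composite key, not how DynamoDB is actually implemented. The class `CompositeKeyTable` and its methods are hypothetical names made up for illustration. The hash key is matched exactly (a plain dictionary lookup), while the items under each hash key are kept sorted by range key so that a range condition can be answered with a binary search.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Toy model of a table with a hash key and a range key (illustrative only)."""

    def __init__(self):
        # hash key -> list of (range_key, item), kept sorted by range key
        self._groups = defaultdict(list)

    def put(self, hash_key, range_key, item):
        group = self._groups[hash_key]
        keys = [rk for rk, _ in group]
        i = bisect.bisect_left(keys, range_key)
        if i < len(group) and group[i][0] == range_key:
            group[i] = (range_key, item)        # same primary key: overwrite
        else:
            group.insert(i, (range_key, item))  # keep the group sorted

    def query(self, hash_key, low=None, high=None):
        """Exact match on the hash key, optional inclusive bounds on the range key."""
        group = self._groups.get(hash_key, [])
        keys = [rk for rk, _ in group]
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(keys) if high is None else bisect.bisect_right(keys, high)
        return [item for _, item in group[start:end]]


# OrderId is the hash key, OrderItemId is the range key.
table = CompositeKeyTable()
table.put("order-1", 3, "keyboard")
table.put("order-1", 1, "monitor")
table.put("order-1", 2, "mouse")
table.put("order-2", 1, "desk")

print(table.query("order-1"))         # ['monitor', 'mouse', 'keyboard']
print(table.query("order-1", low=2))  # ['mouse', 'keyboard']
```

Note that the hash key side is unordered (you can only ask for an exact match), while the range key side is ordered, which matches the unsorted/sorted index distinction raised in the question.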

Q: "But then why only 2 levels of hashmaps? We could do this for n layers and get faster lookups."

Having many layers of lookups would add exponential complexity to effectively running the database in a cluster environment, which is one of the most essential use cases for the majority of NoSQL databases. The database has to be highly available, failure-proof, effectively scalable, and still perform well in a distributed environment.

"One of the key design requirements for Dynamo is that it must scale incrementally. This requires a mechanism to dynamically partition the data over the set of nodes (i.e., storage hosts) in the system. Dynamo's partitioning scheme relies on consistent hashing to distribute the load across multiple storage hosts." [DDB-SOSP2007]

It is always a trade-off: every single limitation that you see in NoSQL databases was most likely introduced by the storage model requirements. Note that although relational databases are very flexible in terms of data modeling, they have several limitations when it comes to running in a distributed environment.

Choosing the correct keys to represent your data is one of the most critical aspects of your design process, and it directly impacts how well your application will perform and scale, and how much it will cost.
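As a rough illustration of the consistent hashing idea from the Dynamo paper quoted earlier, here is a small sketch of a hash ring. It is not DynamoDB's actual partitioning code: the `HashRing` class, the use of MD5 for ring positions, and the choice of 100 virtual nodes per host are all assumptions made for this example. The point it tries to show is incremental scaling: when a host is added, the only keys that change owner are the ones that move to the new host.

```python
import bisect
import hashlib


def _position(value):
    # Stable position on the ring; MD5 is only a convenient hash for this sketch.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = []     # self._owners[i] owns self._positions[i]
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each host is placed at several points on the ring to even out the load.
        for i in range(self._vnodes):
            pos = _position(f"{node}#{i}")
            idx = bisect.bisect_left(self._positions, pos)
            self._positions.insert(idx, pos)
            self._owners.insert(idx, node)

    def node_for(self, hash_key):
        if not self._positions:
            raise ValueError("ring has no nodes")
        # Walk clockwise to the first ring position after the key, wrapping around.
        idx = bisect.bisect_right(self._positions, _position(hash_key))
        return self._owners[idx % len(self._positions)]


ring = HashRing(["host-a", "host-b", "host-c"])
keys = [f"order-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("host-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed owner")
print({after[k] for k in moved})  # only the new host receives moved keys
```

Only the hash key takes part in this placement, which is why a single level of hashing is enough to spread the load: adding more hashed layers would not distribute the data any better, but every extra layer would have to be partitioned, replicated, and kept consistent across hosts.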


Footnotes:

  • The data model is the model through which we perceive and manipulate our data. It describes how we interact with the data in the database [FOWLER]. In other words, it is how you abstract your data model, the way you group your entities, the attributes that you choose as primary keys, etc.

  • The storage model describes how the database stores and manipulates the data internally [FOWLER]. Although you cannot control this directly, you can certainly optimize how the data is retrieved or written by knowing how the database works internally.

