Sharding a Database

  • 7

    vkb087 4 months ago

    Great doc. You can find more information about consistent hashing implementation and details here:


    http://www.tom-e-white.com/2007/11/consistent-hashing.html

  • 0

    Ajit 3 months ago

    Java implementation of sharding

  • 0

    Ajit 3 months ago

    http://sleeplessinslc.blogspot.in/2008/09/hibernate-shards-maven-simple-example.html

  • 4

    skcoder 3 months ago

    It looks like this is not addressing the question of how data is actually relocated. I would like to understand how data is physically moved onto a new node when it joins the cluster. Which part of the system does this? Similarly, data would need to be relocated to the surviving nodes when a node leaves. Is this all done before the node comes online? Also, when a node leaves, I suppose this assumes there is a redundant copy of the data still online that can be moved to the surviving nodes. Would we be expected to come up with this as well in an interview?

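
The relocation question above can be sketched in code. This is not from the doc, just a minimal illustration (node and key names are made up) of plain consistent hashing: when a node joins, the only keys that relocate are those on the arc between the new node and its predecessor on the ring, and all of them move to the new node.

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    """Hash a string to a point on the ring (0 .. 2**32 - 1)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def owner(ring: dict, key: str) -> str:
    """The owner is the first node clockwise from the key's hash."""
    points = sorted(ring)
    return ring[points[bisect_right(points, h(key)) % len(points)]]

nodes = ["node-a", "node-b", "node-c"]
ring = {h(n): n for n in nodes}
keys = [f"user:{i}" for i in range(1000)]
before = {k: owner(ring, k) for k in keys}

# A new node joins: recompute owners and see which keys changed hands.
ring[h("node-d")] = "node-d"
moved = [k for k in keys if owner(ring, k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys relocate, all of them to node-d:",
      all(owner(ring, k) == "node-d" for k in moved))
```

The piece of the system that does the physical copy varies (e.g. a rebalancer process in the storage layer), but the set of keys it must copy is exactly the `moved` list above.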
  • 0

    swapnil_marghade 3 months ago

    Storing data in multiple shards is the same thing as a replication factor. We can address this concern by using the right term: "replication factor"!

    • 1

      karan3296 24 days ago

      Not sure they are replicating here. This is just distributing data. Replication is mentioned in the Dynamo paper and is a feature of HDFS.

  • 0

    AB_kyusak 3 months ago

    I can't understand what advantage we gain by maintaining multiple copies of a shard. And what advantage does modified consistent hashing have over plain consistent hashing? It might sound like a dumb question, but I am new to the topic :)

    • 0

      Nivas 25 days ago

      All I can think of regarding maintaining multiple copies per shard is that it increases availability.

    • 0

      Nivas 25 days ago

      With respect to modified consistent hashing, it makes sure that the order in which shards follow each other on the ring is not deterministic. So if a shard fails, the load will not fall on a single shard; it will be shared by all the shards that follow the failed instance at its different places on the ring.

    • 0

      karan3296 24 days ago

      It allows for heterogeneity too. Beefier nodes can be given more virtual positions along the ring, whereas weaker servers can be made less prominent on the ring.

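
The virtual-node idea in the two replies above can be sketched as follows. This is not from the doc, just an illustrative toy (node names and weights are made up): each physical node is hashed onto the ring many times, and a beefier node simply gets more virtual points, so it receives a proportionally larger share of the keys.

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    """Hash a string to a point on the ring (0 .. 2**32 - 1)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

# Weight = number of virtual points on the ring (values are made up).
weights = {"big-node": 200, "small-node": 50}
ring = {h(f"{node}#{i}"): node
        for node, w in weights.items() for i in range(w)}
points = sorted(ring)

def owner(key: str) -> str:
    """First virtual point clockwise from the key's hash."""
    return ring[points[bisect_right(points, h(key)) % len(points)]]

counts = {node: 0 for node in weights}
for i in range(10000):
    counts[owner(f"key:{i}")] += 1
print(counts)  # big-node serves the larger share, roughly tracking 200:50
```

The same mechanism also smooths failover: a node's load is spread across everyone who follows its many virtual points, not dumped on one neighbor.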
  • 0

    kumar955 3 months ago

    I think he needs to expand the answer: what happens if a user's shard size increases? Do we split the data or not? Do we need to support replicas? How does data center to data center replication happen?

  • 0

    kulkav 3 months ago

    http://www.project-voldemort.com/voldemort/design.html

  • 0

    Srivathsan_Venkatavaradhan 2 months ago

    I think the modified consistent hashing explained above is not correct. Please refer to https://ihong5.wordpress.com/2014/08/19/consistent-hashing-algorithm/ . It has nothing to do with shards.

  • 0

    ramanatnsit about 1 month ago

    How does the system remain available while a node failure occurs and redistribution of keys is in progress? Is it sacrificing availability for consistency?

    • 0

      Nivas 25 days ago

      Each shard will have a leader/master and some replicas. During a failure or redistribution the leader will be changed, so availability is not sacrificed. But this may result in increased load on the machine where the newly elected replica is located if it is not managed at the right time.

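
Not from the doc, but the leader-failover idea in the reply above can be sketched with a toy in-memory model (shard and node names are hypothetical): each shard keeps an ordered replica list with the leader at index 0, so removing a dead node implicitly promotes the next surviving replica of any shard it led.

```python
# Each shard's replica list is ordered: index 0 is the current leader.
shards = {
    "shard-1": ["node-a", "node-b", "node-c"],
    "shard-2": ["node-b", "node-c", "node-a"],
}

def fail_node(shards: dict, dead: str) -> None:
    """Drop a dead node from every replica set; if it was a shard's
    leader (index 0), the next surviving replica becomes the leader."""
    for replicas in shards.values():
        if dead in replicas:
            replicas.remove(dead)

fail_node(shards, "node-a")
print(shards)
# shard-1's leader is now node-b; shard-2's leader (node-b) is unchanged.
```

A real system adds an election protocol and re-replication to restore the replica count, which is where the extra load the reply mentions comes from.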
  • 0

    eipie 21 days ago

    What is this obsession with 72 GB of RAM? Isn't it better/easier to pick powers of two? Or perhaps 100 GB, treating 1 GB = 1000 MB for simplicity of calculation?
