Sharding a Database

  • 20

    vkb087 10 months ago

    Great Doc. You can find more information about consistent hashing implementation and detail here.


    http://www.tom-e-white.com/2007/11/consistent-hashing.html

    reply
  • -8

    Ajit 10 months ago

    Java implementation of Sharading

    reply
  • -6

    Ajit 10 months ago

    http://sleeplessinslc.blogspot.in/2008/09/hibernate-shards-maven-simple-example.html

    reply
  • 10

    skcoder 10 months ago

    It looks like this is not addressing the question of how data is actually relocated. I would like to understand how data is physically moved onto the new node when it joins the cluster. Which part of the system does this? Similarly data would need to be relocated to all of the surviving nodes when a node leaves. Is this all done before the node comes online? Also when the node leaves, I suppose this is assuming there must be a redundant copy of data still online that can be moved to the surviving nodes. Would we be expected to come up with this as well in an interview?

    reply
  • 0

    swapnil_marghade 10 months ago

    Storing data in multiple shards is same as replication factor. We can address this concern by using right term as "Replication factor" !

    reply
    • 1

      karan3296 8 months ago

      Not sure they are replicating here. This is just distributing data. Replication is mentioned in the dynamo paper and is a feature of HDFS.

      reply
  • 0

    AB_kyusak 10 months ago

    I cant understand,what advantage we gain by maintaining the multiple copies of the shrad? wht advantage does modified consistent hashing has over consistent hashing.It might sound a dumb question but i am new to the topic :)

    reply
    • 0

      Nivas 8 months ago

      1.All I can think of about maintaining multiple copies per shard is it increases the availability.

      reply
    • 0

      Nivas 8 months ago

      With respect to modified consistent hashing, it makes sure that order in which shards follow each other is not deterministic. So if a shard fails the load will not be on a single shard, it will be shared by all subsequent shards that follow the failed instance at different places.

      reply
    • 0

      karan3296 8 months ago

      It allows for heterogeneity too. Beefier nodes can be used to spread more virtually along the ring whereas weaker servers can be less prominent on the ring.

      reply
  • 0

    kumar955 10 months ago

    I think he needs to expand the answer, what happens if a user shared size increases - do we split the data or not? Do we need to support replicas? How data center to data center replication happens etc?

    reply
  • 0

    kulkav 10 months ago

    http://www.project-voldemort.com/voldemort/design.html

    reply
  • 3

    Srivathsan_Venkatavaradhan 9 months ago

    I think modified consistent hashing explained above is not correct. Please refer https://ihong5.wordpress.com/2014/08/19/consistent-hashing-algorithm/ . It has nothing to do with shards.

    reply
  • 1

    ramanatnsit 8 months ago

    How does the system remain available during the time when a node failure occurs and redistribution of keys are in process. Is it sacrificing availability for consistency

    reply
    • 0

      Nivas 8 months ago

      Each shard will have a leader/master and some replicas. During failure or redistribution the leader will be changed. So availability is not sacrificed. But this may result in increased load in the machine where the elected replica is located if not managed at right time.

      reply
  • 0

    eipie 8 months ago

    What is this obsession with 72GB RAM? Isn't is better / easier to pick powers of two? Or maybe perhaps 100GB treating 1GB = 1000MB for simplicity of calculations?

    reply
    • 0

      sunnyk 3 months ago

      I had the same thought.. 64/128/256 makes so much sense.

      reply
  • 0

    shaily_mittal 6 months ago

    Great doc, didn't know the concept of consistent hashing earlier

    reply
  • 0

    ishtiaque_hussain 3 months ago

    I found the following YouTube tutorial by Curtis on Consistent Hashing interesting, easy to understand:

    reply
  • 1

    ishtiaque_hussain 3 months ago

    I found the following YouTube tutorial by Curtis on Consistent Hashing interesting, easy to understand: https://www.youtube.com/watch?v=jznJKL0CrxM

    reply
  • 0

    mohammed_alhfian 27 days ago

    yes

    reply
Click here to jump start your coding interview preparation