Highly Available Database

  • 0

    cesar_augusto_guzman_alvarez 3 months ago

    nice course. I only think it is a little bit long.

    reply
  • 0

    jeffery 3 months ago

    100T +

    reply
    • 0

      jeffery 3 months ago

      Sorry, that was a mistype; I wanted to leave this in the notes area.

      reply
  • 1

    henry_henry 3 months ago

    How exactly do peer-to-peer systems work? Doesn't there still have to be a high-level router that routes traffic to one of the systems, which represents a SPOF? Any good reads on these?

    reply
    • 1

      jax.teller 3 months ago

      The major idea behind a peer-to-peer system is that all the nodes are similar in nature/power. Here we use the idea in the sense that multiple nodes can be owners of a piece of data D. Depending on the settings, more than one node has to go down for this data to become unavailable. The high-level router you mention can itself consist of multiple machines, since it doesn't require any actual information about the data other than where it might be stored.

      reply
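The idea above can be sketched with consistent hashing (a toy model with hypothetical node names and a toy hash, not the course's implementation): keys and nodes are hashed onto a ring, and the next P distinct nodes clockwise from a key each hold a copy, so several machines have to fail before data D becomes unavailable.

```python
import hashlib
from bisect import bisect_right

def ring_pos(s: str) -> int:
    # Toy hash: map a string to a position on a 2**32 ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas  # P: number of copies per key
        self.ring = sorted((ring_pos(n), n) for n in nodes)

    def owners(self, key):
        # Walk clockwise from the key's position, collecting the next
        # `replicas` distinct nodes; any one of them can serve the data.
        i = bisect_right(self.ring, (ring_pos(key), chr(0x10FFFF)))
        out = []
        while len(out) < min(self.replicas, len(self.ring)):
            _, node = self.ring[i % len(self.ring)]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["M1", "M2", "M3", "M4", "M5"], replicas=3)
print(ring.owners("user:42"))  # three distinct machines own this key
```

Because every key maps to several distinct owners, losing any single machine still leaves live replicas for each key it held.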
      • 0

        henry_henry 3 months ago

        I see, I was originally a bit confused about consistent hashing in peer-to-peer systems. Out of curiosity, how do we get rid of the single point of failure? Will there not always be one SPOF regardless of what system we create? In a general sense, if a system goes down at any level of an architecture, doesn't there have to be a router of some sort that directs traffic?

        reply
        • 0

          kuzmin about 2 months ago

          You could have DNS-level routing that points to multiple load balancers.

          reply
  • 0

    henry_henry 3 months ago

    I see, I was originally a bit confused about consistent hashing in peer-to-peer systems. Out of curiosity, how do we get rid of the single point of failure? Will there not always be one SPOF regardless of what system we create? In a general sense, if a system goes down at any level of an architecture, doesn't there have to be a router of some sort that directs traffic?

    reply
    • 0

      henry_henry 3 months ago

      Meant to write this as a reply

      reply
      • 0

        munir_mehta 2 months ago

        I am assuming you are talking about the load balancer that directs traffic in a peer-to-peer system, and you are worried about that going down. A highly available system has a redundant load balancer as well, which will come online if it doesn't hear back from the master. A highly available system has everything redundant, from the load balancers down to the hubs and switches.

        reply
  • 0

    sarang 3 months ago

    For a read to be consistent (i.e., return the latest write), we need to keep W + R > P.

    reply
    • 0

      sarang 3 months ago

      First things first, what is P? Is it the number of shards for a key?

      reply
      • 0

        rajputnr 3 months ago

        P is the replication factor here, as explained above.

        reply
      • 0

        sumit_007 3 months ago

        Yes, it is the number of shards that contain data for a particular key.

        reply
        • 0

          kartikeya_singh about 1 month ago

          P is the number of peers (machines) in a shard; shards don't replicate data among themselves.

          reply
  • 0

    sumit_007 3 months ago

    Are P, W and R predefined? Or do they change dynamically as machines are added/removed?
    Ex: For a total of 15 machines, I keep 'P' at 5, and 'W' and 'R' both at 3.
    Q1. What are the criteria for deciding 'P' to be 5 out of a total of 15 machines? Any good articles/papers?
    Q2. How do we handle the number of available machines dropping below 'W'? What does sacrificing data consistency mean here?
    Q3. After adding 10 more machines, will the values of P, W, and R remain the same?

    reply
    • 0

      jax.teller 3 months ago

      Q. Are P, W and R predefined? Or do they change dynamically as machines are added/removed?
      A. P, W and R should be defined according to the use case for the system. For example, for a system with many reads and few writes, we can keep R as 1 and W as P (hypothetically) so that we spend less time on reads, while it is okay to spend more time on writes. P, W and R should be adjusted so that they reflect how important the data is (P) and what the ratio of reads to writes is.


      Q. How do we handle the number of available machines dropping below 'W'? What does sacrificing data consistency mean here?
      A. Sacrificing data consistency means that if we are only able to perform a write on a subset of the W nodes, a read from R nodes may return an outdated value. For example, assume P = 5 and W = R = 3, and name these machines M1, M2 .. M5. Suppose that before writing a value, machines M1, M2 and M3 die. For this write, we will have to make do with M4 and M5. Some time later, M1, M2 and M3 come back up. Now, if we try to read the earlier value and we happen to read only from M1, M2 and M3, we won't see the latest value for the data, and hence we are compromising consistency.


      Q. After adding 10 more machines, will the values of P, W, and R remain the same?
      A. Yes. P, W and R are parameters about a single data value itself. More machines just means more data stored with the same configuration.

      reply
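The overlap argument above can be checked with a few lines of Python (a toy in-memory model, not a real replicated store): each of P replicas holds a (version, value) pair, a write lands on any W replicas, and a read takes the highest version seen among any R replicas. Since W + R > P, every read set must intersect the latest write set.

```python
import random

P, W, R = 5, 3, 3  # replication factor and write/read quorum sizes

# Each of the P replicas holds (version, value); version 0 means "empty".
replicas = [(0, None) for _ in range(P)]

def write(version, value, targets):
    # A write succeeds on `targets`, a set of W replica indices.
    for i in targets:
        replicas[i] = (version, value)

def read(targets):
    # A read asks R replicas and keeps the freshest version it sees.
    return max(replicas[i] for i in targets)

random.seed(7)
write(1, "old", random.sample(range(P), W))
write(2, "new", random.sample(range(P), W))

# Because W + R > P, any R replicas overlap the last write's W replicas,
# so every possible read quorum observes version 2.
for _ in range(100):
    assert read(random.sample(range(P), R)) == (2, "new")
print("every read quorum saw the latest write")
```

Dropping to W = 2 and R = 2 (so W + R = 4, not > 5) would let a read quorum miss the write quorum entirely, which is exactly the stale-read scenario described in the M1..M5 example above.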
  • 0

    kaidul about 2 months ago

    The size of the value for a key can increase, as mentioned earlier. But the solution wasn't discussed :O

    reply
  • -1

    tersduz88 24 days ago

    You say "we would need to compromise with consistency if we have availability and partition tolerance." This is incorrect: we don't need to sacrifice anything if there is no network partition. In other words, you don't need to sacrifice consistency or availability unless there is a network partition.

    reply