Design Twitter

  • 1

    agniswar_bakshi 11 months ago

    I still feel NoSQL would have been a better choice than MySQL here.

    • 0

      nikhil_agarwal 11 months ago

      You can back that with answers to these questions:

    • 0

      nikhil_agarwal 11 months ago


      1. How will it scale? 2. Do we need to query on values and derive insights from them?

    • 0

      sudhagar 8 months ago

      Reference: http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html

  • 0

    cong_wang 7 months ago

    Different data could be stored in different data stores: user data -> MySQL, relations -> Dynamo, media files -> file system.

  • 0

    californier67 6 months ago

    I was asked this question in an interview, and I can say that the justification for using MySQL is wrong: you would never do a JOIN for a system such as Twitter. With 500 million users, do you think you will be able to fetch tweets by doing a JOIN? Of course not, and it goes without saying that shuffling data over the network between different shards will be very slow. Strong consistency is not required here, so use a NoSQL store instead and have a batch system that continually updates users' feeds. You then just fetch your most recent feed, and you get something fresh once the batch system has updated your feed with new tweets. Tell your interviewer that you are going to use MySQL and you can go back home, because you are done.
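
    A rough sketch of the batch-updated feed idea, assuming a plain in-memory model: the names `followers`, `feeds`, `pending`, `run_batch` and the `FEED_SIZE` cap are all made up for illustration and do not describe any real system. Posting only queues a tweet, the batch job copies it into each follower's stored feed, and reading a feed is a single key lookup with no join.

```python
from collections import defaultdict, deque

FEED_SIZE = 3  # hypothetical cap on stored feed entries per user

followers = defaultdict(set)                          # author -> follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))  # user -> newest-first tweet ids
pending = []                                          # tweets awaiting the next batch run


def follow(follower, author):
    followers[author].add(follower)


def post_tweet(author, tweet_id):
    # The write path only queues the tweet; no feed is touched yet.
    pending.append((author, tweet_id))


def run_batch():
    # Batch job: push every queued tweet into each follower's stored feed.
    for author, tweet_id in pending:
        for user in followers[author]:
            feeds[user].appendleft(tweet_id)
    pending.clear()


def read_feed(user):
    # The read path is one key lookup; it may be stale until the next batch.
    return list(feeds[user])
```

    The trade-off is the one described above: a reader sees a new tweet only after `run_batch()` has run, which is acceptable if consistency is not required.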

    • 4

      Stup 5 months ago

      Twitter uses MySQL: http://www.slideshare.net/nkallen/qcon/59

    • 0

      erdem_ozdemir 4 months ago

      It is worth mentioning that using a batch system is not a bad idea. I think Instagram uses one, if I am not wrong.

  • 0

    californier67 6 months ago

    Also, the chosen approach of sharding on recency (timestamp) is wrong. Will all requests hit the same shard? 90% of requests need to fetch the most recent tweets, so sharding by time will create a single hot spot. Plus, what is the primary key there? We need to filter tweets for specific users. If you create a composite primary key on user and time, write operations will become slow.
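
    To make the hot spot concrete, here is a toy comparison. The shard count, the bucket width, and the modulo placement are assumptions chosen for the example, not how any real deployment assigns shards. Every request asks for a tweet written in the last few seconds, so time-based placement sends all of them to one shard, while user-based placement spreads them out.

```python
from collections import Counter

NUM_SHARDS = 4
SECONDS_PER_SHARD = 1000  # hypothetical width of one time bucket


def shard_by_time(timestamp):
    # Tweets written in the same time bucket land on the same shard.
    return (timestamp // SECONDS_PER_SHARD) % NUM_SHARDS


def shard_by_user(user_id):
    # Tweets are placed by author id, regardless of when they were written.
    return user_id % NUM_SHARDS


def load(requests, by_time):
    # Count how many requests each shard has to serve.
    counts = Counter()
    for user_id, timestamp in requests:
        shard = shard_by_time(timestamp) if by_time else shard_by_user(user_id)
        counts[shard] += 1
    return counts


# 100 users each request a tweet written within the same ten seconds.
requests = [(user_id, 3500 + user_id % 10) for user_id in range(100)]
time_load = load(requests, by_time=True)
user_load = load(requests, by_time=False)
```

    Here `time_load` puts all 100 requests on one shard, while `user_load` gives each of the four shards 25.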

    • 1

      iris_ren 5 months ago

      I think sharding on recency is what Twitter did, at least at that time: https://www.slideshare.net/nkallen/qcon/59

    • 0

      iris_ren 5 months ago

      Although it probably forms a hot spot, as you said, it is also a good case for caching, and maybe replicating the cache will resolve the hot spot problem.

  • 1

    appleiiiii 6 months ago

    Using 1 byte for the user id? Is this a joke?
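
    The arithmetic behind this, as a quick check: one byte can number only 256 distinct ids. The helper `bytes_needed` below is a made-up name for illustration, and the 500 million figure is the user count quoted earlier in this thread.

```python
def bytes_needed(distinct_ids):
    """Smallest whole number of bytes that can number `distinct_ids` ids from 0."""
    if distinct_ids < 1:
        raise ValueError("need at least one id")
    # The largest id is distinct_ids - 1; count the bits needed to hold it.
    bits = max(1, (distinct_ids - 1).bit_length())
    return (bits + 7) // 8  # round up to whole bytes


one_byte_capacity = 2 ** 8      # ids 0..255
users = 500_000_000             # user count quoted earlier in this thread
width = bytes_needed(users)     # whole bytes needed for that many ids
```

    For 500 million users the largest id needs 29 bits, so `width` comes out to 4 bytes.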

  • 0

    yawn_zheng about 2 months ago

    A wide-column NoSQL database like Cassandra or HBase could also handle those relationships.
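
    One way to picture this, as a sketch only: plain Python dicts stand in for wide rows here, and none of this is the Cassandra or HBase API. Each user is a row key, and each relationship is one column in that row, named by the other user's id. The names `followers_of`, `following_of`, and `follow` are hypothetical.

```python
from collections import defaultdict

# row key -> {column name -> value}; the column name is the other user's id
followers_of = defaultdict(dict)   # who follows this user
following_of = defaultdict(dict)   # whom this user follows


def follow(follower, followee, since):
    # Write the edge into both rows so either direction is one row read.
    followers_of[followee][follower] = since
    following_of[follower][followee] = since


def unfollow(follower, followee):
    followers_of[followee].pop(follower, None)
    following_of[follower].pop(followee, None)


def list_followers(user):
    # Return column names in sorted order, as a single-row read would.
    return sorted(followers_of[user])


def list_following(user):
    return sorted(following_of[user])
```

    Storing the edge twice costs an extra write but means neither "my followers" nor "who I follow" needs a join or a scan across rows.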
