Design Twitter

  • 1

    agniswar_bakshi 7 months ago

    I still feel NoSQL would have been a better choice than MySQL here.

    reply
    • 0

      nikhil_agarwal 6 months ago

      You can back that with answers to these questions:

      reply
    • 0

      nikhil_agarwal 6 months ago


      1. How will it scale? 2. Do we need to query on values and find insights?

      reply
    • 0

      sudhagar 4 months ago

      Reference: http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html

      reply
  • 0

    cong_wang 2 months ago

    Different data could be stored in different data stores: user data -> MySQL, relations -> Dynamo, media files -> file system.

    reply
  • 0

    californier67 about 1 month ago

    I was asked this question in an interview, and I can say that the justification for using MySQL is wrong: you would never do a JOIN in a system such as Twitter. With 500 million users, do you think you will be able to fetch the tweets with a JOIN? Of course not, and it goes without saying that shuffling data over the network between different shards will be super slow. Consistency is not required here, so use NoSQL instead and have a batch system that continually updates users' feeds. You then just fetch your most recent feed, and you get something fresh once the batch system has updated your feed with new tweets. Tell your interviewer that you are going to use MySQL and you can go back home, because you are done.
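
    A rough sketch of the batch-updated feed idea described above, assuming everything lives in memory. The names `follow`, `post_tweet`, `run_batch`, `read_feed` and the `FEED_SIZE` limit are hypothetical, made up for illustration, and not anything Twitter is known to use:

```python
from collections import defaultdict, deque

FEED_SIZE = 3  # hypothetical cap on how many tweets a stored feed keeps

followers = defaultdict(set)                          # author -> set of follower ids
feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))  # user -> tweet ids, newest first
pending = deque()                                     # tweets not yet pushed to feeds


def follow(follower, author):
    """Record that `follower` wants to see tweets from `author`."""
    followers[author].add(follower)


def post_tweet(author, tweet_id):
    """Writes are cheap: the tweet is only queued for the next batch run."""
    pending.append((author, tweet_id))


def run_batch():
    """Push every queued tweet into the stored feed of each follower."""
    while pending:
        author, tweet_id = pending.popleft()
        for follower in followers[author]:
            # appendleft on a full deque drops the oldest entry on the right
            feeds[follower].appendleft(tweet_id)


def read_feed(user):
    """Reads need no JOIN: just return the precomputed feed."""
    return list(feeds[user])
```

    The trade-off is the one mentioned above: a reader only sees a new tweet after the next `run_batch()` call, so the feed is eventually consistent rather than always up to date.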

    reply
    • 2

      Stup 22 days ago

      Twitter uses MySQL http://www.slideshare.net/nkallen/qcon/59

      reply
  • 0

    californier67 about 1 month ago

    Also, the chosen approach of sharding on recency (timestamp) is so wrong. Will all the requests hit the same shard? 90% of requests need to fetch the most recent tweets, so sharding by time creates one hot spot. Plus, what is the primary key there? We need to filter the tweets for specific users. If you create a composite primary key on user and time, write operations will become very slow.
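
    To make the hot-spot argument concrete, here is a small simulation under assumed numbers: 4 shards, day-sized time buckets, and 1,000 reads for tweets from the last hour. The functions `shard_by_time` and `shard_by_user` are hypothetical and only illustrate the two schemes:

```python
from collections import Counter
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_by_time(timestamp, bucket_seconds=86_400):
    """Time-range sharding: each day-sized bucket of tweets lives on one shard."""
    return (timestamp // bucket_seconds) % NUM_SHARDS


def shard_by_user(user_id):
    """Hash sharding on user id; md5 is used only as a stable, reproducible hash."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


now = 1_700_000_000  # arbitrary "current" Unix timestamp for the simulation

# 1,000 reads from 1,000 different users, all for tweets posted in the last hour
requests = [(user_id, now - (user_id % 3600)) for user_id in range(1000)]

time_load = Counter(shard_by_time(ts) for _, ts in requests)
user_load = Counter(shard_by_user(uid) for uid, _ in requests)

print("reads per shard, sharded by time:", dict(time_load))
print("reads per shard, sharded by user:", dict(user_load))
```

    With these assumptions every read lands on the single shard holding the current day, while hashing on user id spreads the same reads across all shards.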

    reply
    • 1

      iris_ren 8 days ago

      I think sharding on recency is what Twitter did, at least at that time: https://www.slideshare.net/nkallen/qcon/59

      reply
    • 0

      iris_ren 8 days ago

      Although it probably forms a hot spot as you said, it is also a good case for caching, and replicating the cache might resolve the hot-spot problem.

      reply
  • 0

    appleiiiii 30 days ago

    Use 1 byte for user id? Is it a joke?

    reply