Choosing the Best Architecture


I have a daunting challenge to resolve: making the right architecture choice to build the next big thing.

First, let me describe my project's problem in a few words. It's an iOS app (and an Android app as soon as possible) with chat rooms, real-time feed access, massive social features, and massive game mechanics (with rewards). Furthermore, we need to store all the actions, messages, and interactions to generate reports and data analysis.




Of course, we hope (and we will work 200% for that) that our app will become mainstream, so I need a scalable architecture. Browsing the web (especially Quora), I ended up with this type of architecture:

What do you think?



Nick Kallen (@nk), Twitter engineer:
  • "All engineering solutions are transient
  • Nothing's perfect but some solutions are good enough for a while
  • Scalability solutions aren't magic. They involve partitioning, indexing, and replication
  • All data for real-time queries MUST be in memory. Disk is for writes only.
  • Some problems can be solved with pre-computation, but a lot can't
  • Exploit locality where possible"
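
To make the points about partitioning and in-memory reads more concrete, here is a minimal sketch in Python. It is not taken from the Twitter talk: the class name `ShardedTimelineStore`, its methods, and the choice of an MD5-based hash are all assumptions made for illustration. Each "shard" is just a dictionary in the same process, whereas a real deployment would presumably place each shard on its own machine and persist writes to disk separately.

```python
import hashlib
from collections import defaultdict


class ShardedTimelineStore:
    """Toy in-memory store that partitions timelines across shards by user id.

    Hypothetical illustration only: names and design are assumptions.
    """

    def __init__(self, num_shards: int = 4) -> None:
        self.num_shards = num_shards
        # One dict per shard; each would live on a separate server in practice.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def shard_for(self, user_id: str) -> int:
        # A stable hash keeps a given user on the same shard across calls
        # (locality: all of a user's messages live together).
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.num_shards

    def append(self, user_id: str, message: str) -> None:
        # Writes go to exactly one shard, chosen by the partitioning function.
        self.shards[self.shard_for(user_id)][user_id].append(message)

    def recent(self, user_id: str, limit: int = 10) -> list:
        # Reads are served from memory only, newest message first.
        if limit <= 0:
            return []
        timeline = self.shards[self.shard_for(user_id)].get(user_id, [])
        return timeline[-limit:][::-1]


if __name__ == "__main__":
    store = ShardedTimelineStore(num_shards=4)
    store.append("alice", "hi")
    store.append("alice", "there")
    print(store.recent("alice"))
```

A simple modulo scheme like this forces most keys to move when the number of shards changes, which is one reason such solutions are only "good enough for a while".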



Sources: 
Big Data in Real-Time at Twitter: http://www.slideshare.net/nkallen/q-con-3770885
Instagram architecture: http://www.quora.com/Why-did-Burbn-Instagram-choose-Postgres-over-MySQL?q=instagram+database
Foursquare: http://www.quora.com/What-stack-does-Foursquare-run-on-EC2?q=foursquare+technology
Facebook: http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/ and
http://www.quora.com/What-database-technology-is-Facebook-built-on?q=facebook+database

Groupon architecture: http://www.quora.com/Groupon/What-technology-platform-is-Groupon-built-on?q=groupon+technology
Zynga: http://www.quora.com/How-does-server-technology-work-for-Zyngas-games?q=zynga+technology
LinkedIn: http://www.quora.com/What-is-LinkedIn-s-database-architecture-like?q=linkedin+database
Quora: http://www.philwhln.com/quoras-technology-examined
Yelp: http://www.quora.com/What-is-Yelps-technology-stack?q=yelp+technology


[update 03/04/2011]


As Johann (CTO @seesmic) suggested to us in late January, along with developers at the BeMyApp event in San Francisco, I will take a closer look at node.js + mongoDB. It looks pretty cool.

9 comments:

  1. My advice would be to start simple and not solve problems you don't have right now ("[some] solutions are good enough for a while"). So get rid of what's not absolutely necessary to launch.

    Question: what tech for the real time part?

  2. Thanks Thomas. And what do you recommend as a scripting language? Is RoR the best choice in your opinion?

  3. Damn, I hit F5 and this deleted my comment.

    I was saying (in more depth) that Ruby is a language and RoR is a framework. Both are great.

    For the server, you can use Nginx (GitHub has this kind of setup: https://github.com/blog/517-unicorn)

    But the real-time part is very important. You'll need something that can handle a bunch of active connections without killing your server (something evented, maybe).

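
As an illustration of what "something evented" could look like, here is a minimal sketch using Python's standard `asyncio` module. It is not what any commenter proposed: the `ChatRoom` class and the `demo` function are hypothetical names, and the "clients" are in-process queues rather than real network connections. The idea it shows is that many idle connections can be waiting on a single event loop without one thread per connection.

```python
import asyncio


class ChatRoom:
    """Hypothetical chat room: each connected client owns one queue."""

    def __init__(self) -> None:
        self.subscribers = set()

    def join(self) -> asyncio.Queue:
        # Register a new client and hand back its private inbox.
        queue = asyncio.Queue()
        self.subscribers.add(queue)
        return queue

    def leave(self, queue: asyncio.Queue) -> None:
        self.subscribers.discard(queue)

    def broadcast(self, message: str) -> None:
        # Non-blocking fan-out: queues are unbounded, so put_nowait never fails.
        for queue in self.subscribers:
            queue.put_nowait(message)


async def demo(num_clients: int = 1000) -> int:
    """Connect num_clients waiting clients, broadcast once, count deliveries."""
    room = ChatRoom()
    queues = [room.join() for _ in range(num_clients)]

    async def client(queue: asyncio.Queue) -> str:
        # Each client suspends here without occupying a thread.
        return await queue.get()

    tasks = [asyncio.create_task(client(q)) for q in queues]
    room.broadcast("hello")
    received = await asyncio.gather(*tasks)
    return sum(1 for message in received if message == "hello")


if __name__ == "__main__":
    print(asyncio.run(demo(1000)))
```

In a real server each queue would be drained into a socket or WebSocket; the same pattern is what node.js and other evented runtimes provide out of the box.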
  4. MongoDB my friend! Foursquare style. http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/

  5. From my point of view, for equivalent features, you should focus on the most widely used technology, because it will give you:
    - an easier time finding good developers
    - a much larger community
    - nice ready-made classes (no need to reinvent the wheel)

    On the database side, Cassandra is all well and good, but why doesn't Facebook use it more? Why did Twitter stop its migration?
    I get the impression that the NoSQL movement has become a bit overhyped. Anyway, I haven't had the chance to test it yet, so this is just my opinion.

    Your architecture looks sound to me. I would still consider PostgreSQL rather than MySQL (geolocation support, partial indexes, etc.)

    I agree with Thomas on the real-time side: don't underestimate that aspect.

  6. Guys, what do you think of node.js + mongoDB? Looks like it solves all my problems.

  7. FWIW, we moved a large portion of our data from a SQL + memcache setup to MongoDB. That's certainly not the right solution for every problem, but for us, the fewer moving parts the better. Mongo uses the operating system's file cache to keep recently accessed data in memory. It doesn't fsync by default, which allows for very fast, write-through-cache-like behavior. Also, it's not a key-value store. It's schemaless, but the data is structured and you can create multiple indexes on each collection (analogous to a table).

    Source: Quora - Jonathan Hoffman, MongoDB wrangler at Foursquare

  8. Hello Sam,

    Would you still recommend this architecture today for starting a "the next big thing" project?

    Thanks for your feedback.

    Regards,
    Mathieu.

    Replies
    1. Hello Mathieu,
      Two years after this post, I have learned that the choice of architecture should be left to your CTO or lead architect. Since launch, we have changed technology three times to finally arrive at something great. In addition to our in-house team, we brought in outside specialists to confirm (or improve) the choices we made. I think that is the right way to proceed.
      On the other hand, I have learned to stop listening to people who recommend this or that technology on the basis of a five-minute conversation. You really have to take the whole environment and all the issues into account. Every project is different.
