Sunday, Feb 24, 2008

Yandex Architecture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.
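To make the failure mode concrete, here is a minimal sketch, not Yandex's actual code. It assumes a session object that sets a dirty flag on every assignment, which is how Django's sessions decide whether to save. The `TrackedSession` class, the `user_id` key, and the handler names are all hypothetical stand-ins.

```python
class TrackedSession(dict):
    """Hypothetical stand-in for a framework session with a dirty flag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Any assignment marks the session dirty, even if the value is unchanged.
        self.modified = True


def touch_always(session, user_id):
    # Assigns on every request, so every request ends with a dirty session.
    session["user_id"] = user_id


def touch_if_changed(session, user_id):
    # Assigns only when the stored value actually differs.
    if session.get("user_id") != user_id:
        session["user_id"] = user_id


def count_writes(handler, requests=5):
    """Simulate several requests and count how many would trigger a save."""
    session = TrackedSession()
    writes = 0
    for _ in range(requests):
        session.modified = False  # flag is reset at the start of each request
        handler(session, 42)
        if session.modified:
            writes += 1  # a real backend would write to the database here
    return writes


print(count_writes(touch_always))      # 5
print(count_writes(touch_if_changed))  # 1
```

Under these assumptions, guarding the assignment turns one database write per request into one write per actual change.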

Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:

  • 3.5 billion pages in the search index.
  • Several thousand servers.
  • 35 million searches a day.
  • Several data centers around Russia.
  • Two-layer architecture.
  • The database is split into pieces. When a search is requested, the system pulls the matching pieces from the different database servers and brings them together for the user.
  • Languages used: C++, Perl, some Java.
  • FreeBSD is used as their server OS.
  • $72 million in revenue in 2006.
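The split-database point in the list above describes what is often called scatter-gather: send the query to every piece, then merge the partial results. The interview gives no implementation details, so the following is only a sketch under stated assumptions. The in-memory `SHARDS` data, the scores, and the `search_shard` and `search` functions are all hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps a term to a list of (score, doc_id) hits.
SHARDS = [
    {"moscow": [(0.9, "doc1"), (0.4, "doc7")]},
    {"moscow": [(0.8, "doc3")], "volga": [(0.7, "doc4")]},
    {"moscow": [(0.95, "doc9"), (0.1, "doc2")]},
]


def search_shard(shard, term):
    """Return this shard's hits for the term (empty list if none)."""
    return shard.get(term, [])


def search(term, top_k=3):
    """Scatter the query to every shard, then gather and merge the top hits."""
    # Scatter: query all shards concurrently.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS))
    # Gather: flatten the partial results and keep the highest-scoring hits.
    all_hits = [hit for part in partials for hit in part]
    return heapq.nlargest(top_k, all_hits)


print(search("moscow"))
# [(0.95, 'doc9'), (0.9, 'doc1'), (0.8, 'doc3')]
```

In a real deployment each shard would be a remote server and the merge layer would also need timeouts and handling for partial failures, none of which the source describes.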
Reader Comments (13)

    Yandex is more than a search engine. It's a portal. I have a friend with a yandex.ru email address.

    November 29, 1990 | Unregistered Commenter Anonymous

    Wow.

    I had never heard of Yandex, but they obviously do some serious business over there. It's always cool to see successful things that make it out of Russia / Eastern Europe. The top devs over there are actually really excellent: very bright and ultra-hardcore.

    November 29, 1990 | Unregistered Commenter Anonymous

    Yes, Yandex isn't just a search engine; it also provides a fairly long list of services, e-mail included.
    But from my point of view, their services are not really competitive in quality, especially compared with Google's: sometimes I think the Yandex search engine chooses results not by relevance but with a random number generator, mailboxes are constantly flooded with spam, and so on.
    As for me, I prefer international service providers even though I live in Russia. Still, Yandex remains the largest Russian provider of internet services, and a huge number of Russians use it for rather strange reasons: for example, they are simply "used to searching in Yandex", or maybe it's too time-consuming to type google.com instead of ya.ru...

    November 29, 1990 | Unregistered Commenter Insight IT

    Google does not search well in Russian, as Russian has many forms of each word. So to make a good Russian search you should know Russian very well. Right now Google searches in Russian as if it were English.

    + Maybe Russia supports its search engines to control the Russian part of the internet.

    There are the Russian search engines Yandex and Rambler, the Russian money transfer services WebMoney and Yandex Money, Russian social networks, and of course RuTube.

    There are also Russian domains that you can't type in your browser if you do not have a Russian keyboard.

    November 29, 1990 | Unregistered Commenter Anonymous

    Google does do stemming for Cyrillic languages like Russian or Bulgarian (my native tongue). Perhaps Yandex does it better.

    November 29, 1990 | Unregistered Commenter Anonymous

    Damn, this captcha is case sensitive! Not a good choice, really!

    November 29, 1990 | Unregistered Commenter Anonymous

    Perhaps I'm picking a nit, but there are a couple of things off in the summary of the 'Anatomy of a crash' article.

    They weren't writing to any magic variable. They were modifying the session data on every request (unnecessarily), which caused it to be saved to the DB. Nothing magic about that.

    Lengthy index rebuilds were caused by the use of non-sequential primary keys (MD5 hashes) -- with InnoDB, that meant it would rebuild the whole thing on every request... thus poor performance.

    November 29, 1990 | Unregistered Commenter Anonymous

    To me it's magic because the consequences are hard to deduce from the code. And the choice of key values having such a tremendous negative impact is another bit of wild magic. Would one expect such an effect by looking at that line? Not me, which is why magic popped to mind.

    November 29, 1990 | Unregistered Commenter Todd Hoff

    I always use Yandex when searching in Russian, for a single reason: Yandex speaks Russian and Google does not.

    However, Google gets all of my English searches, because Yandex just does not do a good job on the "English Web". But hey, they have a big brother to learn from, so maybe one day... ;)

    November 29, 1990 | Unregistered Commenter dotkam

    Yandex looks really good and I'm sure it will come in useful for me in the future.
    Thanks.

    November 29, 1990 | Unregistered Commenter play wink bingo

    Pretty good summary of Yandex. To be honest, I never knew such a website existed before today.

    November 29, 1990 | Unregistered Commenter free online dating
