Yandex Architecture

Todd Hoff's picture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.
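
The mechanism behind this kind of failure can be illustrated without Django. Below is a minimal sketch, assuming a session object that flags itself as modified on any write and a request handler that saves flagged sessions. `Session`, `CountingStore`, `handle`, and the two views are hypothetical names made up for illustration; they are not Django's classes and not the code from the Yandex incident.

```python
class Session(dict):
    """Stand-in for a framework session: a dict that tracks modification."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True  # any write marks the session dirty


class CountingStore:
    """Hypothetical session store that only counts database writes."""

    def __init__(self):
        self.writes = 0

    def save(self, session):
        self.writes += 1


def handle(view, session, store):
    """Run one request; persist the session only if the view changed it."""
    session.modified = False
    view(session)
    if session.modified:
        store.save(session)


def noisy_view(session):
    session["last_seen"] = "now"  # unconditional write on every request


def quiet_view(session):
    if "user" not in session:  # write only when something actually changes
        session["user"] = "anonymous"


noisy_store, quiet_store = CountingStore(), CountingStore()
noisy_session, quiet_session = Session(), Session()
for _ in range(100):
    handle(noisy_view, noisy_session, noisy_store)
    handle(quiet_view, quiet_session, quiet_store)
```

In this toy setup the view that touches the session unconditionally triggers one store write per request, while the guarded view triggers a single write in total. If each write is slow, as the 6-7 second writes described above were, the first pattern is enough to stall a site.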

Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:

  • 3.5 billion pages in the search index.
  • Several thousand servers.
  • 35 million searches a day.
  • Several data centers around Russia.
  • Two-layer architecture.
  • The database is split into pieces; when a search is requested, the system pulls the relevant pieces from the different database servers and combines them for the user.
  • Languages used: C++, Perl, some Java.
  • FreeBSD is used as their server OS.
  • $72 million in revenue in 2006.
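
The split-database lookup in the list above can be sketched roughly as a scatter-gather query. This is a minimal illustration under stated assumptions, not Yandex's implementation: the `Shard` class, the `search` function, the toy term-count scoring, and the thread pool fan-out are all hypothetical choices made for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical index shard holding one slice of the documents."""

    def __init__(self, docs):
        self.docs = docs  # {doc_id: text}

    def search(self, term, limit):
        # Toy relevance score: how many times the term occurs in the text.
        hits = []
        for doc_id, text in self.docs.items():
            score = text.lower().split().count(term.lower())
            if score:
                hits.append((score, doc_id))
        return heapq.nlargest(limit, hits)


def search(shards, term, limit=10):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(term, limit), shards))
    merged = heapq.nlargest(limit, (hit for part in partials for hit in part))
    return [doc_id for _score, doc_id in merged]


shards = [
    Shard({"a": "yandex search engine", "b": "freebsd server"}),
    Shard({"c": "search search index", "d": "perl and java"}),
]
results = search(shards, "search", limit=2)
```

Each shard returns only its own top hits, so the merge step handles at most `limit` entries per shard regardless of how large the full index is.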

  • Comments

    Re: Yandex Architecture

    Yandex is more than a search engine. It's a portal. I have a friend with a yandex.ru email address.

    Re: Yandex Architecture

    Wow.

    I never heard of Yandex but they obviously do some serious business over there. It's always cool to see successful things that make it out of Russia / Eastern Europe. The top devs over there are actually really excellent: very bright and ultra-hardcore.

    Re: Yandex Architecture

    Yes, Yandex isn't just a search engine; it also provides a fairly long list of services, e-mail included.
    But from my point of view, their services are not really competitive in quality, especially compared with Google's: sometimes I think the Yandex search engine chooses results not by relevance but with a random number generator, mailboxes are constantly flooded with spam, and so on.
    As for me, I prefer international service providers even though I live in Russia. Still, Yandex remains the largest Russian provider of internet services, and a really huge number of Russian people use it for some strange reasons: for example, they are just "used to searching in Yandex", or maybe it's too time-consuming to type google.com instead of ya.ru...

    Re: Yandex Architecture

    Google does not search well in Russian, as Russian has many forms of words. So to make a good search in Russian you should know Russian very well. Right now Google searches in Russian as if it were English.

    + Maybe Russia supports its search engines to control the Russian part of the internet.

    There are the Russian search engines Yandex and Rambler, the Russian money transfer services WebMoney and Yandex.Money, there are Russian social networks and of course RuTube.

    There are also Russian domains that you can't type in your browser if you do not have a Russian keyboard.

    Re: Yandex Architecture

    Google does do stemming for Cyrillic languages like Russian or Bulgarian (my native tongue). Perhaps Yandex does it better.

    Re: Yandex Architecture

    Damn, this captcha is case sensitive! Not a good choice, really!

    Re: Yandex Architecture

    Perhaps I'm picking a nit, but there are a couple of things off in the summary of the 'Anatomy of a crash' article.

    They weren't writing to any magic variable. They were modifying the session data on every request (unnecessarily), which caused it to be saved to the DB. Nothing magic about that.

    Lengthy index rebuilds were caused by the use of non-sequential primary keys (MD5 hashes) -- with InnoDB, that meant it would rebuild the whole thing on every request... thus poor performance.

    Todd Hoff's picture

    Re: Yandex Architecture

    To me it's magic because the consequences are hard to deduce from the code. And the choice of key values having such a tremendous negative impact is another bit of wild magic. Would one expect such an effect by looking at that line? Not me, which is why magic popped to mind.

    Re: Yandex Architecture

    Great article, keep up the good work.
