Google Talk is Google's instant communications service. Interestingly, IM messages aren't the major architectural challenge; handling user presence indications dominates the design. They also have the challenge of handling small, low-latency messages and integrating with many other systems. How do they do it?
Site: http://www.google.com/talk
Information Sources
GoogleTalk Architecture
Platform
Linux
Java
Google Stack
Shard
What's Inside?
The Stats
Support presence and messages for millions of users.
Handles billions of packets per day, each in under 100ms.
IM is different from many other applications because the requests are small packets.
Routing and application logic are applied per packet for sender and receiver.
Messages must be delivered in-order.
Architecture extends to new clients and Google services.
Lessons Learned
Measure the right thing.
- People ask how many IMs you deliver or how many active users you have. That turns out not to be the right engineering question.
- The hard part of IM is showing correct presence to all connected users, because growth is non-linear: ConnectedUsers * BuddyListSize * OnlineStateChanges
- Linear user growth can mean very non-linear server growth, which requires serving many billions of presence packets per day.
- Have a large number of friends and presence explodes. The number of IMs is not that big of a deal.
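To make the growth formula above concrete, here is a small back-of-the-envelope sketch. The function name and all of the numbers are illustrative assumptions, not figures from Google. It assumes that buddy lists tend to get longer as the user base grows, which is what turns linear user growth into faster-than-linear presence traffic.

```python
def presence_packets_per_day(connected_users, buddy_list_size, state_changes_per_day):
    """Estimate presence notifications per day: every state change by a
    user is fanned out to everyone on that user's buddy list."""
    return connected_users * buddy_list_size * state_changes_per_day

# Illustrative numbers only (assumptions, not Google's data).
small = presence_packets_per_day(1_000_000, 20, 10)

# Users double, and (by assumption) the average buddy list doubles too.
large = presence_packets_per_day(2_000_000, 40, 10)

print(small)          # 200000000
print(large)          # 800000000
print(large / small)  # 4.0
```

Doubling the users here quadruples the presence packets, while the number of IMs would only be expected to grow roughly in line with the number of users.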
Real Life Load Tests
- Lab tests are good, but don't tell you enough.
- Did a backend launch before the real product launch.
- Simulate presence requests and going on-line and off-line for weeks and months, even if real data is not returned. It works out many of the kinks in network, failover, etc.
Dynamic Resharding
- Divide user data or load across shards.
- Google Talk backend servers handle traffic for a subset of users.
- Make it easy to change the number of shards with zero downtime.
- Don't shard across data centers. Try to keep users local.
- Servers can be brought down and backups take over. New servers can then be brought up; data is migrated automatically, and clients auto-detect the change and go to the new servers.
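The source does not say how Google Talk maps users to shards. One common way to change the shard count while moving only a fraction of users is consistent hashing, sketched below. The HashRing class, the shard names, and the vnodes parameter are all hypothetical and only illustrate the idea.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring (an assumed technique, not Google's stated design)."""

    def __init__(self, shards, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_shard(self, shard):
        # Several virtual points per shard even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        self._ring = [(h, s) for h, s in self._ring if s != shard]

    def shard_for(self, user):
        # First ring point at or after the user's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(user),))
        return self._ring[idx % len(self._ring)][1]

users = [f"user{i}@example.com" for i in range(10_000)]
ring = HashRing(["shard-a", "shard-b", "shard-c"])
before = {u: ring.shard_for(u) for u in users}

ring.add_shard("shard-d")  # grow from three shards to four
after = {u: ring.shard_for(u) for u in users}

moved = [u for u in users if before[u] != after[u]]
print(f"{len(moved) / len(users):.0%} of users moved to the new shard")
```

Only users whose ring position now falls to the new shard are migrated; everyone else stays where they were, which is what makes resharding without downtime practical.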
Add Abstractions to Hide System Complexity
- Different systems should have little knowledge of each other, especially when separate groups are working together.
- Gmail and Orkut don't know about sharding, load-balancing, fail-over, data center architecture, or the number of servers. These can change at any time without cascading changes throughout the system.
- Abstract these complexities into a set of gateways that are discovered at runtime.
- RPC infrastructure should handle rerouting.
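As a rough illustration of the gateway idea above, the sketch below shows a caller that names a service and nothing else, while a registry resolves gateways at call time and reroutes around a failed one. GatewayRegistry, the service name, and the gateway functions are hypothetical stand-ins; the real RPC layer is not described in the source.

```python
class GatewayRegistry:
    """Hypothetical runtime directory of gateway endpoints."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of callables

    def register(self, service, endpoint):
        self._endpoints.setdefault(service, []).append(endpoint)

    def deregister(self, service, endpoint):
        self._endpoints[service].remove(endpoint)

    def call(self, service, request):
        """Try each known gateway in turn, rerouting on failure."""
        last_error = None
        for endpoint in list(self._endpoints.get(service, [])):
            try:
                return endpoint(request)
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError(f"no healthy gateway for {service}") from last_error

def dead_gateway(request):
    raise ConnectionError("gateway down")

def live_gateway(request):
    return f"presence({request}) = online"

registry = GatewayRegistry()
registry.register("talk-presence", dead_gateway)
registry.register("talk-presence", live_gateway)

# The caller (say, a mail front end) knows the service name only. It has
# no idea how many gateways exist or which of them are healthy.
result = registry.call("talk-presence", "alice")
print(result)  # presence(alice) = online
```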
Understand Semantics of Lower Level Libraries
- Everything is abstracted, but you must still have enough knowledge of how they work to architect your system.
- Does your RPC create TCP connections to all or some of your servers? Very different implications.
- Does the library perform health checking? This has architectural implications, as you can have separate systems failing independently.
- Which kernel operation should you use? IM requires a lot of connections, but few have any activity. Use epoll rather than poll/select.
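The epoll point can be sketched with Python's standard selectors module, whose DefaultSelector picks epoll on Linux. With select or poll the kernel scans every registered descriptor on each call; epoll hands back only the ones that are ready, which suits many connections with little activity. The socket pairs and the message below are stand-ins for real client connections.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS

# Stand-in for many mostly idle client connections.
pairs = [socket.socketpair() for _ in range(100)]
for i, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=i)

# Only one client says anything.
pairs[42][1].sendall(b"presence: away")

# The selector reports just the connection that has data waiting.
ready = sel.select(timeout=1.0)
active = [(key.data, key.fileobj.recv(1024)) for key, _events in ready]
print(active)  # [(42, b'presence: away')]

for server_side, client_side in pairs:
    sel.unregister(server_side)
    server_side.close()
    client_side.close()
sel.close()
```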
Protect Against Operational Problems
- Smooth out all spikes in server activity graphs.
- What happens when servers restart with an empty cache?
- What happens if traffic shifts to a new data center?
- Limit cascading problems. Back off from busy servers. Don't accept work when sick.
- Isolate in emergencies. Don't infect others with your problems.
- Abstract intelligent retry policies away into shared code. Don't sit in hard 1 ms retry loops, for example.
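One widely used retry policy that avoids hard retry loops is capped exponential backoff with jitter, sketched below. The source does not say which policy Google Talk uses; ServerBusy, call_with_backoff, and the default delays are assumptions chosen for illustration.

```python
import random
import time

class ServerBusy(Exception):
    """Raised by a (hypothetical) server that is shedding load."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Retry operation() on ServerBusy, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller decide what to do
            # Exponential backoff, capped, with full jitter so that many
            # clients do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)

attempts = []
delays = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ServerBusy("shedding load")
    return "ok"

# Record the delays instead of really sleeping, to keep the demo fast.
outcome = call_with_backoff(flaky, sleep=delays.append)
print(outcome, len(attempts), len(delays))  # ok 3 2
```

Keeping the policy in one helper means every caller backs off the same way, and the delays can be tuned in one place.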
Any Scalable System is a Distributed System
- Add fault tolerance to every component of the system. Everything fails.
- Add the ability to profile live servers without impacting the server. This allows continual improvement.
- Collect metrics from servers for monitoring. Log everything about your system so you can see patterns of cause and effect.
- Log end-to-end so you can reconstruct an entire operation from beginning to end across all machines.
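A common way to get end-to-end logs is to stamp every log line with an identifier that travels with the operation, then filter on it afterwards. The sketch below assumes JSON log lines and a trace_id field; the host names, event names, and helper functions are hypothetical.

```python
import io
import json
import logging
import uuid

def log_event(logger, trace_id, host, event, **fields):
    """Write one structured log line tagged with the operation's trace ID."""
    record = {"trace_id": trace_id, "host": host, "event": event, **fields}
    logger.info(json.dumps(record, sort_keys=True))

def reconstruct(log_lines, trace_id):
    """Return, in log order, every event that belongs to one operation."""
    events = [json.loads(line) for line in log_lines if line.strip()]
    return [e for e in events if e["trace_id"] == trace_id]

# In this demo all "machines" log to one in-memory buffer.
buffer = io.StringIO()
logger = logging.getLogger("talk.trace.demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))

trace_id = uuid.uuid4().hex
other_id = uuid.uuid4().hex
log_event(logger, trace_id, "frontend-1", "message_received", sender="alice")
log_event(logger, other_id, "frontend-2", "presence_changed", user="carol")
log_event(logger, trace_id, "router-7", "message_routed", shard="shard-b")
log_event(logger, trace_id, "frontend-3", "message_delivered", receiver="bob")

steps = reconstruct(buffer.getvalue().splitlines(), trace_id)
path = [(e["host"], e["event"]) for e in steps]
print(path)
```

The printed path lists the three hops of the one message, with the unrelated presence event filtered out.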
Software Development Strategies
- Make sure binaries are both backward and forward compatible, so old clients work with new code and new clients work with old code.
- Build an experimentation framework to try new features.
- Give engineers access to production machines. This gives end-to-end ownership. It is very different from many companies, which have completely separate ops teams in their data centers. Often developers can't touch production machines.
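The compatibility point in the first bullet above usually comes down to two habits: readers ignore fields they do not know, and readers supply defaults for fields an older writer never sent. The sketch below shows both with a made-up JSON presence message; the field names and version numbers are assumptions, not Google Talk's wire format.

```python
import json

def encode_presence_v2(user, status, mood=None):
    """Newer writer: adds an optional 'mood' field (hypothetical)."""
    msg = {"version": 2, "user": user, "status": status}
    if mood is not None:
        msg["mood"] = mood
    return json.dumps(msg)

def decode_presence_v1(raw):
    """Older reader: takes the fields it knows and ignores the rest."""
    msg = json.loads(raw)
    return {"user": msg["user"], "status": msg["status"]}

def decode_presence_v2(raw):
    """Newer reader: defaults 'mood' when an old writer omitted it."""
    msg = json.loads(raw)
    return {"user": msg["user"], "status": msg["status"],
            "mood": msg.get("mood", "")}

old_msg = json.dumps({"version": 1, "user": "alice", "status": "away"})
new_msg = encode_presence_v2("bob", "online", mood="shipping")

# Forward compatible: old code reads a message from new code.
forward = decode_presence_v1(new_msg)
# Backward compatible: new code reads a message from old code.
backward = decode_presence_v2(old_msg)

print(forward)   # {'user': 'bob', 'status': 'online'}
print(backward)  # {'user': 'alice', 'status': 'away', 'mood': ''}
```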
Comments
Re: GoogleTalk Architecture
hi there,
I knew someone from the Yahoo IM team, and there, too, the developers get to maintain the production servers.
Google: do the developers get to maintain production servers for search, Gmail, and other products as well?
BR,
~A
Re: GoogleTalk Architecture
I've never bothered to study the IM protocols, but I've always thought that presence packets are so small (maybe a few bytes) that in total they generate less traffic than the actual messages... Now I'm starting to think I was wrong. I'm going to take a look at the video in the information sources.
P.S.: I'm wondering if I would ever stop constantly finding interesting posts on this website... It's really great!
Re: GoogleTalk Architecture
Great article, keep up the good work.