advertise
Monday
May102010

Sify.com Architecture - A Portal at 3900 Requests Per Second

Sify.com is one of the leading portals in India. Samachar.com is owned by the same company and is one of the top content aggregation sites in India, primarily targeting Non-resident Indians from around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. Filesystems can work for key-value lookups, but they don't work for queries, using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As we have more and more data duplicated in multiple specialized services, the problem of how to keep them synchronized is a difficult one. A rules engine is a clever approach.

Click to read more ...

Thursday
May062010

Going global on EC2

Since its inception, Amazon EC2 has enabled companies to run highly scalable infrastructure with minimal overhead.  Over the years, Amazon Web Services has expanded with new offerings and additional regions around the world.

All this growth has made establishing a global footprint easier than ever.  And yet, most EC2 customers still choose to operate in a single region.  While this is fine for many applications, customers with significant web infrastructure are depriving users of drastically improved performance.  Deploying infrastructure in EC2's new regions cuts out one of the biggest sources of latency: distance.

In this post, I describe how Bizo significantly reduced load times by implementing Global Server Load Balancing (GSLB) to distribute traffic across all Amazon regions.

Wednesday
May052010

How will memristors change everything? 

A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however,  immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because like most wondertech, nothing would ever come of it. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to holy cow this could happen soon and it could change everything.

Let's assume for the sake of dreaming memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 128TB) of persistent storage, and can be reconfigured to be either memory or CPU in a package as small as a sugar cube (in a stacked configuration)?

Click to read more ...

Tuesday
May042010

Business continuity with real-time data integration

Enterprises want to protect their data. As the appetite for data volumes grows, storage technology becomes a critical business asset on which business continuity relies. My recent survey in the medium-size enterprise segment shows the five dominant investment directions at the level of data management architecture: disaster recovery (DR), high availability (HA), backup, data processing performance and migration to more advanced databases.

 

This suggests that corporations generally have sufficiently structured data collections but are concerned with business continuity and continuous availability of data. What infrastructures can provide these assurances? In this post I want to focus on yet another option, and that is the Real-Time Data Integration model. As an example I am going to discuss Oracle GoldenGate, which permits you to manage the data critical to your business in safety, ensuring business continuity without disruption even if the data is distributed among multiple, heterogeneous business applications and architectures.

 

Read more on BigDataMatters.com

Monday
May032010

MocoSpace Architecture - 3 Billion Mobile Page Views a Month

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system is: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network, with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browser, ranging from high end smartphones to lower end devices, as well as the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile and Websites, as well as a virtual currency system and a handful of premium feature upgrades.

Stats

Click to read more ...

Monday
May032010

100 Node Hazelcast cluster on Amazon EC2

Deploying, running and monitoring application on a big cluster is a challenging task. Recently Hazelcast team deployed a demo application on Amazon EC2 platform to show how Hazelcast p2p cluster scales and screen recorded the entire process from deployment to monitoring.

Hazelcast is open source (Apache License), transactional, distributed caching solution for Java. It is a little more than a cache though as it provides distributed implementation of map, multimap, queue, topic, lock and executor service. 

Details of running 100 node Hazelcast cluster on Amazon EC2 can be found here. Make sure to watch the screencast!

Friday
Apr302010

Hot Scalability Links for April 30, 2010

  • I Want a New Data Store. Jeremy Zawodny of Craigslist wants a new database, one that can do what it should: perform alter table operations faster, has efficient queries when most of the data is on disk and not in RAM, and matches their data that now looks more document oriented than relational. A lot of people willing to help.
  • Computer Science Unplugged. An extensive collection of free resources that teach principles of Computer Science such as binary numbers, algorithms and data compression through engaging games and puzzles that use cards, string, crayons and lots of running around. And it's free! Fascinating Interview with Tim Bell on teaching complex computing concepts, creating makers not just users, and how to change schools. From O'Reilly Radar
  • Akamai’s Network Now Pushes Terabits of Data Every Second. Akamai handles 12 million requests per second, logs more than 500 billion requests for content per day, and sends 3.45 terabits per second of data.

Click to read more ...

Friday
Apr302010

Behind the scenes of an online marketplace

In a presentation originally held at the 4. O2 Hosting Event in Hamburg, I spoke about the technology at a large online marketplace in Germany called Hitmeister.

 Some of the topics discussed include:

  • what makes up a marketplace? technically
  • system principles
  • development patterns
  • tools philosophy
  • data model
  • hardware

I am looking forward to comments and suggestions for both the presentation and our work.

Thursday
Apr292010

Product: SciDB - A Science-Oriented DBMS at 100 Petabytes

Scientists are doing it for themselves. Doing what? Databases. The idea is that most databases are designed to meet the needs of businesses, not science, so scientists are banding together at scidb.org to create their own Domain Specific Database, for science. The goal is to be able to handle datasets in the 100PB range and larger.

SciDB, Inc. is building an open source database technology product designed specifically to satisfy the demands of data-intensive scientific problems. With the advice of the world's leading scientists across a variety of disciplines including astronomy, biology, physics, oceanography, atmospheric sciences, and climatology, our computer scientists are currently designing and prototyping this technology

The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure. Many of the world's leading computer scientists with expertise in database systems have contributed to the design and architecture of the system to meet the needs of the world's scientists.

SciDB looks like a cool project and follows what might be considered a trend, instead of beating a general tool into submission, build a specialized tool that does what you need it to do. More details about SciDB can be found in the paper A Demonstration of SciDB: A Science-Oriented DBMS. A nice succinct poster is available summarizing the product.

Some interesting bits from the paper:

Click to read more ...

Wednesday
Apr282010

Elasticity for the Enterprise -- Ensuring Continuous High Availability in a Disaster Failure Scenario

Many enterprises' high-availability architecture is based on the assumption that you can prevent failure from happening by putting all your critical data in a centralized database, back it up with expensive storage, and replicate it somehow between the sites. As I argued in one of my previous posts (Why Existing Databases (RAC) are So Breakable!) many of those assumptions are broken at their core, as storage is doomed to failure just like any other device, expensive hardware doesn’t make things any better and database replication is often not enough.

Click to read more ...