Sunday
Sep202009

PaxosLease: Diskless Paxos for Leases

PaxosLease is a distributed algorithm for lease negotiation. It is based on Paxos, but does not require disk writes or clock synchrony. PaxosLease is used for master lease negotiation in the open-source Keyspace replicated key-value store.

Saturday
Sep192009

Space Based Programming in .NET

Space-based architectures are an alternative to the traditional n-tier model for enterprise applications. Instead of vertical tier partitioning, space-based applications are partitioned horizontally into self-sufficient units. This leads to almost linear scalability of stateful, high-performance applications.

This is a recording of a talk I gave last month in which I introduce space-based programming and demonstrate how it works in practice on the .NET platform using Oracle Coherence and GigaSpaces.

Thursday
Sep172009

Infinispan narrows the gap between open source and commercial data caches 

Recently I attended a lecture presented by Manik Surtani, JBoss Cache & Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache. JBoss Cache was originally targeted at simple web page caching, and Infinispan builds on this to take it into the Cloud paradigm.

Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP or Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition – will it provide the functionality that I need?

Read more at: http://bigdatamatters.com/bigdatamatters/2009/09/infinispan-vs-gigaspaces.html

Thursday
Sep172009

Hot Links for 2009-9-17 

  • Save 25% on Hadoop Conference Tickets
    Apache Hadoop is a hot technology getting traction all over the enterprise and in the Web 2.0 world. Now, there's going to be a conference dedicated to learning more about Hadoop. It'll be Friday, October 2 at the Roosevelt Hotel in New York City.

    Hadoop World, as it's being called, will be the first Hadoop event on the east coast. Morning sessions feature talks by Amazon, Cloudera, Facebook, IBM, and Yahoo! Then it breaks out into three tracks: applications, development / administration, and extensions / ecosystems. In addition to the conference itself, there will also be 3 days of training prior to the event for those looking to go deeper. In addition to general sessions speakers, presenters include Hadoop project creator Doug Cutting, as well as experts on large-scale data from Intel, Rackspace, Softplayer, eHarmony, Supermicro, Impetus, Booz Allen Hamilton, Vertica, About.com, and other companies.

    Readers get a 25% discount if they register by Sept. 21: http://hadoop-world-nyc.eventbrite.com/?discount=hadoopworld_promotion_highscalability.

  • Essential storage tradeoff: Simple Reads vs. Simple Writes by Stephan Schmidt. Data in denormalized chunks is easy to read and complex to write.
  • Kickfire's approach to parallelism by Daniel Abadi. Kickfire uses column-oriented storage and execution to address I/O bottlenecks and FPGA-based data-flow architecture to address processing and memory bottlenecks.
  • "Just in Time" Decompression in Analytic Databases by Michael Stonebraker. A DBMS that is optimized for compression through and through--especially with a query executor that features just in time decompression--will not just reduce IO and storage overhead, but also offer better query performance with lower CPU resource utilization.
  • Reverse Proxy Performance – Varnish vs. Squid (Part 2) by Bryan Migliorisi. My results show that in raw cache hit performance, Varnish puts Squid to shame.
  • Building Scalable Databases: Denormalization, the NoSQL Movement and Digg by Dare Obasanjo. As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.
  • How To Make Life Suck Less (While Making Scalable Systems) by Bradford Stephens. Scalable doesn’t imply cheap or easy. Just cheaper and easier.
  • Some perspective to this DIY storage server mentioned at Storagemojo by Joerg Moellenkamp. It's about making decisions. Application and hardware have to be seen as one. When your application is capable of overcoming the limitations and problems of such ultra-cheap storage

    Wednesday
    Sep162009

    The VeriScale Architecture - Elasticity and efficiency for private clouds

    The modern datacenter is evolving into the network centric datacenter model, which is applied to both public and private cloud computing. In this model, networking, platform, storage, and software infrastructure are provided as services that scale up or down on demand. The network centric model allows the datacenter to be viewed as a collection of automatically deployed and managed application services that utilize underlying virtualized services. Providing sufficient elasticity and scalability for the rapidly growing needs of the datacenter requires these collections of automatically-managed services to scale efficiently and with essentially no limits, letting services adapt easily to changing requirements and workloads. Sun’s VeriScale architecture provides the architectural platform that can deliver these capabilities. Sun Microsystems has been developing open and modular infrastructure architectures for more than a decade. The features of these architectures, such as elasticity, are seen in current private and public cloud computing architectures, while the non-functional requirements, such as high availability and security, have always been a high priority for Sun. The VeriScale architecture leverages experience and knowledge from many Sun customer engagements and provides an excellent foundation for cloud computing. The VeriScale architecture can be implemented as an overlay, creating a virtual infrastructure on a public cloud or it can be used to implement a private cloud.

    Read more at: http://wikis.sun.com/display/BluePrints/The+VeriScale+Architecture+-+Elasticity+and+Efficiency+for+Private+Clouds

    Wednesday
    Sep162009

    Paper: A practical scalable distributed B-tree

    We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper: A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution.

    From the Abstract:

    We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques: (1) We use optimistic concurrency control, so that B-tree nodes are not locked during transaction execution, only during commit. This well-known technique works well because B-trees have little contention on update. (2) We replicate inner nodes at clients. These replicas are lazy, and hence lightweight, and they are very helpful to reduce client-server communication while traversing the B-tree. (3) We replicate version numbers of inner nodes across servers, so that clients can validate their transactions efficiently, without creating bottlenecks at the root node and other upper levels in the tree.

    Distributed hash tables are scalable because records are easily distributed across a cluster, which gives the golden ability to perform many writes in parallel. The problem is that keyed access is very limited.

    A lot of the time you want to iterate through records or search records in a sorted order. Sorted could mean time stamp order, for example, or last name order as another example.

    Access to data in sorted order is what btrees are for. But we simply haven't seen distributed btree systems develop. Instead, you would have to use some sort of map-reduce mechanism to efficiently scan all the records or you would have to maintain the information in some other way.
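
To make the difference concrete, here is a minimal single-machine sketch in Python. It is not from the paper: the record names are made up, a plain `dict` stands in for a hash table, and a sorted list searched with the standard `bisect` module stands in for the ordered leaves of a btree.

```python
import bisect

# Hypothetical records keyed by last name (made up for illustration).
records = {
    "garcia": 3,
    "adams": 1,
    "miller": 4,
    "baker": 2,
    "nguyen": 5,
}

# A hash table answers exact-key lookups directly.
exact = records["miller"]


def range_scan_hash(table, low, high):
    """Range query on a hash table: every key must be examined."""
    return sorted(k for k in table if low <= k < high)


# A sorted key list stands in for the ordered leaves of a btree.
sorted_keys = sorted(records)


def range_scan_sorted(keys, low, high):
    """Range query on sorted keys: two binary searches, then a slice."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_left(keys, high)
    return keys[start:end]


hash_result = range_scan_hash(records, "b", "n")
sorted_result = range_scan_sorted(sorted_keys, "b", "n")
```

Both calls return the same keys, but the hash version has to touch every record, while the sorted version only locates the two ends of the range.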

    This paper points the way to do some really cool things at a system level:

  • It's distributed so it can scale dynamically in size and handle writes in parallel.
  • It supports adding and dropping servers dynamically, which is an essential requirement for architectures based on elastic cloud infrastructures.
  • Data can be migrated to other nodes, which is essential for maintenance.
  • Multiple records can be involved in transactions which is essential for the complex data manipulations that happen in real systems. This is accomplished via a version number mechanism that looks something like MVCC.
  • Optimistic concurrency, that is, the ability to change data without explicit locking, makes the job for programmers a lot easier.
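
As a rough illustration of the version number idea, here is a toy Python sketch of optimistic concurrency. It is an assumption-laden stand-in, not the paper's algorithm: `Store` and `VersionConflict` are hypothetical names, everything lives in one process, and nothing is distributed or replicated.

```python
class VersionConflict(Exception):
    """Raised when a key changed between a client's read and its commit."""


class Store:
    """Toy in-memory store; each key maps to a (version, value) pair."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def commit(self, read_versions, writes):
        # Validate: every key the client read must still be at the
        # version it saw. A real system would hold locks only here,
        # during commit; this single-threaded toy needs none.
        for key, seen in read_versions.items():
            current, _ = self._data.get(key, (0, None))
            if current != seen:
                raise VersionConflict(key)
        # Apply all writes together, bumping each key's version.
        for key, value in writes.items():
            current, _ = self._data.get(key, (0, None))
            self._data[key] = (current + 1, value)


store = Store()
store.commit({}, {"node-a": ["k1"]})  # initial write, version becomes 1

# Two clients read the same node without taking any lock.
v1, val1 = store.read("node-a")
v2, val2 = store.read("node-a")

# The first commit validates and succeeds, moving the node to version 2.
store.commit({"node-a": v1}, {"node-a": val1 + ["k2"]})

# The second commit is based on a stale version and is rejected.
try:
    store.commit({"node-a": v2}, {"node-a": val2 + ["k3"]})
    conflict = False
except VersionConflict:
    conflict = True
```

The losing client would simply re-read the node and retry, which is cheap when there is little contention on update.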

    These are the kind of features needed for systems in the field. Hopefully we'll start seeing more systems offering richer access structures while still maintaining scalability.

    Sunday
    Sep132009

    How does Berkeley DB fare against other key-value databases?

    I want to know how Berkeley DB compares against other key-value solutions. I read on the Net that Google uses it for their enterprise sign-on feature. Does anyone have any experience using Berkeley DB? Backward compatibility is poor in Berkeley DB, but that is fine for me. How easy is it to scale using Berkeley DB?

    Saturday
    Sep122009

    How Google Taught Me to Cache and Cash-In

    A user named Apathy, commenting on how Reddit scales some of its features, shares some advice he learned while working at Google and other major companies.

    To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same:

    1. Cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized).
    2. Cache everything that doesn't change rapidly. Most of the time you don't have to hit the database for anything other than checking whether the users' new message count has transitioned from 0 to (1 or more).
    3. Cache everything--templates, user message status, the front page components--and hit the database once a minute or so to update the front page, forums, etc. This was sufficient to handle a site with a million hits a day on a couple of servers. The site was sold for $100K.
    4. Cache the users' subreddits. Blow out the cache on update.
    5. Cache the top links per subreddit. Blow out cache on update.
    6. Combine the previous two steps to generate a menu from cached blocks.
    7. Cache the last links. Blow out the cache on each outlink click.
    8. Cache the user's friends. Append 3 characters to their name.
    9. Cache the user's karma. Blow out on up/down vote.
    10. Filter via conditional formatting, CSS, and an ajax update.
    11. Decouple selection/ranking algorithm(s) from display.
    12. Use Google or something like Xapian or Lucene for search.
    13. Cache "for as long as memcached will stay up." That depends on how many instances you're running, what else is running, how stable the Python memcached hooks are, etc.
    14. The golden rule of website engineering is that you don't try to enforce partial ordering simultaneously with your updates.
    15. When running a search engine operate the crawler separately from the indexer.
    16. Ranking scores are used as necessary from the index, usually cached for popular queries.
    17. Re-rank popular subreddits or the front page once a minute. Tabulate votes and pump them through the ranker.
    18. Cache the top 100 per subreddit. Then cache numbers 100-200 when someone bothers to visit the 5th page of a subreddit, etc.
    19. For less-popular subreddits, you cache the results until an update comes in.
    20. With enough horsepower and common sense, almost any volume of data can be managed, just not in realtime.
    21. Never ever mix your reads and writes if you can help it.
    22. Merge all the normalized rankings and cache the output every minute or so. This avoids thousands of queries per second just for personalization.
    23. It's a lot cheaper to merge cached lists than build them from scratch. This delays the crushing read/write bottleneck at the database. But you have to write the code.
    24. Layering caches is a classic strategy for milking your servers as much as possible. First look for an exact match. If that's not found, look for the components and build an exact match.
    25. The majority of traffic on almost all websites comes from the default, un-logged-in front page or from random forum/comment/result pages. Make sure those are cached as much as possible. If one or more of the components aren't found, regenerate those from the DB (now it's cached!) and proceed. Never hit the database unless you have to.
    26. You (almost) always have to hit the database on writes. The key is to avoid hitting it for reads until you're forced to do so.
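
The layered lookup and the "blow out the cache on update" advice above might look something like the following Python sketch. It is only an illustration under stated assumptions: `LayeredCache`, `fake_db_load`, and the component names are hypothetical, plain dicts stand in for memcached, and a page is just its components joined together.

```python
class LayeredCache:
    """Toy two-layer cache: whole pages first, then page components."""

    def __init__(self, load_component):
        self._pages = {}        # exact-match layer: component tuple -> page
        self._components = {}   # component layer: name -> fragment
        self._load_component = load_component  # stand-in for a DB query
        self.db_hits = 0

    def _component(self, name):
        # Only hit the "database" when the component is not cached.
        if name not in self._components:
            self.db_hits += 1
            self._components[name] = self._load_component(name)
        return self._components[name]

    def page(self, component_names):
        key = tuple(component_names)
        # Layer 1: look for an exact match.
        if key in self._pages:
            return self._pages[key]
        # Layer 2: build the page from cached (or regenerated) components.
        page = "\n".join(self._component(n) for n in component_names)
        self._pages[key] = page
        return page

    def invalidate(self, name):
        """Blow out one component and every cached page built from it."""
        self._components.pop(name, None)
        self._pages = {k: v for k, v in self._pages.items() if name not in k}


def fake_db_load(name):
    # Hypothetical loader; a real one would query the database.
    return f"<{name}>"


cache = LayeredCache(fake_db_load)
first = cache.page(["menu", "top_links"])   # both components loaded
second = cache.page(["menu", "top_links"])  # exact match, no loads
third = cache.page(["menu", "karma"])       # only "karma" is loaded
cache.invalidate("karma")                   # e.g. after an up/down vote
fourth = cache.page(["menu", "karma"])      # "karma" is loaded again
```

In this run the loader is called four times in total: twice for the first page, once for the new "karma" component, and once more after the invalidation.
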

    Friday
    Sep112009

    The interactive cloud

    How many times have you been called in the middle of the night by your operations guys telling you that your application is throwing some odd red alerts? How many times have you found out that when those issues happen you don't have enough information to analyze the incident? Have you tried to increase the log level, just to find out that your problem became even worse: now your application throws tons of information on a continuous basis, most of which is complete garbage...

    The current separation between the way we implement our application and the way we manage it leads to many of these ridiculous situations. Cloud makes those things even worse.

    In this post I suggest an alternative approach. Why don't we run our application the way we run our business? I refer to this approach as the "interactive cloud", where our application behaves just like our project team and operations just like our managers. As with our business, our application would need to take more responsibility for the way it runs and take corrective actions such as balancing its own resources, re-assigning tasks to the available resources in case of failure, etc. It will need to involve its manager only when it runs out of resources. It will need to provide reports in a way that makes sense to our managers.

    The first part of this post describes the general concept behind this model, and the second part provides technical background, which includes code snippets based on our experience at GigaSpaces.

    Thursday
    Sep102009

    When optimizing - don't forget the Java Virtual Machine (JVM) 

    Recently, I was working on a project that was coming to a close. It was related to optimizing a database using a Java based in-memory cache to reduce the load. The application had to process up to a million objects per day and was characterized by its heavy use of memory and the high number of read, write and update operations. These operations were found to be the most costly, which meant that optimization efforts were concentrated here.

    The project had already achieved impressive performance increases, but one question remained unanswered - would changing the JVM increase performance?


    Read more at: http://bigdatamatters.com/bigdatamatters/2009/08/jvm-performance.html