Grid

Todd Hoff's picture

Is Eucalyptus ready to be your private cloud?


Rich Wolski, professor of Computer Science at the University of California, Santa Barbara, gave a spirited talk on Eucalyptus to a large group of very interested cloudsters at the Eucalyptus Cloud Meetup. If Rich could teach computer science at every school the state of the computer science industry would be stratospheric. Rich is dynamic, smart, passionate, and visionary. It's that vision that prompted him to create Eucalyptus in the first place. Rich and his group are experts in grid and distributed computing, having a long and glorious history in that space. When he saw cloud computing on the rise he decided the best way to explore it was to implement what everyone accepted as a real cloud, Amazon's API. In a remarkably short time they implement Eucalyptus and have been improving it and tracking Amazon's changes ever since.

The question I had going into the meetup was: should Eucalyptus be used to make an organization's private cloud? The short answer is no.

The project is of high quality, the people are of the highest quality, but in the end Eucalyptus is a research project from a university. As an academic project Eucalyptus is subject to changes in funding and the research interests of the team. When funding sources dry up so does the project. If the team finds another research area more interesting, or if they get tired of chasing a continuous stream of new Amazon features, or no new grad students sign on, which will happen in a few years, then the project goes dark.

Fears over continuity have at least two solutions: community support and commercial support. Eucalyptus could become community supported open source project. This is unlikely to happen though as it conflicts with the research intent of Eucalyptus. The Eucalyptus team plans to control the core for research purposes and encourage external development of add-on service like SQS. Eucalyptus won't go commercial as University projects must stay clear from commercial pretensions. Amazon is "no comment" on Eucalyptus so it's not clear what they would think of commercial development should it occur.

Taken together these concerns imply Eucalyptus is not a good base for an enterprise quality private cloud. Which they readily admit. It's not enterprise ready Rich repeats. It's not that the quality isn't there. It is and will be. And some will certainly base their private cloud on Eucalyptus, but when making a decision of this type you have to be sure your cloud infrastructure will be around for the long haul. With Eucalyptus that is not necessarily the case. Eucalyptus is still a good choice for it's original research purpose, or as cheap staging platform for Amazon, or as base for temporary clouds, but as your rock solid private cloud infrastructure of the future Eucalyptus isn't the answer.

The long answer is a little more nuanced and interesting.

CloudCamp London 2: private clouds and standardisation

CloudCamp returned to London yesterday, organised with the help of Skills Matter at the Crypt on the Clarkenwell green. The main topics of this cloud/grid computing community meeting were service-level agreements, connecting private and public clouds and standardisation issues.

Managing application on the cloud using a JMX Fabric

This post describes how you can create a federated management model using JMX standard API. Applications that are already using a standard JMX interface can plug-in the new federated implementation without changing the application code and without introducing additional performance overhead.

Alternatives to Google App Engine

One particularly interesting EC2 third party provider is GigaSpaces with their XAP platform that provides in memory transactions backed up to a database. The in memory transactions appear to scale linearly across machines thus providing a distributed in-memory datastore that gets backed up to persistent storage.

Oracle opens Coherence Incubator

During the Coherence Special Interest Group meeting in London, Brian Oliver from Oracle yesterday announced the start of the Coherence Incubator project. Coherence Incubator is a new online repository of projects that provides reference implementation examples for commonly used design patterns and integration solutions based on Oracle Coherence.

Sun N1 Grid Engine Software and the Tokyo Institute of Technology Super Computer Grid

One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech) created the fastest supercomputer in Asia, and one of the largest outside of the United States. Using Sun x64 servers and data servers deployed in a grid architecture, Tokyo Tech built a cost-effective, flexible supercomputer that meets the demands of compute- and data-intensive applications. Built in just 35 days, the TSUBAME grid includes hundreds of systems incorporating thousands of processor cores and terabytes of memory, and delivers 47.38 trillion1 floating-point operations per second (TeraFLOPS) of sustained LINPACK benchmark performance and 1.1 petabyte of storage to users running common off-the-shelf applications. Based on the deployment architecture, the grid is expected to reach 100 TeraFLOPS in the future. This Sun BluePrints article provides an overview of the Tokyo Tech grid, named TSUBAME. The third in a series of Sun BluePrints articles on the TSUBAME grid, this document provides an overview of the overall system architecture of the grid, as well as a detailed look at the configuration of the Sun N1 Grid Engine software that makes the grid accessible to users.

Is MapReduce going mainstream?

Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids

Todd Hoff's picture

Is your cloud as scalable as you think it is?

An unstated assumption is that clouds are scalable. But are they? Stick thousands upon thousands of machines together and there are a lot of potential bottlenecks just waiting to choke off your scalability supply. And if the cloud is scalable what are the chances that your application is really linearly scalable? At 10 machines all may be well. Even at 50 machines the seas look calm. But at 100, 200, or 500 machines all hell might break loose. How do you know?

You know through real life testing. These kinds of tests are brutally hard and complicated. who wants to do all the incredibly precise and difficult work of producing cloud scalability tests? GridDynamics has stepped up to the challenge and has just released their Cloud Performance Reports.

The report is quite detailed so I'll just cover what I found most interesting. GridDynamics in this report test three configurations:

Todd Hoff's picture

GridGain: One Compute Grid, Many Data Grids

GridGain was kind enough to present at the September 17th instance of the Silicon Valley Cloud Computing Group. I've been curious about GridGain so I was glad to see them there. In short GridGain is: an open source computational grid framework that enables Java developers to improve general performance of processing intensive applications by splitting and parallelizing the workload. GridGain can also be thought of as a set of middleware primitives for building applications. GridGain's peer group of competitors includes GigaSpaces, Terracotta, Coherence, and Hadoop.

The speaker for GridGain was the President and Founder, Nikita Ivanov. He has a very pleasant down-to-earth way about him that contrasts nicely with a field given to religious discussions of complex taxomic definitions. Nikita first talked about cloud computing in general. He feels Java is the perfect gateway for cloud computing. Which is good because GridGain only works with Java. The Java centricity of GridGain may be an immediate deal killer or a non-issue for a Java shop. Being so close to the language does offer a lot of power, but it sure sucks in a multi-language environment.

Nikita gave a few definitions which are key to understanding where GridGain stands in the grid matrix:

  • Compute Grids: parallel execution.
  • Data Grids: parallel data storage.
  • Grid Computing: Compute Grids + Data Grids
  • Cloud Computing: datacenter + API. The key is automation via programmability as a way to deploy applications. The advantage is a unified programming model. Build an application on one node and you can run on many nodes without code change. Moving peak loads to the cloud can give you a 10x-100x cost reduction. Cloud computing poses a number of challenges: deployment, data sharing, load balancing, failover, discovery (nodes, availability), provisioning (add, remove), management, monitoring, development process, debugging, inter and external clouds (syncing data, syncing code, failover jobs). Nakita talked some about these issues, but he didn't go in-depth. But he showed a good understanding of the issues involved so I would be inclined to think GridGain handles them well.

  • Todd Hoff's picture

    Product: Condor - Compute Intensive Workload Management

    From their website:
    Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

    While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle. Condor does not require a shared file system across machines - if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.

    Todd Hoff's picture

    Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half

    Update 2: Summize Computes Computing Resources for a Startup. Lots of nice graphs showing Amazon is hard to beat for small machines and become less cost efficient for well used larger machines. Long term storage costs may eat your saving away. And out of cloud bandwidth costs are high.
    Update: via ProductionScale, a nice Digital Web article on how to setup S3 to store media files and how Blue Origin was able to handle 3.5 million requests and 758 GBs in bandwidth in a single day for very little $$$. Also a Right Scale article on Network performance within Amazon EC2 and to Amazon S3. 75MB/s between EC2 instances, 10.2MB/s between EC2 and S3 for download, 6.9MB/s upload.

    Now that Amazon's S3 (storage service) is out of beta and EC2 (elastic compute cloud) has added new instance types (the class of machine you can rent) with more CPU and more RAM, I thought it would be interesting to take a look out how their pricing stacks up.

    The quick conclusion:the more you scale the more you save. A six node configuration in Amazon is about half the cost of a similar setup using a service provider. But cost may not be everything...

    Todd Hoff's picture

    Hire Facebook, Ning, and Salesforce to Scale for You

    One of the premier scaling strategies is always: get someone else to do the work for you. But unlike Huckleberry Finn in Tom Sawyer, you won't have to trick anyone into whitewashing a fence for you. Times have changed. Companies like Ning, Facebook, and Salesforce are more than happy to help. Their price: lock-in.

    Previously you had few options when building a "real" website. You needed to do everything yourself. Infrastructure and application were all yours. Then companies stepped in by commoditizing parts of the infrastructure, but the application was still yours. The next step is full on Borg take no prisoners assimilation where the infrastructure and application are built as one collective. What you have to decide as someone faced with building a scalable website is if these new options are worth the price.

    Todd Hoff's picture

    Should you build your next website using 3tera's grid OS?

    Update 2: 3tera has added Dynamic Appliances, which are "packaged data center operations like backup, migration or SLAs that users can add to their applications to provide functionality."
    Update: in an effort to help cross the chasm of how start building a website using their grid OS, 3tera is offering their Assured Success Plan. The idea is to provide training, consulting, and support so you can get started with some confidence you'll end up succeeding.

    If you are starting or extending a website you have a problem: what technologies should you use?

    Now there are more answers to that question than ever. One new and refreshingly innovative answer is 3tera's grid OS. In this podcast interview with Bert Armijo from 3tera, we'll learn how 3tera wants to change how you build websites.

    How? By transforming the physical into the virtual and then allowing the virtual to be manipulated as if it were real. Could I possibly be more abstract? Not really. But when I think of what they are doing that's the mental model I see whirling around in my mind. Don't worry, I promise we'll drill down to how it can help you in the real world. Let's see how.

    Todd Hoff's picture

    Product: Sun Utility Computing

    The Sun Grid Compute Utility is a simple to use, simple to access data center-on-demand. Sun Grid delivers enterprise computing power and resources over the Internet, enabling developers, researchers, scientists and businesses to optimize performance, speed time to results, and accelerate innovation without investment in IT infrastructure. No matter the size of your business or the size of your job -- there is no barrier to entry and exit. This is the future of computing available today: IT as a service.

    Todd Hoff's picture

    Product: GridLayer. Utility computing for online application

    TGL delivers Virtual Private Datacenters and virtual private servers from grids of commodity servers. Each TGL grid consists of a pool of HP servers connected with a Gigabit backbone network and running 3Tera's AppLogic grid operating system.

    With a Virtual Private Datacenter, you get complete control of your own private grid. Using our visual interface, you set up and assemble disposable virtual infrastructure, including firewalls, load balancers, web servers, database servers, NAS boxes, etc. visually, by pointing and clicking. You can build advanced clusters, deploy large and small applications, and save them as templates that can be provisioned in minutes. You can even build your own virtual servers and appliances.

    Todd Hoff's picture

    Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services

    Can you really create an infinitely scalable infrastructure for less than $100 using Amazon's storage, grid, and queuing services platform? It appears so, at least for the right application. Amazon beams a spot light on the future battle of the roll-your-own versus the connect-the-dots approach to building next generation websites using core external services. Their argument is strong. Using Amazon's platform you can quickly build an infrastructure that would otherwise take an eternity to make, a pile of money to create, and an unbounded mass of people to implement and maintain. Yet Amazon doesn't provide SLAs, so you can you really trust them with your crown jewels? Facebook recently leap frogged Amazon's vision with an even more comprehensive set of services. The battle for the future is on.

    Syndicate content