Over the years we have added quite a few features to our tool that are specifically tailored to high-scalability, high-performance environments. These include the ability to log to memory and dump log files on demand (when a crash occurs, for example), special backlog queue features, a log service application for central log storage, and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering and inspecting large amounts of logging data a lot easier and more practical.
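To make the "log to memory and dump on demand" idea concrete, here is a minimal sketch in Python. It is not SmartInspect's API: the `MemoryLog` class and the `run_with_crash_dump` helper are hypothetical names invented for illustration, and the fixed-size buffer is just one assumed way to bound memory use.

```python
import io
from collections import deque
from datetime import datetime, timezone


class MemoryLog:
    """Hypothetical in-memory log: keeps only the newest `capacity` entries."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def log(self, message):
        stamp = datetime.now(timezone.utc).isoformat()
        self._entries.append(f"{stamp} {message}")

    def dump(self, stream):
        """Write all buffered entries to `stream`; return how many were written."""
        for line in self._entries:
            stream.write(line + "\n")
        return len(self._entries)


def run_with_crash_dump(task, log, stream):
    """Run `task`; if it raises, record the error, dump the buffer, and re-raise."""
    try:
        return task()
    except Exception as exc:
        log.log(f"crash: {exc!r}")
        log.dump(stream)
        raise


# Usage: only the last three of five events survive in the buffer.
demo_log = MemoryLog(capacity=3)
for i in range(5):
    demo_log.log(f"event {i}")
demo_out = io.StringIO()
demo_count = demo_log.dump(demo_out)
```

Nothing touches the disk on the normal path, so logging stays cheap; the cost of writing is paid only when a dump is actually requested.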
Optimus Cloud™ by Prima Grid (www.primagrid.com) provides a development and distributed runtime environment designed to deliver internet-scale Software as a Service innovation. Optimus Cloud™ Services Collaboration delivers innovations with a shorter time-to-market and helps cloud-based service developers boost reuse and productivity. Optimus Cloud™ Technology is a unique implementation of a server-less, peer-to-peer, grid-based, cloud-ready service runtime that coordinates service components according to user-defined SLAs. Optimus Cloud™ is built for highly distributed heterogeneous environments and delivers non-trivial qualities of service. With its self-organizing and fault-tolerance capabilities, Optimus Cloud™ improves price for performance, flexibility, time-to-market, robustness, application-level elasticity, on-demand scalability, capacity utilization and asset utilization.

Distributed service creation platform: Optimus Cloud™ enables service developers to bring together capabilities and services from distributed heterogeneous sources, and to assemble and easily mash up those capabilities into new innovations. Capabilities can be legacy in-house investments, third-party cloud services, SOA-based services, APIs, packaged applications, etc. Optimus Cloud™ delivers service innovation faster and reduces barriers to entry and risks, dramatically lowering initial investments.

Distributed service delivery platform: Optimus Cloud™ provides a distributed grid computing runtime environment for Internet-scale, cloud-ready applications over clouds, with the following benefits and capabilities:

Maximize asset utilization: Optimus Cloud™, built from the ground up on grid technology, maximizes asset utilization of clouds and multi-site data centers. Optimus Cloud™ allocates computing power on demand and automatically adjusts application scale according to changing demand, thus reducing Total Cost of Ownership (TCO) and maximizing profits.
Failure ready: Optimus Cloud™ provides a failure-ready environment for hosting highly distributed cloud-ready application components, reducing the need for immediate administrative action, so the service scales and adjusts cost-effectively and reliably.

Improves productivity and shortens time-to-market: Optimus Cloud™ improves productivity when building cloud-based applications and enables developers to focus on delivering innovations while collaborating on and assembling cloud-based service capabilities.

Multi-tenancy: Optimus Cloud™ assures secured access to and control of services and information, as profiles and assets are partitioned and separated from other tenants. Optimus Cloud™ technology implements multi-tenancy and can host multiple, separated Virtual Organizations.

On-demand scalability: Optimus Cloud™ technology distributes workloads for improved utilization of computing power and scales application components on demand, based on each virtual organization's policy and SLA. Optimus Cloud™ is a server-less grid and thus has no single point of failure or performance bottleneck, making Optimus Cloud™ a premium choice for massive-scale applications.

Delivers non-trivial qualities: Optimus Cloud™ features non-trivial qualities of service in terms of performance, availability, throughput and sustainability, as Optimus Cloud™ infrastructure services continuously optimize the service to meet SLA terms.

Commodity hardware: Optimus Cloud™ is a multi-cloud platform grid and may be run on top of multiple utility and cloud computing service providers in order to accommodate service requirements, scale and resilience. Optimus Cloud™ Technology is optimal for resource utilization over multi-site, multi-cloud services, enabling services and applications for the cloud.
Geir Magnusson from 10gen presented a talk titled “Cloud Data Persistence or ‘We’re in a database renaissance - pay attention’” today at QCon London 2009. The main message of his talk was that “physical limitations of today’s technology combined with the computational complexity of conventional relational databases are driving databases into new exciting spaces”, or, to put it more simply, the database landscape is changing and we should keep our eyes on it.
Over the last several decades computer architects have been phenomenally successful at turning the transistor bounty provided by Moore's Law into chips with ever-increasing single-threaded performance. During many of these successful years, however, many researchers paid scant attention to multiprocessor work. Now, as vendors turn to multicore chips, researchers are reacting with more papers on multi-threaded systems. While this is good, we are concerned that further work on single-thread performance will be squashed. To help understand future high-level trade-offs, we develop a corollary to Amdahl's Law for multicore chips [Hill & Marty, IEEE Computer 2008]. It models fixed chip resources for alternative designs that use symmetric cores, asymmetric cores, or dynamic techniques that allow cores to work together on sequential execution. Our results encourage multicore designers to view the performance of the entire chip rather than focus on core efficiencies. Moreover, we observe that obtaining optimal multicore performance requires further research BOTH in extracting more parallelism and in making sequential cores faster. This talk is based on an HPCA 2008 keynote address.

Speaker: Mark D. Hill

Mark D. Hill (http://www.cs.wisc.edu/~markhill) is a professor in both the computer sciences department and the electrical and computer engineering department at the University of Wisconsin--Madison, where he also co-leads the Wisconsin Multifacet (http://www.cs.wisc.edu/multifacet/) project with David Wood. His research interests include parallel computer system design, memory system design, computer simulation, and recently transactional memory. He earned a PhD from the University of California, Berkeley. He is an ACM Fellow and a Fellow of the IEEE.
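The three design points in the abstract can be sketched numerically. The formulas below follow the Hill & Marty model as I understand it from the cited IEEE Computer article: a chip has a budget of n base-core equivalents (BCEs), a core built from r BCEs has sequential performance perf(r), and f is the parallelizable fraction of the work. The choice perf(r) = sqrt(r) is an illustrative assumption, and the function names are mine, so treat this as a sketch and check the paper before relying on it.

```python
import math


def perf(r):
    """Assumed sequential performance of one core built from r BCEs."""
    return math.sqrt(r)


def speedup_symmetric(f, n, r):
    """n/r identical cores of r BCEs each; one core runs the serial part."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))


def speedup_asymmetric(f, n, r):
    """One big core of r BCEs plus (n - r) single-BCE cores."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))


def speedup_dynamic(f, n, r):
    """Cores fuse into one r-BCE core for serial code, n base cores otherwise."""
    return 1.0 / ((1 - f) / perf(r) + f / n)


# Example: 90% parallel work, 256 BCEs, big cores built from 16 BCEs.
f, n, r = 0.9, 256, 16
sym = speedup_symmetric(f, n, r)
asym = speedup_asymmetric(f, n, r)
dyn = speedup_dynamic(f, n, r)
```

With r = 1 the symmetric case reduces to classic Amdahl's Law, 1 / ((1 - f) + f / n). For the example values the ordering is symmetric < asymmetric < dynamic, which matches the abstract's point that a faster sequential core still matters on a multicore chip.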
Paper: Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
Authors: Kevin Lim, Parthasarathy Ranganathan, Jichuan Chang, Chandrakant Patel, Trevor Mudge, Steven Reinhardt

This International Symposium on Computer Architecture paper seeks to understand and design next-generation servers for emerging "warehouse-computing" environments. We make two key contributions. First, we put together a detailed evaluation infrastructure including a new benchmark suite for warehouse-computing workloads, and detailed performance, cost, and power models, to quantitatively characterize bottlenecks. Second, we study a new solution that incorporates volume non-server-class components in novel packaging solutions, with memory sharing and flash-based disk caching. Our results show that this approach has promise, with a 2X improvement on average in performance-per-dollar for our benchmark suite.
Hi, we are looking at sharding our existing Java/Oracle-based application. We want the app servers to be able to process requests for multiple (any?) shards. The concern that has come up is the amount of memory that would be consumed by having so many connection pools on one app server. Additionally, there is concern about having so many physical connections to the database server coming from all the various app servers that may talk to that particular shard. Has anyone else dealt with this issue, and if so, how did you resolve it? Thanks, Scott
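The two concerns in the question come down to simple multiplication, sketched here in Python. All numbers are hypothetical, and the sketch assumes the worst case in which every app server keeps a full-size pool open to every shard.

```python
def connections_per_app_server(shards, pool_size):
    """Upper bound on connections one app server holds across all its pools."""
    return shards * pool_size


def connections_per_shard(app_servers, pool_size):
    """Upper bound on connections one shard's database sees from all app servers."""
    return app_servers * pool_size


# Hypothetical deployment: 20 app servers, 16 shards, 10 connections per pool.
per_app_server = connections_per_app_server(shards=16, pool_size=10)
per_shard = connections_per_shard(app_servers=20, pool_size=10)
```

Under these assumptions each app server holds up to 160 connections and each shard accepts up to 200. Both figures grow linearly as shards or app servers are added, which is why per-pool size limits matter in an any-server-to-any-shard design.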
Gregg Pollack has made 13 screencasts on how to scale Rails:
Update: How do you design and handle peak load on the Cloud? by Cloudiquity. Gives a formula to try to predict and plan for peak load and talks about how GigaSpaces XAP, Scalr, RightScale and FreedomOSS can be used to handle peak load within EC2.

Theo Schlossnagle, with his usual insight, talks in Dissecting today's surges about how the nature of internet traffic has evolved over time. Traffic now spikes like a heart attack, larger and more quickly than ever, from traffic inflow sources like Digg and The New York Times. Theo relates how "At least eight times in the past month, we've experienced from 100% to 1000% sudden increases in traffic across many of our clients", and those spikes can happen in as little as 60 seconds. To me this sounds a lot like punctuated equilibrium in evolution, a force that accounts for much creative growth in species...

VMs don't spin up in less than 60 seconds, so your ability to respond to such massive, quick spikes is limited. This assumes, of course, that you've created an architecture that can automatically scale by adding VMs. Such elastic demand is usually met with a reservoir: you keep more VMs in reserve to soak up temporary spikes. But who would do this in reality? Money would be going to non-productive VMs, so you are likely to have already put those VMs into production.

Interestingly, Theo ties handling sudden unexpected spikes back to performance. We are always told performance and scalability are separate issues. And while I accept this notionally, in my heart of hearts I think they have more in common than not, and I think Theo nails why. A well-performing system acts as a kind of reservoir for handling spikes before you can ever notice there's a spike. That gives you some time to add more resources to your site if a spike continues. Without that reservoir you are just crushed. Theo gives four rules for handling spikes: Be alert, Be prepared, Perform triage, and Be calm. Please see his site for more discussion of these rules.
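The "performance as a reservoir" argument can be put into rough numbers. The sketch below is my own back-of-the-envelope model, not Theo's or Cloudiquity's formula: it treats a site as having a fixed request capacity and asks whether a sudden percentage increase fits inside the unused headroom. The function names and figures are hypothetical.

```python
def peak_rps(baseline_rps, increase_pct):
    """Traffic after a sudden increase; a 100% increase doubles the baseline."""
    return baseline_rps * (1 + increase_pct / 100)


def max_absorbable_increase_pct(capacity_rps, baseline_rps):
    """Largest sudden increase the existing capacity absorbs with no new VMs."""
    return (capacity_rps / baseline_rps - 1) * 100


def spike_absorbed(capacity_rps, baseline_rps, increase_pct):
    """True if the spike fits within current capacity."""
    return peak_rps(baseline_rps, increase_pct) <= capacity_rps


# Hypothetical site: serves 1,000 req/s today and can sustain 5,000 req/s.
headroom_pct = max_absorbable_increase_pct(5000, 1000)
survives_100 = spike_absorbed(5000, 1000, 100)
survives_1000 = spike_absorbed(5000, 1000, 1000)
```

In this example the site rides out a 100% surge on headroom alone, which buys time to add VMs, but a 1000% surge exceeds capacity before any new VM could boot. Making each request cheaper raises the capacity figure and so widens the reservoir without adding idle machines.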
A few things that might help:
I try to group XTP into two main groups, type 1 and type 2, and then subdivide type 2 into 2a and 2b. I describe how I do this grouping and then amplify it a little in the context of cloud services.
Update: Barbara Liskov’s Turing Award, and Byzantine Fault Tolerance.

Henry Robinson has created an excellent series of articles on consensus protocols. We already covered his 2 Phase Commit article, and he also has a 3 Phase Commit article showing how to handle 2PC under single node failures. But that is not enough! 3PC works well under node failures, but fails under network failures. So another consensus mechanism is needed that handles both network and node failures. And that's Paxos. Paxos correctly handles both types of failures, but it does this by becoming inaccessible if too many components fail. This is the "liveness" property of protocols. Paxos waits until the faults are fixed. Read queries can be handled, but updates will be blocked until the protocol thinks it can make forward progress. The liveness of Paxos is primarily dependent on network stability. In a distributed heterogeneous environment you are at risk of losing the ability to make updates. Users hate that.

So when companies like Amazon do the seemingly insane thing of creating eventually consistent databases, it should be a little easier to understand now. Partitioning is required for scalability. Partitioning brings up these nasty consensus issues. Not being able to write under partition failures is unacceptable. Therefore, create a system that can always write, and work on consistency when all the downed partitions/networks are repaired.
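The point about Paxos "becoming inaccessible if too many components fail" follows from its majority rule: a value can be chosen only when more than half of the participating nodes respond. The helper names below are hypothetical and this is only the quorum arithmetic, not an implementation of the protocol.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def can_accept_updates(n, reachable):
    """Updates can proceed only while a majority of nodes is reachable."""
    return reachable >= majority(n)


def max_tolerated_failures(n):
    """How many nodes can be down or partitioned away before updates block."""
    return (n - 1) // 2


# A 5-node group keeps accepting updates with 2 nodes lost, but not with 3.
five_quorum = majority(5)
ok_with_two_down = can_accept_updates(5, reachable=3)
ok_with_three_down = can_accept_updates(5, reachable=2)
```

From the protocol's point of view a network partition looks the same as a batch of node failures: whichever side holds fewer than a majority cannot accept updates until the partition heals. That blocked side is the availability cost that eventually consistent designs choose to avoid.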