Abstract: This paper presents the architecture and characteristics of an in-memory database intended for use as a cache engine for web applications. The primary goals of this database are speed and efficiency on SMP systems with several CPU cores (four or more). A secondary goal is support for simple metadata structures associated with cached data, which can help applications use the cache efficiently. Because of these goals, some data structures and algorithms normally associated with this field of computing had to be adapted to the new environment.
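The abstract does not spell out the design, but the two goals it names (scaling on multi-core SMP systems and metadata attached to cached entries) can be illustrated with a small sketch. It assumes a hypothetical design that is not taken from the paper: keys are spread across independently locked stripes so that threads on different cores rarely contend for the same lock, and the metadata takes the form of per-entry tags used to invalidate groups of entries at once. The class and method names are invented for illustration. In CPython the global interpreter lock limits true parallelism, so this shows the structure rather than the performance a native cache engine would get.

```python
import threading
from collections import defaultdict


class StripedTaggedCache:
    """Hypothetical in-memory cache: lock striping plus per-entry tags."""

    def __init__(self, num_stripes=16):
        # Each stripe is a separate dict guarded by its own lock, so threads
        # touching keys in different stripes do not block each other.
        self._stripes = [{} for _ in range(num_stripes)]
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        # Metadata index: tag -> keys that were stored with that tag.
        self._tag_index = defaultdict(set)
        self._tag_lock = threading.Lock()

    def _stripe(self, key):
        return hash(key) % len(self._stripes)

    def put(self, key, value, tags=()):
        tags = frozenset(tags)
        i = self._stripe(key)
        with self._locks[i]:
            self._stripes[i][key] = (value, tags)
        with self._tag_lock:
            for tag in tags:
                self._tag_index[tag].add(key)

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            entry = self._stripes[i].get(key)
        return default if entry is None else entry[0]

    def invalidate_tag(self, tag):
        """Drop every entry currently carrying `tag`; return how many were dropped."""
        with self._tag_lock:
            keys = self._tag_index.pop(tag, set())
        removed = 0
        for key in keys:
            i = self._stripe(key)
            with self._locks[i]:
                entry = self._stripes[i].get(key)
                # The index may be stale if the key was overwritten with
                # different tags, so check the entry's current tags.
                if entry is not None and tag in entry[1]:
                    del self._stripes[i][key]
                    removed += 1
        return removed
```

For example, a web application could store rendered pages with `cache.put("page:/home", html, tags=["user:42"])` and later call `cache.invalidate_tag("user:42")` to drop every page belonging to that user in one call, instead of tracking the keys itself.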
Update: As expected, I'm undergoing a massive spam attack for speaking truth to dark powers. This is the time to be strong. Together we can make a change. What change, you may ask? I can't say; just change, and lots more change. Let's link arms and bravely stand against the forces of chaos for a better yesterday and a better tomorrow. CAPTCHA doesn't work. Even Google can't make CAPTCHA work (Spammers Choose GMail). And even if CAPTCHA worked, it wouldn't really work, because CAPTCHA-solving markets (Inside India’s CAPTCHA solving economy) have evolved where, for a mere $2, you can buy 1,000 human-broken CAPTCHAs. And we know that once the free market tackles a problem, that's it. Game over :-) Making ever more clever CAPTCHA programs won't outwit and outlast the CAPTCHA-solving markets. Until Skynet evolves, the only way to defeat humans is with humans.
Using Games to Get Humans to Do Work (like CAPTCHA) for Free
How do we harness the power of humans to do battle with the CAPTCHA-solving networks without, of course, paying them anything? We make it a game! In particular, we make a Game With a Purpose (GWAP). Read all about GWAPs in Designing games with a purpose. A GWAP is a game in which people, as a side effect of playing, perform tasks that computers are unable to perform.
Google's Image Labeler
A good example of a GWAP is Google's Image Labeler, a game in which people provide meaningful, accurate labels for images on the Web as a side effect of playing; for example, an image of a man and a dog is labeled "dog," "man," and "pet." Now this sounds like work. And it is. But because it's made into a game, people will do it for free! In the game, two people are matched at random to label the same set of images. Points are awarded when you and your partner match labels. Top scores are kept so you can earn your labeling street cred. But can't people cheat? GWAP games include cheating-detection mechanisms, but we won't go into detail here; see Designing games with a purpose for cheater-foiling strategies.
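The Labeler mechanics described above (random pairing, and points only for labels both partners enter) can be sketched in a few lines. This is an illustration, not Google's implementation: the lowercase normalization, the flat 100 points per agreed label, and all function names are assumptions, and a real GWAP would add the cheating detection mentioned above.

```python
import random
from collections import Counter


def pair_players(pool, rng):
    """Match two distinct players from the pool at random."""
    return tuple(rng.sample(pool, 2))


def score_labels(labels_a, labels_b, points_per_match=100):
    """Output agreement: only labels that both partners typed count."""
    norm_a = {label.strip().lower() for label in labels_a}
    norm_b = {label.strip().lower() for label in labels_b}
    agreed = norm_a & norm_b
    return sorted(agreed), points_per_match * len(agreed)


def record_round(scores, pair, points):
    """Credit both partners with the points from a round (for the top-score list)."""
    for player in pair:
        scores[player] += points
```

For example, `score_labels(["Dog", "man", "grass"], ["dog", "pet", "man"])` returns `(["dog", "man"], 200)`: "grass" and "pet" were typed by only one partner, so the system learns nothing from them. A `Counter` passed to `record_round` can supply the leaderboard through `most_common()`.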
ESP Game, Tag a Tune, and Squigl
More games can be found at the GWAP Home Page. There is the ESP Game, which is like Labeler. Tag a Tune is a game where players hear tunes, describe them, and through the descriptions guess whether they are listening to the same tune. In Squigl, partners see an image and a word. Using the mouse, each player traces the object described by the word in the image. The players win when both trace the same object. So you see the pattern. Players are picked from a pool. They are asked to do some task that's hard for computers to do. The task must be structured so that winning enables the system to learn something valid while providing a feeling of game play for the humans. Points are awarded and scores are kept to keep the poor human slaves playing.
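Tag a Tune differs from Labeler: the partners never have to produce the same output, only to decide whether their inputs match. Below is a sketch of that judging rule. It assumes, beyond what is stated above, that each partner votes "same" or "different" after reading the other's description and that both must be right to score; the 50/50 chance of dealing the same tune and the function names are illustrative choices.

```python
import random


def deal_tunes(tunes, rng, p_same=0.5):
    """Give each partner a tune; with probability p_same both hear the same one."""
    if len(set(tunes)) < 2:
        raise ValueError("need at least two distinct tunes")
    first = rng.choice(tunes)
    if rng.random() < p_same:
        return first, first
    others = [t for t in tunes if t != first]
    return first, rng.choice(others)


def both_win(tune_a, tune_b, a_says_same, b_says_same):
    """Partners win only if each correctly judges whether the tunes match."""
    same = tune_a == tune_b
    return a_says_same == same and b_says_same == same
```

Because a correct "same" or "different" verdict is only reachable through good descriptions, the descriptions that led to wins are the ones the system can keep as tags for the tune.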
Creating a Spam Catcher Game
With the basic ideas in place, let's create a game for identifying and filtering out comment spam. According to Designing games with a purpose, this appears to be an output-agreement type game, which has the following structure:
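As a rough illustration (not the structure given in Designing games with a purpose), one round of a Spam Catcher game could pair two random players on the same comment and accept a verdict only when they agree independently. The `vote` callback stands in for the human players, and the three-way sorting into kept, filtered, and undecided comments is an assumption made for this sketch; all names are hypothetical.

```python
import random

SPAM, NOT_SPAM = "spam", "not spam"


def spam_round(comment, pool, vote, rng):
    """Two random players label the same comment independently.

    Returns the pair and the agreed verdict, or None if they disagree.
    """
    a, b = rng.sample(pool, 2)
    verdict_a, verdict_b = vote(a, comment), vote(b, comment)
    return (a, b), (verdict_a if verdict_a == verdict_b else None)


def moderate(comments, pool, vote, rng):
    """Sort comments into kept, filtered, and undecided (to be replayed later)."""
    kept, filtered, undecided = [], [], []
    for comment in comments:
        _, verdict = spam_round(comment, pool, vote, rng)
        if verdict == SPAM:
            filtered.append(comment)
        elif verdict == NOT_SPAM:
            kept.append(comment)
        else:
            undecided.append(comment)
    return kept, filtered, undecided
```

Requiring agreement between two strangers is what keeps a single careless or malicious player from deciding a comment's fate; disagreements are simply sent back to the pool for another pair.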
The Final Move
Spam crushes many sites. Many site owners don't even allow comments anymore because of the time it takes to deal with spam, which is a shame, because without interactivity the internet might as well be a newspaper. We can't let those spammers win! A system like the Spam Catcher Game might be able to provide the human oversight, low latency, and high throughput needed to outcompete the CAPTCHA-solving networks. The game is finally afoot!
During the Coherence Special Interest Group meeting in London yesterday, Brian Oliver of Oracle announced the start of the Coherence Incubator project. The Coherence Incubator is a new online repository of projects that provide reference implementations of commonly used design patterns and integration solutions based on Oracle Coherence.
A group of top Silicon Valley engineers (formerly of Yahoo, Facebook, and Google) has come together to launch a new startup called Cloudera.
Not yet launched, it intends to help other companies adopt a promising software platform called Hadoop.
Hadoop is an open-source software project (written in Java) designed to let developers write and run applications that process huge amounts of data. While it could potentially improve a wide range of other software, the ecosystem supporting its implementation is still developing, which is where Cloudera hopes to make a place for itself.
More on Hadoop: it implements MapReduce, the programming model introduced by Google, which divides applications into many small units of work that can run on different nodes. Its distributed file system, HDFS, splits data into blocks and places multiple replicas of each block on various computer nodes.
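The MapReduce model itself can be shown without a cluster. The single-process Python sketch below runs the classic word-count example through a map step, a shuffle that groups values by key, and a reduce step. It does not use Hadoop's Java API; in Hadoop the framework runs many map and reduce tasks on different nodes and performs the shuffle between them. The function names here are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, which is what lets a framework like Hadoop spread them across many machines.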
It is already in use at large companies like Yahoo.
This Sun BluePrints article describes the storage architecture of the Tokyo Institute of Technology TSUBAME grid. The Tokyo Institute of Technology is one of the world's leading technical institutes, and recently created the fastest supercomputer in Asia and one of the largest supercomputers outside of the United States. By deploying Sun Fire x64 servers and data servers in a grid architecture, Tokyo Tech built a cost-effective and flexible supercomputer consisting of hundreds of systems, thousands of processors, terabytes of memory, and a petabyte of storage that supports users running common off-the-shelf applications. This is the second of a three-article series. It describes the steps to install and configure the Lustre file system within the storage architecture.
One of the world's leading technical institutes, the Tokyo Institute of Technology (Tokyo Tech) created the fastest supercomputer in Asia, and one of the largest outside of the United States. Using Sun x64 servers and data servers deployed in a grid architecture, Tokyo Tech built a cost-effective, flexible supercomputer that meets the demands of compute- and data-intensive applications. Built in just 35 days, the TSUBAME grid includes hundreds of systems incorporating thousands of processor cores and terabytes of memory, and delivers 47.38 trillion floating-point operations per second (TeraFLOPS) of sustained LINPACK benchmark performance and 1.1 petabytes of storage to users running common off-the-shelf applications. Based on the deployment architecture, the grid is expected to reach 100 TeraFLOPS in the future. This article provides an overview of the Tokyo Tech grid, named TSUBAME. The first in a series of Sun BluePrints articles on the TSUBAME grid, this document discusses the requirements and overall system architecture of the grid, as well as the tuning performed to achieve high LINPACK benchmark performance results.
Sun Customer Ready HPC Cluster: Reference Configurations with Sun Fire X4100, X4200, and X4600 Servers
The reference configurations described in this paper are starting points for building Sun Customer Ready HPC Clusters configured with the Sun Fire X4100, X4200, and X4600 families of servers. The configurations define how Sun Systems Group products can be configured in a typical grid rack deployment. This document describes configurations using Sun Fire X4100 and X4100 M2 servers with a Gigabit Ethernet data fabric and with a high-speed InfiniBand fabric. In addition, this document describes configurations using Sun Fire X4200, X4200 M2, X4600, and X4600 M2 servers with an InfiniBand data fabric. These configurations focus on single-rack solutions, with external connections through uplink ports of the switches. These reference configurations have been architected using Sun's expertise gained in real-world installations. Within certain constraints, as described in the later sections, the system can be tailored to customer needs. Certain system components described in this document are only available through Sun's factory integration. Although the information contained here could be used during an on-site integration, the optimal benefit is achieved through Sun Customer Ready System integration.
Sun Customer Ready HPC Cluster: Reference Configurations with Sun Fire X2200 M2 and X2100 M2 Servers
The reference configurations described in this BluePrint are starting points for building Sun Customer Ready HPC Clusters configured with Sun Fire X2100 M2 and X2200 M2 servers. The configurations define how Sun Systems Group products can be configured in a typical grid rack deployment. This document describes in detail configurations using Sun Fire X2100 M2 and X2200 M2 servers with a Gigabit Ethernet data fabric, as well as configurations using Sun Fire X2200 M2 servers with a high-speed InfiniBand fabric. These configurations focus on single-rack solutions, with external connections through uplink ports of the switches. These reference configurations have been architected using Sun's expertise gained in real-world installations. Within certain constraints, as described in the later sections, the system can be tailored to customer needs. Certain system components described in this document are only available through Sun's factory integration. Although the information contained here could be used during an on-site integration, the optimal benefit is achieved through Sun Customer Ready System integration.
Hadoop is a distributed computing platform written in Java. It incorporates features similar to those of the Google File System and of MapReduce to process vast amounts of data. "Hadoop is a Free Java software framework that supports data intensive distributed applications running on large clusters of commodity computers. It enables applications to easily scale out to thousands of nodes and petabytes of data" (Wikipedia).
* What platform does Hadoop run on?
  * Java 1.5.x or higher, preferably from Sun
  * Linux
  * Windows for development
  * Solaris
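The storage layer that Hadoop models on the Google File System splits each file into large fixed-size blocks and stores several copies of each block on different machines (HDFS replicates each block three times by default). The sketch below assumes a toy block size and a simple rotating placement across a flat list of nodes. HDFS's actual placement policy is rack-aware and more involved, and these function names are hypothetical.

```python
def split_into_blocks(size_bytes, block_size):
    """Return (offset, length) for each fixed-size block of a file."""
    return [
        (offset, min(block_size, size_bytes - offset))
        for offset in range(0, size_bytes, block_size)
    ]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, rotating the start node.

    Rotation spreads blocks evenly; a real system would also consider racks,
    free space, and which node is writing the data.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [
        [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    ]
```

With a 250-byte file and 100-byte blocks, `split_into_blocks(250, 100)` yields three blocks, the last one 50 bytes long, and `place_replicas(3, ["n1", "n2", "n3", "n4"])` puts each block on three different nodes, so losing any single node leaves two copies of every block.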