Update: Typical Programmer tackles the technical issues in Relational Database Experts Jump The MapReduce Shark. The culture clash is still what fascinates me. David DeWitt writes in the Database Column that MapReduce is a major step backwards:
Recently I had to help users on one of my opensource project ISPMan. http://ispman.net This project started in 2001 as I was too unwilling to take care of the DNS and VitualHosting stuff as it was a side-thing to the company I worked for (so i wrote a software that took care of all these little details) Summary: A large project that needs a rewrite can be done in a matter of day. I will not give you a full case study about a project that went through a re-write but a case study about how easy it is to re-write something. Details: My boss was cool enough to let me open-source the project and obviously, I got a lot of cool-cred out of it. Later on I also did some support and implementation and earned quiet some money with it. Eventually I had to let the project go out of my hand to the community as I only did it to facilitate a job that wasnt williing to do. (Setup DNS zones of multiple servers, find out which host should host the website and put VirtualHost section there, find out which mail server should take care of the mailbox and create a mailbox, etc) The project was quiet successful and there are a number of users who are using it. One of the project members took himself to be the project manager and has been running the project since. I have been out of this project almost 5 years (not much time, I have had 2 kids 4.5 years and 8 months old, and my job was very demanding). The stress from my job has weakened a bit now (It took me really 3.5 years to bring them to a stable "actually an oxymoron when we are talking about high scalability" state). Back to the topic. We have had having complains about not having this feature and that for this opensource product. I tried to put in this feature... "I was the main authour. How complicated would it be for me to add this new feature.." and eventuall I went "What the ...". Yup, I hated my code, I hated everything about it. (as a programmer, as a sysadmin, as an op). Yes it was me coded it 7 years ago, it was me who insisted it to be like that... etc. But times have changes.. Its not 2001 any more. So I want to re-write it. Its not the first time that the idea of re-write was in place. Couple of years the senior members (any one who had to deal with the code) wanted to re-write. Its not extensible, its not pluggable, blah, blah, blah. Yup it was not written to do so. I was just written so I dont have to do the work which I did not wanted to and it served it welll... So if you want more extend it, re-write it, fork-it, you get my point (from the guy who wrote an app and opensourced it). I understand that point of view of the developers too "Why do I have to know how you LDAP scheme works, or how does Cyrus mail server works, etc" The re-write part: Using a PHP framework ( http://framework.zend.com ) ( something that almost did not exist in 2001 ) I was able to get something up and running in a couple of hours. http://ispdirector.net This whole thing was not possible without. * PHP5 (we moved from perl to php, in 2001 PHP was really Pretty Home Pages) * Zend framework ( opensource frameworks were scarce in 2001) * Some experience that LDAP as great it is, should not be where you put all your eggs This particular example does not say that perl(or any other language ) is bad and php is good or ldap(or any other directory) is bad and mysql is good , its just how we did it for this particuliar project. Oh and forgot about the "Get Some Help" Tellling your colleagues that they have to move to a new API can be more difficult then to hear a few blah and blah by the secreteries moved from XP to Vista This was about really moving from Perl to PHP, from PHP to Java, from Java to Perl, from Perl to Ruby. The whole point is that it does not matter. If you can do X faster than Y than I take you (In compute intensive scenario) If you do all your calculations this way, it might go somewhere.
Atif Ghaffar has a nice strategy to deal with virus checking uploads:
Sun is buying MySQL for $1 billion. The MySQL team has worked long and hard so I don't begrudge them their pay day. Strike while the iron is offering a lot of cash I say. And I have nothing against Sun. Yet I can't help but think this changes the mental calculation of what database to use. When Oracle acquired Innobase a new independent storage engine was needed for MySQL. How is this different? Does this change your thinking any? Would Martha say it's a good thing? Like Luke I've searched my feelings, but the force is not with me and I don't really know how I feel about it.
So what are we announcing today? That in addition to acquiring MySQL, Sun will be unveiling new global support offerings into the MySQL marketplace. We'll be investing in both the community, and the marketplace - to accelerate the industry's phase change away from proprietary technology to the new world of open web platforms. Read more on Jonathan Schwartz's Blog What do you think about this?
GigaSpaces launched OpenSpaces.org, a community web site for developers who wish to utilize and contribute to the open source OpenSpaces development framework. OpenSpaces extends the Spring Framework for enterprise Java development, and leverages the GigaSpaces eXtreme Application Platform (XAP) for data caching, messaging and as the container for application business logic. It is designed for building highly-available, scale-out applications in distributed environments, such as SOA, cloud computing, grids and commodity servers. OpenSpaces is widely used in a variety of industries, including financial services, telecommunications, manufacturing and retail -- and across the web in e-commerce, Web 2.0 applications such as social networking sites, search and more. OpenSpaces.org already lists more than two dozen projects submitted by the developer community, including GigaSpaces customers, partners and employees. Innovative projects include an instant messaging platform, integration with PHP, configuration via JRuby, an implementation of Spring Batch and a scalable dynamic RSS feed delivery system. GigaSpaces recently announced the OpenSpaces Developer Challenge, a developer competition with $25,000 in total prizes and a $10,000 grand prize. The prizes will be awarded to the most innovative applications built using the OpenSpaces framework or plug-ins that extend it. The Challenge deadline is April 2, 2008 and ‘early bird’ prizes are available for those who submit their concepts by February 13, 2008. Additionally, in November of 2007 GigaSpaces launched its Start-Up Program, which provides free software licenses for qualifying individuals and companies.
The Google Operating System blog has an interesting post on Google's scale based on an updated version of Google's paper about MapReduce. The input data for some of the MapReduce jobs run in September 2007 was 403,152 TB (terabytes), the average number of machines allocated for a MapReduce job was 394, while the average completion time was 6 minutes and a half. The paper mentions that Google's indexing system processes more than 20 TB of raw data. Niall Kennedy calculates that the average MapReduce job runs across a $1 million hardware infrastructure, assuming that Google still uses the same cluster configurations from 2004: two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link. Greg Linden notices that Google's infrastructure is an important competitive advantage. "Anyone at Google can process terabytes of data. And they can get their results back in about 10 minutes, so they can iterate on it and try something else if they didn't get what they wanted the first time." It is interesting to compare this to Amazon EC2:
- $0.40 Large Instance price per hour x 400 instances x 10 minutes = $26.7
- 1 TB data transfer in at $0.10 per GB = $100
I fully and enthusiastically encourage anyone who wants to share a relevant topic to register and post. People have added a lot of good and useful content. Don't be shy. It's been asked how a teaser is created when posting so the full article doesn't display on the front page. A teaser is a paragraph interesting enough to convince readers to click on the "read more" link to get the full article. Creating a teaser in Drupal is accomplished by inserting < ! -- break -- > on a separate line directly after the text you want to be the teaser. Only DO NOT include the spaces. So your post looks like: Teaser Content < ! -- break -- > (no spaces in real life) Rest of Content It's a bit kludgey, but it works.
Gandi.net, a French domain registrar has launched a very flexible dynamic resource allocated VPS service.