Stuff The Internet Says On Scalability For September 9, 2011
Scale the modern way / No brush / No lather / No rub-in / Big tube 35 cents - Drug stores / HighScalability:
- GAE Serves 1.5 Billion Pages a Day
- Potent quotables:
- @kendallmiller : The code changes I'm most proud of are the ones few people will ever see - like I just tripled the scalability of our session analysis.
- @Kellblog : Heard: "Cassandra is more a system on which you build a DBMS than a DBMS itself."
- @DDevine_au : Ah dammit. I'm thinking of using a #NoSQL database. Down the rabbit hole I go.
- A comprehensive guide to parallel video decoding. Emeric Grange with a sweet explanation of the decoding process.
- Node.js vs. Scala - "Scaling in the large". tedsuo tldrs it: in node, there is only one concurrency model. A number of other platforms offer multiple concurrency models. If you want access to one of those other models down the line, you will have to carve off that part of your application and rewrite it in another language.
- Quora: How does HBase write performance differ from write performance in Cassandra with consistency level ALL? Good thread for those selecting between the two. Information, yes, clarity, not so much.
- Acunu with a great series of posts: Benchmarking LevelDB, how to write to flash SSDs effectively, why append-only B-trees fail on SSDs, All about Stratified B-trees.
- Cassandra @ twitter.
- Evernote explains how they use Lucene, in their best Ricky Ricardo voice. Denormalize all searchable and sortable data for each note into the Lucene index rather than just storing the raw search text. Normalize the representation of the text for correct comparisons. Lucene has become the most expensive software component in their shard infrastructure. For more on search take a look at Solr + Hadoop = Big Data Love.
- Overview of Linux-Kernel Reference Counting by Paul E. McKenney. A key reason for the variety of reference-counting techniques is the wide variety of mechanisms used to protect objects from concurrent access.
- Tide Sensors, Hurricane Irene, and the Internet of Things. After analyzing the data, we observed something interesting. The period of the tides did not change, but the amplitudes (the high tide and low tide marks) were greatly exaggerated as Hurricane Irene passed through Cape Cod.
- George Monbiot tackles a pet peeve of mine, as someone who lives outside of academia and looks at a lot of research papers: How did academic publishers acquire these feudal powers? The knowledge monopoly is as unwarranted and anachronistic as the Corn Laws. Let’s throw off these parasitic overlords and liberate the research which belongs to us. On Hacker News.
- Are timestamps in Cassandra using too much memory? Or is it a case of optimizing prematurely? Only your DBA knows for sure.
- The great debate: Windows Azure vs. Amazon Web Services. GigaOM interviews Craig Knighton of LiquidSpace and Zach Richardson of Ravel Data. Knighton: The real choice is between IaaS and PaaS. If we were interested in an IaaS cloud, we would have definitely considered AWS. Richardson: For pure performance, I highly doubt Azure can match AWS.
- Flickr explains the code behind their cool new geofence-based privacy feature. Algorithms, MySQL, and concurrency.
- In Googlebot Spawning New Instances, Jeff Deskins notices something very interesting: Thought it was funny that while I was looking at ways to keep the cost down in App Engine - Googlebot started crawling the site at a rate of up to 12 pages per second. This caused another 6 instances to be created! Real-time is a money maker.
- Azure Scalability:- Use “Queues” as your Bridges. Anshulee Asthana recommends using queues to communicate between components. The advantages are: scalability, extensibility, and decoupling.
- badly underwhelmed by 120GB Intel 510 performance: Joe Landman: we aren’t buying any more Intel 510 SSDs for testing/use. The real world performance is about equivalent to a reasonably fast hard disk. Which sort of destroys the utility of getting an SSD. Speaking of disks, Fifty-five years ago, IBM introduced the disk drive.
- Architecting for massive cloud service provider failures. Royans Tharakan on the report that Apple's iCloud runs on AWS and Azure: That iCloud decided not to build its own cloud infrastructure is very good news for most cloud service consumers. By putting money in existing public-facing cloud services like Amazon and Azure, they help these service providers to reduce overall cost per user and increase quality and SLA of the service they provide.
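The Evernote item mentions two ideas: denormalizing every searchable and sortable field into the Lucene document, and normalizing text so comparisons come out right. Below is a minimal Python sketch of both, assuming that "normalize" means Unicode compatibility decomposition, accent stripping, case folding, and whitespace collapsing. This is not Evernote's actual pipeline, and the note fields (`title`, `body`, `tags`, `notebook`, `created`) are hypothetical.

```python
import unicodedata


def normalize_for_index(text: str) -> str:
    """Return a canonical form of `text` for index-time and query-time comparison."""
    # NFKD splits "é" into "e" + a combining accent and maps compatibility
    # forms (fullwidth letters, the "ﬁ" ligature) to their plain equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks so accented and unaccented spellings compare equal.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() meant for caseless matching ("ß" -> "ss").
    folded = stripped.casefold()
    # Collapse any run of whitespace to a single space.
    return " ".join(folded.split())


def build_index_doc(note: dict) -> dict:
    """Denormalize a note into one flat index document (hypothetical schema)."""
    return {
        "title": normalize_for_index(note["title"]),
        "body": normalize_for_index(note["body"]),
        "tags": [normalize_for_index(t) for t in note.get("tags", [])],
        "notebook": normalize_for_index(note.get("notebook", "")),
        # Sortable fields are copied as-is so results can be sorted from the
        # index alone, without a round trip to the primary database.
        "created": note["created"],
    }
```

The same `normalize_for_index` has to run on both indexed text and incoming queries. Otherwise "Café" in a note would never match a search for "cafe".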
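McKenney's reference-counting overview groups techniques by how the object and its count are protected from concurrent access. The simplest case is a count guarded by a lock. Below is a user-space Python sketch of that pattern. It is only loosely modeled on the kernel's `kref_get`/`kref_put` get/put-with-release style and is not kernel code. The class name `RefCounted` and its methods are hypothetical.

```python
import threading
from typing import Callable


class RefCounted:
    """Lock-protected reference count that calls `release` when the last reference is dropped."""

    def __init__(self, release: Callable[[], None]) -> None:
        self._count = 1  # the creator holds the first reference
        self._lock = threading.Lock()
        self._release = release

    def get(self) -> None:
        """Take an additional reference; the caller must already hold one."""
        with self._lock:
            if self._count <= 0:
                raise RuntimeError("get() on an object that was already released")
            self._count += 1

    def put(self) -> bool:
        """Drop a reference; return True if this call released the object."""
        with self._lock:
            if self._count <= 0:
                raise RuntimeError("put() without a matching reference")
            self._count -= 1
            last = self._count == 0
        if last:
            # Exactly one thread observes the transition to zero, so release runs once.
            self._release()
        return last
```

Running `release` outside the lock keeps the critical section short. This is safe because no other thread can legally still hold a reference once the count reaches zero.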
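For the Flickr geofence item, here is a minimal sketch of the core check: does a photo's location fall inside a fence? It assumes each fence is a circle with a center and a radius in meters, and uses the haversine great-circle distance on a spherical Earth. The `Geofence` type, the privacy labels, and the rule that the smallest containing fence wins are hypothetical illustrations. Flickr's post describes its own algorithm and MySQL storage.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


@dataclass
class Geofence:
    lat: float
    lon: float
    radius_m: float
    privacy: str  # hypothetical label, e.g. "private" or "friends"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def privacy_for(lat: float, lon: float, fences: list, default: str = "public") -> str:
    """Return the privacy of the smallest fence containing the point, else `default`."""
    containing = [f for f in fences if haversine_m(lat, lon, f.lat, f.lon) <= f.radius_m]
    if not containing:
        return default
    # Smallest radius = most specific fence (an assumed tie-break rule).
    return min(containing, key=lambda f: f.radius_m).privacy
```

A real system would not scan every fence per photo. It would first narrow candidates with an indexed bounding-box query, then run the exact distance test on the few rows that remain.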
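The Azure queues item recommends putting a queue between components so producers and consumers scale and change independently. Below is a minimal in-process sketch of that shape. Python's `queue.Queue` stands in for a durable cloud queue such as Azure Queue storage, which would add persistence and cross-machine delivery. The names `run_pipeline` and `process` are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List


def _worker(tasks: queue.Queue, results: List, lock: threading.Lock,
            process: Callable) -> None:
    """Consume items until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: this worker shuts down
            tasks.task_done()
            break
        out = process(item)
        with lock:
            results.append(out)
        tasks.task_done()


def run_pipeline(messages: Iterable, process: Callable, n_workers: int = 3) -> List:
    """Producer enqueues messages; a pool of consumers processes them independently."""
    tasks: queue.Queue = queue.Queue()
    results: List = []
    lock = threading.Lock()
    workers = [threading.Thread(target=_worker, args=(tasks, results, lock, process))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # The producer only knows about the queue, not about the workers behind it.
    for m in messages:
        tasks.put(m)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # wait until every item, including sentinels, is marked done
    for w in workers:
        w.join()
    return results
```

Scaling here means changing `n_workers` without touching the producer. Extending the system means attaching a different `process`. Result order is not guaranteed because workers run concurrently.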