Friday
May012015

Stuff The Internet Says On Scalability For May 1st, 2015

Hey, it's HighScalability time:


Got containers? Gorgeous shot of the CSCL Globe (by Walter Scriptunas II), world's largest container ship: 1,313ft long; 19,000 standard containers.
  • $3000: Tesla's new 7kWh daily cycle battery.
  • Quotable Quotes:
    • @mamund: "Turns out there is nothing about HTTP that I like" --  Douglas Crockford 
    • @PeterChch: Your little unimportant site might be hacked not for your data but for your aws resources. E.g. bitcoin mining.
    • @Joseph_DeSimone: I find it stunning that Google's annual R&D budget totaled $9.8 billion and the Budget for the National Science Foundation was $7.3 billion
    • @jedberg: The new EC2 container service adds the missing granularity to #ec2
    • Randy Shoup: “Every service at Google is either deprecated or not ready yet.”  -- Google engineering proverb
    • @mtnygard: Today the ratio of admins to servers in a well-behaved scalable web companies is about 1 to 10,000. @botchagalupe #craftconf
    • @joshk: Data: There Are Over 9x More Private IPOs Than Actual Tech IPOs 
    • @nwjsmith: “Systems are not algorithms. Systems are much more complex.“ #CraftConf @skamille
    • kk: “Because the center of the universe is wherever there is the least resistance to new ideas.”
    • John Allspaw: Stop thinking that you’re trying to solve a troubleshooting problem; you’re not. Instead of telling me about how your software will solve problems, show me that you’re trying to build a product that is going to join my team as an awesome team member, because I’m going to think about using/buying your service in the same way that I think about hiring.
    • @mpaluchowski: "Netflix is a #logging system that happens to play movies." #CraftConf
    • John Wilke:  Resiliency is more important than performance.
    • @peakscale: The server/cattle metaphor rubs me the wrong way... all the farmers I knew and worked for named and cared about their herd.
    • @aphyr: "We've managed to run 40 services in prod for three years without needing to introduce a consensus system" @skamille, #CraftConf
    • @ryantomlinson: “Spotify have been using DNS for service discovery for a long time” #CraftConf
    • @csanchez: Google "we start over 2 billion containers per week" containers, containers, containers! #qconlondon 
    • @tyler_treat: If you're using RabbitMQ, consider replacing it with Kafka. Higher throughput, better replication, replayability. Same goes for other MQs.
    • @tastapod: @botchagalupe telling #CraftConf how it is! “Yelp is spinning up 8 containers a second. This is the real sh*t, man!”
    • @mpaluchowski: "A static #alert threshold won't be any good next week. It must be calculated." #CraftConf
    • @mtnygard: #craftconf @randyshoup “Microservices are an answer to a scaling problem, not a business problem.”  So right.
    • @adrianco: @mtnygard @randyshoup speed of development is the business problem that leads to Microservices.
    • @b6n: the aws financials should be a wake-up call to anyone still thinking cloud isn't a game of raw scale
    • @mtnygard: The “edge” used to be top-of-rack. Then the hypervisor. Now it’s the container. That’s 100x the number of IPs. — @botchagalupe #craftconf
    • @idajantis: 'An escalator can never break; it can only become stairs' - nice one by @viktorklang at #CraftConf on Distributed Systems failing gracefully
    • @jessitron: "You should store your data in a real database and replicate it to Elasticsearch." @aphyr #CraftConf

  • A telling difference between Google and Apple: Google Now becomes a more robust platform with 70 new partner apps. Apple takes an app-centric view of the world, while Google, not surprisingly, takes a data-centric view. With Google, developers feed Google data for Google to display. With Apple, developers feed Apple apps for users to consume. On Apple, developers push their own brand and control functionality through bundled extensions, but Google will have the perspective to really let its deep learning prowess sing. So there's a real choice.

  • How appropriate that game theory is being applied to cyberwarfare. Mutually Assured Destruction isn't just for nukes. Pentagon Announces New Strategy for Cyberwarfare: “Deterrence is partially a function of perception,” the new strategy says. “It works by convincing a potential adversary that it will suffer unacceptable costs if it conducts an attack on the United States, and by decreasing the likelihood that a potential adversary’s attack will succeed.”

  • Reducing big data using ideas from quantum theory makes it easier to interpret. So maybe QM is nature's way of making sense of the BigData that is the Universe?

  • Synergy is not always BS. Cheaper bandwidth or bust: How Google saved YouTube: YouTube was burning through $2 million a month in bandwidth costs before the acquisition. What few knew at the time was that Google was a pioneer in data center technology, which allowed it to dramatically lower the costs of running YouTube.

  • In a winner-take-all market, is the cost of customer acquisition Pyrrhic? Uber Burning $750 Million in a Year.

  • The cloud behind the cloud. Apple details how it rebuilt Siri on Mesos: Apple’s custom Mesos scheduler is called J.A.R.V.I.S.; Apple uses J.A.R.V.I.S. as its internal platform-as-a-service; Apple’s Mesos cluster spans thousands of nodes and runs about a hundred services; Siri’s Mesos backend represents its third generation, and a move away from “traditional” infrastructure.
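One way to read @mpaluchowski's CraftConf quote about alert thresholds: derive the threshold from recent data instead of hard-coding it. Below is a minimal sketch that assumes a sliding window of recent samples and a "mean plus k standard deviations" rule. The window size, the value of k, and the class name RollingThreshold are all our own illustrative choices, not something taken from the talk.

```python
from collections import deque
from statistics import mean, pstdev


class RollingThreshold:
    """Alert threshold recomputed from a sliding window of recent samples.

    Hypothetical helper: the window length and the mean + k * stddev rule
    are illustrative assumptions, not a prescribed method.
    """

    def __init__(self, window: int = 60, k: float = 3.0):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.k = k

    def threshold(self) -> float:
        # Mean plus k population standard deviations of the current window.
        return mean(self.samples) + self.k * pstdev(self.samples)

    def observe(self, value: float) -> bool:
        """Return True if value breaches the threshold computed from prior samples."""
        # Need at least two samples before a spread is meaningful.
        alert = len(self.samples) >= 2 and value > self.threshold()
        self.samples.append(value)  # the new sample shapes next week's threshold
        return alert
```

Because every observation feeds back into the window, the threshold follows gradual drift in normal load. The trade-off is that a sustained anomaly will eventually raise the threshold too.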

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Apr292015

Paper: DNACloud: A Tool for Storing Big Data on DNA

"From the dawn of civilization until 2003, humankind generated five exabytes (1 exabyte = 1 billion gigabytes) of data. Now we produce five exabytes every two days and the pace is accelerating."

-- Eric Schmidt, Executive Chairman, Google, August 4, 2010. 

 

Where are we going to store the deluge of data everyone is warning us about? How about in a DNACloud that can store 1 petabyte of information per gram of DNA?

Writing is a little slow. You have to convert your data file to a DNA description and send it to a biotech company, which will send you back a vial of synthetic DNA. Where do you store it? Your refrigerator.

Reading is a little slow too. The data can apparently be read back with great accuracy, but to read it you first have to sequence the DNA, and that might take a while.

The how of it is explained in DNACloud: A Tool for Storing Big Data on DNA (poster). Abstract:

The term Big Data is usually used to describe the huge amounts of data generated by humans from digital media such as cameras, the internet, phones, sensors, etc. By building advanced analytics on top of big data, one can predict many things about the user, such as behavior, interests, etc. However, before one can use the data, one has to address many issues of big data storage. Two main issues are the need for large storage devices and the cost associated with them. Synthetic DNA storage seems to be an appropriate solution to these issues of big data. Recently, in 2013, Goldman and his colleagues from the European Bioinformatics Institute demonstrated the use of DNA as a storage medium with the capacity to store 1 petabyte of information in one gram of DNA, and retrieved the data successfully with a low error rate [1]. This significant step shows promise for synthetic DNA storage as a useful technology for future data storage. Motivated by this, we have developed software called DNACloud, which makes it easy to store data on DNA. In this work, we present a detailed description of the software.
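The abstract doesn't spell out the encoding. As a rough illustration of the kind of scheme involved, here is a simplified sketch in the spirit of the rotating ternary code from Goldman et al. Data is turned into base-3 digits, and each digit selects one of the three nucleotides that differ from the previous one. The result is that no base ever repeats back to back, which matters because long single-base runs are error-prone to synthesize and sequence. This is not DNACloud's code. It skips the Huffman compression, overlapping segments, and indexing of the real method, and the function names are hypothetical.

```python
BASES = "ACGT"


def byte_to_trits(b: int) -> list[int]:
    # 3**6 = 729 >= 256, so six base-3 digits cover every byte value.
    trits = []
    for _ in range(6):
        trits.append(b % 3)
        b //= 3
    return trits[::-1]  # most significant trit first


def trits_to_byte(trits: list[int]) -> int:
    value = 0
    for t in trits:
        value = value * 3 + t
    return value


def next_base(prev: str, trit: int) -> str:
    # The three candidates are the bases after prev in rotation order,
    # so the chosen base can never equal prev.
    return BASES[(BASES.index(prev) + 1 + trit) % 4]


def encode(data: bytes, start: str = "A") -> str:
    prev, out = start, []
    for b in data:
        for t in byte_to_trits(b):
            prev = next_base(prev, t)
            out.append(prev)
    return "".join(out)


def decode(dna: str, start: str = "A") -> bytes:
    prev, trits = start, []
    for base in dna:
        # Invert next_base: distance from prev in rotation order, minus one.
        trits.append((BASES.index(base) - BASES.index(prev) - 1) % 4)
        prev = base
    return bytes(trits_to_byte(trits[i:i + 6]) for i in range(0, len(trits), 6))
```

Each byte costs six nucleotides here. Real schemes compress first so that common symbols get shorter codes.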


Tuesday
Apr282015

Sponsored Post: OpenDNS, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The Cloud Platform team at OpenDNS is building a PaaS for our engineering teams to build and deliver their applications. This is a well-rounded team covering software, systems, and network engineering, so expect your code to cut across all layers, from the network to the application. Learn More

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • How to Get a Game-Changing Performance Advantage with Intel SSDs and Aerospike. Presenter: Frank Ober, Data Center Solution Architect at Intel Corporation. Wednesday, May 13, 2015 @ 10:00AM PST, 1:00PM PST. Learn how to maximize the price/performance of your Intel Solid-State Drives (SSDs) with Aerospike. Frank Ober of Intel’s Solutions Group will review how he achieved 1+ million transactions per second on a single dual socket Xeon Server with SSDs using the open source tools of Aerospike for benchmarking. Register Now.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.

Cool Products and Services

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • Benchmark: MongoDB 3.0 (w/ WiredTiger) vs. Couchbase 3.0.2. Even after the competition's latest update, are they more tired than wired? Get the Report.

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Apr272015

How can we Build Better Complex Systems? Containers, Microservices, and Continuous Delivery.

We must be able to create better complex software systems. That’s the message from Mary Poppendieck in a wonderful, far-ranging talk she gave at the Craft Conference: New New Software Development Game: Containers, Micro Services.

The driving insight is that complexity grows nonlinearly with size. The type of system doesn’t really matter, but we know software size will continue to grow, so software complexity will grow even faster.
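The talk's claim is qualitative. A common back-of-the-envelope way to see why complexity can outpace size is to count the potential pairwise interactions among n components, which grows as n(n-1)/2. Doubling the component count then roughly quadruples the pairs. This is our illustration, not a formula from the talk, and it treats "complexity" as nothing more than possible interactions.

```python
def pairwise_interactions(n: int) -> int:
    """Distinct pairs of components that could interact: n choose 2."""
    return n * (n - 1) // 2


for n in (10, 20, 40, 80):
    # Each doubling of n roughly quadruples the number of possible pairs.
    print(n, pairwise_interactions(n))
```

Isolation techniques such as microservices and containers work by cutting down how many of those pairs can actually interact.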

What can we do about it? The running themes are lowering friction and limiting risk:

  • Lower friction. This allows change to happen faster. Methods: dump the centralizing database; adopt microservices; use containers; better organize teams.

  • Limit risk. Risk is inherent in complex systems. Methods: PACT testing; continuous delivery.
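PACT refers to consumer-driven contract testing, and the Pact tool is one implementation of it. The core idea can be sketched without any library. The consumer writes down the request it makes and the response fields it relies on. The provider's build then replays that request and checks the answer. This is a hypothetical illustration, not Pact's API, and CONTRACT, verify_provider, and user_service are made-up names.

```python
from typing import Any, Callable

# The consumer records the interaction it depends on: a request, plus the
# response fields (and their types) that it actually reads.
CONTRACT = {
    "request": {"method": "GET", "path": "/users/42"},
    "response": {"status": 200, "body_types": {"id": int, "name": str}},
}


def verify_provider(handler: Callable[[str, str], tuple[int, dict[str, Any]]],
                    contract: dict) -> list[str]:
    """Replay the consumer's request against the provider and list violations."""
    req, expected = contract["request"], contract["response"]
    status, body = handler(req["method"], req["path"])
    problems = []
    if status != expected["status"]:
        problems.append(f"status {status} != {expected['status']}")
    for name, typ in expected["body_types"].items():
        if name not in body:
            problems.append(f"missing field {name!r}")
        elif not isinstance(body[name], typ):
            problems.append(f"field {name!r} is not {typ.__name__}")
    return problems


def user_service(method: str, path: str) -> tuple[int, dict[str, Any]]:
    # Hypothetical provider under test; it returns an extra field the consumer ignores.
    if method == "GET" and path.startswith("/users/"):
        user_id = int(path.rsplit("/", 1)[1])
        return 200, {"id": user_id, "name": "Ada", "email": "ada@example.com"}
    return 404, {}
```

The provider can add fields like email without breaking anything, because only the fields the consumer reads are checked. That is what lets each side deploy independently.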

Some key points:

  • When does software really grow? When smart people can do their own thing without worrying about their impact on others. This argues for building federated systems that ensure isolation, which argues for using microservices and containers.

  • Microservices usually grow successfully from monoliths. In creating a monolith developers learn how to properly partition a system.

  • Continuous delivery both lowers friction and lowers risk. In a complex system if you want stability, if you want security, if you want reliability, if you want safety then you must have lots of little deployments. 

  • Every member of a team is aware of everything. That's what makes a winning team. Good situational awareness.

The highlight of the talk for me was the section on the amazing design of the Swedish Gripen Fighter Jet. Talks on microservices tend to be highly abstract. The fun of software is in the building. Talk about parts can be so nebulous. With the Gripen the federated design of the jet as a System of Systems becomes glaringly concrete and real. If you can replace your guns, radar system, and virtually any other component without impacting the rest of the system, that’s something! Mary really brings this part of the talk home. Don’t miss it.
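The Gripen point is architectural. Each subsystem sits behind a stable interface, so it can be swapped without touching the rest. Below is a toy sketch of that idea using a structural Protocol. The class names (Radar, LegacyRadar, AesaRadar, Aircraft) are hypothetical, and nothing here describes the jet's actual avionics.

```python
from typing import Protocol


class Radar(Protocol):
    """The only thing the rest of the system knows about any radar."""

    def scan(self) -> list[str]: ...


class LegacyRadar:
    def scan(self) -> list[str]:
        return ["contact-1"]


class AesaRadar:
    def scan(self) -> list[str]:
        return ["contact-1", "contact-2"]


class Aircraft:
    """Depends only on the Radar interface, never on a concrete radar class."""

    def __init__(self, radar: Radar):
        self.radar = radar

    def swap_radar(self, radar: Radar) -> None:
        # Replacing a subsystem touches nothing else in the aircraft.
        self.radar = radar

    def situational_picture(self) -> int:
        return len(self.radar.scan())
```

Usage: `Aircraft(LegacyRadar())` can later call `swap_radar(AesaRadar())`, and no other code in `Aircraft` has to change.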

It’s a very rich and nuanced talk with a lot of history and context, so I can’t capture all the details; watching the video is well worth the effort. Having said that, here’s my gloss on the talk...

Hardware Scales by Abstraction and Miniaturization

Click to read more ...

Friday
Apr172015

Stuff The Internet Says On Scalability For April 17th, 2015

Hey, it's HighScalability time:

A fine tribute on Silicon Valley, and a hilarious formula evaluating Peter Gregory's positive impact on humanity.

  • 118/196: nations becoming democracies since the mid-19th century; $70K: nice minimum wage; 70 million: monthly StackExchange visitors; 1 billion: drone-planted trees; 1,000 years: longest-exposure camera shot ever

  • Quotable Quotes:

    • @DrQz: #Performance modeling is really about spreading the guilt around.

    • @natpryce: “What do we want?” “More levels of indirection!” “When do we want it?” “Ask my IDateTimeFactoryImplBeanSingletonProxy!”

    • @BenedictEvans: In the late 90s we were euphoric about what was possible, but half what we had sucked. Now everything's amazing, but we worry about bubbles

    • Calvin Zito on Twitter: "DreamWorks Animation: One movie, 250 TB to make. 10 movies in production at one time, 500 million files per movie. Wow."

    • Twitter: Some of our biggest MySQL clusters are over a thousand servers.

    • @SaraJChipps: It's 2015: open source your shit. No one wants to steal your stupid CRUD app. We just want to learn what works and what doesn't.

    • Calvin French-Owen: And as always: peace, love, ops, analytics.

    • @Wikipedia: Cut page load by 100ms and you save Wikipedia readers 617 years of wait annually. Apply as Web Performance Engineer

    • @IBMWatson: A person can generate more than 1 million gigabytes of health-related data.

    • @allspaw: "We’ve learned that automation does not eliminate errors." (yes!)  

    • @Obdurodon: Immutable data structures solve everything, in any environment where things like memory allocators and cache misses cost nothing.

    • KaiserPro: Pixar is still battling with lots of legacy cruft. They went through a phase of hiring the best and brightest directly from MIT and the like.

    • @abt_programming: "Duplication is far cheaper than the wrong abstraction" - @sandimetz

    • @kellabyte: When I see places running 1,200 containers for fairly small systems I want to scream "WHY?!"

    • chetanahuja: One of the engineers tried running our server stack on a raspberry for a laugh.. I was gobsmacked to hear that the whole thing just worked (it's a custom networking protocol stack running in userspace) if just a bit slower than usual.

  • Chances are if something can be done with your data, it will be done. @RMac18: Snapchat is using geofilters specific to Uber's headquarter to poach engineers.

  • Why (most) High Level Languages are Slow. Exactly this by masterbuzzsaw: If manual memory management is cancer, what is manual file management, manual database connectivity, manual texture management, etc.? C# may have “saved” the world from the “horrors” of memory management, but it introduced null reference landmines and took away our beautiful deterministic C++ destructors.

  • Why NFS instead of S3/EBS? nuclearqtip with a great answer: Stateful; Mountable AND shareable; Actual directories; On-the-wire operations (I don't have to download the entire file to start reading it, and I don't have to do anything special on the client side to support this); Shared unix permission model; Tolerant of network failures; Locking!; Better caching; Big files without the hassle.
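The @Wikipedia figure quoted above (100 ms saved per page load adds up to 617 years of waiting a year) is easy to sanity-check. Assuming the 100 ms saving applies to every page load, which is our simplification, the tweet implies roughly 195 billion page loads a year, a bit over 500 million a day:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years
saved_per_load_s = 0.100               # the tweet's 100 ms per page load
total_saved_years = 617                # the tweet's annual total

# Total seconds saved divided by seconds saved per load = implied page loads.
implied_loads_per_year = total_saved_years * SECONDS_PER_YEAR / saved_per_load_s
implied_loads_per_day = implied_loads_per_year / 365
print(f"{implied_loads_per_year:.3g} loads/year, {implied_loads_per_day:.3g} loads/day")
```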

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Thursday
Apr162015

Paper: Large-scale cluster management at Google with Borg

Joe Beda (@jbeda): Borg paper is finally out. Lots of reasoning for why we made various decisions in #kubernetes. Very exciting.

The hints and allusions are over. We now have everything about Google's long rumored Borg project in one iconic Google style paper: Large-scale cluster management at Google with Borg.

When Google blew our minds by audaciously treating the Datacenter as a Computer, it did not go unnoticed that, by analogy, there must be an operating system for that datacenter/computer.

Now we have the story behind a critical part of that OS:

Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.

We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.

Virtually all of Google’s cluster workloads have switched to use Borg over the past decade. We continue to evolve it, and have applied the lessons we learned from it to Kubernetes.
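"Efficient task-packing" means placing tasks so machines are used tightly. The paper describes a more involved scoring model, so the sketch below is only a generic best-fit placement to show the idea. Each task goes on the feasible machine it leaves with the least idle capacity, which keeps emptier machines free for large tasks. The two-resource model, the slack score, and all names are our assumptions, not Borg's scheduler.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    cpu: float  # cores
    ram: float  # GiB


@dataclass
class Machine:
    name: str
    cpu_cap: float
    ram_cap: float
    cpu_used: float = 0.0
    ram_used: float = 0.0
    tasks: list[str] = field(default_factory=list)

    def fits(self, t: Task) -> bool:
        return (self.cpu_used + t.cpu <= self.cpu_cap
                and self.ram_used + t.ram <= self.ram_cap)

    def slack_after(self, t: Task) -> float:
        # Fraction of each resource left idle after placing t, summed.
        # Lower means a tighter fit.
        return ((self.cpu_cap - self.cpu_used - t.cpu) / self.cpu_cap
                + (self.ram_cap - self.ram_used - t.ram) / self.ram_cap)


def place(task: Task, machines: list[Machine]) -> Optional[str]:
    """Best-fit placement: return the chosen machine's name, or None if nothing fits."""
    candidates = [m for m in machines if m.fits(task)]
    if not candidates:
        return None  # a real scheduler might queue the task or preempt others
    best = min(candidates, key=lambda m: m.slack_after(task))
    best.cpu_used += task.cpu
    best.ram_used += task.ram
    best.tasks.append(task.name)
    return best.name
```

Pure best fit tends to pack machines full, which helps utilization but leaves little headroom for load spikes. That is one reason production schedulers blend in other goals.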

The next version of Borg was called Omega and Omega is being rolled up into Kubernetes (steersman, helmsman, sailing master), which has been open sourced as part of Google's Cloud initiative.

Note how the world has changed. A decade ago, when Google published its industry-changing Bigtable and MapReduce papers, they launched a thousand open source projects in response. Now, instead of others simply copying the ideas, Google is open sourcing its own software, and that software was released well in advance of the paper describing it.

The future is still in balance. There's a huge fight going on for the future of what software will look like, how it is built, how it is distributed, and who makes the money. In the search business keeping software closed was a competitive advantage. In the age of AWS the only way to capture hearts and minds is by opening up your software. Interesting times.


Wednesday
Apr152015

Full Stack Tuning for a 100x Load Increase and 40x Better Response Times

A world that wants full stack developers also needs full stack tuners. That tuning process, or at least the outline of a full stack tuning process, is something Ronald Bradford describes, in not quite enough detail, in Improving performance – A full stack problem.

The general philosophy is:

  • Understand where to invest your energy first; know what the return on investment can be.
  • Measure and verify every change.
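"Measure and verify every change" can be as simple as two steps: first check that the new code returns the same result as the old, then time both. Below is a minimal sketch using Python's standard timeit module. The two functions are hypothetical stand-ins for a before/after change, and taking the median of several repeats is our choice to damp noise.

```python
import statistics
import timeit


def compare(before, after, repeat: int = 7, number: int = 1000) -> tuple[float, float]:
    """Return the median seconds per call for the old and new implementations."""
    b = statistics.median(timeit.repeat(before, repeat=repeat, number=number)) / number
    a = statistics.median(timeit.repeat(after, repeat=repeat, number=number)) / number
    return b, a


def before() -> str:
    # Hypothetical old code path: repeated string concatenation.
    s = ""
    for i in range(200):
        s += str(i)
    return s


def after() -> str:
    # Hypothetical new code path: same output built with join.
    return "".join(str(i) for i in range(200))
```

Usage: verify first with `assert before() == after()`, then call `compare(before, after)` and keep the change only if the numbers actually improve on your workload.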

He lists several tips for general website improvements:

Click to read more ...

Tuesday
Apr142015

Sponsored Post: OpenDNS, MongoDB, Internap, Aerospike, Nervana, SignalFx, InMemory.Net, Couchbase, VividCortex, Transversal, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The Cloud Platform team at OpenDNS is building a PaaS for our engineering teams to build and deliver their applications. This is a well rounded team covering software, systems, and network engineering and expect your code to cut across all layers, from the network to the application. Learn More

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • Nervana Systems is hiring several engineers for cloud positions. Nervana is a startup based in Mountain View and San Diego working on building a highly scalable deep learning platform on CPUs, GPUs and custom hardware. Deep Learning is an AI/ML technique breaking all the records by a wide margin in state-of-the-art benchmarks across domains such as image & video analysis, speech recognition and natural language processing. Please apply here and mention “highscalability.com” in your message.

  • Linux Web Server Systems Engineer: Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, a leader in next-generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at http://mongodbworld.com/ but hurry! Super Early Bird pricing ends on April 3.

Cool Products and Services

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • Looking for a scalable NoSQL database alternative? Aerospike is validating the future of ACID compliant NoSQL with our open source Key-Value Store database for real-time transactions. Download our free Community Edition or check out the Trade-In program to get started. Learn more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • Benchmark: MongoDB 3.0 (w/ WiredTiger) vs. Couchbase 3.0.2. Even after the competition's latest update, are they more tired than wired? Get the Report.

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Apr132015

Three Fast Data Application Patterns

This is a guest post by John Piekos, VP Engineering at VoltDB. I understand this is a little PR-ish, but I think the ideas are solid.

The focus of many developers and architects in the past few years has been on Big Data, specifically mining historical intelligence from the Data Lake (usually a Hadoop stack containing terabytes to petabytes of data).

Now, product architects are asking how they can use this business intelligence for competitive advantage. As a result, application developers have come to see the value of using and acting in real time on streams of fast data; by combining this with OLAP reporting wisdom, they can realize the benefits of both fast data and Big Data. A new set of application patterns has emerged from this shift. The applications are designed to capture value from fast-moving streaming data before it reaches Hadoop.

At VoltDB we call this new breed of applications “fast data” applications. The goal of these fast data applications is not just to push data into Hadoop ASAP, but also to capture real-time value from the data the moment it arrives.
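The dual role described above can be sketched in a few lines. Each event gets an immediate decision when it arrives, and it is also staged for the historical store. This sketch assumes a simple per-key limit as the "real-time value" and uses an in-memory list as a stand-in for Hadoop. It is our illustration, not one of VoltDB's three patterns, and all names are hypothetical.

```python
from collections import defaultdict


class FastDataPipeline:
    """Act on each event as it arrives, and also stage it for the historical store."""

    def __init__(self, limit: int, batch_size: int):
        self.limit = limit
        self.counts: defaultdict[str, int] = defaultdict(int)
        self.batch: list[str] = []
        self.batch_size = batch_size
        self.exported: list[list[str]] = []  # stands in for writes to Hadoop

    def ingest(self, key: str) -> bool:
        # Real-time value: decide now, per event (here, "has this key exceeded its limit?").
        self.counts[key] += 1
        decision = self.counts[key] > self.limit
        # Big Data value: still ship the raw event downstream, in batches.
        self.batch.append(key)
        if len(self.batch) >= self.batch_size:
            self.exported.append(list(self.batch))
            self.batch.clear()
        return decision
```

In a real system, the per-key state would need to live in a store that is fast, durable, and shared across nodes. That requirement is where the post's argument for a fast, horizontally scalable database comes in.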

Because traditional databases historically haven’t been fast enough, developers have been forced to go to great effort to build fast data applications - they build complex multi-tier systems often involving a handful of tools typically utilizing a dozen or more servers.  However, a new class of database technology, especially NewSQL offerings, has changed this equation.

If you have a relational database that is fast enough, highly available, and able to scale horizontally, the ability to build fast data applications becomes less esoteric and much more manageable. Three new real-time application patterns have emerged as the necessary dataflows to implement real-time applications. These patterns, enabled by new, fast database technology, are:

Click to read more ...

Friday
Apr102015

Stuff The Internet Says On Scalability For April 10th, 2015

Hey, it's HighScalability time:


Beautiful, isn't it? It's the cerebral cortex of a rat that is organized like a mini-Internet.
  • $47 million: value of Cannabis per square km; $3.7 trillion: worldwide IT spending in 2014;  $41B: spend on spectrum; 48,000 square km: How Much Land Would it Take to Power the US via Solar; 2,000: Hadoop clusters in the world; 650 pounds: projected size of ET
  • Quotable Quotes:
    • John Hugg: The number one rule of 21st century data management: If a problem can be solved with an instance of MySQL, it’s going to be.
    • @sarahnovotny: "there is no compression algorithm for experience" - great quote from Andy Jassy at #AWSSummit
    • Steve Martin: I did stand-up comedy for eighteen years. Ten of those years were spent learning, four years were spent refining, and four were spent in wild success.
    • Yossi Vardi: Revenues kill the dream.
    • @AWSSummits: AdRoll's retargeting and real-time bidding operates at 6 billion impressions/day at 100ms latency on #AWS #AWSSummit
    • @AWS_Partners: Nike is operating 70+ services as production loads in #aws today #AWSSummit 
    • @bernardgolden: S3 usage up 102% YOY, ec2 93%: #AWSSummit
    • @bernardgolden: AWS growing over 40% yoy. Next earnings announcement s/b v interesting. #awssummit 
    • @AlexBalk: Here is my Apple Watch review: Your life is largely meaningless. No gadget can obscure its emptiness. You are dying every day.
    • Jonas: Google: all apps become search. Facebook: all apps become feeds. 
    • @jon_moore: most scalable/fast/reliable systems follow these principles: elastic; responsive; resilient; message-driven. #phillyete
    • mrmondo: NVMe [Non-Volatile Memory Express] is one of the most important changes to storage over the past decade.
    • Peter Thiel: Often the smarter people are more prone to trendy, fashionable thinking because they can pick up on things, they can pick up on cues more easily, and so they’re even more trapped by it than people of average ability
    • @nickstenning: The women and men who wrote the nearly bug-free code that controlled a $4Bn space shuttle and the lives of astronauts worked 8am to 5pm.

  • Have you been let down by miracle materials like carbon nanotubes, buckyballs, and graphene? MOFs (metal–organic frameworks) are here and they are real. This Nature podcast and article tell you all about them (about 13 minutes in). MOFs are scaffolds made of metal-containing nodes linked by carbon-based struts. They are pieces that you can plug together and build up into big networks with spaces in between. It's those spaces that make MOFs useful: you can trap things in the holes and do things to the molecules while they are trapped. You can store gases like methane and hydrogen. You can separate mixtures by varying the pore sizes. Carbon capture is one big use. They can also be used as chemical sensors, maybe in some future version of your watch, and perhaps even as write-once-read-many (WORM) memory.

  • Is Amazon recreating the Sun ecosystem in the cloud? We now have the Amazon Elastic File System, so everything is remote mounted. WorkSpaces feels like diskless workstations. Storage is over on some NAS. The database is somewhere on the network. And so on. Let's hope NFS lock contention failures and network UI jitter don't also make a comeback. OK, I don't remember Sun having anything like Amazon Machine Learning.

  • Etsy is giving Facebook's HipHop Virtual Machine (HHVM) for PHP a try. Why? Their API and web code were diverging under the pressure of parallel development. They were also building many small API endpoints, which meant many small requests instead of fewer, larger requests that each do more work. And rather than share state in an inherently shared-nothing architecture, they went with the strategy of just making things faster. This is where HHVM comes in.

  • OK, that's impressive. Migrating from Heroku to AWS (using Docker): it took two engineers about one month. Performance doubled: average API response time dropped from around 220ms to under 100ms, and background task execution times were cut in half as well. Only half as many servers were needed.

  • I was excited to see AWS is opening up Lambda. It's close to some ideas I've been talking about for a while (Building Super Scalable Systems, What Google App Engine Price Changes Say About The Future Of Web Architecture). When it first came out I rehabbed my atrophied node.js skills and gave it a shot. I played around a bit and got some code working, but the problem was that Lambda only exposed a few integration points, and none of those were anything I cared about. Now they've made Lambda much more general and, in the process, much more useful. Worth another look. I also suspect their NFS product was necessary to generalize Lambda: code could be instantly available on every machine via a mount point. Just like back in the day.

  • How Early Adopters Are Using Unikernels - With and Without Containers: the group of MirageOS creator Anil Madhavapeddy is working on a new tool stack called Jitsu (Just-in-Time Summoning of Unikernels), which can start a unikernel in ~20ms in response to a network request. Also, Towards Heroku for Unikernels: Part 2 - Self Scaling Systems.
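The "summon on demand" idea behind Jitsu can be sketched in a few lines: nothing runs until a request arrives, the first request boots the service, and idle services are stopped again. This is a minimal illustration under stated assumptions, not Jitsu's implementation. `Summoner`, `boot`, `idle_timeout`, and `reap_idle` are hypothetical names, booting is simulated by a callable that returns a request handler, and the ~20ms startup time reported for Jitsu is not modeled.

```python
import time
from typing import Callable, Dict

Handler = Callable[[str], str]


class Summoner:
    """Hypothetical sketch of just-in-time service startup (not Jitsu's actual code)."""

    def __init__(self, boot: Callable[[str], Handler], idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.boot = boot                  # simulated launcher: returns the service's request handler
        self.idle_timeout = idle_timeout  # seconds of inactivity before a service is stopped
        self.clock = clock                # injectable so the example can use a fake clock
        self.running: Dict[str, Handler] = {}
        self.last_used: Dict[str, float] = {}
        self.boots = 0

    def handle(self, service: str, request: str) -> str:
        handler = self.running.get(service)
        if handler is None:
            # Cold path: nothing is running, so boot the service for this request.
            handler = self.boot(service)
            self.running[service] = handler
            self.boots += 1
        self.last_used[service] = self.clock()
        return handler(request)

    def reap_idle(self) -> None:
        # Stop any service that has been idle longer than idle_timeout.
        now = self.clock()
        for name in [n for n, t in self.last_used.items() if now - t > self.idle_timeout]:
            del self.running[name]
            del self.last_used[name]


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]


def boot(name: str) -> Handler:  # hypothetical stand-in for launching a unikernel
    return lambda req: f"{name} handled {req}"


s = Summoner(boot, idle_timeout=30.0, clock=lambda: fake_now[0])
print(s.handle("web", "GET /"))  # web handled GET /
s.handle("web", "GET /about")    # already running, no second boot
fake_now[0] = 60.0
s.reap_idle()                    # idle for 60s > 30s, so "web" is stopped
s.handle("web", "GET /")         # booted again on demand
print(s.boots)                   # 2
```

The trade-off the sketch exposes is the idle timeout: reaping aggressively saves resources but puts more requests on the cold path, which is only acceptable when startup is as fast as the figures quoted for unikernels.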

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...