Friday
Dec 15, 2017

Stuff The Internet Says On Scalability For December 15th, 2017

Hey, wake up, it's HighScalability time:

 

Merry Christmas and Happy New Year everyone! I'll be off until the new year. Here's hoping all your gifts were selected using machine learning.

 

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate your recommending my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll like it. Now with twice the brightness and new chapters on Netflix and Cloud Computing.

 

  • 157 terabytes: per second raw data output of the Square Kilometre Array; $11 million: made by a 6 year old on YouTube; 14TB: helium hard drive; 1: year education raises IQ 1-5 points; 10: seconds mining time to pay for wifi; 110 TFLOPS: Nvidia Launches $3,000 Titan V; 400: lines of JavaScript injected by Comcast; 20 million: requests per second processed by Netflix to personalize artwork; 270: configuration parameters in postgresql.conf; hundreds: eyes in scallop from a unique mirroring system; $72 billion: record DRAM revenue; 20: rockets landed by SpaceX; 

  • Quotable Quotes:
    • Bill Walton: Mirai was originally developed to help them corner the Minecraft market, but then they realized what a powerful tool they built. Then it just became a challenge for them to make it as large as possible.
    • Stephen Andriole: The entire world of big software design, development, deployment and support is dead. Customers know it, big software vendors know it and next generation software architects know it. The implications are far-reaching and likely permanent. Business requirements, governance, cloud delivery and architecture are the assassins of old "big" software and the liberators of new "small" software. In 20 years very few of us will recognize the software architectures of the 20th century or how software in the cloud enables ever-changing business requirements.
    • Melanie Johnston-Hollitt: There is not yet compute available that can process the data we want to collect and use to understand the universe. 
    • Brandon Liverence: Credit and debit card transaction data shows, at these businesses, the average customer in the top 20 percent spent 8x as much as the average customer from the bottom 80 percent.
    • @evonbuelow: After looking at the source code for a series of k8s components & operators, I'm struck by how go (#golang) is used more as a declarative construct than a set of procedural steps encoding sophisticated logic.
    • apandhi: I had a run-in with CoinHive this weekend so I did a bit of research. Most modern computers can do about 30/h a second. Coinhive currently pays out 0.00009030 XMR ($0.02 USD) per 1M hashes. For a 10 second pause, they'd mine 300 hashes (about $0.000006 USD). To make $1 USD, they'd need to have ~166,666.66 people connect to their in store WIFI.
    • @matt_healy: Went from zero clue about #aws codepipeline and friends yesterday, to setting up an automatic Lambda and API gateway deployment with every git push in production today. Awesome!
    • lgierth: Pubsub is probably one of the lesser known features of IPFS right now, given that it's still marked as experimental. We're researching more efficient tree-forming and message routing algorithms, but generally the interface is pretty stable by now. Pubsub is supported in both go-ipfs and js-ipfs. A shining example of pubsub in use is PeerPad, a collaborative text editor exchanging CRDTs over IPFS/Pubsub
    • Manish Rai Jain: Given these advancements, Amazon Neptune’s design is pre-2000. Single server vertically scaled, asynchronously replicated, lack of transactions — all this screams outdated.
    • There are more quotes. So many more. More. More. More. Yep, there's even more.
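
The Coinhive arithmetic in the apandhi quote above is easy to check. Here is a minimal sketch that uses only the figures from the quote (30 hashes per second, a 10 second pause, $0.02 USD per 1M hashes). The constant and function names are made up for illustration and are not part of any Coinhive API.

```python
# Figures taken from the apandhi quote; names are illustrative only.
HASHES_PER_SECOND = 30         # rough hash rate of a typical device
PAUSE_SECONDS = 10             # how long the wifi login page holds the visitor
USD_PER_MILLION_HASHES = 0.02  # quoted payout, already converted to USD


def earnings_per_visitor(hash_rate, seconds, usd_per_million):
    """Return the USD earned from one visitor mining for `seconds`."""
    hashes = hash_rate * seconds
    return hashes * usd_per_million / 1_000_000


per_visitor = earnings_per_visitor(
    HASHES_PER_SECOND, PAUSE_SECONDS, USD_PER_MILLION_HASHES
)
visitors_for_one_dollar = 1 / per_visitor

print(f"USD per visitor: {per_visitor:.6f}")
print(f"Visitors needed for $1: {visitors_for_one_dollar:,.0f}")
```

That works out to about $0.000006 per visitor and roughly 166,667 visitors per dollar, which matches the numbers in the quote.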

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Monday
Dec 11, 2017

Netflix: What Happens When You Press Play?

 

This article is a chapter from my new book Explain the Cloud Like I'm 10. The first release was written specifically for cloud newbies. I've made some updates and added a few chapters—Netflix: What Happens When You Press Play? and What is Cloud Computing?—that level it up to a couple ticks past beginner. I think even fairly experienced people might get something out of it.

So if you are looking for a good introduction to the cloud or know someone who is, please take a look. I think you'll like it. I'm pretty proud of how it turned out. 

I pulled this chapter together from dozens of sources that were at times somewhat contradictory. Facts on the ground change over time and depend on who is telling the story and what audience they're addressing. I tried to create as coherent a narrative as I could. If there are any errors I'd be more than happy to fix them. Keep in mind this article is not a technical deep dive. It's a big-picture article. For example, I don't mention the word microservice even once :-)

 

Netflix seems so simple. Press play and video magically appears. Easy, right? Not so much.

 

Given our discussion in the What is Cloud Computing? chapter, you might expect Netflix to serve video using AWS. Press play in a Netflix application and video stored in S3 would be streamed from S3, over the internet, directly to your device. 

A completely sensible approach…for a much smaller service. 

But that’s not how Netflix works at all. It’s far more complicated and interesting than you might imagine.

To see why let’s look at some impressive Netflix statistics for 2017.

  • Netflix has more than 110 million subscribers.
  • Netflix operates in more than 200 countries. 
  • Netflix has nearly $3 billion in revenue per quarter.
  • Netflix adds more than 5 million new subscribers per quarter.
  • Netflix plays more than 1 billion hours of video each week. As a comparison, YouTube streams 1 billion hours of video every day while Facebook streams 110 million hours of video every day.
  • Netflix played 250 million hours of video on a single day in 2017.
  • Netflix accounts for over 37% of peak internet traffic in the United States.
  • Netflix plans to spend $7 billion on new content in 2018. 
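
To put the viewing numbers on the same scale, here is a rough back-of-the-envelope sketch. It uses only the figures from the list above and treats "more than 1 billion hours each week" as exactly 1 billion, so the results are approximations, not official Netflix numbers. The variable names are made up for illustration.

```python
# Figures from the list above; "more than" values are treated as exact,
# so every result here is a rough estimate.
NETFLIX_HOURS_PER_WEEK = 1_000_000_000
NETFLIX_PEAK_DAY_HOURS = 250_000_000
YOUTUBE_HOURS_PER_DAY = 1_000_000_000
FACEBOOK_HOURS_PER_DAY = 110_000_000

# Average Netflix day, assuming viewing is spread evenly over the week.
netflix_avg_day = NETFLIX_HOURS_PER_WEEK / 7

# How the single biggest day compares with an average day.
peak_vs_average = NETFLIX_PEAK_DAY_HOURS / netflix_avg_day

# How the other services compare with an average Netflix day.
youtube_vs_netflix = YOUTUBE_HOURS_PER_DAY / netflix_avg_day
facebook_vs_netflix = FACEBOOK_HOURS_PER_DAY / netflix_avg_day

print(f"Netflix average day: {netflix_avg_day / 1e6:.0f} million hours")
print(f"Peak day vs average day: {peak_vs_average:.2f}x")
print(f"YouTube vs Netflix per day: {youtube_vs_netflix:.1f}x")
print(f"Facebook vs Netflix per day: {facebook_vs_netflix:.2f}x")
```

On these assumptions an average Netflix day is about 143 million hours, the record day was about 1.75 times that, and YouTube streams about 7 times as much video per day as Netflix.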

What have we learned? 

Netflix is huge. They’re global, they have a lot of members, they play a lot of videos, and they have a lot of money.

Another relevant factoid is Netflix is subscription based. Members pay Netflix monthly and can cancel at any time. When you press play to chill on Netflix, it had better work. Unhappy members unsubscribe.

Netflix operates in two clouds: AWS and Open Connect.

How does Netflix keep their members happy? With the cloud of course. Actually, Netflix uses two different clouds: AWS and Open Connect. 

Both clouds must work together seamlessly to deliver endless hours of customer-pleasing video.

The three parts of Netflix: client, backend, CDN.

You can think of Netflix as being divided into three parts: the client, the backend, and the CDN. 

The client is the user interface on any device used to browse and play Netflix videos. It could be an app on your iPhone, a website on your desktop computer, or even an app on your Smart TV. Netflix controls each and every client for each and every device. 

Everything that happens before you hit play happens in the backend, which runs in AWS. That includes things like preparing all new incoming video and handling requests from all apps, websites, TVs, and other devices.

Everything that happens after you hit play is handled by Open Connect. Open Connect is Netflix’s custom global content delivery network (CDN). When you press play the video is served from Open Connect. Don’t worry; we’ll talk about what this means later.

Interestingly, at Netflix they don’t actually say hit play on video, they say clicking start on a title. Every industry has its own lingo.

By controlling all three areas (client, backend, and CDN), Netflix has achieved complete vertical integration. 

Netflix controls your video viewing experience from beginning to end. That’s why it just works when you click play from anywhere in the world. You reliably get the content you want to watch when you want to watch it. 

Let’s see how Netflix makes that happen.

In 2008 Netflix Started Moving to AWS

Click to read more ...

Friday
Dec 8, 2017

Stuff The Internet Says On Scalability For December 8th, 2017

Hey, it's HighScalability time: 


AWS Geek creates spectacular visual summaries.

 

If you like this sort of Stuff then please support me on Patreon. And please recommend my new book—Explain the Cloud Like I'm 10—to those looking to understand the cloud. I think they'll like it.


  • 127 terabytes: per year growth in blockchain if bitcoin wins; 4: hours from tabula rasa to chess god; 1.4 billion: Slack jobs per day; 400: hyperscale data centers worldwide by 2018; 9.8X: Machine Learning Engineer job growth; 14%: Ethereum transactions are for Cryptokitties; 80: seconds per hash on 55 year old IBM 1401 mainframe; $110 billion: app stores spending in 2018; 25: years since first text message; 4,000: AWS code pushes per day; two elephants: of space dust hits earth every day; 

  • Quotable Quotes:
    • @DavidBrin: Now that's what I call engineering! [Voyager 1] Thrusters that haven't been used in 37 years - still reliable!
    • drkoalaman: So despite not supporting other cryptos the majority of my time on the DNM's I think its officially time to step away from bitcoin, at least for the time being. Went to do a direct deal today with a vendor, realized my $250 purchase would end up costing me $315 or so with fees and would still take probably 24 hours to get to him. As of this morning the lowest electrum fee was approx $32 to send coin.... and people reporting at the highest level still not having coin move 12-16 hours later. Vendors are loving this surge but its creating a sellers market and backlogging the blockchain and fees are just crazy... Not to mention not knowing if your $250 will be worth $300 when it gets to the vendor or a random drop in BTC causing it to be less...
    • Alex Lindsay: 30 years ago you couldn’t get cash on Sunday. Now you can send cash on your watch.
    • @prestonjbyrne: “We’re launching on Ethereum” == “100% uptime, unless someone makes a cat app, at which point all bets are off”
    • @GossiTheDog: So I got somebody to talk, without names, about one of the big S3 bucket leaks. A developer set a bucket to open by mistake. They had open S3 bucket monitoring scripts running and got warning emails, which nobody did anything with - nobody had ownership of S3 buckets.
    • @jaksprats: reInvent 2017 Amazon Time Sync Service … Prediction: by reInvent 2018 either they build their own Spanner or they acquire CockroachDB
    • @PatrickMcFadin: 4/ I don’t think we’ll see many more big AWS database announcements after this year. What they have is “good enough” for them and the consensus is they are moving to AI and “everything Alexa” quickly.
    • Eric Horvitz~ in 50 summers, the aviation industry went from canvas flopping on a beach to the Boeing 707...And what is this thing called consciousness, that we use the word consciousness to refer to. Where does that come from? What are these subjective states? We have no idea. We have theories and reflections, but they are not really based in any scientific theories just yet. However, is it possible in 50 summers, we have a whole new world. We have big surprises. We understand how minds work.
    • @martinkl: Google Realtime API is shutting down … — It’s so risky to rely on proprietary services for building apps.
    • @jeremiahg: Equifax’s stock price isn’t recovering post-breach as expected. If the stock remains flat over the next two months, it’ll be interesting to discuss why — what made them different.
    • @xmal: A possible solution to the Fermi Paradox is that any sufficiently advanced civilization is dedicating all its resources to bitcoin mining.
    • @rbhar90: The AI future where megacorporations control enormous datasets and near infinite compute letting them machine learn to predict our every action terrifies me. At NIPS, it's clear this future is nearer rather than further.
    • Sue Hartley: This plant has built this little structure. It's sort of like a barracks for the ant army. And they live inside. When herbivores arrive the ant army comes out and attacks the herbivores...The plant can spend up to 20% of its resources housing and feeding its army.
    • Netflix: With great elasticity comes great responsibility.
    • Michael Widmann: There’s a change in the (NATO) mindset to accept that computers, just like aircraft and ships, have an offensive capability. I need to do a certain mission and I have an air asset, I also have a cyber asset. What fits best for me to get the effect I want?
    • There are so many more quotes. More. More. More. More...

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Dec 5, 2017

Sponsored Post: Symbiont, Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, Zohocorp

Who's Hiring? 

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • Webinar: January 23, 11am GMT & again at 11am Pacific / 2pm Eastern. How Microservices is Disrupting FinTech; Featuring Guest Speakers from Forrester Research and Genesis Global. The world of Finance is being digitally disrupted across areas such as mobile payments, money transfers, loans, fundraising, and wealth management. Speed of change is critical, and software has become the tip of the spear in this disruption with a new model called microservices, an approach where large applications are broken down into small, loosely coupled and composable autonomous pieces. Join us as our guest speakers Randy Heffner, VP and Principal Analyst at Forrester Research, and Stephen Murphy, CEO at Genesis Global, explain how microservices improve speed of execution, key emerging practices for doing microservices well, and how microservices are enabling disruption in financial services. Register at: https://www.aerospike.com/lp/microservices-disrupting-fintech-webinar/

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications have exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • The Practical Guide to Managing Data Science at Scale. The ability to manage, scale, and accelerate an entire data science discipline increasingly separates successful organizations from those falling victim to hype and disillusionment. Download this practical guide for data science management, if you're currently, or aspiring to be, a data science manager. The paper demystifies and elevates the current state of data science management.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services: all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Click to read more ...

Monday
Dec 4, 2017

The Eternal Cost Savings of Netflix's Internal Spot Market

 

Netflix used their internal spot market to save 92% on video encoding costs. The story of how is told by Dave Hahn in his now annual A Day in the Life of a Netflix Engineer. Netflix first talked about their spot market in a pair of articles published in 2015: Creating Your Own EC2 Spot Market Part 1 and Part 2.

The idea is simple:

  • Netflix runs out of three AWS regions and uses hundreds of thousands of EC2 instances; many are underutilized at various times of the day.

  • Video encoding is 70% of Netflix’s computing needs, running on 300,000 CPUs in over 1000 different autoscaling groups.

  • So why not create a spot market out of their own underutilized reserved instances to process video encoding?

Before proceeding let's define what a spot market is:

Spot Instances enable you to request unused EC2 instances, which can lower your Amazon EC2 costs significantly. The hourly price for a Spot Instance (of each instance type in each Availability Zone) is set by Amazon EC2, and adjusted gradually based on the long-term supply of and demand for Spot Instances. Your Spot Instance runs whenever capacity is available and the maximum price per hour for your request exceeds the Spot price.
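
The last sentence of that definition is really a two-part rule, which can be sketched in a few lines. This is only an illustration of the rule as quoted: `SpotRequest` and `spot_instance_runs` are hypothetical names, not part of the EC2 API, and the prices are invented.

```python
from dataclasses import dataclass


@dataclass
class SpotRequest:
    """A hypothetical spot request: the most you will pay per hour."""
    max_price_per_hour: float


def spot_instance_runs(request, spot_price, capacity_available):
    """Apply the quoted rule: run only when capacity is available AND
    the request's maximum price exceeds the current Spot price."""
    return capacity_available and request.max_price_per_hour > spot_price


request = SpotRequest(max_price_per_hour=0.10)

# Spare capacity exists and the bid is above the Spot price: it runs.
print(spot_instance_runs(request, spot_price=0.04, capacity_available=True))

# The Spot price rises above the bid: the instance no longer runs.
print(spot_instance_runs(request, spot_price=0.12, capacity_available=True))

# No spare capacity: the bid does not matter.
print(spot_instance_runs(request, spot_price=0.04, capacity_available=False))
```

The key point for what follows is the first condition. A spot market only exists when there is unused capacity to sell.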

At any point in time AWS has a lot of underutilized instances. It turns out so does Netflix. To understand why creating an internal spot market helped Netflix so much, we'll first need to understand how they encode video.

How Netflix Encodes Video

Click to read more ...

Friday
Dec 1, 2017

Stuff The Internet Says On Scalability For December 1st, 2017

Hey, it's HighScalability time: 

  Isn't this all of software? @thomasfuchs: Here we see a group of JavaScript engineers implementing a method that adds two numbers

 

If you like this sort of Stuff then please support me on Patreon. And there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • 82%: chance a file on GitHub is a duplicate; 11: new AWS regions; 42%: AWS yearly growth; 1,100: new AWS services in 2017; 300%: year over year growth in Lambda; 00000000: code to launch a Minuteman missile; 100 megawatts in 100 days: biggest battery in the world; 40: months in prison for VW engineer; 3,000 cores: Raspberry Pi cluster; 11: lost cities found by building a database from 4,000-year-old clay tablets; 1.25 million: Riot Games builds per year; 41.78: miles walked at reinvent; 

  • Quotable Quotes:
    • @gigastacey: This FCC is going to destroy net neutrality, strangle competition in media, let wireline providers off the hook for replacing copper with fiber or an equivalent to copper AND kill broadband access for the poor. This is an unprecedented attack on consumers.
    • @randyshoup: “My service is stateless, by which I mean I have state, but I store it somewhere else.” @samnewman #reInvent
    • @StuFlemingNZ: "Hi, I've found a fault with the English language and I need an entomologist" "An etymologist you mean?" "Νo. It's a bug, not a feature"
    • @copyconstruct: The future where "all the code you ever write is business logic" is one that will be facilitated by the huge cloud providers, leaving most infra startups either acquired or in the dust.
    • Mark Callaghan: At high-concurrency mysqld with jemalloc or tcmalloc can get ~4X more QPS on sysbench read-only compared to mysqld with glibc malloc courtesy of memory allocation stalls.
    • @aisipos: AWS Lambda functions can now use top memory size of 3GB. #reinvent2017
    • @cloud_opinion: It feels like AWS is putting more stress on containers than on serverless - Is it because they want to balance long game with short term revenue to fund the retail business? #reInvent
    • @__apf__: "how was your day" "today I parallelized a thing and slowed it down 100x" "you mean sped it up 100x?" "nope"
    • @mipsytipsy: It’s this simple: if you don’t sample, you don’t scale.
    • Daniel Dennett: The key insight, which I’ve known for years, is that we have to get away from the idea of there being the pure ultimate fixed proposition that captures the information in any informational state.
    • @kelseyhightower: I need to put my hands on EKS before I can speak on it, but my initial reaction: this is a good thing for the community and adds weight to the Kubernetes anywhere promise.
    • @Koffie_kopjes: Ok, so far for #Cloud9 It could be a great IDE, but requiring third party cookies.... thought @Werner told developers are the new security team, but if they require third party cookies in 2017, they aren't very aware... ;) #security #reinvent2017
    • @GossiTheDog: I honestly think IT is backsliding in InfoSec across the world at the moment. I’ve said it before, but a decade ago we had two factor VPNs etc - now there’s a massive tilt towards open RDP, AWS keys everywhere etc etc.
    • melissa mcewen: If I won the lottery would I still code? I would, but it would not be like work. It would be projects I enjoyed. And it would be fewer hours.
    • olalonde: I feel like that should be the other way around. When all the "blockchain startups" and ICOs blow up, Bitcoin will be left standing. The true innovation behind the "blockchain" was its decentralised consensus mechanism. That mechanism is only secure as long as no single entity controls over 50% of the hash rate. Some of the largest Bitcoin miners have so much hash rate today that they could attack any (SHA-256 based) blockchain but the Bitcoin one.
    • @ben11kehoe: "The future" in this keynote is apparently 2020, which will still be containers for most customers. #serverless is on a bit longer timeline for the masses #reinvent
    • @irwin: I’ve seen things you people wouldn’t believe. Gopher, Netscape with frames, the first Browser Wars. Searching for pages with AltaVista, pop-up windows self-replicating, trying to uninstall RealPlayer. All those moments will be lost in time, like tears in rain. Time to die. 
    • Andy Jassy: We're just at the beginning of mainstream enterprise mass migration to the cloud...The torrid pace of adoption and innovation in the serverless (Lambda) space has totally blown us away...in fact, he says that if Amazon.com were starting today, it would go serverless
    • Andy Jassy: In our [AWS] business, you have to be able to have access to capital. It's part of why I think it's hard at the scale that we're operating at. It’s hard for others to start from scratch and pursue it because not only do you need hundreds of services to have a competitive offering, but you need large amounts of capital.
    • RightScale: 70 percent of the 104 price points we include in our comparison have gone down since our last comparison in April 2017. Although these comprise a fraction of the total price points, they represent some of the most commonly used instances
    • RightScale: Overall, Azure is the cost leader, with the lowest price across scenarios about 71% of the time with the highest price just 8% of the time. AWS fell in the middle, while surprisingly, Google Cloud had the highest price half the time, 
    • Takashi Nishikawa: The power grid is quite robust against the propagation of failures — perhaps surprisingly robust, when we consider all the complexities involved
    • @vgcerf: Today is the 40th anniversary of the first three-network test of the Internet Protocols: joining ARPANET, Packet Radio and Packet Satellite networks linking the US and Europe!
    • Andy Jassy: What's different is with every successive year, as we launch a thousand plus features and services, we just have the capabilities to make it easier for the rest of the market to use us. So I think the total addressable market for the areas that we touch, which is infrastructure software, hardware and data center services, is trillions of dollars
    • @GossiTheDog: Again: stop paying the ransoms. We’re creating a billion dollar criminal industry instead of, well, setting up backups. We are monetising low skill crime.
    • @cogconfluence: Asked a bunch of mechanical turkers what one question they would ask to determine if they were talking to a human or AI. fave reply: When is the last time your teeth felt like they had little sweaters on them?
    • @EricJorgenson:  I still find this concept absolutely staggering: "On a daily basis, 15 percent of searches -- 500 million -- have never been seen before by Google's search engine, and that has continued for 15+ years"
    • @mijndert: Things I’m most excited about from the @awscloud #reInvent announcements: Fargate, EKS, Launch Templates, Aurora Multi-Master, Aurora Serverless, MediaLive, Inter-Region VPC Peering, and GuardDuty.
    • @0x424c41434b: You are probably tired of hearing me talk about rust but one reason I like it is that, I feel like a better programmer because it takes out that fear of something going wrong. Concentrating on the logic only made me do things much quicker than I did in the past. More confidence
    • Steve Konves: For those of us developers who have a unwavering love for our craft, there will always exist a bias to make decisions based on our passion for coding rather than profitability or cost savings.
    • @dcaoyuan: With new tuned #akka http client, our crawler can fetch and process 300k+ web pages per day, 100 million per year, on a 16 cores 64G memory machine.
    • @benschwarz: When Amazon changed their pricing to per minute billing I implemented an aggressive autoscaling policy. This policy (with tweaks and improvements along the way) has reduced EC2 costs by >30% and improved service dramatically.
    • @CodyBrown: Seriously. I don’t think people quite understand how many lawyers are getting into Space Law right now. Satellite internet is so feasible and the economics are changing. Massssssive terrestrial infrastructure is about to get competition
    • @Silver_Watchdog: The Evolution of Bitcoin 1. It's the future of global payments. The revolution! 2. So what about Mt.Gox and robo traders. Stocks are rigged too. 3. Yes it forks and creates more supply. Your point? 4. Everyone knows it's all traded for speculation and not used for payments. Geez
    • DHH: Etsy corrupted itself when it sold its destiny in endless rounds of venture capital funding. This wasn’t inevitable, it was a choice. One made by founders and executives who found it easier to ask investors for money than to develop the habits and skills to ask customers.
    • @bitfield: “In cloud-native, network issues—mapping IP addresses, latency, retries—are now falling into the lap of developers.”
    • There's more. So much more more more more.

    Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

    Click to read more ...

Tuesday
Nov 21, 2017

Sponsored Post: Symbiont, Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, Zohocorp

Who's Hiring? 

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • On-demand Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. In this session, guest speakers Michele Goetz, Principal Analyst at Forrester Research and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix, discuss: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. View now

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • The Practical Guide to Managing Data Science at Scale. The ability to manage, scale, and accelerate an entire data science discipline increasingly separates successful organizations from those falling victim to hype and disillusionment. Download this practical guide for data science management, if you're currently, or aspiring to be, a data science manager. The paper demystifies and elevates the current state of data science management.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips.  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Click to read more ...

Friday
Nov172017

Stuff The Internet Says On Scalability For November 17th, 2017

Hey, it's HighScalability time: 


The BOSS Great Wall. The largest structure yet found in the universe. Contains 830 galaxies. A billion light years across. 10,000 times the mass of the Milky Way.

 

If you like this sort of Stuff then please support me on Patreon. And there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • $25 billion: Alibaba's Singles' Day sales; 6+ million: Slack daily active users; 4ms: boot time for a unikernel based VM; 1 billion: out of date Android devices; 10-20%: increase in RAM prices; 8 million: lines of code in F-35; $3 million: lost by Isaac Newton in the stock market; 30: it's RAID's birthday!; thousands: bugs fixed with Pentagon hackathon; 6+ terabytes: earth satellite data downloaded per day; 

  • Quotable Quotes:
    • Berners-Lee: When I invented the web, I didn’t have to ask Vint Cerf [the ‘father of the internet’] for permission to use the internet
    • Germaine de Stael: Ridicule dries up the imagination.
    • Alex Hudson: A lot of technical write-ups focus on scaling, performance and large-scale systems. It’s definitely interesting to see what problems Netflix have, and how they respond to them. It’s important to understand why Google take decisions in the way they do. However, most of their problems don’t apply to anyone else, and therefore many of the solutions may or may not be appropriate.
    • @jpetazzo: Step functions: they're great, but they don't support dynamic fan out (i.e. invoking an arbitrary number of "sub-lambdas" in parallel).
    • parasubvert: Perhaps one of the lessons of architecture that is missing is to teach people how to evaluate tradeoffs, or in other words, “taste”. I don’t think we’ve ever really had good taste as an industry. Buzzword bingo has always ruled, with some exceptions.
    • Calvin Biesecker: The cost to change one line of code on a piece of avionics equipment is $1 million, and it takes a year to implement. For Southwest Airlines, whose fleet is based on Boeing’s 737, it would “bankrupt” them if a cyber vulnerability was specific to systems on board 737s, he said, adding that other airlines that fly 737s would also see their earnings hurt.
    • @QConSF: @natekupp shares some of Thumbtack's learnings on their journey to scale: from a PHP/PostgreSQL monolith with a self-managed Hadoop cluster, to Dockerized #microservices paired with managed/serverless data infrastructure #qconsf
    • Bail Bloc: Mine Monero, waste electricity, generate CO2 and send less money to a charity than you could have just sent directly! What’s not to like?
    • @Xof: The notion that the only way to be a good programmer is to let it consume your life is toxic.
    • @swardley: AMZN is now worth an IBM + Oracle + CISCO and you'd still have enough  change left over to buy most of VMware. Not bad for a decade of growth.
    • @swardley: I've been a bit gobsmacked by who is using Lambda recently ... there was me thinking that big / traditional enterprise would be testing the waters slowly. How wrong.
    • @crichardson: If GoLang becomes #1 it will primarily due to fashion rather fitness for purpose. It's far too low level/lacking in expressiveness for many kinds of applications. Eg. Enterprise/business applications. It is not what Java's successor should be.
    • Dropbox: IPv6 does show slightly better performance over IPv4. However, without detailed client-side and network information, it is hard to say definitely where the IPv6 performance gain is from.
    • Stack Overflow: Two tags stand out in this analysis, both with tremendous growth, and they have something in common. Swift is Apple’s language for developing iOS apps that is a successor to Objective-C, and the angular tag
    • @Falkvinge: I've said it before and I'm saying it again and again: in order to beat old-world banking, crypto must be at least an order of magnitude better. Old-world banking offers free instant tx between private accts, and 15-cent txs to merchant accounts. Beat that or be obsoleted.
    • Alex Hudson: I want to hear more about projects that deferred decisions and put off architecting until much later in the process. I want to hear more about delivery at real speed. Small pieces of software that are not necessarily interesting but deliver business value are the real heroes in our industry, and the developers who create them the real stars. I especially want to hear more about developers working with systems that have constraints. I want to hear from people pushing standard stuff beyond its limits. I think we grossly underestimate what off-the-shelf systems can do, and grossly overestimate the capabilities of the things we develop ourselves. It’s time to talk much more about real-world, practical, medium-enterprise software architecture.
    • David Gerard: BTC is very clogged at the moment, with around 100,000 unconfirmed transactions as I write this, and peaks of 160,000 a few days ago. Transaction fees peaked at around $20 just to get your transaction through. This wasn’t helped by long delays between blocks, as mining capacity moved to BCH — the time between blocks peaking at 63 minutes a few days ago, on 11 November. fork.lol, which charts the relative profitability of the two, was overloaded and inaccessible. If shutdowns of mining progress in China, then whoever remains in mining will become the power. This is currently divided between Iceland, India, Japan, Georgia and the Czech Republic.
    • linkmotif: There’s a very common ethos that if people just focused on shipping they would somehow magically ship but that’s not how software works. You can’t just will shipping. You need to know what you’re doing.
    • sp527: I had a serious epiphany when I read that Braintree managed to vertically scale a two node (“HA”) Postgres setup to transaction volume in the millions and a massive valuation. Stack Overflow has had a similarly lean footprint for much of its history.
    • @ben11kehoe: This graphic from @googlecloud App Engine is nonsense. GAE literally makes you select instance sizes
    • @danielbryantuk: "Any change made to a complex adaptive system is a gamble. We mitigate risks, but we can't eliminate them" @relix42 #qconsf
    • ivanstepin: Flickr implemented Lanczos algorithm while Discord uses near-neighbor ( much less resource-consuming, but with slightly less quality ) algo. It may turn out that the gpu mem<->cpu mem data transfer can eat all the benefits for such simple algo as near-neighbor scaling.
    • There's more. Lots more.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Monday
Nov132017

Cassandra NoSQL Data Model Design 

We at Instaclustr recently published a blog post on the most common data modelling mistakes that we see with Cassandra. This post was very popular and led me to think about what advice we could provide on how to approach designing your Cassandra data model so as to come up with a quality design that avoids the traps.

There are a number of good articles around with rules and patterns to fit your data model into: 6 Step Guide to Apache Cassandra Data Modelling and Data Modelling Recommended Practices.

However, we haven’t found a step by step guide to analysing your data to determine how to fit in these rules and patterns. This white paper is a quick attempt at filling that gap.

Phase 1: Understand the data

This phase has two distinct steps that are both designed to gain a good understanding of the data that you are modelling and the access patterns required.

Define the data domain

The first step is to get a good understanding of your data domain. As someone very familiar with relational data modelling, I tend to sketch (or at least think) ER diagrams to understand the entities, their keys and relationships. However, if you’re familiar with another notation then it would likely work just as well. The key things you need to understand at a logical level are:

• What are the entities (or objects) in your data model?
• What are the primary key attributes of the entities?
• What are the relationships between the entities (i.e. references from one to the other)?
• What is the relative cardinality of the relationships (i.e. if you have a one to many is it one to 10 or one to 10,000 on average)?

Basically, these are the same things you’d expect from a logical ER model (although we probably don’t need a complete picture of all the attributes) along with a complete understanding of the cardinality of relationships that you’d normally need for a relational model. An understanding of the demographics of key attributes (cardinality, distribution) will also be useful in finalising your Cassandra model. Also, understand which key attributes are fixed and which change over the life of a record.
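One way to make the checklist above concrete is to write the domain down as plain data before thinking about tables. The sketch below is only an illustration in Python: the class names (Entity, Relationship, DataDomain), the example entities (customer, order, page_view) and the fan-out threshold are all hypothetical and do not come from the Instaclustr post or from any Cassandra API. It records entities, their primary key attributes, relationships and average cardinality, and lists the relationships whose fan-out is large enough to deserve a closer look.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    name: str
    primary_key: Tuple[str, ...]        # attributes that identify one record
    mutable_keys: Tuple[str, ...] = ()  # key attributes that change over a record's life


@dataclass
class Relationship:
    parent: str        # the "one" side of a one-to-many
    child: str         # the "many" side
    avg_children: int  # average number of children per parent


@dataclass
class DataDomain:
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_relationship(self, rel: Relationship) -> None:
        # Both ends of a relationship must already be known entities.
        for name in (rel.parent, rel.child):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append(rel)

    def high_fanout(self, threshold: int) -> List[Relationship]:
        # Relationships whose average cardinality meets the (assumed) threshold.
        return [r for r in self.relationships if r.avg_children >= threshold]


# Hypothetical example domain: one-to-10 versus one-to-10,000.
domain = DataDomain()
domain.add_entity(Entity("customer", ("customer_id",)))
domain.add_entity(Entity("order", ("order_id",)))
domain.add_entity(Entity("page_view", ("view_id",)))
domain.add_relationship(Relationship("customer", "order", 10))
domain.add_relationship(Relationship("customer", "page_view", 10_000))

# The threshold of 1,000 is an arbitrary illustrative value.
flagged = domain.high_fanout(threshold=1_000)
print([(r.parent, r.child) for r in flagged])
```

Keeping the cardinality next to each relationship means the difference between one to 10 and one to 10,000 stays visible when you move on to defining access patterns.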

Define the required access patterns

Click to read more ...

Friday
Nov102017

Stuff The Internet Says On Scalability For November 10th, 2017

Hey, it's HighScalability time: 


Ah, the good old days. This is how the FBI stored fingerprints in 1944. (Alex Wellerstein). How much data? Estimates range from 30GB to 2TB.

 

If you like this sort of Stuff then please support me on Patreon. Also, there's my new book, Explain the Cloud Like I'm 10, for complete cloud newbies. 


  • 1 million: times we touch our phones per year; 13 million: lines of Javascript @ Facebook; 256K: RAM needed for TensorFlow on a microcontroller; 2,502%: increase in the sale of ransomware on the dark web; 800 million: monthly Instagram users; 40%: VMs in Azure run Linux; 40%: improved GCP network latency from new SDN stack; 50%: fat content of a woolly mammoth; 

  • Quotable Quotes:
    • Sean Parker: And that means that we [Facebook] need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that's going to get you to contribute more content, and that's going to get you ... more likes and comments
    • David Gerard: I spent yesterday afternoon on Twitter and /r/buttcoin, giggling. It was a popcorn overload moment for every acerbic cryptocurrency sceptic who ever thought that immutable, unfixable smart contracts were an obviously stupid idea that would continue to end in tears and massive losses, as they so often had previously.
    • @jessfraz: I remember now why I put everything into containers in the first place, it's because all software is 💩
    • Amin Vahdat: What we have found running our applications at Google is that latency is as important, or more important, for our applications than relative bandwidth. It is not just latency, but predictable latency at the tail of the distribution. If you have a hundred or a thousand applications talking to one another on some larger task, they are chatty with one another, exchanging small messages, and what they care about is making a request and getting a response back quickly, and doing so across what might a thousand parallel requests.
    • @SteveBellovin: Why anyone with any significant programming experience--and hence experience with bugs--every liked smart contracts is a mystery to me.
    • Neha Bagri: Startups worship the young. But research shows people are most innovative when they’re older
    • @manisha72617183: OH: I no longer tolerate complicated programming languages. My mental space is like Silicon Valley; rent is high and space is at a premium
    • @atoonk: On days like today, we're yet again reminded that the Internet is held together with duct tape.. #rockSolid #BGP #comcast #outage
    • @bradfitz: 0 days since last high impact bug in an experimental programming language on the Ethereum VM affecting millions of dollars.
    • TheScientist: The genetic, molecular, and morphological diversity of the brain leads to a functional diversification that is likely necessary for the higher-order cognitive processes that are unique to humans.
    • Woods' Theorem: As the complexity of a system increases, the accuracy of any single agent's own model of that system decreases rapidly.
    • Carlos E. Perez: The brain performs compensation when it encounters something it does not expect. It learns how to correct itself through perturbative methods. That’s what Deep Learning systems also do, and it’s got nothing to do with calculating probabilities. It’s just a whole bunch of “infinitesimal” incremental adjustments.
    • @erickschonfeld: “What can one expect of a few wretched wires?”—telegraph skeptic, 1841
    • @ErikVoorhees: The average Bitcoin transaction fee ($10.17) is now more than twice the cost of Bitcoin itself when I first learned of it ($5) in 2011 :(
    • LightShadow: StackOverflow should be one of the first internet companies to accept cryptocurrency micro payments. All they'd have to do is skim a small percentage from people tipping each other pennies for good answers
    • @lworonowicz: I feel like I killed a family dog - had to decommission an old #Solaris server with uptime of 6519 days.
    • Google: Andromeda 2.1 latency improvements come from a form of hypervisor bypass that builds on virtio, the Linux paravirtualization standard for device drivers. Andromeda 2.1 enhancements enable the Compute Engine guest VM and the Andromeda software switch to communicate directly via shared memory network queues, bypassing the hypervisor completely for performance-sensitive per-packet operations.
    • iAfrikan News: The first-ever fiber optic cable with a route between the U.S. And India via Brazil and South Africa will soon be a reality. This is according to a joint provisioning agreement entered into by Seaborn Networks ("Seaborn") and IOX Cable Ltd ("IOX").
    • @iamdevloper: 1969: -what're you doing with that 2KB of RAM? -sending people to the moon 2017: -what're you doing with that 1.5GB of RAM? -running Slack
    • Eric Schmidt: Bob Taylor invented almost everything in one form or another that we use today in the office and at home.
    • @ben11kehoe: I am so on board with CRDT-based data stores providing state to FaaS at the edge. 
    • VMG: “Code is Law” fails again.
    • Paul Frazee: In Bitcoin, acceptance of a change is signaled by the miners - once some percent of the miners agree, the change is accepted. This means that hashing power is used as a measure of voting power, and so the political system is essentially plutocratic. How is that significantly better than the board of a publicly traded company?
    • gtrubetskoy: Professor Tanenbaum is one of the most respected computer scientists alive, and for Intel to include Minix in their chip and not let him know is kind of unprofessional and not very nice to say the least. That is his only (and quite fair) point.
    • jsolson: Both approaches have tradeoffs, although I think even with ENA AWS hits ~70µs typical round-trip-times while GCE gets down to ~40µs. Amazon's largest VMs in some families do advertise higher bandwidth than GCE does currently.
    • @brendangregg: AWS put lots of work into optimizing Xen, including net & disk SR-IOV (direct metal access). But their new optimized KVM is even better.
    • @wheremattisat: “Facebook and Google are proto-AIs and we are their microbiome. The objective function of those AIs today is to make more money” @timoreilly
    • @ossia: "Weeks of programming can save you hours of planning." - Anonymous
    • @sallamar: For kicks, we run over 6.2 billion requests a month on lambda (450% yoy) at @ExpediaEng. Still cheaper than renting an apartment for a year.
    • @Joab_Jackson: At this point,IBM #openwhisk is the most viable #open source #serverless platform—@ryan_sb @thecloudcastnet #podcast
    • Polvi: I think PaaS is dead. That's why you see OpenShift and Cloud Foundry and everyone pivoting to Kubernetes. What's going to happen is PaaS will be reborn as serverless on the other side of the Kubernetes transition.
    • mmgutz: We're running our Debian farm on Azure thanks to startup perks. It's been up 100% for us the last 2.5 years. Azure service is no less or better than AWS.
    • zzzeek: I switch between multiple versions of MySQL and MariaDB all day long. If you aren't using specific things like MySQL's JSON type or NDB storage engine or expecting CHECK constraints to enforce on MySQL (oddly omitted from this feature comparison!), there is nothing different at all from a developer point of view, beyond the default values of flags which honestly change more between MySQL releases than anything else.
    • SEJeff: They [Azure] allow you to have native RDMA[1] for your VMs, something neither amazon or google will give you. As an oldhat Linux/Unix guy, it is somewhat amusing to think of Microsoft's cloud offering as the high perf one, but the facts don't lie. If you have true HPC style workloads such as bioinformatics, oil/natgas exploration, finance, etc, the extra node to node communication bits are necessary. The QDR fabric they have has a native speed of 40 Gbps. It is a shame they don't have FDR (56G) or EDR (100G), but still is quite impressive depending on your app. This also could be a game changer for large MPI jobs.
    • johnnycarcin: I've honestly yet to see a customer moving to Azure who has more than 50% Windows based systems. Almost everyone I've worked with only uses Windows Server for their SQL Server services, outside of that it's RHEL, CentOS or Ubuntu.
    • lurchedsawyer: So to answer your question as to what is needed for Azure to become a viable alternative to AWS: I would say about 10 years.
    • @mjpt777: If Google thinks latency trumps bandwidth then they should look to software before hardware for the main source of latency.
    • Ben Kehoe: Like so many things in life, serverless is not an all-or-nothing proposition. It’s a spectrum — and more than that, it has multiple dimensions along which the degree of serverlessness can vary
    • zzzeek: If I was doing brand new development somewhere I'm sure I'd use Postgresql, since from a developer point of view it's the most consistent and flexible. While for the last few years I've worked way more with MySQL / MariaDB and at the moment the MySQL side of things is a bit more familiar to me, I still appreciate PG's vastly superior query planner and index features.
    • There's more. Much more. Click through for more. More. More. More.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...