Monday, April 9, 2018

Give Meaning to 100 billion Events a Day - The Analytics Pipeline at Teads

This is a guest post by Alban Perillat-Merceroz, Software Engineer at Teads.tv.

In this article, we describe how we orchestrate Kafka, Dataflow and BigQuery together to ingest and transform a large stream of events. Once scale and latency constraints come into play, reconciling and reordering these events becomes a challenge; here is how we tackle it.
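This excerpt does not show how Teads solves the reordering problem. As general background, here is a minimal Python sketch of event-time reordering with a watermark: the watermark trails the largest timestamp seen so far by a fixed `allowed_lateness`, buffered events at or below the watermark are released in order, and anything arriving behind the watermark is treated as too late. The `Event` type, the `reorder` function and the drop-late policy are illustrative assumptions, not the Dataflow pipeline the article describes.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass(order=True)
class Event:
    # Only event_time takes part in ordering; payload is ignored by comparisons.
    event_time: int
    payload: str = field(compare=False)


def reorder(events: Iterable[Event], allowed_lateness: int) -> Iterator[Event]:
    """Yield events in event-time order, assuming disorder is bounded.

    The watermark is (max event_time seen) - allowed_lateness. Buffered
    events at or below the watermark are emitted; events that arrive
    below the watermark are dropped in this sketch (a real pipeline might
    route them to a side output instead).
    """
    buffer: List[Event] = []  # min-heap keyed on event_time
    max_seen: Optional[int] = None
    watermark: Optional[int] = None
    for ev in events:
        if watermark is not None and ev.event_time < watermark:
            continue  # too late: events after it in time were already emitted
        heapq.heappush(buffer, ev)
        max_seen = ev.event_time if max_seen is None else max(max_seen, ev.event_time)
        watermark = max_seen - allowed_lateness
        while buffer and buffer[0].event_time <= watermark:
            yield heapq.heappop(buffer)
    # End of stream: flush the remaining buffered events, still in order.
    while buffer:
        yield heapq.heappop(buffer)
```

With `allowed_lateness=2`, the arrival sequence 1, 3, 2, 5, 4, 9, 6, 0 comes out as 1, 2, 3, 4, 5, 9: the events at 6 and 0 arrive after the watermark has already moved to 7. A larger lateness keeps more stragglers at the cost of holding events longer, which is exactly the latency trade-off mentioned in the paragraph above.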


Teads for Publisher, one of the webapps powered by Analytics

 

In digital advertising, day-to-day operations generate a lot of events we need to track in order to transparently report campaign performance. These events come from:

  • Users’ interactions with the ads, sent by the browser. These events are called tracking events and can be standard (start, complete, pause, resume, etc.) or custom events coming from interactive creatives built with Teads Studio. We receive about 10 billion tracking events a day.
  • Events coming from our back-ends, mostly about ad auction details (real-time bidding processes). We generate more than 60 billion of these events daily, before sampling, and expect this number to double in 2018.
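To put these daily volumes in perspective, here is a quick back-of-the-envelope conversion to average events per second. It assumes traffic is spread evenly over the day, which ad traffic is not, so real peaks are higher; the doubled back-end figure simply applies the 2018 projection.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day: float) -> float:
    """Average events per second, assuming traffic is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


tracking = average_rate(10e9)       # about 10 billion tracking events a day
backend = average_rate(60e9)        # 60+ billion back-end events a day, before sampling
backend_2018 = average_rate(120e9)  # back-end volume if it doubles in 2018 as projected

for name, rate in [("tracking", tracking), ("back-end", backend), ("back-end x2", backend_2018)]:
    print(f"{name}: ~{rate:,.0f} events/s")
```

That works out to roughly 116,000 tracking events and 694,000 back-end events per second on average, before any daily peak.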

In this article, we focus on tracking events because they are on the most critical path of our business.

Simplified overview of our technical context with the two main event sources

 

Tracking events are sent by the browser over HTTP to a dedicated component that, amongst other things, enqueues them in a Kafka topic. Analytics is one of the consumers of these events (more on that below).

Our Analytics team takes care of these events. Its mission is defined as follows:

We ingest the growing amount of logs,
We transform them into business-oriented data,
Which we serve efficiently and tailored for each audience.
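The three-part mission maps onto a familiar ingest, transform, serve shape. Below is a toy, single-process Python sketch of that shape; the raw log format, the function names and the completion-rate metric are hypothetical illustrations (only the `start` and `complete` event types come from the text), whereas the real pipeline runs on Kafka, Dataflow and BigQuery.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical raw log format: "<unix_ts>,<campaign_id>,<event_type>"
Event = Tuple[int, str, str]


def ingest(lines: Iterable[str]) -> Iterator[Event]:
    """Ingest: parse raw lines, silently skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        ts, campaign, event_type = parts
        try:
            ts_int = int(ts)
        except ValueError:
            continue
        yield ts_int, campaign, event_type


def transform(events: Iterable[Event]) -> Dict[str, Counter]:
    """Transform: count each event type per campaign."""
    per_campaign: Dict[str, Counter] = defaultdict(Counter)
    for _ts, campaign, event_type in events:
        per_campaign[campaign][event_type] += 1
    return per_campaign


def completion_rate(per_campaign: Dict[str, Counter], campaign: str) -> float:
    """Serve: one business-oriented metric, completes per start for a campaign."""
    counts = per_campaign.get(campaign, Counter())
    starts = counts["start"]
    return counts["complete"] / starts if starts else 0.0
```

For example, feeding `["1,c1,start", "2,c1,complete", "3,c1,start"]` through `ingest` and `transform` gives a completion rate of 0.5 for `c1`.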

To fulfill this mission, we build and maintain a set of processing tools and pipelines. Due to the organic growth of the company and new product requirements, we regularly challenge our architecture.

Why we moved to BigQuery


Friday, April 6, 2018

Stuff The Internet Says On Scalability For April 6th, 2018

Hey, it's HighScalability time:

 

Programmable biology - engineered cells execute programmable multicellular full-adder logic. (Programmable full-adder computations)

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book, Explain the Cloud Like I'm 10, to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

  • $1: AI turning MacBook into a touchscreen; $2000/month: BMW goes subscription; 20MPH: 15′ Tall, 8000 Pound Mech Suit; 1,511,484 terawatt hours: energy use if bitcoin becomes world currency; $1 billion: Fin7 hacking group; 1.5 million: Ethereum TPS, sort of; 235x: AWK faster than Hadoop cluster; 37%: websites use a vulnerable JavaScript library; $0.01: S3, 1 Gig, 1 AZ;

  • Quotable Quotes:
    • Huang’s Law~ GPU technology advances 5x per year because the whole stack can be optimized. 
    • caseysoftware: Metcalfe lives here in Austin and is involved in the local startup community in a variety of ways.  One time I asked him how he came up with the law and he said something close to: "It's simple! I was selling network cards! If I could convince them it was more valuable to buy more, they'd buy more!" As an EE who studied networks, etc in college, it was jarring but audacious and impressive.  He was either BSing all of us ~40 years ago or in that conversation a few years ago.. but either way, he helped make our industry happen.
    • Adaptive nodes: the consensus that the learning process is attributed solely to the synapses is questioned. A new type of experiments strongly indicates that a faster and enhanced learning process occurs in the neuronal dendrites, similarly to what is currently attributed to the synapses
    • @dwmal1: Spotted this paper via @fanf. The idea is amazing: "We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread", and find that several frameworks fail to beat a single core even when given 128 cores.
    • David Rosenthal: But none of this addresses the main reason that flash will take a very long time to displace hard disk from the bulk storage layer. The huge investment in new fabs that would be needed to manufacture the exabytes currently shipped by hard disk factories, as shown in the graph from Aaron Rakers. This investment would be especially hard to justify because flash as a technology is close to the physical limits, so the time over which the investment would have to show a return is short.
    • @asymco: There are 400 million registered bike-sharing users and 23 million shared bikes in China. There were approximately zero of either in 2016. Fastest adoption curve I’ve ever seen (and I’ve seen 140).
    • @anildash: Google’s decision to kill Google Reader was a turning point in enabling media to be manipulated by misinformation campaigns. The difference between individuals choosing the feeds they read & companies doing it for you affects all other forms of media.
    • The Memory Guy: has recently been told that memory makers’ research teams have found a way to simplify 3D NAND layer count increases.
    • @JohnONolan: First: We seem to be approaching (some would argue, long surpassed) Slack-team-saturation. It’s just not new or shiny anymore, and where Slack was the “omg this is so much better than before” option just a few years ago — it has now become the “ugh… not another one” thing
    • Memory Guy: The industry has  moved a very long way over the last 40 years, but I need not mention this to anyone who’s involved in semiconductors.  In 1978 a Silicon Valley home cost about $100,000, or about the cost of a gigabyte of DRAM.  Today, 40 years later, the average Silicon Valley home costs about $1 million and a gigabyte of DRAM costs about $7.
    • CockroachDB: A three-node, fully-replicated, and multi-active CockroachDB 2.0 cluster achieves a maximum throughput of 16,150 tpmC on a TPC-C dataset. This is a 62% improvement over our 1.1 release. Additionally, the latencies on 2.0 dropped by up to 81% compared to 1.1, using the same workload parameters. That means that our response time improved by 544%.
    • @Carnage4Life: Interesting thread about moving an 11,000 user community from Slack to Discourse. It's now quite clear that Slack is slightly better than email for small groups but is actually worse than the alternatives for large groups
    • Paul Barham: You can have a second computer once you’ve shown you know how to use the first one.
    • Nate Kupp~ petabyte Hadoop cluster Apple uses to understand battery life on iPhone and iPad by looking at logging data coming off those devices.
    • More quotes. More stuff. Go get it.
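The COST metric in @dwmal1's quote is the configuration a parallel system needs before it outperforms a competent single-threaded implementation. Here is a minimal Python sketch, assuming core count is the configuration being varied; the runtimes are made-up numbers for two hypothetical frameworks, not measurements from the paper.

```python
import math
from typing import Dict


def cost(parallel_runtimes: Dict[int, float], single_thread_runtime: float) -> float:
    """COST: the smallest measured core count that beats the single-threaded baseline.

    parallel_runtimes maps core count -> runtime in seconds for the parallel system.
    Returns math.inf when no measured configuration is faster ("unbounded" COST).
    """
    for cores in sorted(parallel_runtimes):
        if parallel_runtimes[cores] < single_thread_runtime:
            return cores
    return math.inf


# Made-up measurements for two hypothetical frameworks; single-thread baseline: 60 s.
framework_a = {1: 900.0, 16: 120.0, 64: 40.0, 128: 25.0}
framework_b = {1: 2000.0, 128: 75.0}
print(cost(framework_a, 60.0))  # 64
print(cost(framework_b, 60.0))  # inf
```

A framework like `framework_b` is the case the quote describes: even at 128 cores it never beats one thread, so its COST is unbounded.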

Don't miss all that the Internet has to say on Scalability; become eventually consistent with all scalability knowledge (this post has many more items to read, so please keep on reading)...

Thursday, April 5, 2018

Do you have too many microservices? - Five Design Attributes that can Help

This is a guest Post by Jake Lumetta, Founder and CEO, ButterCMS, an API-first CMS. For more content like this, follow @ButterCMS on Twitter and subscribe to our blog.

Are your microservices too small or too tightly coupled? Are you confident in your decision-making about service boundaries? In interviews, dozens of experienced CTOs described the design attributes they consider when creating a set of microservices. This article distills that wisdom into five key principles to help you better design microservices.

The importance of microservice boundaries

The design attributes discussed below matter because reaping the benefits of microservices requires designing thoughtful microservice boundaries.

Getting these boundaries right is one of the major challenges of creating a new system with a microservice architecture. The topic came up when I mentioned that one of the core benefits of developing new systems with microservices is that the architecture allows developers to build and modify individual components independently, but that problems can arise when it comes to minimizing the number of callbacks between APIs. The solution, according to McFadden, is to apply the appropriate service boundaries.

But in contrast to the sometimes difficult-to-grasp and abstract concept of domain-driven design (DDD), a framework for microservices, I’ll be as practical as I can in this chapter as I discuss the need for well-defined microservice boundaries with some of our industry’s top CTOs.
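To make the callback problem concrete, one rough signal is how many calls cross a service boundary for a given grouping of components. The Python sketch below counts them for two candidate groupings; the component names, call counts and groupings are hypothetical, and this heuristic is an illustration, not the method the interviewed CTOs describe.

```python
from typing import Dict, Tuple

# Hypothetical call graph: (caller, callee) -> calls per user request.
CALLS: Dict[Tuple[str, str], int] = {
    ("orders", "inventory"): 3,
    ("orders", "payments"): 1,
    ("inventory", "catalog"): 4,
    ("payments", "fraud"): 2,
}


def cross_boundary_calls(calls: Dict[Tuple[str, str], int], service_of: Dict[str, str]) -> int:
    """Sum the calls whose caller and callee are assigned to different services."""
    return sum(
        n for (caller, callee), n in calls.items()
        if service_of[caller] != service_of[callee]
    )


# Grouping 1: one service per component (very fine-grained).
fine = {c: c for c in ["orders", "inventory", "catalog", "payments", "fraud"]}
# Grouping 2: components that talk to each other heavily share a service.
coarse = {"orders": "ordering", "inventory": "stock", "catalog": "stock",
          "payments": "billing", "fraud": "billing"}

print(cross_boundary_calls(CALLS, fine))    # 10
print(cross_boundary_calls(CALLS, coarse))  # 4
```

Fewer cross-boundary calls is only one signal: pushed to the extreme it yields a single monolith, so it has to be weighed against the independent development that motivates microservices in the first place.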

First, avoid arbitrary rules


Monday, April 2, 2018

How ipdata serves 25M API calls from 10 infinitely scalable global endpoints for $150 a month

This is a guest post by Jonathan Kosgei, founder of ipdata, an IP Geolocation API. 

I woke up on Black Friday last year to a barrage of emails from users reporting 503 errors from the ipdata API.

Our users typically call our API on each page request on their websites to geolocate their users and localize their content. So this particular failure was directly impacting our users’ websites on the biggest sales day of the year. 

I only lost one user that day but I came close to losing many more.

This sequence of events and its inexplicable nature (CPU, memory and I/O were nowhere near capacity), along with concerns about how well (if at all) we would scale given the outage, was a big wake-up call to rethink our existing infrastructure.
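Because users call the API on every page request, any API outage flows straight into their pages. One defensive pattern on the calling side, sketched here in Python under assumptions: cache results per IP for a TTL, and when a lookup fails fall back to a stale cached answer or to an empty result so the page still renders. `GeoClient`, the injected `fetch` function, the `country_code` field and the one-hour TTL are all hypothetical; this is neither ipdata's client library nor the infrastructure change the post goes on to describe.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Geo = Dict[str, str]


class GeoClient:
    """Hypothetical per-IP cache with graceful degradation around a lookup call."""

    def __init__(self, fetch: Callable[[str], Geo], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # e.g. a function wrapping the HTTP API call
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Geo]] = {}

    def lookup(self, ip: str) -> Geo:
        now = self._clock()
        hit: Optional[Tuple[float, Geo]] = self._cache.get(ip)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cache hit: no API call at all
        try:
            geo = self._fetch(ip)
        except Exception:
            # Lookup failed (for example an HTTP 503): prefer a stale answer,
            # otherwise an empty result so the page renders unlocalized.
            return hit[1] if hit is not None else {}
        self._cache[ip] = (now, geo)
        return geo
```

Injecting `fetch` and `clock` keeps the sketch testable without a network; in practice the cache would also need a size bound.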

Our tech stack at the time


Friday, March 30, 2018

Stuff The Internet Says On Scalability For March 30th, 2018

Hey, it's HighScalability time:

 

“Objective painting is not good painting unless it is good in the abstract sense. A hill or tree cannot make a good painting just because it is a hill or tree. It is lines and colors put together so that they may say something.” – Georgia O’Keeffe

 

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book, Explain the Cloud Like I'm 10, to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

 

  • 6,000: new viruses spotted by AI; 300,000: Uber requests per second; 10TB & 600 years: new next-gen optical disk; 32,000: sites running Coinhive’s JavaScript miner code; $1 billion: Uber loss per quarter; 3.5%: global NAND flash output lost to power outage; 100TB: new SSD; 48TB: RAM on one server; 200 million: Telegram monthly active users; 2,000: days Curiosity Rover on Mars; 225: emerging trends; 4,425: SpaceX satellites approved;

  • Quotable Quotes:
    • @msuriar: Uber's worst outage ever: - Master log replication to S3 failed. - Logs backed up on primary. - Alerts fire but are ignored. - Disk full on primary. - Engineer deletes unarchived WAL files. - Config error prevents failover/promotion. #SREcon
    • @thecomp1ler: Most powerful Xeon is the 28 core Platinum 2180 at $10k RSP and >200W TDP. Due to the nature of Intel turbo boost it almost always operates at TDP under load. Our ARM is the Centriq 2452 at less than $1400 RSP, 46 cores and 120W TDP, that it never hits. Beats Xeon 9/10 workloads.
    • Arjun Narayan: This is also finally the year when people start to wake up and realize they care about serializability, because Jeff Dean said it's important. Michael Stonebraker, meanwhile, has been shouting for about 20 years and is very frustrated that people apparently only care when The Very Important Googlers say it's important.
    • @xleem: Simple Recipe: 1) Identify system boundaries 2) Define capabilities exposed 3) Plain English definitions of availability 4) Define technical SLO 5) measure baseline 6) set targets 7) iterate #SREcon
    • Jordan Ellenberg~ For seven years, a group of students from MIT exploited a loophole in the Massachusetts State Lottery’s Cash WinFall game to win drawing after drawing, eventually pocketing more than $3 million. How did they do it? How did they get away with it? And what does this all have to do with mathematical entities like finite geometries, variance of probability distributions, and error-correcting codes?
    • Jeff Dean: ML hardware is at its infancy. Even faster systems and wider deployment will lead to many more breakthroughs across a wide range of domains. Learning in the core of all of our computer systems will make them better/more adaptive. There are many opportunities for this.
    • Teller: Here's a compositional secret. It's so obvious and simple, you'll say to yourself, "This man is bullshitting me." I am not. This is one of the most fundamental things in all theatrical movie composition and yet magicians know nothing of it. Ready? Surprise me.
    • @kcoleman: OH (from an awesome Lyft driver): “Today has been great. I’ve been blessed by the algorithm.” Immediately had an eerie feeling that this could become an increasingly common way to describe a day.
    • Jim Whitehurst [Red Hat chief executive]: We added hundreds of customers in the last year, while Pivotal only added 44 new customers. Their average deal size is $1.5 million, quite large. So they are more the top-down, big company kind of focus. We have over 650 customers [for OpenShift], we added hundreds this past year, and we are growing faster than Pivotal. We thought we were performing favorably compared to them, but this is the first time we had the data to really compare.
    • @danctheduck: My GDC takeaway: Everyone who is making games-as-a-service is getting most of their actual traction by building co-op MMOs. But very few of them realize this is what they are doing. So they keep sabotaging their communities with bizarro design philosophies. Short version (1/2) - People want to do fun activities with friends. The higher order bit. Super sticky. - Core gameplay collapses to this. Ex: Team vs Team plus match making is just another way of making 'fair' Team vs Environment (aka coop) - We focus a lot on PvP, competition or esports. But many of those are *aspirational* for players. Not actually desirable. - Or we fixate on single player genre tropes. Which may be a familiar reason to *start* playing, but aren't always key to why people *continue* playing.
    • @iamtrask: Lots of folks are optimistic about #blockchain. I recently came across a difficult question...If a zero-knowledge proof can prove to users that a centralized service performed honest computation, why decentralize it? We live in free markets... I see a correction coming...
    • Katherine Bourzac: Shanbhag thinks it’s time to switch to a design that’s better suited for today’s data-intensive tasks. In February, at the International Solid-State Circuits Conference (ISSCC), in San Francisco, he and others made their case for a new architecture that brings computing and memory closer together. The idea is not to replace the processor altogether but to add new functions to the memory that will make devices smarter without requiring more power. Industry must adopt such designs, these engineers believe, in order to bring artificial intelligence out of the cloud and into consumer electronics.
    • @dylanbeattie: When npm was first released in 2010, the release cycle for typical nodeJS package was 4 months, and npm restore took 15-30 seconds on an average project. By early 2018, the average release cycle for a JS package was 11 days, and the average npm restore step took 3-4 minutes. 1/11
    • @davemark: THREAD:  J.C.R. Licklider was one of the true pioneers of computer science. Back in about 1953, Licklider built something called a Watermelon Box. If it heard the word watermelon, it would light up an LED. That’s all it did. But it was the start of a huge wave. //@reneritchie
    • smudgymcscmudge: I have to admit that the switch from “free software” to “open source” worked on me. Early in my career I was intrigued by the idea, but couldn’t get past how “free” software was a sustainable model. I started to get it at around the same time the terminology changed.
    • @msuriar: CPU attack: spin up something that burns 100% CPU. (openssl or something). What do you expect to happen? What actually happens? #SREcon
    • Forrest Brazeal: The way I describe it is: functions as a service are cloud glue. So if I’m building a model airplane, well, the glue is a necessary part of that process, but it’s not the important part. Nobody looks at your model airplane and says: “Wow, that’s amazing glue you have there.” It’s all about how you craft something that works with all these parts together, and FaaS enables that.
    • You want more quotes? There are lots more. Can you handle the truth? Click through and test yourself.
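Steps 4 to 6 of @xleem's recipe (define an SLO, measure a baseline, set targets) come down to simple arithmetic. As a worked illustration, this Python sketch converts an availability SLO into an error budget and checks how much of a request-based budget is left; the 30-day window, the 99.9% target and the request counts are illustrative assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of total downtime an availability SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_requests: int, total_requests: int) -> float:
    """Fraction of a request-based error budget left (negative once it is overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("need at least one request")
    allowed_bad = (1.0 - slo) * total_requests
    actual_bad = total_requests - good_requests
    return 1.0 - actual_bad / allowed_bad


print(f"{error_budget_minutes(0.999):.1f} min per 30 days at 99.9%")
print(f"{budget_remaining(0.999, 999_500, 1_000_000):.2f} of the budget left")
```

A 99.9% target allows about 43.2 minutes of full downtime in 30 days; with 500 failures out of a million requests, half of the request budget is spent.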

Don't miss all that the Internet has to say on Scalability; become eventually consistent with all scalability knowledge (this post has many more items to read, so please keep on reading)...

Tuesday, March 27, 2018

Sponsored Post: Educative, Clover, Triplebyte, Exoscale, Symbiont, Loupe, Etleap, Aerospike, Scalyr, Domino Data Lab, MemSQL

Who's Hiring? 

  • Clover is looking for seasoned software engineers to help us solve the most complicated problem in the world: healthcare. We're using sophisticated data analytics, custom software, and machine learning to coordinate care and build a clearer model of our members' health and risk factors. We are on a mission to help seniors and low-income members live healthier while keeping costs down. This is an opportunity for those who want to be at the intersection of health and technology and thrive in a collaborative environment as well as the freedom of self-direction. If you're interested, please directly apply here!

  • Triplebyte now hires software engineers for top tech companies and hundreds of the most exciting startups like Apple, Dropbox, Mixpanel, and Instacart. They identify your strengths from an online coding quiz and let you skip resume and recruiter screens at multiple companies at once. It's free, confidential, and background-blind. Apply here.

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • 5 Signs You’ve Outgrown DynamoDB. Companies often select a database that seems to be the best choice at first glance, as well as the path of least resistance, and then are subsequently surprised by cost overruns and technology limitations that quickly hinder productivity and put the business at risk. This seems to be the case with many enterprises that chose Amazon Web Service’s (AWS) DynamoDB. In this white paper we’ll cover elements of costing as well as the results of benchmark-based testing. Read 5 Signs You’ve Outgrown DynamoDB to determine if your organization has outgrown this technology.

  • Advertise your event here!

Cool Products and Services

  • For heads of IT/Engineering responsible for building an analytics infrastructure, Etleap is an ETL solution for creating perfect data pipelines from day one. Unlike older enterprise solutions, Etleap doesn’t require extensive engineering work to set up, maintain, and scale. It automates most ETL setup and maintenance work, and simplifies the rest into 10-minute tasks that analysts can own. Read stories from customers like Okta and PagerDuty, or try Etleap yourself.

  • Educative provides interactive courses for software engineering interviews created by engineers from Facebook, Microsoft, eBay, and Lyft. Prepare in programming languages like Java, Python, JavaScript, C++, and Ruby. Design systems like Uber, Netflix, Instagram and more. More than 10K software engineers have used Coderust and Grokking the System Design Interview to get jobs at top tech companies like Facebook, Google, Amazon, Microsoft, etc. Ace your software engineering interviews today. Get started now

  • Gartner’s 2018 Magic Quadrant for Data Science and Machine Learning Platforms. Read Gartner’s most recent 2018 release of the Magic Quadrant for Data Science and Machine Learning Platforms. A complimentary copy of this important research report into the data science platforms market is offered by Domino. Download the report to learn: 
    • How Gartner defines the Data Science Platform category, and their perspective on the evolution of the data science platform market in 2018. 
    • Which data science platform is right for your organization. 
    • Why Domino was named a Visionary in 2018.

  • Exoscale GPU Cloud Servers. Powerful on-demand GPU. Perfect for your machine learning, artificial intelligence, and encoding workloads. GPU instances work exactly like other instances: they are billed by the minute and integrate seamlessly with your existing infrastructure. Tap the GPU's full power with direct passthrough access. Speed up TensorFlow or any other Deep Learning, Big Data, AI, or Encoding workload. Start your GPU instances via our API or with your existing deployment management tools. Add parallel computational power to your stack with no effort. Get Started

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services: all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Scale your Job Search with Triplebyte

Triplebyte is unique because they're a team of engineers running their own centralized technical interview. The evaluation quality is so good that companies like Apple, Dropbox, Mixpanel, and Instacart now let every engineer Triplebyte recommends skip steps in the application process.

They give personal assistance to discover which roles you're most excited about, schedule your final interviews back-to-back, and help you negotiate with multiple companies at once.

Triplebyte now works with top tech companies and hundreds of the most exciting pre-screened startups.

It's free, confidential, and background-blind for engineers. Take Triplebyte's online coding quiz to see if they can help you scale your career faster. (Engineers with architecture and system design experience tend to do especially well.)


The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don't be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today



Friday, March 16, 2018

Stuff The Internet Says On Scalability For March 16th, 2018

Hey, it's HighScalability time:

 

Hermetic symbolism was an early kind of programming. Symbols explode into layers of other symbols, like a programming language, only the instruction set is the mind.

 

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book, Explain the Cloud Like I'm 10, to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

 

  • ~30: AWS services used by iRobot; 450,000: Shopify S3 operations per second; $240: yearly value of your data; ~day: time to load a terabyte from Postgres into BigQuery; 5 million: viewers for top Amazon Prime shows; 130,000: Airbusians move from Microsoft Office to Google Suite; trillion: rows per second processed by MemSQL; 38 million: Apple Music paid members; 4 million: Microsoft git commits for a Windows release; 

  • Quotable Quotes:
    • Stephen Hawking: Although I cannot move and I have to speak through a computer, in my mind I am free.
    • Roger Penrose: Despite [Stephen Hawking's] terrible physical circumstance, he almost always remained positive about life. He enjoyed his work, the company of other scientists, the arts, the fruits of his fame, his travels. He took great pleasure in children, sometimes entertaining them by swivelling around in his motorised wheelchair. Social issues concerned him. He promoted scientific understanding. He could be generous and was very often witty. On occasion he could display something of the arrogance that is not uncommon among physicists working at the cutting edge, and he had an autocratic streak. Yet he could also show a true humility that is the mark of greatness.
    • Raymond Wong: The most revealing part of the report exposes how Apple didn't even have plans to integrate Siri into HomePod until after the Amazon Echo launched
    • @ajaynairthinks: When I tell people I am the founding PM on Lambda, the question I often get is how the idea for #AWS Lambda/#Serverless came about in the first place. The truth is, it's way too hard to point to one person/event as the defining moment.
    • @ImerM1: Just did some budget analysis. Turns out we've managed to reduce our AWS costs for RNA-seq by 90%, by using Lambda, Batch and Step Functions 
    • Daniel Lemire: My numbers are clear: in my tests, it is three times faster to sum up the values in a LinkedHashSet.
    • @Tr0llyTr0llFace: The Bitcoin network is processing 200,000 transactions per day at a cost of $3B per year. Visa is processing 150,000,000 transactions per day at a cost of $8B per year. Visa also does insurance, credit, customer support...With Bitcoin your funds are lost if you forget your PIN.
    • Linus Torvalds: It looks like the IT security world has hit a new low. 
    • Steve Jobs~ When you're the janitor, reasons matter. Somewhere between the janitor and the CEO, reasons stop mattering. That Rubicon is crossed when you become a VP.
    • @ByRosenberg: The San Jose Mercury News, Oakland Tribune, Contra Costa Times and their sister papers had 1,000 editorial employees in 2000. Now they're down to 100. Devastating to see Bay Area news coverage decimated in one of the world's most important places to cover 
    • @jimwebber: Amazed at the medical/genomics paper I've just reviewed where the whole thing was built in Neo4j and processed via Cypher. Our user community is extraordinary.
    • @maria_fibonacci: Mansplainings aside, I build things with k8s as my day job, and I build things with Elixir because it's my fav lang. Even if they're different types of software, the reasons people end up using them are basically the same, yet the additional layer of complexity matters.
    • @gwenshap: Things I learned at #StrataData last week. I only attended one session, but talked to 100+ attendees. So, it is the "hallway" view. 1. Machine learning is sexier than ever. Lots of talk. I got the impression that organizations built around ML can do it, but existing business still didn't make the leap. My bet is that in 2-3 years we'll start seeing the "early majority".
    • Jim Handy: In a nutshell Mark [Thirsk] is telling us that there may be some stress on DRAM and NAND flash wafer supplies, but the companies that will feel the greatest impact will be tier 2 chip makers who purchase the lowest cost wafers. 
    • michaelt: So I'm a mid-level manager at a company with a few hundred developers. The more developers you hire, the higher the chance you'll start collecting people with obscure (but usually reasonably easy to satisfy) tool preferences. You know, the guy who changes his IDE to emacs key bindings, the guy who does all his e-mail using mutt when everyone else uses gmail, the girl who uses a Dvorak keyboard layout, the guy who insists he works best on a 1024x768 screen, and so on. Having tried a variety of industry tools and thought about how you work best is usually a good sign[1]. Their favourite tools aren't my favourite tools, but they work for me so if they're not happy, I'm not happy.
    • Paul Kunert: Airbus will organise information around “teams, topics and programmes” and “let people go to the information that they need for their jobs… almost the opposite from an environment that is based on email where you receive whatever it is that others decide you can receive.”
    • Geoff Huston: However, I also suspect that the intelligence agencies are already focussing elsewhere. If the network is no longer the rich vein of data that it used to be, then the data collected by content servers is a more than ample replacement. If the large content factories have collected such a rich profile of my activities, then it seems entirely logical that they will be placed under considerable pressure to selectively share that profile with others. So, I’m not optimistic that I have any greater level of personal privacy than I had before. Probably less. Meet the new boss. Same as the old boss.
    • @PaulDJohnston: Counter prediction: very few software engineers will need to know about k8s et al because #serverless. It's code that matters and orchestration is becoming a commodity. (Sounding like @swardley). Also, .zip is a more robust deployment artifact
    • Joel Hruska: In all cases, the pirated version of the [Final Fantasy XV] was faster, by 5 percent to a whopping 33 percent, depending on the scene...The implications of these findings are straightforward: The piracy protections baked into the game are hitting overall performance, causing a significant set of issues. Companies regularly deny it happens, but tests like this punch holes in such claims. 
    • Mathieu Ripert: we [Instacart] found out that with quantile regression we were able to plan deliveries closer to their due time without increasing late percentage. This effect allowed us to explore more trip combinations in our fulfillment engine and therefore increase efficiency (one of our most important metrics) by 4%.
    • @vijaypande: We argue that the future of predicting the interactions between a drug and its prospective target demands more than simply applying deep learning algorithms from other domains, like vision and natural language, to molecules. 
    • John Allspaw: The increasing significance of our systems, the increasing potential for economic, political, and human damage when they don’t work properly, the proliferation of dependencies and associated uncertainty — all make me very worried. And, if you look at your own system and its problems, I think you will agree that we need to do more than just acknowledge this — we need to embrace it.
    • Kevlin Henney: Move Slow and Mend Things
    • Read on for more quotes.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Tuesday
Mar132018

Sponsored Post: Clover, Triplebyte, Exoscale, Symbiont, Loupe, Etleap, Aerospike, Scalyr, Domino Data Lab, MemSQL

Who's Hiring? 

  • Clover is looking for seasoned software engineers to help us solve the most complicated problem in the world: healthcare. We're using sophisticated data analytics, custom software, and machine learning to coordinate care and build a clearer model of our members' health and risk factors. We are on a mission to help seniors and low-income members live healthier while keeping costs down. This is an opportunity for those who want to work at the intersection of health and technology and who thrive in a collaborative environment while enjoying the freedom of self-direction. If you're interested, please apply directly here!

  • Triplebyte now hires software engineers for top tech companies like Apple, Dropbox, Mixpanel, and Instacart, and for hundreds of the most exciting startups. They identify your strengths from an online coding quiz and let you skip resume and recruiter screens at multiple companies at once. It's free, confidential, and background-blind. Apply here.

  • Symbiont is a New York-based financial technology company building new kinds of computer networks to connect independent financial institutions together and allow them to share business logic and data in real time. This involves developing a distributed system which is also decentralized, and which allows for the creation of smart contracts, self-executing cryptographic agreements among counterparties. To do so, we're using a lot of techniques in blockchain technology, as well as those from traditional distributed systems, programming language design and cryptography. We are hiring for a number of roles, from entry-level to expert, including Haskell Backend Engineer, Database Engineer, Product Engineer, Site Reliability Engineer (SRE), Programming Language Engineer and SecOps Engineer. To find out more, just e-mail us your resume.

  • Need excellent people? Advertise your job here! 

Fun and Informative Events

  • 5 Signs You’ve Outgrown DynamoDB. Companies often select a database that seems to be the best choice at first glance, as well as the path of least resistance, and then are subsequently surprised by cost overruns and technology limitations that quickly hinder productivity and put the business at risk. This seems to be the case with many enterprises that chose Amazon Web Services’ (AWS) DynamoDB. In this white paper we’ll cover elements of costing as well as the results of benchmark-based testing. Read 5 Signs You’ve Outgrown DynamoDB to determine if your organization has outgrown this technology.

  • Advertise your event here!

Cool Products and Services

  • Educative provides interactive courses for software engineering interviews created by engineers from Facebook, Microsoft, eBay, and Lyft. Prepare in programming languages like Java, Python, JavaScript, C++, and Ruby. Design systems like Uber, Netflix, Instagram and more. More than 10K software engineers have used Coderust and Grokking the System Design Interview to get jobs at top tech companies like Facebook, Google, Amazon, Microsoft, etc. Ace your software engineering interviews today. Get started now

  • Gartner’s 2018 Magic Quadrant for Data Science and Machine Learning Platforms. Read Gartner’s most recent 2018 release of the Magic Quadrant for Data Science and Machine Learning Platforms. A complimentary copy of this important research report into the data science platforms market is offered by Domino. Download the report to learn: 
    • How Gartner defines the Data Science Platform category, and their perspective on the evolution of the data science platform market in 2018. 
    • Which data science platform is right for your organization. 
    • Why Domino was named a Visionary in 2018.

  • Exoscale GPU Cloud Servers. Powerful on-demand GPU. Perfect for your machine learning, artificial intelligence, and encoding workloads. GPU instances work exactly like other instances: they are billed by the minute and integrate seamlessly with your existing infrastructure. Tap the GPU's full power with direct passthrough access. Speed up TensorFlow or any other Deep Learning, Big Data, AI, or Encoding workload. Start your GPU instances via our API or with your existing deployment management tools. Add parallel computational power to your stack with no effort. Get Started

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission-critical applications have exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn:
    • Challenges of supporting digital business applications or Systems of Engagement.
    • Shortcomings of conventional databases.
    • The emergence of enterprise-grade NoSQL databases.
    • Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco.
    • How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO).

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services: all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30-day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Scale your Job Search with Triplebyte

Triplebyte is unique because they're a team of engineers running their own centralized technical interview. The evaluation quality is so good that companies like Apple, Dropbox, Mixpanel, and Instacart now let every engineer Triplebyte recommends skip steps in the application process.

They give personal assistance to discover which roles you're most excited about, schedule your final interviews back-to-back, and help you negotiate with multiple companies at once.

Triplebyte now works with top tech companies and hundreds of the most exciting pre-screened startups.

It's free, confidential, and background-blind for engineers. Take Triplebyte's online coding quiz to see if they can help you scale your career faster. (Engineers with architecture and system design experience tend to do especially well.)


The Solution to Your Operational Diagnostics Woes

Scalyr gives you instant visibility of your production systems, helping you turn chaotic logs and system metrics into actionable data at interactive speeds. Don't be limited by the slow and narrow capabilities of traditional log monitoring tools. View and analyze all your logs and system metrics from multiple sources in one place. Get enterprise-grade functionality with sane pricing and insane performance. Learn more today



Friday
Mar092018

Stuff The Internet Says On Scalability For March 9th, 2018

Hey, it's HighScalability time:

 

The largest simulation of the cosmos ever run finally produces a universe similar to our own. It required 24,000 processors and more than two months of computation, and it produced 500 terabytes of data.

 

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book—Explain the Cloud Like I'm 10—to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

 

  • 72 bits: Google's new quantum computer; 50,000: sites infected with cryptocurrency mining malware; $40 billion: purchases via talking tubes by 2022; $12,000: value of 1 million YouTube views a month; $15 billion: Netflix 2018 revenue; 

  • Quotable Quotes:
    • @ValaAfshar: Jeff Bezos, CEO @amazon: I very frequently get the question: "what's going to change in the next 10 years?" I almost never get the question: "what's not going to change in the next 10 years?" I submit to you that the second question is actually the more important of the two.
    • @svscarpino: Sharing and consuming fake news is highly concentrated. 0.1% of Twitter accounts share 80% of the fake news and 1% share 100%!!!  Incredible results by @davidlazer and collaborators. #complenet18
    • Tim Wu: An unwelcome consequence of living in a world where everything is “easy” is that the only skill that matters is the ability to multitask. At the extreme, we don’t actually do anything; we only arrange what will be done, which is a flimsy basis for a life.
    • pauldjohnston: if the CGI execution model was horizontally scalable, fault tolerant, spread across availability zones, stable and with a managed infrastructure and a secure and certified API Gateway in front of it with DDoS protection built in... and someone else was looking after it for me... (Not to mention all the other stuff AWS provides) then yes it's exactly like cgi-bin was back in the day.
    • Abu Sebastian: Computational memory: A memory unit that performs certain computational tasks in place. 
    • @MIT_CSAIL: The world's first online transaction happened over 45 years ago between MIT & Stanford students - and it was for weed
    • @cloud_opinion: don't forget "good enough" often wins. Fargate will be seen by many as good enough. Will k8s continue to be adopted? Yes. But, Fargate will reduce TAM for k8s distro companies.
    • @kylewillett: After finally getting hands-on with #akka streams to implement a solution to what would normally be a tricky async problem, I can now agree with the praise I've heard - an awesome api and great tool to have in the toolbox.
    • @awscloud: Amazon Redshift uses machine learning to automatically hop short queries to an express queue for fast processing. 
    • @postwait: Look, I'm gonna be the last one to defend InfluxDB... but for f*ck's sake don't use units per day in computing.  Please, please, please.... use per second numbers so you don't mislead or attempt to look large.
    • @hypervisible: Researchers build AI to identify gang members. When asked about potential misuses, presenter (a computer scientist at Harvard) says "I'm just an engineer."
    • @Jason: 4/We should also allow folks to build apartments and homes with NO PARKING spaces BUT with carports capable of getting ridesharing cars and people off streets during drop off and pickup. Right now we force folks to build X spots per Y residents, which is dated.
    • antiviral: OK... so they [Facebook] are saying they don't use your microphone to target ads. But how about precisely enumerating how FB uses your microphone?
    • Yep, there are a lot more quotes. Go get 'em or forever live in darkness.



Friday
Mar022018

Stuff The Internet Says On Scalability For March 2nd, 2018

Hey, it's HighScalability time: 

 

Algorithms described like IKEA instructions. Can anyone assemble these? (Algorithms and data structures)

 

If you like this sort of Stuff then please support me on Patreon. And please consider recommending my new book—Explain the Cloud Like I'm 10—to the whole entire world. 

 

  • $75 million: Dropbox saved moving out of S3; 159 million: Spotify monthly active users; 80 million: more records added to Have I Been Pwned; 9%: universe expanding faster than predicted; $2,222,279: Warren Buffett won his long bet against hedge fund managers; 60,000: Mayan houses found in Guatemala using LiDAR; $14.2 billion: PaaS revenue; ~180 million: years until first sun after the big whatever it was; $1,599: cost of stolen Extended Validation (EV) certificate; 8,000X: query speedup using GPU database; 2.4 million: Google requests to be forgotten; 6 minutes: time to IoT device attack on the internet; 103 million: tweets sent about the Olympics; 320,000: increase in Chloe Kim's twitter followers; 150 kg: acorns stored by woodpeckers in a telecom antenna; 0.14ms: fsync performance on Intel DC P3700; Q: earliest known article on Wikipedia; 800Gbps+: memcached reflection/amplification attacks; 2M+: Google-Landmarks image training set; 3 million: graphics cards purchased by cryptocurrency miners; 30%: Uber and Lyft drivers lose money; 

  • Quotable Quotes:
    • @mikko: Interesting point raised in a reddit thread: Satoshi's original bitcoins are now a quantum canary. Once we see them moving, we’ll know that someone has a functioning advanced quantum computer. It's just too big a prize not to be the first thing you’d do with a quantum computer.
    • @brettberson: I just learned from a former longtime Amazon employee, the idea for Prime came from an IC [individual contributor] engineer. He wrote up a 6 page memo. He was inspired by the Costco membership model. It was built as a test. It's now the key pillar of Amazon. The best ideas can come from anywhere.
    • Erica Klarreich: In a statistical analysis of nearly 1,000 networks drawn from biology, the social sciences, technology and other domains, researchers found that only about 4 percent of the networks (such as certain metabolic networks in cells) passed the paper’s strongest tests. And for 67 percent of the networks, including Facebook friendship networks, food webs and water distribution networks, the statistical tests rejected a power law as a plausible description of the network’s structure.
    • @kpkelleher: 531 ICOs that appeared in 2017 have already vanished. Together, they raised $233 million.
    • @darrenrovell: In November 2013, Jamie Siminoff came on Shark Tank valuing his WiFi enabled video doorbell at $7 million. Four sharks passed & @kevinolearytv offered his typical loan/royalty deal. Siminoff passed. That company became @ring & today sold for more than $1 billion to Amazon.
    • @tyleralove: As of today @bustle has fully adopted serverless. We’re down to 15 ec2 instances mostly comprised of self-managed HA Redis. We serve upwards of a billion requests a month to 80 million people using SSR preact and react. We are a thriving example of modern JavaScript at scale. We do all of this with a relatively tiny engineering team of 12 while simultaneously building a compelling-to-use product that was never focused on social media audience gaming or egregious engagement metric hacking.
    • abetusk: I've heard, and agree, that 95% of programming doesn't require any deep CS knowledge. The flip side of that is 5% of the time you will and for those 1/20 times you encounter a problem that requires theory, you're dead in the water unless you know how to identify it, how to solve it or where to look for solutions to it.
    • @geofft: At which point Trustico's CEO decided to EMAIL 23,000 CUSTOMER PRIVATE KEYS to Digicert, apparently in order to trigger that clause.
    • @swardley: The ONLY reason that Amazon is as big as it is today and continuing to grow rather than being constrained (as normally happens) is because competitor executives have utterly failed to adapt. This is not a market failure, it's a failure of executives ..
    • Andrei Barysevich: Contrary to a common belief that the security certificates circulating in the criminal underground are stolen from legitimate owners prior to being used in nefarious campaigns, we confirmed with a high degree of certainty that the certificates are created for a specific buyer per request only and are registered using stolen corporate identities, making traditional network security appliances less effective.
    • @KentonVarda: After I open sourced Protocol Buffers, the promo committee denied me for promotion (from Senior to Staff) because my packet contained no peer reviews from more-senior engineers who worked closely with me. (There were no such engineers.)
    • Jared Diamond: Why is there such widespread public opposition to science and scientific reasoning in the United States, the world leader in every major branch of science?
    • @sama: I wonder how much cryptocurrency is slowing the rate of AI progress by wildly driving up the price of GPUs...
    • So many more quotes. Don't miss out on all the smart stuff people have said. You will become so much smarter.

