Stuff The Internet Says On Scalability For May 3rd, 2019

Wake up! It's HighScalability time:

Event horizon? Nope. It's a close up of a security hologram. Makes one think.

Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. I wrote Explain the Cloud Like I'm 10 for people who need to understand the cloud. And who doesn't these days? On Amazon it has 45 mostly 5 star reviews (105 on Goodreads). They'll learn a lot and hold you in awe.

Number Stuff:

  • $1 trillion: Microsoft is the most valuable company in the world (for now)
  • 20%: global enterprises will have deployed serverless computing technologies by 2020
  • 390 million: paid Apple subscriptions; revenue from the services business climbed from $9.9 billion to $11.5 billion; services now account for “one-third” of the company’s gross profits
  • 1011: CubeSat missions
  • $326 billion: USA farm expenses in 2017
  • 61%: increase in average cyber attack losses, from $229,000 last year to $369,000 this year, a figure exceeding $700,000 for large firms versus just $162,000 in 2018.
  • $550: can yield 20x profit on the sale of compromised login credentials

Quotable Stuff:

  • Robert Lightfoot~ Protecting against risk and being safe are not the same thing. Risk is just simply a calculation of likelihood and consequence. Would we have ever launched Apollo in the environment we’re in today? Would Buzz and Neil have been able to go to the moon in the risk posture we live in today? Would we have launched the first shuttle with a crew? We must move from risk management to risk leadership. From a risk management perspective, the safest place to be is on the ground. From a risk leadership perspective, I believe that’s the worst place this nation can be.
  • Paul Kunert: In dollar terms, Jeff Bezos's cloud services wing grew 41 per cent year on year to $7.6bn, figures from Canalys show. Microsoft was up 75 per cent to $3.4bn and Google grew a whopping 83 per cent to $2.3bn.
  • @codinghorror: 1999 "MIT - We estimate that the puzzle will require 35 years of continuous computation to solve" 2019 "🌎- LOL" https://www.csail.mit.edu/news/programmers-solve-mits-20-year-old-cryptographic-puzzle …
  • @dvassallo: TIL what EC2's "Up to" means. I used to think it simply indicates best effort bandwidth, but apparently there's a hard baseline bottleneck for most EC2 instance types (those with an "up to"). It's significantly smaller than the rating, and it can be reached in just a few minutes. This stuff is so obscure that I bet 99% of Amazon SDEs that use EC2 daily inside Amazon don't know about these limits. I only noticed this by accident when I was benchmarking S3 a couple of weeks ago
  • @Adron: 1997: startup requires about a million $ just to get physical infra setup for a few servers. 2007: one can finally run stuff online and kind of skip massive hardware acquisitions just to run a website. 2017: one can scale massively & get started for about $10 bucks of infra.
  • Wired: One executive described Nadella’s approach as “subtle shade.” He never explicitly eighty-sixed a division or cut down a product leader, but his underlying intentions were always clear. His first email to employees ran more than 1,000 words—and made no mention of Windows. He later renamed the cloud offering Microsoft Azure. “Satya doesn’t talk shit—he just started omitting ‘Windows’ from sentences,” this executive says. “Suddenly, everything from Satya was ‘cloud, cloud, cloud!’ ”
  • @ThreddyTheTrex: My approach to side projects has evolved. Beginning of my career: “I will build everything from scratch using C and manage my own memory and I don’t mind if it takes 3 years.” Now: “I will use existing software that takes no more than 15 minutes to configure.”
  • btown: The software wouldn't have crashed if the user didn't click these buttons in a weird order. The bug was only one factor in a chain of events that led to the segfault.
  • @Tjido: There are more than 10,000 data lakes in AWS. @strataconf  #datalakes #stratadata
  • Nicolas Kemper: Accretive projects are everywhere: Museums, universities, military bases – even neighborhoods and cities. Key to all accretive projects is that they house an institution, and key to all successful institutions is mission. Whereas scope is a detailed sense of both the destination and the journey, a mission must be flexible and adjust to maximum uncertainty across time. In the same way, an institution and a building are often an odd pair, because whereas the building is fixed and concrete, finished or unfinished, an institution evolves and its work is never finished.
  • @markmadsen: Your location-identified tweets plus those of two friends on twitter predict your location to within 100m 77% of the time. Location data is PII and must be treated as such #StrataData
  • Backblaze: The Annualized Failure Rate (AFR) for Q1 is 1.56%. That’s as high as the quarterly rate has been since Q4 2017 and it’s part of an overall upward trend we’ve seen in the quarterly failure rates over the last few quarters. Let’s take a closer look.
  • Theron Mohamed: Google's advertising revenue rose by 15% to $30.72 billion, a sharp slowdown from 24% growth a year ago, according to its earnings report for the first quarter of 2019. Paid clicks rose 39%, a significant decrease from 59% year-on-year growth in the first quarter of 2018. Cost-per-click also fell 19%, after sliding 19% in the same period of 2018.
  • @ajaynairthinks: “It was what we know to do so it was faster” -> this is the key challenge. Right now, the familiar path is not easy/effective in the long term, and the effective path is not familiar in the short term. We need make this gap visible, and we need to make the easy things familiar.
  • @NinjaEconomics: "For the first time ever there are now more people in the world older than 65 than younger than 5."
  • Filipe Oliveira: with the new AWS public cloud C5n Instances designed for compute-heavy applications and to deliver performance that is just about indistinguishable from bare metal, with 100 Gbps Networking along with a higher ceiling on packets per second, we should be able to deliver at least the same 50 million operations per second below 1 millisecond with fewer VM nodes
  • Nima Khajehnouri: Snap’s monetization algorithms have the single biggest impact to our advertisers and shareholders
  • Carmen Bambach: He is an artist of his time and one that transcends his time. He is very ambitious. It’s important to remember that although Leonardo was a “disciple of experience,” as he called himself, he is also paying great attention to the sources of his time. After having devoured and looked at and bought many books, he realizes he can do better. He really wants to write books, but it’s a very steep learning curve. The way we should look at his notebooks and the manuscripts is that they are essentially the raw material for what he had intended to produce as treatises. His great contribution is being able to visualize knowledge in a way that had not been done before.
  • Charlie Demerjian: The latest Intel roadmap leak blows a gaping hole in Intel’s 10nm messaging. SemiAccurate has said all along that the process would never work right and this latest info shows that 10nm should have never been released.
  • @mipsytipsy: Abuse and misery pile up when you are building and running large software systems without understanding them, without good feedback loops. Feedback loops are not a punishment. They mature you into a wise elder engineer.  They give you agency, mastery, autonomy, direction. And that is why software engineers, management, and ops engineers should all feel personally invested in empowering software engineers to own their own code in production.
  • Skip: Serverless has made it possible to scale Skip with a small team of engineers. It’s also given us a programming model that lets us tackle complexity early on, and gives us the ability to view our platform as a set of fine-grained services we can spread across agile teams.
  • seanwilson: Imagine having to install Trello, Google Docs, Slack etc. manually everywhere you wanted to use it, deal with updates yourself and ask people you wanted to collaborate with to do the same. That makes no sense in terms of ease of use.
  • Darryl Campbell: The slick PR campaign masked a design and production process that was stretched to the breaking point. Designers pushed out blueprints at double their normal pace, often sending incorrect or incomplete schematics to the factory floor. Software engineers had to settle for re-creating 40-year-old analog instruments in digital formats, rather than innovating and improving upon them. This was all done for the sake of keeping the Max within the constraints of its common type certificate.
  • Stripe: We have seen such promising results from our remote engineers that we are greatly increasing our investment in remote engineering. We are formalizing our Remote engineering hub. It is coequal with our physical hubs, and will benefit from some of our experience in scaling engineering organizations.
  • Joel Hruska: According to Intel in its Q1 2019 conference call, NAND price declines were a drag on its earnings, falling nearly twice the expected amount. This boom and bust cycle is common in the DRAM industry, where it drove multiple players to exit the market over the past 18 years. This is one reason we’re effectively down to just three DRAM manufacturers — Samsung, SK Hynix, and Micron. There are still a few more players in the NAND market, though we’ve seen consolidations there as well.
  • Alastair Edwards: The cloud infrastructure market is moving into a new phase of hybrid IT adoption, with businesses demanding cloud services that can be more easily integrated with their on-premises environment. Most cloud providers are now looking at ways to enter customers’ existing data centres, either through their own products or via partnerships
  • Paul Johnston: And yes I can absolutely see how the above company could have done this whole solution better as a Serverless solution but they don’t have the money for rearchitecting their back end (I don’t imagine) and what would be the value anyway? It’s up and running, with paying clients. The value at this point doesn’t seem valuable. Additional features may be a good fit for a Serverless approach, but not the whole thing if it’s all working. The pain of migrating to a new backend database, the pain of server migrations even at this level of simplicity, the pain of having to coordinate with other teams on something that seems so trivial, but never is that trivial has been really hard.
  • @rseroter: In serverless ... Functions are not the point. Managed services are not the point. Ops is not the point. Cost is not the point. Technology is not the point. The point is focus on customer value. @ben11kehoe laying it all out. #deliveragile2019
  • @jessitron: Serverless is a direction, not a destination. There is no end state. @ben11kehoe  Keep moving technical details out of the team’s focus, in favor of customer value. #deliverAgile
  • @jessitron (retweeted by @RealGeneKim and @ondayd): When we rush development, skip tests and refactoring, we get “Escalating Risk.” Please give up the “technical debt” description; it gives businesspeople a very wrong impression of the tradeoffs. From Janellekz #deliverAgile
  • @ben11kehoe: Good points in here about event-driven architectures. I do think the "bounded context" notions from microservices are still applicable, and that we don't have good enough tools for establishing contracts for events and dynamic routing for #serverless yet.
  • Riot Games: We use MapReduce, a common cluster computing model, to calculate data in a distributed fashion. Below is an example of how we calculate the cosine similarity metric - user data is mapped to nodes, the item-item metric is calculated for each user, and the results are shuffled and sent to a common node so they can be aggregated together in the reduce stage. It takes approximately 1000 compute hours to carry out the entire offer generation process, from snapshotting data to running all of the distributed algorithms. That’s 50 machines running for 20 hours each. (The map/shuffle/reduce shape is sketched after this list.)
  • Will Knight: Sze’s hardware is more efficient partly because it physically reduces the bottleneck between where data is stored and where it’s analyzed, but also because it uses clever schemes for reusing data. Before joining MIT, Sze pioneered this approach for improving the efficiency of video compression while at Texas Instruments.
  • Hersleb hypothesis~ coding is a socio-technical process where code and humans interact. According to what we call the Hersleb hypothesis, the following anti-pattern is a strong predictor for defects: • If two code sections communicate... • But the programmers of those two sections do not... • Then that code section is more likely to be buggy
  • Joel Hruska: But the adoption of chiplets is also the engineering acknowledgment of constraints that didn’t used to exist. We didn’t used to need chiplets. When companies like TSMC publicly predict that their 5nm node will deliver much smaller performance and power improvements than previous nodes did, it’s partly a tacit admission that the improvements engineers have gotten used to delivering from process nodes will now have to be gained in a different fashion. No one is particularly sure how to do this, and analyses of how effectively engineers boost performance without additional transistors to throw at the problem have not been optimistic.
  • Bryan Meyers: To some extent I think we should view chiplets as a stop-gap until other innovations come along. They solve the immediate problems of poor yields and reticle limits in exchange for a slight increase in integration complexity, while opening the door to more easily integrating application-specific accelerators cost-effectively. But it's also not likely that CPU sockets will get much larger. We'll probably hit the limit of density when chiplet-based SoC's start using as much power as high-end GPUs. So really we're waiting on better interconnects (e.g. photonics or wireless NoC) or 3D integration to push much farther. Both of which I think are still at least a decade away.
  • Olsavsky: And that will be a constant battle between growth, geographic expansion in AWS, and also efficiencies to limit how much we actually need. I think we are also getting much better at adding capacity faster, so there is less need to build it six to twelve months in advance.
  • Malith Jayasinghe: We noticed that a non-blocking system was able to handle a large number of concurrent users while achieving higher throughput and lower latency with a small number of threads. We then looked at how the number of processing threads impacts the performance. We noticed that the number of threads had minimal impact on throughput and average latency. However, as the number of threads increases, we see a significant increase in the tail latencies (i.e. latency percentiles) and load average.
  • Paul Berthaux: We [Algolia] run multiple LBs for resiliency - the LB selection is made through round robin DNS. For now this is fine, as the LBs are performing very simple tasks in comparison to our search API servers, so we do not need even load balancing across them. That said, we have some very long term plans to move from round-robin DNS to something based on Anycast routing. The detection of upstream failures as well as retries toward different upstreams is embedded inside NGINX/OpenResty. I use the log_by_lua directive from OpenResty with some custom Lua code to count the failures and trigger the removal of the failing upstream from the active Redis entry and alert the lb-helper after 10 failures in a row. I set up this failure threshold to avoid lots of unnecessary events in case of short self-resolving incidents like occasional packet loss. From there the lb-helper will probe the failing upstream FQDN and put it back in Redis once it recovers. (The consecutive-failure threshold logic is sketched below.)
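
Riot doesn't publish the job itself, so this toy Python sketch only shows the map/shuffle/reduce shape of an item-item cosine similarity; the user and item names are invented for illustration.

```python
# Toy sketch of item-item cosine similarity in a map/shuffle/reduce style.
# Data shapes and names are illustrative, not Riot's actual pipeline.
from collections import defaultdict
from itertools import combinations
from math import sqrt

# "User data is mapped to nodes": each record is one user's item vector.
users = {
    "u1": {"skinA": 1.0, "skinB": 3.0},
    "u2": {"skinA": 2.0, "skinC": 1.0},
    "u3": {"skinB": 1.0, "skinC": 4.0},
}

def map_phase(user_items):
    """Emit a partial dot product for every item pair this user touched."""
    for (i, xi), (j, xj) in combinations(sorted(user_items.items()), 2):
        yield (i, j), xi * xj

# Shuffle: group the partial products by item pair (a real cluster does this
# over the network); also accumulate per-item norms for the denominator.
shuffled = defaultdict(list)
norm_sq = defaultdict(float)
for items in users.values():
    for pair, partial in map_phase(items):
        shuffled[pair].append(partial)
    for item, x in items.items():
        norm_sq[item] += x * x

# Reduce: aggregate the shuffled partials into a cosine similarity per item pair.
similarity = {
    (i, j): sum(partials) / (sqrt(norm_sq[i]) * sqrt(norm_sq[j]))
    for (i, j), partials in shuffled.items()
}
print(similarity)
```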
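
Algolia's version is Lua running inside OpenResty's log_by_lua hook; this is only the consecutive-failure threshold logic, sketched in Python against a made-up Redis key layout.

```python
# Sketch of the "remove an upstream after 10 consecutive failures" logic.
# The real implementation is Lua inside NGINX/OpenResty; the Redis key
# names here are invented for illustration.
import redis

FAILURE_THRESHOLD = 10
r = redis.Redis()

def record_response(upstream: str, ok: bool) -> None:
    streak_key = f"lb:failures:{upstream}"
    if ok:
        r.delete(streak_key)            # any success resets the streak
        return
    failures = r.incr(streak_key)       # count consecutive failures
    if failures == FAILURE_THRESHOLD:
        # Drop the upstream from the active entry the load balancer reads,
        # and let the lb-helper probe it and put it back once it recovers.
        r.srem("lb:active_upstreams", upstream)
        r.lpush("lb:helper:alerts", upstream)
```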

Useful Stuff:

  • You've been using DynamoDB wrong. The question is are we smart enough to use NoSQL right? Information abounds on building normalized relational schemas and performant indexes. NoSQL? Not so much. Rick Houlihan gave a talk—Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (slides)—changing all that with incredible insight on how to model data and indexes when using a NoSQL database. If you've been using NoSQL merely as a simple key-value store a whole new world will open up. Complex access patterns can be done efficiently at scale with NoSQL.
    • Big idea: as you model data use indexes to regroup, resort, and reaggregate the data to support secondary access patterns. 
    • Insight: in its ultimate expression a NoSQL database is a finely tuned machine implementing a very specific service and just that service. It's not general in any sense. 
    • Why NoSQL? Optimized for compute, not storage. Good at modeling denormalized/hierarchical data structures. NoSQL can't reshape data, so it favors simple queries, not ad hoc queries. Built for OLTP at scale, and it scales horizontally. Matches about 90% of the applications we build.
    • A DynamoDB table is more like a catalogue than a relational table: it can store many different kinds of items, and they don't need to have the same attributes. Every inserted item must have a primary key. An optional sort key lets you execute complex range queries. The primary key should disperse data widely; don't create hot partitions. You don't want a high velocity access pattern hitting a small number of keys. Use something like a UUID for a key instead of a low cardinality attribute like color (red or blue) or gender.
    • Two types of DynamoDB indexes: Local Secondary Index (LSI) and Global Secondary Index (GSI). LSIs let you resort the data within a partition. You can resort on an attribute like order state to handle a query like "give me all the back ordered items for customer X within the last 24 hours." An LSI must always use the same partition key as the table; it's a way to resort the data, but not regroup it. A GSI lets you create a completely new aggregation of the data. You can group orders by warehouse instead of customer by making warehouse ID the partition key and order date the sort key. To get the orders for a given warehouse in the last hour, query the GSI with the warehouse ID and a sort key condition of greater than one hour ago. LSIs are strongly consistent; GSIs are eventually consistent. (A boto3 sketch of this warehouse GSI appears after this list.)
    • Huge idea: bottom line is data is relational, it has to be or we wouldn't care about it. Data doesn't stop being relational because you're using a different database. The question is how to model relational data in NoSQL? This idea took me by surprise, but on reflection he's right. When developing a relational model you group attributes in a table that are functionally dependent on a primary key. That doesn't change with NoSQL. The difference with NoSQL is you are changing how the attributes are grouped together to support more efficient access patterns. A RDBMS must skip around the disk to put together a denormalized view, which slows down the database. NoSQL collapses those data hierarchies so you can stream a single table from disk. NoSQL doesn't perform as complex an operation to assemble a view. 
    • Selecting a sort key is about modeling the one-to-many and many-to-many relationships that need support in the data model. Examples: Orders and OrderItems. You want to build sort keys that support very efficient select patterns. It's about querying across entities with a single trip to the database. Get all the items you need to support an access pattern. You don't want to go back multiple times to the database. Don't go get the customer item and the order items for that customer. That's the relational pattern. This is a very inefficient access pattern because you're managing the join at the application layer. 
    • With a relational database you normalize a data model and then come up with indexes. With NoSQL it's the opposite. You must understand every access pattern because if you don't you can't model the data that allows for efficient access. 
    • Understand the nature of your use case. Is the app for OLTP, OLAP, DSS (decision support)?
    • Define the entity-relationship model. Know the data and how it's related. 
    • Identify lifecycle of data. Does it need TTL, backup/archival, etc.?
    • Define all the access patterns. What's the read pattern and write pattern for your data? What aggregations are you trying to support? Document all the workflows up front. You're designing a data model that's very specifically tuned to those access patterns.
    • Big idea: NoSQL is not flexible. NoSQL is not good with change. NoSQL is efficient, but the data model is not flexible because you're building in the access patterns. The more you tune a data model to an application the more tightly coupled it becomes to a service. 
    • Begin data-modeling. Don't make the mistake of using multiple tables. Each service has one table. This reduces round trips and simplifies access patterns. 
    • Identify primary keys. How will items be inserted and read? 
    • Define indexes for secondary access patterns. 
    • NoSQL is not good at answering questions like averages, counts, sum, max, min, and complex computed aggregations. One problem they had with Oracle at Amazon is developers would deploy some bad code in a stored procedure and it would impact everyone because it's a shared space. With DynamoDB Streams and AWS Lambda the processing happens in a different space so there's isolation between code bases. Streams is a change log for DynamoDB. Write operations appear on the stream and can invoke a Lambda function. (A handler sketch of this pattern appears after this list.)
    • The lambda function has an invocation role that defines what attributes it can see and an execution role that defines what it can do. A common thing to do with streams and lambda is computed aggregations. 
    • By reading data off the stream you can compute running averages, sums, and other complex application metrics and write them back to the table as metadata items.
    • What we want to do with NoSQL is offload the CPU. We don't want to compute things on the fly; we want them precomputed. The example is time series data: keep time based partitions, load them (they don't change), run your aggregations over them to produce all your time based metrics, and write the results back as metadata, so from then on you only need to read them. You don't have to calculate them again. Of course the downside is the complete lack of flexibility. If you want to use a different time window or know something you didn't already precompute—you are out of luck. 
    • Things you can do with Lambda: update search, push to Kinesis for stream processing, interact with external systems, compute metrics. Lambda may not be the most cost-efficient approach, though; you may want an EC2 cluster with a static stream reader instead. 
    • Most people use NoSQL as a key-value store. That's not the most efficient way to use NoSQL. You want to store your hierarchical data in the table. 
    • Instead of using query filters, use composite keys to create hierarchies in the sort key structure. For example, if you want to filter a Games table by status and date, in a relational database you could use a select query with a filter. In NoSQL you create a composite key that combines the date and status attributes into a new concatenated attribute called StatusDate, for example "DONE_2014-10-02". Now you can query Games where StatusDate BEGINS_WITH 'DONE' or 'PENDING' or whatever. (This pattern is sketched in code after this list.)
    • To mimic the transactional flow of an item that takes multiple passes to construct, you keep a version history. This wasn't very clear to me.
    • DynamoDB has a transactions API. Do not use it to maintain normalized data. Do use it to commit changes across items and conditional batch inserts/updates. It does support multiple tables, but remember you should not have multiple tables. 
    • In a NoSQL database you want single queries to deliver multiple items in a single round trip. 
    • NoSQL is the future for the vast majority of workloads simply because of the scale. 
    • At this point I'm going to wimp out. Rick starts to talk about different access patterns for different applications, but he goes very fast, with small type, and not a lot of clear detail. You'll need to watch it about a million times for it to make sense. Forrest Brazeal took on the challenge of explaining what's going on in From relational DB to single DynamoDB table: a step-by-step exploration
    • And I think this is a problem for NoSQL. At this advanced level it's too magical. It's like writing assembler instead of using a compiler. 
    • See also Building with AWS Databases: Match Your Workload to the Right Database; Best Practices for DynamoDB; Rick on Quora; Example of Modeling Relational Data in DynamoDB; and Why the PIE theorem is more relevant than the CAP theorem
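
To make the index discussion above concrete, here is a minimal boto3 sketch of the warehouse GSI pattern: one Orders table keyed on customer and order date, plus a GSI that regroups the same items by warehouse. The table, index, and attribute names are invented for illustration; they are not the schema from the talk.

```python
# Hedged sketch of the "regroup with a GSI" pattern; names are made up.
from datetime import datetime, timedelta, timezone
import boto3

dynamodb = boto3.client("dynamodb")

# One table holds the orders, grouped by customer and sorted by date.
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
        {"AttributeName": "WarehouseId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    # The GSI regroups the same items by warehouse, still sorted by date.
    GlobalSecondaryIndexes=[{
        "IndexName": "ByWarehouse",
        "KeySchema": [
            {"AttributeName": "WarehouseId", "KeyType": "HASH"},
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)

# "Orders for warehouse 7 in the last hour" is a single query against the GSI.
one_hour_ago = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
resp = dynamodb.query(
    TableName="Orders",
    IndexName="ByWarehouse",
    KeyConditionExpression="WarehouseId = :w AND OrderDate > :t",
    ExpressionAttributeValues={":w": {"S": "warehouse-7"}, ":t": {"S": one_hour_ago}},
)
```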
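
The composite sort key idea looks like this in code. The Games table, its UserId partition key, and the attributes are assumptions made for the example.

```python
# Hedged sketch of a composite StatusDate sort key; the schema is illustrative.
import boto3

dynamodb = boto3.client("dynamodb")

status, date = "DONE", "2014-10-02"
dynamodb.put_item(
    TableName="Games",
    Item={
        "UserId": {"S": "user-42"},
        "StatusDate": {"S": f"{status}_{date}"},   # e.g. "DONE_2014-10-02"
        "Opponent": {"S": "user-99"},
    },
)

# All of this user's finished games, newest first, with no query filter needed.
resp = dynamodb.query(
    TableName="Games",
    KeyConditionExpression="UserId = :u AND begins_with(StatusDate, :s)",
    ExpressionAttributeValues={":u": {"S": "user-42"}, ":s": {"S": "DONE_"}},
    ScanIndexForward=False,
)
```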
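
And a minimal sketch of the Streams-plus-Lambda computed aggregation pattern, assuming the stream is configured to emit new images and reusing the made-up Orders schema from the first sketch; the running totals are written back to the table as an ordinary metadata item.

```python
# Hedged sketch of a computed aggregation driven by DynamoDB Streams.
# Assumes a NEW_IMAGE stream on the (made-up) Orders table above.
from decimal import Decimal
import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    revenue, orders = Decimal(0), 0
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        revenue += Decimal(image["OrderTotal"]["N"])
        orders += 1
    if orders:
        # ADD is atomic, so concurrent invocations don't clobber each other;
        # the aggregate lives in the same table as a metadata item.
        dynamodb.update_item(
            TableName="Orders",
            Key={"CustomerId": {"S": "METADATA"}, "OrderDate": {"S": "running-totals"}},
            UpdateExpression="ADD RevenueToDate :r, OrderCount :n",
            ExpressionAttributeValues={":r": {"N": str(revenue)}, ":n": {"N": str(orders)}},
        )
```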

  • People wasting your cloud budget on dumb service usage patterns? Turn the joke on them. @rchrdbyd: Switching it to “requester pays” may be a bit heavy-handed, but it's probably your most sustainable option. 
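
If you do go that route, it's one call to flip a bucket to requester pays. A minimal boto3 sketch with an invented bucket name; callers then have to acknowledge the charge on every request.

```python
# Sketch of switching an S3 bucket to "requester pays"; names are made up.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_request_payment(
    Bucket="my-expensive-dataset",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Downloaders must now opt in to paying for the transfer:
s3.get_object(Bucket="my-expensive-dataset", Key="huge.parquet", RequestPayer="requester")
```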

  • When can we expect GitHub to add this feature? Amazon's warehouse-worker tracking system can automatically pick people to fire without a human supervisor's involvement

  • This will likely cause some agita. Though I think we all know in every human endeavor some people end up doing more and better work than others. Why Software Projects need Heroes (Lessons Learned from 1100+ Projects): The established wisdom in the literature is to deprecate “heroes”, i.e., a small percentage of the staff who are responsible for most of the progress on a project. But, based on a study of 1100+ open source GitHub projects, we assert: • Overwhelmingly, most projects are hero projects. This result holds true for small, medium, and large projects. • Hero developers are far less likely to introduce bugs into the codebase than their non-hero counterparts. Thus having heroes in projects significantly affects the code quality. Our empirical results call for a revision of a long-held truism in software engineering. Software heroes are far more common and valuable than suggested by the literature, particularly from a code quality perspective. Organizations should reflect on better ways to find and retain more of these software heroes. Also, We can't judge another programmer's abilities in a 60 min interview

  • Facebook.com got a rewrite. Facebook has contributed much to web development over the years. Will their new just-in-time, dependency driven code and data approach be the new wave of the future? Learn how they do it in Building the New facebook.com with React, GraphQL and Relay. Facebook's website has always seemed fast to me, but they are doing a lot to make it optimally fast in every possible scenario. The big theme of the change is just-in-time delivery of code and data through some sophisticated infrastructure. They determine when code and data are needed and deliver both at exactly that point. Not only that, they deliver the least possible amount of code and data needed to quickly render a view, then download more over time and progressively render the rest. The pendulum has swung from separating everything back to colocating everything. Data requirements in React are now defined with the component. GraphQL is used to get data in one request. Relay handles parallelizing data retrieval, scheduling delivery, and using cached data. For example, GraphQL lazily fetches only the kind of data needed to render a particular kind of post, and Relay downloads only the code needed to render that particular kind of post, not every kind of post. The result is you never download unused data or unneeded code. Code is treated as data and begins downloading as soon as possible. Relay keeps a local in-memory cache, so if you navigate to your profile it will use the cached data, render what it has, and then progressively render as the missing data and code arrive. 

  • It takes a lot of thrust to overcome data gravity. GitLab’s journey from Azure to GCP.
    • There were several reasons why we decided on the Google Cloud Platform. One top priority was that we wanted GitLab.com to be suitable for mission-critical workloads, and GCP offered the performance and consistency we needed. A second reason is that we believe Kubernetes is the future, and with so much development geared toward cloud native, Google was the clear choice as a partner going forward.
    • Looking for the simplest solution, we considered whether we could just stop the whole site: Copy all the data from Azure to GCP, switch the DNS over to point to GCP, and then start everything up again. The problem was that we had too much data to do this within a reasonable time frame. Once we shut down the site, we'd need to copy all the data between two cloud providers, and once the copy was complete, we'd need to verify all the data (about half a petabyte) and make sure it was correct. This plan meant that GitLab.com could be down for several days, and considering that thousands and thousands of people rely on GitLab on a daily basis, this wouldn’t work. (A back-of-the-envelope check on the copy time appears after this list.)
    • The average errors for the pre-migration period in Azure was 8.2 errors per day, while post-migration in GCP it’s down to just one error a day.
    • Leading up to the migration, our availability was 99.61 percent, and we are now on track to reach our target of 99.95 percent availability.
    • GitLab.com is not only faster, it’s more predictable, with fewer outlier values taking an unacceptably long time.
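
The "too much data" argument is easy to sanity-check. Assuming, say, a sustained 10 Gbps between clouds (the actual link speed isn't stated), half a petabyte is several days of copying before verification even starts:

```python
# Back-of-the-envelope check on the downtime a stop-the-world copy implies.
# The 10 Gbps figure is an assumption, not GitLab's actual inter-cloud bandwidth.
data_bits = 0.5e15 * 8              # half a petabyte, in bits
link_bps = 10e9                     # assumed sustained 10 Gbps
days = data_bits / link_bps / 86400
print(f"~{days:.1f} days just to copy")   # ~4.6 days, before verifying the data
```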

  • SpaceX Gets FCC Approval to Sell Wireless High-Speed Home Internet from Space. Here's a cool animation showing the network in gorgeous detail: Starlink revisions, Nov 2018. The round trip from London to Singapore takes an estimated 90 msecs, which is half of what you can get on the current internet, but your actual latency seems to depend on the route. The new architecture doesn't have as many paths between various points. The target is $100-300 for a pizza-box-sized antenna. What bandwidth? What latency? What cost? Don't know yet. 

  • F8 day 1 and day 2 videos are now available. Apparently the future is private. Privacy is code for Facebook wanting to be considered a platform. Zucked: Waking Up to the Facebook Catastrophe. As a platform Facebook will be free from the responsibility of policing their users. They think. Until governments step in and start regulating the hell out of everything because companies aren't taking responsibility for their platforms. You might like A Lighter, Faster, Simpler Messenger; Reliable Code at Scale; and Using Machine Learning for Developer Productivity

  • Serverless and Microservices: a match made in heaven?: "Serverless is all about events...Within a serverless solution, the intent is to be responsive to events that occur, and an API is actually only a mechanism for generating events." Interesting question: is there a difference between events and APIs? Yes...and no. Underneath it's all messages and most of what we call a stack is about routing and distributing messages to some handling function and context. Serverless simply collapses the stack. At a teleological level, implicit in an API is a protocol uniting a tribe of functions with some purpose. That can be true of an event. An event can trigger complex behaviours, but it's not required. Events have loose coupling. APIs have tight cohesion. It would be an anti-pattern to think of events as both independent and part of a state machine. They have very different purposes.

  • Accepting passwords? Feel free to run them through haveibeenpwned.com. Oh, and turn on two-factor. Hackers went undetected in Citrix’s internal network for six months: Citrix said in a later update on April 4 that the attack was likely a result of password spraying, which attackers use to breach accounts by brute-forcing from a list of commonly used passwords against accounts that aren’t protected with two-factor authentication. Also, The Economy of Credential Stuffing Attacks
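
The nice part is that checking a password never requires sending it anywhere: the Pwned Passwords range API takes only the first five characters of the SHA-1 hash (k-anonymity) and returns matching suffixes with breach counts. A small sketch:

```python
# Check a password against the Pwned Passwords range API without revealing it.
import hashlib
import requests

def pwned_count(password: str) -> int:
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}", timeout=10)
    resp.raise_for_status()
    for line in resp.text.splitlines():            # lines look like "SUFFIX:COUNT"
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(pwned_count("hunter2"))   # non-zero means it appears in known breach corpora
```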

  • I forgot how to shoe a horse. I forgot how to manage a server.

  • Who knew code was so fragile it had to be remade so often? @shanselman: "Within Microsoft, BuildXL runs 30,000+ builds per day on mono-repo codebases up to a half-terabyte in size with a half-million pips per build, using distribution to 1000s of machines and petabytes of source code, package, and build output caching." @archisgore: "At @Polyverse_io: Hold my beer. 25-person company, 200K build jobs at any given time, 1M+ jobs at peak, 10 complete Linux distributions rebuilt twice a day for every machine." 

  • Lots of notes from Looking Back At SmashingConf San Francisco 2019. You might like Making A Difference With Differential Serving

  • Key Takeaways from Flink Forward San Francisco 2019: The Flink community is, like many other communities, tired of maintaining duplicate logic, especially in streaming and batch applications for the same system — the Flink Team is unifying the APIs for data analytics, such as the Table API, so devs can leverage the same code for both needs. Other projects, like Apache Beam, also aim to allow one codebase for both streaming and batch. They recently released the first stage of support for targeting Flink as a runner.

  • Videos from the ACCU Conference (C++) are now available. You might like De-fragmenting C++: Making exceptions more affordable and usable; How does Git actually work?; and The cell as a computer: Turing complete and massively parallel.

  • 3 Takeaways from SXSW Interactive Festival: Digital Channels and Brick-and-Mortar Work Better Together; Technology Is Becoming More Intimate; AR/VR Launches Result in Great Publicity, but ROI Still in Question

  • Actually, go ahead and create accounts for all your developers as well; infra as code makes it easy, and serverless makes it cheap.  A journey into serverless, part 2. A good example of how to move to serverless. Nothing especially dramatic, but a well told story to serve as a guide. 

  • eBay/beam (article): Beam is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.
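
If "triple store" is unfamiliar: the data model is just (subject, predicate, object) facts plus pattern matching over them. This toy sketch shows the idea only; it is not Beam's API or query language.

```python
# Toy in-memory triple store; illustrates the data model, not Beam itself.
triples = {
    ("Leonardo_da_Vinci", "bornIn", "Vinci"),
    ("Leonardo_da_Vinci", "painted", "Mona_Lisa"),
    ("Mona_Lisa", "housedIn", "Louvre"),
    ("Louvre", "locatedIn", "Paris"),
}

def match(subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

print(match(subject="Leonardo_da_Vinci", predicate="painted"))   # what did Leonardo paint?
```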

  • Unikernels: The Next Stage of Linux’s Dominance: In this paper, we posit that an upstreamable unikernel target is achievable from the Linux kernel, and, through an early Linux unikernel prototype, demonstrate that some simple changes can bring dramatic performance advantages.