Friday
Mar 20, 2015

Stuff The Internet Says On Scalability For March 20th, 2015

Hey, it's HighScalability time:


What a view! The solar eclipse at sunrise from the International Space Station.
  • 60 billion: rows in DynamoDB; 18.5 billion: BuzzFeed impressions
  • Quotable Quotes:
    • @postwait: Hell is other people’s APIs.
    • @josephruscio: .@Netflix is now 34% of US Internet traffic at night. 2B+ hours of streaming a month. #SREcon15
    • Geo Curnoff: Everything he said makes an insane amount of sense, but it might sound like a heresy to most people, who are more interested in building software cathedrals rather than solving real problems.
    • Mike Acton: Reality is not a hack you're forced to deal with to solve your abstract, theoretical problem. Reality is the actual problem.
    • @allspaw: "The right tool for the job!" said someone whose assumptions, past experience, motivations, and definition of "job" aren't explicit.
    • Sam Cutler: Mechanical ignorance is, in fact, not a strength.
    • @Grady_Booch: Beautiful quote from @timoreilly “rms is sort of like an Old Testament prophet, with lots of ‘Thou shalt not.’”
    • @simonbrown: "With event-sourcing, messaging is back in the hipster quadrant" @ufried at #OReillySACon
    • @ID_AA_Carmack: I just dumped the C++ server I wrote last year for a new one in Racket. May not scale, but it is winning for development even as a newbie.
    • @mfdii: Containers aren't going to reduce the need to manage the underlying services that containers depend on. Exhibit A: 
    • @bdnoble: "DevOps: The decisions you make now affect the quality of sleep you get later." @caitie at #SREcon15
    • @giltene: By that logic C++ couldn't possibly multiply two integers faster than an add loop on CPUs with no mul instruction, right?
    • @mjpt777: Aeron beats all the native C++ messaging implementations and it is written in Java. 
    • @HypertextRanch: Here's what happens to your Elasticsearch performance when you upgrade the firmware on your SSDs.
    • @neil_conway: Old question: "How is this better than Hadoop?". New question: "How is this better than GNU Parallel?"
    • @evgenymorozov: "Wall Street Firm Develops New High-Speed Algorithm Capable Of Performing Over 10,000 Ethical Violations Per Second"

  • And soon the world's largest army will have no soldiers. @shirazdatta: In 2015 Uber, the world's largest taxi company owns no vehicles, Facebook the world's most popular media owner creates no content, Alibaba, the most valuable retailer has no inventory and Airbnb the world's largest accommodation provider owns no real estate.

  • Not doing something is still the #1 performance improver. Coordination Avoidance in Database Systems: after looking at the problem from a fresh perspective, and without breaking any of the invariants required by TPC-C, the authors were able to create a linearly scalable system with 200 servers processing 12.7M tps – about 25x the next-best system.

  • Tesla and the End of the Physical World. Tesla downloading new software to drastically improve battery usage is cool, but devices have been doing this forever. Routers, switches, set tops, phones, virtually every higher end connected device knows how to update itself. Cars aren't any different. Cars are just another connected device. Also, interesting that Tesla is Feature Flagging their new automatic steering capability.

  • The Apple Watch is technology fused with fashion and ecosystem in a way we've never seen before. Which is a fascinating way of routing around slower moving tech cycles. Cycles equal money. Do you need a new phone or tablet every year? Does the technology demand it? Not so much. But fashion will. Fashion is a force that drives cycles to move for no reason at all. And that's what you need to make money. Crazy like a fox.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Mar 17, 2015

Sponsored Post: Signalfuse, InMemory.Net, Sentient, Couchbase, VividCortex, Internap, Transversal, MemSQL, Scalyr, FoundationDB, AiScaler, Aerospike, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Sentient Technologies is hiring several Senior Distributed Systems Engineers and a Senior Distributed Systems QA Engineer. Sentient Technologies is a privately held company seeking to solve the world’s most complex problems through massively scaled artificial intelligence running on one of the largest distributed compute resources in the world. Help us expand our existing million+ distributed cores to many, many more. Please apply here.

  • Linux Web Server Systems Engineer, Transversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Rise of the Multi-Model Database. FoundationDB Webinar: March 10th at 1pm EST. Do you want a SQL, JSON, Graph, Time Series, or Key Value database? Or maybe it’s all of them? Not all NoSQL databases are created equal. The latest development in this space is the Multi-Model Database. Please join FoundationDB for an interactive webinar as we discuss the Rise of the Multi-Model Database and what to consider when choosing the right tool for the job.

Cool Products and Services

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • Top Enterprise Use Cases for NoSQL. Discover how the largest enterprises in the world are leveraging NoSQL in mission-critical applications with real-world success stories. Get the Guide.
    http://info.couchbase.com/HS_SO_Top_10_Enterprise_NoSQL_Use_Cases.html

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • In-Memory Computing at Aerospike Scale. How the Aerospike team optimized memory management by switching from PTMalloc2 to JEMalloc.

  • Diagnose server issues from a single tab. The Scalyr log management tool replaces all your monitoring and analysis services with one, so you can pinpoint and resolve issues without juggling multiple tools and tabs. It's a universal tool for visibility into your production systems. Log aggregation, server metrics, monitoring, alerting, dashboards, and more. Not just “hosted grep” or “hosted graphs,” but enterprise-grade functionality with sane pricing and insane performance. Trusted by in-the-know companies like Codecademy – try it free! (See how Scalyr is different if you're looking for a Splunk alternative.)

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Tuesday
Mar 17, 2015

In-Memory Computing at Aerospike Scale: When to Choose and How to Effectively Use JEMalloc

This is a guest post by Psi Mankoski (email), Principal Engineer, Aerospike.

When a customer’s business really starts gaining traction and their web traffic ramps up in production, they know to expect increased server resource load. But what do you do when memory usage keeps growing beyond all expectations? Have you found a memory leak in the server? Or is memory being lost to fragmentation? While you may be able to throw hardware at the problem for a while, DRAM is expensive, and real machines do have finite address space. At Aerospike, we have encountered these scenarios along with our customers as they continue to press through the frontiers of high scalability.

In the summer of 2013 we faced exactly this problem: big-memory (192 GB RAM) server nodes were running out of memory and crashing again within days of being restarted. We wrote an innovative memory accounting instrumentation package, ASMalloc [13], which revealed there was no discernible memory leak. We were being bitten by fragmentation.

This article focuses specifically on the techniques we developed for combating memory fragmentation, first by understanding the problem, then by choosing the best dynamic memory allocator for the problem, and finally by strategically integrating the allocator into our database server codebase to take best advantage of the disparate life-cycles of transient and persistent data objects in a heavily multi-threaded environment. For the benefit of the community, we are sharing our findings in this article, and the relevant source code is available in the Aerospike server open source GitHub repo. [12]

Executive Summary

  • Memory fragmentation can severely limit scalability and stability by wasting precious RAM and causing server node failures.

  • Aerospike evaluated memory allocators for its in-memory database use-case and chose the open source JEMalloc dynamic memory allocator.

  • Effective allocator integration must consider memory object life-cycle and purpose.

  • Aerospike optimized memory utilization by using JEMalloc extensions to create and manage per-thread (private) and per-namespace (shared) memory arenas.

  • Using these techniques, Aerospike saw substantial reduction in fragmentation, and the production systems have been running non-stop for over 1.5 years.
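
The life-cycle point in the summary above can be illustrated with a toy model. The sketch below is not JEMalloc and uses none of its API. It assumes a hypothetical allocator that fills fixed-size pages in allocation order and can release a page only once every object on it has been freed. The function name, the 7:1 mix of transient to persistent objects, and the page size are made up for illustration.

```python
def pages_still_pinned(objects, slots_per_page, split_by_lifetime):
    """Count pages that still hold a persistent object after all
    transient objects have been freed.

    Toy model: each arena fills fixed-size pages in allocation order, and a
    page stays resident as long as any object on it is still live.
    """
    # Route each allocation either to one shared arena or to an arena per lifetime.
    arenas = {}
    for kind in objects:
        arena_name = kind if split_by_lifetime else "shared"
        arenas.setdefault(arena_name, []).append(kind)

    pinned = 0
    for allocated in arenas.values():
        for start in range(0, len(allocated), slots_per_page):
            page = allocated[start:start + slots_per_page]
            # Transient objects are gone; a single persistent one pins the page.
            if "persistent" in page:
                pinned += 1
    return pinned


# Hypothetical workload: seven short-lived objects for every long-lived one.
workload = (["transient"] * 7 + ["persistent"]) * 100

shared_pages = pages_still_pinned(workload, slots_per_page=8, split_by_lifetime=False)
split_pages = pages_still_pinned(workload, slots_per_page=8, split_by_lifetime=True)
print("pages pinned with one shared arena:", shared_pages)
print("pages pinned with arenas split by lifetime:", split_pages)
```

In this toy run the shared arena keeps all 100 pages resident, each pinned by a single persistent object, while splitting by lifetime leaves 13 pages pinned. Real allocators are far more sophisticated, so treat this only as intuition for why object life-cycle matters when assigning arenas.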

Introduction

Click to read more ...

Monday
Mar 16, 2015

How and Why Swiftype Moved from EC2 to Real Hardware

This is a guest post by Oleksiy Kovyrin, Head of Technical Operations at Swiftype. Swiftype currently powers search on over 100,000 websites and serves more than 1 billion queries every month.

When Matt and Quin founded Swiftype in 2012, they chose to build the company’s infrastructure using Amazon Web Services. The cloud seemed like the best fit because it was easy to add new servers without managing hardware and there were no upfront costs.

Unfortunately, while some of the services (like Route53 and S3) ended up being really useful and incredibly stable for us, the decision to use EC2 created several major problems that plagued the team during our first year.

Swiftype’s customers demand exceptional performance and always-on availability and our ability to provide that is heavily dependent on how stable and reliable our basic infrastructure is. With Amazon we experienced networking issues, hanging VM instances, unpredictable performance degradation (probably due to noisy neighbors sharing our hardware, but there was no way to know) and numerous other problems. No matter what problems we experienced, Amazon always had the same solution: pay Amazon more money by purchasing redundant or higher-end services.

The more time we spent working around the problems with EC2, the less time we could spend developing new features for our customers. We knew it was possible to make our infrastructure work in the cloud, but the effort, time and resources it would take to do so were much greater than those needed to migrate away.

After a year of fighting the cloud, we made a decision to leave EC2 for real hardware. Fortunately, this no longer means buying your own servers and racking them up in a colo. Managed hosting providers facilitate a good balance of physical hardware, virtualized instances, and rapid provisioning. Given our previous experience with hosting providers, we made the decision to choose SoftLayer. Their excellent service and infrastructure quality, provisioning speed, and customer support made them the best choice for us.

After more than a month of hard work preparing the inter-data center migration, we were able to execute the transition with zero downtime and no negative impact on our customers. The migration to real hardware resulted in enormous improvements in service stability from day one, provided a huge (~2x) performance boost to all key infrastructure components, and reduced our monthly hosting bill by ~50%.

This article will explain how we planned for and implemented the migration process, detail the performance improvements we saw after the transition, and offer insight for younger companies about when it might make sense to do the same.

Preparing for the switch

Click to read more ...

Friday
Mar 13, 2015

Stuff The Internet Says On Scalability For March 13th, 2015

Hey, it's HighScalability time:


1957: 13 men delivering a computer. 2017: a person may wear 13 computing devices (via Vala Afshar)
  • 5.3 million/second: LinkedIn metrics collected; 1.7 billion: Tinder ratings per day
  • Quotable Quotes:
    • @jankoum: WhatsApp crossed 1B Android downloads. btw our android team is four people + Brian. very small team, very big impact.
    • @daverog: Unlike milk, software gets more expensive, per unit, in larger quantities (diseconomies of scale) @KevlinHenney #qconlondon
    • @DevOpsGuys: Really? Wow! RT @DevOpsGuys: Vast majority of #Google software is in a single repository. 60million builds per year. #qconlondon
    • Ikea: We are world champions in making mistakes, but we’re really good at correcting them.
    • Chris Lalonde: I know a dozen startups that failed from their own success, these problems are only going to get bigger.
    • Chip Overclock: Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)
    • @rbranson: OH: "call me back when the number of errors per second is a top 100 web site."
    • @wattersjames: RT @datacenter: Worldwide #server sales reportedly reach $50.9 billion in 2014
    • @johnrobb: Costs of surveillance.  Bots are cutting the costs by a couple of orders of magnitude more.

  • Is this a sordid story of intrigue? How GitHub Conquered Google, Microsoft, and Everyone Else. Not really. The plot arises from the confluence of the creation of Git, the evolution of Open Source, small network effects, and a tide rising so slowly we may have missed it. Brian Doll (a GitHub VP) is letting us know the water is about nose height: Of GitHub’s ranking in the top 100, Doll says, “What that tells me is that software is becoming as important as the written word.”

  • This is audacious. Peter Lawrey Describes Petabyte JVMs. For when you really really want to avoid a network hit and access RAM locally. The approach is beautifully twisted: It’s running on a machine with 6TB and 6 NUMA regions. Since, as previously noted, we want to try and restrict the heap or at least the JVM to a single NUMA region, you end up with 5 JVMs with a heap of up to 64GB each, and memory mapped caches for both the indexes and the raw data, plus a 6th NUMA region reserved just for the operating system and monitoring tasks.

  • There's a parallel between networks and microprocessors in that network latency is not getting any faster and microprocessor speeds are not getting any faster, yet we are getting more and more bandwidth and more and more MIPS per box. Programming for this reality will be a new skill. Observed while listening to How Did We End Up Here?

  • Advances in medicine will happen because of big data. We need larger sample sizes to figure things out. Sharing Longevity Data.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Thursday
Mar 12, 2015

Paper: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

How would your OLTP database perform if it had to scale up to 1024 cores? Not very well, according to this fascinating paper: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores, where a few intrepid chaos monkeys report the results of their fiendish experiment. The conclusion: we need a completely redesigned DBMS architecture that is rebuilt from the ground up.

Summary:

Click to read more ...

Wednesday
Mar 11, 2015

Cassandra Migration to EC2

This is a guest post by Tommaso Barbugli, the CTO of getstream.io, a web service for building scalable newsfeeds and activity streams.

In January we migrated our entire infrastructure from dedicated servers in Germany to EC2 in the US. The migration included a wide variety of components: web workers, background task workers, RabbitMQ, PostgreSQL, Redis, Memcached and our Cassandra cluster. Our main requirement was to execute this migration without downtime.

This article covers the migration of our Cassandra cluster. If you’ve never run a Cassandra migration before, you’ll be surprised to see how easy this is. We were able to migrate Cassandra with zero downtime using its awesome multi-data center support. Cassandra allows you to distribute your data in such a way that a complete set of data is guaranteed to be placed on every logical group of nodes (e.g., nodes in the same data center, rack, or EC2 region). This feature is a perfect fit for migrating data from one data center to another. Let’s start by introducing the basics of a Cassandra multi-data center deployment.
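
As a rough illustration of the idea in the paragraph above, the sketch below builds the kind of CQL statement that asks Cassandra to keep replicas in more than one data center through `NetworkTopologyStrategy`. The helper function, the keyspace name `activity`, the data center names `dc_germany` and `dc_ec2_us`, and the replica counts are all hypothetical and are not taken from the getstream.io setup. In a real cluster the data center names must match what the configured snitch reports.

```python
def replication_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement that places replicas in each data center.

    replicas_per_dc maps a data center name (as reported by the snitch)
    to the number of replicas to keep there.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        options.append("'%s': %d" % (dc_name, int(replica_count)))
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(options))


# Hypothetical names: the old data center plus the new one being added.
statement = replication_cql("activity", {"dc_germany": 3, "dc_ec2_us": 3})
print(statement)
```

Changing replication settings only tells Cassandra where replicas should live; existing data still has to be streamed to the new nodes as a separate step.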

Cassandra, Snitches and Replication strategies

Click to read more ...

Monday
Mar 9, 2015

AppLovin: Marketing to Mobile Consumers Worldwide by Processing 30 Billion Requests a Day

This is a guest post from AppLovin's VP of engineering, Basil Shikin, on the infrastructure of its mobile marketing platform. Major brands like Uber, Disney, Yelp and Hotels.com use AppLovin's mobile marketing platform. It processes 30 billion requests a day and 60 terabytes of data a day.

AppLovin's marketing platform provides marketing automation and analytics for brands who want to reach their consumers on mobile. The platform enables brands to use real-time data signals to make effective marketing decisions across one billion mobile consumers worldwide.

Core Stats

  • 30 Billion ad requests per day

  • 300,000 ad requests per second, peaking at 500,000 ad requests per second

  • 5ms average response latency

  • 3 Million events per second

  • 60TB of data processed daily

  • ~1000 servers

  • 9 data centers

  • ~40 reporting dimensions

  • 500,000 metrics data points per minute

  • 1 PB Spark cluster

  • 15GB/s peak disk writes across all servers

  • 9GB/s peak disk reads across all servers

  • Founded in 2012, AppLovin is headquartered in Palo Alto, with offices in San Francisco, New York, London and Berlin.
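
A quick back-of-the-envelope check shows how the daily figures in the list above relate to per-second rates. This is only a sketch: it assumes traffic spread evenly over the day and decimal units (1 TB = 10^12 bytes), neither of which the post states, and the variable names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = 30 * 10**9   # "30 Billion ad requests per day"
bytes_per_day = 60 * 10**12     # "60TB of data processed daily", assuming decimal TB

# Averages, assuming load is spread evenly across the day.
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY
avg_bytes_per_second = bytes_per_day / SECONDS_PER_DAY
bytes_per_request = bytes_per_day / requests_per_day

print(f"average requests/second: {avg_requests_per_second:,.0f}")
print(f"average data rate: {avg_bytes_per_second / 10**6:,.0f} MB/s")
print(f"data processed per request: {bytes_per_request:,.0f} bytes")
```

The computed average of roughly 347,000 requests per second is in the same range as the 300,000 figure quoted in the list, and the daily data volume works out to about 2 KB processed per request.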

 

Technology Stack

Click to read more ...

Monday
Mar 9, 2015

The Architecture of Algolia’s Distributed Search Network

Guest post by Julien Lemoine, co-founder & CTO of Algolia, a developer friendly search as a service API.

Algolia started in 2012 as an offline search engine SDK for mobile. At that time we had no idea that within two years we would have built a worldwide distributed search network.

Today Algolia serves more than 2 billion user-generated queries per month from 12 regions worldwide. Our average server response time is 6.7ms, and 90% of queries are answered in less than 15ms. Our unavailability rate on search is below 10^-6, which represents less than 3 seconds per month.
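
The link between an unavailability rate and seconds of downtime is simple arithmetic. The sketch below assumes the rate is a fraction of wall-clock time; the function name is hypothetical.

```python
def downtime_seconds(unavailability_rate, days):
    """Seconds of downtime implied by an unavailability rate over a period of days."""
    return unavailability_rate * days * 24 * 60 * 60


# An unavailability rate of 10^-6 over a 30-day and a 31-day month.
monthly_30 = downtime_seconds(1e-6, 30)
monthly_31 = downtime_seconds(1e-6, 31)
print(f"30-day month: {monthly_30:.3f} s, 31-day month: {monthly_31:.3f} s")
```

Both values fall under 3 seconds, which is consistent with the figure quoted above.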

The challenges we faced with the offline mobile SDK were technical limitations imposed by the nature of mobile. These challenges forced us to think differently when developing our algorithms because classic server-side approaches would not work.

Our product has evolved greatly since then. We would like to share our experiences with building and scaling our REST API built on top of those algorithms.

We will explain how we are using a distributed consensus for high-availability and synchronization of data in different regions around the world and how we are doing the routing of queries to the closest locations via an anycast DNS.

The data size misconception

Click to read more ...

Friday
Mar 6, 2015

Stuff The Internet Says On Scalability For March 6th, 2015

Hey, it's HighScalability time:


The future of technology in one simple graph (via swardley)
  • $50 billion: the worth of AWS (is it low?); 21 petabytes: size of the Internet Archive; 41 million: # of views of posts about a certain dress

  • Quotable Quotes:
    • @bpoetz: programming is awesome if you like feeling dumb and then eventually feeling less dumb but then feeling dumb about something else pretty soon
    • @Steve_Yegge: Saying you don't need debuggers because you have unit tests is like saying you don't need detectives because you have jails.
    • Nasser Manesh: “Some of the things we ran into are issues [with Docker] that developers won’t see on laptops. They only show up on real servers with real BIOSes, 12 disk drives, three NICs, etc., and then they start showing up in a real way. It took quite some time to find and work around these issues.”
    • Guerrilla Mantras Online: Best practices are an admission of failure.
    • Keine Kommentare: Using master-master for MySQL? To be frankly we need to get rid of that architecture. We are skipping the active-active setup and show why master-master even for failover reasons is the wrong decision.
    • Ed Felten: the NSA’s actions in the ‘90s to weaken exportable cryptography boomeranged on the agency, undermining the security of its own site twenty years later.
    • @trisha_gee: "Java is both viable and profitable for low latency" @giltene at #qconlondon
    • @michaelklishin: Saw Eric Brewer trending worldwide. Thought the CAP theorem finally went mainstream. Apparently some Canadian hockey player got traded.
    • @ThomasFrey: Mobile game revenues will grow 16.5% in 2015, to more than $3B
    • John Allspaw: #NoEstimates is an example of something that engineers seem to do a lot, communicating a concept by saying what it’s not.

  • Improved thread handling contention, NDB API receive processing, scans and PK lookups in the data nodes have led to a monster 200M reads per second in MySQL Cluster 7.4. That's on an impressive hardware configuration to be sure, but it doesn't matter how mighty the hardware is if your software can't drive it.

  • Have you heard of this before? LinkedIn shows how to use SDCH, an HTTP/1.1-compatible extension, which reduces the required bandwidth through the use of a dictionary shared between the client and the server, to achieve impressive results: When sdch and gzip are combined, we have seen additional compression as high as 81% on certain files when compared to gzip only. Average additional compression across our static content was about 24%. 

  • Double awesome description of How we [StackExchange] upgrade a live data center. It was an intricate multi-day highly choreographed dance. Toes were stepped on, but there was also great artistry. The result of the new beefier hardware: The decrease on question render times (from approx. 30-35ms to 10-15ms) is only part of the fun. Great comment thread on reddit.

  • A jaunty exploration of Microservices: What are They and Why Should You Care?  Indix is following Twitter and Netflix by tossing their monolith for microservices. The main idea: Microservices decouples your systems and gives more options and choices to evolve them independently.

  • Wired's new stack: WordPress, PHP, Stylus for CSS, jQuery, starting with React.js, JSON, Vagrant, Gulp for task automation, Git hooks, Linting, GitHub, Jenkins.

  • Have you ever wanted to search Hacker News? Algolia has created a great search engine for HN.
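
The SDCH item above rests on a dictionary shared between client and server. The sketch below is not SDCH, which uses its own dictionary negotiation and delta encoding. It uses zlib's preset dictionary feature from the Python standard library as a stand-in to show why a shared dictionary helps when a payload overlaps with content both sides already hold. The CSS strings and helper names are made up for illustration.

```python
import zlib


def deflate(payload, dictionary=None):
    """Compress payload, optionally priming zlib with a preset dictionary."""
    if dictionary is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def inflate(blob, dictionary=None):
    """Decompress blob; the same dictionary must be supplied if one was used."""
    if dictionary is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical shared dictionary: content the client and server both already have.
shared_dictionary = (
    b".button{display:inline-block;padding:4px 8px;border-radius:3px;"
    b"font-family:Helvetica,Arial,sans-serif}"
)
# Hypothetical new payload that overlaps heavily with the dictionary.
payload = (
    b".nav-button{display:inline-block;padding:4px 8px;border-radius:3px;"
    b"font-family:Helvetica,Arial,sans-serif;color:#fff}"
)

plain = deflate(payload)
primed = deflate(payload, shared_dictionary)
print("original bytes:", len(payload))
print("deflate only:", len(plain))
print("deflate with shared dictionary:", len(primed))
```

Because the decompressor needs the identical dictionary, both sides have to agree on it ahead of time, which is the part SDCH handles over HTTP.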

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...
