advertise
Tuesday
Jan142014

Ask HS: Design and Implementation of scalable services?

We have written agents deployed/distributed across the network. Agents sends data every 15 Secs may be even 5 secs. Working on a service/system to which all agent can post data/tuples with marginal payload. Upto 5% drop rate is acceptable. Ultimately the data will be segregated and stored into DBMS System (currently we are using MSQL).

Question(s) I am looking for answer

1. Client/Server Communication: Agent(s) can post data. Status of sending data is not that important. But there is a remote where Agent(s) to be notified if the server side system generates an event based on the data sent.

- Lot of advices from internet suggests using Message Bus (ActiveMQ) for async communication. Multicast and UDP are the alternatives.

2. Persistence: After some evaluation data to be stored in DBMS System.

- End of processing data is an aggregated record for which MySql looks scalable. But on the volume of data is exponential. Considering HBase as an option.

Looking if there are any alternatives for above two scenarios and get expert advice.

Monday
Jan132014

NYTimes Architecture: No Head, No Master, No Single Point of Failure

Michael Laing, a Systems Architect at NYTimes, gave this great decription of their use of RabbitMQ and their overall architecture on the RabbitMQ mailing list. The closing sentiment marks this as definitely an architecture to learn from:

Although it may seem complex, Fabrik has simple components and is mostly principles and plumbing. The key point to grasp is that there is no head, no master, no single point of failure. As I write this I can see components failing (not RabbitMQ), and we are fixing them so they are more reliable. But the system doesn't fail, users can connect, and messages are delivered, regardless - all within design parameters.

Since it's short, to the point, and I couldn't say it better, I'll just reproduce several original sources here:

Click to read more ...

Friday
Jan102014

Stuff The Internet Says On Scalability For January 10th, 2014

Hey, it's HighScalability time:


Run pneumatic tubes alongside optical fiber cables and we unite the digital and material worlds.
  • 1 billion: searches on DuckDuckGo in 2013
  • Quotable Quotes: 
    • pg: We [Hacker News] currently get just over 200k uniques and just under 2m page views on weekdays (less on weekends). 
      • rtm: New server: one Xeon E5-2690 chip, 2.9 GHz, 8 cores total, 32 GB RAM.
    • Kyle Vanhemert: The graph shows the site’s [Reddit] beginnings in the primordial muck of porn and programming.
    • Drake Baer: But it's not about knowing the most people possible. Instead of being about size, a successful network is about shape.
    • @computionist: Basically when you cache early to scale you've admitted you have no idea what you're doing.
    • Norvig's Law: Any technology that surpasses 50% penetration will never double again.  
    • mbell: Keep in mind that a single modern physical server that is decently configured (12-16 cores, 128GB of ram) is the processing equivalent of about 16 EC2 m1.large instances.
    • @dakami: Learned databases because grep|wc -l got slow. Now I find out that's pretty much map/reduce.
    • Martin Thompson: I think "Futures" are a poor substitute for being pure event driven and using state machines. Futures make failure cases very ugly very quickly and they are really just methadone for trying to wean programmers off synchronous designs :-) 
    • @wattersjames: Can your PaaS automate Google Compute Engine? If it can, you will find that it can create VMs in only 35 seconds, at most any scale.
    • Peter M. Hoffmann: Considering the inherent drive of matter to form ever more complex structures, life seems inevitable.

  • The marker for a new generation, like kids who will never know a card catalogue, Kodak film, pay phones, phone books, VHS tapes, typewriters, or floppy disks: Co-Founder Of Snapchat Admits He's Never Owned A Physical Server

  • Want the secret of becoming a hugely popular site? Make it fast and it will become popular. It's science. Are Popular Websites Faster?: No matter what distribution of websites you look at – the more popular websites trend faster. Even the slowest popular website is much faster than those that are less popular. On the web, the top 500 sites are nearly 1s faster (by the median), and on mobile it is closer to 1.5s faster. 

  • In 1956 we may not have had BigData, but BigStorage was definitely in. Amazing picture of IBM's 5 mega-byte drive weighing in at more than 2,000 pounds.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jan082014

Under Snowden's Light Software Architecture Choices Become Murky

Adrian Cockcroft on the future of Cloud, Open Source, SaaS and the End of Enterprise Computing:

Most big enterprise companies are actively working on their AWS rollout now. Most of them are also trying to get an in-house cloud to work, with varying amounts of success, but even the best private clouds are still years behind the feature set of public clouds, which is has a big impact on the agility and speed of product development

While the Snowden revelations have tattered the thin veil of trust secreting Big Brother from We the People, they may also be driving a fascinating new tension in architecture choices between Cloud Native (scale-out, IaaS), Amazon Native (rich service dependencies), and Enterprise Native (raw hardware, scale-up).

This tension became evident in a recent HipChat interview where HipChat, makers of an AWS based SaaS chat product, were busy creating an on-premises version of their product that could operate behind the firewall in enterprise datacenters. This is consistent with other products from Atlassian in that they do offer hosted services as well as installable services, but it is also an indication of customer concerns over privacy and security.

The result is a potential shattering of backend architectures into many fragments like we’ve seen on the front-end. On the front-end you can develop for iOS, Android, HTML5, Windows, OSX, and so on. Any choice you make is like declaring for a cold war power in a winner take all battle for your development resources. Unifying this mess has been the refuge of cloud based services over HTTP. Now that safe place is threatened.

To see why...

Click to read more ...

Tuesday
Jan072014

Sponsored Post: Netflix, Logentries, Host Color, Booking, Apple, ScaleOut, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Apple is hiring for multiple positions. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly.
    • Sr Software Engineer. The iOS Systems Team is looking for a Software Engineer to work on operations, tools development and support of worldwide iOS Device sales and activations. Please apply here
    • Sr. Security Software Developer. We are looking for an excellent programmer who's done extensive security programming. This individual will participate in various security projects from the start to the end. In addition to security concepts, it's important to have intricate knowledge of different flavors of Unix operating systems to develop code that's compact and optimal. Familiarity with key exchange protocols, authentication protocols and crypto analysis is a plus. Please apply here.
    • Senior Server Side Engineer. The Emerging Technology team is looking for a highly motivated, detail-oriented, energetic individual with experience in a variety of big data technologies.  You will be part of a fast growing, cohesive team with many exciting responsibilities related to Big Data, including: Develop scalable, robust systems that will gather, process, store large amount of data Define/develop Big Data technologies for Apple internal and customer facing applications. Please apply here.
    • Senior Server Side Engineer. The Emerging Technology team is looking for a highly motivated, detail-oriented, energetic individual with experience in a variety of big data technologies.  You will be part of a fast growing, cohesive team with many exciting responsibilities related to Big Data, including: Develop scalable, robust systems that will gather, process, store large amount of data Define/develop Big Data technologies for Apple internal and customer facing applications. Please apply here.
    • Senior Engineer: Emerging Technology. Apple’s Emerging Technology group is looking for a senior engineer passionate about exploring emerging technologies to create paradigm shifting cloud based solutions. Please apply here

  • The Netflix Cloud Performance Team is hiring. Help tackle the more complex scalability challenges emerging on the cloud today, wrangling tens of thousands of instances, handling billions of requests a day. We are searching for a Senior Performance Architect and a Senior Cloud Performance Tools Engineer

  • We need awesome people @ Booking.com - We want YOU! Come design next
    generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Your amazing event here.

Cool Products and Services

  • Log management made easy with Logentries Billions of log events analyzed every day to unlock insights from the log data the matters to you. Simply powerful search, tagging, alerts, live tail and more for all of your log data. Automated AWS log collection and analytics, including CloudWatch events. 

  • HostColorCloud Servers based on OpenNebula Cloud automation IaaS. Cloud Start features: 256 MB RAM; 1000 GB Premium Bandwidth; 10 GB fast, RAID 10 protected Storage; 1 CPU Core;  /30 IPv4 (4 IPs) and /48 IPv6 subnet and costs only $9.95/mo. Client could choose an OS (CentOS is default). 

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting

  • Rapidly Develop Hadoop MapReduce Code. With ScaleOut hServer™ you can use a subset of your Hadoop data and run your MapReduce code in seconds for fast code development and you don’t need to load and manage the Hadoop software  stack, it's a self-contained Hadoop MapReduce execution environment. To learn more check out www.scaleoutsoftware.com/prototypehadoop/

  • MongoDB Backup Free Usage Tier Announced. We're pleased to introduce the free usage tier to MongoDB Management Service (MMS). MMS Backup provides point-in-time recovery for replica sets and consistent snapshots for sharded systems with minimal performance impact. Start backing up today at mms.mongodb.com.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • Aerospike Capacity Planning Kit. Download the Capacity Planning Kit to determine your database storage capacity and node requirements. The kit includes a step-by-step Capacity Planning Guide and a Planning worksheet. Free download.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jan062014

How HipChat Stores and Indexes Billions of Messages Using ElasticSearch and Redis

This article is from an interview with Zuhaib Siddique, a production engineer at HipChat, makers of group chat and IM for teams.

HipChat started in an unusual space, one you might not think would have much promise, enterprise group messaging, but as we are learning there is gold in them there enterprise hills. Which is why Atlassian, makers of well thought of tools like JIRA and Confluence, acquired HipChat in 2012.

And in a tale not often heard, the resources and connections of a larger parent have actually helped HipChat enter an exponential growth cycle. Having reached the 1.2 billion message storage mark they are now doubling the number of messages sent, stored, and indexed every few months.

That kind of growth puts a lot of pressure on a once adequate infrastructure. HipChat exhibited a common scaling pattern. Start simple, experience traffic spikes, and then think what do we do now? Using bigger computers is usually the first and best answer. And they did that. That gave them some breathing room to figure out what to do next. On AWS, after a certain inflection point, you start going Cloud Native, that is, scaling horizontally. And that’s what they did.

But there's a twist to the story. Security concerns have driven the development of an on-premises version of HipChat in addition to its cloud/SaaS version. We'll talk more about this interesting development in a post later this week.

While HipChat isn’t Google scale, there is good stuff to learn from HipChat about how they index and search billions of messages in a timely manner, which is the key difference between something like IRC and HipChat. Indexing and storing messages under load while not losing messages is the challenge.

This is the road that HipChat took, what can you learn? Let’s see…

Click to read more ...

Friday
Jan032014

Stuff The Internet Says On Scalability For January 3rd, 2014

Hey, it's HighScalability time, can you handle the truth?


Should software architectures include parasites? They increase diversity and complexity in the food web.
  • 10 Million: classic hockey stick growth pattern for GitHub repositories
  • Quotable Quotes:
    • Seymour Cray: A supercomputer is a device for turning compute-bound problems into IO-bound problems.
    • Robert Sapolsky: And why is self-organization so beautiful to my atheistic self? Because if complex, adaptive systems don’t require a blue print, they don’t require a blue print maker. If they don’t require lightning bolts, they don’t require Someone hurtling lightning bolts.
    • @swardley: Asked for a history of PaaS? From memory, public launch - Zimki ('06), BungeeLabs ('06), Heroku ('07), GAE ('08), CloudFoundry ('11) ...
    • @neil_conway: If you're designing scalable systems, you should understand backpressure and build mechanisms to support it.
    • Scott Aaronson...the brain is not a quantum computer. A quantum computer is good at factoring integers, discrete logarithms, simulating quantum physics, modest speedups for some combinatorial algorithms, none of these have obvious survival value. The things we are good at are not the same thing quantum computers are good at.
    • @rbranson: Scaling down is way cooler than scaling up.
    • @rbranson: The i2 EC2 instances are a huge deal. Instagram could have put off sharding for 6 more months, would have had 3x the staff to do it.
    • @mraleph: often devs still approach performance of JS code as if they are riding a horse cart but the horse had long been replaced with fusion reactor
  • Now we know the cost of bandwidth: Netflix’s new plan: save a buck with SD-only streaming
  • Massively interesting Stack Overflow thread on Why is processing a sorted array faster than an unsorted array? Compilers may grant a hidden boon or turn traitor with a deep deceit. How do you tell? It's about branch prediction.
  • Can your database scale to 1000 cores? Nope. Concurrency Control in the Many-core Era: Scalability and Limitations: We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from ground up and is tightly coupled with the hardware.
  • Not all SSDs are created equal. Power-Loss-Protected SSDs Tested: Only Intel S3500 PassesWith a follow up. If data on your SSD can't survive a power outage it ain't worth a damn. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday
Jan022014

xkcd: How Standards Proliferate:

The great thing about standards is there are so many to choose from. What is it about human nature that makes this so recognizably true?

Wednesday
Jan012014

Paper: Nanocubes: Nanocubes for Real-Time Exploration of Spatiotemporal Datasets

How do you turn Big Data into fast, useful, and interesting visualizations? Using R and technology called Nanocubes. The visualizations are stunning and amazingly reactive. Almost as interesting as the technologies behind them.

David Smith wrote a great article explaining the technology and showing a demo by Simon Urbanek of a visualization that uses 32Tb of Twitter data. It runs smoothly and interactively on a single machine with 16Gb of RAM.  For more information and demos go to nanocubes.net.

David Smith sums it up nicely:

Despite the massive number of data points and the beauty and complexity of the real-time data visualization, it runs impressively quickly. The underlying data structure is based on Nanocubes, a fast datastructure for in-memory data cubes. The basic idea is that nanocubes aggregate data hierarchically, so that as you zoom in and out of the interactive application, one pixel on the screen is mapped to just one data point, aggregated from the many that sit "behind" that pixel. Learn more about nanocubes, and try out the application yourself (modern browser required) at the link below.

Abstract from Nanocubes for Real-Time Exploration of Spatiotemporal Datasets:

Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop’s main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
Tuesday
Dec242013

Sponsored Post: Netflix, Logentries, Host Color, Booking, Spokeo, Apple, ScaleOut, MongoDB, BlueStripe, AiScaler, Aerospike, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Apple is hiring for multiple positions. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly.
    • Quality Assurance Engineer. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. Please apply here.
    • Sr Software Engineer iPhone. Do you love building highly scalable, distributed web applications? Does the idea of a fast-paced environment make your heart leap? Do you want your technical abilities to be challenged every day, and for your work to make a difference in the lives of millions of people? Please apply here.
    • Sr Software Engineer. The iOS Systems Team is looking for a Software Engineer to work on operations, tools development and support of worldwide iOS Device sales and activations. Please apply here
    • Sr. Security Software Developer. We are looking for an excellent programmer who's done extensive security programming. This individual will participate in various security projects from the start to the end. In addition to security concepts, it's important to have intricate knowledge of different flavors of Unix operating systems to develop code that's compact and optimal. Familiarity with key exchange protocols, authentication protocols and crypto analysis is a plus. Please apply here.

  • The Netflix Cloud Performance Team is hiring. Help tackle the more complex scalability challenges emerging on the cloud today, wrangling tens of thousands of instances, handling billions of requests a day. We are searching for a Senior Performance Architect and a Senior Cloud Performance Tools Engineer

  • We need awesome people @ Booking.com - We want YOU! Come design next
    generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html

  • Spokeo is hiring a Senior Backend Developer. We've spent years agonizing over the best way to construct an elegant, simplistic, yet highly powerful people search engine. Spokeo deals with problems of ginormous scale, so a strong understanding and appreciation for algorithms and efficiency is desired. Please apply here.

  • Spokeo is hiring a Senior Software Developer - Web Applications. Build features that involve any of our products, from the universal people search interface and functionality to the construction of family trees to a portal that connects customers to employees and employer data. Please apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

  • New Relic is looking for a Java Instrumentation Engineer, Java Scalability Engineer,  Distributed Systems Engineer and Android app engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter? 

Fun and Informative Events

  • Your amazing event here.

Cool Products and Services

  • Log management made easy with Logentries Billions of log events analyzed every day to unlock insights from the log data the matters to you. Simply powerful search, tagging, alerts, live tail and more for all of your log data. Automated AWS log collection and analytics, including CloudWatch events. 

  • Why choose Host Color for your webhosting needs?  Redundant network, 100% Uptime SLA guarantee. Excellent location and high class data center. Powerful resource-rich, server systems. 24/7 Support - responsive, friendly and knowledgeable. Scalable, fairly priced, risk-free and hosting service. 

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting

  • Rapidly Develop Hadoop MapReduce Code. With ScaleOut hServer™ you can use a subset of your Hadoop data and run your MapReduce code in seconds for fast code development and you don’t need to load and manage the Hadoop software  stack, it's a self-contained Hadoop MapReduce execution environment. To learn more check out www.scaleoutsoftware.com/prototypehadoop/

  • MongoDB Backup Free Usage Tier Announced. We're pleased to introduce the free usage tier to MongoDB Management Service (MMS). MMS Backup provides point-in-time recovery for replica sets and consistent snapshots for sharded systems with minimal performance impact. Start backing up today at mms.mongodb.com.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • Aerospike Capacity Planning Kit. Download the Capacity Planning Kit to determine your database storage capacity and node requirements. The kit includes a step-by-step Capacity Planning Guide and a Planning worksheet. Free download.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.  http://aiscaler.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...