advertise

Entries in Scalability (10)

Tuesday
Jan142014

Ask HS: Design and Implementation of scalable services?

We have written agents deployed/distributed across the network. Agents sends data every 15 Secs may be even 5 secs. Working on a service/system to which all agent can post data/tuples with marginal payload. Upto 5% drop rate is acceptable. Ultimately the data will be segregated and stored into DBMS System (currently we are using MSQL).

Question(s) I am looking for answer

1. Client/Server Communication: Agent(s) can post data. Status of sending data is not that important. But there is a remote where Agent(s) to be notified if the server side system generates an event based on the data sent.

- Lot of advices from internet suggests using Message Bus (ActiveMQ) for async communication. Multicast and UDP are the alternatives.

2. Persistence: After some evaluation data to be stored in DBMS System.

- End of processing data is an aggregated record for which MySql looks scalable. But on the volume of data is exponential. Considering HBase as an option.

Looking if there are any alternatives for above two scenarios and get expert advice.

Tuesday
Nov052013

10 Things You Should Know About AWS

Authored by Chris Fregly:  Former Netflix Streaming Platform Engineer, AWS Certified Solution Architect and Purveyor of fluxcapacitor.com.

Ahead of the upcoming 2nd annual re:Invent conference, inspired by Simone Brunozzi’s recent presentation at an AWS Meetup in San Francisco, and collected from a few of my recent Fluxcapacitor.com consulting engagements, I’ve compiled a list of 10 useful time and clock-tick saving tips about AWS.

1) Query AWS resource metadata

 

Can’t remember the EBS-Optimized IO throughput of your c1.xlarge cluster?  How about the size limit of an S3 object on a single PUT?  awsnow.info is the answer to all of your AWS-resource metadata questions.  Interested in integrating awsnow.info with your application?  You’re in luck.  There’s now a REST API, as well!

Note:  These are default soft limits and will vary by account.

2) Tame your S3 buckets

 

Delete an entire S3 bucket with a single CLI command:  

aws s3 rb s3://<bucket-name> --force

Recursively copy a local directory to S3:

aws s3 cp <local-dir-name> s3://<bucket-name> --region <region-name> --recursive

3) Understand AWS cross-region dependencies

Click to read more ...

Thursday
May242012

Build your own twitter like real time analytics - a step by step guide

Major social networking platforms like Facebook and Twitter have developed their own architectures for handling the need for real-time analytics on huge amounts of data. However, not every company has the need or resources to build their own Twitter-like solution.

In this example we have taken the same Twitter/Facebook-like blueprint, and made it simple enough for developers to implement. We have taken the following approach in our implementation: 

  1. Use In Memory Data Grid (XAP) for handling the real time stream data-processing.
  2. BigData data-base (Cassandra) for storing the historical data and manage the trend analytics 
  3. Use Cloudify (cloudifysource.org)  for managing and automating the deployment on private or pubic cloud

The example demonstrate a simple case of word count analytics. It uses Spring Social to plug-in to real twitter feeds. The solution is designed to efficiently cope with getting and processing the large volume of tweets. First, we partition the tweets so that we can process them in parallel, but we have to decide on how to partition them efficiently. Partitioning by user might not be sufficiently balanced, therefore we decided to partition by the tweet ID, which we assume to be globally unique. Then we need persist and process the data with low latency, and for this we store the tweets in memory.

Click to read more ...

Tuesday
Nov012011

Finding the Right Data Solution for Your Application in the Data Storage Haystack

The InfoQ article Finding the Right Data Solution for Your Application in the Data Storage Haystack makes a series of concrete recommendations for a user who wants to find the right storage solution for his application.  

Few years back, there was a time SQL RDBMS were solution for almost all storage needs, but we all know how scaling came along and shattered the perfect dream. Then NoSQL happened, and now we are end up with a Haystack of solutions. For example, Local memory, Relational, Files, Distributed Cache, Column Family Storage, Document Storage, Name value pairs, Graph DBs, Service Registries, Queue, and Tuple Space etc. are some classes of such solutions.

We discuss about how to find the right storage solution, and we make choices often when we design. But, when comes to describe how to select the right one, we often end up giving very high-level guideline. The article argues that the way to make more concrete recommendations is to drill down into bit more detail and consider them case by case.

To that end the article takes four parameters about an application/usecase (Scale, Consistency, Type of Data, and Queries needed), then take some 40+ cases that arises from different value combination of those parameters and make one or more concrete recommendations on right storage solution for that case.

What follows are the four parameters and potential values they can take and the recommendations for structured, semi-structured, and unstructured data: 

Click to read more ...

Friday
Sep232011

The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month…

It’s how much load that really generates and how it scales to meet the challenge.

imageThere’s some amount of debate whether Facebook really crossed over the one trillion page view per month threshold. While one report says it did, another respected firm says it did not; that its monthly page views are a mere 467 billion per month.

In the big scheme of things, the discrepancy is somewhat irrelevant, as neither show the true load on Facebook’s infrastructure – which is far more impressive a set of numbers than its externally measured “page view” metric.

Click to read more ...

Tuesday
Jul262011

Web 2.0 Killed the Middleware Star

It started out innocently enough with a simple question, “What exactly *is* the model for PaaS services scalability? If based on HTTP/REST API integration, fairly easy. If native middleware… input?” You’ll forgive the odd phrasing – Twitter’s limitations sometimes make conversations of this nature … interesting.

The discussion culminated in what appeared to be the sentiment that middleware was mostly obsolete with respect to PaaS.

Very briefly for those of you who are more infrastructure / network minded than application architecture fluent, let’s review the traditional middleware-based application architecture...

Click to read more ...

Monday
Jul182011

Building your own Facebook Realtime Analytics System  

Recently, I was reading Todd Hoff's write-up on FaceBook real time analytics system. As usual, Todd did an excellent job in summarizing this video from Engineering Manager at Facebook Alex Himel.

In the first post, I’d like to summarize the case study, and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Realtime Time Analytics for Big-Data that might be easier to implement, using Facebook's experience as a starting point and guide as well as the experience gathered through a recent work with few of GigaSpaces customers. The second post provide a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system..

Click to read more ...

Tuesday
Apr122011

Caching and Processing 2TB Mozilla Crash Reports in memory with Hazelcast

Mozilla processes TB's of Firefox crash reports daily using HBase, Hadoop, Python and Thrift protocol. The project is called Socorro, a system for collecting, processing, and displaying crash reports from clients. Today the Socorro application stores about 2.6 million crash reports per day. During peak traffic, it receives about 2.5K crashes per minute. 

In this article we are going to demonstrate a proof of concept showing how Mozilla could integrate Hazelcast into Socorro and achieve caching and processing 2TB of crash reports with 50 node Hazelcast cluster. The video for the demo is available here.

 

To read the rest of the article please click below...

Click to read more ...

Monday
Oct182010

NoCAP

In this post i wanted to spend sometime on the CAP theorem and clarify some of the confusion that i often see when people associate CAP with scalability without fully understanding the implications that comes with it and the alternative approaches

You can read the full article here

Wednesday
Dec302009

Terrastore - Scalable, elastic, consistent document store.

Terrastore is a new-born document store which provides advanced scalability and elasticity features without sacrificing consistency.

Here are a few highlights:

  • Ubiquitous: based on the universally supported HTTP protocol.
  • Distributed: nodes can run and live everywhere on your network.
  • Elastic: you can add and remove nodes dynamically to/from your running cluster with no downtime and no changes at all to your configuration.
  • Scalable at the data layer: documents are partitioned and distributed among your nodes, with automatic and transparent re-balancing when nodes join and leave.
  • Scalable at the computational layer: query and update operations are distributed to the nodes which actually holds the queried/updated data, minimizing network traffic and spreading computational load.
  • Consistent: providing per-document consistency, you're guaranteed to always get the latest value of a single document, with read committed isolation for concurrent modifications.
  • Schemaless: providing a collection-based interface holding JSON documents with no pre-defined schema, you can just create your collections and put everything you want into.
  • Easy operations: install a fully working cluster in just a few commands and no XML to edit.
  • Features rich: support for push-down predicates, range queries and server-side update functions.

Read, participate, download and clone it!