Thursday
August 20, 2009

Dependency Injection and AOP frameworks for .NET 

We're looking to implement a framework to do Dependency Injection and AOP for a new solution we're working on. It will likely get hit pretty hard, so we'd like to choose a framework that's proven to scale well and operate well under pressure.

Right now, we're looking closely at Spring.NET, Castle Project's Windsor framework, and Unity. Does anyone have any feedback on implementing any of these in large, high traffic environments?
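For readers comparing these containers, here's a minimal constructor-injection sketch using Castle Windsor's fluent registration API. The repository and service types are hypothetical placeholders; Spring.NET and Unity support the same pattern with different registration syntax:

```csharp
// Minimal constructor-injection sketch with Castle Windsor.
// IOrderRepository / SqlOrderRepository / OrderService are hypothetical
// placeholders, not part of any of the frameworks under discussion.
using Castle.MicroKernel.Registration;
using Castle.Windsor;

public interface IOrderRepository { void Save(string order); }

public class SqlOrderRepository : IOrderRepository
{
    public void Save(string order) { /* persist to SQL Server */ }
}

public class OrderService
{
    private readonly IOrderRepository repository;

    // Windsor sees this constructor and injects the registered implementation.
    public OrderService(IOrderRepository repository)
    {
        this.repository = repository;
    }

    public void Place(string order) { repository.Save(order); }
}

public static class Program
{
    public static void Main()
    {
        var container = new WindsorContainer();
        container.Register(
            Component.For<IOrderRepository>().ImplementedBy<SqlOrderRepository>(),
            Component.For<OrderService>());

        // The container wires the dependency graph for us.
        var service = container.Resolve<OrderService>();
        service.Place("order-42");
    }
}
```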

Tuesday
August 18, 2009

Hardware Architecture Example (geographical level mapping of servers)

I have put down my thoughts on the architecture discussed in this post. Although I have done substantial research to understand how things should work before deciding on this architecture, I will need a large amount of input from everyone to come to an architecture decision. The hardware entities considered in the design are:
1. Master Web Server, which will map different users to web servers placed in different geographical locations (preferably storing the mapping table in RAM)
2. Web Servers
3. Application Servers
4. Master Database Servers (to implement entity-wise lookup sharding)
5. Slave Database Servers.

I would really appreciate some good input on using cloud computing, and on how to go about it, either against or in addition to the given architecture. In fact, I would like to know people's views on when to decide to use cloud computing techniques. Looking forward to input from the community.

Tuesday
August 18, 2009

Real World Web: Performance & Scalability

We've referenced this 189-slide masterpiece by Ask Bjorn Hansen before, but it was hidden without its own first-class link. He describes his presentation as 3 hours of 5-minute lightning talks, and that sounds about right.

The presentation covers the overall platform and architecture considerations involved in tuning applications from a holistic perspective. You'll be shown how to design scalable architectures for dynamic, high-volume web sites. Topics covered include caching, scalable database design, replication architecture, load-balancing, and architectural decisions derived from many years of experience.

His prime directive of scaling: Think Horizontally at every point in your architecture, not just at the web tier.

You may not agree with everything, but there's a lot of useful advice. Here's a summary of some of what is covered:

  • Benchmarking
  • Vertical scaling sucks.
  • Horizontal scaling rocks.
  • Run many application servers
  • Don't keep state in the app server
  • Be stateless
  • Optimization is necessary, but is different than scalability.
  • Cache things you hit all the time (see the cache-aside sketch after this summary).
  • Measure, don't assume, check.
  • Make pages static.
  • Caching is a trade-off.
  • Cache full pages.
  • Cache partial pages.
  • Cache complex data.
  • MySQL query cache is flushed on update.
  • Cache invalidation is hard.
  • Replication scales reads, not writes.
  • Partition to scale writes. 96% of applications can skip this step.
  • Master-master setup facilitates on-line schema changes.
  • Create summary tables and summary databases rather than do COUNT and GROUP-BY at runtime.
  • Make code idempotent. If it fails you should just be able to run it again.
  • Load data asynchronously. Aggregate updates into batches.
  • Move processing to application and out of the database as much as possible.
  • Stored procedures are dangerous.
  • Add more memory.
  • Enable query logging and take a look at what your app is doing.
  • Run different MySQL instances for different work loads.
  • Config tuning helps, query tuning works.
  • Reconsider persistent DB connections.
  • Don't overwork the database. It's hard to scale.
  • Work in parallel.
  • Use a job queuing system.
  • Log http requests.
  • Use light processes for light tasks.
  • Build on APIs internally. Clean loosely coupled APIs are easy to scale.
  • Don't incur technical debt.
  • Automatically handle failures.
  • Make services that always work.
  • Load balancing is the key to horizontal scaling.
  • Redundancy is not load-balancing. Always have n+1 capacity.
  • Plan for disasters.
  • Make backups.
  • Keep software deployments easy.
  • Have everything scripted.
  • Monitor everything. Graph everything.
  • Run one service per server.
  • Don't ever swap memory for disk.
  • Run memcached if you have extra memory.
  • Use memory to save CPU or IO. Balance memory vs CPU vs IO.
  • Netboot your application servers.
  • There are lots of good slides on what to graph.
  • Use a CDN.
  • Use YSlow to find client side problems.

    This is just a high-level blitz through the presentation; each topic is given a lot more detail in the slides. Audio of Ask's dulcet tones would be nice, but there's still a lot to learn here.
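    To make the cache-aside advice in the list above concrete, here is a minimal sketch in C#. The ICache interface and the LoadUserFromDatabase call are hypothetical stand-ins for whatever cache client (memcached, etc.) and data access layer you use:

```csharp
// Cache-aside: check the cache first, fall back to the database on a miss,
// then populate the cache with a TTL. ICache and LoadUserFromDatabase are
// hypothetical stand-ins for a real cache client and data access layer.
using System;

public interface ICache
{
    object Get(string key);
    void Set(string key, object value, TimeSpan ttl);
}

public class User { }

public class UserRepository
{
    private readonly ICache cache;

    public UserRepository(ICache cache) { this.cache = cache; }

    public User GetUser(int userId)
    {
        string key = "user:" + userId;

        var cached = cache.Get(key) as User;
        if (cached != null)
            return cached;                             // cache hit: no DB work

        User user = LoadUserFromDatabase(userId);      // cache miss: hit the DB once
        cache.Set(key, user, TimeSpan.FromMinutes(5)); // the trade-off: 5 min staleness
        return user;
    }

    private User LoadUserFromDatabase(int userId)
    {
        /* SELECT ... FROM Users WHERE Id = @userId */
        return new User();
    }
}
```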
    Sunday
    August 16, 2009

    TechDev Stages

    Tech Dev Stages explains the basic steps involved in product development, given a business problem. A must-read for newbies or those just starting out in architecture development.

    Sunday
    August 16, 2009

    ThePort Network Architecture

    ThePort Network's Director of Engineering, TJ Muehleman, was kind enough to share some of the architectural details of their white label social media system. It currently runs about 50 social networks varying in size from fewer than 1,000 members to more than 300,000 members, all on a Microsoft stack. In addition to their social networking platform, they offer JavaScript APIs and web service APIs (both REST and SOAP), which account for a significant percentage of overall system usage.

    ThePort is an excellent example of a real world in-the-trenches product offering real value to customers. One of the most interesting problems they have to solve is multi-tenancy. How do you provide good performance, complete customization, support, develop new features, and provide individual search indexes for each customer? It's not an easy problem to solve.

    How did they solve their problems and build a successful system? 

    Site: http://theport.com

    Platform

  • Microsoft.NET 3.5
  • C# / VB.NET
  • SQL Server 2005
  • Visual Studio 2008 Pro Edition
  • Prototype
  • Subversion
  • TortoiseSVN
  • Trac (for internal defect tracking. Will possibly move all internal and external issue tracking to it)
  • Beyond Compare 3
  • Web Tier
    * 6 x Dell blade servers running windows 2008 / IIS 7
  • Data Tier
    * 1 r/w SQL cluster – Dell 6850s (6 single-core processors, 32 GB RAM)
    * 2 read-only Dell 2950s (2 quad-core processors, 16 GB RAM)
    * 1 distribution server – Dell 2950 (2 quad-core processors, 16 GB RAM)
  • We also use SQL Server Service Broker as a queuing system for some of our saves. It's an alternative to MSMQ that uses the DB for persistence in case of failure. We will most likely be moving to MSMQ in the near future to remove us from SQL dependence.
  • Caching
    * 2 Dell blade servers 8 GB RAM each to total 16 GB of available RAM
    * Running SharedCache (Basically an open source .NET port of MemCacheD. We initially looked at MemCacheD but our internal benchmarking indicated SharedCache had better performance – at least w/in a Microsoft environment. We may still investigate Microsoft's Velocity cache platform when it goes live)
  • Search
    * 2 Dell 2950s with 725 GB Storage
    * Running Lucene + SOLR
    * We chose Lucene over Lucene.NET because Lucene.NET's wildcard search was a little buggy in our initial beta testing. SQL Full Text wasn't a viable option because there was no clear and easy way to split indexes between customers. SOLR cores make this part easy. Above and beyond that, Lucene is lightning fast and is available with features we couldn't turn down (proximity search, searching w/in documents, and built-in RESTful APIs to name a few)

    How do you handle multi-tenancy?

    A multi-tenant platform has two primary hurdles to overcome:

    1. Preventing a single, large customer from overwhelming the system.

    The primary bottleneck for this is in the data layer. Our current DB architecture has helped mitigate this problem. The read-only servers help offset most of this by absorbing the bulk of the data calls. We did have to beef up the distribution server because latency between the r/w server and the read only servers had crept too high. Getting a new machine (2 quad cores with 16 GB of RAM) helped reduce the latency to less than a second.

    However robust the cluster is, we've concluded that we will eventually have to move to a sharded architecture with MySQL. MS SQL licensing fees make both continuing to enhance the cluster and scaling out to multiple machines prohibitively expensive. Additionally, sharding allows us to scale either by customer (because some may be more active than others) or by functional area (photos, comments, etc.).

    2. Allowing clients to have total control over the look, feel, and user experience of their sites.


    Allowing CSS control isn't enough; we needed a templating system that allows total control over the site. We looked at using .NET master pages and user controls to accomplish this, but that assumes a level of .NET knowledge that outside developers may not have. We had built a proprietary templating system, but unfortunately it became too limiting and would eventually have become a drag on performance.

    So we settled on using XML / XSLT. All of our business / entity objects are serializable to XML. This made XSLT a natural choice from the templating angle. We've seen a considerable boost in performance from this upgrade and an even greater increase in flexibility in terms of what our designers can do. Once the learning curve is overcome, the web designers love the amount of control they get.

    What did you do that was especially cool that people could learn from?

    XSLT as Custom Templating System

    Building a templating system in XSLT that actually allows the template author to make a web service call to our internal web service layer (or external web services) straight from the templating system. This allowed the development team to build a flexible, powerful system that lets a web designer embed real-time calls into a given template. We accomplish this using XSLT extension objects. What we've found in our internal testing is that these extension objects scale far better than our previous templating system (a homegrown proprietary system). We've used ANTS Profiler to compare the two, and the difference is orders of magnitude.
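    A rough sketch of that mechanism using the standard .NET XsltArgumentList API. The TemplateServices class and its GetCommentCount method are hypothetical placeholders, and the actual web service call is elided:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object exposing a service call to templates.
public class TemplateServices
{
    public int GetCommentCount(int articleId)
    {
        // In a system like ThePort's, this would call the internal
        // web service layer; a constant stands in here.
        return 42;
    }
}

public static class TemplateRenderer
{
    public static string Render(string xsltPath, XmlDocument data)
    {
        var transform = new XslCompiledTransform();
        transform.Load(xsltPath);

        // Register the extension object under a namespace the stylesheet imports.
        var args = new XsltArgumentList();
        args.AddExtensionObject("urn:template-services", new TemplateServices());

        using (var writer = new StringWriter())
        {
            transform.Transform(data, args, writer);
            return writer.ToString();
        }
    }
}
```

    In the stylesheet, the extension object is bound to its namespace (xmlns:svc="urn:template-services") and invoked like a function, e.g. <xsl:value-of select="svc:GetCommentCount(@id)"/>.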

    Obviously we have to cache the hell out of this or the performance of the pages the calls are embedded in would suffer. For now, we make the internal web service calls via HTTP, but we will soon be moving to a TCP call to take advantage of the better connection pooling TCP offers. We're most likely to use WCF because of its native support for TCP bindings. However, we haven't benchmarked that yet, so it's possible it could change.

    Not Using the Database to Build Collections

    Another cool thing we've done is to move strongly away from using the database to retrieve collections of 'things'. For instance, if we needed a collection of comments, previously we'd hit the database for the 5, 10, 100, etc comments we wanted, do the sorting / filtering in the DB, return a single dataset, cache that, and then display.

    However, this is a database intensive operation, especially if you're going to join against user data (which you inevitably will). What we've started doing recently is caching the recent comment objects, and using our cache providers MultiGet ability to simultaneously retrieve all comments at the same time. We then sort / filter in memory in the application tier, discard whatever comments we don't need, and then display. We found that doing it this way, we save lots of hits to our database and in fact, saw a considerable performance gain from it.

    Our tests (on a developer laptop) fetched 10,000 objects from cache in about 1 second, then sorted them by datetime in about 0.015 seconds.
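    A sketch of that multi-get pattern. The IMultiGetCache interface is a hypothetical stand-in for SharedCache's client API; the sort/filter happens in the application tier exactly as described:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cache client with a multi-get operation, standing in
// for SharedCache's actual API.
public interface IMultiGetCache
{
    IDictionary<string, Comment> MultiGet(IEnumerable<string> keys);
}

public class Comment
{
    public DateTime CreatedAt { get; set; }
    public string Text { get; set; }
}

public class CommentService
{
    private readonly IMultiGetCache cache;

    public CommentService(IMultiGetCache cache) { this.cache = cache; }

    // Fetch all candidate comments in one round trip, then sort and
    // filter in memory instead of in the database.
    public IList<Comment> RecentComments(IEnumerable<int> commentIds, int take)
    {
        var keys = commentIds.Select(id => "comment:" + id);
        IDictionary<string, Comment> hits = cache.MultiGet(keys);

        return hits.Values
                   .OrderByDescending(c => c.CreatedAt) // sort in the app tier
                   .Take(take)                          // discard what we don't need
                   .ToList();
    }
}
```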

    What prompted you to move to a SOA architecture?

    To better compartmentalize our code.

    Given the growth of our templating system mentioned above, we realized it was best to truly separate the tiers into discrete areas. Since our application is easily accessed via a set of REST APIs and our own internal skinning system (and who knows what in the future), dividing the application like this gives us a lot of leeway in being able to swap out components. Additionally, we're doing more and more queuing which lines up nicely with SOA.

    Performance

    Since modern web apps deal with complex data, breaking the work into more discrete operations handled by offline processes on their own infrastructure makes a lot of sense from a performance point of view.

    How do you handle consistency between the database and the search engine?

    We have a multi-threaded windows service that scans our database once every 5 minutes looking for new data. The service then adds the new items to the Lucene index. We keep audit columns on all our database tables so capturing new data is pretty simple. Once a night, we purge the Lucene index and run a full rescan of the database. We think this system will work for the near to mid term but long term, we'll take advantage of a queuing system to keep the index in sync.
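    A sketch of the audit-column scan described above; the table and column names, and the Lucene feed, are hypothetical placeholders:

```csharp
using System;
using System.Data.SqlClient;

public class IndexSyncService
{
    // Runs every 5 minutes: pull rows modified since the last scan and
    // feed them to the indexer. Table/column names are made up.
    public void ScanForNewData(SqlConnection connection, DateTime lastScanUtc)
    {
        const string sql =
            "SELECT Id, Title, Body FROM Articles WHERE ModifiedUtc > @since";

        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@since", lastScanUtc);
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Stand-in for posting the document to Lucene/SOLR.
                    AddToLuceneIndex(reader.GetInt32(0),
                                     reader.GetString(1),
                                     reader.GetString(2));
                }
            }
        }
    }

    private void AddToLuceneIndex(int id, string title, string body)
    {
        /* POST to the SOLR core for this customer */
    }
}
```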

    How you handle your release, support, bug fixing, development, etc.

    We have a decent sized dev team: 1 platform architect responsible for overall system architecture (selecting which systems to use and tuning them), 1 lead software architect, and 3 senior-to-mid-level developers. Since we're a start-up in a fast-evolving market (social media), we find that we're constantly having to adjust to market demands and the latest in social functionality. So we have a 2 month build cycle, which is pretty aggressive.

    In terms of actual development, we've found the following to be keys to success:
    1. Daily stand-ups: it's absolutely necessary for everyone on the team to know what the others are doing. With a code base as large as ours, it's very likely I'm writing a function someone has already written or solving a problem someone has solved previously. Daily stand-ups help with that.
    2. Iterate: Build the core functionality, get it into QA and / or beta, beat the bugs out of it, move to the next piece. We've found this to be easier said than done. Market pressures sometimes dictate you roll with something more feature rich than you'd like. Sticking to an iterative cycle creates better code and more market ready products.
    3. Beta test: This goes hand-in-hand with #2 above. Get something done and get it in the hands of actual users. This is the best way to find where your app falls down.

    With regard to support / bug fixing, we're moving to a forums based support model for many of our customers. We've found the same problems, especially in an app as configurable as ours, occur over and over. Getting those answers into an open, searchable format should hopefully cut down on confusion and get developers talking directly to developers.

    Internally we use Trac for bug tracking and devote roughly 20% of our week maintaining, supporting, and fixing issues. That may seem like a lot but given how configurable our system is, we're essentially running 50 heavily data driven websites.

    WCF sounds like a buggy, underperforming mess. How is it working?

    So far we have no complaints with WCF. I think baking it directly into .NET 3.5 helped iron out a lot of the big kinks. It does come with its quirks, no doubt. We built our REST libraries on top of it and found that posting XML is not exactly the easiest thing in the world, but that was more than made up for by the ease of deploying all our GET operations with REST. Our next step will be to set up TCP and MSMQ bindings with WCF to handle our internal service requests and queuing, respectively. Since WCF exposes all of these bindings natively, we think we will see a lot of effective code re-use out of this.
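    As a sketch of the code re-use they're counting on, here is how one hypothetical WCF contract can be exposed over both a REST (webHttpBinding) endpoint and a TCP (netTcpBinding) endpoint; the contract, types, and addresses are all illustrative:

```csharp
using System;
using System.Runtime.Serialization;
using System.ServiceModel;
using System.ServiceModel.Description;
using System.ServiceModel.Web;

[DataContract]
public class Profile
{
    [DataMember] public string Id { get; set; }
}

[ServiceContract]
public interface IProfileService
{
    [OperationContract]
    [WebGet(UriTemplate = "profiles/{id}")] // REST GET when exposed over HTTP
    Profile GetProfile(string id);
}

public class ProfileService : IProfileService
{
    public Profile GetProfile(string id) { return new Profile { Id = id }; }
}

public static class HostProgram
{
    public static void Main()
    {
        var host = new ServiceHost(typeof(ProfileService));

        // REST endpoint for external consumers.
        var rest = host.AddServiceEndpoint(typeof(IProfileService),
            new WebHttpBinding(), "http://localhost:8080/api");
        rest.Behaviors.Add(new WebHttpBehavior());

        // TCP endpoint for internal service-to-service calls: same
        // contract, same implementation, different binding.
        host.AddServiceEndpoint(typeof(IProfileService),
            new NetTcpBinding(), "net.tcp://localhost:8081/internal");

        host.Open();
        Console.WriteLine("Press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```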

    I'd like to thank TJ for taking the time and making the effort to write up their architecture for people to learn from. I'm sure it will help others when they are trying to build their own systems.

    You too can share the architecture for your amazing system. Come on, you've learned a lot from others, it's time to return the favor and give back. It's not that hard, really. If interested please contact me and we can get started.
    Thursday
    August 13, 2009

    Reconnoiter - Large-Scale Trending and Fault-Detection

    One of the top recommendations from the collective wisdom contained in Real Life Architectures is to add monitoring to your system. Now! Loud is the lament for not adding monitoring early and often. The reason is easy to understand. Without monitoring you don't know what your system is doing which means you can't fix it and you can't improve it. Feedback loops require data.

    Some popular monitoring options are Munin, Nagios, Cacti, and Hyperic. A relatively new entrant is a product called Reconnoiter from Theo Schlossnagle, President and CEO of OmniTI, leading consultants on solving problems of scalability, performance, architecture, infrastructure, and data management. Theo's name might sound familiar. He gives lots of talks and is the author of the very influential Scalable Internet Architectures book.

    So right away you know Reconnoiter has a good pedigree. As Theo says, their products are born of pain, from the fire of solving real-life problems and that's always a harbinger of good things to come.

    The problem Reconnoiter is trying to solve is monitoring thousands of nodes across many datacenters, where the nodes can vary widely in power, architecture, and software configuration. With that kind of problem, what they really want is the ability to:

  • Configure everything from one place.
  • Run cheap checks that are made on the specified time interval, aren't late, and don't put a heavy load on the machine.
  • Change the configuration from any datacenter without coordination.
  • Add checks in the field.
  • Separate data collection from visualization and fault-detection.
  • Analyze trends for long-term capacity planning and postmortem analysis.
  • Detect when faults have happened and when they are about to happen.
  • Support trending: intelligent data correlation, regression analysis/curve fitting, and looking into the past to see how you got where you are now, so you can do better next time.
  • Create a monitoring system that doesn't require a separate powerful network and its own set of hosts on which to run.

    If you've ever used or written a distributed stats collection system, the architecture of Reconnoiter will look somewhat familiar:


    Some of the more interesting bits of the architecture are:
  • PostgreSQL stores all the data. The data isn't stuck in funky little files.
  • Fault-detection is based on Esper, a streaming complex event processing system. It's not clear how well this approach will work but the hooks are there.
  • A Comet-style web server is used to feed real-time updates. Much better than your traditional polling cycle.
  • Although the web console is PHP based, PHP is used mainly to execute JSON calls. Rendering happens in the browser in an AJAX client.
  • Canvas is used for real time graphics. No images are created on the fly.
  • Data is transferred securely over SSL.
  • The system is robust against failures.
  • Data is not thrown away as it is with some systems so you can check against history.

    Reconnoiter isn't completely pain free. Lua as an extension language is an interesting choice. The installation and configuration process is very complex; there are a lot of separate steps and bits to configure. Another potential problem is that monitoring produces a lot of real-time data. I have to wonder if PostgreSQL can handle that flow for very large systems. The data is partitioned by month, but a large number of machines and a large number of events can be crushing. And I wasn't sure if graph data could be correlated with released features or other system changes. In the video Theo mentions seeing in the graphs that using deflate improved performance, but I'm not sure, just looking at the graphs, how you would be able to correlate system data with system changes.

    It's droolingly clear that where Reconnoiter shines is in creating complex graphs, charts, and other visualizations. The graphs look useful and quick to render. The real-time visualizations are spectacular and extremely difficult to do in other systems.

    Related Articles



  • OmniTI Reconnoiter: Web Management and Analysis by Eric J. Bruno
  • Reconnoiter Update by Theo Schlossnagle
  • Reconnoiter Project Home Page
  • Video: Reconnoiter: a whirlwind tour
  • Big Picture of the Overall System
  • Reconnoiter: Monitoring and Trend Analysis from OSCON
  • OmniTI Unveils Open Source Monitoring Tool, Reconnoiter by Jayashree Adkoli
  • The sad state of open source monitoring tools by Grig Gheorghiu
  • How to Succeed at Capacity Planning Without Really Trying : An Interview with Flickr's John Allspaw on His New Book.
  • New open source IT management tool: Lighter-weight than Nagios, more granular than Cacti by Matt Stansberry
    Tuesday
    August 11, 2009

    13 Scalability Best Practices

    AKF Partners has released what they feel are the best practices for scalability:

    1. Asynchronous - Use asynchronous communication when possible. 
    2. Swim Lanes – Create fault isolated “swim lanes” of hardware by customer segmentation.
    3. Cache - Make use of cache at multiple layers.
    4. Monitoring - Understand your application’s performance from a customer’s perspective. 
    5. Replication - Replicate databases for recovery as well as to off load reads to multiple instances.
    6. Sharding - Split the application and databases by service and/or by customer using a modulus (see the sketch after this list).
    7. Use Few RDBMS Features – Use the OLTP database as a persistent storage device as much as possible. 
    8. Slow Roll – Roll out new code versions slowly, to a small subset of your servers without bringing the entire site down.
    9. Load & Performance Testing – Test the performance of the application version before it goes into production. 
    10. Capacity Planning / Scalability Summits – Know how much capacity you have on all tiers and services in your system. 
    11. Rollback – Always have the ability to rollback a code release.
    12. Root Cause Analysis - Ensure you have a learning culture that is evident by utilizing Root Cause Analysis to find and fix the real cause of issues.
    13. Quality From The Beginning – Quality can’t be tested into a product, it must be designed in from the beginning.
    This is just a quick summary; there are more details on their site.
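    For item 6, a minimal sketch of modulus-based shard routing; the shard count and connection strings are illustrative:

```csharp
// Modulus sharding: the customer id deterministically picks one of N
// database shards. Shard count and connection strings are illustrative.
public static class ShardRouter
{
    private static readonly string[] Shards =
    {
        "Server=db0;Database=app;",
        "Server=db1;Database=app;",
        "Server=db2;Database=app;",
        "Server=db3;Database=app;",
    };

    public static string ConnectionStringFor(int customerId)
    {
        int shard = customerId % Shards.Length; // the modulus picks the shard
        return Shards[shard];
    }
}
```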

    Sunday
    August 9, 2009

    Writing about a Cisco load balancer?

    Guys,

    At one of my jobs I have to administer a Cisco ACE (Application Control Engine) hardware load balancer.
    I don't particularly love this beast, but it's very, very powerful.

    There appears to be little real-world info out there, so it could be interesting to write an article about it.

    But I don't have other hardware load balancers to compare it to, and I don't want to rehash the product page.


    What would interest you in a 'product review' of a load balancer?
    No replies means it's not an interesting topic, so no article then ;-)

    Sunday
    August 9, 2009

    NoSQL: If Only It Was That Easy

    In this blog post BJ Clark surveys the available key-value stores that aren't SQL based, sharing interesting experiences from several projects. Make sure to check out the comments as well.

    The post has spread throughout the social tiny-text services, so there's a chance you've already read it, but it's still worthwhile to post here.

    Saturday
    August 8, 2009

    One database vs. many, and cloud hosting vs. dedicated server(s)?

    My partner and I are making a blueprint for an online webshop service. The purpose of this project is to make webshops available to small companies/individuals automatically, just by creating an account with us. Our webapp can be used to add products/pages/... to the store, and we'll handle secure checkout by PayPal.

    Our app should be scalable and manageable. Because we also want to offer free webshops, the number of webshops could grow to more than 10,000 within a few years. We are building on the Zend Framework and are using MySQL for the database.

    From the start we want to build our application for optimal and easy scalability, to avoid a lot of changes to our app/database in the future.

    Now our questions are:

    Should we use:
    * one database for all shops (or limited to X shops), or
    * one database for each new shop (each having its own products, orders, ... tables)?

    I think both approaches have pros and cons. What do you think? Does anyone have experience with this kind of structure?

    PRO:
    one database: easier to make changes to the database layout
    multiple databases: more scalable, easier to backup/restore

    CON:
    one database: harder to code because of extra key fields in tables, slower or more difficult to backup/restore
    multiple databases: takes longer to push changes to the database layout

    We are totally unclear on what hosting we should use. Would it be a solution to use a cloud service such as Mosso/Amazon/GoGrid? Or is it better to just start with one dedicated server and expand 'manually' later?

    Thanks in advance for your help!