Scaling Early Stage Startups

Mark Maunder of No VC Required--who advocates not taking VC money lest you be turned into a frog instead of the prince (or princess) you were dreaming of--has an excellent slide deck on how to scale an early stage startup. His blog also has some good SEO tips and a very spooky widget showing the geographical location of his readers. Perfect for Halloween! What are Mark's otherworldly scaling strategies for startups?

Site: http://novcrequired.com/

Information Sources

  • Slides from Seattle Tech Startup Talk.
  • Scaling Early Stage Startups blog post by Mark Maunder.

    The Platform

  • Linux
  • An ISAM type data store.
  • Perl
  • Httperf is used for benchmarking.
  • Websitepulse.com is used for perf monitoring.

    The Architecture

  • Performance matters because being slow could cost you 20% of your revenue. The UIE guys disagree saying this ain't necessarily so. They explain their reasoning in Usability Tools Podcast: The Truth About Page Download Time. The idea is: "There was still another surprising finding from our study: a strong correlation between perceived download time and whether users successfully completed their tasks on a site. There was, however, no correlation between actual download time and task success, causing us to discard our original hypothesis. It seems that, when people accomplish what they set out to do on a site, they perceive that site to be fast." So it might be a better use of time to improve the front-end rather than the back-end.

  • MySQL was dumped because of performance problems: it didn't handle a high number of writes and deletes on large tables, writes blow away the query cache, large numbers of small tables (over 10,000) are not well supported, it uses a lot of memory to cache indexes, and it maxed out at 200 concurrent read/write queries per second with over 1 million records.

  • For data storage they evolved to a fixed-length, ISAM-like record scheme that allows seeking directly to the data. It still uses file-level locking, and it's benchmarked at 20,000+ concurrent reads/writes/deletes. They are considering moving to Berkeley DB, which performs very well and is used by many large websites, especially when you primarily need key-value lookups. I think it might be interesting to store JSON if a lot of this data ends up being displayed on the web page.

  • Moved to httpd.prefork for Perl. Combined with no keepalive on the application servers, it uses less RAM and works well.
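
The fixed-length record idea above is easy to picture in code. What follows is only a minimal sketch in Python, not Mark's actual storage engine (which is not public): the record layout, the `FixedRecordFile` class, and its method names are all hypothetical, and `fcntl.flock` assumes a Unix-like system such as the Linux platform mentioned earlier. Because every record has the same size, a record's byte offset is just its slot number times the record size, so a lookup is one seek and one read.

```python
import fcntl
import os
import struct

# Hypothetical layout: 4-byte id, 8-byte counter, 20-byte fixed-width name.
# "<" disables alignment padding, so every record is exactly 32 bytes.
RECORD = struct.Struct("<IQ20s")


class FixedRecordFile:
    """Fixed-length records addressed by slot number, guarded by a file lock."""

    def __init__(self, path):
        # Create the file if it does not exist, then open it read/write.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self._f = os.fdopen(fd, "r+b")

    def write(self, slot, rec_id, counter, name):
        # Names longer than 20 bytes are truncated; shorter ones are
        # padded with NUL bytes by struct.
        data = RECORD.pack(rec_id, counter, name.encode("utf-8")[:20])
        fcntl.flock(self._f, fcntl.LOCK_EX)    # exclusive file-level lock
        try:
            self._f.seek(slot * RECORD.size)   # offset = slot * record size
            self._f.write(data)
            self._f.flush()
        finally:
            fcntl.flock(self._f, fcntl.LOCK_UN)

    def read(self, slot):
        fcntl.flock(self._f, fcntl.LOCK_SH)    # shared lock for readers
        try:
            self._f.seek(slot * RECORD.size)
            data = self._f.read(RECORD.size)
        finally:
            fcntl.flock(self._f, fcntl.LOCK_UN)
        if len(data) < RECORD.size:
            return None                        # slot lies past the end of file
        rec_id, counter, name = RECORD.unpack(data)
        return rec_id, counter, name.rstrip(b"\0").decode("utf-8", "ignore")

    def close(self):
        self._f.close()
```

Locking the whole file keeps the sketch simple and mirrors the file-level locking the post describes; finer-grained locking would be a different design.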

    Lessons Learned

  • Configure your DB and web server correctly. MySQL and Apache's memory usage can easily spiral out of control, which leads to grindingly slow performance as swapping increases. Here are a few resources for helping with configuration issues.

  • Serve only the users you care about. Block content thieves that crawl your site, using a lot of valuable resources for nothing. Monitor the number of content pages they fetch per minute. If a threshold is exceeded, do a reverse lookup on their IP address and configure your firewall to block them.

  • Cache as much DB data and static content as possible. Perl's Cache::FileCache was used to cache DB data and rendered HTML on disk.

  • Use two different host names in URLs to enable browser clients to load images in parallel.

  • Make content as static as possible. Create a separate image and CSS server to serve the static content. Use keepalives on static content, as static content uses little memory per thread/process.

  • Leave plenty of spare memory. Spare memory allows Linux to use more memory for file system caching, which increased performance about 20 percent.

  • Turn keepalive off on your dynamic content. With keepalive on, a rising number of HTTP requests can exhaust the thread and memory resources needed to serve them.

  • You may not need a complex RDBMS for accessing data. Consider a lighter-weight database such as Berkeley DB.
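
The crawler-blocking lesson above (count fetches per minute, then reverse-lookup and block) could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the code Mark runs: the `CrawlerGuard` class, the 120-per-minute default, and the allowlist of host name suffixes are all hypothetical, and the sketch assumes the reverse lookup exists to avoid blocking crawlers you do want, which the post does not spell out. The resolver and firewall actions are passed in as hooks so the counting logic can be tried without network access.

```python
import time
from collections import defaultdict, deque


class CrawlerGuard:
    """Counts page fetches per IP in a sliding 60-second window."""

    def __init__(self, max_per_minute=120, resolve=None, block=None):
        self.max_per_minute = max_per_minute         # hypothetical threshold
        self.resolve = resolve or (lambda ip: None)  # reverse-lookup hook
        self.block = block or (lambda ip: None)      # firewall hook
        # Assumption: crawlers whose host name ends with one of these
        # suffixes are welcome and never blocked.
        self.allowed_suffixes = (".googlebot.com",)
        self.hits = defaultdict(deque)
        self.blocked = set()

    def record(self, ip, now=None):
        """Record one fetch; return True if the IP is blocked."""
        now = time.time() if now is None else now
        if ip in self.blocked:
            return True
        window = self.hits[ip]
        window.append(now)
        # Drop fetches that are older than 60 seconds.
        while window and window[0] <= now - 60:
            window.popleft()
        if len(window) <= self.max_per_minute:
            return False
        # Threshold exceeded: look up the host name before blocking.
        host = self.resolve(ip) or ""
        if host.endswith(self.allowed_suffixes):
            window.clear()
            return False
        self.blocked.add(ip)
        self.block(ip)
        return True
```

In a real deployment the `resolve` hook might wrap `socket.gethostbyaddr(ip)[0]` and the `block` hook might add a firewall rule; both are left to the caller here because they depend on the environment.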
    Reader Comments (8)

    "they evolved to a fixed length ISAM like record scheme" - I'm not clear, which application is that? They're not using BerkleyDB and no longer using MySQL, do they say what they are using?

    Callum (http://www.callum-macdonald.com/)

    November 29, 1990 | Unregistered Commenter chmac

    Hi Callum,

    I got an email from Todd with your question. :)

    We built our own fast file storage routines from the ground up. It's loosely based on ISAM or MySQL's MyISAM in that it uses fixed length sequential records. It's a lot faster for certain specific operations that we require. Unfortunately it's not open source at this time but perhaps we'll release it in future.


    Mark Maunder
    FEEDJIT Founder & CEO

    November 29, 1990 | Unregistered Commenter Mark Maunder

    In the powerpoint "slide deck" he says about MySQL "MySQL doesn’t support a large number of small tables (over 10,000)."

    Why on earth would you have over 10,000 tables? That sounds like bad design.

    November 29, 1990 | Unregistered Commenter Dimitri Sokolov

    @Dimitri: Joining in a bit late but in answer to your question, some cases are more efficiently solved by using multiple small tables rather than one large one.

    A case in point is WordPress Multi User (http://mu.wordpress.org/faq/) creating tables for each blog.

    November 29, 1990 | Unregistered Commenter Ekerete Akpan

    Just in case you don't want to follow the link :-)

    WordPress MU creates tables for each blog, which is the system we found worked best for plugin compatibility and scaling after lots of testing and trial and error. This takes advantage of existing OS-level and MySQL query caches and also makes it infinitely easier to segment user data, which is what all services that grow beyond a single box eventually have to do. We're practical folks, so we'll use whatever works best, and for the 2.3m and counting on WordPress.com, MU has been a champ.

    November 29, 1990 | Unregistered Commenter Todd Hoff

    If you are starting a company, read this book:
    How to Castrate a Bull: Unexpected Lessons on Risk, Growth, and Success in Business (http://www.amazon.com/gp/product/0470345233?ie=UTF8&tag=innoblog-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0470345233) by NetApp's founder, Dave Hitz, provides direct, honest, thoughtful business advice, applicable to business founders and leaders throughout the growth cycle of a business. He puts special emphasis on hard choices and decision-making processes, with an understanding that comes from a lifetime of risk taking. If you are a first time entrepreneur, read this book. If you are entering a growth phase for your company, read this book. If you failed at your first venture and want to understand why, read this book. And if you want a few good laughs, read this book. It should make scaling your company more fun on the way.

    November 29, 1990 | Unregistered Commenter geekr
