advertise
Tuesday
Apr082008

Google AppEngine - A First Look

I haven't developed an AppEngine application yet, I'm just taking a look around their documentation and seeing what stands out for me. It's not the much speculated super cluster VM. AppEngine is solidly grounded in code and structure. It reminds me a little of the guy who ran a website out of S3 with a splash of Heroku thrown in as a chaser. The idea is clearly to take advantage of our massive multi-core future by creating a shared nothing infrastructure based firmly on a core set of infinitely scalable database, storage and CPU services. Don't forget Google also has a few other services to leverage: email, login, blogs, video, search, ads, metrics, and apps. A shared nothing request is a simple beast. By its very nature shared nothing architectures must be composed of services which are themselves already scalable and Google is signing up to supply that scalable infrastructure. Google has been busy creating a platform of out-of-the-box scalable services to build on. Now they have their scripting engine to bind it all together. Everything that could have tied you to a machine is tossed. No disk access, no threads, no sockets, no root, no system calls, no nothing but service based access. Services are king because they are easily made scalable by load balancing and other tricks of the trade that are easily turned behind the scenes, without any application awareness or involvement. Using the CGI interface was not a mistake. CGI is the perfect metaphor for our brave new app container world: get a request, process the request, die, repeat. Using AppEngine you have no choice but to write an app that can be splayed across a pointy well sharpened CPU grid. CGI was devalued because a new process had to be started for every request. It was too slow, too resource intensive. Ironic that in the cloud that's exactly what you want because that's exactly how you cause yourself fewer problems and buy yourself more flexibility. The model is pure abstraction. The implementation is pure pragmatism. Your application exists in the cloud and is in no way tied to any single machine or cluster of machines. CPUs run parallel through your application like a swarm of busy bees while wizards safely hidden in a pocket of space-time can bend reality as much as they desire without the muggles taking notice. Yet the abstraction is implemented in a very specific dynamic language that they already have experience with and have confidence they can make work. It's a pretty smart approach. No surprise I guess. One might ask: is LAMP dead? Certainly not in the way Microsoft was hoping. AppEngine is so much easier to use than the AWS environment of EC2, S3, SQS, and SDB. Creating an app in AWS takes real expertise. That's why I made the comparison of AppEngine to Heroku. Heroku is a load and go approach for RoR whereas AppEngine uses Python. You basically make a Python app using services and it scales. Simple. So simple you can't do much beyond making a web app. Nobody is going to make a super scalable transcoding service out of AppEngine. You simply can't load the needed software because you don't have your own servers. This is where Amazon wins big. But AppEngine does hit a sweet spot in the market: website builders who might have previously went with LAMP. What isn't scalable about AppEngine is the scalability of the complexity of the applications you can build. It's a simple request response system. I didn't notice a cron service, for example. Since you can't write your own services a cron service would give you an opportunity to get a little CPU time of your own to do work. To extend this notion a bit what I would like to see as an event driven state machine service that could drive web services. If email needs to be sent every hour, for example, who will invoke your service every hour so you can get the CPU to send the email? If you have a long running seven step asynchronous event driven algorithm to follow, how will you get the CPU to implement the steps? This may be Google's intent. Or somewhere in the development cycle we may get more features of this sort. But for now it's a serious weakness. Here's are a quick tour of a few interesting points. Please note I'm copying large chunks of their documentation in this post as that seems the quickest way to the finish line...

  • Very nice project page at Google App Engine. Already has a FAQ, articles, blog, forums, example applications, nice tutorial, and a nice touch is how to work with Django. Some hard chargers are already posting questions to the forum.
  • Python only. More languages will follow. As you are uploading clear text into the engine there's no hiding from mother Google.
  • You aren't getting root. Applications run in a sandbox, which is a secure environment that provides limited access to the underlying operating system. These limitations allow App Engine to distribute web requests for the application across multiple servers, and start and stop servers to meet traffic demands. - An application can only access other computers on the Internet through the provided URL fetch and email services and APIs. Other computers can only connect to the application by making HTTP (or HTTPS) requests on the standard ports. - An application cannot write to the file system. An app can read files, but only files uploaded with the application code. The app must use the App Engine datastore for all data that persists between requests. - Application code only runs in response to a web request, and must return response data within a few seconds. A request handler cannot spawn a sub-process or execute code after the response has been sent.
  • The data access trend continues with the RDBMS being dissed infavor of a properties type interface. - The datastore is not like a traditional relational database. Data objects, or "entities," have a kind and a set of properties. Queries can retrieve entities of a given kind filtered and sorted by the values of the properties. Property values can be of any of the supported property value types. - The datastore uses optimistic locking for concurrency control. An update of a entity occurs in a transaction that is retried a fixed number of times if other processes are trying to update the same entity simultaneously. - They have some notion of transaction: The datastore implements transactions across its distributed network using "entity groups." A transaction manipulates entities within a single group. Entities of the same group are stored together for efficient execution of transactions. Your application can assign entities to groups when the entities are created.
  • You've got mail: Applications can also send email messages using App Engine's mail service. The mail service also uses Google infrastructure to send email messages. If you've ever been marked a spammer because you send a little email, this is actually a nice feature.
  • It's mostly free for now: 500MB of storage, up to 5 million page views a month, and 10GB bandwidth per day. Additional resources will be available for $$$.
  • Limits exist on various features. If a request takes too long it's killed. You can only get 1,000 results at a time. That sort of thing. Pretty reasonable.
  • Developers must download a Windows, Mac OS X or Linux SDK. Python 2.5 is required.
  • The SDK includes a web server application that simulates the App Engine environment. So this in the RoR and GWT type mold where you have a nice local development environment that emulates what happens in the deployment environment.
  • Google App Engine supports any framework written in pure Python that speaks CGI (and any WSGI-compliant framework using a CGI adaptor), including Django, CherryPy, Pylons, and web.py. You can bundle a framework of your choosing with your application code by copying its code into your application directory.
  • Google has their own framework called webapp. Nice MS style naming.
  • Here's a hello world application using webapp:
    import wsgiref.handlers
    
    from google.appengine.ext import webapp
    
    class MainPage(webapp.RequestHandler):
      def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, webapp World!')
    
    def main():
      application = webapp.WSGIApplication(
                                           [('/', MainPage)],
                                           debug=True)
      wsgiref.handlers.CGIHandler().run(application)
    
    if __name__ == "__main__":
      main()
    
    This code defines one request handler, MainPage, mapped to the root URL (/). When webapp receives an HTTP GET request to the URL /, it instantiates the MainPage class and calls the instance's get method. Inside the method, information about the request is available using self.request. Typically, the method sets properties on self.response to prepare the response, then exits. webapp sends a response based on the final state of the MainPage instance. The application itself is represented by a webapp.WSGIApplication instance. The parameter debug=true passed to its constructor tells webapp to print stack traces to the browser output if a handler encounters an error or raises an uncaught exception. You may wish to remove this option from the final version of your application.
  • Google is standardizing components on their infrastructure. Take the login interface. When your application is running on App Engine, users will be directed to the Google Accounts sign-in page, then redirected back to your application after successfully signing in or creating an account.
  • Forms looks normal. Lots of embedded html. Take a look. Python like Perl has a nice bulk string handling syntax so this style isn't as fugly as it would be in C++ or Java.
  • Database access is built around Data Models: A model describes a kind of entity, including the types and configuration for its properties. Here's a taste:
    Example of creation:
    from google.appengine.ext import db
    from google.appengine.api import users
    
    class Pet(db.Model):
      name = db.StringProperty(required=True)
      type = db.StringProperty(required=True, choices=set("cat", "dog", "bird"))
      birthdate = db.DateProperty()
      weight_in_pounds = db.IntegerProperty()
      spayed_or_neutered = db.BooleanProperty()
      owner = db.UserProperty()
    
    pet = Pet(name="Fluffy",
              type="cat",
              owner=users.get_current_user())
    pet.weight_in_pounds = 24
    pet.put()
    
    Example of get, modify, save:
    if users.get_current_user():
      user_pets = db.GqlQuery("SELECT * FROM Pet WHERE pet.owner = :1",
                              users.get_current_user())
      for pet in user_pets:
        pet.spayed_or_neutered = True
    
      db.put(user_pets)
    
    Looks like your normal overly complex data access. Me, I appreciate the simplicity of a string based property interface.
  • You can use Django's HTML Template system.
  • Static files are served using automated mapping mechanism. You don't get local disk store for your css, flash, and js files.
  • Applications are loaded using a command line tool: appcfg.py update helloworld/.
  • Applications are accessed like: http://application-id.appspot.com. You get your domain name.
  • There's a dashboard that has six graphs that give you a quick visual reference of your system usage: Requests per Second, Errors per Second, Bytes Received per Second, Bytes Sent per Second, Megacycles per Second (The amount of CPU megacyles your application uses every second), Milliseconds Used per Second, Number of Quota Denials per Second. I have no idea what a megacycle is either. I think it's bigger than a pint of beer.
  • Also I wonder if this is meant to compete with Facebook more than Amazon?
  • Developers with a lot of little projects will find AppEngine especially useful, which always leaves open a Adoption Led Market play.

    Related Articles

  • HighScalability: Rumors of Signs and Portents Concerning Freeish Google Cloud.
  • Techcrunch: Google Jumps Head First Into Web Services With Google App Engine.
  • HighScalability: Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud.
  • Video of the announcement at Camp David, er CampFireOne.
  • Techdirt: Google Finally Realizes It Needs To Be The Web Platform.
  • TechCrunch Labs: Our Experience Building And Launching An App On Google App Engine
  • Zdnet: Let the PaaS wars begin.
  • ReadWriteWeb: Google: Cloud Control to Major Tom.
  • ComputingAtScale: The Live Web has another heart.
  • ComputingAtScale: Parallelism: The New New Thing!.
  • EETimes: CPU designers debate multi-core future.
  • EETimes: Multicore puts screws to parallel-programming models.
  • Slashdot: Panic in Multicore Land.
  • Mashable: Google App Engine: An Early Look.
  • DeftLabs: Gazing Into The Clouds.
  • Experimenting with Google App Engine. Bret Taylor writes a blog post about the blog he wrote for AppEngine that serves the blog post. Elegant touch.
  • ReadWriteWeb: Google App Engine: History's Next Step or Monopolistic Boondoggle?.
  • GigaOM: App Engine: Competition Is Good for Everyone
  • The Blist: Batteries sold separately - "they’ve got the scalable bit and the hosting bit, but there’s a surprising lack of, well, “web” and “application” going on here."
  • Foobar: Would you use Google App Engine? - " if you think you might like to compete with Google and/or become a really large web site or business yourself, then I don't think that Google App Engine should be your first choice"
  • Silicon Valley Insider: Google's App Engine: Aiming At Facebook, Not Amazon - "Google is not trying to provide pure utility here -- they are trying to provide utility tethered to their infrastructure."
  • Niall Kennedy: Google App Engine for developers - "Overall I am quite impressed with Google App Engine and its potential to remove operations management and systems administration from my task list. I am not confident in Google App Engine as a hosting solution for any real business..."

    Click to read more ...

  • Monday
    Apr072008

    Scalr - Open Source Auto-scaling Hosting on Amazon EC2

    Scalr is a fully redundant, self-curing and self-scaling hosting environment utilizing Amazon's EC2. It has been recently open sourced on Google Code. Scalr allows you to create server farms through a web-based interface using prebuilt AMI's for load balancers (pound or nginx), app servers (apache, others), databases (mysql master-slave, others), and a generic AMI to build on top of. Scalr promises automatic high-availability and scaling for developers by health and load monitoring. The health of the farm is continuously monitored and maintained. When the Load Average on a type of node goes above a configurable threshold a new node is inserted into the farm to spread the load and the cluster is reconfigured. When a node crashes a new machine of that type is inserted into the farm to replace it. 4 AMI's are provided for load balancers, mysql databases, application servers, and a generic base image to customize. Scalr allows you to further customize each image, bundle the image and use that for future nodes that are inserted into the farm. You can make changes to one machine and use that for a specific type of node. New machines of this type will be brought online to meet current levels and the old machines are terminated one by one. The open source scalr platform with the combination of the static EC2 IP addresses makes elastic computing easier to implement. Check out the blog announcement by Intridea for more info. As AWS conquers the scalable web application hosting space it is time to check out the new Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB (Programming) book on amazon.com. What do you think of the opportunities of using scalr for automatic scalability?

    Click to read more ...

    Monday
    Apr072008

    Rumors of Signs and Portents Concerning Freeish Google Cloud

    Update 2: Rumor no more. Google Jumps Head First Into Web Services With Google App Engine. The quick and dirty of it: developers simply upload their Python code to Google, launch the application, and can monitor usage and other metrics via a multi-platform desktop application. There were 10,000 developer slots open and of course I was too late. More as the cobra strikes. Update: TechCrunch reports Google To Launch BigTable As Web Service next week. It competes with Amazon's SimpleDB. Though it won't be truly comparable until they also release an EC2 and S3 equivalent. An internet hit for each data access is a little painful. As Jimmy says in Goodfellas, "That's the way. You don't take no sh*t from nobody. " First Dave Winer hallucinates a pig on the mean streets of Walnut Creek that told him Google's long foretold cloud offering will be free for bloggers of "modest needs." GigaOM then says a free cloud service is how Google could eat Amazon's bacon for lunch. The reason for this free cloud buffet is said to be the easier integration of acquisitions who must presumably be in the Google cloud to be taken out. All the free stuff Google offers earns almost no money. They make money on search. Hosting every last CPU cycle on earth has to be costly. What's the return? Cheaper integration of new startups that will also provide no new revenue? Perhaps I am simply not clever enough to see the revolutionary brilliance in this line of thought. Though I would be quite pleased to have Google shareholders subsidize my projects. Folknologist thinks Google may keep costs down by requiring developers to code to a Cloud Virtual Machine based on Java byte codes... Applications would be built using G-ROR, a javascript style RoR framework. Revenue generation would come from an upsell of more memory and CPU. But aren't VMs already the perfect encapsulation from the cloud provider perspective? They just load 'em and run 'em. Seems cost effective enough. For the developer VMs also allow all required flexibility. You don't need to be locked into one environment. You can pick from a large number of operating systems and even wider variety of frameworks. Why lock in? If the model is to treat the cloud like one giant Tomcat application server so you can squeeze more users on the same amount of hardware then Google would just be the worlds largest shared hosting company. Not a cloud at all. And multi-tenant execution of applications in the same application server was always a really bad idea given how one badly programmed app can bring down the whole bunch. Not to mention security concerns. VMs offer better control, manageability, and security. I could see an Adoption Led market angle for Google. You could start small in a shared container and then as you grow move your app without change to a larger, more powerful, unshared container. We certainly do need a better way to create, deploy, and manage applications across VMs and data centers, but I don't quite see how this allows Google to make money offering an expensive service any better than the current VM approach. Though with all their cash maybe they plan to just wait it out until all the others bash themselves apart on the rocky shores of free. Just in case this is an April fools joke, I already know I am an idiot, so no harm done.

    Click to read more ...

    Monday
    Apr072008

    Lazy web sites run faster

    It is fairly obvious that web site performance can be increased by making the code run faster and optimising the response time. But that only scales up to a point. To really take our web sites to the next level, we need to look at the performance problem from a different angle.

    Click to read more ...

    Saturday
    Apr052008

    Skype Plans for PostgreSQL to Scale to 1 Billion Users

    Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.

  • Skype's goal is an architecture that can handle 1 billion plus users. This level of scale isn't practically solvable with one really big computer, so our masked superhero horizontal scaling comes to the rescue.
  • Hardware is dual or quad Opterons with SCSI RAID.
  • Followed common database progression: Start with one DB. Add new databases partitioned by functionality. Replicate read-mostly data for better read access. Then horizontally partition data across multiple nodes..
  • In a first for this blog anyway, Skype uses a traditional database architecture where all database access is encapsulated in stored procedures. This allows them to make behind the scenes performance tweaks without impacting frontend servers. And it fits in cleanly with their partitioning strategy using PL/Proxy.
  • PL/Proxy is used to scale the OLTP portion of their system by creating a horizontally partitioned cluster: - Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key. - For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash. - Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:
    First, code to insert a user in a database:
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
    BEGIN
        PERFORM 1 FROM users WHERE username = i_username;
        IF NOT FOUND THEN
            INSERT INTO users (username) VALUES (i_username);
            RETURN 'user created';
        ELSE
            RETURN 'user already exists';
        END IF;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;
    
    Heres the proxy code to distribute the user insert to the correct partition:
    queries=#
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
        CLUSTER 'queries'; RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;
    
    Your SQL query looks normal:
    SELECT insert_user("username");
    
    - The result of a query is exactly that same as if was executed on the remote database. - Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
  • They like PL/Proxy approach for OLTP because: - PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them. - More partitions can be added to improve performance. - Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
  • PgBouncer is used as a connection pooler for PostgreSQL. PL/Proxy "somewhat wastes connections as it opens connection to each partition from each backend process" so the pooler helps reduce the number of connections.
  • Hot-standby servers are created using WAL (Write Ahead Log) shipping. It doesn't appear that these servers can be used for read-only operations.
  • More sophisticated organizations often uses an OLTP database system to handle high performance transaction needs and then create seperate systems for more non-transactional needs. For example, an OLAP (Online analytical processing) system is often used for handling complicated analysis and reporting problems. These differ in schema, indexing, etc from the OLTP system. Skype also uses seperate systems for the presentation layer of web applications, sending email, and prining invoices. This requires data be moved from the OLTP to the other systems. - Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains." - To solve this problem Skype developed their on lighter weight queueing and replication toolkit called SkyTools. The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the make another level of indirection school of problem solving, which has advantages:
  • Applications are independent of the structure of the database servers. That's encapsulated in the proxy servers.
  • Applications do not need to change in response to partition, mapping, or other changes.
  • Load balancing, failover, and read/write splitting are invisible to applications. The downsides are:
  • Reduced performance. Another hop is added and queries must be parsed to perform all the transparent magic.
  • Inability to perform joins and other database operations across partitions.
  • Added administration complexity of dealing with proxy configuration and HA for the proxy servers. It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.

    Related Articles

  • An Unorthodox Approach to Database Design : The Coming of the Shard
  • PostgreSQProducts - Scaling infinitely with PL/Proxy
  • PL/Proxy
  • Heroku also uses PostgreSQL.
  • MySQL Proxy
  • PostgreSQL cluster: partitioning with plproxy (part I) by Kristo Kaiv'.
  • PostgreSQL cluster: partitioning with plproxy (part II) by Kristo Kaiv'.
  • PostgreSQL at Skype.
  • Skytools database scripting framework & PgQ by Kristo Kaiv'.
  • PostgreSQL High Availability.

    Click to read more ...

  • Thursday
    Apr032008

    Development of highly scalable web site

    Not sure if this is the right place to post this but here goes anyway. We are looking to hire an outside firm to help with development of a scalable and potentially high-traffic web site. We are not looking for an individual but rather a firm with enough well rounded expertise to help us with various aspects of this. Basic requirements: LAMP stack or other open source solution Very proficient in cross-browser web development Flex/AIR development for RIA Java/C/C++ proficiency Expertise with Comet and push server technology Experience with development of high-traffic web sites Use of Amazon Web Services infrastructure a plus If anyone knows of consulting firms that can take on such a project, I would appreciate your feedback. TIA

    Click to read more ...

    Wednesday
    Apr022008

    Product: Supervisor - Monitor and Control Your Processes

    It's a sad fact of life, but processes die. I know, it's horrible. You start them, send them out into process space, and hope for the best. Yet sometimes, despite your best coding, they core dump, seg fault, or some other calamity befalls them. Unlike our messy biological world so cruelly ruled by entropy, in the digital world processes can be given another chance. They can be restarted. A greater destiny awaits. And hopefully this time the random lottery of unforeseen killing factors will be avoided and a long productive life will be had by all. This is fun code to write because it's a lot more complicated than you might think. And restarting processes is a highly effective high availability strategy. Most faults are transient, caused by an unexpected series of events. Rather than taking drastic action, like taking a node out of production or failing over, transients can be effectively masked by simply restarting failed processes. Though complexity makes it a fun problem, it's also why you may want to "buy" rather than build. If you are in the market, Supervisor looks worth a visit. Adapted from their website: Supervisor is a Python program that allows you to start, stop, and restart other programs on UNIX systems. It can restart crashed processes.

  • It is often inconvenient to need to write "rc.d" scripts for every single process instance. rc.d scripts are a great lowest-common-denominator form of process initialization/autostart/management, but they can be painful to write and maintain. Additionally, rc.d scripts cannot automatically restart a crashed process and many programs do not restart themselves properly on a crash. Supervisord starts processes as its subprocesses, and can be configured to automatically restart them on a crash. It can also automatically be configured to start processes on its own invocation.
  • It's often difficult to get accurate up/down status on processes on UNIX. Pidfiles often lie. Supervisord starts processes as subprocesses, so it always knows the true up/down status of its children and can be queried conveniently for this data.
  • Users who need to control process state often need only to do that. They don't want or need full-blown shell access to the machine on which the processes are running. Supervisorctl allows a very limited form of access to the machine, essentially allowing users to see process status and control supervisord-controlled subprocesses by emitting "stop", "start", and "restart" commands from a simple shell or web UI.
  • Users often need to control processes on many machines. Supervisor provides a simple, secure, and uniform mechanism for interactively and automatically controlling processes on groups of machines.
  • Processes which listen on "low" TCP ports often need to be started and restarted as the root user (a UNIX misfeature). It's usually the case that it's perfectly fine to allow "normal" people to stop or restart such a process, but providing them with shell access is often impractical, and providing them with root access or sudo access is often impossible. It's also (rightly) difficult to explain to them why this problem exists. If supervisord is started as root, it is possible to allow "normal" users to control such processes without needing to explain the intricacies of the problem to them.
  • Processes often need to be started and stopped in groups, sometimes even in a "priority order". It's often difficult to explain to people how to do this. Supervisor allows you to assign priorities to processes, and allows user to emit commands via the supervisorctl client like "start all", and "restart all", which starts them in the preassigned priority order. Additionally, processes can be grouped into "process groups" and a set of logically related processes can be stopped and started as a unit. Supervisor also has a web interface and an XMP-RPC interface:
  • A (sparse) web user interface with functionality comparable to supervisorctl may be accessed via a browser if you start supervisord against an internet socket. Visit the server URL (e.g. http://localhost:9001/) to view and control process status through the web interface after activating the configuration file's [inet_http_server] section. XML-RPC Interface
  • The same HTTP server which serves the web UI serves up an XML-RPC interface that can be used to interrogate and control supervisor and the programs it runs. To use the XML-RPC interface, connect to supervisor's http port with any XML-RPC client library and run commands against it. An example of doing this using Python's xmlrpclib client library is as follows.

    Related Articles

  • PyCon Presentation: Supervisor as a Platform
  • Monitor Pylons application with supervisord
  • Supervisor Manual

    Click to read more ...

  • Tuesday
    Apr012008

    How to update video views count effectively?

    Hi, I am building a video-sharing site and I'm looking for an efficient way to update video views count. The easiest way would be to perform an SQL update to increase the "views" counter every time a video is viewed, but naturally I want to avoid DB write access as much as possible. I am looking for an efficient temporary storage to which I could connect and say "increment views of video X". Every so often I would save the changes to my main database, and remove the counter from this temporary storage. I am having a hard time finding such temporary storage, however. My first thought was memcache, but it's not ideal as I wouldn't like to lose the data if memcache goes down. Also, memcache's increment command requires that the key is already present - that means that every time a video is viewed, I would have to check if the key already exists in memcache, before I can actually send the increment command. What do people use to solve this kind of issues? Kind regards, Tomasz

    Click to read more ...

    Monday
    Mar312008

    Read HighScalability on Your Mobile Phone Using WidSets Widgets

    Jean-Paul de Vooght of our Switzerland contingent created a nifty little WidSets widget that lets you better read HighScalability from your mobile phone. I thought untethered readers might like to give it a try. Thanks to Jean-Paul for making it available! WidSets is: a simple service that brings you information normally accessed via the Internet by sending it directly to your mobile phone . Using mini-applications called widgets, it sends you the latest updates to your favorite websites. The system uses RSS feeds to push information from these websites directly to your mobile phone the minute they’re updated.

    Click to read more ...

    Sunday
    Mar302008

    Scaling Out MySQL

    This post covers two main options for scaling-out MySql and compare between them. The first is based on data-base clustering and the second is based on In Memory clustering a.k.a Data Grid. A special emphasis is given to a pattern which shows how to scale our existing data base without changing it through a combination of Data Grid and data base as a background service. This pattern is referred to as Persistency as a Service (PaaS). It also address many of the fequently asked question related to how performance, reliability and scalability is achieved with this pattern.

    Click to read more ...