
Ask HighScalability: Web asset server concept - 3rd party software available?

We serve dynamic (PHP) websites and their assets (JS/CSS/images/videos/binary downloads) from the same Apache hosts. The static files are used only as origins for the CDN services that distribute them. Yet in the current development-deploy pipeline, these files are checked into the same version control repositories as the code. We would like to change this for several reasons (decoupling asset deployment from development and developers, reducing the size of the code repositories, etc.).

My idea is to do the following: set up a media server (cluster) that exposes an API (e.g. REST). You can PUT files to it and get back the URL through which the file is publicly available. Between input and output, the media service handles everything necessary to serve the files: uploading them to the CDN, creating the public URL, writing the metadata to a (relational?) database, assigning a version number, and so on. This API can be used by a) the application/website directly, to provide CMS functionality (uploading and referencing images directly from the website), and b) the build pipeline, to upload and correctly reference the files that the templating engine uses directly/statically.
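To make the proposed flow concrete, here is a minimal in-memory sketch of such a media service in Python. Everything in it is an assumption for illustration: the class and method names, the `cdn.example.com` base URL, and the choice to embed the version number in the URL path are hypothetical, and a real service would persist blobs and metadata and push files to the CDN instead of holding them in dictionaries.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    """Metadata record for one stored version of a file."""
    path: str
    version: int
    sha256: str
    size: int


@dataclass
class MediaService:
    """In-memory stand-in for the proposed media server (hypothetical API)."""
    cdn_base: str = "https://cdn.example.com"     # assumed public base URL
    blobs: dict = field(default_factory=dict)     # (path, version) -> bytes
    metadata: dict = field(default_factory=dict)  # path -> list of Asset

    def put(self, path: str, data: bytes) -> str:
        """Store a file, record its metadata, and return its public URL."""
        versions = self.metadata.setdefault(path, [])
        digest = hashlib.sha256(data).hexdigest()
        # Re-uploading identical content does not create a new version.
        if versions and versions[-1].sha256 == digest:
            return self.url(path)
        asset = Asset(path, len(versions) + 1, digest, len(data))
        self.blobs[(path, asset.version)] = data  # local copy doubles as backup
        versions.append(asset)
        return self.url(path)

    def url(self, path: str, version: Optional[int] = None) -> str:
        """Public URL for a given version (latest by default)."""
        versions = self.metadata[path]
        asset = versions[-1] if version is None else versions[version - 1]
        return f"{self.cdn_base}/v{asset.version}/{asset.path}"
```

Both the CMS and the build pipeline would call `put()` and embed the returned URL. Putting the version in the URL is one possible design choice: it lets the CDN cache each version indefinitely, because changed content always gets a new URL.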

For the media service, a GUI could be implemented which enables management of the uploaded files (CRUD).

What do you think of the idea? Is there any software out there that already does things like that? How do other large websites solve the problem of asset hosting?

Thanks in advance.

P.S.: As the question might arise: We are not planning to upload DIRECTLY to a CDN provider, because we want a complete backup on our own servers as well. If the CDN provider goes down, we could (theoretically) let the media server return its own domain name and deliver the files itself. An easy switch to another CDN is also possible, as the information about the CDN URLs would be stored centrally on that media server.
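The fallback described in the P.S. comes down to keeping the base URL in one central place. A rough Python sketch, assuming hypothetical host names and a hypothetical `UrlResolver` class (none of which come from the original setup):

```python
from dataclasses import dataclass


@dataclass
class UrlResolver:
    """Maps asset paths to public URLs using one centrally stored base URL."""
    origins: dict  # provider name -> base URL (all hypothetical)
    active: str    # name of the provider currently in use

    def switch(self, name: str) -> None:
        """Point all future URLs at another provider, e.g. when the CDN is down."""
        if name not in self.origins:
            raise KeyError(f"unknown origin: {name}")
        self.active = name

    def resolve(self, path: str) -> str:
        """Build the public URL for an asset path under the active provider."""
        base = self.origins[self.active].rstrip("/")
        return f"{base}/{path.lstrip('/')}"


resolver = UrlResolver(
    origins={
        "cdn": "https://cdn.example.com",         # primary CDN (assumed)
        "self": "https://media.example.com",      # media server serving files itself
        "backup-cdn": "https://cdn2.example.net"  # alternative CDN (assumed)
    },
    active="cdn",
)
```

Calling `resolver.switch("self")` makes every subsequent `resolve()` call return a URL on the media server's own domain. Pages that were already rendered or cached with the old URLs are not affected by the switch, so this only helps for URLs generated after the change.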

Reader Comments (6)

Firstly, I think separating images from the theme/template is a good idea, but separating JS, CSS and so on from the actual template code is something I would not consider. These things should be in one repository, and the additional images etc., which are content and not related to the theme, should go into another repo.
To bind these together you could include the content/data repo as a submodule (when using git); every developer can then decide whether he needs it and whether it should be loaded. The same goes for deployment.
Additionally, uploading an image is easy: just check it in, and the media server can pull it from the repo.

Having a media server for upload etc. could be a solution for bigger data setups, but it's harder to implement.

You wanted to have a few different options to upload images:
Via CMS: You could upload it to the data repository via the CMS and let git push the changes into the data repo.
Via API: Do you really need that? If yes, a simple Python script could provide a REST API and push on upload.
Via Repo: Developers know what to do.

As I said, for bigger deployments (more than 2TB at least) I would probably consider using Amazon S3, OpenStack Swift or Ceph object storage (with the preference going to Ceph).

Git repos bigger than about 20GB tend to be slow, and git-annex is a great tool to overcome this limitation.

I would always control a CDN via DNS. As I understand it, your media server should route to the CDN if needed?
This would break most of the benefits of CDNs (geolocation, edge servers, in-memory caches close to the user, scalable concurrent connections, scalable bandwidth and more).

Hope I could help a bit.

Cheers Michael

February 7, 2013 | Unregistered Commenter Michael Grosser

Sounds like a case for Swift (your own self-hosted S3). Then make Swift the origin for your CDN. No need to explicitly upload to the CDN. There are GUIs that will interface with Swift, and since it has S3 compatibility, most tools that work with S3 will work with Swift.

Historically, MogileFS was used for this, but I don't hear much about it anymore. Swift has a much better architecture and addresses the scalability/high-availability problems with MogileFS.

February 7, 2013 | Unregistered Commenter Erik Osterman

Hi. Long time listener, first time caller. I saw this post and started laughing! I've been wanting a nice abstraction layer for media for the last year! I started something in PHP...

February 7, 2013 | Unregistered Commenter Tim Perkins

Existing CMS / DAM solutions most likely come with the features you want...

February 8, 2013 | Unregistered Commenter gggeek

Like a lot of sites, you have two types of static assets:

1. "Code"-like assets (e.g. CSS, JS) that are referenced from your templates. I would not separate these from the templates, because if they are not in sync your site may well break, e.g. through a missing JS function or CSS class. There is a slight grey area here with CSS if you are applying themes (e.g. on a multi-tenant site). In this case you would want a default set of CSS held in your source repository and the ability to override it via CMS-like functionality in the site.

2. "CMS"-like assets that need to change outwith the development/deploy cycle of your app. This is where you will need a separate repository. You could implement your own service as you mention and integrate it with your site, or use SVN for versioning and export from that to your origin, or, if you have deep pockets, look at something like Adobe Scene7.

The thing to recognize is that you can have multiple origins served from a single asset domain and referenced from your site. Most CDNs have the ability to perform path-based routing, e.g.:
* -> goes to your site JS origin
* -> goes to your asset CMS origin
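As a rough illustration of path-based routing to multiple origins, the following Python sketch picks an origin by longest matching path prefix. The path prefixes and origin host names are invented for the example and are not taken from the comment; real CDNs express this through their own configuration rather than application code.

```python
# Hypothetical routing table: path prefix -> origin that serves it.
ROUTES = [
    ("/js/", "https://app-origin.example.com"),           # site JS origin (assumed)
    ("/css/", "https://app-origin.example.com"),          # site CSS, same origin (assumed)
    ("/media/", "https://asset-cms-origin.example.com"),  # asset CMS origin (assumed)
]

DEFAULT_ORIGIN = "https://app-origin.example.com"


def pick_origin(request_path, routes=ROUTES, default=DEFAULT_ORIGIN):
    """Return the origin for a request path; the longest matching prefix wins."""
    matches = [(prefix, origin) for prefix, origin in routes
               if request_path.startswith(prefix)]
    if not matches:
        return default
    return max(matches, key=lambda match: len(match[0]))[1]


def origin_url(request_path):
    """Full URL the CDN edge would fetch from the chosen origin."""
    return pick_origin(request_path) + request_path
```

With this table, `origin_url("/media/banner.jpg")` is fetched from the asset CMS origin while `origin_url("/js/app.js")` goes to the site origin, even though both are referenced from the same public asset domain.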

I have used that model (with an SVN backed image CMS) on a number of sites.

Regarding your comment about switching CDNs in case of failure, you should have a look at using CNAMEs to switch CDNs via a DNS update. It means you don't have to change URLs in your app or CMS. There is the issue of propagation, but that can be managed with appropriate TTLs.

February 8, 2013 | Unregistered Commenter Paul Gillespie

Take a look at Cloudinary: it does what you describe.

February 8, 2013 | Unregistered Commenter otassel
