Simple and fast URL shortener using #tornadoweb and #couchdb
It was surprisingly simple to create, and so far, very fast. I'd also imagine it would scale quite well.
If you're not familiar with either Tornado or CouchDB, you're probably not reading this, since the title wouldn't have been interesting enough to bring you here. However, if you did choose to read this entry and have never heard of either of them, I highly suggest you check them out.
Tornado is a VERY simple Python-based web framework built on an asynchronous design (basically how nginx works), where a single process can multiplex thousands of simultaneous network connections efficiently using modern features of the Linux kernel (epoll). What epoll does is notify the process when new data is available on a network connection, so the single process is free to do whatever other work it needs to do while it waits for more data from a client.
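To make the readiness-notification idea concrete, here is a minimal sketch using Python's standard `selectors` module, which picks epoll on Linux when it is available. This is not Tornado's own code, and the function name `read_ready_sockets` is made up for illustration. It uses local socket pairs in place of real client connections.

```python
import selectors
import socket


def read_ready_sockets(messages):
    """Read one message from each of several sockets using a single thread.

    Hypothetical helper: each message is written to one end of a local socket
    pair, and one loop reads the other ends only when the selector reports
    that data is ready.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, other mechanisms elsewhere
    writers = []
    for index, msg in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # Ask to be notified when this socket has data to read.
        sel.register(reader, selectors.EVENT_READ, data=index)
        writer.sendall(msg)
        writers.append(writer)

    received = {}
    while len(received) < len(messages):
        events = sel.select(timeout=1)
        if not events:  # nothing became readable; stop instead of spinning
            break
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for writer in writers:
        writer.close()
    sel.close()
    return [received.get(i) for i in range(len(messages))]
```

Calling `read_ready_sockets([b"first", b"second"])` reads both messages from one thread, without ever blocking on a socket that has nothing to say.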
Now, the benefits of doing things asynchronously are lost if the process blocks for any reason (say you're transcoding video, or performing a horribly written db query). If the process blocks, then the clients starve. This is normally not an issue with multi-process or multi-threaded applications, because if one process or thread blocks, the others can continue to work. But if there's only one process, it becomes a real problem.
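A rough way to see the starvation effect is to time two "requests" handled by one event loop, once with a blocking call and once with a cooperative one. This sketch uses the standard library's `asyncio` as a stand-in for Tornado's loop, and the function names are invented for the example.

```python
import asyncio
import time


async def handle(delay, blocking):
    """Pretend to serve one request that needs `delay` seconds of waiting."""
    if blocking:
        time.sleep(delay)           # blocks the whole process; nobody else runs
    else:
        await asyncio.sleep(delay)  # yields, so other requests can progress


async def serve_two(delay, blocking):
    """Serve two requests concurrently and return the total wall-clock time."""
    start = time.monotonic()
    await asyncio.gather(handle(delay, blocking), handle(delay, blocking))
    return time.monotonic() - start


def elapsed(delay, blocking):
    """Run the two-request experiment on a fresh event loop."""
    return asyncio.run(serve_two(delay, blocking))
```

With a delay of 0.1 seconds, the blocking version should take roughly twice as long as the cooperative one, because the second request cannot even start until the first has finished sleeping.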
Most tasks, however, finish so fast that the client doesn't notice. It does mean, though, that if you make network connections inside the server process (say, to query a database), it's preferable to make them asynchronous as well. For this reason, I chose CouchDB for the database. Basically, CouchDB is a JSON document database (each object it stores is a JSON-encoded document) that speaks a simple RESTful (i.e. over HTTP) protocol. This was ideal: because a query is just an HTTP request, I could perform database queries asynchronously.
OK, so enough background, now for the nuts and bolts. Hopefully, if you were already familiar with these two projects, you just skipped down to here. The basic architecture of my service is:
NGINX -> Tornado Web App -> CouchDB
All front-end requests are handled by nginx, which proxies each request to the Tornado app. The Tornado app parses the URI for the base36-encoded string at the end, and then uses that as the document ID in a CouchDB lookup. So it basically sees a GET request from a client, takes the ID being asked for, and makes an async GET request of its own to CouchDB. When the CouchDB request returns, it calls a callback function, which parses the JSON document from Couch, pulls out the URL, and sends a 301 redirect to the client with the URL that was fetched.
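The non-network parts of that flow can be sketched as plain functions. This is only an illustration under stated assumptions, not the actual service code: the database URL `COUCH_BASE`, the `url` field name, the lowercasing of IDs, and all function names are hypothetical. In the real app, the HTTP fetch between these steps would be done with an asynchronous client.

```python
import json
import re
from urllib.parse import quote

# Assumed CouchDB location and database name; adjust to your own setup.
COUCH_BASE = "http://127.0.0.1:5984/shorturls"

# A base36 ID uses only digits and the letters a-z, at the end of the path.
_SHORT_ID = re.compile(r"/([0-9a-z]+)$")


def short_id_from_path(path):
    """Return the base36 ID at the end of a request path, or None."""
    match = _SHORT_ID.search(path.lower())  # assumption: IDs are case-insensitive
    return match.group(1) if match else None


def couch_doc_url(short_id):
    """Build the URL of the CouchDB document whose ID is the short ID."""
    return f"{COUCH_BASE}/{quote(short_id, safe='')}"


def redirect_for(couch_body):
    """Turn the JSON body returned by CouchDB into a 301 response."""
    doc = json.loads(couch_body)
    # Assumption: the target address is stored under a "url" field.
    return 301, {"Location": doc["url"]}
```

For example, a request for `/1z` maps to the document URL `http://127.0.0.1:5984/shorturls/1z`, and the JSON that comes back is turned into a status code plus a `Location` header.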
At this point, I also added an extra step: just after it sends the redirect, the app performs an asynchronous PUT back to CouchDB to log the request with useful info (like the user-agent string, remote IP, etc). All this info could be parsed from the nginx logs and stuck into Couch later for analytics, but logging it inline doesn't seem to impact performance at this point, so I'm keeping it in there for now.
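As a rough idea of what such a log record might look like, here is a sketch that builds the document and the PUT request for it. The field names, the separate `shortener_hits` database, and the random document ID are all assumptions made for the example. Only the request is constructed here; sending it would be the job of an asynchronous HTTP client.

```python
import json
import time
import uuid


def build_hit_doc(short_id, remote_ip, user_agent, now=None):
    """Build a log document for one redirect (field names are hypothetical)."""
    return {
        "type": "hit",
        "short_id": short_id,
        "remote_ip": remote_ip,
        "user_agent": user_agent,
        "timestamp": time.time() if now is None else now,
    }


def put_request_for(doc, base="http://127.0.0.1:5984/shortener_hits"):
    """Return (method, url, body) for storing the document in CouchDB.

    CouchDB creates a document when it receives a PUT to /database/doc_id
    with a JSON body; a random hex ID is used here for illustration.
    """
    doc_id = uuid.uuid4().hex
    return "PUT", f"{base}/{doc_id}", json.dumps(doc)
```

Because the redirect has already been sent by the time this request goes out, a slow or failed log write would not delay the client.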
I've done some very basic preliminary testing, and I have to say, it seems to be just as fast as, if not in some cases faster than, the top URL shortening services out there. And since it's all asynchronous, it can scale up like nobody's business. Increased number of requests per second? Just add a few more of the Tornado app servers. If the CouchDB server gets bogged down, you can load balance it as well.
Now, one final note here. This isn't a full-blown URL shortening service, as there is no way to ADD URLs to the DB. At least not via a public interface at this point (there might never be). This is because this service is actually just going to be a part of a larger service/site that I'm working on, so there will be no need to create them publicly (they'll be created on the back end automatically).
Where's the code, you ask? Well, I may publish it in the future (right now it's kind of ugly). If anyone asks nicely (and therefore cares), I'll put the simple code up somewhere.
That's all for now though.


