
One of the most exciting presentations to come out of JSConf.eu was Ryan Dahl's presentation of his incredible Node.js project. Coincidentally, we had just covered it at last week's NOVALANG session. I decided it would be a great article for Naked JavaScript in order to provide a nice introduction to Node.js and the broader topic of CommonJS as well. By the end of this article you will have a functional Node.js installation and will have built several interesting applications. Enjoy!

What is Node.js

Node.js is an evented I/O framework built on top of Google's V8 JavaScript engine. Its goal is to provide an incredibly powerful I/O system through which you can build highly efficient and scalable applications without any knowledge of "advanced topics" such as threading, processes, etc. It does this by using an event-based programming model similar to Python's Twisted framework or Ruby's EventMachine. In the event-based model, you register what should happen, commonly referred to as a callback, when a specific event happens. You do not worry about the capturing, execution, or closing of the event. This is distinct from threaded programming, which requires the developer to identify the "event", create the thread, execute the processing, and clean up the thread, all of which is complex and littered with hard-to-debug issues. The easiest way to describe the event model, especially to people coming from browser-based JavaScript, is that it is exactly how you program interactivity in the browser. Take the following example, which shows evented programming using AJAX requests in Dojo, jQuery, and Prototype.

What is going on in each of these AJAX requests is that we describe a function saying what to do upon a successful GET request to the specific URL, in this case "/dragons", and that function receives whatever data comes back. In standard procedural programming, one would think that the process would wait for the request to be made and then continue processing the rest of the program. In event-based programming, the function is identified and stored until the specified event occurs (in this case a successful response from the GET request). The processing of the program continues on; it does not stop or "block" waiting for the rest of the operation, in this case making and receiving the GET request, to finish. Once the request and response have completed, the event is triggered and the stored function is called with the data passed into it. I am focusing on this because it is vastly different from the normal mode of programming, so it's critical to get it right for both Node.js and general event-based programming.
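As a rough sketch of that sequence in plain Node.js rather than any of the browser libraries: the get helper and its canned response below are hypothetical stand-ins for a real AJAX call, and the only point is the order in which things run.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for an AJAX helper (not a real library call).
// It stores the callback and returns at once; the "success" event
// fires later, once the simulated response has arrived.
function get(url, onSuccess) {
  const request = new EventEmitter();
  request.once('success', onSuccess);
  // Pretend the network round trip completes on a later turn of the event loop.
  setImmediate(() => request.emit('success', { url: url, body: 'here be dragons' }));
  return request;
}

const log = [];
get('/dragons', (data) => {
  log.push('callback received: ' + data.body);
  console.log(log.join('\n'));
});
// This line runs before the callback above, because get() did not block.
log.push('rest of the program ran first');
```

The entry pushed after the call to get lands in the log first, which is the non-blocking behaviour the paragraph above describes.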

Installing Node.js

At the time of this writing, Node.js is installable only from source, mainly because it is a constantly evolving project. To get started, open up a terminal in our previously established javascript directory and issue the following commands. They will check out the current Node.js branch from GitHub, configure, make both V8 and Node.js, and install it to /usr/local/bin. This should work perfectly fine on Mac OS X and Linux; Windows is not currently supported by Node.js.

Once this completes, it will install two Node.js specific executables, node and node-repl. The first one, node, is used to execute Node.js files, like the ones we will create during this article. The other, node-repl, is a Read-Eval-Print-Loop which will allow you to quickly try out bits of code in Node.js without creating a file. This can be a lot of fun to start with if you want to just verify your installation. You can run any standard JavaScript code in node-repl directly.

Building HTTP

Event-based programming is fundamental to proper JavaScript programming and Node.js programming, and it allows for elegant programs that handle massively concurrent systems with little code, memory, and processing power. The best example of this is the "build a web server" example on the homepage of the Node.js website, re-posted below. In this non-trivial example, we are setting up a highly scalable web server bound to port 8000 that simply serves up a "hello world" web page after 2 seconds have passed (have to make it feel real after all).

The code is pure JavaScript so it should be relatively easy to understand. Briefly though, the application is doing the following:

  1. Include the standard libraries system and HTTP and assign them to contextually appropriate variable names.
  2. Create an HTTP server listening on port 8000 and attach the following function to handle any incoming calls.
  3. The body of function (req, res) {...} is where our web action happens, which in this case just sets the response headers, sends the body "Hello World", and finishes the request. It does this only after 2 seconds (2000 milliseconds) have elapsed.

The interesting thing of note here is the way in which the server processing is handled. The function(req, res) {...} that we create here is actually registered to an event (in this case an incoming HTTP request on port 8000). When the event happens, the code is then executed with the provided parameters of request object and response object. This makes for a great little network server because when nothing is happening (i.e. no traffic) then nothing is happening (i.e. no processing). The code after the registration runs immediately, since the listening and processing are done not at the point of interpretation, but when the event is triggered.
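For readers following along on a current Node.js release, here is a minimal sketch of the three steps above using today's http module. It is an assumption-laden approximation rather than the homepage listing: the method names in the release this article was written against differed (the text mentions a finish call, for instance), and handleRequest is just an illustrative name.

```javascript
const http = require('http');

// Registered for the server's "request" event: it runs once per incoming
// request and is never called while the server sits idle.
function handleRequest(req, res) {
  // Wait 2 seconds (2000 ms) without blocking other requests, then reply.
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello World\n');
  }, 2000);
}

const server = http.createServer(handleRequest);
server.listen(8000);

// This line runs immediately; listen() does not wait for any traffic.
console.log('Server running at http://127.0.0.1:8000/');
```

Save it to a file, run it with node, and request http://127.0.0.1:8000/ from a browser or curl; the reply should arrive roughly two seconds later.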

Nom, Nom, Nom

Now that we have created a server, let's create a brand new client to consume the data from that server (or any other). Take a look at the following code and see if it makes sense to you. There are some tricky components to this, so take your time.

Forgoing the parts we have already covered, notice the first function we create, called read. It takes a parameter, callback, which we execute and pass the data to once we have obtained it from the HTTP service. Within read, we create an HTTP client with the http.createClient(port, domain); syntax, which just sets up the necessary structure for the connection. Then we assign a GET request of "/index" to the connection, which returns a request object. This does not actually make the request yet. It waits until it is at least provided the finish command so that you can assign HTTP headers or do other processing. In the request.finish() function we pass an anonymous function that processes the response object, and in here things get a little crazy.

Within the anonymous function we create an empty string called responseBody and then set the response encoding to UTF-8. We then attach a listener for the event "body" with another anonymous function that appends chunks of data to our responseBody string. This is the way to pull information from the service, and it is done this way to facilitate chunked data delivery: you are actually pulling each chunk of data off the wire and into your string. This is great for large responses because you could start processing the data even while it is still downloading. After one or many "body" events, there will be a "complete" event fired which indicates that the HTTP response has completed. In the "complete" event anonymous function we simply execute the callback parameter function and pass it the data. For this example, the callback function only outputs to the console, so nothing special, but in just a small number of lines, you have created a very powerful HTTP consuming application.
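The same flow can be sketched against the current Node.js http module. Treat this as an approximation under assumptions, not the original listing: http.request stands in for the createClient call, the "data" and "end" events play the roles the text gives to "body" and "complete", and the (err, body) callback shape is my own choice.

```javascript
const http = require('http');

// read() fetches a path over HTTP and hands the full body to callback(err, body).
function read(options, callback) {
  const request = http.request(options, (response) => {
    let responseBody = '';
    response.setEncoding('utf8');
    // Each "data" event delivers one chunk as it comes off the wire.
    response.on('data', (chunk) => { responseBody += chunk; });
    // "end" fires once, after the last chunk has been delivered.
    response.on('end', () => callback(null, responseBody));
  });
  request.on('error', (err) => callback(err));
  // The request is only completed once end() is called; headers can be set before this.
  request.end();
}

// Assumes a server like the "hello world" one is listening on port 8000.
read({ host: '127.0.0.1', port: 8000, path: '/index', method: 'GET' }, (err, body) => {
  console.log(err ? 'Request failed: ' + err.message : body);
});
```

If nothing is listening on port 8000, the callback receives the connection error and prints it instead of a body.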

Twitter Client

With what we have done so far we can actually create a full and meaningful application consuming an external service instead of our own hello world. The following code is just a simple modification of the localhost-calling HTTP client we just wrote, but this time it is calling Twitter's search API to find any tweets about JSConf. It will automatically query the API every minute, pull the newest items, and display them in bottom-posting order. You can just leave this terminal window open and watch all the tweets fly by while using very little processor time and memory space: 0.1% CPU and 30MB on my laptop. Perfect for netbooks and other battery-constrained devices, and best of all it's only JavaScript!
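The polling logic on its own can be sketched without touching any real API. Everything named here is hypothetical: fetchSince stands in for the HTTP call to a search service, the item shape { id, text } is assumed, and "bottom-posting order" is read as oldest first with the newest printed last.

```javascript
// fetchSince(lastId, callback) is a hypothetical function that asks a search
// API for items newer than lastId and calls back with (err, items), where
// each item looks like { id: Number, text: String }.
function createPoller(fetchSince, onItem) {
  let lastId = 0;
  return function poll() {
    fetchSince(lastId, (err, items) => {
      if (err) {
        console.error('Poll failed, will retry on the next poll: ' + err.message);
        return;
      }
      items
        .filter((item) => item.id > lastId)   // skip anything already shown
        .sort((a, b) => a.id - b.id)          // oldest first, newest at the bottom
        .forEach((item) => {
          lastId = item.id;
          onItem(item);
        });
    });
  };
}

// Fake in-memory source standing in for the real HTTP request.
const fakeTweets = [
  { id: 2, text: 'second' },
  { id: 1, text: 'first' },
];
const fakeFetch = (lastId, callback) =>
  callback(null, fakeTweets.filter((t) => t.id > lastId));

const poll = createPoller(fakeFetch, (item) => console.log(item.id + ': ' + item.text));
poll();
// In a long-running client you would repeat this: setInterval(poll, 60 * 1000);
```

Remembering the last id seen is what keeps each one-minute poll from printing the same items again.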

Other Frivolities

While showing off the HTTP capabilities of Node.js is incredibly sexy, and they are most likely the very future of web application development, it can do so much more. Take, for instance, this bit of code from the Node.js API documentation, which opens a raw TCP socket on port 7000 of the loopback interface. What is incredibly striking about this is that it is so fundamentally similar to the aforementioned HTTP client and HTTP server we created. In the same fashion as client-side JavaScript libraries, Node.js callbacks are uniform regardless of which event is being tracked. You do not worry about opening the TCP socket, threading, mutexes, or any of that complexity that in any other language would be an initial requirement.
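A minimal sketch of such a TCP server with the current net module follows. It is not the listing from the API documentation; the greeting and the echo behaviour are assumptions chosen to give telnet something to show.

```javascript
const net = require('net');

// The connection handler is just another callback, written in the same
// style as an HTTP request handler.
const server = net.createServer((socket) => {
  socket.setEncoding('utf8');
  socket.write('hello\r\n');
  // Echo every chunk the client sends straight back to it.
  socket.on('data', (chunk) => socket.write(chunk));
});

// Port 7000 on the loopback interface.
server.listen(7000, '127.0.0.1');
```

Anything typed into a connected client comes straight back, one chunk at a time.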

You can have fun with this code using a simple telnet command of:

telnet localhost 7000

You can also apply the evented model to standard system execution, as shown in the following code segment, which executes an "ls" directory listing command and attaches a callback that will be executed when the system command returns. The Node.js execution does not stop or block waiting for the directory listing to occur; instead it continues to execute the next commands.
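With the current child_process module, that idea might look like the sketch below. The order array is only there to make the sequence of events visible and is not part of any API.

```javascript
const { exec } = require('child_process');

const order = [];

// Start "ls" and register a callback for when the command has finished.
exec('ls', (error, stdout, stderr) => {
  order.push('ls finished');
  if (error) {
    console.error('ls failed: ' + error.message);
    return;
  }
  console.log(stdout);
  console.log('Order of events: ' + order.join(', then '));
});

// This runs before the directory listing is available.
order.push('next statement ran');
```

The statement after exec is recorded first, and the listing arrives later through the callback.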

Conclusion

Node.js is a revolutionary technology built on top of another powerful revolutionary technology, V8. It is gathering a lot of attention within the technology community, mainly driven by Ryan's riveting presentation at JSConf.eu. What we have covered is just scratching the surface of the power in this incredible platform, and you owe it to yourself to try your own hand at coding in Node.js. You might find that you actually enjoy CommonJS programming more than client-side JavaScript!

More Power From The People

For interesting libraries on top of Node.js, check out the libraries page on the Node.js GitHub wiki, available at: http://wiki.github.com/ry/node. There are some amazing projects out there that will allow you to combine the power of Node.js with other cutting-edge technologies.


garry says...

Posterous is proud to announce the ability to change the look and feel of your Posterous blog! It's been a long time coming, and are we ever excited about releasing this feature to you guys today.

Choose from five built-in themes
Including one designed by theme creator Bill Israel. And we've got a whole ton more on the way. We wanted to get this in your hands ASAP, and we'll be releasing more into the system as soon as we create them.

Be able to upload header images
Customize your blog by creating a custom blog header in your favorite image editor. Then just upload it and see it at the top of every page on your blog. No coding experience necessary.

And choose new colors
Want to change the link color? Switch something up? Use our color picker and you don't have to code a single line of HTML.

For people who want to customize to the max...

If you're an advanced user, designer, or engineer, now you can totally change the CSS and HTML layout of your site.

Not only that, Posterous Themes are Tumblr-compatible. We built the Posterous Theme Engine to work great with the thousands of existing Tumblr themes out there! Just drop the theme code into the "advanced mode" editor. Want to add commenting and favoriting? It's just a couple lines of simple HTML away. Read more about it in our theming docs.

Some examples of Posterous Themes in the wild...

Check out what Posterous super-themer Cory Watilo has built with full CSS / HTML customization:

Our friends at Mugasha, Vidly, and Tweetvite have all chosen Posterous to host their company startup blogs. Dustin Curtis is liveblogging his 30-day flight on JetBlue on Posterous too!

So what are you waiting for? It's enabled on your Posterous blog now. Go to your Manage page, and click Edit Settings > Theme and Customize to get started.


garry says...

There are a bunch of basic functional elements to building out a popular Rails app that I've never really seen explained in one place, but we had to learn the hard way while building Posterous. Here's a rundown of what we've learned, in the hopes that some Google linkjuice may bring an intrepid wanderer to this forlorn part of the woods and help you out.

Static Storage
S3 is awesome. Yes, you can host everything off S3. Is it a little more expensive? Probably. But if you're engineer constrained (and if you're a startup, you absolutely are) -- set it and forget it. If you absolutely must have local storage across app servers, then MogileFS, GFS, or HDFS or even NFS (yuck) are possibilities. For alternatives to S3, Mosso is supposed to be good too.

Images, files, whatever. Just drop it there. People say a lot of stuff about the Cloud, but it's real and a game changer for anyone doing user generated content.



HTTP Cache Control
HTTP lets you tell browsers what static content they can cache. You set this in Apache. Rails will automatically put timestamps in the IMG / JavaScript / CSS tags, assuming you're using the helpers. The Firefox plugin YSlow coupled with Firebug are your friends here. The improvement is significant and well worth your time, especially if you add gzip'ing. A 100KB initial page load can be brought down to 5KB (just the HTML file) on subsequent clicks around your site.



Search
You're not going to run full text search out of your DB. It's totally not worth it to roll anything custom here. Sphinx with the ThinkingSphinx plugin is probably your best bet. If you have more than one app server, you'll want to use this. Alternatively, Solr with Acts as Solr can be used if you're a Java geek or have previous Lucene/Solr experience.


Storage engine matters, and you should probably use InnoDB

MyISAM is marginally faster for reads, but InnoDB will make you more crash resistant and will not lock tables on writes. Read about the difference, because when your servers are on fire, you will realize MySQL feels like a pretty thin layer of goop on top of your storage engine. MyISAM is actually the default on MySQL, which makes sense for most crappy phpBB installations -- but probably not good enough for you. The default can hurt you.

Oh yeah, and if you can start with some replication in place, do it. You'll want at least one slave for backups anyway.



Fix your DB bottlenecks with query_reviewer and New Relic
This basically saves your ass completely. Everyone complains that Rails is slow. Rails is not slow, just like Java Swing is not slow. Rails makes it easy to shoot yourself in the face. If you do follow-the-textbook-example bumbling around with Rails ActiveRecord objects, you will end up with pages that drive 100 queries and take several seconds to return.


Above is a screenshot from query_reviewer. It tells you every single query being run, and alerts you to things that use temporary tables, file sorts and/or just damn slow queries.

In a nutshell, you need indexes to avoid full table scans. The traditional way is to run EXPLAIN manually on queries coming out of your dev log. Query_reviewer lets you see it all right there in the left corner of your web browser. It's brilliant. You also need to eager load the associations that you will use in your views by passing :include to your ActiveRecord find method call, so that you can batch up SQL queries instead of destroying your DB server with 100 queries per dynamic page.
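To make the query counts concrete, here is a small language-neutral illustration, written in JavaScript rather than ActiveRecord. The in-memory db object, its tables, and its method names are all hypothetical; it only counts how many queries each approach issues and is roughly what eager loading buys you, not how Rails implements it.

```javascript
// Tiny in-memory stand-in for a database that counts every query it serves.
// All table and field names here are hypothetical.
const db = {
  queries: 0,
  posts: [{ id: 1, authorId: 10 }, { id: 2, authorId: 11 }, { id: 3, authorId: 10 }],
  authors: [{ id: 10, name: 'ann' }, { id: 11, name: 'bob' }],
  allPosts() { this.queries += 1; return this.posts; },
  authorById(id) { this.queries += 1; return this.authors.find((a) => a.id === id); },
  authorsByIds(ids) { this.queries += 1; return this.authors.filter((a) => ids.includes(a.id)); },
};

// Naive: one query for the posts, then one more per post for its author.
function renderNaive() {
  return db.allPosts().map((post) => post.id + ' by ' + db.authorById(post.authorId).name);
}

// Eager: one query for the posts, one batched query for all their authors.
function renderEager() {
  const posts = db.allPosts();
  const ids = [...new Set(posts.map((post) => post.authorId))];
  const byId = new Map(db.authorsByIds(ids).map((a) => [a.id, a]));
  return posts.map((post) => post.id + ' by ' + byId.get(post.authorId).name);
}

db.queries = 0;
const naivePage = renderNaive();
const naiveCount = db.queries; // one query plus one per post

db.queries = 0;
const eagerPage = renderEager();
const eagerCount = db.queries; // two queries regardless of how many posts

console.log({ naiveCount: naiveCount, eagerCount: eagerCount });
```

With three posts the difference is small, but the naive count grows with every row on the page while the batched count stays put.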

New Relic is new for us, but it helps us see what is really happening on our production site. If your site is on fire, it's a freaking beautiful gift from the heavens above. You'll see exactly what controllers are slow, which servers in your cluster, how load is on all your machines, and which queries are slow.

Memcache later
If you memcache first, you will never feel the pain and never learn how bad your database indexes and Rails queries are. What happens when scale gets so big that your memcache setup is dying? Oh, right, you're even more screwed than you would have been if you got your DB right in the first place. Also, if this is your first time scaling Rails / a db-driven site, there's only one way to learn how, and putting it off 'til later probably isn't the way. Memcache is like a bandaid for a bullet hole -- you're gonna die.



You're only as fast as your slowest query.
If you're using nginx or Apache as a load balancer in front of a pack of mongrels (or thins or whatever else is cool/new/hip), then each of those mongrels acts like a queue. The upshot is that if you EVER have a request that takes a long time to finish, you're in a world of hurt. So say you have 4 mongrels, and Request A comes in to port 8000 and it takes 10 seconds. The load balancer is naive and keeps passing requests to Port 8000 even though that port is busy. (Note: This might help, but we don't use it)

Then what happens? Sad town happens. 1 in 4 requests after Request A will go to port 8000, and all of those requests will wait in line as that mongrel chugs away at the slow request. Effective wait time on 1/4th of your requests in that 10 second period may be as long as 10 seconds, even if normally it should only take 50msec!
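The arithmetic can be checked with a toy simulation. This is a sketch under stated assumptions, not a model of any real load balancer: strict round-robin across 4 workers, one request arriving every 100 ms (an invented traffic rate), and 40 ordinary 50 ms requests following the slow one.

```javascript
// Each worker handles one request at a time; the balancer hands requests out
// in strict rotation without checking whether a worker is busy.
function simulate(workerCount, requests) {
  const freeAt = new Array(workerCount).fill(0); // when each worker is next idle (ms)
  return requests.map((request, index) => {
    const worker = index % workerCount;
    const start = Math.max(request.arrivesAt, freeAt[worker]);
    freeAt[worker] = start + request.cost;
    return { worker: worker, latency: freeAt[worker] - request.arrivesAt };
  });
}

// Request A takes 10 s; every later request would normally take 50 ms.
const requests = [{ arrivesAt: 0, cost: 10000 }];
for (let i = 1; i <= 40; i++) {
  requests.push({ arrivesAt: i * 100, cost: 50 });
}

const results = simulate(4, requests);
const stuck = results.filter((r, i) => i > 0 && r.latency > 1000);
console.log('requests stuck behind A: ' + stuck.length + ' of ' + (results.length - 1));
console.log('worst latency after A: ' + Math.max(...stuck.map((r) => r.latency)) + ' ms');
```

Under these assumptions, 10 of the 40 follow-up requests land on the busy worker, and the unluckiest one waits 9650 ms for what should have been a 50 ms response.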

Enter the wonderful mongrel proctitle. Now, you can see exactly what is blocking your mongrels. I keep this on a watch in a terminal at all times. It's what I look at immediately if our constant uptime tests tell us something's wrong. Super useful.



The answer is: a) run some mongrels dedicated to slow running jobs (meh) or b) run Phusion Passenger, or c) run slow stuff offline... which leads us to...

Offline Job Queues
So you gotta send some emails. Or maybe denormalize your DB. Or resize photos, or transcode video or audio. But how do you do it in the 200msec that you need to return a web request? You don't. You use Workling or Delayed Job or nanite. It'll happen outside of your mongrels and everyone will be happier.

I don't know why people don't talk about this more, because if you run a site that basically does anything, you need something like this. It *should* be a part of Rails, but isn't. It isn't a part of Rails in the same way that SwingWorker in Java wasn't a part of Java Swing core like forever, even though it absolutely had to be.



If you don't monitor it, it will probably go down, and you will never know.
Test your site uptime, not just ping but actual real user requests that hit the DB. Sure, you could use pingdom if you're lazy, but it seriously takes like 10 lines of ruby code to write an automated daemon that runs, does a user action and checks that your site is not hosed. open-uri is your friend. You don't know if you're up if you're not checking. Do not tolerate downtime.
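The post has Ruby and open-uri in mind; purely as an illustration of how small such a check is, here is the same idea sketched in Node.js. The URL, the expected text, and the checkSite name are all hypothetical, and a real daemon would run this on a timer and send an alert instead of printing.

```javascript
const http = require('http');

// Fetch a page that exercises the database and confirm it looks healthy.
// Calls report(ok, detail) exactly once.
function checkSite(url, expectedText, report) {
  let reported = false;
  const finish = (ok, detail) => {
    if (!reported) {
      reported = true;
      report(ok, detail);
    }
  };
  const request = http.get(url, (response) => {
    let body = '';
    response.setEncoding('utf8');
    response.on('data', (chunk) => { body += chunk; });
    response.on('end', () => {
      const ok = response.statusCode === 200 && body.includes(expectedText);
      finish(ok, 'status ' + response.statusCode);
    });
  });
  // Treat a hung request as downtime rather than waiting forever.
  request.setTimeout(5000, () => request.destroy(new Error('timed out after 5 s')));
  request.on('error', (err) => finish(false, err.message));
}

// Hypothetical page and marker text; substitute a real user-facing action.
checkSite('http://127.0.0.1:3000/', 'Sign in', (ok, detail) => {
  console.log((ok ? 'UP' : 'DOWN') + ' (' + detail + ')');
});
```

Checking for expected text in the body, not just a 200 status, is what separates this from a plain ping.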

Also, use god for mongrel and process monitoring. Mongrels die or go crazy. You gotta keep them in their place. (What's funny is that god leaks memory over time with Ruby 1.8.6 *sigh*). Munin, monit, and nagios are also great to have.

Keep an eye on your resources -- IO ok? Disk space? It's the worst thing ever to have a site crash because you forgot to clean the logs or you ran out of disk space. Make cronjobs for cleaning all logs and temp directories, so that you can set it and forget it. Because you will forget, until you are reminded in the worst way.



Read the source, and cut back the whining
You will learn more reading the source and debugging / fixing bugs in plugins and sometimes Rails itself than a) complaining on a mailing list or b) whining about shit on your twitter. It's Ruby open source code -- if it's broken, there's a reason. There's a bug, or you're doing it wrong. Fix it yourself, drop it into a github fork, and submit back.



Beware old plugins
They don't work well. And they sit around on Google sucking up time and effort. Acts as paranoid is one. They look legit, with famous names who created them. Don't fall for it. Insist on using code that has been updated recently. Rails changes pretty fast, and plugins that don't get updated will waste your time, cause random bugs, and basically make your life crap.

Github is new on the scene and has totally revolutionized Rails. When in doubt, search Github. If it's not on Github, it's probably dead/not-maintained. Be wary.



Beware old anything
Actually, if this blog post is older than even 6 months or 1 year -- you might want to go elsewhere. Rails moves fast. What's hot and "must have" in Rails now may be totally a piece of crap / barely functioning garbage later. Same with any blog posts. Be super wary of the Rails wiki. There be dragons -- I mean, really stuff that references Rails 1.2.6 or earlier!

And that's a wrap.
There's tons more stuff, but this is a pretty decent list of stuff to watch out for. If you have any suggestions for other things I missed, or questions, please do leave a comment below!

If you liked this article, please try posterous.com and/or follow me on twitter at @posterous and @garrytan!

 
