Here are posterous posts filed under couchdb...

Doug says...

It was surprisingly simple to create, and so far, very fast. I'd also imagine it would scale quite well.

So if you're not familiar with either tornado or couchdb, you're probably not reading this, as the title was probably not interesting enough to get you to come here. However, if for some reason you chose to read this entry and have never heard of either of those, then I highly suggest you check them out. 

Tornado is a VERY simple Python web framework built on an asynchronous design (basically how nginx works), where a single process can multiplex thousands of simultaneous network connections efficiently using modern features of the Linux kernel (epoll). Basically, epoll notifies the process when new data is available on a network connection, so the single process is free to do whatever work it needs to do while it waits for more data from the client.

Now, the benefits of doing things asynchronously are lost if the process blocks for any reason (say you're transcoding video, or performing a horribly written db query). If the process blocks, then the clients starve. This is normally not an issue with multi-process or multi-threaded applications, as if one process or thread blocks, the others can continue to work. But, if there's only one process, this becomes an issue.
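To make the starvation point concrete, here is a minimal sketch that uses Python's standard asyncio instead of Tornado itself (an assumption, made only so the example is self-contained). The handler names are hypothetical. A slow and a quick handler share one event loop, and the finish order shows whether the quick one was starved.

```python
import asyncio
import time


async def handler(name, work, log):
    """Hypothetical request handler: run `work`, then record completion."""
    await work()
    log.append(name)


async def cooperative_wait():
    # Yields to the event loop, so other handlers can run in the meantime.
    await asyncio.sleep(0.05)


async def blocking_wait():
    # time.sleep() never yields: every other handler on the loop stalls.
    time.sleep(0.05)


async def quick():
    # A trivial request that should finish almost immediately.
    await asyncio.sleep(0)


async def serve(slow_work):
    """Run one slow and one quick handler on the same loop; return finish order."""
    log = []
    await asyncio.gather(
        handler("slow", slow_work, log),
        handler("quick", quick, log),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(serve(cooperative_wait)))  # quick finishes first
    print(asyncio.run(serve(blocking_wait)))     # quick waits behind slow
```

When the slow handler waits cooperatively, the quick request is answered first. When it blocks, the quick request has to wait for the whole blocking call.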

Most tasks, however, finish so fast, the client doesn't notice. It does mean though, that if you make network connections inside the server process (say to query a database), it's preferable if you can make them asynchronous as well. For this reason, I chose CouchDB for the database. Basically, CouchDB is a JSON document database (it stores objects, each object is a JSON encoded object), that speaks a simple RESTful (i.e. over HTTP) protocol. This was ideal, as I could perform asynchronous database queries.

Ok, so enough background, now for the nuts and bolts. Hopefully, if you were already familiar with these two projects, you would have just skipped down to here. So the basic architecture of my service is:

NGINX -> Tornado Web App -> CouchDB

All front end requests are handled by nginx, which proxies the request to the tornado app. The tornado app parses the URI for the base36 encoded string at the end, and then uses that as the document id in a couchdb lookup. So it basically sees a GET request from a client, takes the ID being asked for, and does an async GET request of its own to couchdb. When the couchdb request returns, it calls a callback function, which parses the JSON document from couch, pulls out the URL and sends a 301 redirect to the client with the URL that was fetched.
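Since the actual code isn't posted, here is only a rough sketch of the three pure steps in that flow: pulling the base36 code off the path, building the couchdb lookup URL, and turning couch's JSON into a redirect. The database name `shorturls`, the `url` field and the lower-casing of codes are all assumptions for illustration, and the async HTTP call itself is left out.

```python
import json
import string

BASE36 = set(string.digits + string.ascii_lowercase)
COUCH_DB_URL = "http://localhost:5984/shorturls"  # hypothetical database


def short_code(path):
    """Return the base36 code at the end of a request path, or None."""
    # Assumption: codes are case-insensitive, so normalise to lower case.
    code = path.rstrip("/").rsplit("/", 1)[-1].lower()
    if code and set(code) <= BASE36:
        return code
    return None


def lookup_url(code):
    """URL of the couchdb document whose id is the short code."""
    return f"{COUCH_DB_URL}/{code}"


def redirect_for(couch_body):
    """Turn couchdb's JSON response body into a (status, headers) pair."""
    doc = json.loads(couch_body)
    if "url" not in doc:  # e.g. an error document for an unknown id
        return 404, {}
    return 301, {"Location": doc["url"]}
```

In the real service the body passed to `redirect_for` would arrive in the callback of the async GET to `lookup_url(code)`.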

Now, at this point, I also added an extra step, just after it sends the redirect, it performs a PUT asynchronously back to couchdb to log the request with useful info (like user-agent string, remote IP, etc). All this info could be parsed from the nginx logs and stuck into couch later for analytics, but it doesn't seem to impact performance at this point, so I'm keeping it in there for now.

I've done some very basic preliminary testing, and I have to say, it seems to be just as fast as, if not in some cases faster than, the top url shortening services out there. And since it's all asynchronous, it can scale up like nobody's business. Increased number of requests per second? Just add a few more of the tornado app servers. If the couchdb server gets bogged down, you can load balance it as well.

Now, one final note here. This isn't a full blown url shortening service, as there is no way to ADD urls to the DB. At least not via a public interface at this point (there might never be). This is because this service is actually just going to be a part of a larger service/site that I'm working on, and so there will be no need to publicly create them (they'll be created on the back end automatically for things).

Where's the code, you ask? Well, I may publish it in the future (right now it's kind of ugly). If anyone asks nicely (and therefore cares), I'll put the simple code up somewhere.

That's all for now though.

Filed under: couchdb

hdknr says...

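STEP1. Write the Map and Reduce functions in a JSON-format text document.
STEP2. Use HTTP PUT to register the document prepared in STEP1 in the database.
STEP3. Access the query URI created as a result of STEP2 with HTTP GET.
STEP4. MapReduce is executed based on the document registered in STEP2 (processing on the CouchDB side).
STEP5. The result is returned to the client as an HTTP response.


Figure 1. The flow of a query in CouchDB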
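As a rough illustration of these steps, the sketch below builds a design document and the two URLs involved, without sending anything. The server address, the database name `mydb`, the design document name `stats` and the view name `by_type` are all hypothetical, and the `_design/.../_view/...` URL layout is assumed from CouchDB releases of the 0.9/0.10 era onward.

```python
import json

COUCH = "http://localhost:5984"  # assumed server address
DB = "mydb"                      # hypothetical database name

# STEP1: the map and reduce functions are JavaScript source, held as
# strings inside a JSON design document.
design_doc = {
    "language": "javascript",
    "views": {
        "by_type": {
            "map": "function (doc) { if (doc.type) { emit(doc.type, 1); } }",
            "reduce": "function (keys, values) { return sum(values); }",
        }
    },
}

# STEP2: PUT this JSON body to this URL to register the design document.
put_url = f"{COUCH}/{DB}/_design/stats"
put_body = json.dumps(design_doc)

# STEP3: GET this URL to query the view. CouchDB then runs the map/reduce
# (STEP4) and returns the resulting rows as JSON (STEP5).
get_url = f"{put_url}/_view/by_type?group=true"

if __name__ == "__main__":
    print("PUT", put_url)
    print(put_body)
    print("GET", get_url)
```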

Filed under: CouchDB

Matt says...

Recently, there has been a lot of buzz about “No SQL” databases. In fact there are at least two conferences on the topic in 2009, one on each coast. Seemingly this buzz comes from people who are proponents of:

• document-style stores in which a database record consists of a collection of (key, value) pairs plus a payload. Examples of this class of system include CouchDB and MongoDB, and we call such systems document stores for simplicity

• key-value stores whose records consist of (key, payload) pairs. Usually, these are implemented by distributed hash tables (DHTs), and we call these key-value stores for simplicity. Examples include Memcachedb and Dynamo.

In either case, one usually gets a low-level record-at-a-time DBMS interface, instead of SQL. Hence, this group identifies itself as advocating “No SQL.”

Great first part of a two-part series about data storage and how "NoSQL" doesn't at all get at what things like CouchDB, MongoDB, etc. are all about.

Filed under: CouchDB

pierrel says...

I recently managed to secure my couchdb server to my liking and I haven't seen any website that explains all the methods I used in one place, so I'm going to do that here. I'm assuming that you are running couchdb from localhost on port 5984 and all communication with couchdb will be done with curl and couchapp.

I am using couchdb version 0.11.0a, the one that comes with macports "couchdb-devel" port at the time of this writing. I installed it using

$sudo port install icu erlang spidermonkey curl couchdb-devel

Once everything is nice and installed, edit the local.ini and default.ini files. If you installed like above they should be in /opt/local/etc/couchdb. If they're not there then they might be in /usr/local/etc/couchdb.

In local.ini add the line

authentication_handler = {couch_httpd_auth, cookie_authentication_handler}

under the "[httpd]" section. Uncomment the "secret = " line like so:

[couch_httpd_auth]
secret =  suparsecret

Then add an admin under the [admins] section like so:
[admins]
admin = pass

In default.ini add a new authentication handler under the [httpd] section so that the "authentication_handlers" list looks something like this:

authentication_handlers = {couch_httpd_auth, cookie_authentication_handler}, {couch_httpd_auth, default_authentication_handler}

Then point the "authentication_db" variable under the [couch_httpd_auth] section to the database of your choosing like this:

authentication_db = mydb

Make sure "require_valid_user" is set to false. Also note that if you have your database running then you'll need to restart it now for the rest to work correctly.

Create the database you are going to be authenticating users from:

$curl -X PUT http://admin:pass@localhost:5984/mydb

You should get the response {"ok":true}

Now we're going to make a basic user. Couchdb expects user passwords to be hashed using sha1, and I'm going to be using the following ruby script (called 'hash.rb') to do the hashing:
<code>
#!/usr/bin/env ruby
require 'digest/sha1'

print Digest::SHA1.hexdigest(ARGV.first)
</code>

First hash the password "mypassword":

$ruby hash.rb mypassword

should return 91dfd9ddb4198affc5c194cd8ce6d338fde470e2.

Create a user document using the following command

$curl -X PUT http://admin:pass@localhost:5984/mydb/myuser -d '{"type":"user", "hashed_password":"91dfd9ddb4198affc5c194cd8ce6d338fde470e2"}'

You should get a response that looks like {"ok":true,"id":"myuser","rev":"1-90e61ef93d1bf2f691f64cb423126218"}
What that command does is create a new document in our mydb database with attributes "type" and "hashed_password". The password for myuser is "mypassword", but will be stored in the database hashed in case someone gets access.

Now we need to create a new design document to handle user authentication. For simplicity I'm going to be using couchapp. Create a new couchapp called "_auth" from the command line.

$couchapp generate _auth

Create a new view called "users". This means making a _auth/views/users/map.js file. What you want to do is map all user documents to a dictionary of "password_sha", "salt", "secret", and "roles" keyed by username. Mine looks like this
<code>
function (doc) {
    if (doc.type == 'user') {
        emit(doc._id, {password_sha: doc.hashed_password, salt: "", secret: 'suparsecret', roles: ['user']});
    }
}
</code>
We didn't use a salt in creating the hashed password, so we just include an empty string for the salt but generally passwords should be salted and that's where it's done. Also notice that we used the secret "suparsecret" that we set in the local.ini file earlier.
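For anyone who does want a salt, here is a minimal sketch of what the hashing step could look like. It assumes couchdb compares against the SHA-1 of the password with the salt appended, which is worth verifying against your couchdb version; `hash_password` is a hypothetical helper, not part of couchdb.

```python
import hashlib
import secrets


def hash_password(password, salt=None):
    """Return (password_sha, salt).

    Assumption: password_sha is SHA-1 over password + salt, in that order.
    """
    if salt is None:
        salt = secrets.token_hex(16)  # random 32-character hex salt
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return digest, salt


if __name__ == "__main__":
    password_sha, salt = hash_password("mypassword")
    print(password_sha, salt)
```

With an empty salt this reduces to the plain SHA-1 produced by hash.rb. With a real salt, the map function would emit the stored salt in place of the empty string.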

We also want to add some validation so that non-users' actions are restricted. Create a file in _auth called validate_doc_update.js with the following code

<code>
function (newDoc, oldDoc, user) {
    var isAdmin = (user.roles.indexOf('_admin') != -1);
    var isUser = (user.roles.indexOf('user') != -1);

    // must be admin or user to update any doc: reject only if neither role is present
    if (!isAdmin && !isUser) {
            throw({unauthorized: 'must be admin or user.'});
    }
}
</code>
You will probably want to add more stuff to this file later.
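The rule in the comment, "must be admin or user", means an update should be rejected only when the user holds neither role. As a sanity check, here is the same rule in Python. This is a sketch of the intended logic only, not of couchdb's validation API, and `may_update` is a hypothetical name.

```python
def may_update(roles):
    """True if the user holds at least one of the accepted roles."""
    is_admin = "_admin" in roles
    is_user = "user" in roles
    # Reject only when BOTH checks fail:
    # not (A or B) is the same as (not A) and (not B).
    return is_admin or is_user


if __name__ == "__main__":
    for roles in (["_admin"], ["user"], []):
        print(roles, may_update(roles))
```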

Push the design document to the server by executing the following from within the "_auth" directory

$couchapp push _auth http://admin:pass@localhost:5984/mydb

Now if you try to do something like create a new document without being authenticated you should get an error:

$curl -X PUT http://localhost:5984/mydb/dummy -d '{"type":"something", "info":"Some more information"}'
{"error":"unauthorized","reason":"must be admin or user."}

Now to get authenticated. I'm using the cookies authentication method to create sessions between clients and the couchdb server. All you have to do is POST to /_session with the username and password and you will be given an authentication token to use for later requests.

First authenticate. Here is the command and some of the output you should see (note the new "-v" argument to curl, it's important!)

$curl -vX POST http://localhost:5984/_session -d 'username=myuser&password=mypassword'

* About to connect() to localhost port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 5984 (#0)
> POST /_session HTTP/1.1
> Host: localhost:5984
> Accept: */*
> Content-Length: 35
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 200 OK
< Set-Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP; Version=1; Path=/; HttpOnly
< Server: CouchDB/0.11.0a (Erlang OTP/R13B)
< Date: Wed, 04 Nov 2009 22:08:38 GMT
< Content-Type: text/plain;charset=utf-8
< Content-Length: 12
< Cache-Control: must-revalidate
{"ok":true}
* Connection #0 to host localhost left intact
* Closing connection #0

The important part is the "Set-Cookie" header in the response. We're going to use that in future communications. Now we can add that new record:

$curl -vX PUT http://localhost:5984/mydb/dummy -d '{"type":"something", "info":"Some more information"}' -H "Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP" -H "X-CouchDB-WWW-Authenticate: Cookie" -H "Content-Type: application/x-www-form-urlencoded"

{"ok":true,"id":"dummy","rev":"1-b2c3fd668db420b31478176059e2c7ff"}

You can also ask the server to return the authenticated user's username and roles with a GET to /_session

$curl -X GET http://localhost:5984/_session -H "Cookie: AuthSession=bXl1c2VyOjRBRjFGRjYyOnltRR2ir7eFNaVYMkPHmIG9VRmP" -H "X-CouchDB-WWW-Authenticate: Cookie" -H "Content-Type: application/x-www-form-urlencoded"

{"ok":true,"name":"myuser","roles":["user"]}
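If you'd rather script the cookie handshake than paste tokens around by hand, it could look roughly like this in Python using only the standard library. This is a sketch under the same assumptions as the curl commands above (localhost, port 5984). The helper names are made up for illustration, and the requests are only built here, not sent.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode
from urllib.request import Request

COUCH = "http://localhost:5984"


def login_request(username, password):
    """Build (but do not send) the form-encoded POST to /_session."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(f"{COUCH}/_session", data=body, method="POST")


def auth_token(set_cookie_header):
    """Pull the AuthSession value out of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    return cookie["AuthSession"].value


def authed_request(path, token, method="GET", data=None):
    """Build a request that carries the session cookie."""
    headers = {
        "Cookie": f"AuthSession={token}",
        "X-CouchDB-WWW-Authenticate": "Cookie",
    }
    return Request(f"{COUCH}{path}", data=data, method=method, headers=headers)
```

Passing each request to `urllib.request.urlopen` would actually send it. The Set-Cookie header of the login response then goes into `auth_token`.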

Conclusion

This authentication method may not be for everyone. Some may have a problem with the fact that anyone can see any document in the database. What I've described above only prevents certain types of updates to documents, i.e. writing to the database. Also, the validation described above is far from perfect, which is why I suggest adding some more logic (like not allowing users to delete other user accounts). There's a good wiki on turning your couchdb into a translucent database so that the whole "anyone can see my db" is not such a big problem. Supposedly it is possible to delete a session by sending /_session a DELETE with the correct cookie header, but it hasn't worked for me so far. Anyway, I hope this helps!

Filed under: couchdb

Matt says...

One more Raindrop/CouchDB video--this app is really showing the power and benefits of using a schemaless database. Perfect fit for something like Raindrop.

Filed under: CouchDB

Matt says...

Follow-up to the previous Raindrop/CouchDB video--this shows how Raindrop uses CouchDB. Nice to see an increasing number of real-world uses of CouchDB out in the wild.

Filed under: CouchDB

Matt says...

A friend of mine pointed out that Mozilla Raindrop is using CouchDB for persistence, and a member of the Raindrop team produced a really nice CouchDB intro video. There's another video about how Raindrop uses CouchDB--I'll post that one as well.

Filed under: CouchDB

lmaa says...

There's a kind of a hype around the database that puts pressure on relaxing. Great idea! A new way to think about databases for the web was badly needed, but quite a lot of people seem to be thinking too narrowly when it comes to other ways of using couchdb. All the libraries I've seen recently make one big mistake when it comes to determining the Type of a document. Schemaless databases give you the opportunity to name the Type column the way YOU like.

This leads to the following problem: When accessing a couchdb from another client that uses a different couchdb library, it can query the data from the couchdb views without a problem, but when it comes to deserializing json to objects, many libraries rely on hardcoded Type-columns that determine the "application-specific" class. 

For example the ruby library couchpotato, which is really neat, uses the key ruby_class, while the actionscript3 couchdb service in the backend of restfulx uses the really creative term clazz. So we either agree on a standardized name for the Type column, or we need to implement a Multi-Type-column-handling in each library.
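To show what such a Multi-Type-column-handling could look like on the client side, here is a small sketch. The keys ruby_class and clazz are the ones mentioned above; the generic "type" fallback and the function name are assumptions for illustration.

```python
# Type keys seen in the wild; "type" is an assumed generic fallback.
TYPE_KEYS = ("ruby_class", "clazz", "type")


def document_type(doc, type_keys=TYPE_KEYS):
    """Return the first type marker found in a couchdb document, or None."""
    for key in type_keys:
        value = doc.get(key)
        if isinstance(value, str) and value:
            return value
    return None


if __name__ == "__main__":
    print(document_type({"_id": "a", "ruby_class": "User"}))
    print(document_type({"_id": "b", "clazz": "User"}))
    print(document_type({"_id": "c"}))
```

A client would still have to map the returned name to its own class, but at least it would no longer depend on which library wrote the document.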

All the libraries really ignore the fact that it should be possible to access the couchdb from different clients directly, since it is already a restful webservice. Am I the only one demanding to be able to use it that way?

Of course one could develop a new webservice in front of the couchdb to which all the other clients can talk then. But then we have another goddamn API you have to implement in each client. The goal should be to reduce the not-reusable-bloat in each client to a minimal API-Overhead. Of course there needs to be some logic implemented to handle dependencies of document creation, or cascades when deleting documents, but having a couchdb lib in each client reduces the overhead for the developer to a minimum. Furthermore you can reduce the load on the server side, since it just has to handle the real db-calls, while the logicwork is carried out by the clients.

Just my thoughts on the current evolving couchdbesque apps and services around the web.

What do you think? For what are you using your couchdb? Are you using couchdb yet? Got something better?

Filed under: couchdb

Matt says...

Ubuntu 9.10 Karmic Koala has just been released. This is big news as this version includes Apache CouchDB, used as a replicable database by desktop apps. This means CouchDB will be on over 10 million desktops. Nice :)

WOW! Didn't realize CouchDB was standard in Ubuntu now. Very cool. I'll have to look more into how exactly it's being used. Congrats to the CouchDB team!

Filed under: CouchDB

Brian says...

I actually gave this presentation on CouchDB several months ago, but I didn't blog back then, so I thought I'd put it up now. This is a basic broad overview of what CouchDB is, what Ruby Libraries are available, and some example code on how to use them. With the latest release (0.10.0), I'm not sure if this is still 100% accurate, since it was written for 0.9, but it should be close. In case you were wondering, here are the Ruby Libraries available:

Filed under: couchdb