State of the Hackage

This summer (2010), I’ve tasked myself with migrating Hackage, which hackage.haskell.org serves to web browsers and cabal-install processes alike, from its current codebase to a newer one that is still in development but nonetheless pretty polished. The older one is known as hackage-scripts, and you can find it here:

$ darcs get http://darcs.haskell.org/hackage-scripts/

Its primary goal is to serve both cabal files (package metadata) for the cabal-install tool to parse and package tarballs for it to compile, and the server uses a glorified directory tree to accomplish this. It also has a minimalistic web interface for finding packages, viewing their metadata, and perusing Haddock-generated documentation. hackage-scripts uses a combination of static files and Network.CGI executables, which are invoked by the web server, read information about the request as laid out by the CGI specification, and then print the HTML response to standard output. Not the least of these scripts is the one that handles new package uploads, whether they come from cabal upload or the web interface.

hackage-scripts is portable in that it should run on any standard Apache installation. Unfortunately, it usually doesn’t run out of the box. The directory tree and static files have to be set up manually, and pathnames indicating where the setup lives need to be hardcoded into the Makefile and source code. Even if you can’t get it running on your own, it is happily chugging away on hackage.haskell.org, which your cabal config (~/.cabal/config) undoubtedly points to.

The candidate replacement is known simply as hackage-server, and you can get it here in its pre-summer-of-code state:

$ darcs get http://code.haskell.org/hackage-server/

It uses the Happstack web framework to deconstruct URIs by their path hierarchy, rather than letting Apache root through a large directory tree of mostly static files. It also uses the happstack-state package, at present keeping approximately 186 MiB of package data for 8376 package versions in memory to serve requests, falling back to the disk for larger files such as the package tarballs.
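To make "deconstructing URIs by their path hierarchy" concrete, here is a minimal sketch using only base. It does not use Happstack's actual combinators, and the URI layout and the Resource type are made up for illustration; they are not hackage-server's real routes.

```haskell
-- Hypothetical resources; hackage-server's real URI scheme may differ.
data Resource
  = PackageIndex
  | PackagePage String
  | PackageTarball String String
  deriving (Eq, Show)

-- Split "/package/foo/1.0" into ["package","foo","1.0"],
-- dropping the empty segments produced by leading or trailing slashes.
segments :: String -> [String]
segments = filter (not . null) . splitOn '/'
  where
    splitOn c s = case break (== c) s of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : splitOn c rest

-- Dispatch on the path hierarchy instead of looking up a file on disk.
route :: String -> Maybe Resource
route uri = case segments uri of
  ["packages"]                 -> Just PackageIndex
  ["package", name]            -> Just (PackagePage name)
  ["package", name, ver, "tarball"] -> Just (PackageTarball name ver)
  _                            -> Nothing

main :: IO ()
main = mapM_ (print . route)
  [ "/packages/"
  , "/package/foo"
  , "/package/foo/1.0/tarball"
  , "/no/such/thing"
  ]
```

The point of the design is that every URI is matched in code, so there is no directory tree that has to be laid out by hand before the server can answer requests.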

This summer’s project is unusual in that it involves work on a code base which most Haskellers won’t install themselves, but which provides a service most of us will end up dealing with frequently. This makes it important to get right from an architecture standpoint. Nonetheless, I hope to make it painless to set up a secondary Hackage repository as a drop-in replacement for the main one, potentially allowing you to pull from a variety of sources of varying stabilities. Setting up a server on http://localhost:8080/ over an empty repository is as easy as changing to the top-level directory of the Darcs repository and running

$ cabal install
$ hackage-server --initialise

(albeit not as easy if the dependencies end up failing: I had to change the Happstack dependency brackets in hackage-server.cabal from ==0.4.* to ==0.5.* because I use an older base). Setting up a hackage.haskell.org clone with the current tarballs is a bit more complex, but within the realm of science to solve!

$ cabal install
$ wget -P /tmp http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar
$ wget -P /tmp http://hackage.haskell.org/packages/archive/00-index.tar.gz
$ wget -P /tmp http://hackage.haskell.org/packages/archive/log
$ echo 'admin:wywGGkc7Qc/6I' > /tmp/htpasswd
$ echo 'admin' > /tmp/adminlist
$ hackage-server --import-index=/tmp/00-index.tar.gz \
    --import-log=/tmp/log --import-accounts=/tmp/htpasswd \
    --import-archive=/tmp/archive.tar \
    --import-admins=/tmp/adminlist

Be warned: archive.tar is 128MB at the moment! As for wywGGkc7Qc/6I, it is one of 4096 crypt-salted hashings of the password admin. On Wednesday I implemented digest authentication, which would instead hash admin:hackage:admin in MD5 and use a nonce challenge/response for reasonably secure authentication (the current scheme sends your password in near-plaintext with every request). I found a minor Chromium bug in the process, too!
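For readers unfamiliar with digest authentication, the hashing can be sketched roughly as follows. This follows the basic RFC 2617 scheme without qop, reads admin:hackage:admin as user:realm:password, and assumes the pureMD5 and bytestring packages are installed. The nonce and request URI are invented for the example, and none of this is taken from the hackage-server source.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Digest.Pure.MD5 (md5)   -- from the pureMD5 package (assumed installed)
import Data.List (intercalate)

-- Hex-encoded MD5 of the given fields joined with colons.
md5Hex :: [String] -> String
md5Hex = show . md5 . L.pack . intercalate ":"

-- HA1 = MD5(user:realm:password); the server can store this instead of the password.
ha1 :: String -> String -> String -> String
ha1 user realm password = md5Hex [user, realm, password]

-- HA2 = MD5(method:uri)
ha2 :: String -> String -> String
ha2 method uri = md5Hex [method, uri]

-- response = MD5(HA1:nonce:HA2); this is what the client sends back.
digestResponse :: String -> String -> String -> String
digestResponse h1 nonce h2 = md5Hex [h1, nonce, h2]

main :: IO ()
main = do
  let h1    = ha1 "admin" "hackage" "admin"
      nonce = "dcd98b7102dd2f0e"        -- hypothetical server-issued nonce
      h2    = ha2 "GET" "/packages/"    -- hypothetical request
  putStrLn ("HA1:      " ++ h1)
  putStrLn ("response: " ++ digestResponse h1 nonce h2)
```

The server recomputes the same response from its stored HA1 and the nonce it issued, so the password itself never crosses the wire.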

Tersely put, the design goals are for hackage-server to become a more consistent, extensible, modular and (most importantly) runnable Hackage server. This means duplicating the existing functionality, a task mostly done by Antoine Latter and Duncan Coutts in the span of the last two years, and organizing the modules into a URI hierarchy that obeys REST and ROA principles. I’ve outlined all of the resources Hackage currently provides (partially listed on the trac wiki), and I’m working on a mapping to a new and improved set of URIs.

For the more commonly accessed Hackage URIs (those that have been linked from other websites or hardcoded in cabal), backwards-compatibility is a priority, and mostly already implemented as a series of 301 redirects. Such a legacy redirect system might be considered a “feature”, a plug-in functionality which can be enabled and disabled. Part of making the new Hackage modular and hackable is defining a consistent interface for features. Much like lambdabot’s Module typeclass, each feature can be defined discretely, and the behavior of the web server becomes the msum of each feature’s ServerPart Response.
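The "msum of features" idea can be sketched with nothing but base, using Maybe as a stand-in for Happstack's ServerPart. The Feature record, the Response type, and both URI patterns below are hypothetical and only illustrate the shape of the interface.

```haskell
import Control.Monad (msum)

-- Stand-ins for Happstack's types: a request is a list of path segments,
-- and a handler either declines (Nothing) or answers (Just).
type Request = [String]

data Response
  = Ok String
  | MovedPermanently String   -- models a 301 redirect to the given location
  deriving (Eq, Show)

type Handler = Request -> Maybe Response

-- A feature is a named, self-contained handler that can be switched on or off.
data Feature = Feature
  { featureName    :: String
  , featureHandler :: Handler
  }

-- Hypothetical core feature serving package pages.
corePackages :: Feature
corePackages = Feature "core" $ \req -> case req of
  ["package", name] -> Just (Ok ("page for " ++ name))
  _                 -> Nothing

-- Hypothetical legacy feature: old-style URIs become 301 redirects.
legacyRedirects :: Feature
legacyRedirects = Feature "legacy-redirects" $ \req -> case req of
  ["cgi-bin", "hackage-scripts", "package", name] ->
    Just (MovedPermanently ("/package/" ++ name))
  _ -> Nothing

-- The server is the msum of its enabled features: the first one that answers wins.
serve :: [Feature] -> Handler
serve features req = msum [featureHandler f req | f <- features]

main :: IO ()
main = do
  let server = serve [corePackages, legacyRedirects]
  print (server ["package", "foo"])
  print (server ["cgi-bin", "hackage-scripts", "package", "foo"])
  print (server ["unknown"])
```

Disabling a feature is then just a matter of leaving it out of the list passed to serve.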

The above is the state of affairs on Day 1 (well, Day 4, but I’m still getting started with these new-fangled blags!). The title of my proposal is “Infrastructure for a more social Hackage 2.0”, not “A more social Hackage 2.0”. I expect that the exact array of social services that Hackage will provide will need a hefty bout of fine-tuning and analysis (see also some insightful thoughts on this), so my job is to provide the technical base to make the shiny new features easy to plug in and modify, as well as to implement as many as possible in a mad rush of coding in late July and early August.

If you have any kind of wish list for Hackage features, it is imperative that you let me know—eventually. Duncan and others have encouraged me to concentrate on setting up the infrastructure before building features, so at some point I’ll try to facilitate a community discussion about what you all want to see in our favorite package repository. If you need me, you can find me as Gracenotes on the #haskell and #hackage channels on irc.freenode.net. And best of luck to my fellow gsoc-ers, whose blogs I’ve linked in the sidebar.

May 27, 2010. Uncategorized.

4 Comments

  1. Don Stewart replied:

    Obviously resilience and robustness are a huge concern where Hackage is involved. It has served up over 2M downloads in the last couple of years, and we absolutely cannot lose any data.

  2. ja replied:

    Your blog is not working with the Planet Haskell feed. In the RSS for planet.haskell.org, and on the Planet Haskell page itself, each of your post titles is just a link to http://planet.haskell.org.

    • cogracenotes replied:

      This seems to be a problem with Planet Haskell parsing the default WordPress feed, since it provides all the needed information. If it can’t be resolved I’ll move the blog to my own domain and configure it.

  3. Antoine replied:

    Hi Matthew,

    It is great that you’re working on this!

    Looking at the digest auth code, it looks like we’re protecting the user from sending their password in the clear, which is a great goal.

    But it looks like we’re still vulnerable to replay attacks, but I really don’t think it’s much of a concern.

    I had some code around for nonce generation based on a hidden secret and a timestamp (for POSTS, to limit the replay attack window), but I think it is gone now.

    The idea is that for a GET we accept any nonce that the client sends up (which I think is what you do) but then for a POST we could deny and send a new nonce which has the timestamp coded in it:

    identifier:timestamp:H(timestamp:secret)

    When the client sends the new request the timestamp can be verified as one we sent, and then checked against the clock.

    All of this might be over-kill, though. And certainly secondary to letting users not send their passwords in plaintext.

    Antoine
