Hackage on Sparky

Hi, Haskellers. It’s been a while since I finished most of Hackage 2.0’s internal infrastructure. The site still needs a visual makeover, but I feel that enough of the core functionality is exposed for it to be useful to you guys. The latest from the darcs repository is running at:

http://sparky.haskell.org:8080/

This was imported from Hackage package data a day or so ago, without any user account data. The features currently enabled on the server are package pages, uploading packages, uploading candidates, distribution information, user groups, documentation, build reports, preferred versions, package deprecation, reverse dependencies, download statistics, tags, name search, and a handful of others.

The most important feature, though, and the reason this was a complete rewrite instead of just extending the old server, is that the internal design is modular and meant to be extended easily. If there’s a feature you don’t like (say, doing download statistics), it should take very little time to gut it from the application and not compile it in at all. The NameSearch module, as an example, adds two search indices, a simple search page (at /packages/find), and an OpenSearch plugin with suggestions. Installing it entails adding a line to Features.hs and writing an HTML view for it.
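
To make the idea concrete, here is a minimal sketch of what "a feature is one entry in a list" could look like. It is not the actual hackage-server code: the Feature record, its fields, enabledFeatures, and the download statistics path are hypothetical names made up for illustration. Only the /packages/find path comes from the description above.

```haskell
module Main where

import Data.List (find)

-- Hypothetical feature record: a name plus the URI paths it serves.
-- The real server's feature type is certainly richer than this.
data Feature = Feature
  { featureName  :: String
  , featurePaths :: [String]
  } deriving (Show, Eq)

nameSearch :: Feature
nameSearch = Feature "name-search" ["/packages/find"]

-- The path here is an assumption, chosen only for the example.
downloadStats :: Feature
downloadStats = Feature "download-stats" ["/packages/downloads"]

-- The single place where features are switched on.
-- Dropping a feature means deleting its entry from this list.
enabledFeatures :: [Feature]
enabledFeatures = [nameSearch, downloadStats]

-- Find the feature, if any, that serves a given path.
featureFor :: [Feature] -> String -> Maybe Feature
featureFor features path = find (elem path . featurePaths) features

main :: IO ()
main = do
  -- With both features enabled, the search page resolves.
  print (fmap featureName (featureFor enabledFeatures "/packages/find"))
  -- With download statistics gutted, its path is simply not served.
  print (fmap featureName (featureFor [nameSearch] "/packages/downloads"))
```

The point of the sketch is that nothing else in the program mentions downloadStats, so removing it from the list removes it from the server.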

Performance

As far as performance goes: the process of routing a URI, querying data from several sources, and rendering the resultant page takes anywhere from 15ms (for an unadorned package page) to 3 seconds (for long lists of packages with descriptions and tags) on the sparky server. This is the amount of time it takes to fully generate the document as a ByteString, which is then given to the Happstack web framework. Here are some example times. Based on the benchmarks so far, I expect that switching from xhtml to BlazeHTML would reduce the rendering time; I’m looking into other places to save time, though I’m no expert here.
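
For anyone who wants to take similar measurements, here is a rough sketch of timing how long it takes to fully generate a document as a lazy ByteString. It is an assumption about one reasonable way to measure, not the harness used for the numbers above. renderPage is a hypothetical stand-in for a real page renderer, and the timing uses wall-clock time from the time package.

```haskell
module Main where

import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Hypothetical stand-in for a page renderer: a list of n package rows.
renderPage :: Int -> BL.ByteString
renderPage n = BL.pack (concatMap row [1 .. n])
  where
    row i = "<li>package-" ++ show i ++ "</li>\n"

-- Force the whole document and report its size and the elapsed time.
-- BL.length walks every chunk, so the lazy ByteString is fully
-- generated before the second clock reading is taken.
timeRender :: BL.ByteString -> IO (Int, NominalDiffTime)
timeRender doc = do
  start <- getCurrentTime
  len   <- evaluate (BL.length doc)
  end   <- getCurrentTime
  return (fromIntegral len, diffUTCTime end start)

main :: IO ()
main = do
  (size, elapsed) <- timeRender (renderPage 10000)
  putStrLn ("rendered " ++ show size ++ " bytes in " ++ show elapsed)
```

Forcing the value inside the timed region matters: without it, laziness would push the rendering cost to wherever the bytes are first consumed.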

Routing itself takes around 1ms, based on the dynamic approach I described in this post. On my laptop, which has faster cores but far fewer of them, crafting a response takes anywhere from 2ms to half a second, and routing takes around 0.2ms, for the same server configuration and package collection.
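
As a generic illustration of what routing a URI involves, here is a small sketch that matches path segments against patterns with named captures. It is not the dynamic approach from the linked post, and every name in it (Segment, matchRoute, route, the routes table, the /package/:name pattern, and the handler labels) is hypothetical.

```haskell
module Main where

import Data.Maybe (catMaybes, listToMaybe)

-- One segment of a route pattern: a fixed word or a named capture.
data Segment = Static String | Capture String

-- Split "/package/parsec" into ["package", "parsec"].
splitPath :: String -> [String]
splitPath = filter (not . null) . go
  where
    go s = case break (== '/') s of
      (seg, [])       -> [seg]
      (seg, _ : rest) -> seg : go rest

-- Match a pattern against path segments, collecting captured values.
matchRoute :: [Segment] -> [String] -> Maybe [(String, String)]
matchRoute [] [] = Just []
matchRoute (Static w : ps) (s : ss)
  | w == s = matchRoute ps ss
matchRoute (Capture name : ps) (s : ss) =
  fmap ((name, s) :) (matchRoute ps ss)
matchRoute _ _ = Nothing

-- Try each route in order; return the first handler label that matches.
route :: [([Segment], String)] -> String -> Maybe (String, [(String, String)])
route table uri =
  listToMaybe (catMaybes
    [ fmap (\bindings -> (handler, bindings)) (matchRoute pat segments)
    | (pat, handler) <- table ])
  where
    segments = splitPath uri

-- Example table; only /packages/find appears in the post itself.
routes :: [([Segment], String)]
routes =
  [ ([Static "packages", Static "find"], "search")
  , ([Static "package", Capture "name"], "package-page")
  ]

main :: IO ()
main = do
  print (route routes "/package/parsec")
  print (route routes "/packages/find")
  print (route routes "/nope")
```

A linear scan like this is fine for a handful of routes; a server with many features would more likely index routes by their first segment.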

Unfortunately, sparky itself seems a bit laggy: yesterday it took 30 seconds (!) to request and retrieve a 350KB HTML document which is fully cached in memory, even though it took a fraction of a millisecond to get a ByteString for it. I’m looking into this.

Try it out!

So, take a look around and tell me what you think! If you want to try out your own copy, these should work as bash shell scripts, if you have ghc+cabal-install+alex+happy on your system: import current Hackage data or start a completely new server. (These install the server and use its command line interface.) Importing the current Hackage dataset requires somewhere in the neighborhood of 750MB of memory (I’m looking to reduce this), and running the server with that data requires 600MB (sparky has 32GB of memory). A brand new server requires just 2MB of memory.

To do

The primary goal this summer was to create a server architecture that could handle whatever we as a community need, and implement as much of it in Haskell as possible. I’m only one person, so there’s still a lot left to do, short-term and long-term, to get a better Hackage. I’ve outlined some of these tasks below.

What needs to be done before deploying to hackage.haskell.org?

  • Documentation. It’s one of the most important things Hackage provides. hackage-server lets maintainers upload documentation tarballs, but ticket 517 should be resolved so documentation can be more easily generated with Cabal.
  • Importing download statistics from the last few years. Granted, this is a minor one, but it’s a big help to have these without a gap in recording.
  • Stress-testing, in terms of making sure the server performs well and maintains the consistency of internal indices. Make sparky a bit more responsive. Ensure compatibility with cabal-install, including old versions. Double-check security in order to minimize the risk of attacks (replay, DDoS, etc.).
  • Deciding policy for things like account creation and uploading. I’ll put up a blog post soon about the policy that hackage-server currently has for these sorts of things, including an overview of the user group system.
  • Implementing backup for some of the newer features and creating an interface for admins to download backup tarballs.
  • Make sure the URI scheme is convenient for everyone.
  • Make robots.txt and set noindex on pages as appropriate.
  • Arrange for distribution maintainers (for Debian and Arch, presently) to send us updates about which packages they have available. Haskell packages in distribution repositories tend to be simpler to install and more stable, so connecting to them is important.
  • We need site admins and package trustees!

In the short-term future? (these should be implemented sooner rather than later)

  • Build reports: get a system working for cabal-install clients to send build reports, anonymous or non-anonymous, as a replacement/enhancement of the build bot’s functionality. At present Hackage can accept basic build reports, but this should be gotten right before it’s enabled, particularly for anonymous reports.
  • Web interface redesign. Since Hackage has more information to serve, it needs a better way to visually organize it. Anyone with web design chops is welcome. Other things to do here: expose JSON representations for Ajax functionality; rewrite the HTML-generating code to use Blaze.
  • Serve the internals of packages and set up a sitemap.xml so they can go on Google Code Search.
  • Allow modifications to the cabal file without bumping the package version number. Admins can do this, but under some circumstances package maintainers might want to as well.
  • See if user group information can be stored better internally.
  • Get an SMTP client running on the server to send automated email notifications.
  • More server-side logging of actions (with user and timestamp): this makes it easier to find out what’s going on and provide historical data.

In the long-term future? (looking into the crystal ball)

  • Social features. This includes reviews, voting, contributing content: the little things that let you know your fellow Haskellers are humans and not code-generating automatons (besides mailing lists, IRC, reddit, meetups, conferences, blog posts…). The more effectively we can connect maintainers and users, the better. Most of these social features would be simple to implement technically. It’s more difficult to decide which features would actually benefit us as a community and get better-quality packages.
  • Allow the creation of arbitrary groups of packages. Currently, there’s a Haskell Platform feature, which puts a little star next to every package that’s in the platform. Why not lay the groundwork for other package groups?
  • Insert your idea here

There’s a document in progress about the server internals, and how you can extend Hackage with new features. For the next week, I’ll be tidying up the code, bug-hunting, writing documentation, and seeing what I can do with transition preparations. Come join #hackage on freenode, if you like, since we’ll be discussing some of these things in the coming weeks.

August 8, 2010. Uncategorized.

9 Comments

  1. bill replied:

    This is excellent!

  2. dom96 replied:

    I was expecting to see a redesign, but I see you have that planned. Other than that, it looks great.

  3. Joachim Breitner replied:

    The distro data for Debian is provided at

    http://people.debian.org/~nomeata/cabalDebianMap.txt

    and the same format is generated by Arch somewhere. On current hackage, this is downloaded periodically by a cronjob. Is there any reason to change this scheme?

    • Duncan Coutts replied:

      Joachim, the current interim method for providing distro links is backwards. Distros should push this information to the server rather than having the server poll and pull it.

      This was noted at the time the feature was added to the old server implementation. http://hackage.haskell.org/trac/hackage/ticket/570#comment:2

      We didn’t require that the feature be rewritten at that time because it was known that it would only be in use in the old server until the old server is retired.

      So in the new server implementation all you need to do is to PUT the updated map to the server (using appropriate credentials). This should be easy to do using curl/wget.

  4. jaspervdj replied:

    Nice work! If you need any help with blaze, I’d be glad to lend a hand.

  5. Edward Z. Yang replied:

    Exciting stuff. Thanks for your work!

  6. Don Stewart replied:

    I think you can bring in collaborators to help with things like the design of the site, too. As well as the BlazeHTML team and the happstack team, to help with performance issues. This is such a high profile project, they’ll want to help.

    What support is there for storing the system — are all packages kept on disk as with the current Hackage?

  7. 403 replied:

    I’m getting 403 on Hackage’s trac, though other projects are up:

    http://hackage.haskell.org/trac/

  8. Mike replied:

    What/where is the latest info on Hackage 2.0?
