Hackage Proposal

What follows is the proposal as I submitted it for Google Summer of Code 2010. I’m using it as a guideline for the completion of the project.

Abstract: Hackage has grown to have thousands of uploads and millions of downloads. I plan on making hackage-server, the candidate replacement for the current hackage-scripts, deployable so we can crowd-source the process of picking the best packages for a job. I’ll add statistics, reverse dependencies, and build reports, and also develop an infrastructure to support things like voting, tags, package reviews, and other content that can be contributed.

Infrastructure for a more social Hackage 2.0

Goals

The project’s goals come primarily from Hackage 2.0 Web Services and Hackage-related suggestions I’ve mined from places other Haskellers frequent, including a haskell-cafe thread I started.

Cabal started in 2004 as a means of distributing Haskell modules, and shortly thereafter Hackage began as a centralized repository for them, with a useful web interface to boot. Fast-forward to 2010: Hackage has thousands of uploads and millions of downloads, with just a few endearing bugs. The problem I seek to address is giving Hackage more leg room to grow. Given the volume of coverage in some areas (binary serialization, parsers, regex, graphics), the original purpose of finding and installing libraries fast can be better fulfilled by giving people more of the right kind of information, and less of the overwhelming kind. However, these efforts shouldn’t discourage content writers from sharing perfectly good code because of the extra work. This is where users come in, so we can crowd-source the process of picking the best packages for a job.

These changes would be made in the new hackage-server, the candidate replacement for hackage-scripts, with the goal of making it the actual replacement. The final source should be deployable at hackage.haskell.org, and also simple enough to run a Hackage package server anywhere with minimal configuration.

These are the kind of features I’d like to implement:
1. Collecting and serving more information about packages.
2. Expanding the Hackage interface, with corresponding backend functionality, for users and maintainers to update and contribute in real time.
3. Support for better interfacing with cabal and other clients: acting as a web service.

This document also focuses a bit more on social functionality than what might actually be implemented during the summer.

Specifics of design

Bringing hackage-server up to feature parity comes first. As of April 2010, an instance of the new Hackage is running at http://sparky.haskell.org:8080/, and it is already very similar to hackage.haskell.org.

1. Collecting and serving more information about packages.
Hackage should be able to gather popularity statistics about packages (essentially per-tarball downloads) and other details users might want to know, bringing it up to par with most other package servers. The amount of activity and how well-documented the package is (if this information can be retrieved quickly, perhaps using comment-counting) are additional useful numbers.

These numbers should be collected, crunched, and output as metrics that indicate how well suited an individual package is and that can be used to rank packages against each other. The statistics code should, if possible, fork or reuse Hackage-analyzing code that’s already been written, even Hackage-analyzing code that’s been uploaded to Hackage!

Another sort of useful information to have at a glance is reverse dependencies. Using canvas or simple HTML positioning (generated from graphviz algorithms, maybe), we can also show dependency graphs extending a few levels in either direction. This can potentially work for arbitrary versions of each package, although schedule-wise this might be a stretch. Dependency information will need a one-time analysis of Hackage-wide dependency information, and then it can be updated incrementally. I expect Roel van Dijk’s work on it to be a great help.
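As a rough sketch of the "one-time analysis, then incremental updates" idea (this is not hackage-server code; the function names and the sample dependency data are hypothetical), a reverse-dependency index can be built by inverting the forward map once and then patched whenever a package is uploaded:

```python
from collections import defaultdict


def build_reverse_index(deps):
    """One-time pass: invert a {package: set of dependencies} map."""
    reverse = defaultdict(set)
    for package, requires in deps.items():
        for dep in requires:
            reverse[dep].add(package)
    return reverse


def add_package(deps, reverse, package, requires):
    """Incremental update when a package is uploaded or re-uploaded."""
    # Drop edges from the previous upload, if there was one.
    for old_dep in deps.get(package, set()):
        reverse[old_dep].discard(package)
    deps[package] = set(requires)
    for dep in deps[package]:
        reverse[dep].add(package)


# Illustrative data only; not real Hackage metadata.
deps = {
    "parsec": {"base"},
    "binary": {"base", "bytestring"},
    "bytestring": {"base"},
}
reverse = build_reverse_index(deps)
add_package(deps, reverse, "cereal", {"base", "bytestring"})
print(sorted(reverse["bytestring"]))
```

This ignores versions entirely; tracking arbitrary versions of each package would need the keys to be (package, version) pairs and the edges to carry version ranges.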

2. Expanding the Hackage interface, with corresponding backend functionality, for users and maintainers to update and contribute in real time.
This way, users can help other users, and users can help maintainers. There are several kinds of information that users can provide to indicate smooth sailing or troubled waters. Not everyone will write a review for every single package they install successfully or try to throw away. One-click upvote/downvote surveying, therefore, will be useful: “This package was great” or “This package did not meet my needs”. Any feedback in more detail, such as reviews or notes about bugs, should be for package uploaders to enable. Voting can also happen along more than one axis for different aspects of package usefulness. Indices will be generated as more votes and reviews come in. A problem that still needs solving, however, is versioning of reviews and nullifying information that’s no longer relevant. A review system probably won’t be deployed this summer due to these social problems and a need for long-term fine-tuning and integration with other bug-tracking systems.
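To make the multi-axis voting idea concrete, here is a minimal sketch. The proposal does not prescribe a scoring formula; the Wilson lower bound used below is my own stand-in, chosen because it avoids ranking a package with one upvote above a package with ninety upvotes and ten downvotes. The class name, axis names, and package name are hypothetical.

```python
import math
from collections import defaultdict


def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


class VoteTally:
    """Counts one-click votes per (package, axis) pair."""

    def __init__(self):
        # (package, axis) -> [upvotes, downvotes]
        self.counts = defaultdict(lambda: [0, 0])

    def vote(self, package, axis, up):
        self.counts[(package, axis)][0 if up else 1] += 1

    def score(self, package, axis):
        up, down = self.counts[(package, axis)]
        return wilson_lower_bound(up, down)


tally = VoteTally()
tally.vote("some-package", "documentation", up=True)
tally.vote("some-package", "documentation", up=True)
tally.vote("some-package", "stability", up=False)
print(tally.score("some-package", "documentation"))
```

Nothing here addresses the versioning problem mentioned above; one option would be to key the counts by package version as well, but that is a design question rather than something this sketch settles.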

Users should also be able to add tidbits to a package page, such as extra links to documentation and simple examples for getting started. Other useful information would include marking relationships with other packages, such as comparing functionality, or allowing administrators to mark a package for deprecation.

This item also includes writing intuitive web interfaces for features that are already implemented in the hackage-server backend, such as uploading package documentation instead of generating it server-side, but that need to be better exposed. Maintainers should have some choices about their package page. For example, if they already have a bug system or prefer email, they just won’t check the box that enables reviews.

3. Support for better interfacing with cabal and other clients: acting as a web service.
One of the issues with Linux distribution coordination is making sure Hackage and the other repository can communicate. Implementing a framework for services with a RESTful approach (including nested URLs as services, using resource-modifying HTTP methods) should make this easier. I’m not certain about the scope yet, but this framework may expand into a total restructuring of URLs for Hackage.
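The following is only a sketch of what "nested URLs as services" could mean in practice. The URL layout, handler names, and response bodies are all hypothetical, and the real hackage-server is built on Happstack rather than anything shown here. Each resource gets a URL pattern, and the HTTP method selects the operation on it:

```python
import re

ROUTES = []  # list of (method, compiled pattern, handler)


def route(method, pattern):
    """Register a handler for an HTTP method and a URL pattern like /package/<name>."""
    regex = re.compile("^" + re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", pattern) + "$")

    def register(handler):
        ROUTES.append((method, regex, handler))
        return handler

    return register


@route("GET", "/package/<name>")
def show_package(name):
    return (200, f"package {name}")


@route("GET", "/package/<name>/reports")
def list_reports(name):
    return (200, f"build reports for {name}")


@route("PUT", "/package/<name>/documentation")
def upload_documentation(name):
    return (204, f"documentation stored for {name}")


def dispatch(method, path):
    """Find the handler for a request; 405 if the URL exists but not the method."""
    path_known = False
    for registered_method, regex, handler in ROUTES:
        match = regex.match(path)
        if match:
            path_known = True
            if registered_method == method:
                return handler(**match.groupdict())
    return (405, "method not allowed") if path_known else (404, "not found")


print(dispatch("GET", "/package/parsec/reports"))
```

A client such as cabal or a distribution's packaging tool would then only need to know the URL scheme and the methods each resource accepts.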

One of the holy grails of a packaging system is making sure the version brackets are correct, and knowing which chains of versioned dependencies compile and link and which don’t, a problem that’s NP-complete in its full form, not to mention CPU-intensive. This is ongoing and exciting (and frustrating) work in Cabal, which Hackage can gladly serve if an infrastructure for it is put in place, perhaps complete with compile farms. Meanwhile, we can detect some basic but important issues, as hackage-scripts does: cyclic dependencies, impossible version brackets (as far as Hackage is concerned), and perhaps even the diamond dependency problem. Mass build reports can go further by providing heuristics for resolution, and can even act as a substitute for server-side compilation if user-uploadable docs are in place. I’ll be concentrating on those.
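Two of the basic checks named above are cheap to express. The sketch below is an illustration under simplifying assumptions, not how hackage-scripts or Cabal implement them: dependencies are unversioned for the cycle check, versions are tuples of integers, and a bracket is a single half-open range. All function names are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of package names, or None.

    deps maps each package to a list of the packages it depends on.
    Uses a depth-first search that marks packages still on the current path.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in deps.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in deps:
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def bracket_is_satisfiable(lower, upper, available):
    """True if some available version v satisfies lower <= v < upper."""
    return any(lower <= v < upper for v in available)


print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
print(bracket_is_satisfiable((3, 0), (4, 0), [(0, 9), (1, 4)]))
```

"Impossible as far as Hackage is concerned" is modelled here as "no uploaded version falls inside the bracket". The full problem of choosing mutually consistent versions across a whole dependency chain is the NP-complete part, and this sketch does not attempt it.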

The build report submission service in cabal is functional, complete with anonymizing functionality, but hackage-server is mostly unfamiliar with it. Analyzing the build results and working out trends belongs both to the first goal and the third. Build reports can reveal useful extra information, such as the extent to which the package is maintained.
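As a sketch of the simplest kind of trend analysis, the snippet below computes a build success rate per package version. The report fields used here (`package`, `version`, `succeeded`) are assumptions for illustration and are not the actual format cabal submits.

```python
from collections import defaultdict


def success_rates(reports):
    """Map (package, version) to the fraction of reports that built successfully."""
    totals = defaultdict(lambda: {"ok": 0, "failed": 0})
    for report in reports:
        key = (report["package"], report["version"])
        totals[key]["ok" if report["succeeded"] else "failed"] += 1
    return {
        key: counts["ok"] / (counts["ok"] + counts["failed"])
        for key, counts in totals.items()
    }


# Hypothetical reports, not real build data.
reports = [
    {"package": "foo", "version": "1.0", "succeeded": True},
    {"package": "foo", "version": "1.0", "succeeded": False},
    {"package": "foo", "version": "1.1", "succeeded": True},
    {"package": "foo", "version": "1.1", "succeeded": True},
]
rates = success_rates(reports)
print(rates[("foo", "1.0")], rates[("foo", "1.1")])
```

A rate that improves from one version to the next, as in the sample data, is the kind of signal that could hint at how actively a package is maintained.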

Other goals
The result should also be portable: setting up a Hackage instance given a collection of Cabal packages should be easy. With client-side Cabal support for multiple repositories, there can be alternative or experimental repositories, or simple replications if the main Hackage is feeling under the weather.

Benefit to the community

The deployment of this project will provide immediate benefits to the community. Why? The features that I’d like to implement are ones which Haskellers, from academia to the industry to the basement, can use to find packages that meet their needs and to submit their code to be discovered by more people. There’s no pressure to make a package that’s widely used or particularly successful, but tools should be there to facilitate that.

Also, I think there are some great opportunities for confluence with other GSOC projects, particularly the Haddock- and Cabal-related ones, to both improve Hackage and give wider exposure to other projects. For example, Cabal test results could be uploaded in addition to build reports.

Another project this proposal will be interacting with quite a bit is Happstack. Having a full-featured Happstack server will help test its mettle as a scalable dynamic web server, particularly with the 2.0 functionality.

Schedule

Hackage 2.0 has been in progress for a few years, and will continue long after the summer. With this in mind, the work for the summer is mostly the addition of important features. The order of implementation is flexible for some of these features. Given this, I’ve attempted to order objectives according to which are the most essential, which would help me gain familiarity with the codebase earlier on, and which would require familiarity with the codebase later on.

During the community bonding period: Become more familiar with Happstack, hackage-scripts, and hackage-server. Contribute more to Hackage myself, and do research about done-right modular design for servers.

1. 2 weeks. Become familiar with codebase and add documentation to declarations as I understand them. Find functionality not in the old server and not covered by the coming weeks and fully port it. Do the same for items in the hackage-server TODO list.
2. 1.5 weeks. Get build reports to display and gather useful information: already partially implemented. Use this feature as an opportunity to become even more comfortable refactoring and enhancing the hackage-server source.
3. 1.5 weeks. Get user accounts and settings working, writing a system for web forms, both the dynamic JavaScript kind and static kind. Use this system to make package configuration settings editable by both package maintainers and Hackage administrators.
4. 1 week. If a viable solution for changelogs comes up by this point, I’ll implement it here. This might be as simple as a ./changelog file with a simple prescribed format.
5. 0.5 weeks. Implement gathering of important package statistics. These include items like page views.

At this point, halfway through, hackage-server should be deployable (a deliverable), though a bit lacking in bells and whistles.

6. 0.5 weeks. Plan for further integrating a web service, implemented over the next few steps.
7. 1.5 weeks. Start working on voting and review system. This includes other user-contributed content, like links and perhaps code snippets. Get a basic framework set up, improving the general server design.
8. 1.5 weeks. Draft algorithms to sort packages based on reviews and statistics. Things like PageRank and reverse dependencies will be completed here.
9. 1 week. Generate more useful indices of packages based on everything that’s been implemented, adding cosmetic touches to the interface and more useful information where appropriate.
10. 1 week. Lots and lots of testing and code polishing. Fill in any missing documentation.

In sum, 12 weeks of coding. The time marks may shift around slightly as I learn the hackage-server ropes. I also plan to take total advantage of any extra time I can come by. If there are any other requirements for the project, the schedule will bend (but not break) in response. For example, I’ll probably end up replacing some of 7, 8, and 9 with making a much larger dent in the RESTful API. By that time, the components will be modularized, and I’ll discover common API functionality between them to abstract.

Biography and programming experience

I’m a student at Stony Brook University on Long Island, New York, out to complete my undergraduate Computer Science degree. Most of my productive waking hours, I code, read, and learn. In case I get too bogged down, there’s nothing like a session of piano improvisation or an hour of cycling to lighten the figurative weight on my shoulders. Using Haskell to get things done briskly has given me lots of free time, but in the same fell swoop consumed even more time as I study its depth and breadth. [this copy of the application has more details]

Conclusion

I think this is an important project to do now, as Hackage continues to see tremendous growth. It’s a good time to make a concentrated burst of creative and motivated effort, approximately 1-person-summer’s worth, to give it 2.0 in the title. After examining the project in some depth, I feel I can realistically make this happen. The mentor-student model, in addition to the community interactions, is a good fit for me. I like coding by myself, and I like communicating with others about code.

My previous experience with programming and algorithms overlaps with the project’s scope. It’s my hope that hackage-server will be more comprehensive as a result of my hacking, and that my experience and knowledge will expand as a result of working on this project. Plus, I’m not quitting Haskell any time soon, so I too can snack on the fruits of my labor.

—Matthew Gruen

2 Comments

  1. jberryman replied:

    I’m excited for the possibility of Hackage 2.0! If you are taking suggestions, here’s a biggie for me: I want to be able to search hackage the way I can search the base packages with hoogle, i.e. find a package containing a function with a type signature similar to what I’m looking for.

    Good luck!

  2. Robert Massaioli replied:

    I agree with the hoogle+hackage idea; ‘search’ is what a package database is all about. I also think that this is a great plan and it is much needed. Having a new hackage which makes selecting the right package easier is a really good idea. I’ll be very interested in seeing the end result.
