Tuesday, January 17, 2012

Porting effort for Clojure contrib libs

Looking for Clojure contrib lib projects to port to ClojureCLR?

I looked at the most popular libs on https://github.com/clojure, the official libs of the Clojure project.  Lacking a better criterion, I defined popularity by the number of watchers.  Here are the top projects, sorted by the number of watchers when I looked recently, ignoring those in single digits and all java.* projects:

Watchers  Project             Watchers  Project
129       core.logic          23        test.generative
69        core.match          20        core.cache
60        tools.nrepl         19        core.memoize
37        tools.cli           18        algo.monads
36        data.finger-tree    15        data.xml
35        tools.logging       11        test.benchmark
32        core.unify          10        core.incubator
28        data.json           10        data.csv
                              10        tools.macro

There are some fairly trivial edits that are required in porting most libs.  These include:

  1. Substituting an appropriate CLR exception class.  For example, IllegalArgumentException becomes ArgumentException.  If a throw uses Exception, that will work as is.
  2. Substituting interop method names.  For example, toString becomes ToString, hashCode becomes GetHashCode, etc.  Most String methods and some I/O methods just need capitalization.  BTW, ClojureCLR preserves case on most clojure.lang class method names so they don't need to be changed.  (You're welcome.)  Also, method names on protocols won't need to be changed.

I'll refer to these kinds of changes below as the usual.
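
To make the usual concrete, here is a minimal sketch of one small function before and after those edits.  The function check-positive is hypothetical, made up for illustration, and not taken from any of the contrib libs.  The JVM version runs as written; the ClojureCLR version is shown in comments and assumes ArgumentException resolves without a namespace qualifier (if it does not, spell it System.ArgumentException).

```clojure
;; JVM Clojure version (hypothetical example, runs on the JVM as written).
(defn check-positive
  "Returns the string form of n, or throws if n is not positive."
  [n]
  (when-not (pos? n)
    ;; JVM exception class
    (throw (IllegalArgumentException. (str "Expected a positive number, got " n))))
  ;; JVM interop method name
  (.toString n))

;; The same function after the usual edits for ClojureCLR, kept in comments
;; so this block still loads on the JVM.  Only the exception class and the
;; interop method name change:
;;
;; (defn check-positive
;;   [n]
;;   (when-not (pos? n)
;;     (throw (ArgumentException. (str "Expected a positive number, got " n))))
;;   (.ToString n))
```

On the JVM, (check-positive 42) returns the string "42" and (check-positive -1) throws.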

I did a quick scan of the source of each project to estimate the effort required to port the project to ClojureCLR. In the order given above, here are some comments on each.

core.logic: This is one of the larger projects.  The usual, and not that much of it.  The only thing I saw that might take a little more investigation is that the deftype Pair implements java.util.Map$Entry.  (See below for more.)  Easy. (Unless it requires actual thought, in which case you'd have to understand the code, and that would make it a Challenge.)

core.match:  Another large project.  The usual, and not much of it.  The bean-match function will require adaptation to CLR classes and the regular expression matcher will need to be examined -- JVM vs CLR regexes always require a look.  Of most concern is the deftype MapPattern that mentions java.util.Map.  The question is always dealing with IDictionary and IDictionary<K,V> -- support for arbitrary generics is always tricky.  Probably Easy, with the same caveat as core.logic.

tools.nrepl: This is likely to be tricky.  There are some Java classes that will have to be ported.  Of greater concern is the amount of low-level I/O on sockets.  At best, a Medium project, likely a Challenge.  Given that this project is being redesigned, it might be wise to wait for 2.0 and then put in the effort.

tools.cli: The usual, and not much of it.  There is a test that uses an Integer method.  Trivial.

data.finger-tree:  The usual.  The only concern is the mention of java.util.Set.  There is no System.Collections.ISet, only System.Collections.Generic.ISet<T>, so some thought will be required. At worst, Medium; more likely Easy.

tools.logging: This will take some work because adapters for .Net logging tools will have to be developed.  One might consider log4net, ELMAH, NLog.  The good news is that the code is designed to plug different adapters into its framework, so developing new adapters should be easy, requiring mostly a decent knowledge of the target logging framework.  Most of the tests will have to be rewritten.  Medium, probably fun.

core.unify: The usual.  The same concern about java.util.Map mentioned for core.match.  I'm guessing this is trivial here.  Easy.

data.json: We know exactly how much work this will take.  See Porting libs to ClojureCLR: an example.

test.generative: Needs tools.namespace.  That didn't make the popularity cut, but it should be barely Medium to port, mostly due to the need to think a little about the I/O interop.  In test.generative, there are some library calls, to Random, Math.* methods, system time, etc., that will take a little more work than just the usual.  Barely Medium.
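
As a rough illustration of library calls that go beyond the usual, here is a sketch of the kind of JVM code involved.  The helpers sample-ints and timed are hypothetical and not taken from test.generative.  The CLR calls in the comments are standard .NET APIs, but treat the mapping as an assumption to check rather than a recipe: the names do not line up one-to-one, so each call needs a look.

```clojure
;; JVM Clojure (hypothetical helpers, run on the JVM as written).
(defn sample-ints
  "Returns a vector of n pseudo-random ints in [0, bound) from a seeded generator."
  [seed n bound]
  (let [^java.util.Random rnd (java.util.Random. (long seed))]
    (vec (repeatedly n #(.nextInt rnd (int bound))))))

(defn timed
  "Calls f and returns [result elapsed-milliseconds]."
  [f]
  (let [start  (System/currentTimeMillis)
        result (f)]
    [result (- (System/currentTimeMillis) start)]))

;; Rough CLR counterparts (assumptions to verify while porting):
;;   (System.Random. (int seed))   ; the seed constructor takes an Int32, not a long
;;   (.Next rnd (int bound))       ; Next(maxValue) in place of nextInt(bound)
;;   (Math/Sqrt x)                 ; in place of (Math/sqrt x)
;;   (System.Diagnostics.Stopwatch/StartNew), then (.ElapsedMilliseconds sw),
;;                                 ; in place of System/currentTimeMillis arithmetic
```

The seed is the detail most likely to bite: a JVM seed is a long, while System.Random takes a 32-bit int, so seeds recorded on one platform may not carry over to the other.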

core.cache: A moment's thought about replacing java.lang.Iterable in the definition of defcache.  Otherwise, just the usual.  Easy.

core.memoize: Needs core.cache.  Might work as-is!  Trivial.

algo.monads: Might work as-is!  Trivial.  Hey, when was the last time you saw 'trivial' and 'monads' in such proximity?

data.xml:  The README notes that it is not yet ready for use.  Really, this should be called java.xml because of its dependence on org.xml.sax, javax.xml.parsers, etc.  This will require a major rewrite.  Until this is complete, I can't say how hard it will be.

test.benchmark: Looks straightforward.  Easy.

core.incubator: The toughest thing is the reference to java.util.Map (see above).  Trivial.

data.csv: The I/O will take some time, but at worst a Medium.  A very Easy Medium at that.

tools.macro: Appears to be Trivial.

So, what are you waiting for?  Plenty of easy ones to get started with and a few more challenging ones.  Whatever you pick, you'll have a chance to read some good Clojure code, always a worthwhile exercise.

Where are the hard ones, you ask?  They certainly exist, just not among the official contrib libs.  There are plenty of other Clojure projects floating around that will require significant effort.

Port a lib today!

A note on java.util.Map$Entry:  clojure.lang.IMapEntry extends java.util.Map$Entry on the JVM. ClojureCLR could not do that because the equivalent to Map$Entry, System.Collections.DictionaryEntry, is a struct and can't be subclassed. Also, we have the problem with the generic System.Collections.Generic.KeyValuePair<TKey,TValue>. I shudder when I see Map$Entry; this is a sign that real thinking will be required.
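
For readers who have not run into this, here is a minimal sketch of the JVM pattern in question.  This Pair is a hypothetical stand-in written for illustration, not the actual deftype from core.logic.  Because it implements java.util.Map$Entry, the core functions key and val work on it directly, and that is the behavior a port has to reproduce some other way.

```clojure
;; JVM Clojure (hypothetical stand-in, not the core.logic source).
(deftype Pair [lhs rhs]
  java.util.Map$Entry
  (getKey [_] lhs)
  (getValue [_] rhs)
  ;; Map$Entry also declares setValue; an immutable pair just refuses it.
  (setValue [_ v]
    (throw (UnsupportedOperationException. "Pair is immutable"))))

;; key and val accept any java.util.Map$Entry, so they work on Pair:
;;   (key (Pair. :a 1))
;;   (val (Pair. :a 1))
```

On the CLR there is no interface to drop in where java.util.Map$Entry appears, for the reasons given above, so the deftype itself has to be rethought rather than renamed.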

10 comments:

  1. It seems like many of 'the usual', and some of the others, can easily be avoided at the source by using something more portable in the first place (e.g. str vs .toString). Perhaps we need a) some more of these (what's missing?), and b) some lint-like tool to nag people (myself included :) to use them.

    Replies
    1. There are two common situations involving method name renaming. One situation is using (calling) the method, such as in (.write w) for I/O. The other is mention instead of use, typically in a deftype or an extend of a protocol. I don't believe I saw an undisciplined use of .toString in the code I examined.

      I'll do another post soon with a more comprehensive categorization of porting challenges.

    2. Rich's suggested lint can probably be implemented with `analyze`. https://github.com/frenchy64/analyze/issues/12

      I'll make a prototype, could be useful. I'm interested in your other challenges and attempting to make a source analyzer to check for portability.

  2. It's interesting to see such a big gap between the rankings by Github watcher count and the rankings by which projects are more widely used. I wonder if ClojureSphere might be a better source for determining priority of ports: http://clojuresphere.herokuapp.com/

    Replies
    1. I picked the current contrib libs because they are active, small in number, mostly small in size, and not very interlinked. All these made the analysis easier.

      Looking at clojuresphere is informative. Sorted by dependents, the first thing you run into is clojure-contrib, which is 1.2 and sort of deprecated. It's hard to tell what's active/current, and it has a daunting network of dependencies.

      Looking at the top items after that, a first breakdown gives leiningen-related, web-dev-related and emacs-related. Porting any of this stuff is going to be a major effort. Leiningen relies on Maven -- we'll need a different repo system hook for MS/.Net land. Porting swank will require porting cdt, debug-repl and others -- and that's just the beginning. Most of the web stuff has ring dependencies and that sits on jetty and servlets. Compojure lists 26 dependencies (past or current), including ring libs, clout, jetty stuff, swank. More than I cared to contemplate in an evening.

      I was definitely picking the low-hanging fruit in this post.

  3. (Off topic) Everyone in the community can feel how hard new Clojure users find it to get started. Granted, a lot has been done to mitigate the issue, like the new Getting Started page. However, the many editor and IDE options available for ClojureJVM are themselves a problem. Even if a new user chooses one editor and successfully starts, he will run into trouble when he reads a tutorial or blog post where the author uses a different tool, for instance Maven instead of Leiningen or the reverse, or Emacs instead of Eclipse+CCW.

    The .NET community has always had one tool for development, VS; usage of other tools pales in comparison. ClojureCLR has the same problem, and the top priority for the ClojureCLR project, in my humble opinion, is to make sure that the abandoned vsClojure plugin works (last I tried it didn't) or to build a new VS plugin. With working interoperability with the host (CLR), a developer can make do for a while without ports of the contribs. But without a plugin for their only tool, .NET developers will be discouraged from trying Clojure. They wouldn't use an editor and a command line repl or even try Emacs.

    With a fully-featured plugin for VS (it could be offered as a ClojureCLR box, which is a VS Shell + the plugin in a single separate download), ClojureCLR would have better chances of success even than ClojureJVM.

    Replies
    1. The vsClojure project soon will be active again. I've been talking with jmis, the original developer, about restarting the project, getting it running with the latest ClojureCLR bits, and continuing development. I don't have a plan or timetable yet, but stay tuned. This is a priority, but not my area of expertise, so suggestions, a helping hand, testing will all be appreciated.

    2. You sir, and Rich and others in the best programming community, will be heroes one day. You guys just need to be patient till Clojure takes over the world. Even if it doesn't, it'll be the secret weapon that gives whoever uses it smashingly successful products. Big thanks to you for your efforts.

    3. BTW, I'm happy to report that jmis, who has put so much excellent work in vsClojure so far, plans to continue development on the project.

  4. I found the code that implements parse and parse-str of data.xml
    Please check it.
    https://gist.github.com/mnzk/194e226c262513d9ca1c
