JavaDocs are not documentation.

Of course, my heading is sensationalist and strictly untrue - but it is true that the standard JavaDoc you see is barely worthy of the name ‘documentation’.

Documentation is hard and expensive to do right. Writing good documentation can often be harder than writing code, and it’s not something most programmers relish. However, we know it’s worthwhile. Good documentation makes code usable (for APIs), penetrable (for applications) and better (for all) - documenting your code forces you to think over its design, just as explaining it to someone in real life does.

I don’t think I need to say more to convince anyone that good documentation is a good idea, so the question becomes: what is good documentation?

The point I want to make with this post is that good documentation is not what JavaDocs supply. Let’s take a look at a JavaDoc:

[Image: the JavaDoc page for SAXParser]

This is for SAXParser - one of the most popular ways of parsing XML in the Java standard library. Can you, from that page, parse some XML? I definitely couldn’t - it’s straight off to find a tutorial or more documentation. In this case, there is some more official documentation from Oracle.

This is terrible. Having documentation split like this is a sin that makes JavaDocs worse than no documentation so much of the time. To make good documentation this way, you need to:

  • Ensure your broad documentation is versioned the same way as your JavaDocs, and make it easy to swap between versions.
  • Ensure your broad documentation links to your API docs, and vice versa.
  • Encourage people to actually read both documents.
  • Actually write both documents.

Oracle manages to produce both documents, but that’s it. Most third-party code I’ve seen does none of this. They publish their JavaDocs and call it documentation. It’s not. Alone, JavaDocs are just a convenient way to read through the definitions of the source code.

Some people try and shove all the detail of the broad documentation into their JavaDocs - this is almost as bad - it results in an unreadable heap, and bloats your source code - no doubt causing developers to fold comments away and never read them.

Good documentation needs to be curated, and needs to have two components - documentation from code structure and comments, and the broad documentation.

The idea of only producing reference documentation, and not broader documentation is like testing only with unit tests - it completely misses an entire dimension. For anything non-trivial (and hence, anything worth documenting) 'x takes y and gives z' is useless.

So what is the answer? Personally, I have found the tool that does the job right is Sphinx. Sphinx allows exactly what I’ve mentioned above - curated and automatically generated documentation existing side-by-side.

image

(Surprise, surprise, I think Python did it right)

Sphinx is used for Python’s documentation, and it shows why it’s infinitely better. The documentation explains how to use the code first, then goes into the fine detail of reference (with more curated content mixed in where needed). This produces what I have found to be the best standard library documentation of any language.

And, if you look at Python projects, they tend to follow suit. The ecosystem is full of really well documented libraries and applications - while the Java standard is a JavaDoc with trivial comments at best. The damage JavaDoc has done to the Java community is huge in the way it has encouraged developers to let documentation fester.

Look closely and notice the stuff that makes your life easier - you are inherently encouraged to link through your documentation, the whole thing is versioned together, and it’s virtually impossible to miss the broad documentation or the reference documentation - when you go in looking at one, you inherently see the other.

Do documentation right. I’m not saying you have to use Sphinx (although it does support other languages, and will make your life easier), but just make sure to cover the bases. Ensure your broad documentation has the same status as your reference documentation. One without the other will hurt you more than you realise.

I also highly recommend checking out Read the Docs, an awesome site that hosts documentation for projects, all built around Sphinx - it’s a great way to get your documentation somewhere easy to find and use, and it’s a great example - click through a few projects on there and see how much better, on average, curated documentation is.

Mark and Recapture decorators in Python.

Decorators are a nice little feature in Python, allowing you to transform functions on the fly in a way that looks nice in code. One use I find myself gravitating towards is as a callback marker - avoiding ugly ‘connect’ function calls. This is a great use that is easy to do, however it does have a downside - decorators run when the class is defined, before any instance exists, meaning that the callbacks don’t have `self` filled in.

My solution to this problem is the idea of a mark and recapture decorator. We create a decorator that marks a function, then, once the class is constructed, we recapture it.

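def callback(*args):
    def decorate(f):
        try:
            # Already marked: record another set of arguments.
            f._marks.add(args)
        except AttributeError:
            # First mark on this function: create the set.
            f._marks = {args}
        # Return the function itself, unchanged apart from the attribute.
        return f
    return decorate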

Here is our decorator. We take advantage of the fact that Python functions are first class objects, and simply mark them by adding an attribute to them. We use a set so that we can run the decorator multiple times to use the same function for multiple callbacks.
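To see what the marking actually does, here is a small sketch. The `Player` class and the event names are hypothetical, made up purely for illustration; the decorator is repeated so the snippet runs on its own.

```python
def callback(*args):
    """Mark a function with the given arguments (same decorator as above)."""
    def decorate(f):
        try:
            f._marks.add(args)
        except AttributeError:
            f._marks = {args}
        return f
    return decorate


class Player:
    # Hypothetical event names, used only to show stacking.
    @callback("key_press", "space")
    @callback("mouse_click")
    def jump(self):
        return "jumped"


# Each application of the decorator added one tuple of arguments to the set.
marks = Player.jump._marks
print(marks)
```

The function still behaves exactly as before; the only change is the extra `_marks` attribute, which holds one tuple per decorator applied.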

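import inspect

def connect_callbacks(obj):
    # Walk every bound method on the instance.
    for _, f in inspect.getmembers(obj, inspect.ismethod):
        try:
            # The marks live on the underlying function, not the bound method.
            for mark in f.__func__._marks:
                connect(mark, f)
        except AttributeError:
            # Unmarked method: nothing to connect.
            pass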

This is our connection function. It searches through the given object and finds all the marked functions. We do this by trying to access the `_marks` attribute on the `__func__` attribute. The `__func__` attribute is the original, unbound function the method runs, which is the one we marked earlier. We then run connect (which would be your callback-creating function) with the mark (which can be any data you want) and the bound method, or pass if it doesn’t have a mark. We can easily run this connection function in the constructor for our object, and the mark and recapture pattern gives us what we want.
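Putting the two halves together, a minimal end-to-end sketch might look like the following. The `registry` dictionary, this particular `connect`, and the `Door` class are all assumptions standing in for whatever event system you actually use.

```python
import inspect

# Hypothetical stand-in for a real event system: maps a mark to bound methods.
registry = {}

def connect(mark, f):
    registry.setdefault(mark, []).append(f)

def callback(*args):
    def decorate(f):
        try:
            f._marks.add(args)
        except AttributeError:
            f._marks = {args}
        return f
    return decorate

def connect_callbacks(obj):
    for _, f in inspect.getmembers(obj, inspect.ismethod):
        try:
            for mark in f.__func__._marks:
                connect(mark, f)
        except AttributeError:
            pass

class Door:
    def __init__(self, name):
        self.name = name
        # Recapture: by now the methods are bound, so `self` is filled in.
        connect_callbacks(self)

    @callback("open")
    def on_open(self):
        return self.name + " opened"

door = Door("front door")
# The mark is the tuple of decorator arguments, so the key is ("open",).
results = [f() for f in registry[("open",)]]
print(results)
```

Because the registry stores bound methods, each callback can be called with no arguments and still sees the right instance.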

Follow this link for an example of the naive version (which gives us the unbound method) vs this method on ideone (which gives you the output too!), along with code highlighting.

Generating words at random - what I learnt from Ludum Dare 22.

I posted this up on the Ludum Dare blog as well, but thought it’d go well here too.

So, I didn’t manage to finish Ludum Dare 22 as I had to travel home from Uni halfway through and ran out of time.

My aim was to create a procedurally generated universe and allow the player to travel around finding out if they are alone as sentient life in the given universe. Given the time issues I really didn’t get much done, but I did focus on a particular problem: I wanted to name planets so players could remember where they had been. How do you create words that are pronounceable without just having planets called ‘Fork’ and ‘Television’? Words like these:

  • fanglas
  • jubbensetrier
  • amenet
  • moquiets
  • mystilaxation
  • consutey
  • untive
  • curchers
  • anchottollon
  • symborse
  • prasting
  • weeloats
  • dupliquding
  • autobency
  • proscolicends

Well, the answer came in the form of Markov chains, a cool little trick that allows you to do this quite simply. Afterwards this still intrigued me, and I finally had some time to finish up my script, wordgenerator.

It’s a Python library and command line application, so it’s usable by pretty much anyone. If you have trouble thinking up names for things in general, it can be a great help, and as a library it goes hand-in-hand with any procedurally generated content. It’s GPLv3ed, so feel free to use it in any way that fits the license. The above is actual output from my script. You can change the output via a variety of options (explained in the above link) and by changing the input dictionary of words to generate from, for example, using an Italian one:

  • impiate
  • aliersi
  • inaudartererai
  • ottardiscrerei
  • addoluccio
  • deredicassella
  • coibinarei
  • impresto
  • accreste
  • storano

While nothing revolutionary (Markov chains are pretty well known), the script performs pretty well and saves a bit of work. I think it’s pretty cool, and surprisingly funny to see the output you get, so if you find yourself needing names in your next Ludum Dare game, feel free to use it.
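If you haven’t met Markov chains before, the general idea can be sketched in a few lines. This is a minimal illustration of the technique, not the code from wordgenerator; the function names, the `order` parameter and the tiny word list are all assumptions for the example.

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each `order`-letter prefix to the letters seen following it."""
    chain = defaultdict(list)
    for word in words:
        # "^" pads the start of the word, "$" marks its end.
        padded = "^" * order + word.lower() + "$"
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain

def generate_word(chain, order=2, max_length=20, rng=random):
    """Walk the chain from the start state until the end marker appears."""
    state = "^" * order
    letters = []
    while len(letters) < max_length:
        nxt = rng.choice(chain[state])
        if nxt == "$":
            break
        letters.append(nxt)
        # Slide the window along by one letter.
        state = state[1:] + nxt
    return "".join(letters)

# A made-up input list; a real dictionary file gives far better results.
chain = build_chain(["planet", "plasma", "platter", "lantern", "master"])
print(generate_word(chain))
```

A higher `order` gives words that look more like the input; a lower one gives stranger, less pronounceable output.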

Teaching CompSci - Code Reuse

Something I have noticed as a big trend in Computer Science is the act of telling students to forgo the standard library when trying to solve tasks. I get this - when teaching someone the basics of CompSci, you have to start with basic problems, and, for obvious reasons, most of those have already been solved by the standard library.

This has led to questions asking students to do things without using the standard library, and that’s fine. I get the purpose of that, it makes sense, and there is nothing inherently wrong with it. The issue is that it is not being made clear to students that this is an academic exercise. I constantly see people re-implementing trivial (or not-so-trivial) functions that are provided by the standard library, and constantly hear people referring to functions provided by a language as ‘cheating’ or ‘taking the easy way out’.

Not reinventing the wheel is something that makes a good programmer. Code reuse has so many benefits - increased stability and less time working on a problem can only be a good thing, and yet I constantly see students getting this misunderstanding that they should implement everything by hand.

Does this mean people teaching should have to think up simple, and yet currently unsolved problems? Of course not. The answer is to simply state clearly, in every case you give a problem like this, that the optimal solution is to simply use the standard library, but for the case of this exercise, that isn’t allowed. This needs to be stated clearly, otherwise, you punish students for following good practice and finding the best solution.
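As a small illustration of the difference (the task and function names here are made up, not taken from any particular course), compare the version an exercise might ask for with the one you would normally write:

```python
from collections import Counter

def count_letters_by_hand(text):
    """Exercise version: count characters without library helpers."""
    counts = {}
    for ch in text:
        if ch in counts:
            counts[ch] += 1
        else:
            counts[ch] = 1
    return counts

def count_letters(text):
    """Everyday version: let the standard library do the work."""
    return Counter(text)

print(count_letters_by_hand("banana"))
print(dict(count_letters("banana")))
```

Both give the same counts; the first teaches how counting works, while the second is what belongs in real code.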

There are many horror stories about reimplementing functionality, and I strongly believe that this kind of teaching causes these horrors.

So I implore all teachers and lecturers: make it clear that what you are asking is only for the purpose of the exercise, and would be bad practice anywhere else.