What is codavaj?

Codavaj is javadoc in reverse. A seemingly useless tool. Arguably of interest only to the most hardcore Java hackers. A "must have" in every geek's software toolbox.

What is codavaj really?

Codavaj is a reverse engineering tool with a focus on Javadoc. It is currently distributed as a command line tool ( codavaj.cmd / codavaj.sh, both of which run the org.codavaj.Main class ) and is written 100% in Java. Codavaj supports Java 6 language constructs, including generics and annotations.

You can

  • convert an entire local javadoc tree into java source code.
  • download an entire remote javadoc tree via http(s).
  • build a Reflection-like API from the information in a javadoc tree.

Codavaj works by converting javadoc HTML into XML ( using NekoHTML ) and then successively deriving class information from the XML with XPath queries ( using dom4j ).
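The XPath step can be illustrated with a minimal sketch. Note the assumptions: it uses the JDK's built-in XML parser and javax.xml.xpath instead of NekoHTML and dom4j, and the SAMPLE markup, its class attributes and the XPath expressions are invented for illustration. They are not the real javadoc layout and not the queries Codavaj itself runs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JavadocXPathSketch {

    // Hypothetical, already well-formed stand-in for a javadoc method summary table.
    // Real javadoc HTML is messier, which is why an HTML-to-XML step is needed first.
    static final String SAMPLE =
        "<html><body>"
        + "<table class='methods'>"
        + "<tr><td class='type'>int</td><td class='name'>size</td></tr>"
        + "<tr><td class='type'>boolean</td><td class='name'>isEmpty</td></tr>"
        + "</table>"
        + "</body></html>";

    // Parse the XML and pull one "returnType name()" string out of every table row.
    static List<String> methodSignatures(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // First query: select every row of the (hypothetical) methods table.
            NodeList rows = (NodeList) xpath.evaluate(
                "//table[@class='methods']/tr", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                // Follow-up queries are evaluated relative to the current row.
                String type = xpath.evaluate("td[@class='type']", rows.item(i));
                String name = xpath.evaluate("td[@class='name']", rows.item(i));
                result.add(type + " " + name + "()");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not analyse sample XML", e);
        }
    }

    public static void main(String[] args) {
        for (String signature : methodSignatures(SAMPLE)) {
            System.out.println(signature);
        }
    }
}
```

The pattern is the same at any scale: one query selects the nodes of interest, and further queries relative to each node extract the individual pieces of class information.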

Known limitations

  • Generated constructors rely on the implicit superclass constructor call - which is not enough to compile for all class hierarchies.
  • Codavaj does not reconstruct the source code comments 100% accurately. @author and @since tags are not currently preserved accurately.
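The constructor limitation is easiest to see in a small example. The classes below are hypothetical and only illustrate the Java rule involved; they are not actual Codavaj output.

```java
class SuperCallSketch {

    // Hypothetical superclass that offers no no-argument constructor.
    static class Base {
        final String name;

        Base(String name) {
            this.name = name;
        }
    }

    // A stub constructor written as `Derived() { }` would implicitly call `super()`,
    // which Base does not have, so the class would not compile.
    // The manual fix is to call an existing superclass constructor explicitly.
    static class Derived extends Base {
        Derived() {
            super(null); // placeholder argument, only there to satisfy the compiler
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println("constructed Derived with name = " + d.name);
    }
}
```

Whenever a superclass lacks an accessible no-argument constructor, the generated subclass constructors need this kind of edit by hand before the sources compile.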

Why?

Not because I lost the sources for some code - but miraculously still had the javadoc for it! This wouldn't help much anyway, because codavaj leaves you with a TODO for each method implementation.

Codavaj can potentially be used as part of an exploit to de-obfuscate compiled Java code, provided that the javadoc is available for the code in question. This is often the case for libraries whose API is documented publicly on the internet, while the library itself is distributed in obfuscated form with demos and other third party products.

The trick? Typical obfuscation tools obfuscate code and at the same time write a mapping file which can be used to unobfuscate the obfuscated code. Run such a tool on the already obfuscated code ( produced by some other obfuscator ). This yields a mapping file and re-obfuscated code. Using the type information derived by Codavaj from the javadocs, analyse the re-obfuscated code and determine the best or most plausible match to the original structure. This part of the procedure is not provided in Codavaj. Once a reverse mapping is determined, apply it to the mapping file, and the obfuscator will turn the re-obfuscated code into something a bit more usable - at least the API part.

This potential didn't go unnoticed - the prize goes to Andreou Dimitris for emailing me back in Nov. 2006.

For me this was simply an intellectual challenge - a side battle in a hobby programmer's grand master plan. As a bonus I managed to improve my XPath query skills a bit. It took me 2 full working weeks to get the initial version out. If I needed this once, then there's bound to be someone out there who might like to use it, for whatever reason. It will probably save them the 2 weeks too.

Running Codavaj

The full distribution zip contains all jar dependencies required for running codavaj. For convenience, the distribution provides the scripts 'codavaj.cmd' and 'codavaj.sh' ( for Windows and Linux respectively ), which start codavaj with its many jar dependencies.

To download an entire javadoc tree for further processing, use

  • codavaj.cmd wget <URL> <destination-dir>
  • codavaj.cmd wget http://jumpi.sourceforge.net/javadoc/j2se tmp/jumpi/javadoc

To convert a local javadoc tree into Java source, use

  • codavaj.cmd codavaj <javadoc-dir> <javasource-dir> {<external-link>}*

Example without external links: codavaj.cmd codavaj tmp/jumpi/javadoc tmp/jumpi/src

Example with external links: codavaj.cmd codavaj tmp/jumpi/javadoc tmp/jumpi/src http://external.link.com/api/ http://external.link2.com/api/

Codavaj "Reflection-like" API

Codavaj preserves all javadoc information, including all fields, methods, constructors and their modifiers, constant values, class hierarchy and inner types.

You can use the following function in org.codavaj.Main to derive the reflection-like API:

    /**
     * Derive a reflection-like API from a javadoc source tree. Resolve any type names
     * to external javadoc links. External links to Sun's JDK javadoc apis are 
     * automatically resolved ( i.e. http://java.sun.com/j2se/X/docs/api/ ) 
     * 
     * @param javadocdir the javadoc tree root
     * @param externalLinks a list of 'http://..' strings representing external javadoc refs.
     * 
     * @return a TypeFactory handle on the resulting api
     * @throws Exception any problem.
     */
    public static TypeFactory analyze( String javadocdir, List externalLinks ) throws Exception {
        ProcessMonitor pm = new Main().new ProcessMonitor();
        
        DocParser dp = new DocParser();
        dp.setJavadocDirName(javadocdir);
        dp.setExternalLinks(externalLinks);
        dp.addProgressListener(pm);
        dp.process();

        return dp.getTypeFactory();
    }
	    	

Through the TypeFactory, you can get at Types ( Classes or Interfaces - both inner and outer ), and their respective Packages. A Type contains fields ( Field ), constructors and methods ( Method ), and inner types ( Type ). See the source code for more information - in particular the SrcWriter class, which traverses the Types to write Java source files. A Package can be traversed for access to its Types and sub Packages.
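The shape of such a traversal can be sketched as follows. Important assumption: PackageNode and TypeNode are simplified, hypothetical stand-ins written for this example. They are not Codavaj's real Package and Type classes, whose actual accessors should be looked up in the source code.

```java
import java.util.ArrayList;
import java.util.List;

class TypeTraversalSketch {

    // Hypothetical stand-in for a type ( class or interface ) with its inner types.
    static class TypeNode {
        final String name;
        final List<TypeNode> innerTypes = new ArrayList<>();

        TypeNode(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for a package with its types and sub packages.
    static class PackageNode {
        final String name;
        final List<TypeNode> types = new ArrayList<>();
        final List<PackageNode> subPackages = new ArrayList<>();

        PackageNode(String name) {
            this.name = name;
        }
    }

    // Visit every type of a package, then recurse into its sub packages.
    static void collectTypes(PackageNode pkg, List<String> out) {
        for (TypeNode type : pkg.types) {
            collectType(pkg.name, type, out);
        }
        for (PackageNode sub : pkg.subPackages) {
            collectTypes(sub, out);
        }
    }

    // Record the qualified name of a type, then recurse into its inner types.
    static void collectType(String prefix, TypeNode type, List<String> out) {
        String qualified = prefix + "." + type.name;
        out.add(qualified);
        for (TypeNode inner : type.innerTypes) {
            collectType(qualified, inner, out);
        }
    }

    // Small invented tree: one package, one sub package, one inner type.
    static PackageNode sample() {
        PackageNode root = new PackageNode("org.example");
        TypeNode outer = new TypeNode("Outer");
        outer.innerTypes.add(new TypeNode("Inner"));
        root.types.add(outer);

        PackageNode util = new PackageNode("org.example.util");
        util.types.add(new TypeNode("Helper"));
        root.subPackages.add(util);
        return root;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        collectTypes(sample(), names);
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

A source writer would follow the same recursion, emitting a source file per outer type instead of collecting names.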

Acknowledgements

Codavaj builds on many great Open Source projects, without which it would not exist: HttpUnit, NekoHTML, dom4j and, indirectly, several others. Many thanks to their project contributors!

Bug fix and contribution compliments go to Yevgeny Rouban and Brian Koehmstedt.
