codavaj is javadoc in reverse. A seemingly useless tool. Arguably of interest only to the most hardcore java hackers. A "must have" in every geek's software toolbox.
Codavaj is a reverse engineering tool with a focus on Javadoc. Currently codavaj is distributed as a command line tool ( codavaj.cmd/sh - using the org.codavaj.Main class ). A 100% Java tool. Codavaj supports Java 6 language constructs including Generics and Annotations.
You can
Codavaj works by converting javadoc HTML into XML ( using NekoHTML ) and then deriving class information from the XML step by step with XPath queries ( using dom4j ).
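The XPath step can be illustrated with a minimal sketch. It is not codavaj's code: to stay self-contained it uses the JDK's built-in DOM parser and javax.xml.xpath instead of NekoHTML and dom4j, and it assumes the HTML has already been turned into well-formed XML. The sample markup, the 'method-summary' id and the XPath expressions are all made up for illustration and do not match any particular javadoc version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JavadocXPathSketch {

    // Hypothetical, already well-formed fragment in the spirit of a
    // javadoc method summary table (not real javadoc output).
    static final String SAMPLE =
          "<html><body>"
        + "<table id='method-summary'>"
        + "<tr><td><code>String</code></td>"
        + "<td><code><a href='#getName()'>getName</a>()</code></td></tr>"
        + "<tr><td><code>void</code></td>"
        + "<td><code><a href='#setName(String)'>setName</a>(String name)</code></td></tr>"
        + "</table>"
        + "</body></html>";

    /** Returns "returnType methodName" for every row of the summary table. */
    static List<String> methodSignatures(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // First query: select every table row that describes a method.
        NodeList rows = (NodeList) xpath.evaluate(
            "//table[@id='method-summary']/tr", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            // Follow-up queries are evaluated relative to the current row.
            String returnType = xpath.evaluate("td[1]/code", rows.item(i));
            String name = xpath.evaluate("td[2]/code/a", rows.item(i));
            result.add(returnType + " " + name);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String signature : methodSignatures(SAMPLE)) {
            System.out.println(signature);
        }
    }
}
```

Running main on the sample prints "String getName" and "void setName". Real javadoc pages are rarely well-formed XML, which is why a tolerant HTML parser such as NekoHTML is needed before any XPath query can run.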
Known limitations
Not because I lost the sources for some code - but miraculously still had the javadoc for it! This wouldn't help much anyway because codavaj leaves you with a TODO for each method implementation.
Codavaj can potentially be used as part of an exploit to de-obfuscate compiled Java code, with the precondition that the javadoc is available for the code in question. This is often the case for libraries where the API is documented publicly on the internet but the library itself is distributed obfuscated, with demos and other third party products. The trick? Typical obfuscation tools obfuscate code and at the same time write an output mapping file which can be used to unobfuscate the obfuscated code. Run such a tool on code that has already been obfuscated ( by some other obfuscator ). This will supply a mapping file and re-obfuscated code. Using the type information derived by Codavaj from the javadocs, perform an analysis on the re-obfuscated code and determine the best or most plausible match to the original structure. This part of the procedure is not provided in Codavaj. Once a reverse mapping is determined, apply the reverse mapping to the mapping file and the obfuscator will magically turn the re-obfuscated code into something a bit more usable - at least the API part. This potential didn't go unnoticed - the prize goes to Andreou Dimitris for emailing me back in Nov. 2006.
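The last step, applying the reverse mapping to the mapping file, can be sketched as a simple map rewrite. This is only one reading of the procedure and not something codavaj provides. It assumes the mapping file has been loaded as 'name before re-obfuscation -> re-obfuscated name' and that the separate matching analysis produced 're-obfuscated name -> recovered API name'. The class, method and sample names are hypothetical, and no real obfuscator's file format is parsed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingRewriteSketch {

    /**
     * Replaces the left-hand side of each mapping entry with the recovered
     * name when the matching analysis found one, so that the obfuscator's
     * own "undo" step restores the recovered names.
     *
     * @param mappingFile name before re-obfuscation -> re-obfuscated name
     * @param matches     re-obfuscated name -> recovered API name
     */
    static Map<String, String> rewrite(Map<String, String> mappingFile,
                                       Map<String, String> matches) {
        Map<String, String> rewritten = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mappingFile.entrySet()) {
            String reobfuscated = entry.getValue();
            // Fall back to the old ( still obfuscated ) name if no match exists.
            String recovered = matches.getOrDefault(reobfuscated, entry.getKey());
            rewritten.put(recovered, reobfuscated);
        }
        return rewritten;
    }

    public static void main(String[] args) {
        // Hypothetical mapping written by the second obfuscator.
        Map<String, String> mappingFile = new LinkedHashMap<>();
        mappingFile.put("a", "x1");
        mappingFile.put("b", "x2");

        // Hypothetical result of matching against the javadoc-derived types.
        Map<String, String> matches = new LinkedHashMap<>();
        matches.put("x1", "org.example.Shape");

        System.out.println(rewrite(mappingFile, matches));
    }
}
```

Types without a plausible match, typically non-public implementation classes that never appear in the javadoc, keep their obfuscated names, which is why only the API part becomes readable.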
For me this was simply an intellectual challenge - a side battle in a hobby programmer's grand master plan. As a bonus I managed to improve my XPath query skills a bit. It took me 2 full working weeks to get the initial version out. If I needed this once, then there's bound to be someone out there who might like to use it, for whatever reason. It will probably save them the 2 weeks too.
The full distribution zip contains all jar dependencies required for running codavaj. For convenience, 'codavaj.cmd' and 'codavaj.sh' shell scripts ( for Windows and Linux respectively ) are provided in the distribution for starting codavaj with its many jar dependencies.
to download an entire javadoc tree for further processing use
to convert local javadoc tree into java source
Example without external links: codavaj.cmd codavaj tmp/jumpi/javadoc tmp/jumpi/src
Example with external links: codavaj.cmd codavaj tmp/jumpi/javadoc tmp/jumpi/src http://external.link.com/api/ http://external.link2.com/api/
Codavaj preserves all javadoc information including all fields, methods, constructors and their modifiers, constant values, class hierarchy and inner types.
You can use the following function in org.codavaj.Main to derive the reflection-like API:
    /**
     * Derive a reflection-like API from a javadoc source tree. Resolve any type names
     * to external javadoc links. External links to Sun's JDK javadoc apis are
     * automatically resolved ( i.e. http://java.sun.com/j2se/X/docs/api/ )
     *
     * @param javadocdir the javadoc tree root
     * @param externalLinks a list of 'http://..' strings representing external javadoc refs.
     *
     * @return a TypeFactory handle on the resulting api
     * @throws Exception any problem.
     */
    public static TypeFactory analyze( String javadocdir, List externalLinks )
        throws Exception {
        ProcessMonitor pm = new Main().new ProcessMonitor();
        DocParser dp = new DocParser();
        dp.setJavadocDirName(javadocdir);
        dp.setExternalLinks(externalLinks);
        dp.addProgressListener(pm);
        dp.process();

        return dp.getTypeFactory();
    }
Through the TypeFactory, you can get at Types ( Classes or Interfaces - both inner and outer ), and their respective Packages. A Type contains fields ( Field ), constructors and methods ( Method ), and inner types ( Type ). See the source code here for more information - and the SrcWriter class, which traverses the Types to write java source files. Packages can be traversed for access to their Types and sub Packages.
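The shape of such a traversal can be sketched with stand-in classes. SketchPackage and SketchType below are hypothetical and are not the real org.codavaj classes, whose accessor names are not shown here. The sketch only illustrates the recursion over sub packages and inner types that a writer like SrcWriter would need.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeTraversalSketch {

    // Stand-in for a codavaj Type ( NOT the real class ).
    static class SketchType {
        final String name;
        final List<SketchType> innerTypes = new ArrayList<>();
        SketchType(String name) { this.name = name; }
    }

    // Stand-in for a codavaj Package ( NOT the real class ).
    static class SketchPackage {
        final String name;
        final List<SketchType> types = new ArrayList<>();
        final List<SketchPackage> subPackages = new ArrayList<>();
        SketchPackage(String name) { this.name = name; }
    }

    /** Collects qualified names of all types below pkg, depth first. */
    static void collect(SketchPackage pkg, List<String> out) {
        for (SketchType type : pkg.types) {
            collectType(pkg.name + "." + type.name, type, out);
        }
        for (SketchPackage sub : pkg.subPackages) {
            collect(sub, out);
        }
    }

    /** Adds the type itself, then recurses into its inner types. */
    static void collectType(String qualifiedName, SketchType type, List<String> out) {
        out.add(qualifiedName);
        for (SketchType inner : type.innerTypes) {
            collectType(qualifiedName + "." + inner.name, inner, out);
        }
    }

    /** Builds a small hypothetical package tree. */
    static SketchPackage sample() {
        SketchType shape = new SketchType("Shape");
        shape.innerTypes.add(new SketchType("Builder"));

        SketchPackage util = new SketchPackage("org.example.util");
        util.types.add(new SketchType("Strings"));

        SketchPackage root = new SketchPackage("org.example");
        root.types.add(shape);
        root.subPackages.add(util);
        return root;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        collect(sample(), names);
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

On the sample tree this prints org.example.Shape, org.example.Shape.Builder and org.example.util.Strings, in that order.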