java2graph is a high-performance, parallelized command-line interface (CLI) tool that parses Java source code and its compiled dependencies to generate a rich, queryable representation of the codebase. It extracts structural information (classes, interfaces, methods, lambdas) and behavioral relationships (inheritance, method definitions, method calls) and exports them to both CSV files and LadybugDB, a highly optimized embedded columnar graph database.
This document outlines the system architecture, the data processing pipeline, the graph schema, and how to interact with the generated dataset.
The CLI is built using a Pass-Based, Multi-Threaded Pipeline Architecture. Because parsing and resolving large codebases can be extremely computationally expensive, the work is divided into distinct, isolated passes. The passes operate on a shared context object (GraphContext) using concurrent data structures (ConcurrentHashMap, ConcurrentLinkedQueue) to ensure safe parallel execution.
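The pass-based design can be pictured with a minimal sketch. The names SketchContext, SketchPass, and PipelineSketch are hypothetical stand-ins, since the real GraphContext fields and pass interface are not shown here; the sketch only illustrates passes running one after another while each pass parallelizes its own work against concurrent collections.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

// Hypothetical shared state; the real GraphContext's fields are not shown in this document.
final class SketchContext {
    final Map<String, String> nodesById = new ConcurrentHashMap<>();
    final Queue<String> edges = new ConcurrentLinkedQueue<>();
}

// Hypothetical pass contract: each pass reads and writes only the shared context.
interface SketchPass {
    void run(SketchContext ctx);
}

final class PipelineSketch {
    // Passes run one after another; each pass may parallelize its own work internally.
    static SketchContext runAll(List<SketchPass> passes) {
        SketchContext ctx = new SketchContext();
        for (SketchPass pass : passes) {
            pass.run(ctx);
        }
        return ctx;
    }

    // Two toy passes: the first adds nodes in parallel, the second adds one edge per node.
    static SketchContext demo() {
        SketchPass extract = ctx -> IntStream.range(0, 1000).parallel()
                .forEach(i -> ctx.nodesById.put("node-" + i, "Class" + i));
        SketchPass link = ctx -> ctx.nodesById.keySet().parallelStream()
                .forEach(id -> ctx.edges.add(id + "->root"));
        return runAll(List.of(extract, link));
    }
}
```

Because the second pass starts only after the first has returned, it can rely on every node being present, while the concurrent collections keep the writes inside each pass safe.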
The core technologies driving the tool are:
- JavaParser & JavaSymbolSolver: For constructing Abstract Syntax Trees (ASTs) and performing semantic resolution (binding method calls to actual definitions, determining lambda signatures, and resolving inheritance across source files and JARs).
- Lombok (delombok): For dynamic preprocessing of annotations to ensure accurate structural extraction without requiring the user to pre-compile the source with specific plugins.
- LadybugDB: An embedded, serverless, columnar graph database optimized for complex analytical graph queries.
- Picocli: For a rich, POSIX-compliant command-line interface.
The execution flow of the application is orchestrated in the Main class, which initializes the Java2GraphConfig and GraphContext and then sequentially executes an array of passes:
Purpose: Preprocess source code to expand Lombok annotations (e.g., @Data, @Getter, @Builder) into standard Java boilerplate (getters, setters, constructors).
- Why? JavaParser operates on the raw AST. Without delomboking, implicit methods generated by Lombok are invisible to the parser, leading to unresolvable method calls downstream.
- How: The pass invokes the lombok.launch.Main engine via reflection (to bypass Java 11+ module encapsulation rules) and writes the expanded source code to a temporary directory. The application configuration's source directory is dynamically updated to point to this temporary directory for all subsequent passes.
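A reflective launch of this kind might look roughly like the sketch below. The class and method names are hypothetical, and the argument layout is an assumption modeled on Lombok's "delombok <src> -d <out>" command line; the exact arguments the pass uses are not stated here. EchoEntryPointSketch is a stand-in so the example runs without Lombok on the classpath.

```java
import java.lang.reflect.Method;

final class ReflectiveLauncherSketch {
    // Loads a class by name and calls its static main(String[]) reflectively.
    static void invokeMain(String className, String... args) {
        try {
            Class<?> entryPoint = Class.forName(className);
            Method main = entryPoint.getDeclaredMethod("main", String[].class);
            // Covers entry points whose class or method is not public.
            main.setAccessible(true);
            // The cast to Object stops varargs from spreading the array into separate arguments.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not launch " + className, e);
        }
    }

    // Assumed argument layout, modeled on Lombok's "delombok <src> -d <out>" command line.
    static String[] delombokArgs(String sourceDir, String outputDir) {
        return new String[] {"delombok", sourceDir, "-d", outputDir};
    }
}

// Stand-in entry point that records its arguments instead of running Lombok.
final class EchoEntryPointSketch {
    static String[] lastArgs = new String[0];

    public static void main(String[] args) {
        lastArgs = args;
    }
}
```

With Lombok available, the same helper would be called with "lombok.launch.Main" as the class name and a temporary directory as the output directory.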
Purpose: Convert raw Java source files into Abstract Syntax Trees (ASTs).
- How: It configures a CombinedTypeSolver that combines:
  - ReflectionTypeSolver (for standard Java library classes).
  - JavaParserTypeSolver (for the project's source code).
  - JarTypeSolver (dynamically added for every .jar file provided via the CLI arguments).
- Parallelism: Uses Files.walk to find all .java files and processes them using a parallelStream(). The resulting CompilationUnit objects are stored in the concurrent GraphContext.
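The file discovery and parallel processing step could be sketched as follows, using only the JDK. ParallelSourceScanSketch and its parser parameter are hypothetical; reading the file contents stands in for the JavaParser call, which is omitted so the example has no external dependencies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelSourceScanSketch {
    // Walks sourceRoot, applies parser to every .java file in parallel, and keys results by path.
    // The parser must not return null, because ConcurrentHashMap rejects null values.
    static <T> Map<Path, T> parseAll(Path sourceRoot, Function<Path, T> parser) {
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            javaFiles = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Path, T> parsed = new ConcurrentHashMap<>();
        javaFiles.parallelStream().forEach(file -> parsed.put(file, parser.apply(file)));
        return parsed;
    }

    // Placeholder for real parsing: returns the raw file contents.
    static String readSource(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree with two .java files and one other file, then scans it.
    static int demo() {
        try {
            Path root = Files.createTempDirectory("scan-sketch");
            Path pkg = Files.createDirectories(root.resolve("com/example"));
            Files.writeString(pkg.resolve("A.java"), "class A {}");
            Files.writeString(pkg.resolve("B.java"), "class B {}");
            Files.writeString(root.resolve("notes.txt"), "not Java");
            return parseAll(root, ParallelSourceScanSketch::readSource).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Collecting the paths before going parallel closes the Files.walk stream promptly and gives the parallel stream a sized list to split evenly.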
Purpose: Traverse the ASTs to extract structural nodes and establish semantic edges (relationships) between them.
- How: It utilizes the Visitor Pattern (VoidVisitorAdapter) to traverse every CompilationUnit. For every relevant node (Classes, Methods, Object Creations, Method Calls), it invokes the JavaSymbolSolver to resolve the Fully Qualified Name (FQN).
- Key extractions:
  - Classes/Interfaces: Captures FQN, name, and raw declaration code. Detects EXTENDS and IMPLEMENTS edges by resolving extended/implemented types.
  - Methods & Constructors: Captures signature, source code, and links them to their containing class.
  - Lambdas: Dynamically generates unique IDs (based on line numbers and containing scopes) for lambda expressions, treating them as first-class methods in the graph.
  - Method Calls: Resolves the caller's context and the target method's exact signature (even across JAR boundaries) to create a MethodCallEdge.
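Since lambdas have no name of their own, an ID has to be synthesized from where they appear. The sketch below shows one way that might work; the exact format java2graph uses is not given here, and both the "$lambda@" separator and the column component (added so two lambdas on the same line stay distinct) are assumptions.

```java
final class LambdaIdSketch {
    // Builds an ID from the enclosing method's FQN plus the lambda's start position.
    static String lambdaId(String enclosingMethodFqn, int line, int column) {
        return enclosingMethodFqn + "$lambda@" + line + ":" + column;
    }

    public static void main(String[] args) {
        // Prints "com.example.UserService.load(java.lang.String)$lambda@42:17"
        System.out.println(lambdaId("com.example.UserService.load(java.lang.String)", 42, 17));
    }
}
```

An ID built this way is stable across runs as long as the source does not move, which lets the lambda serve as a primary key in the same way as a regular method signature.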
Purpose: Persist the in-memory graph structures to disk.
- CSV Export: Uses Apache Commons CSV to quickly dump the raw nodes and edges into relational .csv files for traditional data processing.
- LadybugDB Export: Initializes an embedded Ladybug database. It defines a strict property graph schema (Node Tables and Rel Tables) and uses PreparedStatements to efficiently batch-insert the nodes and edges from the GraphContext directly into the columnar storage engine.
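Fields such as method source code routinely contain commas, quotes, and line breaks, which is why a CSV library is used rather than plain string joining. The sketch below is a JDK-only stand-in for what Apache Commons CSV handles internally, not the tool's actual export code; CsvRowSketch is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvRowSketch {
    // Quotes a field when it contains a comma, quote, or line break (RFC 4180 style).
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Embedded quotes are doubled inside a quoted field.
        String doubled = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + doubled + "\"" : field;
    }

    // Joins already-escaped fields into one CSV record.
    static String toRow(List<String> fields) {
        return fields.stream().map(CsvRowSketch::escape).collect(Collectors.joining(","));
    }
}
```

For example, a row whose last field is the declaration "class A { int x, y; }" comes out with that field wrapped in quotes, so the embedded comma is not mistaken for a column separator.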
The GraphContext relies on intermediate POJOs (ClassNode, MethodNode, InheritanceEdge, MethodCallEdge). When persisted to LadybugDB, these map directly to the following Cypher Schema:
1. Class Node
Represents a Java Class or Interface.
- id (STRING) - Primary Key: The Fully Qualified Name (FQN).
- fqn (STRING): The Fully Qualified Name.
- name (STRING): The short name of the class.
- isInterface (BOOLEAN): True if it is an interface.
- declarationCode (STRING): The source code of the class declaration block.
2. Method Node
Represents a standard method, constructor, or lambda expression.
- id (STRING) - Primary Key: The unique signature/FQN of the method.
- fqn (STRING): The unique signature/FQN.
- name (STRING): The short name of the method (or "lambda").
- signature (STRING): The method signature parameters.
- sourceCode (STRING): The full source code block of the method body.
- isLambda (BOOLEAN): True if the method is an extracted lambda expression.
Relationship (Rel) tables:
- Extends (Class -> Class): Indicates that the source Class extends the target Class.
- Implements (Class -> Class): Indicates that the source Class implements the target Interface.
- Defines (Class -> Method): Indicates that a Class encapsulates a specific Method.
- Calls (Method -> Method): Indicates that the source Method's body contains an invocation of the target Method.
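The schema can be mirrored in Java to make the endpoint rules explicit. The types below are hypothetical (the real ClassNode, MethodNode, and edge POJOs may be shaped differently) and assume a JDK with record support; the field names follow the property lists above.

```java
// Hypothetical mirrors of the node tables; the real ClassNode and MethodNode may differ.
record ClassNodeSketch(String id, String fqn, String name,
                       boolean isInterface, String declarationCode) {}

record MethodNodeSketch(String id, String fqn, String name, String signature,
                        String sourceCode, boolean isLambda) {}

// Each relationship type fixes the node label allowed at either end.
enum RelTypeSketch {
    EXTENDS("Class", "Class"),
    IMPLEMENTS("Class", "Class"),
    DEFINES("Class", "Method"),
    CALLS("Method", "Method");

    final String fromLabel;
    final String toLabel;

    RelTypeSketch(String fromLabel, String toLabel) {
        this.fromLabel = fromLabel;
        this.toLabel = toLabel;
    }

    // True when an edge of this type may connect the two given node labels.
    boolean connects(String from, String to) {
        return fromLabel.equals(from) && toLabel.equals(to);
    }
}

// An edge stores only the primary keys of its endpoints.
record EdgeSketch(RelTypeSketch type, String fromId, String toId) {}
```

Keeping only IDs on the edge matches how the rel tables reference node primary keys, and the connects check shows why a Calls edge can never start at a Class.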
Once the database is generated (e.g., in a directory named my-graph.db), you can connect to it using LadybugDB's CLI or client libraries to perform deep architectural analysis.
Here are a few examples of what you can discover:
1. Find all highly coupled classes (God Objects)
Find classes that define the most methods.
MATCH (c:Class)-[:Defines]->(m:Method)
RETURN c.name, COUNT(m) AS methodCount
ORDER BY methodCount DESC
LIMIT 10;
2. Impact Analysis (Reverse Call Graph)
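If the saveUser method in UserRepository changes, which other methods are directly or indirectly affected? The query uses a variable-length path to follow Calls edges up to three hops. It matches on the method name alone, so add a condition on target.fqn to restrict the match to UserRepository.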
MATCH (caller:Method)-[:Calls*1..3]->(target:Method {name: 'saveUser'})
RETURN caller.fqn, target.fqn;
3. Find Unused / Dead Code (Orphan Methods)
Find methods that are defined, are not constructors or lambdas, and are never called by any other method in the analyzed codebase.
MATCH (c:Class)-[:Defines]->(m:Method)
WHERE NOT ()-[:Calls]->(m) AND m.name <> c.name AND m.isLambda = false
RETURN m.fqn;
4. Interface Implementation Discovery
Find all concrete classes that implement java.io.Serializable and count how many methods they define.
MATCH (c:Class)-[:Implements]->(i:Class {name: 'Serializable'}), (c)-[:Defines]->(m:Method)
RETURN c.name, COUNT(m) AS definedMethods;
Because parsing requires significant computational resources and users may not have a compatible JVM installed, java2graph is distributed as a Zero-Dependency Native Executable.
- Maven Fat Jar: The project is compiled into a single jar-with-dependencies containing all libraries (JavaParser, LadybugDB native bindings, Picocli).
- jdeps: Analyzes the Fat Jar to dynamically determine exactly which JDK modules (e.g., java.base, java.compiler, java.sql) are required.
- jlink: Strips out the vast majority of the JVM, creating a custom, highly compressed, minimal Java Runtime Environment (JRE) tailored exclusively for this CLI.
- jpackage: Bundles the Fat Jar and the custom minimal JRE into a standalone native application image (e.g., .app for macOS, .exe for Windows, or an ELF binary for Linux).
The repository contains a .github/workflows/release.yml workflow. Upon pushing to the main branch, GitHub Actions will:
- Spin up instances of Ubuntu, macOS, and Windows.
- Build the Maven project.
- Execute jlink and jpackage natively on each OS.
- Zip and upload the cross-platform native binaries as build artifacts.
As a result, a user can download java2graph-Windows.zip or java2graph-macos.tar.gz, extract it, and run the tool immediately from their terminal without installing Java.