Codegraphy Project

Examples of relational data visualizations: CodeFlower is a D3.js module for drawing a file dependency graph; CodeCity represents classes as buildings and packages as districts laid out in a grid, color-coded with code metrics; and Circos is a tool for visualizing annotated relational data laid out on a circle.

Lately I've been doing some research into which code metrics are commonly used to better understand the structure and health of a codebase, and what tools exist for visualizing them. It's a vast subject (I've probably only scratched the surface), but I'll try to summarize my findings so far and sketch out what I hope to tackle in this area.

I would like to build a tool to visualize the relational structure and information flow in a large-scale iOS project. I'm a very visual thinker who likes to gain a big-picture understanding of things, which I find can be difficult when joining a new project with a large codebase. It would be very helpful to have ready-made tools for Objective C that visualize a call graph or dependency matrix of the code, color-coded with metrics like lines of code, cyclomatic complexity, test coverage, or modification activity. Such views would help identify hot spots of potential code smells and support better-informed, iterative architectural decisions.

Although code metric and visualization tools exist for statically typed languages like C# (e.g. Visual Studio, NDepend) and Java (e.g. Sonargraph, JArchitect), as well as for dynamically typed languages like Python (e.g. Radon, Python Call Graph), Ruby (e.g. Code Climate), and JavaScript (e.g. JSComplexity, Code Climate), there seems to be a dearth of such tools in the land of iOS (see Wikipedia's list of tools for static code analysis, which is what these tools are typically built upon). For Objective C I have unearthed a few resources that look worth investigating further. SonarQube, a multi-language platform for managing code quality, has an Objective C plugin. I also came across a blog post that describes how to set up iOS code metrics in Jenkins, and there is a Python script for generating an import dependency graph.

Given that a visualization tool for Objective C code structure and metrics doesn't seem to exist (at least not in the form I have in mind), I've begun to explore what it would take to build one. The first ingredient is a tool that parses the code and generates the relational graphs I'd like to visualize. The Clang compiler exposes a C API library called libclang, which can parse C, C++, and Objective C code into an abstract syntax tree (AST) and traverse the resulting structure. There is also a convenient Python binding for libclang (for a helpful reference, see this blog post).

So the first step in creating a visualization tool is to use libclang to process all of the Objective C code in a project into a graph data structure (or dependency matrix). But what defines this structure? What are the nodes and links? Depending on the analysis, a node could be a file, a class, or perhaps even an object. A link corresponds to some kind of directed relation between nodes: a file depends on another file, a class calls a method in another class, or an object is injected into another object via constructor or method injection. I've begun to explore these different possibilities, and more than one will likely turn out to be useful.
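To make the file-as-node case concrete, here is a minimal, dependency-free sketch of a file-level import graph and its dependency matrix. It is a stand-in for libclang, not a libclang example: it scans quoted `#import`/`#include` directives with a regular expression, so it ignores conditional compilation, macros, and `@import` modules, which a libclang-based version walking the parsed translation unit would handle properly. The function names and the sample project are hypothetical.

```python
import re

# Matches quoted (project-local) imports such as `#import "FeedStore.h"` or
# `#include "util.h"`. Angle-bracket framework imports (<UIKit/UIKit.h>) and
# `@import` module statements are deliberately not matched.
LOCAL_IMPORT_RE = re.compile(r'^\s*#\s*(?:import|include)\s+"([^"]+)"', re.MULTILINE)


def build_import_graph(sources):
    """Return {file name: set of project files it imports}.

    `sources` maps file names to file contents. Only imports that resolve to
    another file in `sources` become edges, so system headers drop out.
    """
    graph = {}
    for name, text in sources.items():
        deps = graph.setdefault(name, set())  # every file is a node, even without imports
        for target in LOCAL_IMPORT_RE.findall(text):
            if target in sources and target != name:
                deps.add(target)
    return graph


def dependency_matrix(graph):
    """Return (sorted node list, 0/1 matrix) with m[i][j] == 1 when node i imports node j."""
    nodes = sorted(graph)
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for src, targets in graph.items():
        for dst in targets:
            m[index[src]][index[dst]] = 1
    return nodes, m


# Hypothetical mini-project, inlined so the sketch runs without a real codebase.
sources = {
    "AppDelegate.h": '#import <UIKit/UIKit.h>\n',
    "AppDelegate.m": '#import "AppDelegate.h"\n#import "FeedViewController.h"\n',
    "FeedViewController.h": '#import <UIKit/UIKit.h>\n',
    "FeedViewController.m": '#import "FeedViewController.h"\n#import "FeedStore.h"\n',
    "FeedStore.h": '#import <Foundation/Foundation.h>\n',
}

graph = build_import_graph(sources)
nodes, m = dependency_matrix(graph)
for name, row in zip(nodes, m):
    print(f"{name:22} {row}")
```

The adjacency-set form is convenient for traversal and layout, while the matrix form is what a dependency-matrix view would render directly; both describe the same graph.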

The next major step after building the relational structure will be to calculate code metrics such as lines of code (LOC), complexity, and code coverage, which can then be encoded in the styling of graph elements, such as node size and color. Aside from these basic metrics, it would be interesting to find ways to quantify properties such as coupling and cohesion, both within the source code and between source and tests, to get a sense of how amenable the code is to modification.
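As a rough sketch of how simple metrics could be attached to such a graph, the code below computes a naive LOC count and, for coupling, each node's fan-in (afferent coupling, Ca), fan-out (efferent coupling, Ce), and Robert C. Martin's instability I = Ce / (Ca + Ce). Martin defines instability for packages; applying it per file here is an adaptation, and the helper names and sample data are hypothetical. Complexity and coverage would come from external tools and are not shown.

```python
def count_loc(text):
    """Lines of code: non-blank lines that are not pure `//` comments.

    A rough measure; /* ... */ block comments are not stripped.
    """
    return sum(
        1 for line in text.splitlines()
        if line.strip() and not line.strip().startswith("//")
    )


def coupling_metrics(graph):
    """Per-node fan-in (Ca), fan-out (Ce) and instability I = Ce / (Ca + Ce).

    `graph` maps each node to the set of nodes it depends on.
    """
    nodes = set(graph).union(*graph.values())
    fan_in = {n: 0 for n in nodes}
    for targets in graph.values():
        for dst in targets:
            fan_in[dst] += 1
    metrics = {}
    for n in sorted(nodes):
        ca, ce = fan_in[n], len(graph.get(n, ()))
        # I is undefined for an isolated node (Ca + Ce == 0); report 0.0 by convention here.
        instability = ce / (ca + ce) if ca + ce else 0.0
        metrics[n] = {"fan_in": ca, "fan_out": ce, "instability": instability}
    return metrics


# Hypothetical file-level import graph and one file's contents.
graph = {
    "A.h": set(),
    "A.m": {"A.h", "B.h"},
    "B.h": set(),
    "B.m": {"B.h"},
}
sample_m = '#import "A.h"\n\n// A comment\n@implementation A\n@end\n'

print(count_loc(sample_m))
for name, values in coupling_metrics(graph).items():
    print(name, values)
```

Nodes with instability near 0 are depended upon but depend on little, so changes to them ripple outward; nodes near 1 are the opposite. Mapping these values to node color is one way to surface hot spots.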

The final form of this tool will most likely be an interactive web page driven by D3.js. I've come across some existing projects that should serve as useful references, such as CodeFlower and DependencyWheel (which is similar to a Circos visualization). I'm also intrigued by the CodeCity project, which is built around a city metaphor, representing classes as buildings. I wonder how far one could take that metaphor, perhaps superimposing transit-like networks to represent the flow of data through the system.
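One possible hand-off between the analysis and the web page, sketched under assumptions: the Python side emits node-link JSON in the `{"nodes": [...], "links": [...]}` shape commonly used in D3 force-directed examples. The `to_d3_json` helper and the `size`/`group` field names are hypothetical choices, not part of any existing tool.

```python
import json


def to_d3_json(graph, loc):
    """Serialize a dependency graph as D3-style node-link JSON.

    Each node carries its LOC as "size" (for a radius scale) and its file
    extension as "group" (for a color scale; a name without a dot is used
    as-is). Each link is a source/target pair of node ids.
    """
    nodes = sorted(set(graph).union(*graph.values()))
    data = {
        "nodes": [
            {"id": n, "size": loc.get(n, 0), "group": n.rsplit(".", 1)[-1]}
            for n in nodes
        ],
        "links": [
            {"source": src, "target": dst}
            for src in sorted(graph)
            for dst in sorted(graph[src])
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical graph and LOC counts, e.g. produced by earlier analysis steps.
graph = {"A.h": set(), "A.m": {"A.h", "B.h"}, "B.h": set()}
loc = {"A.h": 8, "A.m": 40, "B.h": 12}

payload = to_d3_json(graph, loc)
print(payload)
```

On the page, links that reference nodes by string id can be resolved with `d3.forceLink(links).id(d => d.id)`, and a scale such as `d3.scaleSqrt()` could map `size` to a node radius so that area, rather than radius, tracks LOC.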