Lately I've been doing a bit of research to find out what kinds of code metrics are commonly used to better understand the structure and health of a codebase, and what tools exist for visualizing those metrics. It's a pretty vast subject (I've probably only scratched the surface), but I'll try to summarize my findings so far and sketch out what I hope to tackle in this area.
I would like to build a tool to visualize the relational structure and information flow in a large-scale iOS project. I'm a very visual thinker who likes to gain a big-picture understanding of things, which I find can be difficult when joining a new project with a large codebase. It would be very helpful if there were ready-made tools for Objective-C that could visualize a call graph or dependency matrix of the code, color-coded with metrics like lines of code, cyclomatic complexity, test coverage, or modification activity. Such a view would help identify hot spots of potential code smells and support better-informed, iterative architectural decisions.
Given that a visualization tool for Objective-C code structure and metrics doesn't exist (at least not in the form I have in mind), I've begun to explore what it would take to build one. The first ingredient I'll need is a tool for parsing code and generating the relational graphs I would like to visualize. Clang provides a C API library called libclang that can parse C/C++/Objective-C code into an abstract syntax tree (AST) and traverse the resulting tree. There is also a convenient Python binding for libclang (for a helpful reference, see this blog post).
So, the first step in creating a visualization tool is to use libclang to process all of the Objective-C code in a project into a graph data structure (or dependency matrix). But what defines this structure? What are the nodes and links? Depending on the analysis, one could consider a node to be a file, a class, or perhaps even an object. A link corresponds to some kind of directional relation between nodes, such as when a file depends on another file, a class calls a method in another class, or an object is injected into another object via constructor or method injection. I've begun to explore these different possibilities, and likely more than one will turn out to be useful.
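To make the file-level case concrete, here is a minimal sketch in which nodes are files and a link means "this file imports that file". Note that it does not use libclang at all: it substitutes a plain regular-expression scan of `#import "..."` lines as a stand-in, which is far cruder than a real AST walk. The file names and contents in `SAMPLE_SOURCES` are hypothetical, made up purely for illustration.

```python
import re

# Matches quoted imports/includes such as: #import "Foo.h"
# Angle-bracket framework imports (<UIKit/UIKit.h>) are deliberately skipped,
# since only links between the project's own files are of interest here.
IMPORT_RE = re.compile(r'^\s*#\s*(?:import|include)\s+"([^"]+)"', re.MULTILINE)

# Hypothetical project contents, keyed by file name (illustration only).
SAMPLE_SOURCES = {
    "Account.h": '#import <Foundation/Foundation.h>\n@interface Account : NSObject\n@end\n',
    "Account.m": '#import "Account.h"\n@implementation Account\n@end\n',
    "LoginViewController.h": '#import <UIKit/UIKit.h>\n@class Account;\n',
    "LoginViewController.m": '#import "LoginViewController.h"\n#import "Account.h"\n',
}


def build_import_graph(sources):
    """Map each file name to the set of project files it imports directly."""
    graph = {name: set() for name in sources}
    for name, text in sources.items():
        for target in IMPORT_RE.findall(text):
            base = target.rsplit("/", 1)[-1]  # drop any directory prefix
            if base in sources and base != name:
                graph[name].add(base)
    return graph


def dependency_matrix(graph):
    """Return (names, matrix) where matrix[i][j] == 1 if names[i] imports names[j]."""
    names = sorted(graph)
    matrix = [[1 if col in graph[row] else 0 for col in names] for row in names]
    return names, matrix


graph = build_import_graph(SAMPLE_SOURCES)
names, matrix = dependency_matrix(graph)
for name, row in zip(names, matrix):
    print(f"{name:24} {row}")
```

The same adjacency structure would work unchanged if the nodes were classes and the links were method calls; only the extraction step would need to change, and that is where an AST from libclang would presumably replace the regular expression.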
The next major step after building a relational structure will be to calculate various code metrics, such as lines of code (LOC), complexity, and code coverage, which can be encoded in the styling of graph elements, such as node size and color. Aside from the usual basic metrics, it would be interesting to consider ways to quantify properties such as code coupling and cohesion, within the source code and between source and tests, to get a sense of how easy the code is to modify.
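As a rough sketch of what such metrics might look like in practice, the snippet below computes three of them under heavy simplifying assumptions. LOC counts non-blank lines that are not pure `//` comments. The complexity figure is only a keyword-count approximation of cyclomatic complexity (decision points plus one), not a real control-flow analysis. Coupling is measured as fan-in and fan-out on a dependency graph, with instability defined as fan-out / (fan-in + fan-out), one common way of quantifying coupling. The function names, `SAMPLE_METHOD`, and `SAMPLE_GRAPH` are all hypothetical.

```python
import re

# Tokens treated as decision points (an assumption; a real tool would
# derive these from the AST rather than from raw text).
DECISION_RE = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")

# Hypothetical Objective-C method used only as sample input.
SAMPLE_METHOD = """
// Returns the sign of x.
- (int)sign:(int)x {
    if (x > 0 && x < 100) {
        return 1;
    } else if (x < 0) {
        return -1;
    }
    return 0;
}
"""

# Hypothetical dependency graph: node -> set of nodes it depends on.
SAMPLE_GRAPH = {"A": {"B", "C"}, "B": {"C"}, "C": set()}


def count_loc(text):
    """Count non-blank lines that are not pure // comment lines."""
    lines = (line.strip() for line in text.splitlines())
    return sum(1 for line in lines if line and not line.startswith("//"))


def estimate_complexity(text):
    """Approximate cyclomatic complexity as (decision points found) + 1."""
    return len(DECISION_RE.findall(text)) + 1


def coupling_metrics(graph):
    """Compute fan-in, fan-out, and instability for every node in the graph."""
    fan_in = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            fan_in[target] = fan_in.get(target, 0) + 1
    metrics = {}
    for node, incoming in fan_in.items():
        outgoing = len(graph.get(node, ()))
        total = incoming + outgoing
        metrics[node] = {
            "fan_in": incoming,
            "fan_out": outgoing,
            # 0.0 = only depended upon (stable), 1.0 = only depends on others
            "instability": outgoing / total if total else 0.0,
        }
    return metrics


print(count_loc(SAMPLE_METHOD), estimate_complexity(SAMPLE_METHOD))
for node, values in sorted(coupling_metrics(SAMPLE_GRAPH).items()):
    print(node, values)
```

Any of these numbers could then drive node size or color in the visualization; which mapping is most readable is something that would need experimentation.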
The final form of this tool will most likely be a D3.js-driven interactive web page. I've come across some existing projects that should serve as useful references, such as CodeFlower and DependencyWheel (which is similar to a Circos visualization). I'm also intrigued by the CodeCity project, which is based around a city metaphor, representing classes as buildings. I wonder how far one could take that metaphor, perhaps superimposing transit-like network structures to represent the flow of data through the system.
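One plausible bridge between the analysis side and a D3.js page is a JSON file in the `nodes`/`links` shape that D3 force-layout examples commonly consume. The sketch below is only an assumption about what that hand-off could look like: the `to_d3_json` helper, the `loc` attribute name, and the sample data are all hypothetical, and other layouts (a dependency wheel, for instance) would likely want a matrix instead.

```python
import json

# Hypothetical inputs: a dependency graph and a per-node LOC metric.
SAMPLE_GRAPH = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
SAMPLE_LOC = {"A": 120, "B": 45, "C": 300}


def to_d3_json(graph, loc=None):
    """Serialize a dependency graph as {"nodes": [...], "links": [...]} JSON.

    Each node carries an "id" and a "loc" value that a page could map to
    node size or color; each link points from a dependent to its dependency.
    """
    loc = loc or {}
    # Include nodes that only ever appear as link targets.
    names = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    nodes = [{"id": name, "loc": loc.get(name, 0)} for name in names]
    links = [
        {"source": source, "target": target}
        for source in sorted(graph)
        for target in sorted(graph[source])
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


print(to_d3_json(SAMPLE_GRAPH, SAMPLE_LOC))
```

Sorting the nodes and links keeps the output deterministic, which makes it easier to diff the file between runs as the codebase changes.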