Map Sandbox Project

A choropleth map showing crime incident levels in Pittsburgh census blocks. Original 2008 Pittsburgh incident data from GIS Tutorial for Crime Analysis; Census blocks from City of Pittsburgh.

Earlier this year I became interested in geospatial visualization and analysis, and so began a self-guided study of the field in my spare time, focussing on crime mapping. I recently kicked off a project blog hosted on GitHub Pages to document my progress.

I’ve always found the best way to learn a new topic or technology is to build something, a tool of some sort, that drives the learning process and provides a conceptual scaffolding upon which emerging concepts can grow. So I’ve been working on an idea for a mobile and/or web app for visualizing a side-by-side comparison of neighborhood livability metrics. My initial focus will be on crime statistics to keep the scope manageable, but what I’m building should be extensible to other kinds of socioeconomic attributes.

I’m doing this project ‘just for fun’, purely as an evening/weekend side-project. I’m drawn to this kind of project because there is a mix of problems to be solved that engage different parts of my brain: data wrangling/modeling/analytics, infographics and UI/UX design, and software engineering.  In the short term, I’ll embark on a series of exploratory spikes as I make my way through the fundamentals of geographical analytics and figure out how to best achieve the desired features. I’ve already begun working my way through various articles and books.

In parallel to the project blog, I’m also maintaining a GitHub repo of my code. Initially I’ll be tapping into the scientific computing platform Mathematica (recently rebranded as the Wolfram Language), which is my go-to tool for these kinds of projects (I worked at Wolfram for nearly six years before pivoting my career into mobile app development). At the same time I’ll be learning Python, since there are so many open-source geo-processing tools available for it. As a side effect, it will be useful to have two distinct implementations to verify and validate as I go along.

Strange Loop Recap

Avi Bryant presenting at Strange Loop on using group theory to help streamline data aggregation (photo: J. Williams).

Back in September I attended the fifth annual Strange Loop conference here in St. Louis. Strange Loop, which has been called the "TED for programmers", is a tech conference for software developers covering a range of both applied and academic topics in computer science. The conference was founded in 2009 by Alex Miller (@puredanger), and has grown in attendance from around 300 people the first year to over 1100 people this year. 

Activities began on Wednesday, Sept. 18, at Union Station with a pre-conference day of hands-on workshops broken into three-hour-long morning and afternoon sessions. I attended workshops with a focus on Arduino and RaspberryPi hardware. There was also what has become a hallmark of Strange Loop - the Emerging Languages Camp (which I didn't attend), covering a range of new languages such as Gershwin and Noether. On Wednesday evening, time was reserved for the so-called 'Unsessions', which are sessions informally organized by attendees using a GitHub wiki page. I attended a presentation on data mining the source code of large-scale projects.

The regular conference took place at the Peabody Opera House on Thursday and Friday. Each day opened and closed with a single keynote in the main auditorium, and then proceeded with five or six hour-long presentations running in parallel throughout the day. Presentations were grouped into eight different tracks:  

  • Keynotes (4)
  • Languages (19)
  • Systems (12)
  • Web (10)
  • Fundamentals (7)
  • Tools (3)
  • Mobile (3)
  • /Etc (8)

Videos of the presentations have been made available in raw form for conference attendees, and will be made generally available in a more polished form within the next few months.

There was also a conference party on Thursday evening at the City Museum, a unique synthesis of interactive explorable sculptures and architectural objects and one of St. Louis's most cherished cultural attractions (which happens to be a couple of blocks from my apartment downtown). The conference concluded with a keynote by Douglas Hofstadter, followed by a related theatrical musical performance written by David Stutz titled “Thrown for a Loop: A Carnival of Consciousness”. It was entertaining.

I enjoyed the diversity of topics, but found myself gravitating toward presentations on what I guess you could call metaprogramming: tools and techniques for analyzing code and improving the development process (marked below with a ★). Rather than give you a top-level synthesis, I list the sessions I attended, with brief notes on any useful links or nuggets gleaned from the presentations.


- Multilingual RaspberryPi Cooking Class by Steve Chin (@steveonjava): For this workshop, Steve provided each participant with a hardware pack to use during the session (containing a RaspberryPi with a Pibow case, a Chalkboard Electronics touchscreen, and various connectors/adaptors). We essentially followed along through a tutorial available from Steve's blog. The Pi board came pre-installed with Linux and the Java 8 for ARM SDK.

- Hardware Hacking For The Rest Of Us by Kipp Bradford (@kippworks): This workshop covered the basics of an Arduino and XBee-based sensor network. Participants had to purchase the hardware kit (available here; summary of parts here) if they wanted a hands-on experience building the project. I didn't end up getting one, so I just followed along as a spectator.

- ★ Code Archeology by Paul Slusarz (@pslusarz), slides: Paul shared results from a few different projects that essentially data-mined a large codebase (over 1000 projects and 150000 files). He looked at metrics like lines of code per file, file connectivity (imports), and class connectivity (references). He showed a practical use case of simplifying a Java build system by paring down the module dependency graph. He referenced the 'Code Archeology' podcast by Dave Thomas (see also this article). The topic of code quality arose, with mentions of tools such as SonarQube and TattleTale, as well as clone detection tools like CloneDigger. The analytics engine Splunk also looks relevant for code analytics.

Day One

- Machine Learning for Relevance and Serendipity by Jenny Finkel (@jrfinkel), video: She's the chief software architect at Prismatic, a news aggregation service. She reviewed some of the tools and techniques they've used in solving the challenges of developing their content recommendation system.

- ★ Visualization Driven Development (VDD) by Jason Gilman (@jasongilman), video: A fascinating talk about building and using tools to visualize code execution to help debug and better understand code logic. He presented a few examples, including visualizing a quick-sort algorithm. Jason has built a Clojure library with some core VDD tools. He was inspired by earlier IDE work by Bret Victor and Chris Granger (who also gave an interesting talk described below).

- Graph Computing at Scale by Matthias Broecheler (@mbroecheler), video: This was about building a scalable graph database, with a focus on the Titan system, on which Matthias is lead developer at Aurelius. Some of his sample queries to motivate a graph database model, such as 'the degree of the wife of the president of the U.S.' or 'the average term length of presidents since 1980', reminded me of work my colleagues and I had done when I was a developer at Wolfram Alpha (e.g. try the query: 'population birthplace of steven spielberg'). He also briefly described the Faunus graph analytics engine that can sit on top of Titan.

- The History of Women in Technology by Sarah Dutkiewicz (@sadukie), video: Nice talk reviewing the contributions of about a dozen or so women, including Ada Lovelace and Grace Murray Hopper.

- ★ How Does Text Become Data? by Catherine Havasi (@LuminosoInsight) and Rob Speer, video: This talk about mining textual data was one of my favorite sessions of the conference, the kind of talk that makes you want to run home and start geeking out with the tools they demonstrated (e.g. the Python NLTK library and the ConceptNet API). If you are at all interested in this area, go check out the Python code they've made available on GitHub, which walks you through the examples they cover in the talk (on topics such as classification, document similarity and search). Also go have a look at the book 'Natural Language Processing with Python'.

- ★ Xiki: GUI and Text Interfaces are Converging by Craig Muth (@xiki), video: This talk kind of blew my mind. Xiki (pronounced 'Zik-ee') is described as a 'shell console with GUI features', but that doesn't begin to convey the breadth of its functionality. Just go watch the video to witness its awesomeness as he steps through many different use cases. You can download the Ruby-based tool from the GitHub repo.

- Creative Machines by Joseph Wilk (@josephwilk), video: Joseph is a developer at SoundCloud and has an interest in algorithmic music creation. After giving a brief summary of attempts to quantify creativity and reviewing some past work in this area, such as the AARON program of Harold Cohen, he walked us through some of his own work generating music using the Clojure-based library Overtone (go check out his GitHub fork). A couple of useful books he mentioned: 'Virtual Music: Computer Synthesis of Musical Style' and 'Computer Models of Musical Creativity', both by David Cope.

- Redesigning the Interface: Making Software Development Make Sense to Everyone by Jen Myers (@antiheroine), video: An inspirational presentation that, at its core, was about designing more effective ways to educate and train people to code, and to make our field more accessible to those who have traditionally been met with disproportionate challenges. She's an instructor at Dev Bootcamp in Chicago, and co-founder of the Girl Develop It chapter in Columbus, Ohio.

Day Two

- The Trouble With Types by Martin Odersky (@odersky), video: Great talk by the inventor of Scala. He started out with an overview of type systems, then focussed in on his recent work on dependent object types (DOT) and briefly described an experimental language called Dotty based on those developments, which could pave the way for a future version of Scala. I really appreciated the point he makes early in the talk that a balance of constraints helps lead to better design, and that static typing can be tuned to provide the right level of constraints for architecting code.

- Add ALL the Things: Abstract Algebra Meets Analytics by Avi Bryant (@avibryant), video: Excellent talk walking through various practical applications of data aggregation calculations (e.g. summing, finding the maximum, computing the mean, etc) and illustrating how these common operations possess the algebraic properties of a commutative monoid (or an abelian group in some cases). By making this connection, one can codify these operations into a single abstraction that can result in a more efficient and extensible infrastructure. He references the Scala library by Twitter called Algebird (see also the related Summingbird), and an extension of it he wrote called Simmer. Also check out the recent Aggregate Knowledge data science talks, which have a similar theory/application flavor.

- Exercises in Style by Crista Lopes (@cristalopes), video: In fields such as art or literature, it's well accepted that there is an evolving set of distinct styles (e.g. impressionism, cubism, etc). As a source of inspiration, she takes the French book by Raymond Queneau titled 'Exercises in Style'. It's a collection of writings, each describing the same basic story, but written in different styles. She applies this same idea to the world of programming: define a single computational task - compute the term frequency of words in a body of text - and then implement it using different programming styles. If you are interested, she has made this collection available on GitHub. Wonderful presentation.

- Spanner - Google's Distributed Database by Sebastian Kanthak, video: Impressive technology underlying Google's Spanner infrastructure.

- ★ Thinking DSL's for Massive Visualization by Leo Meyerovich (@lmeyerov), video: Interesting talk about code synthesis and real-time big data visualizations. He showed some examples using the theorem prover Z3 and the Racket language. See also his homepage.

- ★ Finding a Way Out by Chris Granger (@ibdknox), video: Creator of the Light Table IDE, he motivates the need for a new data-driven approach to programming that is more direct and observable and then proceeds to demo a new project he's been working on called Aurora. As far as I can tell, it has a lot of similarities to what one can currently do within a Mathematica notebook using the Dynamic construct. 
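
To make the commutative-monoid idea from Avi Bryant's talk a bit more concrete, here is a minimal Python sketch. It is only an illustration, not code from the talk or from Algebird: the `Monoid` class and the names `SUM`, `MAX`, `MEAN`, and `aggregate` are hypothetical, and the sample numbers are made up.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Monoid:
    """An identity element plus an associative combine function."""
    zero: Any
    plus: Callable[[Any, Any], Any]


SUM = Monoid(0, lambda a, b: a + b)
MAX = Monoid(float("-inf"), max)
# A mean is not a monoid by itself, but (total, count) pairs are.
MEAN = Monoid((0.0, 0), lambda a, b: (a[0] + b[0], a[1] + b[1]))


def aggregate(monoid: Monoid, values: Iterable[Any]) -> Any:
    """Fold the values together, starting from the monoid's identity."""
    return reduce(monoid.plus, values, monoid.zero)


# Because combining is associative and commutative, partial results computed
# on separate chunks (e.g. on different machines) can be merged in any order.
chunk_a = [3, 1, 4]
chunk_b = [1, 5, 9]

total = SUM.plus(aggregate(SUM, chunk_a), aggregate(SUM, chunk_b))
largest = MAX.plus(aggregate(MAX, chunk_a), aggregate(MAX, chunk_b))
pair = MEAN.plus(
    aggregate(MEAN, [(x, 1) for x in chunk_a]),
    aggregate(MEAN, [(x, 1) for x in chunk_b]),
)
mean = pair[0] / pair[1]
```

The same `aggregate` function serves all three calculations; only the identity element and the combine function change, which is the single abstraction the talk argues for.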


Interactive maps with Leaflet

One of my current side projects involves visualizing crime activity in St. Louis and other cities. A key component of the project is being able to represent the density of crime incidents on a map, and so I've been exploring open source mapping APIs. The best one I've come across so far is a JavaScript library called Leaflet for displaying interactive map layers. The above is an example I created using an OpenStreetMap layer, with a popup marker I placed pointing to my favorite part of City Garden in St. Louis.

My end goal is to build an iOS app with an elegant, interactive crime map. In the coming weeks, I'll use my blog as a testing ground as I explore the various features relevant to my project. The next thing I plan on doing is pulling in a GeoJSON dataset with block-level polygons color-coded by level of crime activity.
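
As a rough idea of what the color-coding step could look like, here is a small Python sketch that assigns each block polygon a fill color from its incident count. Everything specific in it is an assumption made for illustration: the `incidents` and `fill` property names, the equal-width classes, the color ramp, and the two made-up square "blocks".

```python
import json

# Light-to-dark color ramp; the specific hex values are an arbitrary choice.
COLORS = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def color_for(count, low, high, colors=COLORS):
    """Map a count to a color using equal-width classes between low and high."""
    if high == low:
        return colors[0]
    fraction = (count - low) / (high - low)
    index = min(int(fraction * len(colors)), len(colors) - 1)
    return colors[index]


def add_fill_colors(collection, count_key="incidents"):
    """Add a 'fill' property to every feature, based on its incident count."""
    counts = [f["properties"][count_key] for f in collection["features"]]
    low, high = min(counts), max(counts)
    for feature in collection["features"]:
        count = feature["properties"][count_key]
        feature["properties"]["fill"] = color_for(count, low, high)
    return collection


# A tiny made-up FeatureCollection; real block polygons have many more vertices.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"block": "A", "incidents": 2},
   "geometry": {"type": "Polygon", "coordinates": [[[0,0],[0,1],[1,1],[1,0],[0,0]]]}},
  {"type": "Feature", "properties": {"block": "B", "incidents": 40},
   "geometry": {"type": "Polygon", "coordinates": [[[1,0],[1,1],[2,1],[2,0],[1,0]]]}}
]}
"""

styled = add_fill_colors(json.loads(geojson_text))
fills = [f["properties"]["fill"] for f in styled["features"]]
```

A map layer could then read the `fill` property when styling each polygon. Equal-width classes are the simplest choice; quantile classes may work better when a few blocks have far more incidents than the rest.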

Tracking my heart health

In August I purchased a Withings blood pressure monitor. It's an inflatable arm band that connects with an iPhone and pairs with the Withings Health Mate app. It can take a reading of your systolic and diastolic blood pressure as well as your heart rate. 

So, for about the past three months I've been tracking my blood pressure and heart rate. I take three separate readings each morning and evening (so, six total). In the two plots below, I show the results so far. The colored lines are mean values, while the gray shaded region shows the range between the minimum and maximum values (of the six values taken each day). I'm also applying a Gaussian smoother (with a width of about 2 days). The plots were done using Mathematica by importing the raw data from the Withings app.
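
The following is a rough Python sketch of the daily summary and smoothing steps described above. It is an illustration only, not the Mathematica code behind the plots: the readings are made up, the function names are hypothetical, and the "width of about 2 days" is interpreted here as the standard deviation of the Gaussian kernel.

```python
import math
from statistics import mean


def daily_summary(readings_by_day):
    """Return (mean, min, max) for each day's list of readings."""
    return [(mean(day), min(day), max(day)) for day in readings_by_day]


def gaussian_smooth(values, sigma=2.0):
    """Smooth a daily series with a normalized Gaussian kernel.

    sigma is the kernel's standard deviation in days (an assumed reading
    of the "width of about 2 days"). Weights are renormalized at each
    point so the edges of the series are not pulled toward zero.
    """
    smoothed = []
    for i in range(len(values)):
        weights = [math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
                   for j in range(len(values))]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Made-up systolic readings: six per day (three morning, three evening).
systolic = [
    [128, 131, 126, 134, 130, 129],
    [125, 127, 124, 133, 131, 128],
    [130, 132, 129, 136, 135, 131],
]

summary = daily_summary(systolic)
daily_means = [m for m, _, _ in summary]
smoothed_means = gaussian_smooth(daily_means, sigma=2.0)
```

The min and max columns of `summary` give the edges of the shaded band, and `smoothed_means` gives the colored line. The same smoother could be applied to the band edges if a smooth envelope is wanted.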

According to the American Heart Association, I'm treading into the realm of so-called prehypertension, which is pretty disconcerting. There are lifestyle changes one can make to help lower one's blood pressure, like losing weight, exercising, and cutting back on alcohol intake. I've got plenty of room to improve in all three of these areas. Hopefully when I report back in three months I can show some improvement.


Unlocking the power of D3.js

A couple of months ago, I attended a workshop on D3.js at the St. Louis Machine Learning & Data Science meetup, which got me really excited to start playing around with it. It's a JavaScript library for building interactive data visualizations on the web. I'm only now getting around to exploring this incredible tool and wanted to try to get a visualization to work within my Squarespace blog. I came across a helpful blog post by Toke Frello that explained (mostly) how to do it, which turns out to be rather straightforward.

In the above demo (you can grab the nodes and move them around), I'm using an example taken from the book 'Interactive Data Visualization for the Web' by Scott Murray (Ch. 11, ex 4 to be specific), which I first learned about through the workshop. The key steps to get it to work properly within my blog were:

  1. Upload the d3.js library file, and then add the following to the 'Code Injection' section under advanced 'Blog Settings': <script src="/s/d3.js"></script>. You may also want to inject CSS styling there. This step only has to be done once.
  2. Create a code block in your blog post. Start the block with a <div> tag that you can reference in the JavaScript code, e.g. <div class="mydiv"></div>, and then paste in the visualization code, i.e. the entire <script type="text/javascript"> element.
  3. The key trick was to change d3.select("body") to d3.select(".mydiv"), which will then place the svg element within the custom div tag.

That's all there is to it.  I'm excited to start creating my own visualizations!

Move to the Mound City

View of St. Louis from the Cahokia Mounds.

Back in February of this year I moved to Saint Louis, MO from Champaign, IL where I had been living for about six years. Over the course of my adult life I have found myself uprooting and journeying to a new location in pursuit of new career opportunities: from Iowa City, IA (5 yrs) to Boulder, CO (5 yrs) to Toronto, ON, Canada (2 yrs) to Washington, D.C. (yrs) to Champaign, IL (6 yrs) to St. Louis, MO. My new position is with the agile-based software company Asynchrony in downtown St. Louis, doing mainly iOS development, but also getting the opportunity to learn Java alongside seasoned veterans.

More than six months later, I'm feeling very much at home in this city, which has so much to offer in terms of personal growth (a wide array of tech meetup groups and a very active entrepreneurial community) and entertainment (restaurants, breweries, festivals, concerts, sports, etc). I'm also finding working at Asynchrony stimulating, and I appreciate their insistence on a work/life balance, which allows me to pursue projects in my spare time. 

Open Data STL

Last month a new meetup group called Open Data STL was launched with the goal of bringing developers and local civic groups together to facilitate providing citizens access to local civic/government data and to aid in the development of tools to make use of that data.  The group meets monthly at the T-Rex tech incubator space.

For the second meeting, I prepared a few slides to try to help gain some insight into existing open data initiatives of other cities from across the United States. It turns out that most cities seem to be using the Socrata open data platform, and so I focussed my attention on exploring civic data sets in their repository.  

The Story of an App

Earlier this year, I had the opportunity to speak at the 2013 Mobile Development Day  conference at the University of Illinois Research Park. I gave a presentation about the development process we used at mpressInteractive to build the Spanish-English Dictionary iPhone app for the University of Chicago Press. It was a multi-session conference with over forty speakers covering technical, UI/UX, and entrepreneurial topics on mobile development. You can find a write-up about it in the Daily Illini here and a photo album here from the Research Park Facebook page.

In my presentation I try to give a broad, over-arching picture of the development process that unfolded in the summer of 2012. After giving some background on the saturated app space for multilingual dictionaries, I describe our collaborative workflow with the University of Chicago Press, which had many elements of an agile development process, such as short iterations and regular interaction with our client to get feedback. I also cover many of the design and software engineering challenges we solved in order to deliver the app in just under four months. You can find the app on the Apple App Store. I think it's one of the better Spanish-English dictionary apps available.

My first hackathon

Last month (6/1 & 6/2) I participated in a two-day hackathon at the T-Rex tech incubator in downtown St. Louis. It was part of the nationwide National Day of Civic Hacking, held in collaboration with Random Hacks of Kindness. The goal was to bring local developers and civic leaders/citizens together to identify specific problems facing the city, and then form teams to work on building a tool to help solve each problem, with a focus on making use of open data. You can find a write-up about the event here, as well as coverage in the local KSDK TV news here.

The event was kicked off by the main organizers Drew Winship (CEO of Juristat) and Jon Leek (of IDC Projects), who outlined the plan for the weekend (see also  @HackforStL).  We first gathered into groups to brainstorm possible projects, and then voted on which ones to pursue and then formed project teams. I had gone in hoping to do a project related to crime data, and it turned out there were several other developers who also were interested in giving better access and insight to local neighborhood data.

The core idea we came up with was to build a tool to let citizens better understand the general livability profile of their neighborhoods. We would do this by aggregating various types of data, such as crime stats, socio-economic indicators, and proximity data (i.e. how close is the nearest grocery store, park, cafe, etc), and then assign block-level regions a normalized set of values according to how prominent a given feature is for that region. This then would allow us to cluster regions together based on this livability profile. Our short term objective for the weekend was to build a simple prototype web app focusing only on crime data, while our longer term goal is to develop an extensible data aggregation framework that could support a larger app ecosystem. We made a valiant effort, but didn't get quite as far as we'd hoped. I think we did a great job, given that none of us had ever met before and we had no pre-conceived project plan before starting.
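
The normalization step of that idea can be sketched in a few lines of Python. This is a hedged illustration rather than the hackathon code: the block names, the three features, and all of the numbers are made up, min-max scaling is just one possible way to normalize, and the sketch assumes every block has a value for every feature.

```python
import math


def normalize_features(blocks):
    """Min-max scale each feature to [0, 1] across all blocks."""
    features = sorted({name for values in blocks.values() for name in values})
    scaled = {block_id: {} for block_id in blocks}
    for name in features:
        column = [values[name] for values in blocks.values()]
        low, high = min(column), max(column)
        span = high - low
        for block_id, values in blocks.items():
            # A feature with no variation carries no information; map it to 0.
            scaled[block_id][name] = (values[name] - low) / span if span else 0.0
    return scaled


def profile_distance(a, b):
    """Euclidean distance between two normalized profiles."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


# Made-up raw values: incident counts, median income, km to a grocery store.
raw = {
    "block_a": {"crime": 12, "income": 41000, "grocery_km": 0.4},
    "block_b": {"crime": 30, "income": 28000, "grocery_km": 1.9},
    "block_c": {"crime": 14, "income": 39000, "grocery_km": 0.6},
}

profiles = normalize_features(raw)
nearest_to_a = min(
    (b for b in profiles if b != "block_a"),
    key=lambda b: profile_distance(profiles["block_a"], profiles[b]),
)
```

Once every block has a profile on a common scale, the pairwise distances can feed any standard clustering method to group blocks with similar livability profiles.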

At the end of the weekend, the participating teams gathered together to present their projects and results. All of the teams did great work, but the winning team really stood out, executing well on a creative idea to make it much easier for homeless persons to find shelter and assistance. The team has since created a web page.

SXSWi 2013 Recap

Back in March (3/7 - 3/13) I had a great time attending SXSW Interactive for the second year in a row (last year I wrote a short blog post as well). A couple of weeks ago I gave a 'lunch & learn' presentation at work giving an overview of SXSW and summarizing the highlights of my trip. For me, SXSW is all about broadening my perspective and exploring new ideas, and although it's somewhat of an overwhelming firehose of information to try to grok, I always come away from the conference feeling energized and inspired. Many of the sessions were recorded and are available as audio streams from SoundCloud here.

This year I found the sessions/events I attended to fall roughly into one of three main recurring themes: 'Hardware is the New Software', 'Harnessing Big Data', and 'Collaboration/Innovation'.  Here is a categorized list of the sessions I attended (marked with a ☆), along with some other interesting ones I couldn't make it to, with links to slide decks, video/audio recordings, etc.

  Hardware is the New Software

 3D Printing


Internet of Things 

Wearable Tech 

  Harnessing Big Data

Quantified Self 

Data Science 

Personalized Health 

Collaboration / Innovation 

Lean / Agile  (Lean Startup mini-conference)


Sharing Economy