The Quantified Car

The Automatic device plugs into the diagnostics port of your car to track your driving habits by monitoring your trips, and gives you visual and audio feedback through the accompanying mobile app.

Earlier this month I began an offsite project at work, which means I'll be spending an hour commuting to and from work every day. I decided to finally get rid of my '99 VW Beetle and replace it with a new Subaru Crosstrek, which I'd had my eye on for over a year. That very same day, I drove to the Apple Store to pick up an Automatic device to track my driving habits. It plugs into the diagnostics port of your car (typically located on the driver's side under the dashboard). The device interfaces with your car's onboard computer system and can record data such as speed, gas mileage, hard braking, and hard acceleration. The accompanying mobile app is beautifully designed and shows a summary of data for every trip alongside a map of the route, obtained using the GPS in your phone.

The primary use of the device is to improve your fuel efficiency. Driving at speeds higher than around 60 MPH, as well as hard braking and accelerating, all have a negative impact on the fuel efficiency of your car. By monitoring these parameters and alerting you when you operate outside the optimal range, the device helps you adapt your driving habits to reduce your car's fuel consumption. That's pretty cool, but for me personally, being a data nerd, I'm excited to be collecting this data (which should be accessible through their API) and look forward to playing with it down the road, so to speak.
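To make the idea concrete, here is a minimal Python sketch of how events like these could be flagged from a series of speed readings. This is not Automatic's algorithm or API: the `Sample` type, the `flag_events` function, and the two acceleration thresholds are all hypothetical, chosen only for illustration. The 60 MPH figure is the one mentioned above.

```python
from dataclasses import dataclass

# Assumed thresholds for illustration only; the device's real rules are not known here.
SPEED_THRESHOLD_MPH = 60.0     # "around 60 MPH", as mentioned in the post
HARD_ACCEL_MPH_PER_S = 7.0     # hypothetical cutoff for hard acceleration
HARD_BRAKE_MPH_PER_S = -7.0    # hypothetical cutoff for hard braking


@dataclass
class Sample:
    t: float    # seconds since the start of the trip
    mph: float  # speed at that moment


def flag_events(samples):
    """Return (time, label) pairs for hard braking, hard acceleration, and speeding."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        rate = (cur.mph - prev.mph) / dt  # change in speed, in MPH per second
        if rate <= HARD_BRAKE_MPH_PER_S:
            events.append((cur.t, "hard brake"))
        elif rate >= HARD_ACCEL_MPH_PER_S:
            events.append((cur.t, "hard acceleration"))
        # Report speeding once, at the moment the threshold is crossed.
        if prev.mph <= SPEED_THRESHOLD_MPH < cur.mph:
            events.append((cur.t, "over 60 MPH"))
    return events


# Made-up trip data, not real readings from the device.
trip = [Sample(0, 0), Sample(1, 8), Sample(2, 12), Sample(3, 3),
        Sample(10, 40), Sample(20, 62)]
events = flag_events(trip)
```

With real data, the interesting part would be tuning the thresholds and seeing how the number of flagged events per trip changes over time.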

Map Sandbox Project

A choropleth map showing crime incident levels in Pittsburgh census blocks. Original 2008 Pittsburgh incident data from GIS Tutorial for Crime Analysis; Census blocks from City of Pittsburgh.

Earlier this year I became interested in geospatial visualization and analysis, and so began a self-guided study of the field in my spare time, focussing on crime mapping. I recently kicked off a project blog hosted on GitHub Pages to document my progress: http://jamieinfinity.github.io/mapsandbox.

I’ve always found the best way to learn a new topic or technology is to build something, a tool of some sort, that drives the learning process and provides a conceptual scaffolding upon which emerging concepts can grow. So I’ve been working on an idea for a mobile and/or web app for visualizing a side-by-side comparison of neighborhood livability metrics. I’ll start with crime statistics to keep the scope manageable, but what I’m building should be extensible to other kinds of socioeconomic attributes.

I’m doing this project ‘just for fun’, purely as an evening/weekend side-project. I’m drawn to this kind of project because there is a mix of problems to be solved that engage different parts of my brain: data wrangling/modeling/analytics, infographics and UI/UX design, and software engineering.  In the short term, I’ll embark on a series of exploratory spikes as I make my way through the fundamentals of geographical analytics and figure out how to best achieve the desired features. I’ve already begun working my way through various articles and books.

Alongside the project blog, I’m also maintaining a GitHub repo of my code. Initially I’ll be tapping into the scientific computing platform Mathematica (recently rebranded as the Wolfram Language), which is my go-to tool for these kinds of projects (I worked at Wolfram for nearly six years before pivoting my career into mobile app development). In parallel I’ll be learning Python, since there are so many open-source geo-processing tools available for it. As a side effect, it will be useful to have two distinct implementations to verify and validate against each other as I go along.

Strange Loop Recap

Avi Bryant presenting at Strange Loop on using group theory to help streamline data aggregation (photo: J. Williams).

Back in September I attended the fifth annual Strange Loop conference here in St. Louis. Strange Loop, which has been called the "TED for programmers", is a tech conference for software developers covering a range of both applied and academic topics in computer science. The conference was founded in 2009 by Alex Miller (@puredanger), and has grown in attendance from around 300 people the first year to over 1100 people this year. 

Activities began on Wednesday, Sept. 18, at Union Station with a pre-conference day of hands-on workshops broken into three-hour-long morning and afternoon sessions. I attended workshops with a focus on Arduino and RaspberryPi hardware. There was also what has become a hallmark of Strange Loop - the Emerging Languages Camp (which I didn't attend), covering a range of new languages such as Gershwin and Noether. On Wednesday evening, time was reserved for the so-called 'Unsessions', which are sessions informally organized by attendees using a GitHub wiki page. I attended a presentation on data mining the source code of large-scale projects.

The regular conference took place at the Peabody Opera House on Thursday and Friday. Each day opened and closed with a single keynote in the main auditorium, and then proceeded with five or six hour-long presentations running in parallel throughout the day. Presentations were grouped into eight different tracks:  

  • Keynotes (4)
  • Languages (19)
  • Systems (12)
  • Web (10)
  • Fundamentals (7)
  • Tools (3)
  • Mobile (3)
  • /Etc (8)

Videos of the presentations have been made available in raw form at infoq.com for conference attendees, and will be made generally available there in a more polished form within the next few months.

There was also a conference party on Thursday evening at the City Museum, a unique synthesis of interactive explorable sculptures and architectural objects, and one of St. Louis's most cherished cultural attractions (which happens to be a couple of blocks from my apartment downtown). The conference concluded with a keynote by Douglas Hofstadter, followed by a related theatrical musical performance written by David Stutz titled “Thrown for a Loop: A Carnival of Consciousness”. It was entertaining.

I enjoyed the diversity of topics, but found myself gravitating toward presentations on what I guess you could call meta programming: tools and techniques for analyzing code and improving the development process (marked below with a ★). Rather than give you a top-level synthesis, I list the schedule of sessions I attended with scant notes providing any useful links or nuggets gleaned from the presentations.

Pre-conference

- Multilingual RaspberryPi Cooking Class by Steve Chin (@steveonjava): For this workshop, Steve provided each participant with a hardware pack to use during the session (containing a RaspberryPi with a Pibow case, a Chalkboard Electronics touchscreen, and various connectors/adaptors). We essentially followed along through a tutorial available from Steve's blog. The Pi board came pre-installed with Linux and the Java 8 for ARM SDK.

- Hardware Hacking For The Rest Of Us by Kipp Bradford (@kippworks): This workshop covered the basics of an Arduino and XBee-based sensor network. Participants had to purchase the hardware kit (available here; summary of parts here) if they wanted a hands-on experience building the project. I didn't end up getting one, so I just followed along as a spectator.

- ★ Code Archeology by Paul Slusarz (@pslusarz), slides: Paul shared results from a few different projects that essentially involved data mining a large codebase (over 1000 projects and 150000 files). He looked at metrics like lines of code per file, file connectivity (imports), and class connectivity (references). He showed a practical use case of simplifying a Java build system by paring down the module dependency graph. He referenced the 'Code Archeology' podcast by Dave Thomas (see also this article). The topic of code quality arose, with mentions of tools such as SonarQube and TattleTale, as well as clone detection tools like CloneDigger. The analytics engine Splunk also looks relevant for code analytics.

Day One

- Machine Learning for Relevance and Serendipity by Jenny Finkel (@jrfinkel), video: She's the chief software architect at Prismatic, a news aggregation service. She reviewed some of the tools and techniques they've used in solving the challenges of developing their content recommendation system.

- ★ Visualization Driven Development (VDD) by Jason Gilman (@jasongilman), video: A fascinating talk about building and using tools to visualize code execution to help debug and better understand code logic. He presented a few examples, including visualizing a quick-sort algorithm. Jason has built a Clojure library with some core VDD tools. He was inspired by earlier IDE work by Bret Victor and Chris Granger (who also gave an interesting talk described below).

- Graph Computing at Scale by Matthias Broecheler (@mbroecheler), video: This was about building a scalable graph database, with a focus on the Titan system, on which Matthias is lead developer at Aurelius. Some of his sample queries to motivate a graph database model, such as 'the degree of the wife of the president of the U.S.' or 'the average term length of presidents since 1980', reminded me of work my colleagues and I had done when I was a developer at Wolfram Alpha (e.g. try the query: 'population birthplace of steven spielberg'). He also briefly described the Faunus graph analytics engine that can sit on top of Titan.

- The History of Women in Technology by Sarah Dutkiewicz (@sadukie), video: Nice talk reviewing the contributions of about a dozen or so women, including Ada Lovelace and Grace Murray Hopper.

- ★ How Does Text Become Data? by Catherine Havasi (@LuminosoInsight) and Rob Speer, video: This talk about mining textual data was one of my favorite sessions of the conference, the kind of talk that makes you want to run home and start geeking out with the tools they demonstrated (e.g. the Python NLTK library and the ConceptNet API). If you are at all interested in this area, go check out the Python code they've made available on GitHub, which walks you through the examples they cover in the talk (on topics such as classification, document similarity and search). Also go have a look at the book 'Natural Language Processing with Python'.

- ★ Xiki: GUI and Text Interfaces are Converging by Craig Muth (@xiki), video: This talk kind of blew my mind. Xiki (pronounced 'Zik-ee'), is described as a 'shell console with GUI features', but that doesn't begin to convey the breadth of its functionality. Just go watch the video to witness its awesomeness as he steps through many different use cases. You can learn more at xiki.org and download the Ruby-based tool from the GitHub repo.

- Creative Machines by Joseph Wilk (@josephwilk), video: Joseph is a developer at SoundCloud and has an interest in algorithmic music creation. After giving a brief summary quantifying creativity and reviewing some past work in this area, such as the AARON program of Harold Cohen, he walked us through some of his own work generating music using the Clojure-based library Overtone (go check out his GitHub fork). A couple useful books he mentioned: 'Virtual Music: Computer Synthesis of Musical Style' and 'Computer Models of Creativity', both by David Cope.

- Redesigning the Interface: Making Software Development Make Sense to Everyone by Jen Myers (@antiheroine), video: An inspirational presentation that, at its core, was about designing more effective ways to educate and train people to code, and to make our field more accessible to those who have traditionally been met with disproportionate challenges. She's an instructor in Chicago at the DevBootcamp, and co-founder of the Girl Develop It chapter in Columbus, Ohio.

Day Two

- The Trouble With Types by Martin Odersky (@odersky), video: A great talk by the inventor of Scala. He started out with an overview of type systems, then focussed in on his recent work on dependent object types (DOT) and briefly described an experimental language called Dotty based on those developments, which could pave the way for a future version of Scala. I really appreciated the point he makes early in the talk that a balance of constraints helps lead to better design, and that static typing can be tuned to provide the right level of constraints for architecting code.

- Add ALL the Things: Abstract Algebra Meets Analytics by Avi Bryant (@avibryant), video: Excellent talk walking through various practical applications of data aggregation calculations (e.g. summing, finding the maximum, computing the mean, etc) and illustrating how these common operations possess the algebraic properties of a commutative monoid (or an abelian group in some cases). By making this connection, one can codify these into a single abstraction that can result in a more efficient and extensible infrastructure. He references the Scala library by Twitter called Algebird (see also the related Summingbird), and an extension of it he wrote called Simmer. Also check out the recent Aggregate Knowledge data science talks, which have a similar theory/application flavor.

- Exercises in Style by Crista Lopes (@cristalopes), video: In fields such as art or literature, it's well accepted that there is an evolving set of distinct styles (e.g. impressionism, cubism, etc). As a source of inspiration, she takes the French book by Raymond Queneau titled 'Exercises in Style'. It's a collection of writings, each describing the same basic story, but written in different styles. She applies this same idea to the world of programming: define a single computational task - compute the term frequency of words in a body of text - and then implement it using different programming styles. If you are interested, she has made this collection available on GitHub. Wonderful presentation.

- Spanner - Google's Distributed Database by Sebastian Kanthak, video: Impressive technology underlying Google's Spanner infrastructure.

- ★ Thinking DSL's for Massive Visualization by Leo Meyerovich (@lmeyerov), video: Interesting talk about code synthesis and real-time big data visualizations. He showed some examples using the theorem prover Z3 and the Racket language. See also his homepage.

- ★ Finding a Way Out by Chris Granger (@ibdknox), video: Creator of the Light Table IDE, he motivates the need for a new data-driven approach to programming that is more direct and observable and then proceeds to demo a new project he's been working on called Aurora. As far as I can tell, it has a lot of similarities to what one can currently do within a Mathematica notebook using the Dynamic construct. 
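
As a footnote to Avi Bryant's talk above, here is a minimal Python sketch of the monoid idea as I understood it. It is not Algebird's API and not code from the talk. The `Monoid` class, the `aggregate` helper, and the sample numbers are all my own hypothetical names and data. The point is that once an aggregation is expressed as an identity element plus an associative combine function, the same machinery can fold partial results from separate chunks of data in any grouping.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element plus an associative combine function (hypothetical helper)."""
    zero: T
    plus: Callable[[T, T], T]

    def combine_all(self, items: Iterable[T]) -> T:
        return reduce(self.plus, items, self.zero)


# Three aggregations expressed with the same abstraction.
add = Monoid(0, lambda a, b: a + b)
maximum = Monoid(float("-inf"), max)
# The mean is not directly combinable, but (count, total) pairs are.
mean_parts = Monoid((0, 0.0), lambda a, b: (a[0] + b[0], a[1] + b[1]))


def aggregate(monoid, prepare, chunks):
    """Fold each chunk on its own, then fold the partial results together."""
    partials = [monoid.combine_all(prepare(x) for x in chunk) for chunk in chunks]
    return monoid.combine_all(partials)


# Made-up data, split into chunks as if it lived on different machines.
chunks = [[3, 1, 4], [1, 5], [9, 2, 6]]

total = aggregate(add, lambda x: x, chunks)
largest = aggregate(maximum, lambda x: x, chunks)
count, subtotal = aggregate(mean_parts, lambda x: (1, float(x)), chunks)
average = subtotal / count
```

The mean is the instructive case: it only fits the pattern after being prepared as a (count, total) pair, with the division deferred to the very end.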

 

Interactive maps with Leaflet

One of my current side projects involves visualizing crime activity in St. Louis and other cities. A key component of the project is being able to represent the density of crime incidents on a map, and so I've been exploring open source mapping APIs. The best one I've come across so far is a JavaScript library called Leaflet for displaying interactive map layers. The above is an example I created using an OpenStreetMap layer, with a popup marker I placed pointing to my favorite part of City Garden in St. Louis.

My end goal is to build an iOS app with an elegant, interactive crime map. In the coming weeks, I'll use my blog as a testing ground as I explore the various features relevant to my project. The next thing I plan on doing is pulling in a GeoJSON dataset with block-level polygons color-coded by level of crime activity.
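
As a rough idea of what that GeoJSON dataset might look like, here is a small Python sketch that builds a FeatureCollection of block polygons and attaches a color to each one based on its incident count. Everything specific here is an assumption for illustration: the block IDs, coordinates, and counts are made up, and the `fill` property name, the three-color palette, and the equal-width binning are just one possible choice.

```python
import json

# Hypothetical three-step palette: low, medium, high.
PALETTE = ["#ffffcc", "#fd8d3c", "#bd0026"]


def color_for(count, max_count):
    """Map an incident count to a palette color using equal-width bins."""
    if max_count == 0:
        return PALETTE[0]
    index = int(count / max_count * len(PALETTE))
    return PALETTE[min(index, len(PALETTE) - 1)]


def to_feature_collection(blocks):
    """Build a GeoJSON FeatureCollection with one Polygon feature per block."""
    max_count = max((b["incidents"] for b in blocks), default=0)
    features = []
    for b in blocks:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude]; a ring ends where it starts.
            "geometry": {"type": "Polygon", "coordinates": [b["ring"]]},
            "properties": {
                "block_id": b["block_id"],
                "incidents": b["incidents"],
                "fill": color_for(b["incidents"], max_count),
            },
        })
    return {"type": "FeatureCollection", "features": features}


def square(lon, lat, size=0.01):
    """A made-up square 'block' with its lower-left corner at (lon, lat)."""
    return [[lon, lat], [lon + size, lat], [lon + size, lat + size],
            [lon, lat + size], [lon, lat]]


# Made-up blocks and counts, not real crime data.
blocks = [
    {"block_id": "A", "incidents": 2, "ring": square(-90.20, 38.62)},
    {"block_id": "B", "incidents": 10, "ring": square(-90.19, 38.62)},
    {"block_id": "C", "incidents": 20, "ring": square(-90.18, 38.62)},
]
collection = to_feature_collection(blocks)
geojson_text = json.dumps(collection)
```

On the map side, a Leaflet GeoJSON layer accepts a style function, so something along those lines could read the `fill` property of each feature when drawing the polygons.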

Tracking my heart health

In August I purchased a Withings blood pressure monitor. It's an inflatable arm band that connects with an iPhone and pairs with the Withings Health Mate app. It can take a reading of your systolic and diastolic blood pressure as well as your heart rate. 

So, for about the past three months I've been tracking my blood pressure and heart rate. I take three separate readings each morning and evening (so, six total). In the two plots below, I show the results so far. The colored lines are mean values, while the gray shaded region shows the range between the minimum and maximum values (of the six values taken each day). I'm also applying a Gaussian smoother (with a width of about 2 days). The plots were done using Mathematica by importing the raw data from the Withings app.
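
For anyone who wants to try something similar without Mathematica, here is a minimal Python sketch of the processing described above: a per-day mean, minimum, and maximum, followed by a Gaussian-weighted smoothing across days. This is not the code behind my plots. The function names and sample readings are made up for illustration, and it assumes the "width" is the standard deviation of the Gaussian, which is only one possible interpretation.

```python
import math
from collections import defaultdict
from statistics import mean


def daily_summary(readings):
    """readings: iterable of (day_index, value). Returns sorted (day, mean, min, max) tuples."""
    by_day = defaultdict(list)
    for day, value in readings:
        by_day[day].append(value)
    return [(day, mean(vals), min(vals), max(vals))
            for day, vals in sorted(by_day.items())]


def gaussian_smooth(days, values, sigma=2.0):
    """Smooth values with Gaussian weights; sigma is in days (assumed meaning of 'width')."""
    smoothed = []
    for center in days:
        weights = [math.exp(-0.5 * ((d - center) / sigma) ** 2) for d in days]
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        smoothed.append(weighted_sum / sum(weights))  # normalize so weights sum to 1
    return smoothed


# Made-up systolic readings (three per day here for brevity), not my real data.
readings = [(0, 120), (0, 124), (0, 122),
            (1, 130), (1, 126), (1, 128),
            (2, 125), (2, 123), (2, 127)]

summary = daily_summary(readings)
days = [row[0] for row in summary]
smoothed_mean = gaussian_smooth(days, [row[1] for row in summary])
smoothed_low = gaussian_smooth(days, [row[2] for row in summary])
smoothed_high = gaussian_smooth(days, [row[3] for row in summary])
```

Normalizing by the sum of the weights keeps the smoothed curve sensible at the edges of the series, where fewer neighboring days are available.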

According to the American Heart Association, I'm treading into the realm of so-called prehypertension, which is pretty disconcerting. There are lifestyle changes one can make to help lower one's blood pressure, like losing weight, exercising, and cutting back on alcohol intake. I've got plenty of room to improve in all three of these areas. Hopefully when I report back in three months I can show some improvement.