Strange Loop Recap

Avi Bryant presenting at Strange Loop on using group theory to help streamline data aggregation (photo: J. Williams).

Back in September I attended the fifth annual Strange Loop conference here in St. Louis. Strange Loop, which has been called the "TED for programmers", is a tech conference for software developers covering a range of both applied and academic topics in computer science. The conference was founded in 2009 by Alex Miller (@puredanger), and has grown in attendance from around 300 people the first year to over 1100 people this year. 

Activities began on Wednesday, Sept. 18, at Union Station with a pre-conference day of hands-on workshops broken into three-hour-long morning and afternoon sessions. I attended workshops with a focus on Arduino and Raspberry Pi hardware. There was also what has become a hallmark of Strange Loop - the Emerging Languages Camp (which I didn't attend), covering a range of new languages such as Gershwin and Noether. Wednesday evening was reserved for the so-called 'Unsessions', which are sessions informally organized by attendees using a GitHub wiki page. I attended a presentation on data mining the source code of large-scale projects.

The regular conference took place at the Peabody Opera House on Thursday and Friday. Each day opened and closed with a single keynote in the main auditorium, and then proceeded with five or six hour-long presentations running in parallel throughout the day. Presentations were grouped into eight different tracks:  

  • Keynotes (4)
  • Languages (19)
  • Systems (12)
  • Web (10)
  • Fundamentals (7)
  • Tools (3)
  • Mobile (3)
  • /Etc (8)

Videos of the presentations have been made available in raw form at infoq.com for conference attendees, and will be made generally available there in a more polished form within the next few months.

There was also a conference party on Thursday evening at the City Museum, a unique synthesis of interactive, explorable sculptures and architectural objects and one of St. Louis's most cherished cultural attractions (it happens to be a couple of blocks from my apartment downtown). The conference concluded with a keynote by Douglas Hofstadter, followed by a related theatrical musical performance written by David Stutz titled “Thrown for a Loop: A Carnival of Consciousness”. It was entertaining.

I enjoyed the diversity of topics, but found myself gravitating toward presentations on what I guess you could call metaprogramming: tools and techniques for analyzing code and improving the development process (marked below with a ★). Rather than give you a top-level synthesis, I list the sessions I attended below, with brief notes on any useful links or nuggets gleaned from the presentations.

Pre-conference

- Multilingual RaspberryPi Cooking Class by Steve Chin (@steveonjava): For this workshop, Steve provided each participant with a hardware pack to use during the session (containing a Raspberry Pi with a Pibow case, a Chalkboard Electronics touchscreen, and various connectors/adaptors). We essentially followed along through a tutorial available from Steve's blog. The Pi board came pre-installed with Linux and the Java 8 for ARM SDK.

- Hardware Hacking For The Rest Of Us by Kipp Bradford (@kippworks): This workshop was designed to teach the basics of an Arduino and XBee-based sensor network. Participants had to purchase the hardware kit (available here; summary of parts here) if they wanted a hands-on experience building the project. I didn't end up getting one, so I just followed along as a spectator.

- ★ Code Archeology by Paul Slusarz (@pslusarz), slides: Paul shared results from a few different projects that essentially data mined a large codebase (over 1000 projects and 150000 files). He looked at metrics like lines of code per file, file connectivity (imports), and class connectivity (references). He showed a practical use case of simplifying a Java build system by paring down the module dependency graph. He referenced the 'Code Archeology' podcast by Dave Thomas (see also this article). The topic of code quality arose, with mentions of tools such as SonarQube and TattleTale, as well as clone detection tools like CloneDigger. The analytics engine Splunk also looks relevant for code analytics.
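To make the metrics from the Code Archeology session a bit more concrete, here is a minimal sketch of how lines of code per file and import connectivity might be computed. This is my own illustration, not Paul's tooling: the tiny in-memory codebase, the file names, and the helper functions are all hypothetical, and the regular expression is a simplification that ignores static and wildcard imports.

```python
import re
from collections import Counter

# Hypothetical in-memory "codebase": file path -> Java source text.
SOURCES = {
    "app/Main.java": "import util.Strings;\nimport util.Dates;\n\npublic class Main {\n}\n",
    "util/Strings.java": "package util;\n\npublic class Strings {\n}\n",
    "util/Dates.java": "package util;\n\nimport util.Strings;\n\npublic class Dates {\n}\n",
}

# Simplified matcher for plain imports (static and wildcard imports are skipped).
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)


def lines_of_code(text):
    """Count non-blank lines in a source file."""
    return sum(1 for line in text.splitlines() if line.strip())


def imports_of(text):
    """Return the fully qualified names imported by a source file."""
    return IMPORT_RE.findall(text)


def fan_in(sources):
    """Count how many files import each name (a rough connectivity metric)."""
    counts = Counter()
    for text in sources.values():
        counts.update(set(imports_of(text)))
    return counts


if __name__ == "__main__":
    for path, text in sorted(SOURCES.items()):
        print(path, lines_of_code(text), imports_of(text))
    print(fan_in(SOURCES).most_common())
```

On a real codebase you would read the files from disk instead of a dictionary, and names with a high fan-in would be the first candidates to examine when trimming a dependency graph.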

Day One

- Machine Learning for Relevance and Serendipity by Jenny Finkel (@jrfinkel), video: She's the chief software architect at Prismatic, a news aggregation service. She reviewed some of the tools and techniques they've used in solving the challenges of developing their content recommendation system.

- ★ Visualization Driven Development (VDD) by Jason Gilman (@jasongilman), video: A fascinating talk about building and using tools to visualize code execution to help debug and better understand code logic. He presented a few examples, including visualizing a quick-sort algorithm. Jason has built a Clojure library with some core VDD tools. He was inspired by earlier IDE work by Bret Victor and Chris Granger (who also gave an interesting talk described below).

- Graph Computing at Scale by Matthias Broecheler (@mbroecheler), video: This was about building a scalable graph database, with a focus on the Titan system, on which Matthias is lead developer at Aurelius. Some of the sample queries he used to motivate a graph database model, such as 'the degree of the wife of the president of the U.S.' or 'the average term length of presidents since 1980', reminded me of work my colleagues and I did when I was a developer at Wolfram Alpha (e.g. try the query: 'population birthplace of steven spielberg'). He also briefly described the Faunus graph analytics engine that can sit on top of Titan.

- The History of Women in Technology by Sarah Dutkiewicz (@sadukie), video: Nice talk reviewing the contributions of about a dozen or so women, including Ada Lovelace and Grace Murray Hopper.

- ★ How Does Text Become Data? by Catherine Havasi (@LuminosoInsight) and Rob Speer, video: This talk about mining textual data was one of my favorite sessions of the conference, the kind of talk that makes you want to run home and start geeking out with the tools they demonstrated (e.g. the Python NLTK library and the ConceptNet API). If you are at all interested in this area, go check out the Python code they've made available on GitHub, which walks you through the examples they cover in the talk (on topics such as classification, document similarity and search). Also go have a look at the book 'Natural Language Processing with Python'.

- ★ Xiki: GUI and Text Interfaces are Converging by Craig Muth (@xiki), video: This talk kind of blew my mind. Xiki (pronounced 'Zik-ee') is described as a 'shell console with GUI features', but that doesn't begin to convey the breadth of its functionality. Just go watch the video to witness its awesomeness as he steps through many different use cases. You can learn more at xiki.org and download the Ruby-based tool from the GitHub repo.

- Creative Machines by Joseph Wilk (@josephwilk), video: Joseph is a developer at SoundCloud and has an interest in algorithmic music creation. After giving a brief summary of attempts to quantify creativity and reviewing some past work in this area, such as the AARON program of Harold Cohen, he walked us through some of his own work generating music using the Clojure-based library Overtone (go check out his GitHub fork). A couple of useful books he mentioned: 'Virtual Music: Computer Synthesis of Musical Style' and 'Computer Models of Musical Creativity', both by David Cope.

- Redesigning the Interface: Making Software Development Make Sense to Everyone by Jen Myers (@antiheroine), video: An inspirational presentation that, at its core, was about designing more effective ways to educate and train people to code, and to make our field more accessible to those who have traditionally been met with disproportionate challenges. She's an instructor at Dev Bootcamp in Chicago, and co-founder of the Girl Develop It chapter in Columbus, Ohio.
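Going back to the document similarity topic from the 'How Does Text Become Data?' session above, here is a rough standard-library sketch of the basic idea: turn each document into a bag of word counts and compare the count vectors by cosine similarity. This is an assumption-laden toy of my own and not the presenters' code (which uses NLTK and ConceptNet); the tokenizer and the sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the word-count vectors of two documents."""
    counts_a, counts_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample documents.
    query = "graph databases store nodes and edges"
    related = "a graph database stores nodes, edges and properties"
    unrelated = "the conference party was at the city museum"
    print(cosine_similarity(query, related))
    print(cosine_similarity(query, unrelated))
```

Real systems usually add stemming, stop-word removal, and term weighting on top of this, which is where a library like NLTK earns its keep.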

Day Two

- The Trouble With Types by Martin Odersky (@odersky), video: Great talk by the inventor of Scala. He started out with an overview of type systems, then focussed in on his recent work on dependent object types (DOT) and briefly described an experimental language called Dotty based on those developments, which could pave the way for a future version of Scala. I really appreciated the point he makes early in the talk that a balance of constraints helps lead to better design, and that static typing can be tuned to provide the right level of constraints for architecting code.

- Add ALL the Things: Abstract Algebra Meets Analytics by Avi Bryant (@avibryant), video: Excellent talk walking through various practical applications of data aggregation calculations (e.g. summing, finding the maximum, computing the mean, etc) and illustrating how these common operations possess the group-theoretic properties of a commutative monoid (or an abelian group in some cases). By making this connection, one can codify these operations into a single abstraction that can result in a more efficient and extensible infrastructure. He references the Scala library by Twitter called Algebird (see also the related Summingbird), and an extension of it he wrote called Simmer. Also check out the recent Aggregate Knowledge data science talks, which have a similar theory/application flavor.

- Exercises in Style by Crista Lopes (@cristalopes), video: In fields such as art or literature, it's well accepted that there is an evolving set of distinct styles (e.g. impressionism, cubism, etc). As a source of inspiration, she takes the French book by Raymond Queneau titled 'Exercises in Style'. It's a collection of writings, each describing the same basic story, but written in different styles. She applies this same idea to the world of programming: define a single computational task - compute the term frequency of words in a body of text - and then implement it using different programming styles. If you are interested, she has made this collection available on GitHub. Wonderful presentation.

- Spanner - Google's Distributed Database by Sebastian Kanthak, video: Impressive technology underlying Google's Spanner infrastructure.

- ★ Thinking DSL's for Massive Visualization by Leo Meyerovich (@lmeyerov), video: Interesting talk about code synthesis and real-time big data visualizations. He showed some examples using the theorem prover Z3 and the Racket language. See also his homepage.

- ★ Finding a Way Out by Chris Granger (@ibdknox), video: The creator of the Light Table IDE, Chris motivated the need for a new data-driven approach to programming that is more direct and observable, and then demoed a new project he's been working on called Aurora. As far as I can tell, it has a lot of similarities to what one can currently do within a Mathematica notebook using the Dynamic construct.
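As a small illustration of the commutative monoid idea from Avi Bryant's talk above, here is a minimal Python sketch. It is my own toy version and not the Algebird or Simmer API: the Monoid class, the helper names, and the sample partitions are hypothetical. The point it tries to show is that once an aggregation is expressed as an identity element plus an associative, commutative combine function, partial results computed on separate chunks of data can be merged in any order.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element (zero) plus an associative combine function (plus)."""
    zero: T
    plus: Callable[[T, T], T]

    def fold(self, values: Iterable[T]) -> T:
        return reduce(self.plus, values, self.zero)


sum_monoid = Monoid(0, lambda a, b: a + b)
max_monoid = Monoid(float("-inf"), max)
# A mean is not a monoid by itself, but (total, count) pairs are.
mean_monoid = Monoid((0, 0), lambda a, b: (a[0] + b[0], a[1] + b[1]))


def to_mean_state(x):
    """Lift a single number into the (total, count) representation."""
    return (x, 1)


def mean_of(state):
    """Turn a (total, count) pair back into a mean, or None if empty."""
    total, count = state
    return total / count if count else None


# Two hypothetical partitions, e.g. data living on two different machines.
part_a = [3, 1, 4]
part_b = [1, 5, 9, 2]

# Aggregate each partition independently, then merge the partial results.
total = sum_monoid.fold([sum_monoid.fold(part_a), sum_monoid.fold(part_b)])
overall_max = max_monoid.fold([max_monoid.fold(part_a), max_monoid.fold(part_b)])
merged_mean_state = mean_monoid.fold([
    mean_monoid.fold(map(to_mean_state, part_a)),
    mean_monoid.fold(map(to_mean_state, part_b)),
])

if __name__ == "__main__":
    print(total, overall_max, mean_of(merged_mean_state))
```

The same fold code handles all three aggregations, and adding a new one only requires supplying a new zero and plus.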