William Kennedy: Building an Analytics Engine using MongoDB and Go

Authored by Sourcegraph

Update: Link to Bill’s slides
Update: Cute gopher video presented during Bill's talk: http://vimeo.com/92375131.

Other speakers are posting their GopherCon talk slides at https://github.com/gophercon/2014-talks.

William Kennedy (@GoingGoDotNet) is one of the authors of Go in Action (Manning) and spoke about his experiences building an analytics pipeline using Go, MongoDB, mgo, beego, and Iron.io.

Bill led with a video establishing the motivation for the problem he was working on. It featured two cute gophers: http://vimeo.com/92375131.

A while ago, Bill’s company was hired to build PowerWallet (similar to Mint). They soon realized that this product also gave them a lot of interesting consumer data: they knew, in aggregate, where and what people were buying. How could they analyze this data? And how could they present this in an actionable feed?

The search for a solution

The first version Bill built used a SQL database, but whenever they wanted to create a new feed, they had to build a new table, populate it with data, and so on. That took too long, and the approach wasn't working out for them.

They needed a system that could easily and quickly create dynamic feeds based on all of the data. They also wanted to be able to write rules to alter the overall content of a user’s feed quickly (for example, to create custom Valentine’s Day suggestions).

Their system needed to allow them to:

  • Write rules that can be updated and applied at runtime
  • Pass variables to filter and pinpoint relevance
  • Use data aggregation techniques to filter and group data (see the sketch after this list)
  • Build tests around aggregated datasets
  • Build tests against multiple aggregated datasets
  • Publish data from offer/deal feed and other internal feeds
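
To make the aggregation item concrete, here is a minimal sketch (not Bill's actual code) of running one such rule as a MongoDB aggregation pipeline through mgo. The database, collection, and field names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"

	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// categoryTotal holds one aggregated result: total spend per merchant category.
type categoryTotal struct {
	Category string  `bson:"_id"`
	Total    float64 `bson:"total"`
}

func main() {
	session, err := mgo.Dial("localhost")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Hypothetical collection of purchase records.
	purchases := session.DB("powerwallet").C("purchases")

	// The pipeline is plain data, so a rule like this could be stored in the
	// database and loaded at runtime instead of being hard-coded.
	userID := "u123" // variable passed in to pinpoint relevance
	pipeline := []bson.M{
		{"$match": bson.M{"userId": userID}},
		{"$group": bson.M{
			"_id":   "$category",
			"total": bson.M{"$sum": "$amount"},
		}},
	}

	var results []categoryTotal
	if err := purchases.Pipe(pipeline).All(&results); err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		fmt.Printf("%s: %.2f\n", r.Category, r.Total)
	}
}
```

Because the pipeline is just a slice of bson.M documents, it fits the "rules updated and applied at runtime" requirement: rules can live in a collection of their own and be swapped without redeploying.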

After evaluating a number of other tools, they settled on Go, Linux, MongoDB, beego, mgo, and Iron.io. (They chose beego over other Go web frameworks because they liked its MVC architecture.)
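
For context on that choice, beego's MVC style maps each route to a controller type with one method per HTTP verb. A minimal sketch (hypothetical route and controller, not from Bill's app) looks like this:

```go
package main

import "github.com/astaxie/beego"

// FeedController is a hypothetical controller; in beego, each route maps to a
// controller type, and each HTTP verb maps to a method on that type.
type FeedController struct {
	beego.Controller
}

// Get handles GET /feed.
func (c *FeedController) Get() {
	// A real handler would render a user's feed; this just writes a stub body.
	c.Ctx.WriteString("feed placeholder")
}

func main() {
	beego.Router("/feed", &FeedController{})
	beego.Run()
}
```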

They used a denormalized schema for their feed data and kept it updated using workers running tasks on Iron.io.
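
As a rough illustration of that design (the schema and field names here are assumptions, not Bill's actual documents), a denormalized feed item embeds everything needed to render it, and a worker task keeps it current with an upsert:

```go
package main

import (
	"log"
	"time"

	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// feedItem is a hypothetical denormalized feed document: everything needed to
// render one feed entry is embedded, so no joins are required at read time.
type feedItem struct {
	UserID    string    `bson:"userId"`
	OfferID   string    `bson:"offerId"`
	Merchant  string    `bson:"merchant"` // copied from the merchant record
	Headline  string    `bson:"headline"` // copied from the offer record
	Score     float64   `bson:"score"`    // relevance computed by the worker
	UpdatedAt time.Time `bson:"updatedAt"`
}

func main() {
	session, err := mgo.Dial("localhost")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	feed := session.DB("powerwallet").C("feed")

	// An Iron.io worker task could run code like this incrementally, upserting
	// each item so the feed stays current without rebuilding tables.
	item := feedItem{
		UserID: "u123", OfferID: "offer42",
		Merchant: "Coffee Co", Headline: "20% off espresso",
		Score: 0.87, UpdatedAt: time.Now(),
	}
	if _, err := feed.Upsert(
		bson.M{"userId": item.UserID, "offerId": item.OfferID},
		item,
	); err != nil {
		log.Fatal(err)
	}
}
```

Upserting on a (userId, offerId) key lets a worker run incrementally: new items are inserted and existing ones refreshed in place.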

Walkthrough

Bill walked through the system he made and showed how it performs incremental updates. (Ed. note: This portion of the talk was heavy on screenshots and code, and we couldn’t transcribe it in a way that’d be helpful to our readers. We’ll link to the slides if/when they’re up.)

Q&A

Q: How well do you think this will scale?
A: I went with MongoDB because of its scalability, and it is scaling much better than the previous SQL database solution did. Iron.io is also a big help in letting the system scale easily.