Updates

[Reading Time: 5mn]

Time flies. And good things mature.

Looking back at posts from a few years ago, I see that the ones on outlier detection with Clojure and Incanter are still valuable and frequently read, if the blog stats are to be believed.

But they are badly out of date with respect to the libraries I used at the time. Those libraries have matured quite a bit too, which shows that the Clojure ecosystem is as strong as ever, if not stronger. I have updated the posts slightly and the code heavily, so you might want to check them out.

Thanks for reading.

The Art of Queueing – Simulated

[Reading Time: 30mn]

I don’t know many people who enjoy queueing. But queueing is a fact of life: whenever we compete for access to a limited resource, queues naturally form. Taxi ranks, social security counters, Eiffel Tower entrances, Disneyland’s latest attraction, overcrowded supermarket tills on Saturdays, the annual iPhone launch day at any Apple Store: the list goes on and on.

Last week I was standing in a queue that had formed in front of two metro ticket machines at St Lazare station in Paris, France. The line was short but growing. Not a good sign, but the Metro hall at St Lazare is huge: it can hold many more people before the crowding becomes uncomfortable. And in any case, my turn was close.


Data Quality – Testing Outliers With Live Market Data And The Stockings Library

[Reading time: 10mn]

While working on enhancing our outlier detection library, we saw the announcement of a new Clojure library called “stockings”. It offers an API you can use to fetch market data from the Yahoo Finance service (stockings.core) or the Google Finance service (stockings.alt).

We’ll leave it to you to explore the library’s full capabilities. As you can guess from our previous posts, we were specifically interested in its historical market data features. Our existing example starts from data stored on disk in a CSV file; we’ve adapted it so that it fetches historical data dynamically with the stockings library.


Data Quality – Outliers Display with Incanter

[Reading time: 5mn]

In our previous post, we briefly explained how we used Clojure to detect outliers in data with descriptive statistics. Since then, we have enriched our prototype library with two further detection methods: MAD (Median Absolute Deviation) and IQR (Interquartile Range). The source code is available on GitHub if you want to play around with it.
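Here is a rough illustration of the two methods as a minimal Ruby sketch. It is not the library's Clojure code. It also assumes common conventions that the post does not spell out: MAD scaled by 1.4826 with a cutoff of 3, Tukey's fences at 1.5 × IQR, and quartiles computed by linear interpolation. The function names and sample prices are hypothetical.

```ruby
# Median of a non-empty numeric array.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Quantile with linear interpolation between closest ranks
# (one of several conventions; assumed here).
def quantile(values, q)
  sorted = values.sort
  pos = q * (sorted.size - 1)
  lower = sorted[pos.floor]
  upper = sorted[pos.ceil]
  lower + (upper - lower) * (pos - pos.floor)
end

# MAD method: flag points whose distance from the median exceeds
# `threshold` scaled MADs (1.4826 makes the MAD comparable to the
# standard deviation for normally distributed data).
def mad_outliers(values, threshold: 3.0)
  med = median(values)
  mad = 1.4826 * median(values.map { |v| (v - med).abs })
  return [] if mad.zero? # all points identical: nothing stands out
  values.select { |v| (v - med).abs / mad > threshold }
end

# IQR method: flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).
def iqr_outliers(values, k: 1.5)
  q1 = quantile(values, 0.25)
  q3 = quantile(values, 0.75)
  iqr = q3 - q1
  low = q1 - k * iqr
  high = q3 + k * iqr
  values.select { |v| v < low || v > high }
end

prices = [10.1, 10.3, 10.2, 10.4, 10.2, 25.0, 10.3, 10.1]
p mad_outliers(prices)
p iqr_outliers(prices)
```

Both methods build their cutoffs from the median and quartiles rather than the mean and standard deviation, so a single extreme value barely moves them. Here, both flag only the 25.0 spike.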

Now, how good are these outlier detection methods? Since the functions return a collection of offending points with calculation details, it is hard to tell at a glance whether the results are relevant. For that, you want to see the time series on a chart with the outliers highlighted. Well, let’s say that we want to see this.


Data Quality – Here Come The Outliers

[Reading time: 5mn]

We’ve reached the point where playing with our first outlier detection functions becomes interesting: it starts generating ideas. This is mainly because we used Clojure for these experiments.

As a functional programming language, Clojure puts functions first, allows for abstraction through composition, and offers a dynamic and fun way of developing with a REPL.
I know, it’s Lispy, and this is precisely the point. Code as data (or is it the reverse?) and laziness (that is, lazy evaluation) provide tremendous opportunities for data analytics and data quality investigations.
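The small sketch below makes composition and laziness concrete. It is written in Ruby rather than Clojure. Lambdas chained with Proc#>> (Ruby 2.6+) stand in for Clojure's function composition, and Enumerator::Lazy stands in for lazy sequences. The step names, the 10% threshold and the synthetic series are all hypothetical.

```ruby
# Small, single-purpose steps (hypothetical), each a plain lambda.
clean   = ->(xs) { xs.compact }                                      # drop missing values
returns = ->(xs) { xs.each_cons(2).map { |a, b| (b - a) / a.to_f } } # period-over-period changes
big     = ->(xs) { xs.select { |r| r.abs > 0.1 } }                   # moves larger than 10%

# Composition: a detector built by chaining the steps left to right.
detect_big_moves = clean >> returns >> big
p detect_big_moves.call([100, nil, 102, 101, 130, 129])

# Laziness: an endless synthetic series with a spike every 50 points.
# Only as many elements as needed to find three spikes are computed.
spike_at = ->(i) { i % 50 == 0 ? 1000.0 : 10.0 + (i % 3) }
first_spikes = (1..Float::INFINITY).lazy
  .map { |i| [i, spike_at.call(i)] }
  .select { |_, v| v > 100 }
  .first(3)
p first_spikes
```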


Rails – How to sort a model on a virtual attribute

[Reading time: 2mn]

Setting
Ruby 1.9.2-p180
Rails 3.0.4

The Problem
So you have a Product model with a couple of virtual attributes, such as a user rating that depends on some clever calculation. The rating is declared as an instance method in the model, a.k.a. a virtual attribute. (PS: remember that class methods, e.g. def self.do_something, apply to the model class as a whole, whereas instance methods, e.g. def do_something, apply to the current record.) But I digress.
Now you want to sort the products by this user rating, but scopes only work at the database level, and a virtual attribute is not a database column.
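One common workaround is to fetch the records and sort them in Ruby with Enumerable#sort_by. This may or may not be the approach the full post settles on. The sketch below uses a plain Struct in place of the ActiveRecord Product model, so it runs without Rails. The votes_up/votes_down columns and the rating formula are hypothetical.

```ruby
# Stand-in for the ActiveRecord model: votes_up/votes_down play the role of
# stored columns; user_rating is a virtual attribute computed on the fly.
Product = Struct.new(:name, :votes_up, :votes_down) do
  def user_rating
    total = votes_up + votes_down
    total.zero? ? 0.0 : votes_up.to_f / total
  end
end

products = [
  Product.new("Kettle", 8, 2),
  Product.new("Toaster", 3, 7),
  Product.new("Blender", 0, 0),
  Product.new("Mixer", 9, 1)
]

# An ORDER BY scope cannot see user_rating, so sort in memory instead,
# highest rating first. In Rails this would look something like
# Product.all.sort_by { |p| -p.user_rating }, which returns an Array.
by_rating = products.sort_by { |p| -p.user_rating }

by_rating.each { |p| puts "#{p.name}: #{p.user_rating}" }
```

This approach has a trade-off. Sorting in memory loads every record and returns an Array rather than a relation, so no further scopes can be chained. That is fine for modest tables. For large ones, storing the rating in a real column (or computing it in SQL) lets the database do the ordering.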
