Data Quality – Testing Outliers With Live Market Data And The Stockings Library

[Reading time: 10mn]

While working on enhancing our outlier detection library, we saw an announcement for a new Clojure library called “stockings”. This library offers an API you can use to get market data from the Yahoo Finance service (stockings.core) or the Google Finance service (stockings.alt).

We’ll leave it to you to explore the library’s full capabilities. As you may have guessed from our previous posts, we were specifically interested in its historical market data features. Our existing example starts with data stored on disk in a CSV file; we have adapted it so that it fetches historical data dynamically with the stockings library.


Data Quality – Outliers Display with Incanter

[Reading time: 5mn]

In our previous post, we briefly explained how we used Clojure to detect data outliers with descriptive statistics. Since then, we have enriched our prototype library with two further detection methods: MAD (Median Absolute Deviation) and IQR (Interquartile Range). The source code is available on GitHub if you want to play around with it.
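The library code is not reproduced in this excerpt, so what follows is only a minimal sketch of the two methods in plain Clojure, not the library's actual implementation. The function names (`median`, `quantile`, `mad-outliers`, `iqr-outliers`) are hypothetical, and so are the default thresholds: 3.5 on the MAD-based modified z-score (0.6745 * |x - median| / MAD, as proposed by Iglewicz and Hoaglin) and the usual Tukey fences at 1.5 * IQR beyond the quartiles. Quartiles here use linear interpolation, which is one of several common conventions.

```clojure
(defn median
  "Median of a non-empty collection of numbers, as a double."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (double (s mid))
      ;; even count: average the two middle values
      (/ (+ (s (dec mid)) (s mid)) 2.0))))

(defn quantile
  "Quantile q (0 <= q <= 1) of a non-empty collection,
  linearly interpolating between the two nearest ranks."
  [xs q]
  (let [s    (vec (sort xs))
        pos  (* q (dec (count s)))
        lo   (long (Math/floor pos))
        hi   (long (Math/ceil pos))
        frac (- pos lo)]
    (+ (s lo) (* frac (- (s hi) (s lo))))))

(defn mad-outliers
  "Values whose modified z-score exceeds threshold (hypothetical default 3.5)."
  ([xs] (mad-outliers xs 3.5))
  ([xs threshold]
   (let [m   (median xs)
         ;; MAD = median of absolute deviations from the median
         mad (median (map #(Math/abs (double (- % m))) xs))]
     (if (zero? mad)
       [] ; more than half the points are identical: the score is undefined
       (filterv #(> (/ (* 0.6745 (Math/abs (double (- % m)))) mad) threshold)
                xs)))))

(defn iqr-outliers
  "Values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical default k = 1.5)."
  ([xs] (iqr-outliers xs 1.5))
  ([xs k]
   (let [q1    (quantile xs 0.25)
         q3    (quantile xs 0.75)
         iqr   (- q3 q1)
         lower (- q1 (* k iqr))
         upper (+ q3 (* k iqr))]
     (filterv #(or (< % lower) (> % upper)) xs))))
```

Both functions return the offending values in their original order: on the sample series `[10 11 12 11 10 13 12 11 50]`, `(mad-outliers ...)` and `(iqr-outliers ...)` each return `[50]`.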

Now, how good are these outlier detection methods? Since the functions return a collection of offending points with calculation details, it is hard to tell from that output alone whether the results are pertinent. For that, you want to see the time series on a chart with the outliers highlighted; well, let’s say that we want to see this.


Data Quality – Here Come The Outliers

[Reading time: 5mn]

We’re at a point where playing with our first outlier detection functions becomes interesting: it starts generating ideas. This is mainly because we used Clojure for these experiments.

As a functional programming language, Clojure puts functions first, allows for abstraction through composition and offers a dynamic, fun approach to development through the REPL.
I know, it’s Lispy, and that is precisely the point. Code as data (or is it the other way around?) and laziness (lazy evaluation) provide tremendous opportunities when it comes to data analytics and data quality investigations.
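To make the laziness point concrete, here is a minimal sketch of a descriptive-statistics check, assuming the simplest rule: flag points lying more than `k` population standard deviations from the mean. The names `mean`, `std-dev` and `sd-outliers`, and the shape of the returned maps, are hypothetical and not taken from the authors' library. The mean and standard deviation are computed eagerly over the whole series; only the scan that yields offending points is lazy.

```clojure
(defn mean
  "Arithmetic mean of a non-empty collection, as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn std-dev
  "Population standard deviation of a non-empty collection."
  [xs]
  (let [m (mean xs)]
    (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
                  (count xs)))))

(defn sd-outliers
  "Lazy sequence of maps describing the points whose z-score exceeds k
  in absolute value, with the calculation details attached."
  [xs k]
  (let [m  (mean xs)     ; eager: needs the whole series
        sd (std-dev xs)] ; eager: needs the whole series
    ;; `for` builds a lazy sequence, so callers can `take` just a few findings
    (for [[i x] (map-indexed vector xs)
          :let  [z (if (zero? sd) 0.0 (/ (- x m) sd))]
          :when (> (Math/abs (double z)) k)]
      {:index i :value x :mean m :sd sd :z z})))
```

On the sample series `[10 11 12 11 10 13 12 11 50]`, the value 50 has a z-score of about 2.82, so it is reported with `k` = 2 but not with `k` = 3: the outlier itself inflates the standard deviation.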
