Data Quality – Testing Outliers With Live Market Data And The Stockings Library

[Reading time: 10 min]

While working on enhancing our outlier detection library, we saw an announcement for a new Clojure library called “stockings”. This library offers an API you can use to get market data from the Yahoo Finance service (stockings.core) or the Google Finance service (stockings.alt).

We’ll let you explore the library’s full capabilities on your own. As you may have guessed from our previous posts, we were specifically interested in its historical market data features. Our existing example starts with data stored on disk in a CSV file; we’ve adapted it so that it fetches historical data dynamically using the stockings library.

As you’ll see in the code below, it’s pretty straightforward except that:

  • Yahoo Finance limits the data volume you can get to approx. 18 months. If you ask for more, you’ll get nothing.
  • Google Finance gives you a maximum of 4000 data points. Unlike Yahoo Finance, if you ask for more, you still get your 4000.
  • Google Finance doesn’t recognize the S&P 500 index (symbol ^GSPC).

These limitations are in no way caused by the stockings library, but you had better know about them. We suspect they are deliberate on Yahoo’s and Google’s part: don’t forget that these services are free, and live, unlimited, API-based access to historical market data across several markets is something financial companies pay huge sums for.
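If you need a longer history than Yahoo Finance’s roughly 18 months, one option is to issue several smaller requests and concatenate the results. The helper below is a minimal sketch of the date-splitting part only, assuming the cap applies per request. `date-windows` is a hypothetical name, not part of stockings. It uses java.time.LocalDate (Java 8+) as a stand-in so it runs without extra dependencies; Joda’s LocalDate has similarly named `plusMonths`, `isBefore` and `isAfter` methods.

```clojure
(import 'java.time.LocalDate)

(defn date-windows
  "Hypothetical helper (not part of stockings): splits the range from
  `start` to `end` into consecutive [from to] pairs, each spanning at
  most `months` months, so each pair can be fetched separately."
  [^LocalDate start ^LocalDate end months]
  (loop [from start
         acc  []]
    (if (.isBefore from end)
      ;; Advance by `months`, but never past the requested end date.
      (let [^LocalDate to (.plusMonths from (long months))
            to (if (.isAfter to end) end to)]
        (recur to (conj acc [from to])))
      acc)))

;; Two windows: 2010-01-01..2011-01-01 and 2011-01-01..2011-06-01
(date-windows (LocalDate/of 2010 1 1) (LocalDate/of 2011 6 1) 12)
```

Each pair could then be used as the start and end dates of a separate call. Whether the end date is inclusive depends on the service, so consecutive windows may return the boundary day twice; removing duplicates by date after concatenating is a cheap safeguard.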

One last thing worth remembering: the stockings library uses Joda-Time, which means you have to use Joda-Time as well. Unfortunately, we’re also using Incanter, and passing Joda-Time instances to Incanter’s chart functions didn’t work out very well. So we had to convert Joda-Time dates to epoch times, going through the venerable java.util.Date.
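The conversion path described above (calendar date, then midnight, then java.util.Date, then epoch milliseconds) can be packaged as a small function. This is a sketch assuming Java 8+: it uses java.time.LocalDate instead of Joda-Time so it runs without extra dependencies, and `local-date->epoch-ms` is a hypothetical name. Unlike Joda’s `toDateMidnight`, which uses the default time zone, it takes the zone explicitly, which keeps the results reproducible across machines.

```clojure
(import '(java.time LocalDate ZoneId ZoneOffset)
        'java.util.Date)

(defn local-date->epoch-ms
  "Hypothetical helper: epoch milliseconds for the start of day `d` in
  `zone`, going through java.util.Date as in the Joda-based session."
  [^LocalDate d ^ZoneId zone]
  (let [midnight (.atStartOfDay d zone)             ; ZonedDateTime at 00:00 in `zone`
        date     (Date/from (.toInstant midnight))] ; the venerable java.util.Date
    (.getTime date)))

;; One day after the epoch, at midnight UTC:
(local-date->epoch-ms (LocalDate/of 1970 1 2) ZoneOffset/UTC) ;=> 86400000
```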

Enough said. Show me the code.

First off, let’s change our project dependencies to include “stockings”:

(defproject outlier "1.0.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.2.1"]
        [org.clojure/clojure-contrib "1.2.0"]
        [net.sourceforge.javacsv/javacsv "2.0"]
        [incanter "1.2.3-SNAPSHOT"]
        [clj-time "0.3.0"]
        [com.fxtlabs/stockings "1.1.0-SNAPSHOT"]]
  :dev-dependencies [[org.clojure/clojure "1.2.1"]
        [swank-clojure "1.3.0"]
        [jline "0.9.94"]]
  :repositories {"clojure-releases" ""})

Now run “lein deps” to retrieve the related jars.

Then start “lein repl” and declare a namespace:

user=> (ns testing (:use [outlier utils core])
                (:use [incanter core charts io pdf])
                (:use [incanter.stats :exclude (mean median)])
                (:use [stockings alt])
                (:import [org.joda.time LocalDate]))

Now fetch IBM prices from Google Finance (our namespace uses stockings.alt, the Google Finance variant):

testing=> (def recs (get-historical-quotes "IBM" (LocalDate. 1995 4 1) (LocalDate. 2011 6 1)))

Next, convert the Joda dates to epoch times. For each record, we extract the date and the closing price, convert the date, and put the results in a new map. We loop over all the original prices with the map function, which returns a lazy sequence.

testing=> (def recs2 (map (fn [r] {:date-epoch (.. (:date r) toDateMidnight toDate getTime) ; Joda LocalDate -> DateMidnight -> Date -> epoch ms
                                   :date (:date r), :close (:close r)}) recs))

Note that we kept the original date, which is much more readable than the epoch value.
Next, sort the data by ascending date:

testing=> (def sorted-recs (sort-by :date recs2))

Incanter lets you have a look at the data this way:

testing=> (view (to-dataset sorted-recs))

Finally, time has come to look for outliers:

testing=> (def outs (outliers-median (vec (map :close sorted-recs)) 5 1.5))
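The `outliers-median` function comes from our own outlier library, and its definition is not shown in this post. For readers who want something self-contained to experiment with, here is a hypothetical stand-in, `rolling-median-outliers`, built on a common robust rule: a point is flagged when its distance from the median of a centred window exceeds `k` times the window’s median absolute deviation (MAD). It returns maps with an `:idx` key, like the result used for the chart, but the rule is an assumption for illustration, not the library’s algorithm. The helper is named `median-of` to avoid clashing with the `median` referred from the outlier library.

```clojure
(defn median-of
  "Median of a non-empty collection of numbers."
  [xs]
  (let [v   (vec (sort xs))
        n   (count v)
        mid (quot n 2)]
    (if (odd? n)
      (v mid)
      (/ (+ (v (dec mid)) (v mid)) 2.0))))

(defn rolling-median-outliers
  "Hypothetical stand-in (not the outlier library's code). Flags x_i when
  |x_i - m| > k * MAD, where m and MAD are the median and median absolute
  deviation of the window centred on i (width points for odd width,
  clipped at both ends). Returns a lazy seq of {:idx i :value x_i}."
  [xs width k]
  (let [v    (vec xs)
        n    (count v)
        half (quot width 2)]
    (for [i (range n)
          :let [window (subvec v (max 0 (- i half)) (min n (+ i half 1)))
                m      (median-of window)
                mad    (median-of (map #(Math/abs (double (- % m))) window))
                x      (v i)]
          :when (> (Math/abs (double (- x m))) (* k mad))]
      {:idx i :value x})))

(rolling-median-outliers [10 12 11 13 50 12 10 13 11] 5 3) ;=> ({:idx 4, :value 50})
```

When most points in a window are identical the MAD is 0, so any deviation at all gets flagged; that is one simple way a rule like this produces false positives on real price series.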

Now, create an Incanter chart that displays the original data together with the outliers. There are actually quite a few of them, a number of which are obviously false positives.

testing=> (def chart (time-series-plot :date-epoch :close :data (to-dataset sorted-recs)))
testing=> (doseq [o outs :let [r (nth sorted-recs (:idx o))]] (add-pointer chart (:date-epoch r) (:close r) :text "Outlier")) ; doseq runs eagerly, unlike map
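Laziness matters when `map` is used only for its side effects, such as adding pointers to a chart. At the REPL the printer forces the lazy sequence, so the side effects happen; inside a `def` or in the middle of a function body, nothing runs until the sequence is consumed. `doseq`, or wrapping the `map` in `dorun`, avoids the problem. The self-contained check below counts how many times a side-effecting function has run; `count-side-effects` is just an illustrative name.

```clojure
(defn count-side-effects
  "Illustration: returns [calls-after-map calls-after-doseq calls-after-dorun]."
  []
  (let [calls     (atom 0)
        tick!     (fn [x] (swap! calls inc) x)
        ;; map only builds a lazy sequence: tick! has not run yet.
        pending   (map tick! [1 2 3])
        after-map @calls]
    ;; doseq walks the collection eagerly, calling tick! once per element.
    (doseq [x [1 2 3]] (tick! x))
    (let [after-doseq @calls]
      ;; dorun forces the pending lazy sequence, so its three calls happen now.
      (dorun pending)
      [after-map after-doseq @calls])))

(count-side-effects) ;=> [0 3 6]
```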

Then display it:

testing=> (view chart)

That’s it. If you’re modeling market data analytics in Clojure (or even in Java, since Clojure libraries can be used from Java), you should take a look at the stockings library. Although we only looked at historical data retrieval here, it has a great set of features and is actively maintained by a responsive and friendly developer. It is definitely part of our toolset now.

Until next time.


[Updated: 2015-02-04]

Fixed some typos and formatting, and updated the link to the stockings library

