Various musings about everything and nothing...
8th July 2017
So things didn't turn out quite as anyone expected in the snap UK general election...
I wanted to create a visualisation of the results that contrasts the seats won with the share of the popular vote, and came up with this infographic. The nice thing about the two semi-circular charts I generated is that they can be nested within each other.
21st August 2016
Four years on from the London Olympics he's only gone and done it again - the double double 5000m/10000m.
Once again, I tracked the tweets using the Twitter streaming API (search terms #gomo,#motime,@mo_farah,#mofarah) before, during and after the race.
The interesting thing is, well, the distribution of tweets over time is pretty similar to last time. Even the absolute rates in tweets per second are similar, despite the fact that the race started at 1.37am British Summer Time. You can compare them yourselves by looking at my original post from 2012.
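For anyone curious how a tweets-per-second rate can be worked out, here is a minimal sketch in Python. It assumes you already have a list of timezone-aware `datetime` objects, one per captured tweet; the function names `tweets_per_second` and `peak_rate` are made up for this illustration and are not taken from the code behind the charts.

```python
from collections import Counter
from datetime import datetime, timezone


def tweets_per_second(timestamps):
    """Count how many tweets fall into each one-second bucket.

    `timestamps` is an iterable of timezone-aware datetimes (an assumption
    of this sketch). Returns a dict mapping epoch second -> tweet count,
    ordered by time. Seconds with no tweets are simply absent.
    """
    counts = Counter(int(ts.timestamp()) for ts in timestamps)
    return dict(sorted(counts.items()))


def peak_rate(timestamps):
    """Return (UTC datetime of the busiest second, tweets in that second)."""
    per_second = tweets_per_second(timestamps)
    if not per_second:
        return None
    second, count = max(per_second.items(), key=lambda item: item[1])
    return datetime.fromtimestamp(second, tz=timezone.utc), count
```

Bucketing by whole seconds keeps the sketch simple; for a smoother chart you would probably want wider buckets or a moving average.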
4th April 2015
14th December 2014
Equipped with only a Raspberry Pi, a Raspberry Pi Camera, and a small USB lithium battery pack, I set out to capture some astrophotography images on a clear winter's night.
In particular I was aiming to capture star trails - the trails that appear to be left by stars as they move across the night sky (in fact it is the observer who moves as the earth rotates). I would take a series of still images using the Raspberry Pi's camera and store them on the Pi's memory card. Later I would try to combine the images, which would hopefully reveal the star trails.
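One common way to combine a series of stills into star trails is a "lighten" blend: at every pixel, keep the brightest value seen across all frames. The sketch below shows the idea in plain Python on small grayscale frames held as nested lists. It is an illustration of the technique rather than the exact method used in the post, and `stack_lighten` is a name invented here. With real photos, an imaging library such as Pillow (`ImageChops.lighter`) would do the same thing far faster.

```python
def stack_lighten(frames):
    """Combine grayscale frames by keeping the brightest value per pixel.

    Each frame is a list of rows, each row a list of brightness values.
    All frames must have the same dimensions. The input frames are not
    modified; a new stacked frame is returned.
    """
    frames = iter(frames)
    try:
        first = next(frames)
    except StopIteration:
        raise ValueError("need at least one frame") from None

    # Start from a copy of the first frame.
    stacked = [list(row) for row in first]

    for frame in frames:
        same_shape = len(frame) == len(stacked) and all(
            len(row) == len(stacked_row)
            for row, stacked_row in zip(frame, stacked)
        )
        if not same_shape:
            raise ValueError("all frames must have the same dimensions")
        for y, row in enumerate(frame):
            stacked_row = stacked[y]
            for x, value in enumerate(row):
                # A star is brighter than the dark sky, so its position in
                # every frame survives in the stacked result.
                if value > stacked_row[x]:
                    stacked_row[x] = value
    return stacked
```

Because the sky background is dark, a star that moves one pixel per frame leaves a continuous bright line in the result, which is exactly the trail.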
pyspark, python, data science
5th May 2014
pyspark, python, data science
2nd March 2014
Apache Spark is a relatively new data processing engine implemented in Scala and Java that can run on a cluster to process and analyze large amounts of data. Spark performance is particularly good if the cluster has sufficient main memory to hold the data being analyzed. Several sub-projects run on top of Spark and provide graph analysis (GraphX), a Hive-based SQL engine (Shark), machine learning algorithms (MLlib) and realtime streaming (Spark Streaming). Spark has also recently been promoted from incubator status to a new top-level project.
In this series of blog posts, we'll look at installing Spark on a cluster and explore using its Python API bindings, PySpark, for a number of practical data science tasks. This first post focuses on installation and getting started.
4th December 2013
In Part 1 I looked at the hardware required to build a mini Raspberry Pi cluster with 5 machines, where one of the machines acts as the "head" node, connecting the cluster to the internet via a wireless LAN link, and the other four are "worker" nodes.
In this post I'll describe how to configure Raspbian OS on each of the machines. We'll want to be able to log in to each of the worker nodes from the head node without a password, and we'll want to enable internet access on the head node and each of the worker nodes.
12th October 2013
The Raspberry Pi is an amazing, cheap single board computer designed to hook up to a TV and help with teaching programming.
I was inspired by a story from Southampton University in the UK to build my own (but rather more modestly sized) Raspberry Pi cluster. The cluster has 4 worker nodes (Rev 2 model Bs with 512MB RAM), and 1 head node (Rev 1 model B with 256MB RAM). Each of the nodes runs Raspbian.
14th July 2013
This snippet, simpleproxy, is a simple command line tool, written in python2/python3, for forwarding network connections on a specified proxy port to another port (possibly on a different host).
simpleproxy can forward multiple connections at the same time and uses an approach called event-driven programming. Rather than starting a thread to handle each proxy connection, the program sits in an event loop and waits for activity on each of the connections, moving data whenever a connection is ready. This approach is particularly suitable for programs which do a lot of I/O (and this one does little else).
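To give a flavour of the event loop idea, here is a stripped-down sketch using Python 3's standard `selectors` module. It is not the simpleproxy source: the function names (`accept`, `relay`, `run_once`, `serve`) are invented for this illustration, it is Python 3.8+ only, and it takes shortcuts (blocking `connect` and `sendall`, no error handling) that a real tool would need to address.

```python
import selectors
import socket

BUFFER_SIZE = 4096


def accept(sel, listener, target):
    """Accept a client, connect to the target, and register both sockets."""
    client, _ = listener.accept()
    # Simplification: this connect blocks the loop until it completes.
    upstream = socket.create_connection(target)
    # Each socket's `data` slot holds its peer, so relay() knows where
    # to send whatever arrives.
    sel.register(client, selectors.EVENT_READ, upstream)
    sel.register(upstream, selectors.EVENT_READ, client)


def relay(sel, sock, peer):
    """Move one chunk from sock to peer, or close both on end-of-stream."""
    data = sock.recv(BUFFER_SIZE)
    if data:
        # Simplification: sendall may block if the peer is slow to read.
        peer.sendall(data)
    else:
        for s in (sock, peer):
            sel.unregister(s)
            s.close()


def run_once(sel, listener, target, timeout=None):
    """Wait for activity and handle every socket that is ready."""
    for key, _ in sel.select(timeout):
        if key.fileobj is listener:
            accept(sel, listener, target)
        elif key.fileobj.fileno() != -1:
            # Skip sockets already closed earlier in this same batch.
            relay(sel, key.fileobj, key.data)


def serve(listen_port, target_host, target_port):
    """Forward connections on listen_port to (target_host, target_port)."""
    sel = selectors.DefaultSelector()
    listener = socket.create_server(("", listen_port))
    sel.register(listener, selectors.EVENT_READ)
    while True:
        run_once(sel, listener, (target_host, target_port))
```

A single thread handles every connection: `sel.select()` sleeps until at least one socket has something to read, and the loop then shuttles that data to the matching peer.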
26th January 2013
I've always wanted to see the northern lights in person, but in the meantime I experimented with capturing time-lapse video from webcams set up on the Nature of Jokkmokk site.
While being careful not to overload the site, I set up a job to download images from one of the site's webcams every 5 minutes over a 5-day period, and found that the northern lights lit up the night of the 23rd/24th January 2013. It was then fairly simple to stitch the images from this night together using ffmpeg and some advice in Paul Rouget's blog post.
| Title | Description | Date |
|---|---|---|
| pyworksheet, an online tool for experimenting with python | This blog post introduces pyworksheet, an online tool for experimenting with python enabled by google app engine | 1st January 2013 |
| twitstreamer | command-line tool for reading tweets from the twitter streaming API | 1st December 2012 |
| twitfetcher | command-line tool for searching for and fetching tweets | 21st November 2012 |
| Analyzing co-occurrence networks with Gephi | This blog post describes some early results analyzing co-occurrence networks of individuals named in news stories | 12th October 2012 |
| Displaying recent volcanic eruptions on a 3D globe with webgl/three.js | This blog post uses webgl/three.js to plot the location of recent volcanic eruptions on a 3D globe | 22nd August 2012 |
| Mo Farah's Olympic 5000m final victory, in tweets | This blog post uses the g.raphaeljs library to chart tweeting rates during the 5000m final | 12th August 2012 |
| Visualizing Reading FC's Winning 2011/2012 Season | This blog post uses the d3 library to visualize Reading FC's incredible 2011/2012 season | 8th August 2012 |
| Connecting a Raspberry Pi to an Ubuntu netbook | This blog post describes how to make a direct ethernet connection between the Raspberry Pi and a netbook/laptop running Ubuntu 12.04 | 27th July 2012 |
| Perill de caigudes | Perill de caigudes | 6th June 2012 |
| HBaseload | A utility for importing/exporting between hbase and csv | 12th December 2011 |
| Minicrawler | An extensible queue-based web-crawler | 12th December 2011 |
| svgworld | svgworld renders a 3d projection of a world map using svg | 23rd December 2010 |
| PySparql | Simple ways to access dbpedia data using python and Sparql | 11th January 2010 |
| S3 Cache | A local file cache for Amazon S3 using python and boto | 9th September 2009 |
| Snapshot | Snapshot demonstrates a way to create a static, standalone copy of the dynamic content in a web page | 7th February 2009 |
| PPyPNG | A pure python script with limited functionality for creating portable network graphics (PNG) format files | 6th July 2008 |