Niall McCarroll's Professional Home Page

Air Pollution Risk Finder

python, javascript, svg

22nd November 2020

Recently I've been working with climate and environmental data and thinking of ways to present this data to non-scientists.

As part of the Visigoths team, we explored how to present COVID-19 risks in a dashboard, which became our winning entry to the 4th COVID-19 hackathon organised by the UK Natural Environment Research Council.

Later we generalised these ideas to present a range of environmental risks. This post is about the next iteration of our risk finder dashboard which explores the risks from air pollution.

You can access the live dashboard here.


Visigoth: An Open Source Python3 library for building interactive Geospatial and Data Visualisations

python, javascript, svg

7th January 2020

I've just pushed an initial release of visigoth to GitHub.

This library aims to provide engineers and data scientists with an easy way to construct interactive visualisations containing multiple maps and charts. Visualisations are output as Scalable Vector Graphics (SVG) files.

As an example, the visualisation accompanying this post looks at the spatial and temporal distribution of serious road accidents in the London, UK area in 2018; clicking on it opens the full SVG in a new tab or window.
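Because an SVG file is just XML, the kind of output described above can be produced with nothing more than Python's standard library. The sketch below is not visigoth's API: `make_svg` is a hypothetical helper that draws a list of (x, y) points as circles, assuming the points are already in pixel coordinates.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialise SVG elements without a namespace prefix (xmlns="..." on the root).
ET.register_namespace("", SVG_NS)


def make_svg(points, width=200, height=100, radius=3):
    """Return an SVG document string with one <circle> per (x, y) point.

    Hypothetical helper for illustration only; this is not visigoth's API.
    """
    root = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for x, y in points:
        ET.SubElement(root, f"{{{SVG_NS}}}circle", {
            "cx": str(x), "cy": str(y), "r": str(radius), "fill": "steelblue",
        })
    return ET.tostring(root, encoding="unicode")


print(make_svg([(20, 30), (100, 50), (180, 70)]))
```

Saving the returned string with an `.svg` extension gives a file a browser can open directly; the interactive behaviour a library like visigoth provides would need extra scripting that this sketch leaves out.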


UK general election 2019 results

javascript, svg

13th December 2019

I've re-run the visualization I created after the 2017 UK general election, and the results look very different this time around. What is striking is how many extra seats the Conservative Party gained compared with the increase in its share of the votes cast.


Exploring UK Gender Pay Gap Data with Seaborn


6th April 2018

A quick post using Python 3 and the Seaborn statistical visualization package to start trying to understand the UK gender pay gap data released this week. All UK companies with 250 or more employees are required to report how pay differs between their female and male employees. I decided to drill down into how, according to the data self-reported by companies, pay varies by gender in the electricity sector.

I've provided my workings in a Jupyter notebook. If you want to run the examples and don't have Jupyter and Seaborn installed, I'd recommend installing them quickly and easily via Anaconda.
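The drill-down into the electricity sector amounts to filtering employers by their industry (SIC) codes before plotting. Here is a minimal sketch of that filtering step using only the standard library; the column names `SicCodes` and `DiffMedianHourlyPercent` are assumptions about the published CSV and should be checked against the actual file, and the sample rows are made up purely to show the input shape.

```python
import csv
import io
import statistics

# Assumed column names from the published gender pay gap CSV; check them
# against the file you download before relying on this.
SIC_COLUMN = "SicCodes"
GAP_COLUMN = "DiffMedianHourlyPercent"
# SIC 2007 group 35.1 covers electric power generation, transmission and distribution.
ELECTRICITY_PREFIX = "351"


def electricity_gaps(csv_text):
    """Return the median hourly pay gap (%) of each employer with an electricity SIC code."""
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An employer may list several SIC codes, separated by commas.
        codes = [code.strip() for code in row[SIC_COLUMN].split(",")]
        if any(code.startswith(ELECTRICITY_PREFIX) for code in codes):
            gaps.append(float(row[GAP_COLUMN]))
    return gaps


# Made-up rows, just to show the shape of the input.
sample = (
    "EmployerName,SicCodes,DiffMedianHourlyPercent\n"
    "Example Power Ltd,35110,12.5\n"
    'Example Retail Plc,"47110,35140",3.0\n'
    "Example Software Ltd,62020,20.0\n"
)
gaps = electricity_gaps(sample)
print(gaps, statistics.median(gaps))
```

The resulting list of gaps is the kind of input one of Seaborn's distribution plots would then take.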


UK general election 2017 provisional results

javascript, svg

8th July 2017

So things didn't turn out quite as anyone expected in the snap UK general election...

I wanted to create a visualisation of the results which contrasts the seats won with each party's share of the popular vote, and came up with this infographic. The nice thing about the two semi-circular charts I generated is that they can be nested within each other.
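The nesting works because both charts span the same 180 degrees and differ only in radius. The following is a minimal sketch of that geometry, assuming each ring is split into wedges proportional to a party's seats (outer ring) or votes (inner ring); the original was written in JavaScript, and `semicircle_spans`, `arc_point` and the party totals are hypothetical.

```python
import math


def semicircle_spans(values):
    """Split a 180-degree semicircle into consecutive spans proportional to values.

    Returns (start_deg, end_deg) pairs running from 180 (left) to 0 (right).
    """
    total = sum(values)
    spans = []
    start = 180.0
    for value in values:
        end = start - 180.0 * value / total
        spans.append((start, end))
        start = end
    return spans


def arc_point(cx, cy, radius, degrees):
    """Point on a circle centred at (cx, cy); y is subtracted because SVG's y axis points down."""
    rad = math.radians(degrees)
    return cx + radius * math.cos(rad), cy - radius * math.sin(rad)


# Hypothetical party totals: seats for the outer ring, votes for the inner ring.
seats = [3, 2, 1]
votes = [40, 35, 25]
outer = semicircle_spans(seats)   # drawn at radius 100
inner = semicircle_spans(votes)   # drawn at radius 60, so it nests inside
print(outer)
print(inner)
```

Converting each span's end angles with `arc_point` at the two radii gives the corner points of each wedge, which can then be joined into SVG arc paths.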


Mo Farah's Olympic (Rio 2016) 5000m final victory, in tweets

javascript, raphaeljs

21st August 2016

Four years on from the London Olympics he's only gone and done it again: the double double, 5000m/10000m.

Once again, I tracked the tweets using the Twitter streaming API (search terms #gomo, #motime, @mo_farah, #mofarah) before, during and after the race.

The interesting thing is that the distribution of tweets over time is pretty similar to last time. Even the absolute rates, in tweets per second, are similar, despite the race starting at 01.37am British Summer Time. You can compare them yourselves by looking at my original post from 2012.


Getting started with PySpark - Part 2

pyspark, python, data science

5th May 2014

In Part 1 we looked at installing the data processing engine Apache Spark and started to explore some features of its Python API, PySpark. In this article, we look in more detail at using PySpark.


Getting started with PySpark - Part 1

pyspark, python, data science

2nd March 2014

Apache Spark is a relatively new data processing engine, implemented in Scala and Java, that can run on a cluster to process and analyze large amounts of data. Spark performs particularly well if the cluster has enough main memory to hold the data being analyzed. Several sub-projects run on top of Spark and provide graph analysis (GraphX), a Hive-based SQL engine (Shark), machine learning algorithms (MLlib) and real-time stream processing (Spark Streaming). Spark has also recently been promoted from the Apache Incubator to a top-level Apache project.

In this series of blog posts, we'll look at installing Spark on a cluster and explore using its Python API, PySpark, for a number of practical data science tasks. This first post focuses on installation and getting started.




1st December 2012

This snippet, twitstreamer, is a simple command line tool, written in Python 3, for retrieving tweets via the Twitter streaming API, v1.1. The tweets are written to standard output as CSV or JSON formatted lines.

The tool can read from either of two Twitter streaming APIs.
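Writing each tweet as one CSV or JSON line is the core of the output step. The following is a minimal sketch of that step, not twitstreamer's actual code: `flatten` and `format_line` are hypothetical helpers, the choice of fields is arbitrary, and the input is assumed to be a v1.1 tweet object already parsed with `json.loads`.

```python
import csv
import io
import json

FIELDS = ("id_str", "created_at", "screen_name", "text")


def flatten(tweet):
    """Pick a few fields out of a v1.1 tweet object that has been parsed with json.loads."""
    return {
        "id_str": tweet["id_str"],
        "created_at": tweet["created_at"],
        "screen_name": tweet["user"]["screen_name"],
        "text": tweet["text"],
    }


def format_line(tweet, fmt="csv"):
    """Format one tweet as a CSV row or a JSON object, without a trailing newline."""
    record = flatten(tweet)
    if fmt == "json":
        return json.dumps(record)
    buf = io.StringIO()
    # csv.writer quotes fields containing commas, quotes or newlines.
    csv.writer(buf).writerow([record[field] for field in FIELDS])
    return buf.getvalue().rstrip("\r\n")


# A hand-made tweet object, just to show the output shape.
example = {
    "id_str": "1",
    "created_at": "Sat Aug 11 19:46:05 +0000 2012",
    "text": "Go Mo, go!",
    "user": {"screen_name": "example_fan"},
}
print(format_line(example))
print(format_line(example, "json"))
```

Printing one line per tweet keeps the output easy to pipe into other tools or append to a file.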




21st November 2012

This snippet, twitfetcher, is a simple command line tool, written in Python 3, for retrieving tweets via the Twitter search API, v1.1. The tweets can be stored in CSV or JSON formatted files.

Twitter only makes a sample of those tweets sent over the previous week searchable, but it is still a very useful free source of data for data science experiments.


Analyzing co-occurrence networks with Gephi

python, sna, gephi, nltk

12th October 2012

I started Coursera's Social Network Analysis course and was looking around for some network data to start analyzing. I'd seen a talk by Matt Biddulph at a Big Data London meetup (blog post) on analyzing Wikipedia data and wondered whether something similar could easily be done with news data.

It was fairly easy to grab some newspaper articles using the Guardian Open Platform. I then used the Python-based Natural Language Toolkit (NLTK) to extract named entities (in particular the names of people) from the articles. A network could then be constructed using names as the nodes, connecting two nodes with a link if at least two articles included both names.

The resulting network could then be loaded into Gephi, an excellent tool for visualizing and analyzing networks.
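The linking rule described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the people's names have already been extracted from each article (for example with NLTK); `cooccurrence_edges`, `write_edges` and the sample names are hypothetical, and the Source/Target/Weight columns are a layout Gephi's spreadsheet importer should be able to read as an edge table.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles, min_articles=2):
    """Count, for each pair of names, how many articles mention both.

    articles: iterable of name collections, one per article.
    Returns {(name_a, name_b): count} for pairs seen in at least min_articles articles.
    """
    counts = Counter()
    for names in articles:
        # set() ignores repeat mentions within one article; sorted() gives each pair one key.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_articles}


def write_edges(edges, out):
    """Write edges as a Source,Target,Weight CSV."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), n in sorted(edges.items()):
        writer.writerow([a, b, n])


articles = [["Alice", "Bob", "Carol"], ["Bob", "Alice"], ["Carol", "Dave"]]
edges = cooccurrence_edges(articles)
buf = io.StringIO()
write_edges(edges, buf)
print(buf.getvalue())
```

Keeping the co-occurrence count as an edge weight lets Gephi size or filter links by how often two people appear together.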


Mo Farah's Olympic 5000m final victory, in tweets

javascript, raphaeljs

12th August 2012

Another sports related post, this time inspired by Mo Farah's amazing double gold medals (in the 5000m and 10000m) over the last couple of weeks at the London Olympics.

I used the gRaphael charting library and the Twitter search API to show how the rate of tweets containing the hashtag #gomo varied before, during and just after the 5000m final at the London Olympics. Hover over the chart to display the text of selected tweets.

The main features of the chart are a small peak just before the race starts followed by the huge peak after Mo wins. And I thought it was a long way to jog to the bus stop when running late in the morning!
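The chart was drawn with gRaphael in JavaScript, but the counting behind a tweet-rate chart like this can be sketched in Python. This is a minimal sketch, assuming the tweets have already been collected as (timestamp, text) pairs; `parse_created_at` and `tweet_rate` are hypothetical helpers, and the one-minute bucket width is an arbitrary choice.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_created_at(value):
    """Parse a v1.1 created_at string such as 'Sat Aug 11 19:46:05 +0000 2012'."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


def tweet_rate(tweets, hashtag="#gomo", bucket_seconds=60):
    """Count tweets mentioning hashtag (case-insensitively) in fixed-width time buckets.

    tweets: iterable of (timezone-aware datetime, text) pairs.
    Returns (bucket_start, count) pairs, including empty buckets between the
    first and last matching tweet so the chart's time axis has no gaps.
    """
    tag = hashtag.lower()
    counts = Counter()
    for created, text in tweets:
        # A plain substring test, so longer tags starting with #gomo also match.
        if tag in text.lower():
            ts = int(created.timestamp())
            counts[ts - ts % bucket_seconds] += 1
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), counts[start])
        for start in range(first, last + 1, bucket_seconds)
    ]


tweets = [
    (parse_created_at("Sat Aug 11 19:46:05 +0000 2012"), "Go Mo! #GoMo"),
    (parse_created_at("Sat Aug 11 19:46:50 +0000 2012"), "#gomo"),
    (parse_created_at("Sat Aug 11 19:48:10 +0000 2012"), "#gomo wins"),
    (parse_created_at("Sat Aug 11 19:48:20 +0000 2012"), "unrelated"),
]
for start, count in tweet_rate(tweets):
    print(start.isoformat(), count)
```

Passing `bucket_seconds=1` gives rates in tweets per second instead.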


Visualizing Reading FC's Winning 2011/2012 Season

javascript, d3

8th August 2012

I used the D3 JavaScript visualization library and modified one of the code examples from Mike Dewar's excellent book Getting Started with D3 to build a modest visualization which tells the story of Reading Football Club's Championship-winning 2011-2012 season.

The visualization plots matches played (x-axis) against points accumulated (y-axis). Click the "Add club" button to compare Reading's progress against that of the other clubs playing in the England and Wales FA Championship.
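The data behind each club's line is simply a running total of points after each match. The following is a minimal Python sketch of that preparation step (the visualization itself used D3), assuming the standard league scoring of 3 points for a win, 1 for a draw and 0 for a loss; `points_curve` and the results string are hypothetical.

```python
from itertools import accumulate

# Assumes the standard league scoring: 3 points for a win, 1 for a draw, 0 for a loss.
POINTS = {"W": 3, "D": 1, "L": 0}


def points_curve(results):
    """Turn a sequence of 'W'/'D'/'L' results into (matches_played, points) pairs."""
    totals = accumulate(POINTS[result] for result in results)
    return [(0, 0)] + list(enumerate(totals, start=1))


print(points_curve("WDLW"))
```

Each pair maps directly onto an (x, y) point of the chart, so adding a club just means computing its curve from its own results.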