A quick post using Python 3 and the Seaborn statistical visualization package to start making sense of the UK gender pay gap data released this week. All UK companies with 250 or more employees are required to report how differently their female and male employees are paid. I decided to drill down into how, according to the data self-reported by companies, pay varies by gender in the electricity sector.
In this quick visual analysis, I started by downloading the CSV data and loading it into a pandas dataframe. I also created a second dataframe containing only companies that operate in the electricity sector, selected using SIC codes in the range 351X0 (if you are really interested, more details on SIC codes are available from the UK Office for National Statistics).
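The loading and filtering step might look roughly like the sketch below. It uses a tiny made-up CSV in place of the real download, and the column names `EmployerName` and `SicCodes`, as well as the helper names, are assumptions for illustration. It also assumes that a single cell can list several comma-separated SIC codes.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the downloaded CSV (illustrative values only).
SAMPLE_CSV = """EmployerName,SicCodes,MeanHourlyPercentageDiff
Alpha Power Ltd,"35110,35130",21.5
Beta Retail Ltd,47110,8.0
Gamma Grid Ltd,35120,-2.3
Delta Foods Ltd,"10710,13510",5.1
"""


def load_pay_gap(source):
    """Read the CSV, keeping SIC codes as text because one cell may hold several."""
    return pd.read_csv(source, dtype={"SicCodes": str})


def electricity_only(df, sic_column="SicCodes"):
    """Keep rows that list at least one SIC code of the form 351X0."""
    pattern = r"\b351\d0\b"  # whole five-digit codes only, e.g. 35110 or 35130
    mask = df[sic_column].fillna("").str.contains(pattern, regex=True)
    return df[mask].copy()


all_companies = load_pay_gap(io.StringIO(SAMPLE_CSV))
electricity = electricity_only(all_companies)
print(electricity["EmployerName"].tolist())
```

With the real data you would pass the path of the downloaded file to `load_pay_gap` instead of the in-memory sample.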
First, let's rank all companies in the electricity sector by the percentage difference in mean hourly pay between men and women (MeanHourlyPercentageDiff).
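One way to produce such a ranking is to sort the dataframe and draw a horizontal bar chart. The sketch below uses made-up values, and the `EmployerName` column is an assumed name; only `MeanHourlyPercentageDiff` comes from the text above.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values, for illustration only.
electricity = pd.DataFrame({
    "EmployerName": ["Alpha Power", "Beta Energy", "Gamma Grid", "Delta Supply"],
    "MeanHourlyPercentageDiff": [21.5, 41.2, -2.3, 12.0],
})

# Largest gap first, so the worst cases appear at the top of the chart.
ranked = electricity.sort_values("MeanHourlyPercentageDiff", ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(data=ranked, x="MeanHourlyPercentageDiff", y="EmployerName",
            color="steelblue", ax=ax)
ax.axvline(0, color="black", linewidth=1)  # zero line marks equal mean pay
ax.set_xlabel("Mean % difference in hourly pay")
ax.set_ylabel("")
fig.tight_layout()
```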
You can see immediately that in most companies in this sector women are paid less than men, and in the worst case over 40% less.
Seaborn allows us to easily overlay the distributions of mean and median pay differences between women and men, so we can compare the distributions for the electricity sector alone against the general population of all companies. We'll use Seaborn Kernel Density Estimate (KDE) plots here, although plain old histograms could also be used.
Whether you look at the median or mean figures, the gender pay gap looks somewhat more pronounced in the electricity sector.
Basic pay is one thing, but bonuses are also significant when considering how workers are paid. The dataset provides a number of statistics on this. First, we have the percentages of women and of men receiving a bonus. We can plot these against each other in a scatter plot and fit a regression line to see whether there is an inequality.
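The scatter plot and fitted line can be drawn in one call with Seaborn's `regplot`. The sketch below uses randomly generated data, and the column names `MaleBonusPercent` and `FemaleBonusPercent` are assumptions. The dashed diagonal is an addition for reference: points on it represent companies where equal shares of men and women receive a bonus.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Randomly generated stand-in data (not real figures).
male = rng.uniform(0, 100, 60)
female = np.clip(male + rng.normal(0, 5, 60), 0, 100)
electricity = pd.DataFrame({"MaleBonusPercent": male, "FemaleBonusPercent": female})

fig, ax = plt.subplots(figsize=(5, 5))
sns.regplot(data=electricity, x="MaleBonusPercent", y="FemaleBonusPercent",
            scatter_kws={"alpha": 0.6}, ax=ax)
# Reference diagonal: equal percentages of men and women receiving a bonus.
ax.plot([0, 100], [0, 100], linestyle="--", color="grey", label="Equal shares")
ax.legend()

# The same straight-line fit, computed directly so the slope can be inspected.
slope, intercept = np.polyfit(electricity["MaleBonusPercent"],
                              electricity["FemaleBonusPercent"], deg=1)
```

A slope close to 1 with an intercept close to 0 would suggest that similar proportions of men and women receive bonuses.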
Here at least things seem fairer: in most companies similar percentages of men and women appear to receive bonuses, and there seems to be little overall bias between the two groups.
However, we also need to consider how the bonus payments themselves vary, and given the inequalities noted in basic pay in the sector, it is not so surprising to see the same issues with bonus payments. Men tend to receive larger bonus payments than their female colleagues, as you'll see in the KDE plots for the distribution of the mean and median differences in bonus payments:
Companies also need to report the percentages of female employees in each pay quartile, and this data can also be used to measure pay inequality. In cases where there is pay inequality, we would expect to see diminishing or increasing percentages of women as we move up through the quartiles. Indeed, we can fit a separate linear regression model for each company through these four data points and use the slope of the line as a measure of inequality (a slope of 0 would indicate that a company has the same percentage of women in each quartile).
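As a rough sketch of this slope measure, the quartiles can be numbered 1 to 4 and a straight line fitted through each company's four percentages with `numpy.polyfit`. The company names, values, quartile column names and the `QuartileSlope` column below are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed column names, ordered from the lowest to the highest pay quartile.
QUARTILE_COLUMNS = [
    "FemaleLowerQuartile",
    "FemaleLowerMiddleQuartile",
    "FemaleUpperMiddleQuartile",
    "FemaleTopQuartile",
]

# Made-up percentages of women in each quartile, for three toy companies.
companies = pd.DataFrame({
    "EmployerName": ["Even Co", "Falling Co", "Rising Co"],
    "FemaleLowerQuartile": [50.0, 60.0, 20.0],
    "FemaleLowerMiddleQuartile": [50.0, 45.0, 30.0],
    "FemaleUpperMiddleQuartile": [50.0, 30.0, 40.0],
    "FemaleTopQuartile": [50.0, 15.0, 50.0],
})


def quartile_slopes(df, columns=QUARTILE_COLUMNS):
    """Fit one straight line per company through its four quartile percentages."""
    positions = np.arange(1, len(columns) + 1)  # quartiles numbered 1..4
    y = df[columns].to_numpy(dtype=float).T     # shape (4, n_companies)
    # polyfit fits every column of y independently in a single call.
    slope, _intercept = np.polyfit(positions, y, deg=1)
    return pd.Series(slope, index=df.index, name="QuartileSlope")


companies["QuartileSlope"] = quartile_slopes(companies)
print(companies[["EmployerName", "QuartileSlope"]])
```

The slope is in percentage points per quartile step: a negative value means the share of women shrinks towards the better-paid quartiles, and a positive value means it grows.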
We can use this measure to rank the companies and investigate the distribution of percentages of female workers across the pay quartiles for the 6 companies in the sector with the most pay inequality and the 6 companies with the least.
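The ranking and the per-company quartile plots could be put together along the following lines. This sketch assumes that "most" and "least" inequality are judged by the absolute value of the slope, which the text does not spell out. It also uses randomly generated data, with hypothetical column names throughout.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Assumed column names mapped to short axis labels, lowest pay quartile first.
QUARTILES = {
    "FemaleLowerQuartile": "Q1",
    "FemaleLowerMiddleQuartile": "Q2",
    "FemaleUpperMiddleQuartile": "Q3",
    "FemaleTopQuartile": "Q4",
}
columns = list(QUARTILES)

# Randomly generated stand-in data for 12 toy companies (not real figures).
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.uniform(10, 60, size=(12, 4)).round(1), columns=columns)
data.insert(0, "EmployerName", [f"Company {i}" for i in range(1, 13)])

# Slope of a straight line through each company's four quartile percentages.
data["QuartileSlope"] = np.polyfit(np.arange(1, 5), data[columns].to_numpy().T, deg=1)[0]

# Assumption: inequality is ranked by the size of the slope, whatever its sign.
abs_slope = data["QuartileSlope"].abs()
most_unequal = data.loc[abs_slope.nlargest(6).index]
least_unequal = data.loc[abs_slope.nsmallest(6).index]


def plot_quartiles(companies):
    """Draw one small bar chart per company: % of women in each pay quartile."""
    long = companies.melt(id_vars="EmployerName", value_vars=columns,
                          var_name="Quartile", value_name="FemalePercent")
    long["Quartile"] = long["Quartile"].map(QUARTILES)
    return sns.catplot(data=long, x="Quartile", y="FemalePercent",
                       col="EmployerName", col_wrap=3, kind="bar", height=2.5)


grid = plot_quartiles(most_unequal)
```

Calling `plot_quartiles(least_unequal)` gives the matching set of charts for the other end of the ranking.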
First, the 6 companies in the electricity sector which show the most inequality according to this method:
These tend to be the companies in the electricity sector that had the highest differences in mean hourly pay... and note that the inequality is biased in favour of men in all of these cases.
Now, the 6 companies in the electricity sector which show the least inequality according to this method:
As expected, these tend to be the companies in the electricity sector that had the smallest differences in mean hourly pay. However, we should note that some of these companies, whilst having a small gender pay gap, actually employ relatively small numbers of women. As well as being a potential symptom of other forms of discrimination, the low proportions of female workers in these cases would tend to reduce the reliability of any conclusions we could draw about pay equality in these companies.