Using survey data to investigate migrants and benefits

Last week, the Daily Express published a story with the headline “Migrants ‘milking’ benefits system: Foreigners more likely to claim handouts”, drawing upon figures from a Migration Watch report that investigates the economic characteristics of migrants in the UK in 2014. The purpose of this post is not to critique the claims made by the newspaper nor the report itself, but rather to investigate whether the source of the data, the Labour Force Survey (LFS), is capable of providing evidence that is robust enough to support these claims.

As the study points out, migrants in the UK do not uniformly match the age profile of their host country as a whole and predominantly fall into the 25-44 age bands. It follows that analysis of benefit claims by migrants should be broken down by age, yet the picture that emerges of the differences between the UK-born and non-UK-born population is not entirely clear. The charts below – taken directly from the report – show that claimant rates for certain benefits are higher among the non-UK-born (e.g. housing benefit) while for others they are lower (e.g. out-of-work benefits). The implication is that a nuanced narrative might be more appropriate than broad statements about migrants’ likelihood of claiming.

150728 fig1

As Migration Watch point out in their report, the Home Office identified the LFS as “the most complete data source for measuring the impacts of migration on the UK labour market”. Yet, as with any survey data, problems of sample size arise the deeper you drill into it. To illustrate these problems I’ll take one example from the set of countries that Migration Watch chooses to focus on in its report, that of migrants from Pakistan and Bangladesh. One chart in the Migration Watch report shows a 10 point gap in the claimant rate between UK-born and Pakistan/Bangladesh born 40-44 year old housing benefit claimants.

Yet when we look at percentages such as those shown in the chart below, it is important to consider the underlying data from which these figures are drawn.

150728 fig2

The ideal scenario would be to take the number of all working age housing benefit claimants born in Bangladesh or Pakistan and divide it by the entire working age population born in either of these countries. Unfortunately, these data do not exist, and herein lies the value of surveys. They allow us to take a focused snapshot of a population – for which it would be prohibitively expensive and time consuming to collect data on every individual within it – and then use this to make inferences about the population as a whole.

While the LFS has a relatively large sample (just under 100,000 for the Q2 2014 data used here), in looking at Housing Benefit claimants born in Pakistan or Bangladesh we are essentially zooming in on four rather focused subsections within this overall sample:

150728 fig3

If we just take the top two layers of these sub-categories, the working age population born in Pakistan and Bangladesh, we can compare those captured by the LFS with those captured by the 2011 Census (the Census differs from the LFS in that it attempts to capture all individuals within the entire population). Such an exercise reveals that while the Census represents over 595,000 within this group, the LFS has only 895 – reflecting coverage of 0.15%. It seems quite intuitive that when we drill down into increasingly focused sub-categories the sample size shrinks, yet we should equally be aware of the limitations that this imposes on our ability to make reliable inferences.
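The coverage figure can be reproduced with a line of arithmetic. The sketch below uses only the two numbers quoted above; the function and variable names are illustrative, and the Census figure is rounded down to 595,000 because the text gives it only as "over 595,000".

```python
# Figures quoted in the text; the Census count is given only as "over 595,000".
census_population = 595_000  # working-age population born in Pakistan or Bangladesh (2011 Census)
lfs_respondents = 895        # respondents in the same group in the LFS sample


def coverage(sample_size: int, population_size: int) -> float:
    """Return the share of the population captured by the sample."""
    return sample_size / population_size


share = coverage(lfs_respondents, census_population)
print(f"LFS coverage: {share:.2%}")  # prints "LFS coverage: 0.15%"
```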

Sampling the population in this way can only produce estimates of the figure in question, and these estimates carry an element of statistical uncertainty. The smaller the sample we have to work with, the greater this uncertainty relative to the size of the estimate. The uncertainty is measured by the standard error. While this may seem like the splitting of statistical hairs to many, standard errors have a large impact on how much confidence we can place in an estimate. They were previously published for the LFS, and in 2009 the 95% confidence interval for an estimate of 10,000 stood at +/- 3,900. This would mean, for example, that if the survey estimates a pool of 10,000 housing benefit claimants, we can be 95% certain that the true value lies somewhere between 6,100 and 13,900; in 5% of cases it will lie outside this range.
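The interval quoted above can be worked through explicitly. This is a minimal sketch: the 10,000 and 3,900 figures come from the text, while the 1.96 multiplier assumes the interval was built with the usual normal approximation, which the text does not state. The function names are illustrative.

```python
Z_95 = 1.96  # assumed normal-approximation multiplier for a 95% interval


def interval_from_half_width(estimate: float, half_width: float) -> tuple:
    """Return the (lower, upper) bounds of an interval quoted as estimate +/- half_width."""
    return estimate - half_width, estimate + half_width


def implied_standard_error(half_width: float, z: float = Z_95) -> float:
    """Back out the standard error implied by a confidence-interval half-width."""
    return half_width / z


low, high = interval_from_half_width(10_000, 3_900)
se = implied_standard_error(3_900)

print(f"95% interval: {low:,} to {high:,}")
print(f"Implied standard error: about {se:,.0f}")
```

Under that assumption the implied standard error is roughly a fifth of the estimate itself, which is why small estimates from the survey need to be read with care.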

With this in mind, if we add additional focus on housing benefit by age group, how many respondents does this leave us with? For the group showing the largest disparity between UK and Pakistan and Bangladesh claimants (40-44yrs), we are left with 27 individuals. Thus the column in the chart showing that migrants aged 40 to 44 and born in Pakistan and Bangladesh have a 10 point higher rate of housing benefit claims is based on 27 survey respondents.
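To give a rough sense of how much noise a cell of this size carries, the sketch below uses a simple Poisson approximation, under which the relative standard error of a raw count is about one over its square root. This is an illustrative assumption, not the method used by the ONS or by Migration Watch, and it ignores survey weights and design effects, which would normally widen the uncertainty further.

```python
import math


def relative_standard_error(count: int) -> float:
    """Approximate relative standard error of a raw respondent count.

    Assumes a simple Poisson approximation and ignores survey weights
    and design effects (both are assumptions made for illustration).
    """
    if count <= 0:
        raise ValueError("count must be positive")
    return 1 / math.sqrt(count)


# Respondent counts quoted in the text.
for label, count in [("aged 40-44 cell", 27), ("working-age group", 895)]:
    print(f"{label}: n = {count}, relative standard error ~ {relative_standard_error(count):.1%}")
```

On this rough basis, a cell of 27 respondents has a relative standard error of close to 20%, compared with a little over 3% for the 895 respondents in the wider group.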

Moreover, as you might expect from such a limited set of respondents, there are other factors that bias these figures. If we ignore the age breakdown and focus on all migrants born in Bangladesh or Pakistan of working age (16 to 69 as used by Migration Watch), we are left with 140 LFS respondents. Despite the larger sample size, the geographical spread of these respondents also raises concerns over the validity of comparing claimants across countries of birth. The chart below shows the proportion of housing benefit claimants by region with two separate series – claimants born in Pakistan or Bangladesh and all claimants. It shows that claimants in the former group are significantly overrepresented in London. This may not come as a surprise given the concentration of many migrant groups in the capital (32% of the Pakistan/Bangladesh-born population in England and Wales according to the Census), yet this presents a clear problem when comparing claimant rates by country of birth.

150728 fig4

London has the second highest housing benefit claimant rate for people of all countries of birth. Thus if respondents born in Pakistan or Bangladesh are disproportionately concentrated in the capital, it follows that they are more likely to have higher rates of housing benefit claims because of where they live, not necessarily because of their status as migrants. The report also claims that migrants born in Somalia have the highest housing benefit claimant rate at over 45%, yet over 62% of those surveyed live in the capital. This is what is known in statistics as omitted variable bias: a model incorrectly leaves out one or more important causal factors. While Migration Watch briefly mention the importance of a London-focused population (in reference to African-born migrants), this issue deserves more prominence, particularly with regard to Housing Benefit.
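The effect of leaving region out of the comparison can be illustrated with a toy calculation. All numbers below are hypothetical and chosen only to show the mechanism: two groups have identical claimant rates within each region, yet their overall rates differ because one group is concentrated in the higher-rate region.

```python
def overall_rate(region_shares: dict, region_rates: dict) -> float:
    """Overall claimant rate as a population-weighted average of regional rates."""
    return sum(region_shares[region] * region_rates[region] for region in region_shares)


# Hypothetical claimant rates, identical for both groups within each region.
region_rates = {"London": 0.20, "Rest of UK": 0.10}

# Hypothetical regional distributions of the two groups.
group_a_shares = {"London": 0.60, "Rest of UK": 0.40}  # concentrated in London
group_b_shares = {"London": 0.15, "Rest of UK": 0.85}  # spread more widely

rate_a = overall_rate(group_a_shares, region_rates)
rate_b = overall_rate(group_b_shares, region_rates)

print(f"Group A overall rate: {rate_a:.1%}")
print(f"Group B overall rate: {rate_b:.1%}")
```

In this sketch the gap between the two overall rates arises entirely from where people live, since the within-region rates are the same by construction.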

Delving into other benefits and other groups of migrants is beyond the scope of this post, but even at a very broad level there are issues with using the LFS to make inferences about benefit claimants. The LFS has shortcomings as a source of benefit claimant data due to discrepancies (which can be substantial) between individual respondents’ descriptions of the benefits and tax credit they receive and the official DWP / HMRC figures on benefit and tax credit claimants drawn from administrative databases. The ONS Labour Force Survey user guide (volume 3) notes:

Comparison between the data collected by the LFS and administrative data collected by other Government departments shows that the LFS consistently undercounts benefit claimants.

If such undercounting is spread evenly across different groups (such as migrants and non-migrants), this may have little effect on differences between claimant rates. If, however, one group is less likely to under-report than another, this is likely to bias the comparison further in one direction or the other.
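A small sketch makes the point concrete. The rates below are hypothetical and not drawn from the LFS or DWP figures: both groups are given the same true claimant rate, and only the share of claimants who report their benefit in the survey is varied.

```python
def observed_rate(true_rate: float, reporting_rate: float) -> float:
    """Claimant rate a survey would record if only some claimants report their benefit."""
    return true_rate * reporting_rate


TRUE_RATE = 0.10  # hypothetical true claimant rate, identical for both groups

# Even under-reporting: both groups report 80% of claims (hypothetical).
gap_even = observed_rate(TRUE_RATE, 0.80) - observed_rate(TRUE_RATE, 0.80)

# Uneven under-reporting: group A reports 90% of claims, group B 70% (hypothetical).
gap_uneven = observed_rate(TRUE_RATE, 0.90) - observed_rate(TRUE_RATE, 0.70)

print(f"Observed gap with even under-reporting: {gap_even:.1%}")
print(f"Observed gap with uneven under-reporting: {gap_uneven:.1%}")
```

With uneven reporting the survey shows a two-point gap between groups whose true rates are identical, which is the kind of distortion described above.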

The LFS is not the only source of data on benefit claims by nationality. While Migration Watch predominantly focus on non-DWP-administered working age benefits (tax credits, child benefit, etc.), there is another data source, not used in the Migration Watch report, that matches DWP out-of-work benefit data with records of those who were non-UK-nationals when they first registered for a UK National Insurance number (NINo).

This is not the same as those who are currently non-UK nationals, as some of these will have subsequently obtained British nationality. Moreover, this measure does not capture foreign-born individuals who were naturalised (obtained UK nationality) prior to NINo registration.

Based on the LFS data for all working-age groups, Migration Watch say: “Rates of claim for Income Support and Jobseeker’s Allowance differ little between migrants and the UK-born”. DWP NINo data from June 2014 suggests rates among the former group are 3% compared with 5% for the latter – a statistically significant difference. For all out-of-work benefits, claimant rates were 6% for those who were non-UK nationals when registering for their NINo and 11% for UK nationals.

The intention here is to show that different data sources appear to be showing different things across different benefits among different age groups, allowing different conclusions to be drawn and presenting a confusing picture.

As the Home Office mention (and I pointed out above), the LFS is an incredibly useful source of labour market data and we should continue putting it to good use. However, due to the shortcomings discussed above, a degree of caution should be exercised in interpreting results, particularly on highly sensitive topics such as welfare and migration.

Steven Ayres

Following comments this blog was amended on 30 July.

First, we recognise that Migration Watch did not specifically make a “claim” about Housing Benefit receipt of Pakistani and Bangladeshi-born people aged 40-44 and have amended the text to indicate that this figure was taken from a chart in the Migration Watch report.

Secondly, a comment was that the DWP NINo data did not show a different conclusion to the LFS on rates of claim for out-of-work benefits. We have amended the text to clarify that using NINo data, claims are higher for UK nationals when looking at Income Support and Jobseekers Allowance separately as well as for out-of-work benefits overall. Also, we have clarified that this section was intended to illustrate the point that the use of different sources can lead to different conclusions being drawn.

Picture Credit: “St Pancras”, 20 July 2011 by Aurelien Guichard; Creative Commons Attribution 2.0 Generic (CC BY 2.0)