Finding Beneficiaries of Public Money

December 17, 2020
Data, Procurement

By Alex Yeung.

Part 1: The Matching Process

Linking entities is a common theme in procurement data, and it is a significant problem in the UK. The main concern with entity matching is the cost of false positives, which is particularly stark when trying to find the beneficiaries of public money. The core challenge is that name matching alone is largely insufficient for people or companies, and here’s why.

Take, for example, the “John Smith” conundrum.

“John Smith” matches “John Smith”, but a name on its own is insufficient information to be useful.

“John Smith, Born: 1971/01/21, Reigate, Surrey” matches “Jon Smith, DoB: 21/01/1971, Surrey, UK”

This match is more useful because other pieces of data corroborate it, such as the date of birth or a string of text from the address. Even then, for a name as common as John Smith, it is possible that two people were born on the same date and reside in the same town.
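As a rough illustration of corroboration, the sketch below only accepts a match when the names are similar, the dates of birth agree once parsed, and the addresses share at least one word. The record layout, the two date formats, the 0.85 threshold and all function names are assumptions made for this example; it is not our production matching logic, and a match it accepts would still need verifying.

```python
from datetime import datetime
from difflib import SequenceMatcher

# Assumed date formats, taken from the two example records above.
DATE_FORMATS = ("%Y/%m/%d", "%d/%m/%Y")


def parse_dob(text):
    """Return a date for the first format that fits, or None if none do."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def name_similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def address_words(text):
    """Lower-cased words of an address, with commas treated as spaces."""
    return set(text.replace(",", " ").lower().split())


def corroborated_match(rec_a, rec_b, name_threshold=0.85):
    """Accept only when name, date of birth and address all support the match."""
    name_ok = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    dob_a, dob_b = parse_dob(rec_a["dob"]), parse_dob(rec_b["dob"])
    dob_ok = dob_a is not None and dob_a == dob_b
    place_ok = bool(address_words(rec_a["address"]) & address_words(rec_b["address"]))
    return name_ok and dob_ok and place_ok


a = {"name": "John Smith", "dob": "1971/01/21", "address": "Reigate, Surrey"}
b = {"name": "Jon Smith", "dob": "21/01/1971", "address": "Surrey, UK"}
print(corroborated_match(a, b))
```

Requiring all three signals lowers the risk of false positives at the cost of missing genuine matches where one field is absent or recorded differently.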

Lost In Translation

This problem is exacerbated for some names, especially those from the Far East, which are written in character-based scripts. There is often little consistency in how they are represented in other languages. Take the author’s Chinese name: it can be anglicised to Saiman, Sai mun, or even Simon. The name in use might not even be related to the native-language name, as many people adopt a name like ‘Thomas’.

Therefore, for John Smith and other names, much of the battle for matching is finding supporting data that allows the match to be made. This includes investigation using publicly available data to corroborate matches.

Matching Bias

Of course, the reverse is also possible. Many cultures and countries have more distinctive naming systems, with longer or rarer names. Take the name of present UK chancellor Rishi Sunak. At the time of writing, Companies House has only three entries for company officers of this name, two of which share the same month of birth. This narrows the scope for verification somewhat. Contrast this with 375,407 entries for a search for John Smith! (https://find-and-update.company-information.service.gov.uk/search/officers?q=john+smith#)

Batch Matching Names

Of course, when it comes to Companies House, a more efficient approach is needed. Even generously assuming 5 minutes on average to verify each name and 8-hour workdays, it would take over 3,900 working days, which is more than a decade of full-time work, to go through John Smith alone.

We Just Use A Script.

Using our data infrastructure and our algorithms, we can compare two lists in days rather than years. Of course, we add our own investigatory magic to verify our matches. We have to. To give an example, here is what a machine might see:

Name list 1: glmb zimrh

Name list 2: glmb qznvh zimrh

Judging solely by what can be seen, a match between these two strings is not immediately obvious. In fact, they are the following two names with the alphabet simply inverted (a becomes z, b becomes y, and so on):

Tony Arnis

Tony James Arnis

Note: ‘Tony Arnis’ and ‘Tony James Arnis’ are both names made up for this article.
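For readers who want to check the inversion themselves, here is a minimal sketch. The function name is our own choice for this example; the mapping simply swaps each letter with its mirror in the alphabet, so applying it twice returns the original text.

```python
import string

# Map each lowercase letter to its mirror in the alphabet (a<->z, b<->y, ...).
_INVERT = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])


def invert_alphabet(text):
    """Lower-case the text and swap every letter with its alphabet mirror."""
    return text.lower().translate(_INVERT)


print(invert_alphabet("glmb zimrh"))        # tony arnis
print(invert_alphabet("glmb qznvh zimrh"))  # tony james arnis
```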

To the human eye, it’s pretty obvious that this is a match. This is because an Anglophone reader has a priori knowledge of common surnames (Arnis is not a common surname: https://find-and-update.company-information.service.gov.uk/search/officers?q=arnis). A simple script does not have this knowledge and would likely reject the match. It could be given it: a more advanced algorithm trained on substantial amounts of prepared data might well address this, but that takes time and resources to develop and train.

Stuck In The Middle

The ‘Tony James Arnis’ issue is representative of a broader challenge: middle names are really annoying for name matching because they are applied so inconsistently. One list of names might include middle names; another might not. Even an individual’s own use of their middle names is not consistent. Incidentally, the UK passport only has so much space: if a person has too many middle names, one or more might be cut off partway. Again, this requires investigatory work to get right.
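One simple way to tolerate missing middle names, sketched below, is to require the first and last words to agree and then check that the shorter name’s words all appear, in order, within the longer one. The function names are hypothetical and this is an illustration rather than our actual script; it only proposes candidates, which would still need checking against other data.

```python
def name_tokens(name):
    """Split a name into lower-cased words."""
    return name.lower().split()


def is_subsequence(short, long):
    """True if every item of `short` appears in `long` in the same order."""
    remaining = iter(long)
    return all(token in remaining for token in short)


def matches_ignoring_middle_names(name_a, name_b):
    """Candidate match if first and last names agree and middle names do not conflict."""
    a, b = name_tokens(name_a), name_tokens(name_b)
    if not a or not b:
        return False
    if a[0] != b[0] or a[-1] != b[-1]:
        return False
    short, long = sorted((a, b), key=len)
    return is_subsequence(short, long)


print(matches_ignoring_middle_names("Tony Arnis", "Tony James Arnis"))
```

Two names with different middle names, such as ‘Tony Paul Arnis’ and ‘Tony James Arnis’, are rejected by this check, while a name truncated partway through a middle name would need a looser comparison.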

A Titular Distinction

Titles pose a similar problem to middle names. They can appear anywhere within a name, which means that script matching can reject what might appear to be perfectly good matches. Mrs, Dr., Prof., Professor, LLM, MCRVS, Eng and FBCS are but a few of the titles that can confound simpler scripts. This is especially troublesome when there are no consistent conventions within a list of names: some lists might put Dr. at the front of the name, others at the back. There are ways around this, however, through better scripts, but that’s a story for another time.
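As a small taste of what such a script might do, the sketch below removes known titles wherever they appear before names are compared. The title list contains only the examples mentioned above and the function name is hypothetical; a real list would be far longer and would need care, since a word such as ‘Eng’ can also be a genuine surname.

```python
import re

# Illustrative list only, built from the titles named in this article.
TITLES = {"mrs", "dr", "prof", "professor", "llm", "mcrvs", "eng", "fbcs"}


def strip_titles(name):
    """Drop any word that is a known title, ignoring case and trailing full stops."""
    words = re.split(r"[\s,]+", name.strip())
    kept = [w for w in words if w and w.rstrip(".").lower() not in TITLES]
    return " ".join(kept)


print(strip_titles("Dr. Tony Arnis"))    # Tony Arnis
print(strip_titles("Tony Arnis, FBCS"))  # Tony Arnis
```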

The challenges are many but not insurmountable. Matching is critical in bringing beneficial ownership data and procurement data together. With our colleagues at Open Ownership, we will continue to think about how we will do this in the future, both with the tools at our disposal and the tools we can create.
