New Year, New Tool. Introducing Our New Classifier.

December 23, 2020
Procurement

Our New Solution To Classification

In 2020, we developed an advanced classifier. This tool adds multiple labels to procurement notices based on the Common Procurement Vocabulary (CPV), with an accuracy of 84%. It gives each notice five scored Level 3 CPV codes based on its text and description, and it works across the 90 languages in our database. The classifier can process millions of recommendations within hours. At the time of writing, we are classifying all our new records each day, with full predictions appearing in our API in early 2021.
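As an illustration of what "five scored codes" could look like in practice, the sketch below keeps the five highest-scoring labels from a set of per-code scores. The function name, the score values and the example codes are hypothetical; this post does not describe how the classifier produces or ranks its scores.

```python
from typing import Dict, List, Tuple


def top_k_codes(scores: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (code, score) pairs, best first."""
    # Sort by score, highest first; break ties by code so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores for a single notice (illustrative values only).
example_scores = {
    "72260000": 0.91,
    "72250000": 0.64,
    "72510000": 0.40,
    "48000000": 0.22,
    "79400000": 0.10,
    "85140000": 0.03,
}

for code, score in top_k_codes(example_scores):
    print(code, f"{score:.2f}")
```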

This classifier is a series of algorithms that were trained on millions of rows of procurement data in our database. Our model has an F1 score of at least 84%, a precision of 83% and a recall of 88%.

What Do F1, Precision and Recall Mean?

These are industry-standard terms for describing how accurate an algorithm is. The F1 score is the ‘harmonic’ mean, or average, of precision and recall. Precision looks at the predictions the model made: of the labels it assigned, how many were correct. Recall looks at the wider picture: of the labels that should have been assigned, how many the model got right versus the ones it completely missed.
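A minimal sketch of how these three measures relate, using the standard definitions. The counts of true positives, false positives and false negatives are made-up numbers, and the final line treats the precision and recall quoted in this post as single aggregate figures, which is an assumption.

```python
def precision(tp: int, fp: int) -> float:
    """Share of the labels the model assigned that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of the labels that should have been assigned that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 80 correct labels, 20 wrong labels, 10 missed labels.
tp, fp, fn = 80, 20, 10
p = precision(tp, fp)
r = recall(tp, fn)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))

# Harmonic mean of the precision and recall quoted in this post,
# assuming they are single aggregate figures.
print(round(f1_score(0.83, 0.88), 3))
```

The harmonic mean is pulled towards the lower of the two values, so a model cannot reach a high F1 score by doing well on only one of precision or recall.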

The Challenges With The Classification

Developing code to categorise spending and tenders has proven technically challenging, due to a number of complicating factors in the underlying published data and the scale of the problem to be addressed.

Multi-Label Data Challenge. All of the documents published in TED, and many of those published by other public sources around Europe, carry multiple classifications, with no primary classification identified. Many records have more than ten classification references, and some buyers add diverse requirements to a single tender as separate lots. This requires buyers to provide diverse categorisations for the same tender, making it impossible to identify a primary classification for a tender.

So a document with the codes 72541000, 72510000, 72590000 and 30210000 would be classified to 72000000 (IT services: consulting, software development, Internet and support).
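One way to arrive at that result is to roll each code up to its division (the first two digits of the code) and take the division that appears most often. This is a sketch of that idea only: the post does not say how a single division is chosen, so the majority vote and the function names here are assumptions.

```python
from collections import Counter
from typing import Iterable


def division_of(cpv_code: str) -> str:
    """Roll an 8-digit CPV code up to its division (first two digits)."""
    return cpv_code[:2] + "000000"


def most_common_division(codes: Iterable[str]) -> str:
    """Return the division that appears most often among the given codes."""
    counts = Counter(division_of(code) for code in codes)
    if not counts:
        raise ValueError("at least one CPV code is required")
    # On a tie, Counter.most_common keeps the division that was seen first.
    division, _ = counts.most_common(1)[0]
    return division


codes = ["72541000", "72510000", "72590000", "30210000"]
print(most_common_division(codes))  # 72000000
```

Three of the four codes sit in division 72, so the majority vote returns 72000000 and the single 30210000 code is outvoted.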

Inaccurate Classification Challenge. We’ve found that a high number of documents have been incorrectly classified. Mostly this is down to confusion around the correct class to apply, but occasionally there is a clear misapplication of codes, usually around services. We often see consulting described as training or research. This problem is exacerbated by the inconsistent nature of CPV codes, which are poorly optimised for the description of services. ‘Consultancy’ is mentioned six times in the top two layers of the classification, across four different divisions of the vocabulary.

Poor Narrative Data. Some documents have just five words in their combined titles and descriptions, which is barely enough to classify them and makes the task much harder. This issue is particularly prevalent in Spain, where narratives are, on average, half as long as narratives in the UK.

Scale of Classification Challenge. Scale significantly increases the complexity of the analysis we need to do, with thousands of classification options open to the algorithm. The risk of inaccurate classification rises sharply as the number of classes grows.

Addressing These Challenges

With our classifier tool, we tackle each of these problems. Our tool is detailed, accurate, and can be deployed at scale. We can use this on private and public sector tenders, contracts and spend. This tool classifies to CPV but we can adapt this tool to other major classifications.

Using The Classification

→ In our API / source data

We can provide data with enhanced classification through our off-the-shelf API or through a custom API. This can be done for either all or a curated part of our data. For instance, if you are a haulage company, we can give you a customised feed of tenders, contracts and spend data for logistics, infrastructure and construction.

→ We can classify your data, send and return

We can take your dataset, apply classification labels and send it back to you. Whether you want more detail on corporate costs or are a B2C analytics firm looking for better segmentation of consumer spend, we can help.

→ Building a custom classification for you

We can work with existing classifications such as CPV or Proclass, or even make one for you, tailored to your needs. We can create a new classification that fully reflects modern buying trends such as consultancy, and that improves on existing, outdated taxonomies in areas such as digital and software services.

Examples

Where We Improve The CPV Level

TITLE | ORIGINAL CPV | PREDICTED CPV
GP IT Systems | IT services: consulting, software development, Internet and … | Software-related services
Provision of Summative Assessment … | Business and management consultancy and related … | Business and management consultancy services
Re-Roofing of Admin Building, Police HQ Portishead BS20 8QJ | Construction work | Roof works and other special trade construction works
MPV Passenger Transport without Passenger Assistance | Transport services (excl. Waste transport) | Taxi services

 

Adding CPVs Where None Existed

TITLE | PREDICTED CPV
New build mixed-use development of 17 apartments and business hub | Building construction work
PHE – ICT – Application Service Support and Maintenance for contract tracing solution | Software-related services
Citizen & Consumer Protection Officer x4 | Supply services of personnel inc. temporary staff
Provision of Youth and Young Carers Services | Social work services
Evaluation of the Small Business Leadership Programme (SBLP) | Business and management consultancy services

Different CPV For Better Accuracy

TITLE | ORIGINAL CPV | PREDICTED CPV
The Provision and Maintenance of NHS Health Check Software | Health and social work services 85000000 | Software-related services 72260000
C-19 Corporate Finance and Market Intelligence Support… | Business and management consultancy … 79400000 | Financial consultancy… 66170000
CA7688 – Roofing Services Tender | Building services 71315000 | Roof works and other works 45260000
Housing Solutions Officer x11 | Housing services 70333000 | Supply services of personnel inc temporary staff 79620000

Multilingual Example

TITLE | ORIGINAL CPV | PREDICTED CPV
Puhastusteenused | Cleaning services | Cleaning services
Services d’assurance | Insurance services | Insurance services
Villamos energia beszerzése | Electricity | Electricity
Furnizare alimente | Chickens | Meat
Rahmenvereinbarung Lieferung und Montage von Schulmöbeln | School furniture | School furniture
Servicios de reparación y mantenimiento … | Repairs & Maintenance | Repairs and Maintenance

Get in touch if you’d like:

  • classified procurement data;
  • to have your own data classified;
  • or to build your own classifier.