Deduplication Routines

Deduplication is a deceptively complex problem to handle at scale. There’s a simple, ugly and brutal way to do it, where you compare two records and determine if they are exactly the same, but this doesn’t work if the records are very slightly different. A missing comma is enough to render that algorithm useless. Doing deduplication properly means more than just …
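The failure mode is easy to see in a small example. Below is a minimal Python sketch, assuming each record is a plain string. `exact_duplicates`, `similarity` and `fuzzy_duplicates` are hypothetical helpers. The normalization step (lowercasing, collapsing whitespace) and the 0.9 threshold are illustrative assumptions, not tuned values. The similarity score comes from the standard library's `difflib.SequenceMatcher`. That is one possible fuzzy-matching technique, and not necessarily the one this article goes on to describe.

```python
from difflib import SequenceMatcher


def exact_duplicates(a: str, b: str) -> bool:
    # The "simple, ugly and brutal" approach: records match only if identical.
    return a == b


def similarity(a: str, b: str) -> float:
    # Returns a ratio in [0, 1]; 1.0 means the normalized strings are identical.
    # Assumption: lowercasing and collapsing whitespace is enough normalization
    # for this illustration.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def fuzzy_duplicates(a: str, b: str, threshold: float = 0.9) -> bool:
    # Hypothetical cut-off: treat records as duplicates above this similarity.
    return similarity(a, b) >= threshold


rec1 = "Smith, John, 12 High Street, London"
rec2 = "Smith John, 12 High Street, London"  # identical apart from one missing comma

print(exact_duplicates(rec1, rec2))  # False
print(fuzzy_duplicates(rec1, rec2))  # True
```

Here a single missing comma defeats the exact comparison, while the similarity ratio stays close to 1.0. Comparing every pair of records like this is quadratic in the number of records, so on its own it does not address the "at scale" part of the problem.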
