
When manual line by line cleaning is not enough

Martin Magdinier  |  04 October 2014

   ETL

One of the biggest pieces of industry news this month was CrowdFlower raising $12.5 million in funding to support its growth. CrowdFlower is like a souped-up Amazon Mechanical Turk, with a very nice API and a well-thought-out back end for job editors. I couldn't agree more when Mark Sullivan says:

But invariably some data may be placed in the wrong fields, some fields may be left empty, and information might be incomplete or incorrect. You can’t really teach a machine to fix these types of things. You need human beings.

I have used CrowdFlower in the past to parse addresses, and it saved me time. However, CrowdFlower is limited to specific use cases where you can break the job into very small tasks, each requiring little context to make the cleaning decision.

Manual cleaning is often not enough

Sometimes you, the domain expert, need to have a conversation with your data: an iterative process where you discover problems and transform the data easily. RefinePro lets you have this conversation in a single interface. Excel wasn't designed to navigate and clean large datasets, and learning a new programming language would require too much ramp-up time for most of us. With RefinePro's facets and filters, you can go from the high-level picture down to the tiniest details in your data: spot dirty records right away and fix them on the fly.
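To make the idea of a facet more concrete, here is a minimal Python sketch of what a text facet does conceptually: it counts each distinct value in a column so inconsistent variants stand out, and then a set of fixes is applied to every matching record at once. This is only an illustration under stated assumptions, not how RefinePro or OpenRefine is implemented. The sample column, the `text_facet` and `apply_fixes` helpers, and the chosen corrections are all hypothetical.

```python
from collections import Counter

# Hypothetical sample column; in OpenRefine this would be one column of a project.
cities = ["Montreal", "montreal", "Montréal", "Toronto", "Toronto ", "Ottawa"]


def text_facet(values):
    """Count each distinct value, similar to a text facet's value/count list."""
    return Counter(values)


def apply_fixes(values, fixes):
    """Replace every value found in `fixes` with its corrected form."""
    return [fixes.get(v, v) for v in values]


# Step 1: look at the high-level picture. Six distinct values for three cities.
for value, count in text_facet(cities).most_common():
    print(repr(value), count)

# Step 2: the domain expert decides on canonical spellings (hypothetical choices).
fixes = {"Montreal": "Montréal", "montreal": "Montréal", "Toronto ": "Toronto"}
cleaned = apply_fixes(cities, fixes)

# Step 3: re-check the facet; only three distinct values remain.
print(text_facet(cleaned))
```

The point of the sketch is the loop the post describes: look at the summary, decide on a fix, apply it to all matching records, and look again.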

Open Source Self Service Data Preparation Software

This mouthful of a title describes RefinePro well: a solution based on OpenRefine (an open source project) that lets domain experts help themselves in their data projects.

Look at the community using Refine (librarians, researchers, open data enthusiasts, data journalists, and semantic web professionals) and you find that they all have one thing in common: they are not data experts or programmers, but data is taking an increasingly important place in their world, and Refine helps them derive insight from it.

More audiences can benefit from Refine's power now that data has become the fuel of professional systems, from customer relationship management systems and event management platforms to accounting software and reporting and business intelligence tools.

At RefinePro, we believe that self-service data preparation software, which lets non-programmers engage easily with their data, is the future of data transformation and will become mainstream in the coming years. We also believe that the open source model should have a voice in this new market, and that OpenRefine is best placed to be that voice.
