October 2, 2015

OpenRefine

Why OpenRefine?

The recent proliferation of data has made it central to analytics, system migration, and reconciliation processes. However, data cleaning and preparation (which includes normalization, duplicate removal, pivoting, joining, and splitting data) is still a major hurdle in these processes. The tools available to end users haven’t fully caught up with the demand. Spreadsheets offer an entry-level interface to the data but are time-consuming and don’t scale, while programming languages offer flexibility but have a steep learning curve for non-technical users.

OpenRefine addresses the growing data literacy gap by lowering the technical skills needed to normalize and prepare data. It empowers those who understand the context in which the data are generated or used by offering them the best of both worlds: an iterative interface for data discovery and preparation, and an easy-to-learn scripting language.

At RefinePro, we believe OpenRefine can be the WordPress of the data processing world. Over the past 15 years, WordPress has democratized website building, shifting the web industry from developers hand-crafting pages to website assemblers and editors who write content and build sites by extending a core platform with plugins and extensions in a What You See Is What You Get interface. OpenRefine is the platform that will power the same shift in the data industry: from developers writing custom code for data cleaning and processing to builders who assemble data processes from connectors in a point-and-click interface.

Thanks to OpenRefine, subject matter experts with an in-depth knowledge of a specific issue or domain can:

  • explore data related to the topic, drilling down to get a sense of the information available and to find nuggets of information, inconsistencies, or quality gaps;
  • clean the data and export it to a format that suits their needs by normalizing values, removing duplicates and typos, pivoting, joining, and splitting columns; and
  • enrich the project by joining data sets together, processing data via an API, or working with a reconciliation service.
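
To make the cleaning steps in the list above concrete, here is a minimal sketch in plain Python of three of them: normalizing values, splitting a column, and removing duplicates. It is not OpenRefine code and does not use any OpenRefine API; in OpenRefine these steps are done through the point-and-click interface and its scripting language. The function names, the column names ("name", "location"), and the sample rows are all hypothetical and chosen only for illustration.

```python
def normalize(value: str) -> str:
    """Trim, collapse repeated whitespace, and title-case a cell value."""
    return " ".join(value.split()).title()


def clean_rows(rows):
    """Normalize names, split 'location' into city/country, drop duplicates.

    Assumes each row is a dict with hypothetical 'name' and 'location' keys,
    where 'location' looks like 'city, country'.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Split the combined column on the first comma.
        city, _, country = row["location"].partition(",")
        record = {
            "name": normalize(row["name"]),
            "city": normalize(city),
            "country": normalize(country),
        }
        # Rows that are identical after normalization count as duplicates.
        key = (record["name"], record["city"], record["country"])
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"name": "  ada   lovelace", "location": "london, england"},
        {"name": "Ada Lovelace ", "location": "London,England"},
        {"name": "grace hopper", "location": "new york, united states"},
    ]
    for record in clean_rows(sample):
        print(record)
```

The first two sample rows differ only in spacing and capitalization, so they collapse into a single record once normalized. This is the kind of inconsistency a subject matter expert would otherwise have to spot and fix by hand in a spreadsheet.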

 

Project Website          Github Project