OpenRefine

The recent proliferation of data has put it at the center of analytics, system migration, and reconciliation processes. However, data cleaning and preparation (which includes normalization, duplicate removal, pivoting, joining, and splitting data) is still a major hurdle. Tools available for end users haven’t fully caught up with the demand. Spreadsheets offer an entry-level interface to the data but are time-consuming and don’t scale, while programming languages offer flexibility but have a steep learning curve for non-technical users.

OpenRefine addresses the growing data literacy gap by lowering the technical skills needed to normalize and prepare data. OpenRefine empowers those who understand the context in which the data are generated or used by offering them the best of both worlds with an iterative interface for data discovery and preparation and an easy-to-learn scripting language.

Thanks to OpenRefine, subject matter experts with an in-depth knowledge of a specific issue or domain can:

  • explore data related to the topic; drill down to get a sense of the information available and find nuggets of information, inconsistencies, and quality gaps;
  • clean and export the data to a format useful for their needs by doing data normalization, removing duplicates and typos, pivoting, joining and splitting columns; and
  • enrich the project by joining data sets together, processing data via an API, or working with a reconciliation service.

OpenRefine is a powerful point-and-click web interface for data normalization and preparation. RefinePro offers training, hosting, and integration services.

 


 

OpenRefine Features


Extensive Input & Output Support

Start your project from Excel, CSV, JSON, XML, or any text-based file, with any encoding. Once your project is completed, Refine lets you customize your file export, encoding, and line breaks, or create hierarchical RDF, JSON, or XML templates.

Clustering & Deduplication

Identify and remove duplicates with ease thanks to a set of powerful auto-suggestion algorithms. Refine’s embedded search and replace function streamlines data normalization.
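To give a feel for how this kind of auto-suggestion can work, here is a minimal Python sketch of key-collision clustering with a "fingerprint" key. It is an illustration only, not Refine's actual code: the function names and the sample values are assumptions made up for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical strings collide."""
    # Trim surrounding whitespace and ignore case.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, sort them, and re-join.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample values.
names = ["Acme Inc.", "acme inc", "Inc. Acme", "Café Bleu", "Cafe Bleu ", "Globex"]
clusters = cluster(names)
```

Each cluster is only a suggestion: a person who knows the data still decides whether the grouped values really are the same thing and which spelling to keep.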

Filter & Sort

Drill down through large data sets and see your data from a new perspective. Use Refine to build custom facets to list all values in a field, create numeric and timeline facets, or filter blank rows or duplicates with a single click.
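Conceptually, a text facet is a count of the distinct values in a field, and selecting one of those values filters the rows. The Python sketch below illustrates that idea under simple assumptions; the rows, column names, and helper functions are hypothetical and not part of Refine.

```python
from collections import Counter

# Hypothetical sample rows; the column names are made up for illustration.
rows = [
    {"city": "Paris", "status": "open"},
    {"city": "paris", "status": "closed"},
    {"city": "Lyon", "status": "open"},
    {"city": "", "status": "open"},
    {"city": "Paris", "status": "open"},
]


def text_facet(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row.get(column) or "(blank)" for row in rows)


def select(rows, column, value):
    """Keep only the rows whose column matches the chosen facet value."""
    return [row for row in rows if (row.get(column) or "(blank)") == value]


facet = text_facet(rows, "city")
blank_rows = select(rows, "city", "(blank)")
```

Listing the counts side by side makes inconsistencies such as "Paris" versus "paris" easy to spot before deciding how to normalize them.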

 

Join, Merge & Reconcile

Join and merge different data sets to build new views and insights. Create lookups and reconcile against master data sources. Easily concatenate multiple fields together.
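The idea behind a lookup join and a field concatenation can be sketched in a few lines of Python. This is a simplified illustration, not how Refine implements it; the two data sets, the `customer_id` key, and the helper names are all assumptions.

```python
# Hypothetical data sets sharing a "customer_id" key.
orders = [
    {"order_id": "A1", "customer_id": "c1"},
    {"order_id": "A2", "customer_id": "c2"},
    {"order_id": "A3", "customer_id": "c9"},  # no matching customer
]
customers = [
    {"customer_id": "c1", "first": "Ada", "last": "Lovelace"},
    {"customer_id": "c2", "first": "Alan", "last": "Turing"},
]


def join_on(left, right, key):
    """Copy the matching right-hand columns onto each left-hand row."""
    index = {row[key]: row for row in right}  # lookup table keyed on `key`
    return [{**index.get(row[key], {}), **row} for row in left]


def add_full_name(rows):
    """Concatenate the 'first' and 'last' fields, skipping missing parts."""
    for row in rows:
        parts = [row.get("first", ""), row.get("last", "")]
        row["full_name"] = " ".join(part for part in parts if part)
    return rows


result = add_full_name(join_on(orders, customers, "customer_id"))
```

Rows without a match (order `A3` here) keep their own columns and get an empty `full_name`, which makes gaps in the master data easy to find.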

Split

Split a field into multiple columns based on any character(s). Create new rows from multi-value cells.
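Both kinds of split can be sketched in Python as follows. The helper names and sample records are hypothetical, and the row split is simplified: this sketch repeats the other columns on every new row, which may differ from how Refine lays out the resulting rows.

```python
def split_into_columns(row, column, separator, new_names):
    """Split one field into several new columns, keeping the original."""
    parts = row[column].split(separator)
    out = dict(row)
    for name, part in zip(new_names, parts):
        out[name] = part.strip()
    return out


def split_into_rows(rows, column, separator):
    """Turn each value of a multi-value cell into its own row."""
    out = []
    for row in rows:
        for part in row[column].split(separator):
            out.append({**row, column: part.strip()})
    return out


# Hypothetical sample records.
person = {"name": "Lovelace, Ada"}
split_person = split_into_columns(person, "name", ",", ["last", "first"])

books = [{"title": "Refine Basics", "tags": "data; cleaning; csv"}]
tag_rows = split_into_rows(books, "tags", ";")
```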

Undo / Redo

Do your data cleaning in a safe environment where you can undo any change. Review and audit your transformation history. Once your project is completed, you can save your steps and reapply them next time.


Transpose

Pivot columns into rows or transpose rows into columns with just a few clicks.
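As an illustration of what these two operations do to the shape of a table, here is a small Python sketch. The function names and the sample table are assumptions for this example and do not come from Refine.

```python
def columns_to_rows(rows, id_column, value_columns, key_name, value_name):
    """Turn several columns into (key, value) rows: wide to long."""
    out = []
    for row in rows:
        for column in value_columns:
            out.append({
                id_column: row[id_column],
                key_name: column,
                value_name: row[column],
            })
    return out


def rows_to_columns(long_rows, id_column, key_name, value_name):
    """Rebuild one record per id from (key, value) rows: long to wide."""
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row[id_column], {id_column: row[id_column]})
        record[row[key_name]] = row[value_name]
    return list(wide.values())


# Hypothetical table with one column per year.
wide_table = [
    {"country": "FR", "2019": 10, "2020": 12},
    {"country": "DE", "2019": 7, "2020": 9},
]
long_table = columns_to_rows(wide_table, "country", ["2019", "2020"], "year", "value")
round_trip = rows_to_columns(long_table, "country", "year", "value")
```

The long form (one row per country and year) is usually easier to filter and join, while the wide form is easier to read in a report.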

Custom Query Language

The General Refine Expression Language (GREL) is flexible and powerful, yet simple enough for creating custom filters and transformation expressions. Preview your changes in real time before committing them.

Fetch Web Pages & Work with APIs

Call web services and fetch web pages from within Refine. Extend your dataset with your favorite API or machine intelligence services with limited coding knowledge.