Online OpenRefine Foundation Course Now Available

Learn the basics of data science with the new OpenRefine Foundation course, available on Big Data University. The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and […]

Agile Data Process

When laying the foundation for the School of Data program at Open Knowledge, Stefan Urbanek presented the following data processing pipeline: data discovery and acquisition; data extraction, cleansing, transformation, and integration; and then analytical modeling, presentation, analysis, and publishing. For anyone working with data on a regular basis, […]

Some Thoughts on the OpenRefine and Akka Integration

Following my article on enabling parallel processing for OpenRefine: Spark vs Akka, I drafted a road map to integrate OpenRefine with Akka.

Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. In this article I explore the possibility of integrating Akka with OpenRefine to enhance its data processing capability.


Enabling parallel processing for OpenRefine: Spark vs Akka

Andrey from SpazioDati developed Refine on Spark in an attempt to process larger datasets, which is good. However, it fell short in some areas, and I wanted to benchmark it against another parallelism engine such as Akka.

Spark supports Akka in its core module, and Spark and Akka can interact with each other. Akka also provides a Spark template. Still, it makes more sense to choose only one. For enabling parallel processing in OpenRefine, each has its pros and cons (IMO). See also a proposed road map to integrate OpenRefine with Akka.


A Vision for OpenRefine

Over the last five years, OpenRefine has built a robust platform, to which many developers have contributed plugins and extensions useful for their own audiences. That list of plugins and reconciliation services grows month after month, demonstrating that the community is active and thriving, with a healthy and expanding user base.

The extensions around the OpenRefine core can be divided into three categories: