Some thoughts on the OpenRefine and Akka Integration

Following my article on enabling parallel processing for OpenRefine: Spark vs Akka, I drafted a road map to integrate OpenRefine with Akka.

Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. In this article I explore the possibility of integrating Akka into OpenRefine to enhance its data processing capability.

[…]

Enabling parallel processing for OpenRefine: Spark vs Akka

Andrey from SpazioDati developed Refine on Spark in an attempt to process larger datasets, which is good. However, it fell short in some areas, and I wanted to benchmark it against another parallelism engine like Akka.

Spark supports Akka in its core module, and the two can interact with each other. Akka also provides a Spark template. But it makes more sense to choose only one. For enabling parallel processing in OpenRefine, each has its pros and cons (IMO). See also a proposed road map to integrate OpenRefine with Akka.

[…]