In September 2015, we submitted the following application to the Knight News Challenge: How might we make data work for individuals and communities? We are cross-posting it here for archival purposes. You can also consult it directly on the Knight Challenge website.
Learn the basics of data science with the new OpenRefine Foundation course available on Big Data University. The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and […]
When laying the foundation for the School of Data program at Open Knowledge, Stefan Urbanek presented the following data processing pipeline: from data discovery and acquisition, through data extraction, cleansing, transformation and integration, to analytical modeling, presentation, analysis and publishing. For anyone working with data on a regular basis, […]
Following my article on enabling parallel processing for OpenRefine: Spark vs Akka, I drafted a road map to integrate OpenRefine with Akka.
Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. In this article I will explore the possibility of integrating Akka into OpenRefine to enhance its data processing capability.
Andrey from SpazioDati developed Refine on Spark in an attempt to process larger datasets, which is good. However, it fell short in some areas, and I wanted to benchmark it against another parallelism engine such as Akka.
Spark uses Akka in its core module, and Spark and Akka can interact with each other. Akka provides the Spark template. But it makes more sense to choose only one. For enabling parallel processing in OpenRefine, each has its pros and cons (IMO). See also a proposed road map to integrate OpenRefine with Akka.
Over the last five years, OpenRefine has built a robust platform, to which many developers have contributed plugins and extensions useful for their own audiences. That list of plugins and reconciliation services grows month after month, demonstrating that the community is active and thriving, with a healthy and expanding user base.
The extensions around the OpenRefine core can be divided into three categories:
One of the big news items in the industry this month was CrowdFlower raising $12.5 million in funding to support its growth. CrowdFlower is like a souped-up Amazon Mechanical Turk with a very nice API and a well-thought-out back end for job editors. I couldn’t agree more when Mark Sullivan says: