In our effort to engage with the local OpenRefine community and foster peer-to-peer connections, the latest Toronto OpenRefine Meet-Up (on November 17) focused on actual use cases, presented by OpenRefine users from various backgrounds. The goal was to demonstrate how OpenRefine fits into specific contexts and data environments, rather than drilling down on specific functionality. This article provides a summary of the night and highlights of each talk.
Martin Magdinier, RefinePro CEO
Martin Magdinier, RefinePro CEO, opened the night with a high-level overview of the journey most users face: they start by understanding the data they process, which is a business skill, and then hone their technical skills so they can transform that data into a useful format.
Data cleaning is a necessary step to make information reliable. It is estimated that between 60 and 80% of a data project is spent on cleaning and transforming the data, because data comes in all shapes and forms, often with inconsistencies such as spelling errors in a person’s name or city, general typos, encoding errors, etc.
Martin talked about how users explore, normalize and enrich their data with OpenRefine. When the data that is collected contains errors, OpenRefine helps to clean it up so that the information can be used in the project. For this to happen, users need both a good understanding of the actual data and the context in which it is being used, as well as an understanding of OpenRefine.
As information is compiled there are a number of questions to be answered:
- what format should it follow?
- what are the related factors?
- what fields are most critical?
These are important questions, as they will define what is considered “clean data”.
Once the data is cleaned and the output is relevant, users might want to enhance it with data services like address geocoding or sentiment analysis. Most of these services are accessed through an application programming interface (API). The API-first approach requires technical skills to take full advantage of it. Programs such as OpenRefine help the user connect with the APIs and understand the data better, so that smarter business decisions can be made, faster. This is one of the key benefits that OpenRefine has over Excel: it can directly (natively) connect to services and keep pace with the volume of requests.
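To give a rough idea of what this kind of API connection involves, OpenRefine's "Add column by fetching URLs" feature builds one request URL per row and fetches the results for you. The sketch below shows the per-row URL-building step in Python; the Nominatim geocoding endpoint and the sample addresses are illustrative assumptions, not from the talk.

```python
from urllib.parse import quote

def geocode_url(address):
    """Build one geocoding request URL per row, the way OpenRefine's
    'Add column by fetching URLs' does with a GREL expression.
    (Nominatim is just an illustrative choice of geocoding service.)"""
    return ("https://nominatim.openstreetmap.org/search?format=json&q="
            + quote(address))

# Each value in an address column becomes one API request URL.
rows = ["100 Queen St W, Toronto", "250 Front St W, Toronto"]
urls = [geocode_url(a) for a in rows]
```

OpenRefine then fetches each URL and stores the JSON response in a new column, where further GREL expressions can pull out the fields you need, such as latitude and longitude.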
There are a number of services out there that can help both technical and business users scale OpenRefine. RefinePro provides training, hosting, and integration services to help users get the best results from their data.
Bianca Wylie from ODI
Bianca Wylie has a great deal of knowledge to share and is very passionate about open data within the tech community. The open data movement is seeing more and more data made available by organizations, businesses, and individuals for anyone to access, use, and share. Her presentation provided insight into how OpenRefine supports the use of open data in Toronto.
Curious everyday users may have a general idea about an available block of data, but they need to be able to dive deeper to really understand it. There are a number of ways to go about this, and the City of Toronto is slowly embracing the change.
In her presentation she noted that it is very often government employees themselves who request access to more open data so that they can do their jobs more effectively, and she showed this by analysing the data on Freedom of Information requests.
Her presentation also discussed the need for transparency in open data: anyone who currently requests this information has to go through a lengthy and challenging process. She also discussed the need to improve the quality of the data given in response to those requests. She then demonstrated how OpenRefine makes working with the questionable output much easier and helps the person using it understand the content better, with basic features such as facets and text filters.
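A text facet is, at its core, a count of the distinct values in a column, which instantly surfaces spelling variants and typos. The minimal Python sketch below illustrates the idea; the sample city values are invented for illustration, not taken from her data set.

```python
from collections import Counter

# Invented sample column with inconsistent spellings, the kind of
# output a Freedom of Information response might contain.
cities = ["Toronto", "toronto", "Tornto", "Toronto", "TORONTO"]

# A text facet groups and counts each distinct value as-is.
facet = Counter(cities)

# Faceting on a normalized form (like GREL's value.toLowerCase())
# collapses pure casing variants, leaving real typos visible.
normalized = Counter(c.lower() for c in cities)
```

In OpenRefine the facet appears as a clickable sidebar list, so spotting that "Tornto" occurs once while "Toronto" dominates takes seconds rather than a manual scan of the whole column.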
Heerbod Etemadi – Data Consultant
Heerbod Etemadi’s presentation covered the various projects he has worked on around Toronto. As a consultant he uses OpenRefine regularly to help him understand huge data sets that Excel can’t handle, and he gave plenty of detail on how OpenRefine has made his work and his projects easier. Many of his contracts come from the government and involve large volumes of information, which OpenRefine makes much easier to handle and present.
One example involved code he had written to accompany the data sets he had processed through OpenRefine, using certain algorithms to show a company where it could cut costs and save money. Given the large amount of data they had to work with, OpenRefine worked really well by making it much easier to analyze.
Once the collected information had been run through OpenRefine, it could be easily interpreted. For him, OpenRefine’s “killer feature” is the ability to clean data sets quickly and get the best results; the way the information was presented was much easier to work with and understand.
Overall this was a very successful meet-up, with many different people coming together to share ideas. The conversations about OpenRefine and how it is used day to day, with real-world examples, were great.