OpenRefine is a free, open-source tool for working with messy data. Its intuitive interface for data discovery and preparation empowers those who understand the context in which the data are generated to explore and normalize them.
RefinePro uses OpenRefine to prototype complex data normalization projects and identify future challenges before the project starts. RefinePro also uses OpenRefine for data profiling and non-repeatable data cleaning projects, including one-time migrations.
Clients have reached out to us to improve data integration projects initially built with OpenRefine. OpenRefine projects can be automated and scheduled using the RefinePro Platform.
RefinePro uses Docparser and its API to integrate the PDF extraction task directly into our workflow. When a file format changes, we use the Docparser user interface to quickly update a parser's settings without any coding. For irregular PDF files, we use custom libraries to extract the information needed.
Content Grabber is a powerful web scraping tool developed by Sequentum. Its point-and-click interface allows developers to write consistent, non-breaking scrapers with limited coding skills. Content Grabber is deployed on-premises or on RefinePro infrastructure, so customers keep control over their data and infrastructure.
Visual Web Ripper
Visual Web Ripper (VWR) is the predecessor of Content Grabber. Sequentum stopped maintaining the software in 2018.
RefinePro helps organizations scale existing Visual Web Ripper projects using its platform, or migrate existing scrapers to another solution.
ParseHub is hosted web scraping software. It comes with a powerful API to schedule scrapers and retrieve data. Scrapers run on ParseHub's infrastructure, which reduces the effort needed to maintain web scraping servers and proxy networks.
RefinePro is an official ParseHub partner. Contact us for more information.
Talend Open Studio
Talend Open Studio is the industry leader in open integration solutions and democratizes application integration by providing open-source solutions to address any integration challenge, from simple departmental projects to complex, heterogeneous IT environments. Talend’s open-source products and open architecture provide the flexibility to solve integration challenges.
RefinePro has developed custom components and routines to build custom, cost-effective ETL jobs from open-source components. Talend Open Studio jobs can be scheduled on the RefinePro Platform.
Alteryx is Windows-based software for data cleaning and advanced analytics.
Alteryx projects can be scheduled on the RefinePro Platform without the need to purchase the Alteryx Server edition.
Luminati is the world’s largest business proxy network. Thousands of corporations, including RefinePro, use Luminati’s residential proxy network: large online retailers to collect comparative pricing information, top websites to test their sites from any city in the world, the largest ad networks to ensure the ads they deliver are safe and compliant, and cybersecurity firms to ensure that sites are not malicious.
Captchas are images containing distorted text that you must type in, or sets of images from which you must select only those matching some condition. Solving them confirms that you are not a robot. 2Captcha helps businesses that need to recognize many captchas in real time.