Trust in data quality and integrity is key to the success of any data initiative. From record additions and edits to data integration, every operation on a dataset risks corrupting data. Data profiling, cleaning, and validation are the three pillars for building confidence in data.
RefinePro guides organizations through the entire data quality process.
Without well-defined goals, data cleaning can be an endless task. Data quality is a subjective topic, as expectations vary from one business to another. For example, within the same organization, different departments may have competing definitions of a clean address. Data profiling is the opportunity to understand business needs, assess the current quality of the data, and identify any gaps. Based on that information, RefinePro proposes a data quality strategy and an implementation plan tailored to the client's budget.
Before starting the actual cleansing, RefinePro recommends automating data quality enforcement with a script. This allows all parties to agree on a common standard before the project begins. It also ensures that developers clean the data accordingly, as they can continuously check their results against the defined rules. RefinePro has identified four types of data validation rules:
Schema and format compliance
Validation against custom business rules
Validation against a master dataset
Data sampling with an acceptance threshold
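The four rule types above can be sketched as simple check functions. This is a minimal illustration, not RefinePro's actual tooling; the field names, the master country list, and the sampling logic are hypothetical:

```python
import re

def check_format(record):
    """Schema/format compliance: the (assumed) email field must match a basic pattern."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email", "")) is not None

def check_business_rule(record):
    """Custom business rule (illustrative): discount may not exceed 20% of the price."""
    return record.get("discount", 0) <= 0.2 * record.get("price", 0)

MASTER_COUNTRIES = {"US", "CA", "FR"}  # stand-in for a real master dataset

def check_master(record):
    """Validation against a master dataset: country code must be a known value."""
    return record.get("country") in MASTER_COUNTRIES

def sample_error_rate(records, checks, sample_size=100):
    """Data sampling: fraction of sampled records failing any check,
    to be compared against an agreed acceptance threshold."""
    sample = records[:sample_size]
    failures = sum(1 for r in sample if not all(check(r) for check in checks))
    return failures / len(sample) if sample else 0.0
```

Expressing the rules as code in this way is what lets all parties agree on a standard up front and lets developers re-run the checks continuously.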
RefinePro has extensive experience working with dirty data and can perform the following normalization steps:
| Step | Description |
| --- | --- |
| Field Mapping | Pivot, transform, split, or merge fields from multiple sources to match different application and file schemas. |
| Field Conversion | Standardize and change field types and formats between date, number, text, or choice list. |
| Duplicate Removal | RefinePro leverages multiple techniques, including clustering and entity resolution, to detect and merge duplicate records. |
| Missing Data | Missing data may be derived from other values, retrieved from a third-party source, or filtered out. |
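The duplicate-removal step often relies on clustering values whose normalized "fingerprints" collide (a technique popularized by key-collision clustering in tools such as OpenRefine). The sketch below is illustrative only; real entity resolution involves many more techniques:

```python
from collections import defaultdict
import unicodedata

def fingerprint(value):
    """Key-collision fingerprint: strip accents and punctuation, lowercase,
    then sort unique tokens so case, order, and punctuation differences collide."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in ascii_text.lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster_duplicates(values):
    """Group values by fingerprint; clusters larger than one are candidate
    duplicates to review and merge."""
    clusters = defaultdict(list)
    for value in values:
        clusters[fingerprint(value)].append(value)
    return [group for group in clusters.values() if len(group) > 1]
```

For example, "Acme Inc.", "acme inc", and "Inc. Acme" all share the fingerprint "acme inc" and would be flagged as one cluster for merging.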
Ongoing Validation and Monitoring
RefinePro suggests implementing data validation steps early in the data flow to isolate bad records before they propagate to other systems. Using the rules defined previously, the data validation script raises alerts when erroneous records are identified or when the error rate exceeds a threshold. For more sensitive workflows, RefinePro implements a circuit breaker to stop the process and prevent poor data from corrupting downstream systems. RefinePro also provides managed services to monitor alerts and update the data cleaning scripts accordingly.
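The alert-then-halt behavior described above can be sketched as a small circuit breaker that tracks a running error rate. The class name, threshold, and minimum-sample parameter are illustrative assumptions, not part of RefinePro's actual implementation:

```python
class DataQualityCircuitBreaker:
    """Halts the pipeline when the running error rate exceeds a threshold
    (hypothetical sketch; thresholds would come from the agreed validation rules)."""

    def __init__(self, threshold=0.05, min_records=100):
        self.threshold = threshold      # maximum tolerated error rate
        self.min_records = min_records  # avoid tripping on tiny samples
        self.seen = 0
        self.errors = 0
        self.open = False               # an "open" breaker means flow is halted

    def record(self, is_valid):
        """Register one record's validation result; return False once the
        breaker opens, signalling the pipeline to stop processing."""
        self.seen += 1
        if not is_valid:
            self.errors += 1
        if self.seen >= self.min_records and self.errors / self.seen > self.threshold:
            self.open = True
        return not self.open
```

Individual failures would still raise alerts as they occur; the breaker only trips once the overall error rate crosses the threshold, protecting downstream systems from a systemic data problem.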
Data Cleansing Toolbox
RefinePro adapts its tooling to each project's type and complexity.