Technology / March 29, 2022

Effect of Bad Data and Data Cleaning using Power Query

Information derived from data is the backbone of any business that wants to make independent, fact-based decisions and use data to drive growth and profitability. Data quality plays a vital role in making the right decision. The most significant obstacle for any organization trying to make the correct decision is the accessibility and nature of its data. Typically, a company stores its data either in a data warehouse or in the cloud. Each business process produces a large amount of data, and specific requirements are met based on the insights drawn from it. Ideally, the underlying dataset offers many possibilities for creating insights, but most system-generated data is not clean, which makes turning it into a competitive advantage difficult.

Data Cleaning

Data cleaning is widely viewed as the most troublesome, tedious, and costly task in the BI world. Specialists estimate that 60-80% of the total time in a typical analytics project is spent on data cleaning. These projects often involve incomplete datasets derived from complex systems, underlying structural inconsistencies in the business, and sometimes a skillset barrier. In a Harvard Business Review article, Thomas Redman put the cost of bad data to the US at around $3 trillion per year.

(Fact Source: https://www.sigmamagic.com/blogs/data-cleaning-best-practices/)

Fundamental Problems in Data

Bad data is data that is inconsistent, does not meet its primary constraints, or uses the wrong unit of measure. It typically shows up as one of the following types of problems:

Missing Data: This is a problem when the system generating the data is not aligned with the insights that must be produced. For example, a legacy system may capture data through a form in which some fields are optional; if insights are later needed on those business functions, the missing values become a problem.

Insufficient Data: This occurs when the system that captures the data was not designed to collect the information the analysis needs, perhaps because the data was collected for some other purpose. Basing a business decision on partial information is risky and can have a negative impact.

Bad Structure: This covers data with duplicate records, outdated information, or poor quality, including datasets that are not BI friendly. For example, a system may create a dataset with wrong data types for some features, or with samples that repeat or are even shifted horizontally or vertically, which ruins the overall analysis.

Incorrect Data: This is one of the most common issues in a dataset and an essential part of cleaning. These errors are hard to identify because they may be manual entry mistakes or errors introduced by the system or instrument that generated the data. Overall, the correctness of the data decides the accuracy of the analysis derived from it.
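
As a rough illustration of how these problem types can be spotted before any cleaning starts, the sketch below profiles a small table for missing values, duplicate records, and values that do not parse as numbers. It uses plain Python rather than Power Query, and the sample rows, the field names, and the `profile` helper are all hypothetical, invented only for this example.

```python
from collections import Counter

# Hypothetical sample rows; None marks a field the source system left blank.
rows = [
    {"order_id": "1001", "region": "East", "amount": "250.00"},
    {"order_id": "1002", "region": None, "amount": "99.50"},
    {"order_id": "1002", "region": None, "amount": "99.50"},  # duplicate record
    {"order_id": "1003", "region": "West", "amount": "n/a"},  # not a number
]


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def profile(rows, numeric_fields):
    """Count missing values, duplicate rows, and non-numeric values."""
    missing = Counter()
    bad_numbers = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                missing[field] += 1
            elif field in numeric_fields and not is_number(value):
                bad_numbers[field] += 1

    # Two rows are duplicates when every field/value pair matches.
    seen = Counter(
        tuple(sorted(row.items(), key=lambda pair: pair[0])) for row in rows
    )
    duplicates = sum(count - 1 for count in seen.values())

    return {
        "missing": dict(missing),
        "duplicates": duplicates,
        "bad_numbers": dict(bad_numbers),
    }


report = profile(rows, numeric_fields=["amount"])
print(report)
```

A report like this only flags structural symptoms. Incorrect values that look valid, such as a wrong but well-formed amount, still need business rules or a manual review to catch.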

[Figure: Effect of Bad Data and Data Cleaning using Power Query]

The figure above is an example of bad data. The data is informative and easy for a person to read, but the goal is not human readability; the data must be BI friendly. One basic cleaning step needed here is removing the top few rows, as those rows do not contain any analytics samples.

We also need to separate the data into columns based on category and subcategory, and remove all other columns that are blank or contain no information.
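
To make these two basic steps concrete, here is a minimal sketch in plain Python, not Power Query, of removing the top rows and dropping blank columns. The sample table, the number of rows to skip, and the helper names are assumptions made for illustration; they are not taken from the dataset in the figure.

```python
def is_blank(cell):
    """Treat None and whitespace-only text as blank."""
    return cell is None or str(cell).strip() == ""


def drop_top_rows(table, count):
    """Remove the first `count` rows, e.g. report titles above the header."""
    return table[count:]


def drop_blank_columns(table):
    """Remove every column in which all cells are blank."""
    if not table:
        return []
    width = max(len(row) for row in table)
    # Pad short rows so every row has the same number of cells.
    padded = [row + [None] * (width - len(row)) for row in table]
    keep = [
        i for i in range(width)
        if any(not is_blank(row[i]) for row in padded)
    ]
    return [[row[i] for i in keep] for row in padded]


# Hypothetical raw export: two title rows, then a header and data rows.
raw = [
    ["Sales Report", None, None, None],
    ["Exported by system", None, None, None],
    ["Item", None, "Q1", "Q2"],
    ["Furniture", None, None, None],
    ["Chairs", None, "120", "135"],
]

cleaned = drop_blank_columns(drop_top_rows(raw, 2))
for row in cleaned:
    print(row)
```

The order matters in this sketch: the title rows are removed first, so that their text does not keep an otherwise empty column alive.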

Power Query as a Data Cleaning Tool

Power Query is a powerful data connection, cleaning, and shaping technology that is a core part of Microsoft's analytics and Business Intelligence tools. With Power Query, data cleaning is automated, which frees up time for analysis and for delivering solutions with business impact. Analytics is only correct when the data is accurate, so Power Query can be used to extract data from various sources, transform it to suit the business function, and load it to the required destination.

To give an example of Power Query and its capabilities, I will show some of the applied steps used on the dataset in the figure above to improve data quality for reporting and insights.

[Figure: Power Query as a Data Cleaning Tool]

The first few steps are basic: connecting to the data, setting the source location, and defining the headers for the dataset. The existing columns are given proper names based on the features they hold. Beyond these initial steps, the most important start to the cleaning process is defining the data types properly. The Category Column and Subcategory Column steps apply basic logic to identify the category rows and subcategory rows. The logic creates a differentiation among the rows, which can be based on any criteria that match your definition. In this scenario, these two steps were crucial to the transformation, because creating dynamic logic was very difficult.
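
The article does not spell out the exact rule used in the Category Column and Subcategory Column steps, so the following is only one possible sketch, written in plain Python. It assumes that a category row is a row whose value cells are all empty, and that every following row with values is a subcategory of it. The sample rows and the `tag_categories` helper are hypothetical.

```python
def tag_categories(rows):
    """Attach the most recent category label to each subcategory row.

    Each input row is [label, value, value, ...]. A row whose values are
    all None is assumed to be a category header (an illustrative rule,
    not necessarily the one used in the original report).
    """
    tagged = []
    current_category = None
    for label, *values in rows:
        if all(value is None for value in values):
            # Category header: remember it and do not emit a data row.
            current_category = label
            continue
        tagged.append(
            {"Category": current_category, "Subcategory": label, "Values": values}
        )
    return tagged


# Hypothetical rows after the headers have been set.
rows = [
    ["Furniture", None, None],
    ["Chairs", 120, 135],
    ["Tables", 80, 95],
    ["Technology", None, None],
    ["Phones", 300, 310],
]

tagged = tag_categories(rows)
for row in tagged:
    print(row)
```

In Power Query, a similar result is often built with a conditional column followed by a fill down, but any rule that reliably separates the two kinds of rows would serve.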

If you look at the dataset in Excel, you will see that rows are differentiated by indentation, which differs between category and subcategory rows. When we import the dataset into Power BI, however, the indentation is lost: the information is the same, but the overall structure changes. This is a good example of why Power Query is necessary, as all the cleaning and transformation has to be done once we are already in the analytics tool.

The next steps are row filtering, which removes redundant information. Filtering is an important step in the transformation because it directly removes unwanted samples from the dataset.

The last and one of the most important steps is the Unpivot function. Columns whose headers are really sample values should not be kept as features, so they are turned into rows. This increases the number of rows in the dataset, but the resulting format is very friendly for BI and for developing business analytics solutions.
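
The filtering and unpivot steps can be sketched in plain Python as follows. This is an illustration of the idea rather than the Power Query implementation; the sample records, the "Total" row used as redundant information, and the `unpivot` helper are assumptions. The output column names "Attribute" and "Value" mirror the defaults Power Query uses after an unpivot.

```python
def unpivot(records, id_fields, var_name="Attribute", value_name="Value"):
    """Turn every non-id column into its own row (wide to long)."""
    long_rows = []
    for record in records:
        ids = {field: record[field] for field in id_fields}
        for field, value in record.items():
            if field in id_fields:
                continue
            long_rows.append({**ids, var_name: field, value_name: value})
    return long_rows


# Hypothetical wide table: one column per quarter.
wide = [
    {"Category": "Furniture", "Subcategory": "Chairs", "Q1": 120, "Q2": 135},
    {"Category": "Furniture", "Subcategory": "Total", "Q1": 120, "Q2": 135},
    {"Category": "Technology", "Subcategory": "Phones", "Q1": 300, "Q2": 310},
]

# Row filtering: drop subtotal rows, which only repeat existing information.
filtered = [row for row in wide if row["Subcategory"] != "Total"]

long_rows = unpivot(filtered, id_fields=["Category", "Subcategory"])
for row in long_rows:
    print(row)
```

Two wide rows become four long rows here, which shows why the dataset grows after unpivoting while becoming easier to slice by quarter.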

Power Query records all the steps applied to the raw data. Any step can also be changed later as needed. This simplifies the whole data cleaning process and automates it, so the same cleaning can be rerun whenever a new extract of the same raw data arrives.
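
The idea of recorded steps that are replayed on fresh data can be sketched as an ordered list of named transformations. The example below is plain Python and only mimics the concept; the step names, the two sample extracts, and the `apply_steps` helper are hypothetical.

```python
def apply_steps(table, steps):
    """Run each recorded step in order and return the final table."""
    for _name, step in steps:
        table = step(table)
    return table


# Each entry is (step name, function); the names are illustrative only.
steps = [
    ("Removed Top Rows", lambda t: t[1:]),
    (
        "Removed Blank Rows",
        lambda t: [row for row in t if any(c not in (None, "") for c in row)],
    ),
    ("Changed Type", lambda t: [[row[0], int(row[1])] for row in t]),
]

# Two hypothetical extracts with the same layout, from different days.
monday = [["Exported by system", None], ["Chairs", "120"], [None, None], ["Tables", "80"]]
tuesday = [["Exported by system", None], ["Chairs", "130"], ["Tables", "85"], [None, None]]

print(apply_steps(monday, steps))
print(apply_steps(tuesday, steps))
```

Because the steps are data rather than one-off manual edits, changing or reordering one of them means editing a single entry in the list, and the same cleaning then applies to every later extract.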

Best Practices for Data Cleaning

  1. Always keep track of the data types defined for each business function. As far as possible, ensure that the correct data types are used and stored at the source.
  2. Remove all duplicate rows before starting any analytics project. This is key to creating accurate insights that lead to a positive business impact.
  3. Think about the data as holistically as possible, keeping in mind both the developer and the person who will derive results from the insights.
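
The first two practices can be combined in a short sketch: convert each column to its intended type, then remove duplicate rows. This is plain Python under stated assumptions; the schema, the sample rows, and the helper names are hypothetical and would be replaced by whatever the business function defines.

```python
from datetime import date

# Hypothetical schema: column name mapped to the function that converts it.
SCHEMA = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}


def enforce_types(rows, schema):
    """Convert every field to the type named in the schema."""
    return [
        {field: convert(row[field]) for field, convert in schema.items()}
        for row in rows
    ]


def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Raw text values; the third row repeats the first with different formatting.
raw = [
    {"order_id": "1001", "order_date": "2022-03-01", "amount": "250.00"},
    {"order_id": "1002", "order_date": "2022-03-02", "amount": "99.5"},
    {"order_id": "1001", "order_date": "2022-03-01", "amount": "250.0"},
]

clean = remove_duplicates(enforce_types(raw, SCHEMA))
for row in clean:
    print(row)
```

Typing before deduplicating is a deliberate choice in this sketch: as text, "250.00" and "250.0" look different, but once both are numbers the repeated order is recognized as a duplicate.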

Copyright © 2008-2023 Saxon. All rights reserved | Privacy Policy
