Why artificial intelligence is good but only as good as the data fed into it

News Desk - 25/01/2022

By Ramprakash Ramamoorthy, director of research at ManageEngine

Artificial intelligence (AI) is already delivering significant benefits to businesses in the UAE and the wider economy, and organisations are increasingly investing in it as a result. As per the UAE Artificial Intelligence website, AI technologies will hold a global market value of USD 15.7 trillion in 2030, boosting the UAE’s GDP by 35 per cent (USD 96 billion). UAE AI adds that utilising AI technologies will reduce government spending by up to 50 per cent, or approximately USD 3 billion.

According to our 2021 Digital Readiness Survey, 86 per cent of organisations across the globe reported an increase in the use of AI, and about 81 per cent said their confidence in AI solutions has grown. However, despite the increased investment in and use of AI across industries and businesses, there are lingering concerns over the technology’s capacity to deliver on expectations.

The organisations surveyed voiced concerns about potential barriers to AI delivering on expectations, including the complexity of AI projects, the availability of employees with the skills those projects require, and a lack of internal expertise to develop AI. Another major consideration is that AI models are only as good as the data fed into them. Without access to clean, high-quality data, AI may fail to produce the expected results.

Starting fresh with AI

AI is incredibly powerful at turning data into insights. This power is applied to data regardless of its quality or biases, which means that any unintentional biases found in the data will only be amplified by the AI algorithms. This makes the quality of data the number one predictor of how successful an AI project will be. Data quality must be exceptional for an AI system to work reasonably well, because even slightly polluted data can lead AI to produce poor results. Getting high-quality data can be more challenging than it seems: the skills required to identify and use clean data, and even the understanding of what constitutes clean data, may differ depending on industry and use case.

For example, an IT team may use AI technologies to help identify network outages before they occur. If the average IT system has an uptime of 99 per cent, then it’s the data that pertains to the other one per cent of the time that is important for training purposes. So, the data set that is fed into the model must be tuned to identify that target one per cent. Getting the right data into the model can be challenging, but it’s essential.
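One common way to make a model pay attention to that rare one per cent is to weight each class inversely to how often it appears. The sketch below is only an illustration under stated assumptions: the function name, the "up" and "outage" labels, and the 990/10 split are hypothetical, and the formula is the same "balanced" heuristic that scikit-learn uses for its class weights. It is not a description of any particular product.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class so that rare classes count more in training."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # "Balanced" heuristic: total samples / (number of classes * class count)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Hypothetical monitoring history: 99 per cent uptime, 1 per cent outage
labels = ["up"] * 990 + ["outage"] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

With these assumed numbers, each outage sample carries a weight of 50 while each healthy sample carries roughly 0.5, so the model is not rewarded for simply predicting "up" every time.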

This creates an urgent need for professionals working with AI technologies to understand how to identify clean, relevant data and to know what counts as clean data for their particular needs.

On a fundamental level, clean data should not include any personally identifiable information or other sensitive data. This isn’t just important to protect people’s privacy; some of this data could skew the model and result in biased decisions. Data that can pollute an AI model includes demographic data, names, years of experience, and known anomalies.

Therefore, to help ensure data is clean and appropriate for use, parameters that are not relevant to the final classification should be removed from it. For example, within hospitals and healthcare, data that includes both patient identification information and certain diseases may result in a false correlation between demographic information and disease patterns. To avoid this, data scientists must make the parameters of the project clear and ensure only relevant data is included in the model.
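In practice, removing irrelevant parameters can be as simple as dropping a set of fields from every record before it reaches the model. The following is a minimal sketch, assuming records are plain dictionaries; the field names and the helper `strip_sensitive` are hypothetical, and the real list of fields to exclude depends on the data set and the use case.

```python
# Hypothetical field names; adjust to the data set and use case at hand.
SENSITIVE_FIELDS = {"patient_id", "name", "date_of_birth", "postcode"}

def strip_sensitive(record, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the record without the sensitive fields."""
    return {key: value for key, value in record.items() if key not in sensitive}

# Illustrative record, not real patient data
raw = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "blood_pressure": 128,
    "heart_rate": 72,
}
clean = strip_sensitive(raw)
print(clean)
```

Here only the clinical measurements remain in `clean`, while the original record is left unchanged. Deciding which fields belong in the exclusion list is the harder part, and it is a judgement for the data scientists who define the project's parameters.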

Clean data clears the path for AI

With such significant investment in AI technologies, it’s essential that the tools being used and produced provide a great return on investment. The skills, experience, and know-how of the IT professionals and data scientists involved in AI projects can determine how successful those projects will be and, therefore, whether they’ll deliver a strong return. Being able to ensure the data used for AI models is clean and relevant is a key skill that’s required before starting an AI project.

IT professionals and data scientists must be able to not only identify and provide clean data but also understand how to identify results that have been skewed by biased data. They can then retrain the model using more appropriate data, leading to ever-improving results from AI projects.

The world sits on the cusp of significant changes and benefits from the use of AI. It’s up to the data scientists and IT professionals driving these projects to ensure that outcomes are unbiased and clear-eyed.