- More than four in 10 respondents (43%) in a new study said they are already using data streaming to train or run AI.
- Respondents said that integration with AI/ML is now the most important data streaming system capability to address, ahead of ‘connectors to mainstream applications’, and ‘security and governance’.
- When using streaming data for AI, teams encounter quality issues including inconsistent data formats, duplicate events causing skewed AI outputs, and missing or incomplete data.
Large organizations are experiencing a wide range of pain points when using data streaming to train or run artificial intelligence (AI) systems. This is one finding from a new research report by Conduktor, the intelligent data hub for streaming data and AI.
The study of 200 senior IT and data executives at large companies with annual revenue of $50 million or more found that more than four in 10 respondents (43%) are already using data streaming to train or run AI. By comparison, 83% are using it to automate workflows and 51% to support real-time decision-making.
Respondents said that integration with AI/ML is now the most important data streaming system capability to address, ahead of ‘connectors to mainstream applications’, and ‘security and governance’.
The top three quality issues reported by respondents surveyed for the research were: inconsistent data formats or schemas, duplicate events causing skewed AI outputs, and missing or incomplete data.
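To make these three issue types concrete, the following is a minimal Python sketch of how a team might screen a batch of streamed events before passing them to an AI pipeline. It is illustrative only and is not taken from the report or from any Conduktor product: the `EXPECTED_SCHEMA` fields, the `audit_events` function, and the use of `event_id` as a deduplication key are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical schema (an assumption for illustration): field name -> expected type.
EXPECTED_SCHEMA = {"event_id": str, "user_id": str, "amount": float}


def audit_events(events, schema=EXPECTED_SCHEMA, id_field="event_id"):
    """Split a batch of event dicts into clean events and a count of issues."""
    issues = Counter()
    seen_ids = set()
    clean = []
    for event in events:
        # Missing or incomplete data: a required field is absent or None.
        if any(event.get(name) is None for name in schema):
            issues["missing"] += 1
            continue
        # Inconsistent format: a field is present but has the wrong type.
        if any(not isinstance(event[name], expected)
               for name, expected in schema.items()):
            issues["bad_format"] += 1
            continue
        # Duplicate event: the same ID has already been accepted.
        if event[id_field] in seen_ids:
            issues["duplicate"] += 1
            continue
        seen_ids.add(event[id_field])
        clean.append(event)
    return clean, issues


sample = [
    {"event_id": "a", "user_id": "u1", "amount": 1.0},
    {"event_id": "a", "user_id": "u1", "amount": 1.0},     # duplicate
    {"event_id": "b", "user_id": "u2", "amount": "3.50"},  # amount sent as text
    {"event_id": "c", "user_id": None, "amount": 2.0},     # missing user_id
]
clean_events, issue_counts = audit_events(sample)
print(len(clean_events), dict(issue_counts))
```

In a production streaming setup, checks like these would more typically be enforced upstream, for example through a schema registry and idempotent producers, rather than in application code. The sketch only shows what each category of issue looks like in practice.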
Further problems arise when organizations attempt to scale up data infrastructure to support deeper AI initiatives. The leading challenges are data privacy and security concerns, cited by 72% of respondents; high infrastructure costs (59%); and a lack of real-time processing capabilities (58%).
Respondents reported using a range of AI tools for data preparation, training and inference. The top four were Google Cloud Vertex AI, Microsoft Azure Machine Learning, Amazon SageMaker Data Wrangler, and AWS Glue DataBrew.
Confidence in the use of data streaming to support AI is high
Despite the pain points highlighted in the research, respondents are largely positive about their use of data streaming to support AI initiatives.
The majority of respondents (80%) rated the current level of integration of their data streaming platforms with their AI and ML platforms as ‘good’, while 9% said it was ‘excellent’ and 11% said it was ‘average’.
Nicolas Orban, CEO of Conduktor, said: “AI workloads are now core business drivers and require fresh, contextualized, high-quality data — provided by streaming data platforms.
“However, although 80% of respondents believed their organizations properly integrated data streaming with AI and ML, problems remain: 58% of respondents still struggle with data quality issues, while 72% felt that data privacy and security concerns (such as compliance with PII and PHI laws) were their biggest challenges to scaling data infrastructure for AI.
“Fragmented data creates chaos, including missed signals, duplicated work, low trust, and poor decisions. With Conduktor, organizations can unify streaming data into one platform for full visibility and control, significantly improving the productivity of IT teams.”
According to Dataintelo, the global market size for streaming data processing system software was valued at approximately USD 9.5 billion in 2023 and is projected to reach around USD 23.8 billion by 2032, reflecting a compound annual growth rate (CAGR) of 10.8% over the forecast period.
Dataintelo says: “The surge in the need for real-time data processing capabilities, driven by the exponential growth of data from various sources such as social media, IoT devices, and enterprise data systems, is a significant growth factor for this market.”
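As a rough arithmetic check on the Dataintelo market figures quoted above, the compound annual growth rate can be recomputed from the two endpoint values. This sketch assumes the rounded figures given in this article and a nine-year span from 2023 to 2032, which is an assumption about how the forecast period is counted. The `cagr` helper is illustrative, not Dataintelo's method.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint values in USD billions, as quoted (approximately) in the article.
rate = cagr(9.5, 23.8, 2032 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded inputs the result lands within about a tenth of a percentage point of the stated 10.8%, which is consistent given that both market-size figures are described as approximate.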
Learn more about Conduktor’s streaming data hub here: https://conduktor.io/















