Microsoft’s Generative AI Shows Inaccurate Responses, Leaked Audio Reveals

Leaked audio of an internal presentation has shed light on Microsoft’s generative AI tool, Security Copilot, and its struggle to provide accurate answers. The presentation discussed the results of “threat hunter” tests, in which the AI analyzed a Windows security log for potential malicious activity. According to a Microsoft researcher, the tool would frequently “hallucinate” incorrect responses, making it difficult to obtain reliable information. To showcase the tool’s capabilities, Microsoft had to cherry-pick examples that appeared accurate, because the AI, being stochastic, generated different answers to the same question.
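The stochastic behavior described in the presentation is a general property of sampled LLM output rather than something unique to Security Copilot. The sketch below, a minimal illustration using OpenAI’s public Python client and not any Microsoft code, sends the same question about a single hypothetical Windows security log event to GPT-4 three times; with a nonzero sampling temperature, the answers can differ noticeably between runs, which is what makes cherry-picking a favorable run possible.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical log line: Word spawning PowerShell is a classic suspicious pattern.
question = (
    "Does this Windows security log event suggest malicious activity? "
    "EventID=4688 NewProcess=powershell.exe ParentProcess=winword.exe"
)

# With temperature > 0 the model samples tokens, so identical prompts
# can yield different answers on different runs.
for run in range(3):
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0.8,
        messages=[{"role": "user", "content": question}],
    )
    print(f"Run {run + 1}: {resp.choices[0].message.content}")
```

Setting the temperature to 0 makes the output far more repeatable, but it does not by itself make the answers correct.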

Security Copilot functions much like a chatbot, answering in the style of a customer service representative. It relies on OpenAI’s GPT-4 large language model, which also powers Microsoft’s other generative AI applications, such as the Bing search assistant. The leaked audio suggests that Microsoft had early access to GPT-4 and that the demonstrations were initial explorations of its potential.

However, the researchers revealed that the AI frequently produced incorrect responses during these early iterations. Hallucination, in which the AI generates plausible-sounding but incorrect or fabricated answers, was a major challenge. Microsoft has tried to address the problem by grounding the AI in real data, but the model behind Security Copilot, GPT-4, was not trained specifically on cybersecurity data; it relies instead on its large, general-purpose training set.
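“Grounding” in this context usually means supplying the model with real, task-specific data at query time instead of relying on what it absorbed during training. One common pattern is retrieval-augmented prompting, sketched below under the assumption that relevant log events have already been retrieved; the log excerpt, event IDs, and prompt wording are illustrative and do not reflect Security Copilot’s actual pipeline.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative Windows security events: repeated failed logons (4625)
# followed by a successful network logon (4624) for the same account.
log_excerpt = """\
EventID=4625 Account=svc-backup Status=0xC000006A Source=10.0.0.45
EventID=4625 Account=svc-backup Status=0xC000006A Source=10.0.0.45
EventID=4624 Account=svc-backup LogonType=3 Source=10.0.0.45
"""

question = "Is there evidence of a brute-force or password-guessing attempt?"

# Grounding: the model is instructed to answer only from the supplied
# events rather than from its general training data.
prompt = (
    "Answer using only the log events below. "
    "If they do not support an answer, say so.\n\n"
    f"{log_excerpt}\nQuestion: {question}"
)

resp = client.chat.completions.create(
    model="gpt-4",
    temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```

Constraining the model to the supplied evidence narrows the space of possible answers, but as the leaked discussion suggests, it does not eliminate hallucination on its own.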

It is unclear whether Microsoft presented these cherry-picked examples to the government and potential customers, or whether the company was transparent about the selection process. Microsoft stated that the technology discussed in the meeting predated Security Copilot and was tested on simulations built from public datasets, with no customer data used.

The leak raises questions about the reliability and accuracy of generative AI tools, especially in critical domains such as cybersecurity. Further research and development will be needed to reduce hallucinations and improve the performance of these systems.

