Security experts are raising concerns about how sensitive data, even if only briefly exposed, can remain accessible through AI tools like Microsoft Copilot long after being made private. Cybersecurity firm Lasso found that thousands of once-public GitHub repositories from major companies, including Microsoft, remain accessible because Microsoft's Bing search engine had indexed and cached them while they were public.
Lasso discovered the problem when content from its own GitHub repository appeared in Copilot's responses, despite the repository having since been set to private. This raised alarms about data retention in AI systems: even a short window of public exposure can lead to long-term data leaks.
The investigation revealed that over 20,000 now-private GitHub repositories, belonging to more than 16,000 organizations, were still accessible through Copilot. Affected companies include Google, IBM, PayPal, Tencent, and Microsoft, among others. Some of these repositories contained confidential data, access keys, and intellectual property, making the exposure a serious cybersecurity risk.
Microsoft was notified in November 2024 but classified the issue as low severity. While Bing's cached links were removed in December 2024, Lasso found that Copilot could still retrieve the data, suggesting the underlying issue remains unresolved.
Companies are now being urged to rotate or revoke exposed access keys and take measures to protect sensitive data from AI-powered indexing. This incident highlights the growing risks of AI-driven data retention, reinforcing the need for better security measures in cloud and AI integrations.
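As a first step toward the key rotation urged above, teams need to find credentials that may have been exposed. The sketch below shows the general idea of pattern-based secret scanning; the pattern names and regexes are illustrative assumptions on my part (real scanners such as gitleaks or GitHub's own secret scanning ship far larger, maintained rule sets), not Lasso's methodology.

```python
import re

# Illustrative patterns only -- a small sample of well-known key formats.
# Production scanners maintain hundreds of such rules.
KEY_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "Generic API key assignment": re.compile(
        r"(?i)\b(api|secret)[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def find_exposed_keys(text: str) -> list[tuple[str, str]]:
    """Return (pattern name, matched string) pairs found in text."""
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)))
    return hits

sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"\napi_key = "abcdef0123456789abcdef"'
for name, match in find_exposed_keys(sample):
    print(f"{name}: {match}")
```

Any key such a scan turns up in a repository that was ever public should be treated as compromised and rotated, regardless of whether the repository is private today, since cached copies may still be retrievable.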