Cloudflare Upgrades Workers AI with Faster Inference and Global GPUs

News Desk

Cloudflare, Inc. (NYSE: NET), the connectivity cloud company, has announced groundbreaking advancements to its Workers AI platform. These enhancements are set to elevate the development of AI applications by providing faster inference, support for larger models, and improved performance analytics. Workers AI remains the premier platform for building global AI applications, offering seamless AI inference capabilities irrespective of user location.

As large language models (LLMs) become more compact and efficient, network speed emerges as a critical factor in AI adoption and interaction. Cloudflare addresses this challenge with its distributed global network: with GPUs deployed in over 180 cities worldwide, Workers AI runs inference close to end users, delivering broad global accessibility and low latency while keeping customer data near where it originates.

Matthew Prince, Cloudflare’s co-founder and CEO, emphasized the significance of network performance in the AI landscape: “As AI becomes integral to our daily lives, the importance of network speeds and latency cannot be overstated. With AI workloads transitioning from training to inference, regional performance and availability will be crucial. Cloudflare’s global AI platform, supported by GPUs in cities worldwide, is set to transition AI from a novel concept to an everyday utility, much like faster internet did for smartphones.”

Key Enhancements to Cloudflare Workers AI:

– Upgraded Performance and Larger Model Support: Cloudflare’s global network now features advanced GPUs to boost AI inference performance. This upgrade enables the handling of larger models, such as Llama 3.1 70B and upcoming Llama 3.2 models with sizes ranging from 1B to 90B. The improved performance and larger context windows support more complex tasks and deliver a smoother user experience.
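Workers AI models can be invoked from a Worker binding or over Cloudflare's REST API. As a minimal sketch of the latter, the helper below builds the request URL and JSON payload for an inference call; the account ID and prompt are placeholders, the model ID for Llama 3.1 70B is assumed, and the actual HTTP call (which needs an API token) is left out:

```python
import json

API_BASE = "https://api.cloudflare.com/client/v4"

def build_inference_request(account_id: str, model: str, prompt: str):
    """Build the URL and JSON payload for a Workers AI REST inference call."""
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return url, payload

url, payload = build_inference_request(
    "YOUR_ACCOUNT_ID",
    "@cf/meta/llama-3.1-70b-instruct",  # larger model; exact ID assumed
    "Summarize the benefits of edge inference in one sentence.",
)
# An HTTP client would POST json.dumps(payload) to `url` with an
# "Authorization: Bearer <API_TOKEN>" header; omitted here for brevity.
print(url)
print(json.dumps(payload))
```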

– Enhanced Monitoring with Persistent Logs: New persistent logs in AI Gateway, available in open beta, provide developers with the ability to store and analyze users’ prompts and model responses over extended periods. This feature offers detailed insights into application performance, including request costs and durations, aiding in application refinement. Since its launch, AI Gateway has processed over two billion requests.
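Routing requests through AI Gateway (which enables the logging described above) amounts to swapping the API base URL for a gateway URL. A small sketch of that URL construction, with placeholder account and gateway IDs:

```python
def gateway_url(account_id: str, gateway_id: str, model: str) -> str:
    """Build a Workers AI request URL that routes through AI Gateway,
    so prompts and responses are captured in the gateway's logs."""
    return (
        "https://gateway.ai.cloudflare.com/v1/"
        f"{account_id}/{gateway_id}/workers-ai/{model}"
    )

# Example: same model as a direct call, but observed by the gateway.
print(gateway_url("YOUR_ACCOUNT_ID", "my-gateway", "@cf/meta/llama-3.1-8b-instruct"))
```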

– Faster and More Cost-Effective Queries: Cloudflare’s vector database, Vectorize, is now generally available and has been upgraded to support indexes of up to five million vectors, up from 200,000. This enhancement has reduced median query latency to 31 milliseconds from 549 milliseconds, enabling quicker information retrieval and more affordable AI applications.
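Vectorize itself is queried through Worker bindings, but the computation a vector-index query performs can be illustrated with a toy brute-force cosine-similarity search (the index contents here are invented; a real index avoids scanning every vector, which is how latencies in the tens of milliseconds are possible at millions of vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query(index, vector, top_k=3):
    """Return the top_k (id, score) pairs most similar to `vector`."""
    scored = [(vid, cosine(vector, v)) for vid, v in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy 3-dimensional index; real embeddings have hundreds of dimensions.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(query(index, [1.0, 0.05, 0.0], top_k=2))
```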

These updates position Cloudflare’s Workers AI as the most advanced and accessible platform for global AI application development, setting new standards for performance and efficiency in the AI industry.

