Proxis - The cheapest API for serving Llama models.

Cutting Compute Costs in Half: How Proxis is Disrupting the AI Market

Proxis is a San Francisco-based startup founded in 2024 with a mission to make serving Llama models cheaper and more efficient for developers. Co-founded by Liam Collins and Jackson Stokes, the company targets the cost and performance problems of running Llama models at scale.

The idea behind Proxis grew out of the founders' frustration with existing offerings. On current cloud platforms, developers must commit upfront to one Llama model size, such as 8B, 70B, or 405B. Each size trades cost against performance, so developers often end up compromising on either the quality of their application or the cost of computation. Proxis aims to remove that choice with a more flexible, cost-effective service.

How Does Proxis Optimize Costs for Llama Models?

Proxis combines hardware-specific, kernel-level optimizations with cascaded serving to reduce the cost of AI inference. Compute is allocated according to the complexity of each query, so that every request is served by the most appropriate model size.

Proxis claims that these optimizations cut inference costs by up to 50% while maintaining Llama 405B quality on complex queries. The service is exposed through a single serverless API endpoint and, according to the company, takes just one line of code to adopt, so developers can start saving on compute without extensive changes to their existing systems.
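To see how routing alone could contribute to a figure like this, here is a back-of-the-envelope sketch. The prices are hypothetical placeholders, not Proxis or market rates, and the model ignores any savings from kernel-level optimizations as well as the cost of the routing step itself.

```python
def blended_cost(large_price: float, small_price: float, small_share: float) -> float:
    """Average price per million tokens when `small_share` of traffic
    is served by the small model and the rest by the large one."""
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    return small_share * small_price + (1.0 - small_share) * large_price


def savings(large_price: float, small_price: float, small_share: float) -> float:
    """Fractional saving compared with sending everything to the large model."""
    return 1.0 - blended_cost(large_price, small_price, small_share) / large_price


def share_needed(target_saving: float, large_price: float, small_price: float) -> float:
    """Share of traffic the small model must absorb to reach `target_saving`."""
    return target_saving / (1.0 - small_price / large_price)


# Hypothetical prices in dollars per million tokens (assumptions for illustration).
LARGE_PRICE = 3.00
SMALL_PRICE = 0.30

if __name__ == "__main__":
    print(f"saving at a 50% small-model share: {savings(LARGE_PRICE, SMALL_PRICE, 0.5):.0%}")
    print(f"share needed for a 50% saving: {share_needed(0.5, LARGE_PRICE, SMALL_PRICE):.1%}")
```

Under these assumed prices, a little over half of all traffic would have to be simple enough for the small model before routing by itself halves the bill.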

What is Cascaded Serving and Why is it Important?

Cascaded serving is central to Proxis's approach. In the traditional setup, developers decide upfront which Llama model to use. With cascaded serving, Proxis instead triages queries in real time based on their complexity: simpler queries are handled by less resource-intensive models, while more complex queries are routed to more powerful ones.

This approach makes better use of compute and lowers costs by keeping expensive models out of requests that do not need them. Choosing a model per query, rather than being locked into a single model for all traffic, changes how developers can think about AI inference.
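The triage step described above can be sketched as a small router. This is a minimal illustration under stated assumptions, not Proxis's actual method: the tier names, thresholds, keyword list, and scoring heuristic are all hypothetical, and a production system would more likely use a learned classifier than word counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str         # hypothetical model identifier
    max_score: float  # serve queries whose complexity score is <= this value


# Ordered from cheapest to most capable (thresholds are assumptions).
TIERS = (
    Tier("llama-8b", 0.3),
    Tier("llama-70b", 0.7),
    Tier("llama-405b", 1.0),
)

# Phrases treated as a sign that a query needs multi-step reasoning (assumption).
REASONING_HINTS = ("prove", "derive", "step by step", "analyze", "compare")


def complexity_score(query: str) -> float:
    """Toy score in [0, 1]: half from query length, half from reasoning hints."""
    length_part = min(len(query.split()) / 200.0, 1.0) * 0.5
    lowered = query.lower()
    hits = sum(1 for hint in REASONING_HINTS if hint in lowered)
    hint_part = min(hits / 2.0, 1.0) * 0.5
    return length_part + hint_part


def pick_model(query: str) -> str:
    """Return the cheapest tier whose threshold covers the query's score."""
    score = complexity_score(query)
    for tier in TIERS:
        if score <= tier.max_score:
            return tier.name
    return TIERS[-1].name  # fall back to the most capable tier


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Prove and derive the bound step by step."):
        print(f"{pick_model(q):>10}  <-  {q}")
```

The point of the sketch is the shape of the decision, one cheap scoring pass followed by a per-query model choice, rather than the particular heuristic.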

Who are the Founders of Proxis?

The driving force behind Proxis is its two co-founders, Liam Collins and Jackson Stokes.

Liam Collins, the CEO, combines technical expertise with business experience. Before founding Proxis, Liam worked as a software engineer at a climate tech startup, where he specialized in building zero-to-one systems. He previously worked as an investment banker focusing on renewable energy. Liam holds a dual degree from Columbia University and the City University of Hong Kong, and left the MBA program at Wharton to commit fully to building Proxis.

Jackson Stokes, the CTO, brings a wealth of experience in machine learning and optimization. At Google, Jackson worked on kernel-level optimizations for the Gemini model and developed framework-level optimizations for video models at Google Research. His expertise in optimizing ML models to run more efficiently is central to Proxis's ability to deliver on its promise of cost-effective AI inference.

How Does Proxis Compare to Other Cloud Providers?

Proxis sets itself apart from other cloud providers by focusing specifically on cost-efficiency and ease of use for Llama models. Where other providers leave developers to trade cost against performance by picking a single model, Proxis uses cascaded serving and hardware optimizations to offer a more flexible and affordable alternative.

Proxis's API is also designed to be easy to integrate: a single line of code change is all that is needed to start saving on compute costs. This simplicity, combined with the claimed cost savings, makes Proxis an attractive option for developers looking to optimize their AI workloads.
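What a one-line change might look like in practice is sketched below. The source does not document the API, so everything here is an assumption: the sketch supposes that the application reads its inference endpoint from a config object, and both URLs and the field names are hypothetical placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InferenceConfig:
    base_url: str     # endpoint the application sends requests to
    api_key_env: str  # name of the environment variable holding the API key
    model: str        # model identifier requested by the application


# Existing setup pointing at some other provider (placeholder URL).
existing = InferenceConfig(
    base_url="https://llm.provider.example/v1",
    api_key_env="LLM_API_KEY",
    model="llama-405b",
)

# The one-line change: only the endpoint differs (hypothetical URL).
switched = replace(existing, base_url="https://api.proxis.example/v1")

if __name__ == "__main__":
    print(existing)
    print(switched)
```

If the endpoint is a drop-in replacement, the rest of the application, including prompts, request code, and response handling, would stay as it is.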

What is the Market Opportunity for Proxis?

The market for Llama models is growing rapidly, with downloads increasing tenfold over the past year, reaching approximately 350 million downloads to date. In the last month alone, there were 20 million downloads, underscoring Llama's position as the leading open-source model family.

Usage of Llama models by token volume across major cloud service providers more than doubled in the three months from May through July 2024, the month Llama 3.1 was released. This trajectory points to growing demand for cost-effective serving options such as Proxis as AI workloads increase in scale and complexity.

What Does the Future Hold for Proxis?

Proxis is currently focused on the reliability and uptime of its API while it continues to refine its technology. The company is also taking sign-ups through a waitlist at proxis.ai as it prepares to scale and meet growing demand.

Looking ahead, Proxis aims to expand its offerings and continue to innovate in the space of AI inference optimization. By staying at the forefront of technology developments and maintaining a strong focus on cost-efficiency, Proxis is well-positioned to become a key player in the AI infrastructure market.

Why Should Developers Consider Joining the Proxis Waitlist?

For developers who want to reduce AI inference costs without sacrificing performance, the Proxis waitlist is worth considering. Early adopters get access to an API built around the latest optimizations for Llama models, which helps them keep up with the growing demands of their applications.

Moreover, the ability to implement Proxis's solution with minimal code changes means that developers can quickly and easily start realizing cost savings, freeing up resources to focus on what matters most – building great products.

Conclusion

Proxis is poised to revolutionize the AI infrastructure landscape with its innovative approach to serving Llama models. By combining advanced optimizations with a flexible, easy-to-use API, Proxis offers a unique solution that addresses the key pain points faced by developers today. With a strong team, a clear vision, and a rapidly growing market, Proxis is set to make a significant impact in the world of AI.