FiddleCube - Automated dataset generation for fine-tuning LLMs

Unlocking the Power of Fine-Tuned LLMs: FiddleCube's Innovative Approach

In the fast-paced world of artificial intelligence, access to high-quality datasets is the key to unlocking the full potential of Large Language Models (LLMs). However, creating these datasets can be a resource-intensive and time-consuming endeavor. That's where FiddleCube, a San Francisco-based startup founded in 2022 by the dynamic duo Neha Nupoor and Kaushik Srinivasan, steps in. With a mission to democratize access to top-tier datasets, FiddleCube offers an automated solution for fine-tuning LLMs using high-quality synthetic data. In this article, we'll dive deep into the world of FiddleCube, exploring its founders' journey, the problem it aims to solve, and the innovative solutions it offers.

Who Are the Visionaries Behind FiddleCube?

Neha Nupoor and Kaushik Srinivasan

Before we delve into FiddleCube's groundbreaking solutions, it's essential to meet the brains behind this innovative startup. Neha Nupoor and Kaushik Srinivasan are the power couple leading FiddleCube to success.

Neha Nupoor is not your average entrepreneur. With a background as a Full Stack engineer and a part-time illustrator, she brings a unique blend of technical expertise and creative thinking to the table. Neha is deeply passionate about AI alignment, health-tech, design, and fitness. Her diverse skill set and curiosity have been instrumental in shaping FiddleCube's vision.

Kaushik Srinivasan, on the other hand, is a seasoned software engineer with over a decade of experience working at tech giants like Google, Uber, and LinkedIn. His expertise lies in building highly reliable, low-latency, and fault-tolerant software systems at planetary scale. Kaushik's obsession with creating high-quality datasets led to the birth of FiddleCube.

These two remarkable individuals crossed paths while working at Uber, eventually fell in love, and decided to embark on a journey together, combining their passion for AI and data. With this powerful partnership at the helm, FiddleCube was founded with the goal of making high-quality datasets accessible to everyone.

What Is FiddleCube's Core Offering?

FiddleCube - Automated Dataset Generation for Fine-Tuning LLMs

FiddleCube's flagship offering revolves around automating the process of dataset generation for fine-tuning LLMs. Let's take a closer look at what FiddleCube brings to the table:

Create high-quality datasets for fine-tuning and reinforcement learning.

FiddleCube addresses a pressing challenge in the world of AI: fine-tuning LLMs requires access to high-quality datasets. While this fine-tuning is crucial to ensure that LLMs align with human instructions, it's often a bottleneck due to the lack of suitable data.

FiddleCube's solution is simple yet revolutionary: automagically generate fine-tuning datasets from your existing data. This means that user data sources can be transformed into high-quality datasets without the need for extensive manual effort or exorbitant costs.

The Problem FiddleCube Solves

Why Do LLMs Need Fine-Tuning with High-Quality Datasets?

In the real world, LLMs need to do more than just generate text. They must be aligned to follow human instructions accurately and ethically. This alignment involves responding in a manner that is not only accurate but also:

Positive, Truthful & Honest: LLMs must provide responses that are not misleading or deceptive.

In Accordance with Human Beliefs and Sensibilities: LLMs should respect cultural norms, ethical values, and societal sensibilities.

Achieving these objectives is challenging and often requires fine-tuning and reinforcement learning with high-quality datasets. However, creating such datasets can be a daunting task due to the resources and effort involved.

How Does FiddleCube Address the Problem?

Leveraging AI for High-Quality Dataset Generation

FiddleCube tackles this problem head-on by leveraging a suite of AI models and innovative techniques. Here's how they do it:

Generate Annotated Datasets from Raw Data: FiddleCube's AI models can transform raw data into annotated datasets, adding the necessary context and labels for fine-tuning.

Augment the Datasets: To significantly improve model performance, FiddleCube creates large datasets by augmenting existing ones. This approach not only enriches the dataset but also ensures that the models have a more extensive and diverse set of examples to learn from.

Evaluate and Improve Data Quality: FiddleCube doesn't stop at dataset creation; they also focus on data quality. Their experts rigorously evaluate and refine training datasets to ensure they meet the highest standards.

The result? FiddleCube delivers a rich, diverse, and high-quality dataset that enables better models to be built with a smaller corpus of data.
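To make the three steps above concrete, here is a minimal sketch of an annotate, augment, and filter pipeline. It is not FiddleCube's implementation or API: every name in it (Example, annotate, augment, filter_quality, build_dataset, REPHRASINGS) is hypothetical, and simple string templates stand in for the AI models the article describes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One instruction/response pair (hypothetical record shape)."""
    topic: str
    instruction: str
    response: str


def annotate(raw_docs):
    """Step 1: turn raw text into labeled pairs.

    A template stands in for the model call a real pipeline would make.
    """
    return [Example(topic, f"Explain {topic}.", text.strip())
            for topic, text in raw_docs.items()]


# Hypothetical rephrasings used to enlarge the dataset.
REPHRASINGS = ("What is {topic}?", "Give a short overview of {topic}.")


def augment(examples):
    """Step 2: add instruction variants that share the same response."""
    augmented = list(examples)
    for ex in examples:
        for template in REPHRASINGS:
            augmented.append(
                Example(ex.topic, template.format(topic=ex.topic), ex.response))
    return augmented


def filter_quality(examples, min_words=5):
    """Step 3: drop very short responses and exact duplicates."""
    seen = set()
    kept = []
    for ex in examples:
        key = (ex.instruction.lower(), ex.response.lower())
        if len(ex.response.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(ex)
    return kept


def build_dataset(raw_docs):
    """Run the three steps in order."""
    return filter_quality(augment(annotate(raw_docs)))


def to_jsonl(examples):
    """Serialize to JSON Lines, a common format for fine-tuning data."""
    return "\n".join(
        json.dumps({"instruction": ex.instruction, "response": ex.response})
        for ex in examples)
```

Calling build_dataset on a dictionary that maps topics to raw text returns the filtered examples, and to_jsonl writes one JSON object per line. The word-count threshold and the duplicate check are deliberately crude placeholders for the expert review the article mentions.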

What Can You Do with FiddleCube's Fine-Tuned LLMs?

Real-World Applications

Now that we understand how FiddleCube works, let's explore some of the real-world applications and use cases where their fine-tuned LLMs shine:


Personalization

Imagine giving an LLM a distinct personality, voice, and tone. FiddleCube makes this possible. For example, you can create a safe Dora the Explorer or Peppa Pig model that speaks to children in a way that is engaging and age-appropriate. This level of personalization opens up exciting possibilities in content creation and interaction.

API Calling and Coding

In specific use cases like making API calls or generating code, fine-tuning LLMs has proven to yield significantly better results. Developers and businesses can fine-tune LLMs on a corpus of code or API data, enhancing their ability to perform these tasks with precision and efficiency. This can streamline software development and automation processes, saving both time and resources.
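As an illustration of what a fine-tuning example for API calling could look like, the sketch below pairs a user request with a structured call and checks that the target is well formed. The record layout (prompt and completion keys), the helper names, and the get_weather function are all assumptions made for this example; the article does not specify FiddleCube's data format.

```python
import json


def make_api_example(user_request, function_name, arguments):
    """Build one training record: natural-language request -> JSON call."""
    target = json.dumps(
        {"name": function_name, "arguments": arguments}, sort_keys=True)
    return {"prompt": user_request, "completion": target}


def is_well_formed(example, allowed_functions):
    """Check that the completion parses and names a known function."""
    try:
        call = json.loads(example["completion"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return (isinstance(call, dict)
            and call.get("name") in allowed_functions
            and isinstance(call.get("arguments"), dict))


# Usage with a hypothetical get_weather function.
example = make_api_example(
    "What's the weather in Paris?", "get_weather", {"city": "Paris"})
print(is_well_formed(example, {"get_weather"}))
```

Validating every record this way before training is one inexpensive guard against teaching a model to emit malformed calls.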

Increase Throughput, Reduce Latency, and Cost

Fine-tuning lets a smaller, specialized model stand in for a large foundation model: a compact model fine-tuned for a narrow task can often match a much larger general-purpose model on that task. Smaller models require fewer computational resources to run. Organizations can leverage these fine-tuned models to increase throughput, reduce latency, and cut down on operational costs. It's a win-win situation for businesses looking to optimize their AI-driven services.

Low Resource Domains

In certain domains like vernacular languages, LLMs often perform poorly due to the lack of a sufficient corpus of high-quality data. According to FiddleCube, its approach to fine-tuning using generated datasets has shown remarkable improvements over the state of the art in these cases. This means that even in low-resource domains, FiddleCube can empower LLMs to deliver exceptional results.


In a world increasingly driven by artificial intelligence and natural language understanding, FiddleCube stands as a beacon of innovation. Founded by two passionate individuals, Neha Nupoor and Kaushik Srinivasan, this startup is poised to revolutionize the way we approach fine-tuning LLMs.

By automating the process of dataset generation, FiddleCube makes it easier for businesses and developers to harness the power of AI alignment and ethical language models. Their commitment to high-quality datasets, AI-driven augmentation, and data quality improvement ensures that the models they produce meet the highest standards.

With a wide range of applications, from personalization to code generation and cost optimization, FiddleCube's fine-tuned LLMs have the potential to transform industries and enhance the capabilities of AI systems across the board.

As the AI landscape continues to evolve, FiddleCube's innovative approach to fine-tuning LLMs with high-quality synthetic data places them at the forefront of AI-driven solutions, helping businesses and individuals alike unlock the full potential of language models.