KAIKAKU.AI Compresses Cooking Knowledge into 2MB

KAIKAKU.AI's Epicure condenses extensive culinary knowledge into just 2MB, using AI models to explore ingredient relationships without storing recipes.

Summary

KAIKAKU.AI introduced Epicure, a set of three AI models trained on 4.14 million multilingual recipes.
The models do not store recipes; they capture learned relationships between ingredients, which users can explore mathematically.
The three versions (Cooc, Chem, and Core) each answer a different kind of culinary question using the same compact 2MB representation.

Josef Chen asserts he has managed to condense all of human culinary knowledge into a mere two megabytes, a claim that appears to hold true.

Chen is co-founder and CEO of the London-based food AI firm KAIKAKU.AI. Together with researcher Jakub Radzikowski, he released a paper on arXiv this week detailing Epicure, a family of three AI models trained on 4.14 million recipes in seven languages. The result is a representation of 1,790 ingredients, each described by 300 numbers, that fits comfortably within the size limit of an email attachment.

"4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions," Chen stated on X. "All of human cooking compressed into 2 megabytes."

We are excited to announce our new paper on arXiv: we have developed the largest multilingual food model ever created.

4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions.

All of human cooking compressed into 2 megabytes.

— Josef Chen (@josefchen) May 26, 2026

Understanding the Model

Before envisioning a compact USB drive filled with cooking instructions, it’s important to clarify that the model does not retain any specific recipes. Instead, the two megabytes serve more as a coordinate system than a traditional cookbook.

Consider it a navigational map: each ingredient is assigned a specific location based on its usage across a multitude of real-world dishes. The calculation is straightforward: 1,790 ingredients multiplied by 300 numerical descriptors per ingredient multiplied by 4 bytes each results in approximately 2.05 megabytes. These numerical values indicate ingredient pairings, shared flavor compounds, and cultural culinary practices. Once the model assimilates this knowledge from the recipes, the actual recipes are no longer needed; the information resides in the coordinates.
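The arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch: the ingredient count, dimension count, and 4 bytes per value come from the article, while the reading of "4 bytes" as a 32-bit float and the variable names are assumptions made for illustration.

```python
# Back-of-the-envelope size of the ingredient table described above.
NUM_INGREDIENTS = 1_790  # ingredients in the vocabulary (from the article)
DIMENSIONS = 300         # numbers stored per ingredient (from the article)
BYTES_PER_VALUE = 4      # assumption: each number is a 32-bit float

total_bytes = NUM_INGREDIENTS * DIMENSIONS * BYTES_PER_VALUE
size_mb = total_bytes / 1_000_000  # decimal megabytes
size_mib = total_bytes / 2**20     # binary mebibytes

print(f"{total_bytes:,} bytes = {size_mb:.2f} MB = {size_mib:.2f} MiB")
# prints: 2,148,000 bytes = 2.15 MB = 2.05 MiB
```

The 2.05 figure corresponds to binary megabytes (mebibytes); in decimal megabytes the same table comes to about 2.15.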

This approach is akin to word2vec, the 2013 language-processing method with which Google researchers demonstrated that meaning could be manipulated mathematically. Epicure applies the same idea to food, allowing users to steer an ingredient toward a particular cuisine. For instance, steering beef toward American cuisine surfaces associations with bread, lettuce, and possibly beer, while steering it toward Southeast Asian cuisine surfaces soy sauce, ginger, and sesame oil.

This steering relies on a mathematical operator known as SLERP (spherical linear interpolation) rotation. Starting from an ingredient such as chicken, the operator rotates its vector by a chosen angle toward a specific culinary direction, letting users explore different flavor profiles. For example, a 30-degree rotation may reveal Tex-Mex influences, while a 60-degree rotation could bring chicken and beef together around common Mexican ingredients such as corn tortillas and salsa.
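For readers who want to see what such a rotation looks like, here is a minimal sketch of the standard SLERP formula in plain Python. It is not Epicure's code: the article does not say how the paper parameterizes the operator, so the helper `rotate_toward`, the 3-dimensional toy vectors, and the "Tex-Mex direction" are all hypothetical stand-ins for the real 300-dimensional data.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_between(a, b):
    """Angle in radians between two unit vectors (clamped for safety)."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def slerp(a, b, t):
    """Standard spherical linear interpolation between two vectors."""
    a, b = normalize(a), normalize(b)
    omega = angle_between(a, b)
    s = math.sin(omega)
    if s < 1e-9:
        # Degenerate case: identical or opposite vectors have no
        # unique rotation plane, so return the start unchanged.
        return a
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

def rotate_toward(start, target, degrees):
    """Hypothetical helper: rotate `start` by a fixed angle toward `target`."""
    omega = angle_between(normalize(start), normalize(target))
    if omega < 1e-9:
        return normalize(start)
    return slerp(start, target, math.radians(degrees) / omega)

# Toy 3-D vectors, invented for illustration only.
chicken = [1.0, 0.2, 0.0]
tex_mex_direction = [0.1, 1.0, 0.3]

rotated = rotate_toward(chicken, tex_mex_direction, 30)
moved = math.degrees(angle_between(normalize(chicken), rotated))
print(f"rotated {moved:.1f} degrees toward the target direction")
```

In a real embedding, the rotated vector would then be compared against all ingredient vectors to see which ones now sit closest to it.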

Epicure offers three distinct models, each tailored to specific inquiries. Cooc is focused on the co-occurrence of ingredients in recipes, Chem analyzes flavor chemistry based on shared aroma compounds from the FlavorDB chemical database, and Core combines elements from both previous models.

When asking Cooc about complementary ingredients for chocolate, responses may include dessert staples like cocoa powder and vanilla. In contrast, Chem might suggest flavor-related peers such as toffee or fudge.

Different inquiries yield different insights, highlighting that a chef seeking a substitution will have different requirements than one exploring flavor compatibility.
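The chocolate example can be illustrated with a nearest-neighbor lookup, the usual way of querying word2vec-style embeddings. The sketch below assumes cosine similarity as the distance measure, which the article does not confirm for Epicure, and every vector in `cooc_table` and `chem_table` is made up so that the toy output mirrors the behavior described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(table, query, k=2):
    """Return the k ingredients most similar to `query` in `table`."""
    q = table[query]
    scored = [(cosine(q, vec), name)
              for name, vec in table.items() if name != query]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Invented 3-D vectors standing in for a co-occurrence-based model.
cooc_table = {
    "chocolate":    [0.9, 0.1, 0.2],
    "cocoa powder": [0.8, 0.2, 0.1],
    "vanilla":      [0.7, 0.3, 0.3],
    "toffee":       [0.1, 0.9, 0.2],
    "fudge":        [0.2, 0.8, 0.3],
}

# Invented 3-D vectors standing in for a flavor-chemistry-based model.
chem_table = {
    "chocolate":    [0.2, 0.9, 0.3],
    "cocoa powder": [0.9, 0.1, 0.1],
    "vanilla":      [0.8, 0.1, 0.4],
    "toffee":       [0.1, 0.9, 0.2],
    "fudge":        [0.3, 0.8, 0.4],
}

print(nearest(cooc_table, "chocolate"))  # ['cocoa powder', 'vanilla']
print(nearest(chem_table, "chocolate"))  # ['toffee', 'fudge']
```

The same query returns different neighbors because each table places chocolate in a different part of its space, which is the point of shipping separate models for separate questions.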

Limitations of Epicure

Epicure does not possess general knowledge, cannot generate language, and cannot invent ingredients it has not encountered. Its entire universe is the 1,790 ingredients in its vocabulary. That narrow scope makes it more reliable than recipe chatbots, which might mistakenly suggest harmful ingredients under pressure.

Previously, the leading model in this space was FlavorGraph, a 2021 creation that integrated chemical data with the English-centric Recipe1M+ dataset. Epicure significantly expands on this by incorporating a multilingual dataset that is over four times larger while optimizing vocabulary for better efficiency.

Potential applications are easily envisioned. For instance, a chef might inquire about the East Asian equivalent of a Mediterranean ingredient, or a food product developer could seek a minimally processed alternative that matches the flavor profile of an additive. Recipe applications could benefit from coherent substitutions when specific ingredients are unavailable. This last scenario illustrates where specialized small models can outperform larger generalist models.

The Epicure paper marks a significant research advancement. The trained models are available on Hugging Face, and an interactive ingredient map is accessible at epicure.kaikaku.ai. The team has also released an MCP (Model Context Protocol) integration for AI agents. The full training code, however, has not been made public at this time.
