Show HN: Knowledge Graph of Restaurants and Chefs Using LLMs

This post has been shared by .txt on LinkedIn and Bluesky.

The French restaurant scene is rich in history, culinary expertise, and connections. LeFooding.com, a renowned source for French restaurant intel, offers a wealth of information about the industry. Their anonymous critics conduct systematic reviews of establishments across France (and now Belgium), documenting their findings in a witty - if peculiar - style.

Beyond choosing a place for a delicious night out, these reviews can be used to map and understand France’s culinary network. A network is composed of nodes and edges. Nodes represent entities, such as people or restaurants. Edges represent the relationships between them; two nodes with no known relationship are simply left unconnected.

In the French restaurant scene, nodes represent both people and restaurants. A person node is connected to a restaurant node if that person is known to have worked at the restaurant. Restaurants with many neighbors (nodes connected to them) are restaurants whose alumni go on to create and work in other prestigious restaurants. This is the case for Ducasse (admittedly not a restaurant but a placeholder for all the restaurants in the industry), L'Atelier de Joël Robuchon, and Septime.

These connections can be visualized using a graph, where each node represents an entity and each edge represents a relationship. By analyzing this graph, we can gain insights into the structure of the culinary network and identify key players in the industry.
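As a minimal sketch of this structure, the snippet below stores the graph as an adjacency map and ranks restaurants by degree (their number of neighbors). The "Chef A" to "Chef D" names and their edges are hypothetical placeholders, not data extracted from the reviews.

```python
from collections import defaultdict

# Hypothetical (person, restaurant) pairs; placeholders, not data from the reviews.
worked_at = [
    ("Chef A", "Septime"),
    ("Chef B", "Septime"),
    ("Chef C", "Septime"),
    ("Chef A", "L'Atelier de Joël Robuchon"),
    ("Chef D", "L'Atelier de Joël Robuchon"),
]


def build_graph(pairs):
    """Return an undirected adjacency map: node -> set of neighbor nodes."""
    neighbors = defaultdict(set)
    for person, restaurant in pairs:
        neighbors[person].add(restaurant)
        neighbors[restaurant].add(person)
    return neighbors


def degree_ranking(neighbors, nodes):
    """Sort nodes by decreasing degree, breaking ties alphabetically."""
    return sorted(nodes, key=lambda node: (-len(neighbors[node]), node))


graph = build_graph(worked_at)
restaurants = {restaurant for _, restaurant in worked_at}
for name in degree_ranking(graph, restaurants):
    print(name, len(graph[name]))
```

A restaurant that many people have passed through ends up at the top of this ranking, which is the property used above to spot influential kitchens.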

To extract information from these reviews, I used a large language model (LLM) to process the text data. The LLM is trained on a vast amount of text data, including restaurant reviews, and can be fine-tuned for specific tasks such as entity recognition, named entity disambiguation, and relation extraction.

In this project, I used OpenAI's gpt-4o-mini model to extract information from LeFooding.com reviews. The model was trained on a custom dataset of restaurant reviews, and its output was constrained with a Pydantic schema to ensure that it respected the desired structure.

Behind the scenes, this relies on regular expressions and finite state machines (the same techniques used when parsing JSON). The outlines library built by .txt makes this possible for your favorite open model, such as Mistral-7B-v0.3. Once a prompt_template and a Summary schema are designed, the extraction itself takes only a few lines of code.
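To give a feel for the regular-expression side of this, here is a simplified sketch that hand-writes a pattern for a tiny, hypothetical schema (a "person" string and a "restaurants" list of strings) and checks whether a complete model output matches it. This is an illustration only: it is not how outlines or OpenAI implement structured generation, which apply the constraint during decoding rather than validating afterwards, and the field names are assumptions rather than the Summary schema used in the project.

```python
import json
import re

# A JSON string without escapes or embedded quotes (a deliberate simplification).
STRING = r'"[^"\\]*"'
# A JSON array of zero or more such strings.
STRING_ARRAY = rf'\[\s*(?:{STRING}(?:\s*,\s*{STRING})*)?\s*\]'
# Hypothetical schema: {"person": <string>, "restaurants": [<string>, ...]}
PATTERN = re.compile(
    rf'\{{\s*"person"\s*:\s*{STRING}\s*,\s*"restaurants"\s*:\s*{STRING_ARRAY}\s*\}}'
)


def conforms(text):
    """Return True if the whole text matches the schema pattern."""
    return PATTERN.fullmatch(text) is not None


good = '{"person": "Joël Robuchon", "restaurants": ["L\'Atelier de Joël Robuchon"]}'
bad = 'Sure! Here is the JSON: {"person": "Joël Robuchon"}'

print(conforms(good))  # matches the pattern, so it is safe to parse
print(conforms(bad))   # free-form text around the JSON is rejected
if conforms(good):
    print(json.loads(good)["restaurants"])
```

A structured generation library goes one step further: it compiles a pattern like this into a finite state machine and only lets the model emit tokens that keep the output on a valid path, so non-conforming text is never produced in the first place.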

The total cost of inference over 2000 reviews with gpt-4o-mini and its structured generation endpoint was less than 1€! I used gephi-lite to work on the visualization. The key step was spatialization, which computes a spatial layout of the graph that reflects the information it contains.
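Spatialization is often done with a force simulation, in which every pair of nodes repels and every edge acts like a spring. The sketch below is a simplified Fruchterman-Reingold-style layout in pure Python, meant only to illustrate the idea; it is not the algorithm gephi-lite runs, and the function name, parameters, and sample nodes are assumptions.

```python
import math


def force_layout(nodes, edges, iterations=200, k=1.0):
    """Return {node: (x, y)} after a simple force-directed simulation."""
    n = len(nodes)
    # Deterministic start: spread the nodes evenly on a unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, node in enumerate(nodes)
    }
    for step in range(iterations):
        # The maximum move per step shrinks over time so the layout settles.
        temperature = 0.1 * (1 - step / iterations)
        disp = {node: [0.0, 0.0] for node in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction: each edge pulls its two endpoints together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped by the temperature.
        for node in nodes:
            length = max(math.hypot(disp[node][0], disp[node][1]), 1e-9)
            scale = min(length, temperature) / length
            pos[node][0] += disp[node][0] * scale
            pos[node][1] += disp[node][1] * scale
    return {node: (p[0], p[1]) for node, p in pos.items()}


# Hypothetical example: three chefs who all worked at the same restaurant.
nodes = ["Septime", "Chef A", "Chef B", "Chef C"]
edges = [("Chef A", "Septime"), ("Chef B", "Septime"), ("Chef C", "Septime")]
for node, (x, y) in force_layout(nodes, edges).items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Connected nodes settle at roughly the spring's rest length, while unconnected nodes drift apart, which is what makes clusters of related people and restaurants visible.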

In this case, I used the force simulation setting, which pulls connected nodes close together and moves nodes with larger degree toward the center. The embedded visualization runs on WebGL thanks to the Retina project.

To handle duplicates and eliminate some obvious errors that became visible once the graph was visualized, I asked Claude 3.7 Sonnet to create a simple web app that let me edit the inferred entities while keeping the structure intact (and fix the encoding issues caused by sqlite/myself).
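Before editing entities by hand, one way to surface candidate duplicates is to group names that become identical once case, accents, and extra whitespace are ignored. The sketch below shows that idea; it is an assumption about how such a pre-check could work, not part of the web app described above, and the helper names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Lowercase a name, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def candidate_duplicates(names):
    """Group names sharing a normalized form; keep groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["Joël Robuchon", "Joel Robuchon", "joël  robuchon", "Septime"]
print(candidate_duplicates(names))
```

Each group is only a suggestion: two distinct people can share a normalized name, so the final merge decision still belongs in a manual review step.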

This project illustrates how LLMs can be used to extract information from rich sources of textual data, in this instance restaurant reviews. I am very happy with the tools I chose. A good next step is to get all this running on open models, locally or in a public cloud.