Case Study: How We Reduced Pizza Order Errors by 25% Using Swedish NLP

A busy Swedish pizzeria. A 69-item menu. Regional accents, phonetic mishears, and a peak-hour chaos problem. Here's exactly what we built and what changed.

Order accuracy

95%

was 70% — +25pp improvement

Average call time

−20%

fewer seconds per order call

Kitchen processing

+15%

faster ticket-to-prep time

Allergy incidents

Zero

was a documented concern pre-deployment

The situation

A restaurant where the phone was the problem

This pizzeria was doing well — good location, loyal regulars, solid kitchen. But the phone line was a constant source of friction. Staff spent Friday nights trying to hear orders over kitchen noise. Customers repeated themselves. Errors made it to the kitchen. They tried an off-the-shelf AI voice tool and it made things worse, not better.

AI couldn't recognize the menu

Swedish pizza names didn't match anything in the model's training data. "Capricciosa" came through as "kabbalisch ås". The AI confidently passed garbage to the kitchen.

Customers had to repeat themselves

When the AI failed to parse an order, it asked the customer to repeat. Then again. On a Friday at 18:30, most just hung up. Lost order, frustrated customer.

Staff errors on transcribed orders

The AI would produce a phonetic transcript and staff would type it into the POS manually. This introduced a second point of failure — and removed any efficiency gain.

Peak hours were genuinely chaotic

Between 17:00 and 21:00 on weekends, the phone rang continuously. Staff were split between the floor, the kitchen, and a phone that required constant attention. Orders were missed entirely.

The core problem

Off-the-shelf ASR isn't built for Swedish menus

Standard AI voice tools aren't trained for Swedish accents, business-specific menus, or regional dialects. They're trained on broadcast Swedish — not the casual, noisy, phonetically compressed way people actually order pizza over the phone. Off-the-shelf solutions failed 30% of the time on this menu alone.

What the customer said → What the system now understands

"en svepperoni, stor" → Pepperoni · large · no mods
"kabbalisch ås" → Capricciosa · medium
"en med kantarell, utan lök" → Kantarell · mod: remove onion
"halv gyros och halv kebab" → Kebab & Gyros split · half/half

What happens on every call

Seven stages from a customer saying "hej" to a kitchen ticket printing — designed around the specific failure modes we found in the audit.

Step 1: 📞 Voice call
Customer calls the restaurant's existing number. No new number required.

Step 2: 🎙️ Vapi answers
AI agent picks up in <1s. Greets in Swedish. Begins streaming audio to Whisper ASR.

Step 3: 🧠 NLP extraction
Claude parses the transcript: items, quantity, size, modifiers, allergies.

Step 4: 🔍 Fuzzy menu match
Levenshtein + Dice bigram against the 69-item menu. Phonetic alias table applied first.

Step 5: ⚙️ Modifier parsing
Regex + NLP extracts gluten-free, sauce, sliced, extra/remove toppings. Validated against schema.

Step 6: 🗄️ DB normalization
Zod-validated order written to PostgreSQL. Full audit trail. Two-pass validation runs here.

Step 7: 🎫 Kitchen ticket
Structured order pushed to kitchen display in real time. Staff never touch a phone for this order.

Steps 4 and 5 are the parts that off-the-shelf solutions skip entirely, and where 90% of errors originated.

The build

Five things we built that didn't exist off the shelf

Custom fuzzy matcher for a 69-item menu

We built a hybrid matching algorithm combining Levenshtein distance (weighted 40%) and Dice bigram similarity (weighted 60%). Pure Levenshtein penalises long menu names unfairly — Dice bigram handles phonetic similarity better for Swedish compound words. The matcher runs against every item in the menu and returns the best candidate above a confidence threshold.

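// Hybrid score: 40% Levenshtein similarity + 60% Dice bigram similarity
// Edit distance is converted to a similarity so that higher is better for both terms
levSim = 1 - levenshtein(input, candidate) / max(len(input), len(candidate))
score  = 0.4 × levSim
       + 0.6 × dice_bigram(input, candidate)
// Accept if score > 0.72 confidence threshold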
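
As a rough illustration of the scoring described here, the following TypeScript sketch combines a normalised Levenshtein similarity with a Dice bigram score and picks the best menu item above the 0.72 threshold. The function names (`hybridScore`, `bestMatch`), the conversion of edit distance into a similarity, and the three-item sample menu are assumptions made for the example, not the production code.

```typescript
// Illustrative sketch only: names and the sample menu are assumptions.

/** Lowercase and fold å/ä/ö so spelling variants compare equal. */
function normalise(s: string): string {
  return s.toLowerCase().replace(/[åä]/g, "a").replace(/ö/g, "o").trim();
}

/** Standard edit distance, computed one row at a time. */
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

/** All adjacent character pairs, e.g. "abc" gives ["ab", "bc"]. */
function bigrams(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < s.length - 1; i++) out.push(s.slice(i, i + 2));
  return out;
}

/** Dice coefficient over bigrams, in the range 0..1. */
function dice(a: string, b: string): number {
  const left = bigrams(a);
  const right = bigrams(b);
  const total = left.length + right.length;
  if (total === 0) return a === b ? 1 : 0;
  let shared = 0;
  for (const bg of left) {
    const i = right.indexOf(bg);
    if (i !== -1) {
      shared++;
      right.splice(i, 1); // consume the match so repeated bigrams count once
    }
  }
  return (2 * shared) / total;
}

/** 40% Levenshtein similarity + 60% Dice bigram similarity. */
function hybridScore(input: string, candidate: string): number {
  const a = normalise(input);
  const b = normalise(candidate);
  const maxLen = Math.max(a.length, b.length);
  const levSim = maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
  return 0.4 * levSim + 0.6 * dice(a, b);
}

/** Best-scoring menu item strictly above the threshold, or null. */
function bestMatch(input: string, menu: string[], threshold = 0.72): string | null {
  let best: string | null = null;
  let bestScore = threshold;
  for (const item of menu) {
    const s = hybridScore(input, item);
    if (s > bestScore) {
      best = item;
      bestScore = s;
    }
  }
  return best;
}

const sampleMenu = ["Pepperoni", "Capricciosa", "Vesuvio"]; // tiny stand-in for the 69 items
const closeSpelling = bestMatch("peperoni", sampleMenu);    // "Pepperoni"
const farMishear = bestMatch("kabbalisch ås", sampleMenu);  // null: nothing clears the threshold
```

Returning null rather than the nearest item is deliberate in this sketch: a low-confidence guess passed to the kitchen is the failure mode the threshold exists to prevent.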

Swedish phonetic alias table

Before fuzzy matching runs, each input token is checked against a hand-curated phonetic alias table built from the actual mishears found in the 400-call audit. This handles the systematic errors: "kabbalisch ås" shares almost no characters with "Capricciosa", so no similarity score will recover it on its own, but with the alias table it maps cleanly.

"svepperoni" → "Pepperoni"
"kabbalisch ås" → "Capricciosa"
"kantarell" → "Kantarellpizza"
"margereta" → "Margherita"
"vesuvio" → "Vesuvio" // exact, kept for coverage

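The alias-first lookup can be sketched roughly as follows. The entries are the five examples listed above; `resolveAlias` and `matchMenuItem` are hypothetical names, and the fuzzy matcher is passed in as a plain function so the sketch stays self-contained.

```typescript
// Hypothetical alias lookup; entries are the examples quoted in this section.
const PHONETIC_ALIASES: Record<string, string> = {
  "svepperoni": "Pepperoni",
  "kabbalisch ås": "Capricciosa",
  "kantarell": "Kantarellpizza",
  "margereta": "Margherita",
  "vesuvio": "Vesuvio",
};

/** Canonical menu name for a known mishear, or null if there is no alias. */
function resolveAlias(heard: string): string | null {
  // Lowercase, trim, and collapse repeated whitespace before the lookup
  const key = heard.toLowerCase().trim().replace(/\s+/g, " ");
  const hit: unknown = PHONETIC_ALIASES[key];
  return typeof hit === "string" ? hit : null;
}

/** Alias table first; fall through to the supplied fuzzy matcher otherwise. */
function matchMenuItem(
  heard: string,
  fuzzy: (input: string) => string | null
): string | null {
  const alias = resolveAlias(heard);
  return alias !== null ? alias : fuzzy(heard);
}

const viaAlias = matchMenuItem("Kabbalisch  ÅS", () => null);  // "Capricciosa"
const viaFuzzy = matchMenuItem("hawaii", () => "Hawaii");      // falls through to the matcher
```

Keeping the table as data rather than code means new mishears found in later call audits can be added without touching the matcher.
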
Modifier extraction from transcript

Getting the pizza name right is only half the problem. The other half is extracting modifications accurately, especially allergy-related ones where errors have real consequences. We built a regex + NLP pipeline that processes Swedish keywords and maps them to structured flags.

// Swedish modifier → structured flag
"glutenfri" / "glutenfritt" → { gluten_free: true }
"skivad" / "slice" → { sliced: true }
"vitlökssås" → { sauce: "garlic" }
"pirri pirri" → { sauce: "pirri pirri" }
"utan lök" / "ta bort lök" → { remove: ["onion"] }
"halv gyros och halv kebab" → { toppingMods: "split" }

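The regex half of that pipeline might look something like the sketch below. It covers only the keyword mappings listed above; the `INGREDIENTS` dictionary and the `extractModifiers` name are assumptions for illustration, and the NLP half of the pipeline is not shown.

```typescript
interface Modifiers {
  gluten_free: boolean;
  sliced: boolean;
  sauce?: string;
  remove: string[];
  toppingMods?: string;
}

// Hypothetical Swedish → English ingredient dictionary; the real vocabulary is larger.
const INGREDIENTS: Record<string, string> = {
  "lök": "onion",
  "ost": "cheese",
  "skinka": "ham",
};

function extractModifiers(transcript: string): Modifiers {
  const text = transcript.toLowerCase();
  const mods: Modifiers = { gluten_free: false, sliced: false, remove: [] };

  if (/glutenfri/.test(text)) mods.gluten_free = true;   // also matches "glutenfritt"
  if (/\b(skivad|slice)\b/.test(text)) mods.sliced = true;
  if (/vitlökssås/.test(text)) mods.sauce = "garlic";

  // "utan X" / "ta bort X" → remove: [X], translated when the word is known
  const removal = /(?:utan|ta bort)\s+([a-zåäö]+)/g;
  let m: RegExpExecArray | null;
  while ((m = removal.exec(text)) !== null) {
    const word = m[1] || "";
    const english: unknown = INGREDIENTS[word];
    mods.remove.push(typeof english === "string" ? english : word);
  }

  // "halv A och halv B" marks a half/half split
  if (/halv\s+\S+\s+och\s+halv\s+\S+/.test(text)) mods.toppingMods = "split";
  return mods;
}

const noOnion = extractModifiers("en med kantarell, utan lök");
const halfHalf = extractModifiers("halv gyros och halv kebab");
```

Unknown ingredients are kept in Swedish rather than dropped, so a removal request is never silently lost before schema validation.
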
Two-pass validation (AI + Whisper audio review)

The first pass runs immediately on the live transcript. The second pass runs asynchronously: Whisper re-processes the audio of the same call, and the two outputs are compared for discrepancies. If they disagree above a threshold, the order is flagged for staff review before it prints to the kitchen. In practice, this catches about 95% of the errors the first pass would otherwise have let through.
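
One way to picture the comparison step is the sketch below. It measures disagreement as the share of line items that appear in only one of the two passes and routes the order accordingly. The `ParsedItem` shape, the Jaccard-style measure, and the default threshold of 0 (any disagreement triggers review) are all assumptions; the actual metric and threshold are not specified here.

```typescript
interface ParsedItem {
  menuItemId: string;
  quantity: number;
  modifiers: string[];
}

type Route = "print" | "staff_review";

/** Canonical key for an item so modifier order does not matter. */
function itemKey(item: ParsedItem): string {
  return [item.menuItemId, String(item.quantity), item.modifiers.slice().sort().join("+")].join("|");
}

function unique(xs: string[]): string[] {
  return xs.filter((x, i) => xs.indexOf(x) === i);
}

/** Share of distinct items (0..1) found in only one of the two passes. */
function discrepancy(live: ParsedItem[], review: ParsedItem[]): number {
  const a = unique(live.map(itemKey));
  const b = unique(review.map(itemKey));
  const union = unique(a.concat(b));
  if (union.length === 0) return 0;
  const shared = a.filter((k) => b.indexOf(k) !== -1).length;
  return 1 - shared / union.length;
}

/** Print clean orders, hold anything above the threshold for staff. */
function routeOrder(live: ParsedItem[], review: ParsedItem[], threshold = 0): Route {
  return discrepancy(live, review) > threshold ? "staff_review" : "print";
}

const livePass: ParsedItem[] = [{ menuItemId: "12", quantity: 1, modifiers: ["gluten_free"] }];
const reviewPass: ParsedItem[] = [{ menuItemId: "12", quantity: 1, modifiers: [] }];
const decision = routeOrder(livePass, reviewPass); // "staff_review": the passes disagree on gluten_free
```

A strict default suits allergy-related flags, where a missed modifier is costlier than a brief staff check.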

Real-time kitchen integration

Every previous system had a human in the loop between phone and kitchen. Our integration pushes validated orders directly to the kitchen display via webhook — zero manual re-entry, zero transcription lag. The kitchen sees the order within seconds of the customer finishing the call, with the same structure every time.

Before → After

Measured over a 60-day period post-deployment against a 60-day baseline period pre-deployment.

Order accuracy: 70% → 95% (+25pp)
Avg call time: baseline → −20% (shorter)
Kitchen speed: baseline → +15% (faster)
Allergy incidents: concern → zero incidents
Staff morale: stressful → noticeably improved (qualitative)

"Staff morale" is qualitative — reported by the owner in a post-deployment review. The phrase used was: "the team doesn't dread the phone anymore."

Under the hood

⚙️

Architecture & implementation details

Stack, algorithms, schema, and database design

Stack

Vapi.ai — voice answering, streaming audio, call management
Claude — NLP order extraction from transcript
Whisper — second-pass audio review for validation
Custom fuzzy matcher — Levenshtein + Dice bigram hybrid
PostgreSQL on Railway — normalized order storage
Zod — schema validation for type-safe order objects
Webhook → Kitchen Display System (KDS)

Menu structure

69 items across 4 categories: Ordinarie, Mellan, Special, Kebab & Gyros
Each item: name, id, price, allowed modifiers, allergen flags
Phonetic alias table: 40+ entries built from call audit
Modifier vocabulary: ~80 Swedish phrases mapped to 12 structured flags
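
To make the item structure concrete, here is a small TypeScript sketch of how one menu entry could be typed. The field names, the sample values, and the two helper functions are hypothetical; only the list of fields (name, id, price, allowed modifiers, allergen flags) and the four categories come from the description above.

```typescript
type Category = "Ordinarie" | "Mellan" | "Special" | "Kebab & Gyros";

interface MenuItem {
  id: string;
  name: string;
  price: number;              // currency unit is an assumption (SEK)
  category: Category;
  allowedModifiers: string[]; // subset of the structured modifier flags
  allergens: string[];
}

/** True if the modifier flag may be applied to this item. */
function isModifierAllowed(item: MenuItem, flag: string): boolean {
  return item.allowedModifiers.indexOf(flag) !== -1;
}

/** Allergens on the item that the caller has declared a problem with. */
function conflictingAllergens(item: MenuItem, declared: string[]): string[] {
  return item.allergens.filter((a) => declared.indexOf(a) !== -1);
}

// Placeholder entry: id, price, and allergen values are made up for the example.
const sampleItem: MenuItem = {
  id: "example-01",
  name: "Capricciosa",
  price: 100,
  category: "Ordinarie",
  allowedModifiers: ["gluten_free", "sliced", "remove"],
  allergens: ["gluten", "milk"],
};

const canSlice = isModifierAllowed(sampleItem, "sliced");
const conflicts = conflictingAllergens(sampleItem, ["milk", "nuts"]);
```

Storing allowed modifiers per item lets validation reject a combination the kitchen cannot make before it reaches the ticket.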

Fuzzy matching algorithm

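// Normalise both strings first
const normalise = (s) =>
  s.toLowerCase().replace(/[åä]/g, 'a').replace(/ö/g, 'o').trim()

// Character bigrams of a string, e.g. "abc" gives ["ab", "bc"]
const getBigrams = (s) =>
  Array.from({ length: Math.max(s.length - 1, 0) }, (_, i) => s.slice(i, i + 2))

// Dice bigram similarity in the range 0..1
function dice(a, b) {
  const bigramsA = getBigrams(a)
  const bigramsB = getBigrams(b)
  const total = bigramsA.length + bigramsB.length
  if (total === 0) return a === b ? 1 : 0
  let intersection = 0
  for (const bg of bigramsA) {
    const i = bigramsB.indexOf(bg)
    if (i !== -1) { intersection++; bigramsB.splice(i, 1) }
  }
  // The denominator counts bigrams, not characters
  return (2 * intersection) / total
}

// Final score. levenshtein(a, b) is the standard edit distance (not shown),
// converted to a similarity so that higher is better for both terms
const levSim = (a, b) =>
  1 - levenshtein(a, b) / Math.max(a.length, b.length, 1)
const score = (a, b) => 0.4 * levSim(a, b) + 0.6 * dice(a, b)
const threshold = 0.72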

Order schema (Zod)

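import { z } from 'zod'

// Shape of one validated phone order
const ChilliOrder = z.object({
  items: z.array(z.object({
    menuItemId: z.string(),
    name:       z.string(),
    quantity:   z.number().int().positive(),  // whole pizzas only
    size:       z.enum(['small','medium','large']),
    modifiers: z.object({
      gluten_free: z.boolean().default(false),
      sliced:      z.boolean().default(false),
      sauce:       z.string().optional(),
      remove:      z.array(z.string()).default([]),  // ingredients to leave off
      toppingMods: z.string().optional(),
    })
  })),
  orderType: z.enum(['pickup','delivery']),
  confidence: z.number().min(0).max(1),
})

// Static type derived from the schema
type ChilliOrder = z.infer<typeof ChilliOrder>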

Two-pass validation flow

Pass 1: Live (synchronous)
Claude extracts structured order from real-time transcript during the call. Order confirmed verbally with customer. Provisional kitchen ticket queued.

Pass 2: Whisper review (async, <8s)
Whisper re-processes the full call audio independently. Output compared with Pass 1. Discrepancy above threshold flags the order for staff review before final print.

Kitchen ticket confirmed
Clean orders print automatically. Flagged orders pause at the POS station for a 10-second staff confirmation. Catches ~95% of remaining errors.

Five things this deployment taught us about Swedish voice AI

Swedish accents require specific training — general ASR models won't do

Broadcast-quality Swedish and phone-quality colloquial Swedish are very different inputs. Models trained on the former consistently fail on the latter, especially under background noise. You need a model or a post-processing layer that accounts for how people actually talk in context.

Menu familiarity is non-negotiable — fuzzy matching saves the day

No general-purpose LLM knows your 69-item pizza menu. The model doesn't need to be perfect at transcription if your matching layer is good enough to recover from imperfect input. In our experience, a well-tuned fuzzy matcher against a curated menu can outperform a better ASR model that has no post-processing.

Modifier extraction is the hard part — not the pizza names

Everyone focuses on getting the item name right. In practice, modifier parsing is where errors with real consequences occur. Allergy information, sauce preferences, and removal requests are communicated in highly variable Swedish phrasing. Getting this right requires a purpose-built extraction layer, not a generic one.

Two-pass validation (AI + Whisper) catches 95% of what slips through

A single extraction pass on a noisy call transcript will always have some error rate. Running a second pass on the audio — independently, with a different model — and comparing outputs is a reliable way to catch the cases where the first pass was wrong. The 8-second async window is acceptable; most kitchens don't need the ticket faster than that.

Real-time kitchen integration creates an immediate feedback loop

When orders go directly to the kitchen without human re-entry, errors become immediately visible and correctable. Staff stop being transcription operators. The kitchen gets consistent, structured input. This operational change has downstream effects on speed, morale, and error rates that are hard to fully attribute to any single part of the system.

The same problems exist across every vertical that takes calls.

The specific challenge here was Swedish pizza names. But the underlying pattern — generic AI failing on domain-specific language — shows up everywhere.

This work is likely relevant if your business deals with any of these:

Swedish language AI that fails on accents, dialects, or your specific vocabulary
Complex order or appointment processing where errors have consequences
Staff spending significant time on repetitive, structured phone calls
Missed opportunities during peak hours because the phone isn't answered
Manual data entry between phone and your internal system (POS, CRM, booking system)
Allergy, safety, or compliance information that must be captured accurately

Our solution for restaurants

📱 Simple website: your menu online, visible on Google, direct ordering
📞 AI phone assistant: answers around the clock, takes bookings and orders
💰 From 500 kr/month: 1 month free trial, no setup fees