A product by TellAR
Real deployment · Swedish NLP · Restaurant · 2025

How We Reduced Pizza Order Errors
by 25 Percentage Points Using Swedish NLP

A busy Swedish pizzeria. A 69-item menu. Regional accents, phonetic mishears, and a peak-hour chaos problem. Here's exactly what we built and what changed.

  • Order accuracy: 95% (was 70%, a +25pp improvement)
  • Average call time: −20% (shorter order calls)
  • Kitchen processing: +15% (faster ticket-to-prep time)
  • Allergy incidents: 0 (a documented concern pre-deployment)

The situation

A restaurant where the phone was the problem

This pizzeria was doing well — good location, loyal regulars, solid kitchen. But the phone line was a constant source of friction. Staff spent Friday nights trying to hear orders over kitchen noise. Customers repeated themselves. Errors made it to the kitchen. They tried an off-the-shelf AI voice tool and it made things worse, not better.

AI couldn't recognize the menu

Swedish pizza names didn't match anything in the model's training data. "Capricciosa" came through as "kabbalisch ås". The AI confidently passed garbage to the kitchen.

Customers had to repeat themselves

When the AI failed to parse an order, it asked the customer to repeat. Then again. On a Friday at 18:30, most just hung up. Lost order, frustrated customer.

Staff errors on transcribed orders

The AI would produce a phonetic transcript and staff would type it into the POS manually. This introduced a second point of failure — and removed any efficiency gain.

Peak hours were genuinely chaotic

Between 17:00 and 21:00 on weekends, the phone rang continuously. Staff were split between the floor, the kitchen, and a phone that required constant attention. Orders were missed entirely.

The core problem

Off-the-shelf ASR isn't built
for Swedish menus

Standard AI voice tools aren't trained for Swedish accents, business-specific menus, or regional dialects. They're trained on broadcast Swedish — not the casual, noisy, phonetically compressed way people actually order pizza over the phone. Off-the-shelf solutions failed 30% of the time on this menu alone.

— SolutionOps assessment after auditing 400 call recordings

What the customer said → What the system now understands

  • "en svepperoni, stor" → Pepperoni · large · no mods
  • "kabbalisch ås" → Capricciosa · medium
  • "en med kantarell, utan lök" → Kantarell · mod: remove onion
  • "halv gyros och halv kebab" → Kebab & Gyros split · half/half

The architecture

What happens on every call

Seven stages from a customer saying "hej" to a kitchen ticket printing — designed around the specific failure modes we found in the audit.

Step 1
📞

Voice call

Customer calls the restaurant's existing number. No new number required.

Step 2
🎙️

Vapi answers

AI agent picks up in <1s. Greets in Swedish. Begins streaming audio to Whisper ASR.

Step 3
🧠

NLP extraction

Claude parses the transcript: items, quantity, size, modifiers, allergies.

Step 4
🔍

Fuzzy menu match

Levenshtein + Dice bigram against the 69-item menu. Phonetic alias table applied first.

Step 5
⚙️

Modifier parsing

Regex + NLP extracts gluten-free, sauce, sliced, extra/remove toppings. Validated against schema.

Step 6
🗄️

DB normalization

Zod-validated order written to PostgreSQL. Full audit trail. Two-pass validation runs here.

Step 7
🎫

Kitchen ticket

Structured order pushed to kitchen display in real time. Staff never touch a phone for this order.

Steps 4 and 5 are the parts that off-the-shelf solutions skip entirely, and where 90% of errors originated.
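
The ordering in Step 4 (alias table first, fuzzy matcher second) can be sketched in TypeScript as below. This is a minimal illustration, not the production code: `PHONETIC_ALIASES`, `resolveItem`, and the `Resolution` type are hypothetical names, the fuzzy matcher is passed in as a plain function, and the alias entries are mishears quoted in this case study.

```typescript
// Hypothetical alias table: lowercased mishear -> canonical menu name.
const PHONETIC_ALIASES = new Map<string, string>([
  ["svepperoni", "Pepperoni"],
  ["kabbalisch ås", "Capricciosa"],
  ["margereta", "Margherita"],
]);

type Resolution =
  | { source: "alias"; name: string }
  | { source: "fuzzy"; name: string }
  | { source: "none" };

/**
 * Resolve a heard phrase to a menu item.
 * The alias table is consulted first; the fuzzy matcher only runs on a miss.
 */
function resolveItem(
  heard: string,
  fuzzyMatch: (phrase: string) => string | null,
): Resolution {
  const key = heard.trim().toLowerCase();

  const alias = PHONETIC_ALIASES.get(key);
  if (alias !== undefined) return { source: "alias", name: alias };

  const fuzzy = fuzzyMatch(key);
  return fuzzy !== null ? { source: "fuzzy", name: fuzzy } : { source: "none" };
}
```

Recording the `source` of each resolution is a design choice of this sketch: it lets a later audit tell alias hits apart from fuzzy hits when deciding which mishears deserve a new alias entry.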

The build

Five things we built
that didn't exist off the shelf

1

Custom fuzzy matcher for a 69-item menu

We built a hybrid matching algorithm combining normalised Levenshtein similarity (weighted 40%) and Dice bigram similarity (weighted 60%). Raw Levenshtein distance penalises long menu names unfairly, and Dice bigram handles phonetic similarity better for Swedish compound words. The matcher scores every item in the menu and returns the best candidate above a confidence threshold of 0.72.

// Hybrid score: 40% Levenshtein similarity + 60% Dice bigram
// lev_sim = 1 − levenshtein(input, candidate) / max(|input|, |candidate|)
score = 0.4 × lev_sim(input, candidate)
      + 0.6 × dice_bigram(input, candidate)
// Accept if score > 0.72 confidence threshold

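
A runnable TypeScript sketch of this scoring follows. It is an illustration under stated assumptions rather than the deployed matcher: the Levenshtein term is assumed to be a similarity rescaled to [0, 1] (one minus distance over the longer length), Dice is computed over bigram multisets, and all function names and the three-item sample menu are hypothetical.

```typescript
/** Lowercase and fold Swedish vowels so "ås" and "as" compare equal. */
function normalise(s: string): string {
  return s.toLowerCase().replace(/[åä]/g, "a").replace(/ö/g, "o").trim();
}

/** Classic edit distance (insert, delete, substitute), computed row by row. */
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Edit distance rescaled to a similarity in [0, 1]; 1 means identical. */
function levenshteinSimilarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

function bigrams(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < s.length - 1; i++) out.push(s.slice(i, i + 2));
  return out;
}

/** Dice coefficient over bigram multisets: 2 * shared / (count(a) + count(b)). */
function diceBigram(a: string, b: string): number {
  const left = bigrams(a);
  const right = bigrams(b);
  if (left.length + right.length === 0) return a === b ? 1 : 0;
  const remaining = new Map<string, number>();
  for (const g of left) remaining.set(g, (remaining.get(g) ?? 0) + 1);
  let shared = 0;
  for (const g of right) {
    const n = remaining.get(g) ?? 0;
    if (n > 0) {
      shared++;
      remaining.set(g, n - 1);
    }
  }
  return (2 * shared) / (left.length + right.length);
}

function hybridScore(input: string, candidate: string): number {
  const a = normalise(input);
  const b = normalise(candidate);
  return 0.4 * levenshteinSimilarity(a, b) + 0.6 * diceBigram(a, b);
}

/** Best menu item strictly above the threshold, or null if nothing qualifies. */
function bestMatch(
  input: string,
  menu: string[],
  threshold = 0.72,
): { name: string; score: number } | null {
  let best: { name: string; score: number } | null = null;
  for (const name of menu) {
    const score = hybridScore(input, name);
    if (score > threshold && (best === null || score > best.score)) {
      best = { name, score };
    }
  }
  return best;
}
```

With a sample menu of `["Margherita", "Vesuvio", "Capricciosa"]`, a near miss such as `bestMatch("caprichosa", menu)` resolves to Capricciosa, while `bestMatch("kabbalisch ås", menu)` returns `null` because the two strings share no bigrams. Inputs like the second one are the reason a matcher of this kind needs an alias layer in front of it.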
2

Swedish phonetic alias table

Before fuzzy matching runs, each input token is checked against a hand-curated phonetic alias table built from the actual mishears found in the 400-call audit. This handles the systematic errors: "kabbalisch ås" shares almost no characters with "Capricciosa", so it will never fuzzy-match without a hint, but with the alias table it maps cleanly.

"svepperoni" → "Pepperoni"
"kabbalisch ås" → "Capricciosa"
"kantarell" → "Kantarellpizza"
"margereta" → "Margherita"
"vesuvio" → "Vesuvio" // exact, kept for coverage

3

Modifier extraction from transcript

Getting the pizza name right is only half the problem. The other half is extracting modifications accurately, especially allergy-related ones where errors have real consequences. We built a regex + NLP pipeline that processes Swedish keywords and maps them to structured flags.

// Swedish modifier → structured flag
"glutenfri" / "glutenfritt" → { gluten_free: true }
"skivad" / "slice" → { sliced: true }
"pirri pirri" / "vitlökssås" → { sauce: "garlic" }
"utan lök" / "ta bort lök" → { remove: ["onion"] }
"halv gyros och halv kebab" → { toppingMods: "split" }

4

Two-pass validation (AI + Whisper audio review)

The first pass runs immediately on the live transcript. The second pass runs asynchronously: Whisper re-transcribes the full call audio, and the resulting order is compared with the first-pass output. If the two disagree above a threshold, the order is flagged for staff review before it prints to the kitchen. In practice, this catches about 95% of the errors the first pass would otherwise have let through.
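
One way to picture the comparison step is the TypeScript sketch below. The case study does not specify how disagreement is measured, so the metric here (Jaccard distance over order lines) is an assumption, as are the names `OrderLine`, `lineKey`, `discrepancy`, and `needsStaffReview`.

```typescript
interface OrderLine {
  menuItemId: string;
  quantity: number;
  size: string;
  modifiers: string[]; // e.g. ["gluten_free", "remove:onion"]
}

/** Canonical string per line, so modifier order does not affect equality. */
function lineKey(line: OrderLine): string {
  const mods = line.modifiers.slice().sort().join("+");
  return [line.menuItemId, line.quantity, line.size, mods].join("|");
}

/**
 * Jaccard distance between the two passes:
 * 0 means every line agrees, 1 means no line agrees.
 */
function discrepancy(pass1: OrderLine[], pass2: OrderLine[]): number {
  const a = new Set(pass1.map(lineKey));
  const b = new Set(pass2.map(lineKey));

  const union = new Set<string>();
  a.forEach((k) => union.add(k));
  b.forEach((k) => union.add(k));
  if (union.size === 0) return 0;

  let shared = 0;
  a.forEach((k) => {
    if (b.has(k)) shared++;
  });
  return 1 - shared / union.size;
}

/** Hold the ticket for a human when the passes disagree above the threshold. */
function needsStaffReview(
  pass1: OrderLine[],
  pass2: OrderLine[],
  threshold: number,
): boolean {
  return discrepancy(pass1, pass2) > threshold;
}
```

Because a modifier is part of the line key, a single-item order where only one pass heard "glutenfri" scores a discrepancy of 1 and is held for review at any threshold below 1. For allergy-related flags that strictness is probably the behaviour you want.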

5

Real-time kitchen integration

Every previous system had a human in the loop between phone and kitchen. Our integration pushes validated orders directly to the kitchen display via webhook — zero manual re-entry, zero transcription lag. The kitchen sees the order within seconds of the customer finishing the call, with the same structure every time.
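
As a rough illustration of the webhook push, the TypeScript sketch below builds the HTTP request for a validated order. The payload shape, the `KitchenTicket` type, the `buildKdsRequest` helper, and the endpoint URL are all hypothetical; the real kitchen display system will define its own contract.

```typescript
interface KitchenTicket {
  orderId: string;
  items: { name: string; quantity: number; notes: string[] }[];
}

interface KdsRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

/** Build the webhook request; sending it is left to the caller. */
function buildKdsRequest(ticket: KitchenTicket, endpoint: string): KdsRequest {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(ticket),
    },
  };
}

// Usage (placeholder endpoint, not a real service):
//   const { url, init } = buildKdsRequest(ticket, "https://kds.example.invalid/tickets");
//   await fetch(url, init);
```

Keeping request construction separate from the network call makes the payload easy to unit-test without a live kitchen display.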

The results

Before → After

Measured over a 60-day period post-deployment against a 60-day pre-deployment baseline.

  • Order accuracy: 70% → 95% (+25pp)
  • Avg call time: baseline → −20% (shorter)
  • Kitchen speed: baseline → +15% (faster)
  • Allergy incidents: concern → 0 (zero incidents)
  • Staff morale: stressful → noticeably improved (qualitative)

"Staff morale" is qualitative — reported by the owner in a post-deployment review. The phrase used was: "the team doesn't dread the phone anymore."

For technical teams

Under the hood

⚙️

Architecture & implementation details

Stack, algorithms, schema, and database design.

Stack

  • Vapi.ai — voice answering, streaming audio, call management
  • Claude — NLP order extraction from transcript
  • Whisper — second-pass audio review for validation
  • Custom fuzzy matcher — Levenshtein + Dice bigram hybrid
  • PostgreSQL on Railway — normalized order storage
  • Zod — schema validation for type-safe order objects
  • Webhook → Kitchen Display System (KDS)

Menu structure

  • 69 items across 4 categories: Ordinarie, Mellan, Special, Kebab & Gyros
  • Each item: name, id, price, allowed modifiers, allergen flags
  • Phonetic alias table: 40+ entries built from call audit
  • Modifier vocabulary: ~80 Swedish phrases mapped to 12 structured flags
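
The menu structure and modifier vocabulary above could be modelled roughly as follows. This TypeScript sketch is illustrative only: the type and function names are assumptions, just five of the twelve flags are shown (the ones named in this case study), the ingredient map is a tiny made-up subset, and the regexes cover only a handful of the ~80 phrases.

```typescript
type Category = "Ordinarie" | "Mellan" | "Special" | "Kebab & Gyros";
type ModifierFlag = "gluten_free" | "sliced" | "sauce" | "remove" | "toppingMods";

interface MenuItem {
  id: string;
  name: string;
  category: Category;
  price: number;
  allowedModifiers: ModifierFlag[];
  allergens: string[];
}

interface Modifiers {
  gluten_free: boolean;
  sliced: boolean;
  sauce?: string;
  remove: string[];
  toppingMods?: string;
}

// Swedish ingredient word -> key used in `remove` (illustrative subset).
const INGREDIENTS = new Map<string, string>([
  ["lök", "onion"],
  ["ost", "cheese"],
  ["svamp", "mushroom"],
]);

/** Map Swedish modifier phrases in a transcript to structured flags. */
function extractModifiers(transcript: string): Modifiers {
  const text = transcript.toLowerCase();
  const mods: Modifiers = { gluten_free: false, sliced: false, remove: [] };

  if (/glutenfri(tt)?/.test(text)) mods.gluten_free = true;
  if (/skivad|slice/.test(text)) mods.sliced = true;
  if (/vitlökssås/.test(text)) mods.sauce = "garlic";
  if (/halv\s+[a-zåäö]+\s+och\s+halv\s+[a-zåäö]+/.test(text)) mods.toppingMods = "split";

  // "utan X" / "ta bort X" -> remove X, when X is a known ingredient.
  const removal = /(?:utan|ta bort)\s+([a-zåäö]+)/g;
  let m: RegExpExecArray | null;
  while ((m = removal.exec(text)) !== null) {
    const ingredient = INGREDIENTS.get(m[1]);
    if (ingredient !== undefined && !mods.remove.includes(ingredient)) {
      mods.remove.push(ingredient);
    }
  }
  return mods;
}

/** Flags used in `mods` that this menu item does not allow. */
function disallowedFlags(item: MenuItem, mods: Modifiers): ModifierFlag[] {
  const used: ModifierFlag[] = [];
  if (mods.gluten_free) used.push("gluten_free");
  if (mods.sliced) used.push("sliced");
  if (mods.sauce !== undefined) used.push("sauce");
  if (mods.remove.length > 0) used.push("remove");
  if (mods.toppingMods !== undefined) used.push("toppingMods");
  return used.filter((flag) => !item.allowedModifiers.includes(flag));
}
```

For example, `extractModifiers("en med kantarell, utan lök")` yields a `remove` list containing `"onion"`. Checking the extracted flags against each item's `allowedModifiers` gives a cheap guard against modifiers that make no sense for the item, before the order reaches schema validation.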

Fuzzy matching algorithm

// Normalise both strings first
normalise(s) = s
  .toLowerCase()
  .replace(/[åä]/g, 'a')
  .replace(/ö/g, 'o')
  .trim()

// Dice bigram similarity (counts bigrams, not characters)
function dice(a, b) {
  bigrams_a = getBigrams(a)
  bigrams_b = getBigrams(b)
  intersection = |bigrams_a ∩ bigrams_b|
  return (2 × intersection) / (|bigrams_a| + |bigrams_b|)
}

// Levenshtein distance rescaled to a similarity in [0, 1]
lev_sim(a, b) = 1 − levenshtein(a, b) / max(|a|, |b|)

// Final score
score = 0.4 × lev_sim(a, b) + 0.6 × dice(a, b)
threshold = 0.72

Order schema (Zod)

import { z } from 'zod'

const ChilliOrder = z.object({
  items: z.array(z.object({
    menuItemId: z.string(),
    name: z.string(),
    quantity: z.number(),
    size: z.enum(['small', 'medium', 'large']),
    modifiers: z.object({
      gluten_free: z.boolean().default(false),
      sliced: z.boolean().default(false),
      sauce: z.string().optional(),
      remove: z.array(z.string()),
      toppingMods: z.string().optional(),
    }),
  })),
  orderType: z.enum(['pickup', 'delivery']),
  confidence: z.number().min(0).max(1),
})

Two-pass validation flow

Pass 1 — Live (synchronous)

Claude extracts structured order from real-time transcript during the call. Order confirmed verbally with customer. Provisional kitchen ticket queued.

Pass 2 — Whisper review (async, <8s)

Whisper re-processes the full call audio independently. Output compared with Pass 1. Discrepancy above threshold flags the order for staff review before final print.

Kitchen ticket confirmed

Clean orders print automatically. Flagged orders pause at the POS station for a 10-second staff confirmation. This catches ~95% of remaining errors.

What we learned

Five things this deployment
taught us about Swedish voice AI

01

Swedish accents require specific training — general ASR models won't do

Broadcast-quality Swedish and phone-quality colloquial Swedish are very different inputs. Models trained on the former consistently fail on the latter, especially under background noise. You need a model or a post-processing layer that accounts for how people actually talk in context.

02

Menu familiarity is non-negotiable — fuzzy matching saves the day

No general-purpose LLM knows your 69-item pizza menu. The model doesn't need to be perfect at transcription if your matching layer is good enough to recover from imperfect input. In our experience, a well-tuned fuzzy matcher against a curated menu outperforms a better ASR model with no post-processing.

03

Modifier extraction is the hard part — not the pizza names

Everyone focuses on getting the item name right. In practice, modifier parsing is where errors with real consequences occur. Allergy information, sauce preferences, and removal requests are communicated in highly variable Swedish phrasing. Getting this right requires a purpose-built extraction layer, not a generic one.

04

Two-pass validation (AI + Whisper) catches 95% of what slips through

A single extraction pass on a noisy call transcript will always have some error rate. Running a second pass on the audio — independently, with a different model — and comparing outputs is a reliable way to catch the cases where the first pass was wrong. The 8-second async window is acceptable; most kitchens don't need the ticket faster than that.

05

Real-time kitchen integration creates an immediate feedback loop

When orders go directly to the kitchen without human re-entry, errors become immediately visible and correctable. Staff stop being transcription operators. The kitchen gets consistent, structured input. This operational change has downstream effects on speed, morale, and error rates that are hard to fully attribute to any single part of the system.

Could this help your business?

The same problems exist
across every vertical that takes calls.

The specific challenge here was Swedish pizza names. But the underlying pattern — generic AI failing on domain-specific language — shows up everywhere.

This work is likely relevant if your business deals with any of these:

  • Swedish language AI that fails on accents, dialects, or your specific vocabulary
  • Complex order or appointment processing where errors have consequences
  • Staff spending significant time on repetitive, structured phone calls
  • Missed opportunities during peak hours because the phone isn't answered
  • Manual data entry between phone and your internal system (POS, CRM, booking system)
  • Allergy, safety, or compliance information that must be captured accurately
Start a conversation

Ready to see what we'd build
for your business?

30 minutes. No pitch deck. We'll listen to your current setup and tell you honestly whether this is the right fit — and what the build would look like.