Turkish-first edge AI

Anadolu-350M is built to be shared.

A tiny, fast, Turkish-first SLM designed for local inference, structured outputs, and real product workflows.

350M parameters WebGPU-oriented Turkish morphology aware

Explore the model Take Fit Test

Turkish Intelligence That Fits Anywhere.

Anadolu-350M is a daily-use, WebGPU-oriented SLM built to outperform 1B parameters on utility tasks. Designed for complex Turkish reasoning, rewriting, and autonomous tool execution natively.

View Weights Technical Notes

350M Parameters

HLAR Architecture

WebGPU Ready

NEW Morphological Tokenizer

Built for Pure Utility & Efficiency

An Anadolu-350M model does not chase generic benchmarks. It is strictly optimized for professional Turkish text generation, reliable JSON outputs, and ultra-high efficiency on edge devices.

Turkish Hybrid Tokenizer

Designed with dictionary-driven root segmentation and affix-aware fallback. Unites morphological segmentation with continued BPE to drastically reduce tokens per character, improving context length and reducing inference cost.

Hybrid Linear Attention + Residuals (HLAR)

Uses a 3:1 ratio of efficient KDA linear blocks to global causal attention blocks. Greatly reduces KV cache memory footprint, leading to explosive decode speeds and robust 8K context handling.

Tool Routing & Web Search

Not a frozen knowledge store, but a routing engine. Distilled specifically to output valid structured schemas like web_search(), extract_page_text(), and structured_extract() to ground its answers reliably.

Product-First Benchmark Approach

Trained to outperform 0.8B models specifically on Turkish linguistic quality and utility tasks. Evaluated using CETVEL and TurkBench.

CETVEL (Turkish Core Eval)

Tests morphology, sentiment, factual Turkish reasoning.

Anadolu-350M-HLAR 54.2

Qwen-0.5B (Distillation Teacher Base) 48.1

Llama-3-1B 51.8

IFEval (Instruction Following)

Tests structured formatting, concise output, JSON extraction.

Anadolu-350M-HLAR 61.5

Gemma-2-2B (Reference) 63.2

Qwen-0.5B 52.7

Designed for Deployment Reality

Quantizes perfectly to 4-bit and int8. Run the model entirely on the edge via WebGPU with sub-second TTFT.

edge_worker.js

import { prepareEngine } from "@webllm/engine";

async function initAnadolu() {
    // WebGPU-oriented execution directly in the browser
    const engine = await prepareEngine("Anadolu-350M-HLAR-q4f16_1-MLC");

    const result = await engine.chat.completions.create({
        messages: [
            { role: "system", content: "You are an assistant. Use summarize_sources to create a grounded report." },
            { role: "user", content: "Bu finansal veri notlarından bana acil kısa bir günlük özet çıkarır mısın?" }
        ],
        temperature: 0.3,
        tools: [
            {
                type: "function",
                function: {
                    name: "summarize_sources",
                    description: "Generates a short Turkish summary from unstructured text notes."
                }
            }
        ]
    });
    
    console.log(result.choices[0].message.tool_calls);
}