Every product team wants AI in the app this quarter. The usual path is to bolt on a cloud LLM, and that path comes with a tax: API keys to rotate and protect, a per-token bill that grows with every active user, network latency that makes the feature feel sluggish, and a privacy review because you are now sending user content to a third party. For a lot of useful features, that overhead is out of proportion with the value.

Apple's Foundation Models framework changes the calculation. It is a native Swift API to the same on-device large language model that powers Apple Intelligence. It runs locally, works offline, keeps user content on the device, and costs nothing per request. There is no API key, no usage dashboard, and no bill at the end of the month. For summarization, classification, extraction, and short structured generation, that is often everything you need.

This guide is a hands-on build walkthrough. We cover what the framework is, when on-device beats cloud, how to architect an AI feature cleanly, and the core APIs with realistic Swift: opening a LanguageModelSession, guided generation with @Generable, streaming responses, and tool calling. We close with the production concerns that separate a demo from a shipped feature, including Swift 6.2 concurrency, fallback handling, and treating model output as untrusted.

📋 Table of Contents

1.What the Foundation Models Framework Is
2.On-Device vs Cloud LLM: When to Use Which
3.Architecture of an AI Feature
4.Setting Up: A Basic Session
5.Guided Generation with @Generable
6.Streaming Responses
7.Tool Calling: Letting the Model Use Your App
8.Production Concerns
9.Why Lushbinary for Your AI iOS App

1What the Foundation Models Framework Is

The Foundation Models framework is a native Swift API to the on-device large language model that sits behind Apple Intelligence. Apple introduced it at WWDC 2025 alongside iOS 26 (September 2025), giving developers direct, first-party access to a model that ships with the operating system. You are not calling out to a server. You are talking to a model that already lives on the user's device.

The properties that matter for product decisions are straightforward: it is free with no per-token cost, it runs on-device so user content never leaves the phone, it is private by construction, and it works offline with no network round trip. That combination is hard to match with any cloud provider, because the cloud always adds a key, a bill, a latency budget, and a data-handling question.

The framework is available on iOS, iPadOS, macOS, and visionOS 26 and later, on Apple-silicon-class devices that support Apple Intelligence. The core surface area is compact and well-designed:

SystemLanguageModel: represents the on-device model and tells you whether it is available on this device and in this region.
LanguageModelSession: the object you create to send prompts and receive responses, with optional instructions and tools.
@Generable and @Guide: guided generation macros that let the model produce typed Swift values directly.
streamResponse: an async sequence of partial results so you can update the UI as the model produces tokens.
Tool calling: a protocol that lets the model call functions you provide, so it can fetch live app data during a response.

At WWDC 2026 Apple extended the framework with image input, optional server-model support for heavier tasks, and custom skills. The on-device core stayed the same, which means the patterns in this guide remain the foundation even as the surface grows. If you are tracking the broader platform direction, our WWDC 2026 announcements and iOS 27 developer guide covers what changed and what it means for app teams.

2On-Device vs Cloud LLM: When to Use Which

The on-device model is compact and fast, not a frontier reasoning engine. Picking the right tool for each feature is the single most important design decision you will make, because it determines your cost structure, your privacy posture, and how the feature feels to use.

Dimension	On-Device (Foundation Models)	Cloud LLM
Cost	Free, no per-token bill	Per-token, scales with usage
Privacy	Content stays on device	Content sent to a third party
Offline	Works with no network	Requires a connection
Latency	No network round trip	Network plus inference time
Best at	Summarize, classify, extract, short generation	Frontier reasoning, very large context
Context window	Compact, suited to focused tasks	Large, suited to long documents

Reach for the on-device model when the task is bounded and the value is in privacy, offline support, or zero cost. Summarizing a note, tagging an email by intent, pulling structured fields out of a receipt, drafting a short reply, or classifying user input all fit beautifully. These run instantly, cost nothing, and never expose user content.

A useful test is to ask three questions about the feature. Does the input contain anything the user would not want sent to a server? Does the feature need to work on a plane or in a dead zone? Will the feature run often enough that a per-token bill would compound into a real cost? If you answer yes to any of them, the on-device model is probably the right default, and you only escalate to the cloud for the subset of requests that genuinely exceed the compact model's reach.

Reach for a cloud LLM when you genuinely need frontier reasoning, a huge context window, or up-to-date world knowledge the compact model does not carry. The mature pattern in production is hybrid: handle the common path on-device and escalate the rare heavy task to the cloud. If you are weighing the budget impact of each approach, our iOS app development cost in the AI era guide breaks down where on-device inference removes a recurring line item.

3Architecture of an AI Feature

A clean AI feature keeps the model work isolated from the UI. The SwiftUI view stays declarative, a view model coordinates the request on a background actor, and the LanguageModelSession talks to the on-device model. Structured output flows back as a typed @Generable value, and when the model needs live data it branches into your tools. The diagram below shows the flow.

The key idea is separation. The view never touches the model directly, the actor guarantees the work runs off the main thread, and the session is the only thing that knows how to talk to the model. That structure makes the feature testable, keeps the UI responsive, and lets you swap the on-device path for a cloud path later without rewriting the interface.

4Setting Up: A Basic Session

Start by importing the framework and checking availability. Not every device and region has the model, so SystemLanguageModel.default exposes an availability state you must inspect before you create a session. Once you confirm the model is available, you create a LanguageModelSession and call respond(to:) with async/await.

// Basic availability check and a single response

import FoundationModels

func summarize(_ text: String) async throws -> String {
    let model = SystemLanguageModel.default

    // Not all devices and regions support the model.
    switch model.availability {
    case .available:
        let session = LanguageModelSession(
            instructions: "You write concise, neutral summaries."
        )
        let response = try await session.respond(
            to: "Summarize the following note in two sentences:\n\n\(text)"
        )
        return response.content

    case .unavailable(let reason):
        throw SummaryError.modelUnavailable(reason)
    }
}

A few things to note. The instructions argument sets durable guidance for the session, separate from the per-request prompt, which keeps your behavior steering out of the user content. The respond(to:) call is asynchronous and returns a response whose content holds the text. And the availability switch is not optional: shipping without it means the feature crashes or silently fails on devices that do not have the model.

Treat the .unavailable case as a first-class path, not an edge case. We come back to fallback strategy in the production section, because how you degrade when the model is missing is part of the feature, not an afterthought.

One more habit worth forming early: keep the prompt and the instructions distinct. Instructions describe the role and the rules you want to hold for the whole session, while the prompt carries the specific request and any user content. Mixing the two makes behavior harder to reason about and, as we will see in the production section, weakens your defense against prompt injection. Separating them now costs nothing and pays off when the feature grows.

5Guided Generation with @Generable

Parsing free text out of a model is brittle. The shape drifts, a field goes missing, and your string handling breaks in production. Guided generation solves this by letting the model produce a typed Swift value directly. You mark a type @Generable, describe each property with @Guide, and ask the session to generate that type.

// A typed result the model fills directly

import FoundationModels

@Generable
struct ReceiptInfo {
    @Guide(description: "Merchant or store name")
    var merchant: String

    @Guide(description: "Total amount in dollars as a number")
    var total: Double

    @Guide(description: "Spending category for this purchase")
    var category: Category

    @Generable
    enum Category {
        case groceries, dining, travel, other
    }
}

func parseReceipt(_ raw: String) async throws -> ReceiptInfo {
    let session = LanguageModelSession(
        instructions: "Extract structured fields from receipt text."
    )
    let response = try await session.respond(
        to: "Extract the receipt details from:\n\(raw)",
        generating: ReceiptInfo.self
    )
    return response.content
}

The payoff is real. You get a ReceiptInfo value with a typed Double total and an enum category, so the compiler enforces the shape and you never write a regular expression to scrape a number out of a sentence. @Guide descriptions steer the model toward the right value for each field, which raises accuracy on extraction tasks. This is the single highest-leverage pattern in the framework for app developers.

6Streaming Responses

For anything longer than a sentence, waiting for the full response feels slow even when it is fast. Streaming fixes the perceived latency: streamResponse returns an async sequence of partial results, and you update SwiftUI state as each chunk arrives so the user watches the text build.

// Consume the stream and update the UI as it grows

import FoundationModels
import SwiftUI

@MainActor
@Observable
final class DraftViewModel {
    var draft: String = ""
    private let session = LanguageModelSession()

    func generate(prompt: String) async {
        do {
            let stream = session.streamResponse(to: prompt)
            for try await partial in stream {
                // Each partial holds the response so far.
                draft = partial.content
            }
        } catch {
            draft = "Could not generate a draft right now."
        }
    }
}

Because the view model is annotated @MainActor and @Observable, assigning to draft inside the loop updates the view safely on the main actor while the heavy inference work happens behind the async sequence. The result is a response that appears to type itself out, which is exactly the experience users now expect from AI features.

7Tool Calling: Letting the Model Use Your App

The on-device model knows nothing about your app's live data until you give it a way to ask. Tool calling does that. You define a type conforming to the Tool protocol with a call method, register it on the session, and the model can invoke it mid-response to fetch what it needs, then fold the result into its answer.

// A tool the model can call to fetch app data

import FoundationModels

struct CalendarTool: Tool {
    let name = "getEvents"
    let description = "Returns the user's events for a given day."

    @Generable
    struct Arguments {
        @Guide(description: "Day to look up, like 'today' or 'Friday'")
        var day: String
    }

    func call(arguments: Arguments) async throws -> String {
        let events = await EventStore.shared.events(for: arguments.day)
        guard !events.isEmpty else { return "No events." }
        return events.map { "\($0.time): \($0.title)" }
            .joined(separator: "\n")
    }
}

let session = LanguageModelSession(tools: [CalendarTool()])
let answer = try await session.respond(
    to: "What does my Friday look like? Keep it short."
)

The model decides when to call getEvents, receives the string your call method returns, and uses it to ground the reply. This is how you build an assistant that answers from real app state instead of guessing. It pairs naturally with App Intents if you want the same actions exposed to Siri and Spotlight, which our guide to modernizing an iOS app with Apple Intelligence and App Intents walks through in detail.

8Production Concerns

A working demo and a shipped feature are different things. The gap is mostly concurrency, failure handling, and security. Here is what to get right before release.

Swift 6.2 concurrency

Swift 6.2 (September 2025) ships with Xcode 26 and brings approachable concurrency: new projects are main-actor-by-default, with strict compile-time data-race safety and a @concurrent attribute for moving work off the main actor. Inference is not cheap, so it must not run on the main actor. Isolate the session in an actor, mark the types you pass across boundaries Sendable, and await the result before you touch UI state. The compiler will flag data races for you, which is a feature, not a nuisance. For the broader language changes, see the Swift 6.2 release notes.

Graceful fallback

The model is absent on older hardware and in some regions. Design the unavailable path on purpose: hide the AI entry point, fall back to a deterministic non-AI implementation, or escalate to a cloud model if your product supports it. Never let a missing model turn into a broken screen.

Context window limits

The on-device model has a compact context window. Long inputs need to be chunked, summarized first, or trimmed to the relevant section. Sending a whole document and hoping for the best is a reliable way to get truncated or low-quality output. Budget your tokens the way you would budget memory.

Prompt injection and testing

When user content or fetched data flows into a prompt, treat it as a potential attack surface. A note, an email body, or a web snippet can contain instructions aimed at your model. Keep your steering in the session instructions, separate untrusted content from your directives, and never let model output trigger a destructive action without validation. For testing, mock the session behind a protocol so you can run your view models deterministically in unit tests, then verify behavior on real devices across OS versions.

⚠️ Treat model output as untrusted

Model output is a suggestion, not a command. Validate it before you act on it: bound numbers to sane ranges, confirm enums against your allowed set, and require explicit user confirmation before anything irreversible like deleting data, sending a message, or making a purchase. A model that was nudged by injected text should never be able to take a destructive action on its own.

❓ Frequently Asked Questions

Is the Foundation Models framework free?

Yes. The Foundation Models framework gives you a native Swift API to the same on-device model that powers Apple Intelligence, and there is no per-token cost, no API key, and no usage bill. The model runs locally on the user's device, so you pay nothing for inference no matter how many requests your app makes.

What devices support Foundation Models?

The framework is available on iOS, iPadOS, macOS, and visionOS 26 and later, running on Apple-silicon-class devices that support Apple Intelligence. Older hardware and some regions do not have the model, so you must always check SystemLanguageModel.default availability at runtime and provide a graceful fallback when it is not available.

When should I use a cloud LLM instead?

Use a cloud LLM when you need frontier-level reasoning, a very large context window, or knowledge beyond what a compact on-device model holds. The on-device model is excellent for summarization, classification, extraction, and short structured generation where privacy, offline support, and zero cost matter. Many production apps use both: on-device for the common path and a cloud model for heavy tasks.

What are @Generable and @Guide used for?

@Generable marks a Swift type that the model can produce directly as structured output, and @Guide annotations describe each property so the model fills it correctly. Together they give you guided generation: the model returns a typed Swift value instead of free text, so you avoid brittle string parsing and get compile-time safety on the result.

How do I handle Swift 6.2 concurrency with a model session?

Keep session work off the main actor by isolating it in an actor or a background task, mark shared types Sendable, and await the results before touching UI state. Swift 6.2 is main-actor-by-default in new Xcode 26 projects, so use the @concurrent attribute or explicit actors to move inference off the main thread and keep the interface responsive.

9Why Lushbinary for Your AI iOS App

At Lushbinary, we build production iOS apps with on-device intelligence baked in from the start. We know where the Foundation Models framework shines, where a cloud model still earns its keep, and how to architect a hybrid feature that stays cheap, private, and fast. The patterns in this guide, guided generation, streaming, and tool calling, are ones we ship.

We do not just wire up an API. We help you pick the right model for each feature, design the concurrency so the UI never stalls, and build the fallback paths so the experience holds up on every device and in every region. The result is an AI feature that feels native because it is native.

On-device AI with Foundation Models: summarization, extraction, classification, and structured generation
Guided generation with @Generable and tool calling grounded in your real app data
Streaming UI that feels responsive on Swift 6.2 concurrency
Hybrid on-device plus cloud architecture when the task needs frontier reasoning
App Intents and Apple Intelligence integration for Siri and Spotlight
Security review: prompt-injection awareness and validation of model output before any action

Whether you are adding your first AI feature or modernizing an existing app, we can help you scope it correctly and build it right. If you are also thinking about who to bring on, our guide to hiring an iOS developer in the AI era covers what to look for.

🚀 Free Consultation

Have an AI feature in mind? Book a free 30-minute call with our iOS team. We'll tell you honestly whether on-device, cloud, or a hybrid is the right fit, and give you a clear scope and timeline. No sales pitch.

📚 Sources

Foundation Models Framework Documentation - Official Apple Developer documentation
WWDC 2025: Meet the Foundation Models framework - Apple Developer session video
Swift 6.2 Released - Official Swift.org announcement

Content was rephrased for compliance with licensing restrictions. API details sourced from official Apple Developer documentation as of June 2026. APIs may change - always verify against the current Apple docs.

Building an AI-Powered iOS App?

Tell us about your project. We'll review your idea and get back to you within 24 hours with an honest assessment of scope, architecture, and timeline.

Ready to Build Something Great?

Get a free 30-minute strategy call. We'll map out your project, timeline, and tech stack - no strings attached.

Let's Talk About Your Project

Prefer email? Reach us directly:

connect@lushbinary.com

Build an AI-Powered iOS App With Foundation Models

📋 Table of Contents

1What the Foundation Models Framework Is

2On-Device vs Cloud LLM: When to Use Which

3Architecture of an AI Feature

4Setting Up: A Basic Session

5Guided Generation with @Generable

6Streaming Responses

7Tool Calling: Letting the Model Use Your App

8Production Concerns

Swift 6.2 concurrency

Graceful fallback

Context window limits

Prompt injection and testing

❓ Frequently Asked Questions

Is the Foundation Models framework free?

What devices support Foundation Models?

When should I use a cloud LLM instead?

What are @Generable and @Guide used for?

How do I handle Swift 6.2 concurrency with a model session?

9Why Lushbinary for Your AI iOS App

Building an AI-Powered iOS App?

Ready to Build Something Great?

Contact Us

Ship On-Device AI Features

One Subscription. Every Flagship AI Model.

More from the Blog

Apple Foundation Models Framework: 2026 Swift Guide

SiriKit to App Intents: The Complete Migration Guide

ContactUs

Our Address

Phone

Email