deviceai

0.0.1

On-device AI runtime enabling speech recognition, TTS, and local LLM inference with offline RAG, auto model downloads, streaming generation, and GPU acceleration for low-latency, privacy-preserving apps.

Android · JVM · Native · deviceai-labs/deviceai


DeviceAI

On-device AI for Android & iOS — speech recognition, text-to-speech, and LLM chat. Zero cloud latency, zero privacy risk. Optional cloud backend for OTA model updates, telemetry, and device management.

Install

Android (Kotlin)

// build.gradle.kts
implementation("dev.deviceai:core:0.0.1")
implementation("dev.deviceai:speech:0.0.1")   // STT + TTS
implementation("dev.deviceai:llm:0.0.1")      // LLM + RAG

iOS / macOS (Swift Package Manager)

Add the DeviceAI package to your Xcode project or Package.swift:

// Package.swift
dependencies: [
    .package(url: "https://github.com/deviceai-labs/deviceai", from: "0.0.1")
]

Then add the modules you need:

.target(
    name: "YourApp",
    dependencies: [
        .product(name: "DeviceAI", package: "deviceai"),
        .product(name: "DeviceAISpeech", package: "deviceai"),   // STT + TTS
        .product(name: "DeviceAILLM", package: "deviceai"),      // LLM + RAG
    ]
)

Or in Xcode: File → Add Package Dependencies → paste https://github.com/deviceai-labs/deviceai → select the modules you need.

Initialize

Android

class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        PlatformStorage.initialize(this)
        DeviceAI.initialize(context = this)
    }
}

iOS / macOS

import DeviceAI

// Local mode — no cloud, fully offline
DeviceAI.initialize()

// With cloud backend (optional)
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
}

That's it. The SDK runs fully on-device with no backend required.

With cloud backend (optional)

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    telemetry = TelemetryLevel.Minimal
    appVersion = BuildConfig.VERSION_NAME
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
    $0.appVersion = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String
}

The API key connects the SDK to the DeviceAI cloud backend. Device hardware (RAM, CPU, SoC) is detected automatically — no manual configuration needed.

Speech-to-Text

Android

SpeechBridge.initStt(modelPath, SttConfig(language = "en", useGpu = true))

// From raw audio samples
val text = SpeechBridge.transcribeAudio(samples)  // FloatArray, 16kHz mono

// From a WAV file
val textFromFile = SpeechBridge.transcribe("/path/to/audio.wav")

SpeechBridge.shutdownStt()
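Audio captured on Android usually arrives as 16-bit PCM, while transcribeAudio above takes a FloatArray. The sketch below shows one plausible conversion. It assumes the samples should be normalized to roughly [-1.0, 1.0], which the README does not state (it only says "FloatArray, 16kHz mono"). The helper names pcm16ToFloat and stereoToMono are hypothetical and not part of the SDK. Resampling to 16 kHz is not covered here.

```kotlin
// Hypothetical helpers, not part of DeviceAI.

// Convert signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Assumption: the STT engine expects normalized float samples.
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768f }

// Downmix interleaved stereo (L, R, L, R, ...) to mono by averaging each pair.
fun stereoToMono(interleaved: FloatArray): FloatArray =
    FloatArray(interleaved.size / 2) { i ->
        (interleaved[2 * i] + interleaved[2 * i + 1]) / 2f
    }

fun main() {
    val pcm = shortArrayOf(0, 16384, -32768, 32767)
    val samples = pcm16ToFloat(pcm)
    println(samples.joinToString())
}
```

The resulting array could then be passed to SpeechBridge.transcribeAudio, provided the recording was made at 16 kHz.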

iOS

let engine = try await SttEngine(modelPath: path, config: .init(language: "en"))

// From raw audio samples
let text = try await engine.transcribe(samples: audioBuffer)

// From a WAV file
let textFromFile = try await engine.transcribe(audioPath: "/path/to/audio.wav")

engine.shutdown()

Text-to-Speech

Android

SpeechBridge.initTts(modelPath, tokensPath, TtsConfig(speechRate = 1.0f))

val pcm: ShortArray = SpeechBridge.synthesize("Hello from DeviceAI.")
// Play with AudioTrack

SpeechBridge.shutdownTts()
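The README leaves playback to AudioTrack. For debugging or off-device tests it can also help to dump the synthesized ShortArray to a WAV file. The sketch below is a minimal example that assumes 16-bit PCM output. The README does not state the TTS sample rate, so sampleRate is supplied by the caller and the 22050 in main is only a placeholder. pcm16ToWav is a hypothetical helper, not an SDK function.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: wrap 16-bit PCM samples in a canonical 44-byte WAV header.
fun pcm16ToWav(pcm: ShortArray, sampleRate: Int, channels: Int = 1): ByteArray {
    val dataSize = pcm.size * 2                       // 2 bytes per sample
    val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN)
    header.put("RIFF".toByteArray(Charsets.US_ASCII))
    header.putInt(36 + dataSize)                      // remaining file size
    header.put("WAVE".toByteArray(Charsets.US_ASCII))
    header.put("fmt ".toByteArray(Charsets.US_ASCII))
    header.putInt(16)                                 // fmt chunk size for PCM
    header.putShort(1.toShort())                      // audio format 1 = PCM
    header.putShort(channels.toShort())
    header.putInt(sampleRate)
    header.putInt(sampleRate * channels * 2)          // byte rate
    header.putShort((channels * 2).toShort())         // block align
    header.putShort(16.toShort())                     // bits per sample
    header.put("data".toByteArray(Charsets.US_ASCII))
    header.putInt(dataSize)

    val body = ByteBuffer.allocate(dataSize).order(ByteOrder.LITTLE_ENDIAN)
    pcm.forEach { body.putShort(it) }
    return header.array() + body.array()
}

fun main() {
    // One second of silence stands in for real TTS output.
    // 22050 Hz is a placeholder; use the rate your TTS model actually produces.
    val pcm = ShortArray(22050)
    val wav = pcm16ToWav(pcm, sampleRate = 22050)
    println("WAV size in bytes: ${wav.size}")
}
```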

iOS

let tts = try await TtsEngine(modelPath: path, tokensPath: tokens)
let audio = try await tts.synthesize("Hello from DeviceAI")
tts.shutdown()

// Or use Apple's built-in voices (zero setup, no model download):
let systemTts = SystemTTSEngine()
try await systemTts.speak("Hello from DeviceAI")

LLM Chat

Android

val session = DeviceAI.llm.chat("/path/to/model.gguf") {
    systemPrompt = "You are a helpful assistant."
    maxTokens = 512
    temperature = 0.7f
    useGpu = true
}

// Streaming (recommended for UI)
session.send("What is Kotlin?").collect { token -> print(token) }

// Multi-turn — history managed automatically
session.send("Give me an example.").collect { print(it) }

// Lifecycle
session.cancel()        // abort generation
session.clearHistory()  
session.close()

iOS

Offline RAG

Android

val store = BM25RagStore(rawChunks = listOf(
    "DeviceAI supports Android and iOS.",
    "LLM inference uses llama.cpp with Vulkan GPU."
))
val session = DeviceAI.llm.chat("/path/to/model.gguf") { ragStore = store }
session.send("What GPU does DeviceAI use?").collect { print(it) }

iOS

let store = BM25RagStore(chunks: [
    "DeviceAI supports Android and iOS.",
    "LLM inference uses llama.cpp with Metal GPU."
])
let session = try await ChatSession(modelPath: path) {
    $0.ragStore = store
}
for try await token in session.send("What GPU does DeviceAI use?") {
    print(token, terminator: "")
}

No embedding model needed — BM25 keyword retrieval runs entirely on-device.
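To illustrate why keyword retrieval needs no embedding model, here is a minimal sketch of standard Okapi BM25 scoring over a list of text chunks. It is not the implementation behind BM25RagStore, whose internals the README does not show. The class Bm25Index, the tokenizer, and the parameter defaults (k1 = 1.2, b = 0.75, Lucene-style idf) are assumptions chosen for illustration.

```kotlin
import kotlin.math.ln

// Lowercase and split on anything that is not a letter or digit.
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }

// Hypothetical BM25 index; only a sketch of the general technique.
class Bm25Index(
    chunks: List<String>,
    private val k1: Double = 1.2,
    private val b: Double = 0.75,
) {
    private val docs: List<List<String>> = chunks.map(::tokenize)
    private val avgLen: Double = docs.map { it.size }.average()

    // Document frequency: number of chunks that contain each term.
    private val df: Map<String, Int> =
        docs.flatMap { it.distinct() }.groupingBy { it }.eachCount()

    // BM25 score of one chunk for a query.
    fun score(query: String, docIndex: Int): Double {
        val doc = docs[docIndex]
        val tf = doc.groupingBy { it }.eachCount()
        var total = 0.0
        for (term in tokenize(query).distinct()) {
            val f = tf[term] ?: continue              // term absent from this chunk
            val n = df.getValue(term)
            val idf = ln((docs.size - n + 0.5) / (n + 0.5) + 1.0)
            val lengthNorm = 1 - b + b * doc.size / avgLen
            total += idf * f * (k1 + 1) / (f + k1 * lengthNorm)
        }
        return total
    }

    // Chunk indices ordered from most to least relevant.
    fun rank(query: String): List<Int> =
        docs.indices.sortedByDescending { score(query, it) }
}

fun main() {
    val index = Bm25Index(
        listOf(
            "DeviceAI supports Android and iOS.",
            "LLM inference uses llama.cpp with Vulkan GPU.",
        )
    )
    println(index.rank("vulkan gpu"))
}
```

A RAG layer built this way would prepend the top-ranked chunks to the prompt before generation. Note that plain BM25 matches exact tokens only, so "use" does not match "uses" without stemming.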

Telemetry

When telemetry is enabled, the SDK automatically tracks performance metrics for all modules:

What's collected

Module	Metrics
STT	Model load time, transcription latency, audio duration (input_length_ms)

What's NEVER collected

Prompt or response text content
Audio recordings or transcript content
PII by default

Apps should avoid putting PII in appAttributes, since developer-provided attributes are sent in the capability profile.

Telemetry levels

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    // Pick one level; the three lines below list the available options.
    telemetry = TelemetryLevel.Off      // default: nothing sent
    telemetry = TelemetryLevel.Minimal  // model load/unload + inference metrics
    telemetry = TelemetryLevel.Full     // includes OTA downloads + manifest syncs
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    // Pick one level; the three lines below list the available options.
    $0.telemetry = .off      // default: nothing sent
    $0.telemetry = .minimal  // model load/unload + inference metrics
    $0.telemetry = .full     // includes OTA downloads + manifest syncs
}

Events are batched on-device. Delivery respects the Wi-Fi preference and data-saver mode, and pending events are flushed automatically when the app goes to the background.

Custom telemetry sink

Route events to your own analytics instead of the DeviceAI backend:

Android:

DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
    telemetry = TelemetryLevel.Minimal
    telemetrySink = object : TelemetrySink {
        override suspend fun ingest(events: List<TelemetryEvent>) {
            myAnalytics.track(events)
        }
    }
}

iOS:

DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
    $0.telemetry = .minimal
    $0.telemetrySink = MyAnalyticsSink()  // conforms to TelemetrySink protocol
}

Cloud Backend

The SDK optionally connects to a cloud control plane. When an API key is provided:

No cloud calls are made without an API key. Local mode works fully offline.

Models

Whisper (STT)

LLM (GGUF via llama.cpp)

Browse LLM models with LlmCatalog. Download Whisper/TTS models via ModelRegistry.

Features

Platform support

Benchmarks

Device	SoC	Model	Audio	Inference	RTF
Redmi Note 9 Pro

RTF (real-time factor) is inference time divided by audio duration. RTF < 1.0 means faster than real time; an RTF of 0.14 is roughly 7× faster than real time.
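The RTF arithmetic can be sketched in a few lines. The function names are hypothetical, and the numbers in main are illustrative values chosen to give 0.14, not measurements from the benchmark table.

```kotlin
// Real-time factor: processing time divided by the duration of the audio.
fun realTimeFactor(inferenceMs: Double, audioMs: Double): Double {
    require(audioMs > 0.0) { "audio duration must be positive" }
    return inferenceMs / audioMs
}

// How many times faster than real time a given RTF is.
fun speedup(rtf: Double): Double = 1.0 / rtf

fun main() {
    // Illustrative numbers only: 1.4 s of inference for 10 s of audio.
    val rtf = realTimeFactor(inferenceMs = 1_400.0, audioMs = 10_000.0)
    println("RTF = $rtf, speedup = ${"%.1f".format(speedup(rtf))}x")
}
```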

Building from source

Android

git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai
make setup
./gradlew :kotlin:core:compileDebugKotlinAndroid
./gradlew :kotlin:speech:compileDebugKotlinAndroid
./gradlew :kotlin:llm:compileDebugKotlinAndroid

iOS (Swift)

git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai

# Build XCFrameworks (requires Xcode + CMake)
./sdk/deviceai-commons/scripts/build-xcframeworks.sh

# Build the Swift package
cd swift
swift build

Sample App

# Android: Open samples/androidApp/ in Android Studio and run on device/emulator
# iOS: Open samples/iosApp/ in Xcode

Contributing

Issues and PRs welcome. Platform SDK contributions (flutter/, react-native/) are especially welcome.

License

Apache 2.0 — see LICENSE.


Cloud Backend: capabilities enabled by an API key

Feature	Description
Device registration	Automatic: hardware profile sent, capability tier assigned
Model manifest	Backend assigns the right model for each device tier, synced every 6h
OTA updates	Push new models with canary rollouts and instant kill-switch
Telemetry	Performance metrics batched and delivered (when enabled)
Device identity	Stable across reinstalls — same device always gets the same ID

Models: Whisper (STT)

Model	Size	Speed	Best for
`ggml-tiny.en.bin`	75 MB	7× real-time	English, mobile-first
`ggml-base.bin`	142 MB	Fast	Multilingual, balanced
`ggml-small.bin`	466 MB	Medium	Higher accuracy

Models: LLM (GGUF via llama.cpp)

Model	Size	Notes
SmolLM2-360M-Instruct (Q4)	~220 MB	Fastest, mobile-first
Qwen2.5-0.5B-Instruct (Q4)	~400 MB	Multilingual, compact
Llama-3.2-1B-Instruct (Q4)	~700 MB	Strong reasoning
SmolLM2-1.7B-Instruct (Q4)	~1 GB	Balanced

Features

Feature	Android	iOS
Speech-to-Text (whisper.cpp)	✅	✅
Text-to-Speech (sherpa-onnx VITS / Kokoro)	✅	✅
System TTS (Apple AVSpeechSynthesizer)	—	✅
Voice Activity Detection	✅	✅
LLM inference (llama.cpp, GGUF)	✅	✅
Streaming generation	✅	✅
Stateful multi-turn chat	✅	✅
Offline RAG (BM25)	✅	✅
Auto model download (HuggingFace)	✅	🗓
GPU acceleration	✅ Vulkan	✅ Metal
Cloud backend (registration, manifest, telemetry)	✅	✅
Auto hardware detection	✅	✅
Stable device identity (survives reinstall)	✅	✅
Telemetry (STT/TTS/LLM)	✅	✅
Custom telemetry sink	✅	✅
OTA model rollouts + kill switch	✅	✅
Flutter plugin	🗓	🗓
React Native module	🗓	🗓

Platform support

Platform	STT	TTS	LLM	Version	Status
Android (API 26+)	✅	✅	✅	0.0.1	Available
iOS 17+ / macOS 14+	✅	✅	✅	0.0.1	Available
Flutter	—	—	—	—	Planned
React Native	—	—	—	—	Planned