deviceai
0.0.1indexedOn-device AI runtime enabling speech recognition, TTS, and local LLM inference with offline RAG, auto model downloads, streaming generation, and GPU acceleration for low-latency, privacy-preserving apps.
On-device AI runtime enabling speech recognition, TTS, and local LLM inference with offline RAG, auto model downloads, streaming generation, and GPU acceleration for low-latency, privacy-preserving apps.
On-device AI for Android & iOS — speech recognition, text-to-speech, and LLM chat. Zero cloud latency, zero privacy risk. Optional cloud backend for OTA model updates, telemetry, and device management.
// build.gradle.kts
implementation("dev.deviceai:core:0.0.1")
implementation("dev.deviceai:speech:0.0.1") // STT + TTS
implementation("dev.deviceai:llm:0.0.1") // LLM + RAG
Add the DeviceAI package to your Xcode project or Package.swift:
// Package.swift
dependencies: [
.package(url: "https://github.com/deviceai-labs/deviceai", from: "0.0.1")
]
Then add the modules you need:
.target(
name: "YourApp",
dependencies: [
.product(name: "DeviceAI", package: "deviceai"),
.product(name: "DeviceAISpeech", package: "deviceai"), // STT + TTS
.product(name: "DeviceAILLM", package: "deviceai"), // LLM + RAG
]
)
Or in Xcode: File → Add Package Dependencies → paste https://github.com/deviceai-labs/deviceai → select the modules you need.
class MyApp : Application() {
override fun onCreate() {
super.onCreate()
PlatformStorage.initialize(this)
DeviceAI.initialize(context = this)
}
}
import DeviceAI
// Local mode — no cloud, fully offline
DeviceAI.initialize()
// With cloud backend (optional)
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
$0.telemetry = .minimal
}
That's it. The SDK runs fully on-device with no backend required.
Android:
DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
telemetry = TelemetryLevel.Minimal
appVersion = BuildConfig.VERSION_NAME
}
iOS:
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
$0.telemetry = .minimal
$0.appVersion = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String
}
The API key connects the SDK to the DeviceAI cloud backend. Device hardware (RAM, CPU, SoC) is detected automatically — no manual configuration needed.
SpeechBridge.initStt(modelPath, SttConfig(language = "en", useGpu = true))
// From raw audio samples
val text = SpeechBridge.transcribeAudio(samples) // FloatArray, 16kHz mono
// From a WAV file
val textFromFile = SpeechBridge.transcribe("/path/to/audio.wav")
SpeechBridge.shutdownStt()
let engine = try await SttEngine(modelPath: path, config: .init(language: "en"))
// From raw audio samples
let text = try await engine.transcribe(samples: audioBuffer)
textFromFile engine.transcribe(audioPath: )
engine.shutdown()
Powered by whisper.cpp. Runs 7× faster than real-time on mid-range hardware.
SpeechBridge.initTts(modelPath, tokensPath, TtsConfig(speechRate = 1.0f))
val pcm: ShortArray = SpeechBridge.synthesize("Hello from DeviceAI.")
// Play with AudioTrack
SpeechBridge.shutdownTts()
let tts = try await TtsEngine(modelPath: path, tokensPath: tokens)
let audio = try await tts.synthesize("Hello from DeviceAI")
tts.shutdown()
// Or use Apple's built-in voices (zero setup, no model download):
let systemTts = SystemTTSEngine()
try systemTts.speak()
Powered by sherpa-onnx. Supports VITS and Kokoro voice models.
val session = DeviceAI.llm.chat("/path/to/model.gguf") {
systemPrompt = "You are a helpful assistant."
maxTokens = 512
temperature = 0.7f
useGpu = true
}
// Streaming (recommended for UI)
session.send("What is Kotlin?").collect { token -> print(token) }
// Multi-turn — history managed automatically
session.send("Give me an example.").collect { print(it) }
// Lifecycle
session.cancel() // abort generation
session.clearHistory()
session.close()
Powered by llama.cpp. Supports any GGUF model. Vulkan GPU on Android, Metal GPU on iOS.
val store = BM25RagStore(rawChunks = listOf(
"DeviceAI supports Android and iOS.",
"LLM inference uses llama.cpp with Vulkan GPU."
))
val session = DeviceAI.llm.chat("/path/to/model.gguf") { ragStore = store }
session.send("What GPU does DeviceAI use?").collect { print(it) }
let store = BM25RagStore(chunks: [
"DeviceAI supports Android and iOS.",
"LLM inference uses llama.cpp with Metal GPU."
])
let session = try await ChatSession(modelPath: path) {
.ragStore store
}
token session.send() {
(token, terminator: )
}
No embedding model needed — BM25 keyword retrieval runs entirely on-device.
When telemetry is enabled, the SDK automatically tracks performance metrics for all modules:
| Module | Metrics |
|---|---|
| STT | Model load time, transcription latency, audio duration (input_length_ms) |
Apps should avoid putting PII in
appAttributes, since developer-provided attributes are sent in the capability profile.
Android:
DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
telemetry = TelemetryLevel.Off // default — nothing sent
telemetry = TelemetryLevel.Minimal // model load/unload + inference metrics
telemetry = TelemetryLevel.Full // includes OTA downloads + manifest syncs
}
iOS:
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
$0.telemetry = .off // default — nothing sent
$0.telemetry = .minimal // model load/unload + inference metrics
$0.telemetry = .full // includes OTA downloads + manifest syncs
}
Events are batched on-device and delivered efficiently — respects Wi-Fi preference, data-saver mode, and flushes automatically when the app goes to background.
Route events to your own analytics instead of the DeviceAI backend:
Android:
DeviceAI.initialize(context = this, apiKey = "<YOUR_API_KEY>") {
telemetry = TelemetryLevel.Minimal
telemetrySink = object : TelemetrySink {
override suspend fun ingest(events: List<TelemetryEvent>) {
myAnalytics.track(events)
}
}
}
iOS:
DeviceAI.initialize(apiKey: "<YOUR_API_KEY>") {
$0.telemetry = .minimal
$0.telemetrySink = MyAnalyticsSink() // conforms to TelemetrySink protocol
}
The SDK optionally connects to a cloud control plane. When an API key is provided:
No cloud calls are made without an API key. Local mode works fully offline.
Browse LLM models with LlmCatalog. Download Whisper/TTS models via ModelRegistry.
| Device | SoC | Model | Audio | Inference | RTF |
|---|---|---|---|---|---|
| Redmi Note 9 Pro |
RTF < 1.0 = faster than real-time. 0.14x = ~7× faster than real-time.
git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai
make setup
./gradlew :kotlin:core:compileDebugKotlinAndroid
./gradlew :kotlin:speech:compileDebugKotlinAndroid
./gradlew :kotlin:llm:compileDebugKotlinAndroid
git clone https://github.com/deviceai-labs/deviceai.git
cd deviceai
# Build XCFrameworks (requires Xcode + CMake)
./sdk/deviceai-commons/scripts/build-xcframeworks.sh
# Build the Swift package
cd swift
swift build
# Android: Open samples/androidApp/ in Android Studio and run on device/emulator
# iOS: Open samples/iosApp/ in Xcode
Issues and PRs welcome. Platform SDK contributions (flutter/, react-native/) are especially welcome.
Apache 2.0 — see LICENSE.
let session = try await ChatSession(modelPath: path) {
$0.systemPrompt = "You are a helpful assistant."
$0.maxTokens = 512
$0.temperature = 0.7
}
// Streaming
for try await token in try session.send("What is Swift?") {
print(token, terminator: "")
}
// Multi-turn — history managed automatically
for try await token in try session.send("Give me an example.") {
print(token, terminator: "")
}
// Lifecycle
session.cancel() // abort generation
session.clearHistory() // fresh conversation
session.close() // unload model
| Model load time, synthesis latency, text length (output_chars) |
| LLM | Model load time, inference latency, time-to-first-token, tokens/sec, token counts, finish reason |
| Feature | What happens |
|---|
| Device registration | Automatic — hardware profile sent, capability tier assigned |
| Model manifest | Backend assigns the right model for each device tier, synced every 6h |
| OTA updates | Push new models with canary rollouts and instant kill-switch |
| Telemetry | Performance metrics batched and delivered (when enabled) |
| Device identity | Stable across reinstalls — same device always gets the same ID |
| Model | Size | Speed | Best for |
|---|
ggml-tiny.en.bin | 75 MB | 7× real-time | English, mobile-first |
ggml-base.bin | 142 MB | Fast | Multilingual, balanced |
ggml-small.bin | 466 MB | Medium | Higher accuracy |
| Model | Size | Best for |
|---|
| SmolLM2-360M-Instruct (Q4) | ~220 MB | Fastest, mobile-first |
| Qwen2.5-0.5B-Instruct (Q4) | ~400 MB | Multilingual, compact |
| Llama-3.2-1B-Instruct (Q4) | ~700 MB | Strong reasoning |
| SmolLM2-1.7B-Instruct (Q4) | ~1 GB | Balanced |
| Feature | Android | iOS |
|---|
| Speech-to-Text (whisper.cpp) | ✅ | ✅ |
| Text-to-Speech (sherpa-onnx VITS / Kokoro) | ✅ | ✅ |
| System TTS (Apple AVSpeechSynthesizer) | — | ✅ |
| Voice Activity Detection | ✅ | ✅ |
| LLM inference (llama.cpp, GGUF) | ✅ | ✅ |
| Streaming generation | ✅ | ✅ |
| Stateful multi-turn chat | ✅ | ✅ |
| Offline RAG (BM25) | ✅ | ✅ |
| Auto model download (HuggingFace) | ✅ | 🗓 |
| GPU acceleration | ✅ Vulkan | ✅ Metal |
| Cloud backend (registration, manifest, telemetry) | ✅ | ✅ |
| Auto hardware detection | ✅ | ✅ |
| Stable device identity (survives reinstall) | ✅ | ✅ |
| Telemetry (STT/TTS/LLM) | ✅ | ✅ |
| Custom telemetry sink | ✅ | ✅ |
| OTA model rollouts + kill switch | ✅ | ✅ |
| Flutter plugin | 🗓 | 🗓 |
| React Native module | 🗓 | 🗓 |
| Platform | STT | TTS | LLM | Version | Status |
|---|
| Android (API 26+) | ✅ | ✅ | ✅ | 0.0.1 | Available |
| iOS 17+ / macOS 14+ | ✅ | ✅ | ✅ | 0.0.1 | Available |
| Flutter | — | — | — | — | Planned |
| React Native | — | — | — | — | Planned |
| Snapdragon 720G |
| whisper-tiny.en |
| 5.4s |
| 746ms |
| 0.14x |
Surfaced from shared tags and platforms — no rankings paid for.