ktoken
BPE tokenizer designed for seamless integration with OpenAI models, supporting local and remote modes for encoding management. Offers multiplatform compatibility and flexible setup options.
Ktoken is a BPE tokenizer designed for seamless integration with OpenAI's models.
Install Ktoken by adding the dependency to your build.gradle file:
repositories {
    mavenCentral()
}
dependencies {
    implementation "com.aallam.ktoken:ktoken:0.4.0"
}
// Create a tokenizer for a given encoding:
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)
// Or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4")
// Encode text into token IDs:
val tokens = tokenizer.encode("hello world")
// Decode token IDs back into text:
val text = tokenizer.decode(listOf(15339, 1917))
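The calls above can be combined into one small program. What follows is a minimal sketch, not the library's reference usage: it assumes the classes live in the com.aallam.ktoken package, that kotlinx-coroutines is on the classpath, and that Tokenizer.of may be a suspending function (wrapping it in runBlocking is harmless if it is not). The countTokens helper is a hypothetical name introduced only for illustration.

```kotlin
import com.aallam.ktoken.Encoding   // assumed package name
import com.aallam.ktoken.Tokenizer  // assumed package name
import kotlinx.coroutines.runBlocking

// Hypothetical helper: number of tokens the given text encodes to.
fun countTokens(tokenizer: Tokenizer, text: String): Int =
    tokenizer.encode(text).size

fun main(): Unit = runBlocking {
    // Build the tokenizer inside a coroutine scope in case of() suspends.
    val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)

    // Encode, then decode the same tokens to get the original text back.
    val tokens = tokenizer.encode("hello world")
    val text = tokenizer.decode(tokens)

    println("tokens=$tokens count=${countTokens(tokenizer, text)} text=$text")
}
```

Counting tokens this way is useful for checking a prompt against a model's context limit before sending a request.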
Ktoken operates in two modes: Local (default for JVM) and Remote (default for JS/Native).
Utilize LocalPbeLoader to retrieve encodings from local files:
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))
// For a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = LocalPbeLoader(FileSystem.SYSTEM))
The JVM artifacts include the encoding files. Use FileSystem.RESOURCES to load them:
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))
Note: this is the default behavior for JVM.
Use RemoteBpeLoader to load encodings from remote sources:
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())
// For a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = RemoteBpeLoader())
Alternatively, use ktoken-bom by adding the following dependencies to your build.gradle file:
dependencies {
    // Import the ktoken BOM
    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')
    // Define dependencies without versions
    implementation 'com.aallam.ktoken:ktoken'
    runtimeOnly 'io.ktor:ktor-client-okhttp'
}
For multiplatform projects, add the ktoken dependency to commonMain, and select a Ktor client engine for each target.
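As a rough illustration of that layout, here is a build.gradle.kts sketch in the Gradle Kotlin DSL. It assumes the Kotlin Multiplatform plugin is already applied, that the engine is a Ktor client engine, and that the library publishes the targets shown. The target list and the ktorVersion value are placeholders, not values taken from the ktoken documentation.

```kotlin
// build.gradle.kts (sketch; targets and versions are illustrative assumptions)
repositories {
    mavenCentral()
}

// Placeholder: use the Ktor version that matches your project.
val ktorVersion = "2.3.12"

kotlin {
    jvm()
    js(IR) { browser() }
    iosArm64()

    sourceSets {
        // Shared code depends on ktoken itself.
        val commonMain by getting {
            dependencies {
                implementation("com.aallam.ktoken:ktoken:0.4.0")
            }
        }
        // Each target picks its own Ktor client engine.
        val jvmMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-okhttp:$ktorVersion")
            }
        }
        val jsMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-js:$ktorVersion")
            }
        }
        val iosArm64Main by getting {
            dependencies {
                implementation("io.ktor:ktor-client-darwin:$ktorVersion")
            }
        }
    }
}
```

Only the commonMain dependency comes from the text above. The per-target engine artifacts are common Ktor choices and can be swapped for any engine that supports the target.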
Ktoken is open-source software and distributed under the MIT license. This project is not affiliated with nor endorsed by OpenAI.