protokt

Protocol Buffer compiler and runtime for Kotlin.
Supports proto3 and editions but not proto2 features or their ported versions within editions.
Overview
Features
Not yet implemented
Compatibility
The Gradle plugin requires Java 17+ and Gradle 8.0+. It runs on recent versions of
macOS, Linux, and Windows.
The runtime and generated code are compatible with Kotlin 2.1+, Java 17+, and Android 5.0+.
Usage
See examples in testing. The Gradle plugin is implemented as a custom
protoc plugin provided to the protobuf-gradle-plugin;
see its readme for proto source and other configuration options. The default proto
source directory is src/main/proto.
Gradle
plugins {
id("com.toasttab.protokt.v1") version "<version>"
}
or
buildscript {
dependencies {
classpath("com.toasttab.protokt.v1:protokt-gradle-plugin:<version>")
}
}
apply(plugin = "com.toasttab.protokt.v1")
This will automatically download and install protokt, apply the Google protobuf
plugin, and configure all the necessary boilerplate. By default it will also add
protokt-core to the api scope of the project.
By default the plugin auto-detects the best codec for your project type and adds
protokt-runtime-persistent-collections dependencies automatically. You can
customize which codec and collection implementation to use with the
codec and collections DSLs described below.
If your project has no Java code you may run into the following error:
Execution failed for task ':compileJava'.
> error: no source files
To work around it, disable all JavaCompile tasks in the project:
tasks.withType<JavaCompile> {
enabled = false
}
Generated Code
Generated code is placed in <buildDir>/generated/source/proto/main.
A simple example:
syntax = "proto3";
package toasttab.protokt.sample;
message Sample {
string sample_field = 1;
}
will produce:
Construct your protokt object like so:
Sample {
sampleField = "some-string"
}
Why not expose a public constructor or use a data class? One of the design goals
of protocol buffers is that protobuf definitions can be modified in
backwards-compatible ways without breaking wire or API compatibility of existing
code. Using a DSL to construct the object emulates named arguments and allows
shuffling of protobuf fields within a definition without breaking code as would
happen for a standard constructor or method call.
The canonical copy method on data classes is emulated via a generated copy
method:
val sample = Sample { sampleField = "some-string" }
val sample2 = sample.copy { sampleField = "some-other-string" }
Assigning a Map or List in the DSL makes a copy of that collection to prevent
any escaping mutability of the provided collection. The Java protobuf
implementation takes a similar approach; it only exposes mutation methods on the
builder and not assignment. Mutating the builder does a similar copy operation.
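The design described above can be sketched by hand. The following is a minimal illustration, not protokt's actual generated code: the tags field and the Builder class are hypothetical names introduced only for this example. It shows a private constructor, a companion invoke operator that enables the Sample { ... } syntax, a copy method, and a defensive copy when a list is assigned.

```kotlin
// Hand-written illustration; protokt's real generated code differs.
class Sample private constructor(
    val sampleField: String,
    val tags: List<String>
) {
    // Mutable holder used only inside the DSL lambda.
    class Builder {
        var sampleField: String = ""
        var tags: List<String> = emptyList()
            set(value) {
                // Copy on assignment so later mutation of the caller's list cannot leak in.
                field = value.toList()
            }

        fun build() = Sample(sampleField, tags)
    }

    // copy {} starts from the current values and then applies the overrides.
    fun copy(block: Builder.() -> Unit): Sample =
        Builder().also {
            it.sampleField = sampleField
            it.tags = tags
        }.apply(block).build()

    companion object {
        // Enables the `Sample { ... }` construction syntax.
        operator fun invoke(block: Builder.() -> Unit): Sample =
            Builder().apply(block).build()
    }
}

fun main() {
    val source = mutableListOf("a")
    val sample = Sample {
        sampleField = "some-string"
        tags = source
    }
    source += "b" // does not affect the already-built message
    val sample2 = sample.copy { sampleField = "some-other-string" }
    println(sample.tags)         // [a]
    println(sample2.sampleField) // some-other-string
}
```

Because callers only ever name the properties they set, adding or reordering fields in the definition does not break existing call sites.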
Runtime Modules
The protokt runtime is split into modules so that external dependencies are opt-in.
protokt-runtime (core)
Zero external dependencies. Contains implementations for serializing and deserializing
to byte arrays.
protokt-runtime-protobuf-java
Provides ProtobufJavaCodec, implementing serialization and deserialization to and from
OutputStream and InputStream.
dependencies {
implementation("com.toasttab.protokt.v1:protokt-runtime-protobuf-java:<version>")
}
protokt-runtime-kotlinx-io
Provides KotlinxIoCodec, implementing serialization and deserialization to and from
Source and Sink. On the JVM it also supports InputStream and OutputStream.
Provides OptimalKmpCodec, which blends ProtoktCodec for byte-array paths with
KotlinxIoCodec for streaming paths. Best for multiplatform projects.
dependencies {
implementation("com.toasttab.protokt.v1:protokt-runtime-kotlinx-io:<version>")
}
protokt-runtime-protobufjs
Provides ProtobufJsCodec for Kotlin/JS targets using the protobufjs npm package.
protokt-runtime-persistent-collections
Provides PersistentCollectionFactory backed by kotlinx-collections-immutable. See
Persistent collections below.
dependencies {
implementation("com.toasttab.protokt.v1:protokt-runtime-persistent-collections:<version>")
}
Codec DSL
The codec DSL controls which codec module the plugin adds to your project's
dependencies. The default is optimal(), which auto-detects the best codec
based on your Kotlin plugin type:
protokt {
codec {
optimal()
}
}
If you are generating a library for export, minimal() is likely the right
choice so that consumers can select their own codec without extra dependencies.
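For a library build, the configuration would presumably mirror the optimal() example above with minimal() substituted. This fragment assumes minimal() is invoked inside the same codec block:

```kotlin
// build.gradle.kts of a library project
protokt {
    codec {
        // Assumed usage: leave codec selection to the consumer.
        minimal()
    }
}
```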
Codec selection
At runtime on JVM, the best available codec is auto-detected from the classpath
in the following order:
OptimalJvmCodec (from protokt-runtime-protobuf-java)
You can override auto-detection with a system property or environment variable:
-Dprotokt.v1.codec=protokt.v1.ProtobufJavaCodec
or
PROTOKT_V1_CODEC=protokt.v1.ProtobufJavaCodec
The codec class is loaded reflectively and must be a Kotlin object implementing Codec.
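To illustrate why the class must be a Kotlin object, here is a minimal sketch of how such reflective loading can work on the JVM. It is not protokt's implementation: the Codec interface and FakeCodec below are hypothetical stand-ins, and checking the system property before the environment variable is an assumption.

```kotlin
// Hypothetical stand-ins; these are not protokt's real types.
interface Codec

object FakeCodec : Codec

// Assumed precedence: system property first, then environment variable.
fun configuredCodecName(): String? =
    System.getProperty("protokt.v1.codec") ?: System.getenv("PROTOKT_V1_CODEC")

// A Kotlin `object` compiles to a class with a public static INSTANCE field,
// so it can be obtained without calling a constructor.
fun loadCodec(className: String): Codec {
    val instance = Class.forName(className).getField("INSTANCE").get(null)
    return instance as? Codec
        ?: error("$className is not a Kotlin object implementing Codec")
}

fun main() {
    System.setProperty("protokt.v1.codec", FakeCodec::class.java.name)
    val codec = configuredCodecName()?.let(::loadCodec)
    println(codec === FakeCodec) // true
}
```

A regular class has no INSTANCE field, so the lookup in this sketch fails with NoSuchFieldException.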
Available codecs:
See benchmarks/RESULTS.md for detailed performance
comparisons across codecs, protobuf-java, and Wire.
InputStream / OutputStream support
Message.serialize(OutputStream) and Deserializer.deserialize(InputStream) are available
on JVM but require a codec that implements JvmCodec. ProtoktCodec (the default) does
not implement JvmCodec; configure a codec that does in order to use these methods.
Runtime Notes
Package
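The Kotlin package of a generated file is the protobuf package prefixed with
protokt.v1. For example, messages declared in the protobuf package
toasttab.protokt.sample are generated into protokt.v1.toasttab.protokt.sample.
This scheme allows protokt-generated files to coexist on the classpath with
files generated by other compilers.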
Message
Each protokt message implements the KtMessage interface. KtMessage defines
the serialize() method and its overloads which can serialize to a byte array
or an OutputStream.
Each protokt message has a companion object Deserializer that implements the
KtDeserializer interface, which provides the deserialize() method and its
overloads to construct an instance of the message from a byte array, a Java
InputStream, or others.
Enums
Representation
Protokt represents enum fields as sealed classes with an integer value and name.
Protobuf enums cannot be represented as Kotlin enum classes since Kotlin enum
classes are closed and cannot represent unknown values. The Protocol Buffers
specification requires that unknown enum values are preserved for
reserialization, so this compromise enables exhaustive case switching while
allowing representation of unknown values.
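The trade-off can be sketched with a hand-written sealed class. This is an illustration of the pattern only, not protokt's generated output. The UNRECOGNIZED case and the from function are hypothetical names chosen for the example.

```kotlin
// Hand-written illustration of the pattern; not protokt's actual generated code.
sealed class ServingStatus(val value: Int, val name: String) {
    object UNKNOWN : ServingStatus(0, "UNKNOWN")
    object SERVING : ServingStatus(1, "SERVING")
    object NOT_SERVING : ServingStatus(2, "NOT_SERVING")

    // Carries any number that this version of the schema does not know about.
    class UNRECOGNIZED(value: Int) : ServingStatus(value, "UNRECOGNIZED")

    companion object {
        fun from(value: Int): ServingStatus =
            when (value) {
                0 -> UNKNOWN
                1 -> SERVING
                2 -> NOT_SERVING
                else -> UNRECOGNIZED(value)
            }
    }
}

// The compiler checks that every subclass is handled; no `else` branch is needed.
fun describe(status: ServingStatus): String =
    when (status) {
        is ServingStatus.UNKNOWN -> "unknown"
        is ServingStatus.SERVING -> "serving"
        is ServingStatus.NOT_SERVING -> "not serving"
        is ServingStatus.UNRECOGNIZED -> "unrecognized (${status.value})"
    }

fun main() {
    println(describe(ServingStatus.from(1))) // serving
    println(describe(ServingStatus.from(7))) // unrecognized (7)
}
```

An unknown number survives a round trip because the UNRECOGNIZED instance retains the original integer.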
Naming
To keep enums ergonomic while promoting protobuf best practices, enums that have
all values
prefixed with the enum type name
will have that prefix stripped in their Kotlin representations.
Reflection
Descriptors
Protokt generates and embeds descriptors for protobuf files in its output by default. Generation can be disabled
while using the lite runtime:
protokt {
generate {
descriptors = false
}
}
Interop with protobuf-java
Protokt includes utilities to reflectively
(i.e., no-copy) convert a protokt.v1.Message to a com.google.protobuf.Message. Conversion requires that you specify
the RuntimeContext of your proto files. If you would like to scan your classpath for all known descriptors at runtime,
you may use Protokt's GeneratedFileDescriptor annotation to do so:
Collections DSL
The collections DSL controls which collection factory module the plugin adds
to your project's dependencies. The default is persistent(), which adds
protokt-runtime-persistent-collections:
protokt {
collections {
persistent()
}
}
As with the codec DSL, minimal() is likely the right choice for libraries to
avoid imposing a collection implementation on consumers.
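For a library build, the fragment would presumably look like the persistent() example with minimal() substituted. This assumes minimal() is invoked inside the same collections block:

```kotlin
// build.gradle.kts of a library project
protokt {
    collections {
        // Assumed usage: do not add a collection factory module for consumers.
        minimal()
    }
}
```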
Persistent collections
Without persistent collections, deserialized repeated and map fields use unmodifiable
collections that are expensive to copy. When you use the copy {} DSL to
append to pre-populated collections (e.g. field = field + element), each +
copies the entire collection, costing O(n) per append.
With the default persistent() collections setting, the backing implementation
uses PersistentList and PersistentMap from
kotlinx-collections-immutable.
These use tree-based structural sharing so that + inside a copy {} block
runs in O(log n) instead of O(n).
At runtime on JVM, PersistentCollectionFactory is auto-detected from the
classpath if present; otherwise DefaultCollectionFactory is used.
Workloads that incrementally build up repeated or map fields via copy {} on messages that
already have large collections may benefit from this option. Benchmarks show
up to 47x speedup on list appends and 150x on map puts for pre-populated
messages.
Deserialization is ~5-7% slower for large messages because persistent list construction has
more overhead than regular mutable lists. Serialization is marginally slower for large messages.
If your workload is dominated by deserialize-then-read-only access patterns, use
collections { default() } for the built-in unmodifiable collections.
You can override auto-detection with a system property or environment variable:
-Dprotokt.v1.collection.factory=protokt.v1.PersistentCollectionFactory
or
PROTOKT_V1_COLLECTION_FACTORY=protokt.v1.PersistentCollectionFactory
You can also supply any custom CollectionFactory implementation by fully
qualified class name (the class must be a Kotlin object).
Other Notes
Extensions
See extension options defined in
protokt.proto.
See examples of each option in the options
project. All protokt-specific options require importing protokt/v1/protokt.proto
in the protocol file.
Wrapper Types
Sometimes a field on a protobuf message corresponds to a concrete nonprimitive
type. In standard protobuf the user would be responsible for this extra
transformation, but the protokt wrapper type option allows specification of a
converter that will automatically encode and decode custom types to protobuf
types. Some standard types are implemented in extensions.
Wrap a field by invoking the (protokt.v1.property).wrap option:
message WrapperMessage {
google.protobuf.Timestamp instant = 1 [
(protokt.v1.property).wrap = "java.time.Instant"
];
}
Converters implement the
Converter
interface:
interface Converter<WireT : Any, KotlinT : Any> {
val wrapper: KClass<KotlinT>
val wrapped: KClass<WireT>
fun wrap(unwrapped: WireT): KotlinT
fun unwrap(wrapped: KotlinT): WireT
}
and protokt will reference the converter's methods to wrap and unwrap from
protobuf primitives:
object InstantConverter : AbstractConverter<Timestamp, Instant>() {
override fun wrap(unwrapped: Timestamp): Instant =
Instant.ofEpochSecond(unwrapped.seconds, unwrapped.nanos.toLong())
override fun unwrap(wrapped: Instant) =
Timestamp {
seconds = wrapped.epochSecond
nanos = wrapped.nano
}
}
All wrapper types use lazy conversion via LazyReference. The wire-form value
is stored at deserialization time and only converted to the Kotlin type on first
access. This means:
- Deserialization never invokes the converter's
wrap() — it only reads the
raw wire value
- Serialization uses
wireValue() to write the original wire form without
re-encoding
- Conversion exceptions (e.g. malformed data) are deferred to the point of
access rather than thrown during deserialization
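The deferred behavior listed above can be sketched with a small holder class. LazyRef is a hypothetical stand-in for the LazyReference mentioned in the text, and bytesToUuid is an illustrative converter; protokt's real classes may differ.

```kotlin
import java.nio.ByteBuffer
import java.util.UUID

// Hypothetical stand-in for LazyReference; protokt's real class may differ.
class LazyRef<WireT : Any, KotlinT : Any>(
    private val wire: WireT,
    private val convert: (WireT) -> KotlinT
) {
    // Converted once, on first successful access.
    val value: KotlinT by lazy { convert(wire) }

    // The original wire form, available without running the converter.
    fun wireValue(): WireT = wire
}

// Illustrative converter that rejects anything other than exactly 16 bytes.
fun bytesToUuid(bytes: ByteArray): UUID {
    require(bytes.size == 16) { "UUID source must have size 16; had ${bytes.size}" }
    val buf = ByteBuffer.wrap(bytes)
    return UUID(buf.long, buf.long)
}

fun main() {
    // Holding malformed data succeeds because nothing is converted yet.
    val ref = LazyRef(ByteArray(0), ::bytesToUuid)
    println(ref.wireValue().size) // 0

    // The failure surfaces only when the converted value is read.
    val failure = runCatching { ref.value }.exceptionOrNull()
    println(failure is IllegalArgumentException) // true
}
```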
Each converter must be registered in a
META-INF/services/protokt.v1.Converter
classpath resource following the standard ServiceLoader convention. For
example, Google's AutoService
can register converters with an annotation:
@SuppressWarnings("rawtypes")
@AutoService(Converter::class)
object InstantConverter : Converter<Timestamp, Instant> { ... }
If the wrapper type is in the same package as the generated protobuf message,
then it does not need a fully-qualified name. Custom wrapper type converters can
be in the same project as protobuf types that reference them. In order to use any
wrapper type defined in extensions, the project must be included as a
dependency:
dependencies {
protoktExtensions("com.toasttab.protokt.v1:protokt-extensions:<version>")
}
Wrapper types that wrap protobuf messages are nullable because a message field
is unset unless it is present on the wire. For example,
java.time.Instant wraps the well-known type google.protobuf.Timestamp and
generates val instant: Instant?. You can generate non-null accessors with the
generate_non_null_accessor option described below.
Wrapper types that wrap protobuf primitives are non-null. For example,
java.util.UUID wrapping bytes generates val uuid: UUID. Conversion from
the wire default value (e.g. empty bytes) to the Kotlin type is deferred until
the property is accessed, so a converter that rejects default values (like
UuidBytesConverter which requires exactly 16 bytes) will throw at access time
rather than at deserialization time:
bytes uuid = 1 [
(protokt.v1.property).wrap = "java.util.UUID"
];
// generates: val uuid: UUID (non-null, lazily converted)
optional primitive wrapper fields are nullable — null represents absence:
optional bytes optional_uuid = 2 [
(protokt.v1.property).wrap = "java.util.UUID"
];
// generates: val optionalUuid: UUID? (null when absent)
Well-known wrapper message types (e.g. BytesValue) are also nullable:
google.protobuf.BytesValue nullable_uuid = 3 [
(protokt.v1.property).wrap = "java.util.UUID"
];
// generates: val nullableUuid: UUID? (null when absent)
As for message types, you can generate non-null accessors with the
generate_non_null_accessor option.
Wrapper types can be repeated:
repeated bytes uuid = 1 [
(protokt.v1.property).wrap = "java.util.UUID"
];
And they can also be used for map keys and values:
map<string, protokt.ext.InetSocketAddress> map_string_socket_address = 1 [
(protokt.v1.property).key_wrap = "StringBox",
(protokt.v1.property).value_wrap = "java.net.InetSocketAddress"
];
Wrapper types should be immutable. If a wrapper type is defined in the same
package as the generated protobuf message that uses it, then it does not need to
be referenced by its fully-qualified name and instead can be referenced by its
simple name, as done with StringBox in the map example above.
N.b. Well-known type nullability is implemented with
predefined wrapper types
for each message defined in
wrappers.proto.
Non-null accessors
If a message has no meaning whatsoever when a particular non-scalar field is
missing, you can emulate proto2's required keyword by using the
(protokt.v1.property).generate_non_null_accessor option:
message Sample {}
message NonNullSampleMessage {
Sample non_null_sample = 1 [
(protokt.v1.property).generate_non_null_accessor = true
];
}
Generated code will include a non-null accessor prefixed with require, so the field can be referenced
without using Kotlin's !!.
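As a rough picture of what such an accessor does, here is a hand-written sketch. The property name requireNonNullSample follows the require prefix described above, but the exact generated code and the error message are assumptions.

```kotlin
// Hand-written sketch; not protokt's actual generated code.
class Sample

class NonNullSampleMessage(val nonNullSample: Sample?) {
    // Assumed shape: throws if the field was never set, otherwise returns it as non-null.
    val requireNonNullSample: Sample
        get() = requireNotNull(nonNullSample) { "nonNullSample is not set" }
}

fun main() {
    val present = NonNullSampleMessage(Sample())
    val sample: Sample = present.requireNonNullSample // no !! needed
    println(sample === present.nonNullSample) // true

    val absent = NonNullSampleMessage(null)
    println(runCatching { absent.requireNonNullSample }.isFailure) // true
}
```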
This option also works on oneof fields:
message Sample {}
message NonNullSampleMessage {
oneof non_null_sample {
option (protokt.v1.oneof).generate_non_null_accessor = true;
Sample sample = 1;
}
}
Interface implementation
Messages
To avoid the need to create domain-specific objects from protobuf messages you
can declare that a protobuf message implements a custom interface with
properties and default methods.
package com.protokt.sample
interface Model {
val id: String
}
package protokt.sample;
message ImplementsSampleMessage {
option (protokt.v1.class).implements = "Model";
string id = 1;
}
Like wrapper types, if the implemented interface is in the same package as the
generated protobuf message that uses it, then it does not need to be referenced
by its fully-qualified name. Implemented interfaces cannot be used by protobuf
messages in the same project that defines them; the dependency must be declared
with protoktExtensions in build.gradle.kts:
dependencies {
protoktExtensions(project(":api-project"))
}
Messages can also implement interfaces by delegation to one of their fields;
in this case the delegated interface need not live in a separate project, as
protokt requires no inspection of it:
message ImplementsWithDelegate {
option (protokt.v1.class).implements = "Model2 by modelTwo";
ImplementsModel2 model_two = 1 [
(protokt.v1.property).generate_non_null_accessor = true
];
}
Note that the by clause references the field by its lower camel case name.
Properties on delegate interfaces must be nullable since fields themselves
may not be present on the wire.
Oneof Fields
Oneof fields can declare that they implement an interface with the
(protokt.v1.oneof).implements option. Each possible field type of the oneof must
also implement the interface. This allows access of common properties without a
when statement that always ultimately extracts the same property.
Suppose you have a domain object MyObjectWithConfig that has a configuration
that specifies a third-party server for communication. For flexibility, this
configuration will be modifiable by the server and versioned by a simple integer.
To hasten subsequent loading of the configuration, a client may save a resolved
version of the configuration with the same version and an additional field
storing an InetAddress representing the location of the server. Since the
server address may change over time, the client-resolved version of the config will
retain a copy of the original server-specified config. We can model this domain with protokt:
Given the Config interface:
package com.toasttab.example
interface Config {
val version: Int
}
And protobuf definitions:
syntax = "proto3";
package toasttab.example;
import "protokt/v1/protokt.proto";
message MyObjectWithConfig {
bytes id = 1 [
(protokt.v1.property).wrap = "java.util.UUID"
];
oneof Config {
option (protokt.v1.oneof).implements = "com.toasttab.example.Config";
ServerSpecified server_specified = 2;
ClientResolved client_resolved = 3;
}
}
message ServerSpecified {
option (protokt.v1.class).implements = "com.toasttab.example.Config";
int32 version = 1;
string server_registry = 2;
string server_name = 3;
}
message ClientResolved {
option (protokt.v1.class).implements = "com.toasttab.example.Config by config";
ServerSpecified config = 1;
bytes last_known_address = 2 [
(protokt.v1.property).wrap = "java.net.InetAddress"
];
}
Protokt will generate:
A MyObjectWithConfig.Config instance can be queried for its version without
accessing the property via a when expression:
fun printVersion(config: MyObjectWithConfig.Config?) {
println(config?.version)
}
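The idea of a oneof whose cases all expose a shared interface can be modeled by hand as follows. This is a simplified sketch, not protokt's generated code: ConfigCase, Server, Client, and versionOf are hypothetical names, and the message classes are reduced to plain classes with string fields.

```kotlin
interface Config {
    val version: Int
}

// Simplified stand-ins for the generated message classes.
class ServerSpecified(override val version: Int, val serverName: String) : Config
class ClientResolved(override val version: Int, val lastKnownAddress: String) : Config

// Hand-written model of a oneof in which every case exposes the shared interface.
sealed class ConfigCase : Config {
    class Server(val serverSpecified: ServerSpecified) : ConfigCase() {
        override val version: Int get() = serverSpecified.version
    }

    class Client(val clientResolved: ClientResolved) : ConfigCase() {
        override val version: Int get() = clientResolved.version
    }
}

// The oneof may be unset, so callers receive a nullable reference.
fun versionOf(config: ConfigCase?): Int? = config?.version

fun main() {
    println(versionOf(ConfigCase.Server(ServerSpecified(3, "registry-a")))) // 3
    println(versionOf(ConfigCase.Client(ClientResolved(4, "10.0.0.1"))))    // 4
    println(versionOf(null))                                               // null
}
```

No when expression is needed at the call site because the sealed parent type already declares the common property.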
BytesSlice
When reading messages that contain other serialized messages as bytes fields,
protokt can keep a reference to the originating byte array to prevent a large
copy operation on deserialization. This can be desirable when the wrapping
message is short-lived or a thin metadata shim and doesn't include much memory
overhead:
message SliceModel {
int64 version = 1;
bytes encoded_message = 2 [
(protokt.v1.property).bytes_slice = true
];
}
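The memory trade-off can be illustrated with a small view type over a byte array. Slice is a hypothetical class written for this example; protokt's real BytesSlice API may differ. The sample bytes are a proto3 encoding of SliceModel with version 1 and a three-byte encoded_message.

```kotlin
// Hypothetical view type; protokt's real BytesSlice API may differ.
class Slice(private val array: ByteArray, private val offset: Int, val length: Int) {
    init {
        require(offset >= 0 && length >= 0 && offset + length <= array.size) {
            "slice is out of bounds"
        }
    }

    // Reads through to the shared backing array.
    operator fun get(index: Int): Byte {
        require(index in 0 until length) { "index $index out of bounds" }
        return array[offset + index]
    }

    // Copies only when the caller explicitly asks for an independent array.
    fun toByteArray(): ByteArray = array.copyOfRange(offset, offset + length)
}

fun main() {
    // Field 1 (varint): tag 8, value 1. Field 2 (bytes): tag 18, length 3, payload.
    val message = byteArrayOf(8, 1, 18, 3, 10, 20, 30)
    val payload = Slice(message, offset = 4, length = 3) // no copy made here
    println(payload[0])                     // 10
    println(payload.toByteArray().toList()) // [10, 20, 30]
}
```

The cost of this approach is that the whole originating array stays reachable for as long as the slice does, which is why it suits short-lived wrapping messages.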
gRPC code generation
Protokt will generate variations of code for gRPC method and service descriptors
when the gRPC generation options are enabled:
protokt {
generate {
grpcDescriptors = true
grpcKotlinStubs = true
}
}
The options can be enabled independently of each other.
Generated gRPC code
grpcDescriptors
Consider gRPC's canonical Health service:
syntax = "proto3";
package grpc.health.v1;
message HealthCheckRequest {
string service = 1;
}
message HealthCheckResponse {
enum ServingStatus {
UNKNOWN = 0;
SERVING = 1;
NOT_SERVING = 2;
}
ServingStatus status = 1;
}
service Health {
rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
}
In addition to the request and response types, protokt will generate a service
descriptor and method descriptors for each method on the service:
Both grpc-java and grpc-kotlin expose server stubs for implementation via
abstract classes.
grpcKotlinStubs and gRPC's Kotlin API
Protokt uses grpc-kotlin to generate Kotlin coroutine-based stubs that compile
against protokt's generated types.
Integrating with gRPC's Java API
A gRPC service using grpc-java (and therefore using StreamObservers for
asynchronous communication):
abstract class HealthCheckService : BindableService {
override fun bindService() =
ServerServiceDefinition.builder(serviceDescriptor)
.addMethod(checkMethod, asyncUnaryCall(::check))
.build()
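open fun check(request: HealthCheckRequest, responseObserver: StreamObserver<HealthCheckResponse>): Unit =
throw UNIMPLEMENTED.asException()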
}
Calling methods from a client:
fun checkHealth(): HealthCheckResponse =
ClientCalls.blockingUnaryCall(
channel.newCall(HealthGrpc.checkMethod, CallOptions.DEFAULT),
HealthCheckRequest { service = "foo" }
)
Integrating with gRPC's NodeJS API
Protokt generates complete server and client stub implementations for use with NodeJS.
The generated implementations are nearly the same as those generated by grpc-kotlin and
are supported by an analogous runtime library in ServerCalls and ClientCalls objects.
These implementations are alpha-quality and for demonstration only; external contributions
to harden them are welcome. Code generation is controlled by the same grpcDescriptors and
grpcKotlinStubs plugin options.
Integrating with kotlinx-rpc
Protokt can generate @Grpc-annotated interfaces for use with JetBrains'
kotlinx-rpc compiler plugin. The
kotlinx-rpc compiler plugin synthesizes stubs, service descriptors, and method
descriptors from these interfaces at compile time.
Enable generation with the grpcKrpc option:
protokt {
generate {
grpcKrpc = true
}
}
The project must also apply the kotlinx-rpc Gradle plugin and depend on
kotlinx-rpc-grpc-client/kotlinx-rpc-grpc-server. See the
grpc-krpc example for a complete working setup.
Protokt messages are bridged to kotlinx-rpc's marshaller system via kotlinx-io
serialization. On JVM, kotlinx-rpc delegates to grpc-java for transport. On
Kotlin/Native, kotlinx-rpc uses gRPC C Core via cinterop.
This integration is experimental and tracks kotlinx-rpc's dev preview releases.
Protovalidate integration
Add the protokt-protovalidate dependency, build a Validator, load descriptors, and
validate messages.
import protokt.v1.buf.validate.Validator
val validator = Validator()
foo_file_descriptor
.toProtobufJavaDescriptor()
.messageTypes
.forEach(validator::load)
val result = validator.validate(instanceOfFoo)
Build a gRPC interceptor following the example of protovalidate-java.
IntelliJ integration
If IntelliJ doesn't automatically detect the generated files as source files,
you may be missing the idea plugin. Apply the idea plugin to your Gradle
project:
plugins {
idea
}
Command line code generation
protokt % ./gradlew assemble
protokt % protoc \
--plugin=protoc-gen-custom=protokt-codegen/build/install/protoc-gen-protokt/bin/protoc-gen-protokt \
--custom_out=<output-directory> \
-I<path-to-proto-file-containing-directory> \
-Iprotokt-runtime/src/main/resources \
<path-to-proto-file>.proto
For example, to generate files in protokt/foo from a file called test.proto
located at protokt/test.proto:
protokt % protoc \
--plugin=protoc-gen-custom=protokt-codegen/build/install/protoc-gen-protokt/bin/protoc-gen-protokt \
--custom_out=foo \
-I. \
-Iprotokt-runtime/src/main/resources \
test.proto
Contribution
Community contributions are welcome. See the
contribution guidelines and the project
code of conduct.
To enable rapid development of the code generator, the protobuf conformance
tests have been compiled and included in the testing project. They run on
macOS and Linux as part of normal Gradle builds.
Integration tests run on Linux, macOS, and Windows across Kotlin 2.1-2.3 and
JDK 17, 21, and 25.
When integration testing the Gradle plugin, note that after changing the plugin
and republishing it to the integration repository, ./gradlew clean is needed
to trigger regeneration of the protobuf files with the fresh plugin.
Acknowledgements
Authors
Ben Gordon,
Andrew Parmet,
Oleg Golberg,
Frank Moda,
Romey Sklar, and
everyone in the commit history.
Thanks to the Google Kotlin team for their
Kotlin API Design
which inspired the DSL builder implemented in this library.