Modeling Simple Data in Swift: struct versus typealias

There are a couple of ways to model data in Swift, and which one to pick depends on your needs and your API design. A struct is often a good choice, no matter how simple the data may be.

Sometimes the data is extremely simple. In the case of the POSIX socket API wrapper we came across in my previous blog post, it was as simple as defining a new type to represent a single byte, i.e. eight bits. It doesn’t get much more basic than that.

To recap, the C function for sending data over a socket takes an UnsafePointer<Void> to a chunk of memory, plus an Int for the length indicating the number of bytes it should send:

func send(socket: Int32, buffer: UnsafePointer<Void>, length: Int, flags: Int32) -> Int

On the other hand, our Socket.send method wrapped that function like this:

struct Socket {
    func send(buffer: UnsafeBufferPointer<Byte>, flags: Int32 = 0) throws -> Int
}

This blog post explains what Byte is and why it is the way it is.

Void of Information

In C APIs, what I don’t like about a void * parameter – UnsafePointer<Void> in Swift – is that it carries no information about what the pointer points to. That is often intentional, of course, because the function that takes a void * simply doesn’t care. An additional length parameter defines the number of bytes that should be read from the memory the pointer points to. But how do we know that it’s the number of bytes and not the number of word-sized things, i.e. 32 or 64 bits depending on the architecture? After all, a void * is word-sized, and that is the type of the preceding parameter.

Actually, I don’t remember how I came to know about this fact the first time I dealt with such a function. It sure wasn’t the send() man page because literally all it says on this topic is: “The length of the message is given by length”. Thank you, man page, how very zen of you.
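To make the ambiguity concrete, here is a minimal sketch of how the same chunk of memory reports a different "length" depending on whether we look at it through a typed or an untyped pointer. It is written against current Swift rather than the Swift 2.2 used elsewhere in this post, and the `words` array is purely a hypothetical example value.

```swift
// Four 32-bit values: 4 elements, but 16 bytes of storage.
let words: [UInt32] = [1, 2, 3, 4]

// Typed view: `count` is the number of UInt32 elements.
let elementCount = words.withUnsafeBufferPointer { $0.count }

// Untyped (raw) view, the closest analogue to a `void *` plus a length:
// `count` is the number of bytes.
let byteCount = words.withUnsafeBytes { $0.count }

print(elementCount) // 4
print(byteCount)    // 16
```

A bare length parameter next to an untyped pointer could mean either of those two numbers, which is exactly the information I would like the signature to carry.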

Maybe it’s just me, but I find this lack of information in the function signature unsatisfying, if not confusing. That’s why I want our socket wrapper API to be more expressive: It should clearly state that it’s working on a bunch of bytes. So we need a Byte type.

Representing a Byte

Since the concept of a byte doesn’t necessarily have anything to do with negative numbers, a common choice for representing it – regardless of the programming language – is by using an unsigned variant of an eight bit integer, that is UInt8 in Swift.

So we want a UInt8 to mean “a byte” in our API. How can we communicate that in a Swift API that takes an UnsafePointer to the bytes plus a length parameter? Three possibilities come to mind:

  • Using UInt8 in the function signature, and documenting it using parameter names and comments:
/// Does work on the given bytes.
///
/// - Parameter pointerToBytes: The bytes to work on.
/// - Parameter length: The number of bytes `pointerToBytes` points to.
func doWorkWithBytes(pointerToBytes: UnsafePointer<UInt8>, length: Int)
  • Using a typealias:
typealias Byte = UInt8

func doWorkWithBytes(pointerToBytes: UnsafePointer<Byte>, length: Int)
  • Using a single field struct:
struct Byte {
    var value: UInt8
}

func doWorkWithBytes(pointerToBytes: UnsafePointer<Byte>, length: Int)

(We should document the code in any case, of course.)

Option 1 isn’t really interesting because it relies on documentation instead of actual types to convey information. I think we can do better than that. Option 2 only introduces a new name for an existing type, while option 3 defines a completely new type. They both lead to the same function signature, though.

Let’s explore these typealias versus struct approaches to find out which one suits our needs better.

struct versus typealias

Taking a step back from the UnsafePointer<Byte> and just focusing on the Byte type itself, what are the implications of using a single field struct versus a typealias?

First up, typealias:

typealias Byte = UInt8

// Creating a `Byte` instance:
let x1 = Byte(42) // ✅ Using initializer.
let x2: Byte = 42 // ✅ Using IntegerLiteralConvertible conformance.

// Protocol conformances of `UInt8` are also available to `Byte`:
Byte(42) == Byte(42) // ✅ Equatable.
Byte(42) > Byte(100) // ✅ Comparable.

// Given an existing function that expects a `UInt8` in a library we can't modify,
// passing a `Byte` works just fine:
func foo(bar: UInt8) { ... }
foo(x1) // ✅
foo(x2) // ✅

Looks good so far. Next, struct:

struct Byte {
    var value: UInt8

    init(_ value: UInt8) {
        self.value = value
    }
}

// Creating a `Byte` instance:
let x = Byte(42) // ✅ Using initializer.
let y: Byte = 42 // ⛔️ "Cannot convert value of type 'Int' to specified type 'Byte'."

// Protocol conformances of `UInt8` are not available to `Byte`:
Byte(42) == Byte(42) // ⛔️ "Binary operator '==' cannot be applied to two 'Byte' operands."
Byte(42) > Byte(100) // ⛔️ "Binary operator '>' cannot be applied to two 'Byte' operands."

// Given an existing function that expects a `UInt8` in a library we can't modify,
// passing a `Byte` does not work:
func foo(bar: UInt8) { ... }
foo(x) // ⛔️ "Cannot convert value of type 'Byte' to expected argument type 'UInt8'."
foo(x.value) // ⚠️ Works but requires access to the internals of `Byte`.

Well, that doesn’t look so good. By introducing a new type, we start from scratch in terms of protocol conformances. It may be trivial to add conformances to protocols such as IntegerLiteralConvertible, Equatable, or Comparable, but it’s still code we need to write. Even worse, if we need to pass the actual value to another function, we have to access the struct’s field directly, which may not even be possible, e.g. if it is marked as private.
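For a sense of how much code "trivial" means here, the following is a rough sketch of those three conformances. It assumes current Swift, where IntegerLiteralConvertible has been renamed to ExpressibleByIntegerLiteral, and it is only one possible way of writing them.

```swift
struct Byte {
    var value: UInt8

    init(_ value: UInt8) {
        self.value = value
    }
}

// Allows `let b: Byte = 42`.
extension Byte: ExpressibleByIntegerLiteral {
    init(integerLiteral value: UInt8) {
        self.init(value)
    }
}

// Allows `==` and `!=` by forwarding to the wrapped value.
extension Byte: Equatable {
    static func == (lhs: Byte, rhs: Byte) -> Bool {
        return lhs.value == rhs.value
    }
}

// Allows `<`, `>`, `<=`, and `>=` by forwarding to the wrapped value.
extension Byte: Comparable {
    static func < (lhs: Byte, rhs: Byte) -> Bool {
        return lhs.value < rhs.value
    }
}

let b: Byte = 42
let isEqual = (b == Byte(42))    // true
let isGreater = (b > Byte(100))  // false
print(isEqual, isGreater)
```

None of this is hard, but every operation we want to carry over from UInt8 has to be forwarded by hand like this.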

(Fun fact: The exact same problems would come up if this were written in C. There, the typealias would be typedef uint8_t Byte or something like that.)

Are these implications a deal breaker? It depends. The thing we want to model for our socket API is a byte and comparing the value of one of those to that of other bytes or checking it for equality to a given value makes sense.

But if the thing we’re modelling were something else, perhaps a randomly generated unique ID named MyUniqueID that is represented by a 64 bit number (UInt64), then an equality check makes sense, while checking if one is greater than another might not make sense (YMMV). In such a scenario introducing a new type with all its implications could be precisely what is desired.

One other thing to consider: If we wanted to add a method to the thing we’re modelling, does it make sense to add that method to all instances of the existing type, too? To stick with the above example, if we added a computed property named MyUniqueID.isValid that checked whether the ID satisfies predefined criteria (e.g. whether a checksum matches), that would definitely not make sense on UInt64, especially not with such a generic name. Then, we’d want to use a struct instead of typealias for MyUniqueID.
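Here is a sketch of what such a type could look like in current Swift. The checksum rule is entirely made up for illustration (the lowest byte must equal the XOR of the seven bytes above it), and `rawValue` is a hypothetical name for the wrapped number.

```swift
struct MyUniqueID: Equatable {
    // Hidden on purpose: callers cannot treat the ID as a plain number.
    private let rawValue: UInt64

    init(_ rawValue: UInt64) {
        self.rawValue = rawValue
    }

    /// Hypothetical validity rule: the lowest byte must equal
    /// the XOR of the seven higher bytes.
    var isValid: Bool {
        var checksum: UInt8 = 0
        for shift in stride(from: 8, to: 64, by: 8) {
            checksum ^= UInt8(truncatingIfNeeded: rawValue >> UInt64(shift))
        }
        return checksum == UInt8(truncatingIfNeeded: rawValue)
    }
}

let first = MyUniqueID(0x0101)
let second = MyUniqueID(0x0102)

print(first == second)  // false
print(first.isValid)    // true
print(second.isValid)   // false
// first < second       // Does not compile: there is no Comparable conformance.
```

Equality comes with the Equatable conformance, ordering is deliberately absent, and isValid lives on MyUniqueID only, so plain UInt64 values stay untouched.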

By the way, the Socket type introduced in the previous blog post is a struct with only a single property (the file descriptor). The preceding two paragraphs here also explain the reason why Socket is a struct and not a typealias.

The Pointee End

Back to the issue at hand: What are the differences between these two approaches when we don’t use Byte values directly, but only ever use UnsafePointer<Byte>?

As an example, let’s try to access a String as a C string and send its bytes over a socket using our Socket.send method:

let socket: Socket = ...
let message = "Valar Morghulis"
try message.withCString { (pointer: UnsafePointer<Int8>) in
    let bytePointer = UnsafePointer<Byte>(pointer) // Convert pointer.
    let byteCount = Int(strlen(pointer))
    let buffer = UnsafeBufferPointer<Byte>(start: bytePointer, count: byteCount)
    try socket.send(buffer)
}

I intentionally didn’t say whether Byte in the above code is of the typealias or the struct variant – because it doesn’t matter. Swift doesn’t even care that String’s C string representation uses Int8 while our Byte uses UInt8.

That effectively tells us what we wanted to know: Any UnsafePointer<T> can be converted to any other UnsafePointer<U> simply by passing the value of the one to the initializer of the other. The pointee doesn’t matter. Therefore, all of the issues of the typealias versus struct discussion above are irrelevant when we’re dealing with pointers to those entities.
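A note for readers on newer toolchains: the converting initializer used above is Swift 2.2 syntax. In current Swift the same reinterpretation is spelled withMemoryRebound(to:capacity:). The sketch below uses that call to view the same C string through both flavors of Byte. The names AliasByte and StructByte are hypothetical and exist only so both variants fit in one file, and the struct case relies on the assumption that a single-field struct has the same memory layout as its field.

```swift
typealias AliasByte = UInt8

struct StructByte {
    var value: UInt8
}

// Assumption checked at runtime: the wrapper struct is exactly one byte wide.
precondition(MemoryLayout<StructByte>.stride == MemoryLayout<UInt8>.stride)

let message = "Valar Morghulis"

// Read the first byte of the C string through each of the two pointee types.
let (viaAlias, viaStruct) = message.withCString { (pointer: UnsafePointer<Int8>) -> (UInt8, UInt8) in
    let count = message.utf8.count
    let a = pointer.withMemoryRebound(to: AliasByte.self, capacity: count) { $0[0] }
    let s = pointer.withMemoryRebound(to: StructByte.self, capacity: count) { $0[0].value }
    return (a, s)
}

print(viaAlias)   // 86, the ASCII code of "V"
print(viaStruct)  // 86
```

Both variants see the same bytes, which matches the observation above that the choice of pointee type makes no difference at the pointer level.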

Which one to choose

To summarize: When working with values of Byte, it matters a lot whether it’s a struct or a typealias. But when we’re only dealing with UnsafePointer<Byte>, it seems to come down to personal preference. Maybe we should check the Swift standard library for precedents?

(All links to the Swift GitHub repository in this section point to the swift-2.2-RELEASE tag.)

Looking at the public interface, typealias comes up a lot, not just in relation to an associatedtype for a protocol conformance, but also for actually defining another name for an existing type at the global scope. Technically, one is even exactly what we’re looking for: typealias CUnsignedChar = UInt8. But I personally would prefer the type not be related to C by name. (Also, the whole char, int, long naming was always very unintuitive to me.) Regardless, case in point for going with typealias.

Then there are a couple of APIs related to dealing with C strings:

Those are two cases of using the concrete type directly and one of using a typealias.

(Why those do not uniformly use one type to represent the same thing, and why that type isn’t the one that is designed to represent a UTF-8 code unit, namely UTF8.CodeUnit – a typealias on the UTF8 struct – is a mystery to me.)

So far it looks like typealias is on the winning side. And if you’re in favor of struct Byte, there’s some good news and some bad news.

The good news is that after some digging around, there is something very similar buried in the Swift standard library. It’s even called RawByte, it is a struct and it has a single UInt8 field. Just like ours, awesome! That thing is used for pointer arithmetic on internals of the String implementation. Sounds like a good fit for what we’re doing.

Now the bad news: It’s deprecated and will be removed from the public interface in Swift 3. That doesn’t mean we can’t roll our own struct Byte but it’s a strong hint that this isn’t something that the Swift standard library wants to offer.

I took to the swift-users mailing list to ask about the reason for the deprecation of RawByte and what should be used to represent a single byte instead. Dmitri Gribenko (Twitter, GitHub), the Swift standard library engineer at Apple who committed the change that removed RawByte from the public interface, answered:

Using UInt8 or Int8 is recommended.

[The deprecation] comes from a preview implementation of SE-0006.

The first part is pretty clear. I still prefer a separate name for it, but that’s the beauty of typealias: It’s just another name while still being the same thing.
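As a small illustration of "the same thing", the sketch below (current Swift) passes Byte values to a function that only knows about UInt8. The checksum function is a hypothetical stand-in for any existing API written against the concrete type.

```swift
typealias Byte = UInt8

// Hypothetical existing API that knows nothing about `Byte`.
func checksum(_ bytes: [UInt8]) -> UInt8 {
    // `&+` is wrapping addition, so the sum cannot trap on overflow.
    return bytes.reduce(0) { $0 &+ $1 }
}

let payload: [Byte] = [1, 2, 3]

let sameType = (Byte.self == UInt8.self)
let sum = checksum(payload)  // No conversion needed.

print(sameType) // true
print(sum)      // 6
```

No conversion, no wrapper, and no extra conformances are involved, because the compiler sees only one type here.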

The part about the deprecation is less illuminating. The proposal lists a lot of code changes and lumps the RawByte removal together with many others under a point named “Miscellaneous changes.” Oh well.

The score: struct: -1, typealias: >0

Conclusion

In the end, I settled on typealias Byte = UInt8 for reasons that are hopefully clear by the end of this blog post. You may disagree and still prefer the struct Byte approach and that’s fine, too.

I hope the information presented here made for an interesting read. I know it’s a very long way of discussing something that could be answered pretty quickly, but I often find these kinds of fundamental questions fascinating. Also, I think most stories that involve a peek into the history of the Swift standard library’s source code and a mailing list answer by an Apple engineer can’t be that boring.