import Foundation
let data = Data([0xFF])
let i = data.withUnsafeBytes { (ptr: UnsafePointer<Int8>) in
ptr.pointee
print(i) // -1
Looking at the source , assumingMemoryBound(to:)
is used in order to get the typed pointer to the underlying bytes – however AFAIK that means that T
and UInt8
must be both be related types, and UInt8
must be layout compatible with T
(and for withUnsafeMutableBytes(_:)
this would need to extend to mutual layout compatibility).
From that I would conclude that the above example exhibits undefined behaviour due to the violation of strict aliasing, assuming that Int8
and UInt8
are unrelated types. Is this correct?
If so, it would be great if the documentation for both withUnsafeBytes(_:)
and withUnsafeMutableBytes(_:)
could be updated to make the user aware of this very easy to fall into pit (especially given the fact the user doesn't always have to spell out the type T
explicitly, it can be inferred).
I did notice a proposal from December 2016 that aimed to replace withUnsafeBytes(_:)
's UnsafePointer<T>
argument with an UnsafeRawBufferPointer
(note the link in the post is broken, you have to expand the body), which I would support – but it was deferred until sometime after Swift 4. Now that we're in the window for Swift 5, would now be a good time to re-visit this proposal?
Strictly speaking, Foundation should probably use withMemoryRebound(to:capacity:)
rather than assumingMemoryBound(to:)
, but that's on us, not the user. It's not supposed to be required for the user to write this, and it should not be documented as a pitfall.
(@Andrew_Trick , @Philippe_Hausler ?)
Thanks, though even with withMemoryRebound(to:capacity:)
, I believe something like this, which still looks fairly innocuous, would be undefined:
let data = Data([0xFF, 0xFF])
let j = data.withUnsafeBytes { (ptr: UnsafePointer<UInt16>) in
ptr.pointee
as withMemoryRebound
requires a type with the same size and alignment. Is it intentional for the above usage of withUnsafeBytes
to be undefined? (or is it another case of being an implementation detail?).
Regardless, I really do think the documentation in general for both withUnsafeBytes
and withUnsafeMutableBytes
needs to be updated specifying what preconditions are needed for the type T
in order for them to work in a well-defined manner.
Hamish is right. Thank you for pointing this out.
Data.withUnsafeBytes(_)
cannot use withMemoryReboundTo(to:capacity:)
because that requires a typed pointer to begin with. Data is a raw byte buffer so has no idea how the memory may have previously been typed.
Originally, the Data.withUnsafeBytes
used bindMemory
, which avoided undefined
behavior in the above example. (It still encouraged undefined behavior
when using Data(bytesNoCopy:)
though). For some reason, that was changed to assumingMemoryBound(to:)
. I either didn't notice that change or can't remember it.
FYI: here's an open bug on fixing the Foundation.Data API. Feel free to file more, particularly one to change the implementation back to bindMemory
!
github.com/apple/swift-corelibs-foundation
| | |
|------------------|-----------------|…
|Previous ID | SR-5363 |
|Radar | rdar://28201288 |
|Original Reporter | @atrick |
|Type | Bug |
|Status | Closed |
|Resolution | Duplicate |
<details>
<summary>Additional Detail from JIRA</summary>
| | |
|------------------|-----------------|
|Votes | 1 |
|Component/s | Foundation, Standard Library |
|Labels | Bug |
|Assignee | None |
|Priority | Medium |
md5: 47a2b35ea104f50c0451e7d810cc42b6
</details>
**cloned to**:
* [SR-9720](https://bugs.swift.org/browse/SR-9720) Foundation Data API: add Data.copyBytes methods taking UnsafeRawBufferPointer.
**Issue Description:**
Data's current API's that interoperate with unsafe pointers are awkward, and misleading. They should be updated now that we have UnsafeRawBufferPointer (SE-0183).
Prototype:
<https://github.com/atrick/swift/commit/796b08fa5635a7d61cbb6bf0718178e0ce326e4f>
``` java
commit 796b08fa5635a7d61cbb6bf0718178e0ce326e4f
Author: Andrew Trick <[email protected] >
Date: Wed Sep 7 20:07:21 2016
Add Foundation Data interop with UnsafeRawBufferPointer.
Adds these initializers to construct a data object from a slice of raw memory:
public init(buffer: UnsafeMutableRawBufferPointer)
public init(bufferNoCopy buffer: UnsafeMutableRawBufferPointer, deallocator: Deallocator)
Deprecates these initializers because they are redundant:
public init(bytes: UnsafeRawPointer, count: Int)
public init(bytesNoCopy bytes: UnsafeMutableRawPointer, count: Int, deallocator: Deallocator)
Adds these closure applications that expose Data's underlying raw memory:
public func withUnsafeBytes<ResultType>(
_ body: (UnsafeRawBufferPointer) throws -> ResultType
) rethrows -> ResultType
public mutating func withUnsafeMutableBytes<ResultType>(
_ body: (UnsafeMutableRawBufferPointer) throws -> ResultType
) rethrows -> ResultType
As explained in the comments, the existing closure taking method,
Data.withUnsafeBytes<ResultType, ContentType>, is unsafe for Data initialized
with bytesNoCopy because it changes the memory's semantic state behind the caller's back.
Adds these methods to copy bytes from data into a buffer:
public func copyBytes(to buffer: UnsafeMutableRawBufferPointer, count: Int)
public func copyBytes(to buffer: UnsafeMutableRawBufferPointer, from range: Range<Index>)
Adds this method to copy bytes into data from a buffer:
public mutating func append(_ buffer: UnsafeRawBufferPointer)
------------------------------------------------------------------------
I also suggest that we deprecate this existing API:
``` java
public func withUnsafeBytes<ResultType, ContentType>
And reintroduce it as:
``` java
public func withUnsafePointer<ResultType, ContentType>
It’s only purpose is to make it more convenient to call a C API passing a non-void pointer into Data.
``` java
data.withUnsafePointer {
strlen($0)
If the C api takes void\* and length, it would actually be safer to use the newly proposed:
``` java
data.withUnsafeBytes {
memcpy(dest, $0.baseAddress, $0.length)
because that doesn’t require knowledge of the type of values in memory.
9/9/16, 9:35 PM Andrew Trick:
As mentioned on the review thread, we should also deprecate these:
``` java
public func copyBytes(to pointer: UnsafeMutablePointer<UInt8>, count: Int)
public func copyBytes(to pointer: UnsafeMutablePointer<UInt8>, from range: Range<Index>)
The reason for deprecating them is that the current versions allow buffer overflow. They do not actually bind memory, so that isn’t the problem.
9/9/16, 9:45 PM Andrew Trick:
This is the method that unnecessarily binds memory to UInt8:
``` java
public func enumerateBytes(_ block: (_ buffer: UnsafeBufferPointer<UInt8>, _ byteIndex: Index, _ stop: inout Bool) -> Void)
We should deprecate it and add a safer overload:
``` java
public func enumerateBytes(_ block: (_ buffer: UnsafeRawBufferPointer, _ byteIndex: Index, _ stop: inout Bool) -> Void)
To summarize the strategy that I think makes sense: Anyone using Data.withUnsafeBytes() to manipulate the bytes should be reading/writing memory through an Unsafe[Mutable]RawPointer, unless either:
(a) they avoid the Data(bytesNoCopy:)
initializer.
(b) they have some special, certain knowledge of the how the non-copied memory
is used outside of the Data object.
I'm not sure where to best document these tips, but agree it should be explained somewhere. It seems worthy of a bug. I could help with the wording and review...
cc @nnnnnnnn @Philippe_Hausler
let data = Data([0xFF, 0xFF])
let j = data.withUnsafeBytes { (ptr: UnsafePointer<UInt16>) in
ptr.pointee
To be clear, this isn't as horrible as the example above makes it look. Data initializes it's own copy of the underlying storage using a raw pointer. So, the only way you get undefined behavior is if the user calls withUnsafeBytes
with different types on the same Data
object (and even that's ok if it's read-only):
import Foundation
let data = Data([0xFF])
let i = data.withUnsafeBytes { (ptr: UnsafePointer<Int8>) in
ptr.pointee
// Now *this* is bad.
data.withUnsafeBytes { (ptr: UnsafePointer<UInt8>) in
ptr.pointee = 3
print(i) // -1
Andrew_Trick:
Data initializes it's own copy of the underlying storage using a raw pointer.
Ah, I see. I didn't quite appreciate before that the underlying memory in Data([0xFF])
is completely untyped – I incorrectly presumed it would be bound to UInt8
in that case, but looking at the implementation I now see that malloc
and memmove
are used. Thanks for clarifying that point!
How then does this play with Data
's subscript that returns a UInt8
? Currently the implementation uses assumingMemoryBound(to: UInt8.self)
on the raw pointer to the underlying memory in order to access the byte at a given index (this is also used in handful of other places in the implementation in places such as appending and getting the hash).
Now that I think about it, if we changed the implementation of withUnsafeBytes
back to using bindMemory
, wouldn't we have undefined behaviour for those other places in the implementation if withUnsafeBytes
is used with an unrelated type? Or would we also need to change the implementation in those cases too? (e.g using load(as:)
and storeBytes(of:as:)
for the subscript logic).
Andrew_Trick:
So, the only way you get undefined behavior is if the user calls withUnsafeBytes
with different types on the same Data
object (and even that's ok if it's read-only):
import Foundation
let data = Data([0xFF])
let i = data.withUnsafeBytes { (ptr: UnsafePointer<Int8>) in
ptr.pointee
// Now *this* is bad.
data.withUnsafeBytes { (ptr: UnsafePointer<UInt8>) in
ptr.pointee = 3
print(i) // -1
I'm not sure I completely follow you here; are you saying that
let data = Data([0xFF])
let i = data.withUnsafeBytes { (ptr: UnsafePointer<Int8>) in
ptr.pointee
on its own is well-defined with the current implementation of withUnsafeBytes
(using assumingMemoryBound(to:)
, that is)? Is it not UB to perform typed operations on untyped memory?
hamishknight:
Ah, I see. I didn't quite appreciate before that the underlying memory in Data([0xFF])
is completely untyped – I incorrectly presumed it would be bound to UInt8
in that case, but looking at the implementation I now see that malloc
and memmove
are used. Thanks for clarifying that point!
Yes. Data is a wrapper around UnsafeMutableRawBufferPointer that provides interop with frameworks. Viewing it as strongly typed memory is just wrong unless we provide a typed wrapper on top of it. If all we care about is the convenience of using a typed pointer, we could easily introduce a typed buffer pointer that always accesses memory using raw memory operations (i.e. the API would be typed but the backing store would be raw bytes).
hamishknight:
How then does this play with Data
's subscript that returns a UInt8
? Currently the implementation uses assumingMemoryBound(to: UInt8.self)
on the raw pointer to the underlying memory in order to access the byte at a given index (this is also used in handful of other places in the implementation in places such as appending and getting the hash).
The implementation of Data's subscript, and most of the other uses of assumingMemoryBound(to:)
should just use the UnsafeRawPointer API internally. Here's where you say "should I file a bug" and I say "yes please". I think separate smaller bugs are more likely to get results than my old bug that seems to have died.
hamishknight:
Now that I think about it, if we changed the implementation of withUnsafeBytes
back to using bindMemory
, wouldn't we have undefined behaviour for those other places in the implementation if withUnsafeBytes
is used with an unrelated type? Or would we also need to change the implementation in those cases too? (e.g using load(as:)
and storeBytes(of:as:)
for the subscript logic).
Yes, exactly, the internal Data implementation should only work with raw memory.
hamishknight:
I'm not sure I completely follow you here; are you saying that
let data = Data([0xFF])
let i = data.withUnsafeBytes { (ptr: UnsafePointer<Int8>) in
ptr.pointee
on its own is well-defined with the current implementation of withUnsafeBytes
(using assumingMemoryBound(to:)
, that is)? Is it not UB to perform typed operations on untyped memory?
Well, I suppose we could write the spec that way. But I don't think that would be a useful rule. The goal of the memory model is to allow the compiler to make all of its usual, helpful assumptions about typed memory access while providing a set of rules that can be precisely verified at runtime. The rule can simply be that we don't have multiple accesses to the same memory location of unrelated types where at least one access modifies memory.
So, we want to strongly discourage using assumingMemoryBound
on untyped memory (for that matter we want to strongly discourage using assumingMemoryBound
ever )--it's a sign of a broken API. But we don't need to declare it undefined behavior.
Thanks Andrew, that greatly clarified things :) I've filed a few more bugs: SR-7849 , SR-7850 , SR-7851 .
Is there currently a timeline for the writing up a formal spec specifying the rules of Swift's memory model; in particular, the rules over what types are considered related and/or layout compatible?
Thank you for the bug reports Hamish!
No, there is no timeline for a formal spec (that would require a lot of careful attention from and deliberation among people who are busy with urgent work).
I'm afraid you'll have to make due with this informal writeup:
https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst
It captures my own understanding as an implementer. It's decidedly not part of the language, so there's no future guarantee. And it hasn't seen much review, so it's likely incomplete in areas. But I think it's unlikely that we'll want to make more aggressive assumptions in the compiler than what I've already proposed there.