Hi, I've been instructed by the latest Xcode (10.2) to migrate some code that used "withUnsafeMutableBytes" and "withUnsafeBytes" on the Data type, with this message: "use withUnsafeMutableBytes<R>(_: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R".
After some digging on the net I started using a mix of UnsafeMutablePointer.allocate(capacity) and "myData.withUnsafeBytes { (_ ptr:UnsafeRawBufferPointer) -> Void ... ", which now compiles without warning.
However, I tried to look at the definitions of those variants in the Data documentation (Apple Developer Documentation) to make sure I wasn't doing incorrect things with the memory, and found that pretty much all of the "withUnsafe" functions are deprecated (yet Xcode no longer gives a warning), and not a single line explains what the officially recommended way of accessing that data now is (except for the subscript operators, which don't suit my case).
Autocompletion also suggested "withContiguousStorageIfAvailable", which seems to indicate that there could be issues regarding the contiguity of the underlying storage? But there too, I couldn't find anything on that subject in the documentation (is this a Sequence-only thing that's not really relevant in the case of Data?).
I must admit I really don't know where to look, nor which method is now supposed to be the correct way of accessing a Data bytes buffer (for context, the initial goal of the function was to use the CommonCrypto CC_MD5 over the data buffer).
Data conforms to Foundation’s ContiguousBytes protocol, which defines a single function you can use to get the underlying bytes. That’s the function you should call.
You may find it helpful to explicitly define the type of the argument your closure expects to an UnsafeMutableRawBufferPointer such that the compiler will select the correct function.
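To illustrate the suggestion above, here is a minimal sketch of what annotating the closure parameter can look like. The helper names `sumOfBytes(in:)` and `incrementBytes(in:)` are made up for this example and are not part of Foundation; only `withUnsafeBytes` and `withUnsafeMutableBytes` come from Data.

```swift
import Foundation

// Hypothetical helper: read-only access. Spelling out UnsafeRawBufferPointer
// steers the compiler to the raw-buffer overload rather than the deprecated one.
func sumOfBytes(in data: Data) -> Int {
    return data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> Int in
        buffer.reduce(0) { $0 + Int($1) }
    }
}

// Hypothetical helper: mutable access, with the same trick applied to the
// mutable buffer type. Each byte is incremented with wrapping arithmetic.
func incrementBytes(in data: inout Data) {
    data.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) -> Void in
        for index in buffer.indices {
            buffer[index] = buffer[index] &+ 1
        }
    }
}
```

Both raw buffer types are collections of UInt8, so ordinary collection operations such as `reduce` and subscripting work without any rebinding.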
Thanks for the info. I tried to look at the documentation for this method, and there's simply nothing. What is the return type for? Since we're talking about "bytes", why is the closure parameter a RawPointer instead of a Something<UInt8> (which would be convenient in my case, since CC_MD5 uses an UnsafeMutablePointer<UInt8> for the destination buffer)? Would it work fine with the "memory rebound" APIs in case I need to have it typed?
Trying to do "data.withUnsafeMutableBytes { (_ ptr: UnsafeMutableRawBufferPointer< UInt8 >) -> Void in .. " results in a "Cannot specialize non generic type UnsafeMutableRawPointer"...
I had a look at the "Manual Memory Management" chapter of the documentation, which explains the "unsafe" types very well, but I'm starting to get the feeling there are still gaps in how those types are integrated in the stdlib (or at least in the stdlib documentation).
All the memory access functions are an extremely sensitive part of the API which most average developers (like myself) don't use on a daily basis, so I was a bit surprised by the lack of documentation, especially if Xcode starts throwing new warnings...
Do you know if there is any effort to get the official Swift documentation into the hands of the community? I would gladly contribute.
benjamin.g:
Thanks for the info. I tried to look at the documentation for this method, and there's simply nothing. What is the return type for? Since we're talking about "bytes", why is the closure parameter a RawPointer instead of a Something<UInt8> (which would be convenient in my case, since CC_MD5 uses an UnsafeMutablePointer<UInt8> for the destination buffer)? Would it work fine with the "memory rebound" APIs in case I need to have it typed?
The change here primarily had to do with the possibility of creating Data with Data.init(bytesNoCopy:count:deallocator:), which allows someone to create a Data instance wrapping an already-existing buffer.
When someone does this, they can pass in a raw pointer to any buffer which they have created, which may or may not have been initialized with various types of data; specifically, the passed pointer could be bound to Typed Memory where the bound type is non-trivial.
Previously, Data presented an interface which returned an UnsafeBufferPointer<UInt8>, and did this by rebinding the memory on your behalf: this could implicitly trigger undefined behavior if the original buffer was one you didn't own, and you had no control over how it was allocated and initialized.
The change here keeps underlying Data access entirely untyped via Raw pointers. With a Raw pointer, you can read the bytes directly (via load(fromByteOffset:as:) / copyMemory(from:)), without running the risk of implicit undefined behavior. If you did have control over how the buffer was initialized (specifically, you know the original buffer was either untyped, or bound to a trivial type like UInt8), then it is also safe to rebind the raw buffer to the type you want with bindMemory(to:).
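As a rough sketch of the three options just mentioned, the functions below read from a Data buffer by loading, by copying, and by rebinding. The function names are invented for illustration, and the rebinding variant assumes the Data was created from plain bytes (so the memory is untyped or bound to UInt8), which is the condition described above.

```swift
import Foundation

// 1. load(fromByteOffset:as:): read one value straight out of the raw buffer.
//    UInt8 has no alignment requirement, so any offset inside the buffer is fine.
func firstByte(of data: Data) -> UInt8? {
    return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> UInt8? in
        raw.isEmpty ? nil : raw.load(fromByteOffset: 0, as: UInt8.self)
    }
}

// 2. copyMemory(from:): copy the raw bytes into storage that we own.
func copiedBytes(of data: Data) -> [UInt8] {
    var copy = [UInt8](repeating: 0, count: data.count)
    data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Void in
        copy.withUnsafeMutableBytes { $0.copyMemory(from: raw) }
    }
    return copy
}

// 3. bindMemory(to:): get a typed view of the bytes.
//    Assumption: the underlying buffer is untyped or already bound to UInt8.
func boundByteSum(of data: Data) -> Int {
    return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Int in
        let bytes = raw.bindMemory(to: UInt8.self)
        return bytes.reduce(0) { $0 + Int($1) }
    }
}
```

For wider types such as UInt32, copying is usually the more cautious choice, since `load(fromByteOffset:as:)` expects the address to be suitably aligned for the loaded type.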
benjamin.g:
Trying to do "data.withUnsafeMutableBytes { (_ ptr: UnsafeMutableRawBufferPointer< UInt8 >) -> Void in .. " results in a "Cannot specialize non generic type UnsafeMutableRawPointer"...
Indeed, raw buffers differ from typed buffers, but there are various ways of reading directly out of a raw buffer, and hopefully the specific documentation on UnsafeRawBufferPointer and continued reading of the Manual Memory Management guide can help. (Also happy to answer specific questions to help guide you!)
benjamin.g:
I had a look at the "Manual Memory Management" chapter of the documentation, which explains the "unsafe" types very well, but I'm starting to get the feeling there are still gaps in how those types are integrated in the stdlib (or at least in the stdlib documentation).
All the memory access functions are an extremely sensitive part of the API which most average developers (like myself) don't use on a daily basis, so I was a bit surprised by the lack of documentation, especially if Xcode starts throwing new warnings...
Do you know if there is any effort to get the official Swift documentation into the hands of the community? I would gladly contribute.
Unfortunately, the documentation on developer.apple.com is not part of the open-source effort, but please do file a Radar for any unclear/missing documentation you find. We really do want the documentation on this to be clear, understandable, and easy to find.
func md5DigestA(of data: Data) -> Data {
    precondition(!data.isEmpty)
    var result = [UInt8](repeating: 0, count: Int(CC_MD5_DIGEST_LENGTH))
    data.withUnsafeBytes { buffer in
        _ = CC_MD5(buffer.baseAddress!, CC_LONG(buffer.count), &result)
    }
    return Data(result)
}
If you want to handle the empty data case [1], remove the precondition check on line 2 and the force unwrap on line 5. This works because CC_MD5 will handle a NULL parameter if the count is 0.
If you want to handle the empty data case and you’re dealing with a C function that doesn’t allow a NULL pointer when the count is 0, things get more complex (-:
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
[1] A question that deserves serious consideration, at least in a crypto context.
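One possible way to deal with the "more complex" case above is to hand the callee a valid placeholder address together with a count of zero. The sketch below is only an illustration of that idea: `strictChecksum` is a made-up stand-in for a C function that rejects NULL even for empty input, and `checksum(of:)` is a hypothetical wrapper, neither of which comes from this thread or from any library.

```swift
import Foundation

// Hypothetical stand-in for a C API that requires a non-NULL pointer
// even when count is 0. It just adds up the bytes with wrapping arithmetic.
func strictChecksum(_ bytes: UnsafeRawPointer, _ count: Int) -> UInt8 {
    var sum: UInt8 = 0
    for offset in 0..<count {
        sum = sum &+ bytes.load(fromByteOffset: offset, as: UInt8.self)
    }
    return sum
}

// Hypothetical wrapper that never passes a nil base address to the callee.
func checksum(of data: Data) -> UInt8 {
    return data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> UInt8 in
        if let base = buffer.baseAddress {
            return strictChecksum(base, buffer.count)
        }
        // baseAddress is nil only for an empty buffer, so pass the address of
        // a throwaway byte and a count of 0; the byte itself is never read.
        var placeholder: UInt8 = 0
        return withUnsafeBytes(of: &placeholder) { strictChecksum($0.baseAddress!, 0) }
    }
}
```

The force unwrap in the fallback is on the address of a one-byte local value, which is never nil, so it does not reintroduce the trap that the original force unwrap could hit on empty data.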
I ended up doing
let nbBytes = Int(CC_MD5_DIGEST_LENGTH)
let digestBytes = UnsafeMutablePointer<UInt8>.allocate(capacity: nbBytes)
defer { digestBytes.deallocate() }
data.withUnsafeBytes { ptr in
    // Note: if data is empty and baseAddress is nil, digestBytes is never written.
    guard let baseAddress = ptr.baseAddress else { return }
    _ = CC_MD5(baseAddress, CC_LONG(ptr.count), digestBytes)
}
return Data(bytes: digestBytes, count: nbBytes)
I did it this way because I wasn't sure about memory contiguity for any of the Foundation/stdlib structures (be they arrays or Data), so using an explicitly allocated UnsafeMutablePointer seemed the safest bet.
Data(result) here creates a copy, but it is possible to avoid this by creating a Data instead of an array (with the right count) and writing into its buffer directly:
import Foundation
import CommonCrypto

func digest(_ data: Data) -> Data {
    var md5 = Data(count: Int(CC_MD5_DIGEST_LENGTH))
    md5.withUnsafeMutableBytes { md5Buffer in
        data.withUnsafeBytes { buffer in
            let _ = CC_MD5(buffer.baseAddress!, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
        }
    }
    return md5
}
Thanks, this is helpful. I wanted to try a different approach, but it is not working (getting wrong result) and I can't figure out why. Do you mind having a look?
func digest2(_ data: Data) -> Data {
    let size = Int(CC_MD5_DIGEST_LENGTH)
    let md = UnsafeMutablePointer<UInt8>.allocate(capacity: size)
    data.withUnsafeBytes {
        CC_MD5($0.baseAddress!, UInt32(size), md)
    }
    return Data(bytesNoCopy: md, count: size, deallocator: .free)
}
In the second parameter of your call to CC_MD5, you’re passing in the size of the digest, not the size of the buffer. You want $0.count.
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
allocating again?
It’s allocating again. Should you be concerned about that? Only if you’re calling this a lot. Unless this code is very hot, that extra allocation just won’t matter.
Moreover, attempting to remove it can cause you grief. For example, the code you posted downthread has these lines:
let md = UnsafeMutablePointer<UInt8>.allocate(capacity: size)
return Data(bytesNoCopy: md, count: size, deallocator: .free)
which is not valid. Memory that you allocate with allocate(capacity:) must be freed by deallocate, but .free causes it to be freed by free. This happens to work on Apple platforms, but is not guaranteed by the API. For more details, see UnsafeMutablePointer allocation compatibility with C malloc/free.
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
Oops, thanks for the size correction!
In this case allocations don't really matter, but I'm just using MD5 as an example, since this is what the OP used. Would this work then?
Data(bytesNoCopy: md, count: size, deallocator: .custom({ buf, _ in buf.deallocate() }))
Thanks again for your answers!
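For reference, here is a sketch of the two pairings discussed above: malloc with the .free deallocator, and allocate(capacity:) with a .custom deallocator that calls deallocate. The function names are invented for this example, and the second variant is only my reading of the "must be freed by deallocate" rule, not something confirmed in this thread.

```swift
import Foundation

// Memory obtained from malloc is released with free, which is what .free does.
// Assumes count > 0, since malloc(0) is allowed to return NULL.
func makeMallocBackedData(count: Int) -> Data {
    precondition(count > 0)
    let raw = malloc(count)!
    memset(raw, 0, count)
    return Data(bytesNoCopy: raw, count: count, deallocator: .free)
}

// Memory obtained from allocate(capacity:) is released with deallocate(),
// so the Data is given a custom deallocator that does exactly that.
func makeAllocateBackedData(count: Int) -> Data {
    let typed = UnsafeMutablePointer<UInt8>.allocate(capacity: count)
    typed.initialize(repeating: 0, count: count)
    return Data(bytesNoCopy: typed, count: count, deallocator: .custom { pointer, _ in
        pointer.deallocate()
    })
}
```

In both cases the Data takes over ownership of the buffer, so the caller must not free it separately.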
It's a serious usability bug that Data and UnsafeRawPointer do not interoperate with C functions that take char * byte buffers. App programmers should never need to use any of the memory binding APIs just to call libraries.
This was a known issue, but I filed this anyway to make sure we're tracking it:
SR-10246 Support limited implicit pointer conversion when calling C functions.
I'm struggling with this whole memory management stuff, probably because I've never come across a good introduction to it. So I've basically relied on imitating other people's code. If someone can point me in the right direction for a good introduction (I've been programming for nearly 40 years in several languages, but seriously in Swift for only a few months), I'd be grateful.
Anyway, my issue comes up with the same recommendation from Xcode 10.2. I have a function to do a simple test of a block of data to confirm that it is likely to actually be icns data:
static func isIcns(data icnsData: Data) -> Bool {
    let header = icnsData.withUnsafeBytes {
        [UInt32](UnsafeBufferPointer(start: $0, count: 2))
    }
    let icnsHeader = header[0].byteSwapped
    let icnsLength = header[1].byteSwapped
    let expectedHeader = UnicodeScalar("i").value << 24 + UnicodeScalar("c").value << 16 + UnicodeScalar("n").value << 8 + UnicodeScalar("s").value
    if icnsData.count == icnsLength && icnsHeader == expectedHeader {
        return true
    }
    return false
}
How do I rewrite that first bit, to get the first eight bytes as two UInt32 references?
I'm struggling with this whole memory management stuff …
That’s understandable. Swift makes this challenging because:
The API details have changed quite a lot over the years.
Recent versions have strict rules about aliasing (in this sense of the word). These will yield long-term benefits, but they do take some getting used to.
With regards to your specific issue, I’m a big fan of moving up a level of abstraction. In your case, I’d rethink this as a parsing problem rather than a structure access problem. The fact that Data exposes its contents as a collection of bytes means you can take advantage of lots of functionality that’s available on collections. For example:
func isIcns(data icnsData: Data) -> Bool {
    guard
        icnsData.count >= 8,
        icnsData.starts(with: "icns".utf8)
    else {
        return false
    }
    let embeddedCount32 = icnsData.dropFirst(4).prefix(4).reduce(0) { $0 << 8 | UInt32($1) }
    return Int(exactly: embeddedCount32) == icnsData.count
}
One thing to note about this code is that, on a 32-bit machine, it avoids the trap you might encounter converting the length bytes of a maliciously crafted icns to Int.
Share and Enjoy
Quinn “The Eskimo!” @ DTS @ Apple
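For anyone who still wants the header fields as UInt32 values, as the original question asked, one possible sketch is to copy four bytes out of the raw buffer and interpret them as big-endian. The helper name `readBigEndianUInt32(from:atByteOffset:)` is invented for this example; it copies rather than loads so that it does not rely on the bytes being aligned for UInt32.

```swift
import Foundation

// Hypothetical helper: returns the big-endian UInt32 stored at the given byte
// offset, or nil if fewer than four bytes are available there.
func readBigEndianUInt32(from data: Data, atByteOffset offset: Int) -> UInt32? {
    guard offset >= 0, offset <= data.count - 4 else { return nil }
    var value: UInt32 = 0
    data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Void in
        // Raw buffer indices start at 0, whatever the Data's own startIndex is.
        let source = UnsafeRawBufferPointer(rebasing: raw[offset ..< offset + 4])
        withUnsafeMutableBytes(of: &value) { $0.copyMemory(from: source) }
    }
    // The copied bytes are in file order, so convert from big-endian to host order.
    return UInt32(bigEndian: value)
}
```

With this, the icns type tag would be read at offset 0 and the embedded length at offset 4, and the bounds check replaces the separate count test.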
Thank you! The code is nice and avoids my byte-swapping with the neat trick of using reduce.
Any pointers to a good introduction to the memory management that is up to date for Swift 5?
My solution, based on the snippets from here, without the force unwrap and with every type spelled out:
func buildMD5(data: Data) -> String {
    var md5: Data = Data(count: Int(CC_MD5_DIGEST_LENGTH))
    md5.withUnsafeMutableBytes { (md5Buffer: UnsafeMutableRawBufferPointer) in
        data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
            // Note: if baseAddress is nil (empty data), md5 keeps its zero-filled bytes.
            guard let baseAddress: UnsafeRawPointer = buffer.baseAddress else {
                return
            }
            _ = CC_MD5(baseAddress, CC_LONG(buffer.count), md5Buffer.bindMemory(to: UInt8.self).baseAddress)
        }
    }
    return md5.base64EncodedString()
}