For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood, and ByteString is a ForeignPtr, this might look something like:
    caseBStoVector :: ByteString -> Vector a
    caseBStoVector (BS fptr off len) =
        withForeignPtr fptr $ \ptr -> do
            let ptr' = plusPtr ptr off
                p = alignPtr ptr' (alignment (undefined :: a))
                barr = ptrToByteArray# p len -- I want this function, or something similar
                barr' = ByteArray barr
                alignI = minusPtr p ptr
                size = (len-alignI) `div` sizeOf (undefined :: a)
            return (Vector 0 size barr')
That certainly isn't right. Even with the missing function ptrToByteArray#, this seems to need the ptr to escape the withForeignPtr scope. So my questions are:
1. This post probably advertises my primitive understanding of ByteArray#. If anyone can talk a bit about ByteArray#, its representation, how it is managed (GCed), etc., I'd be grateful.
2. The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue: all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Something like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr, and provide wrapping operations for both those types? This indirection might hurt performance too much, though.
3. Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.
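The Location idea from the second question can be sketched roughly as follows. This is only an illustration, assuming the primitive package is available; indexByte and demo are hypothetical names, and the sketch handles Word8 elements only. Every read has to branch on the constructor, which is the extra indirection the question worries about.

```haskell
import Data.Primitive.ByteArray
  (ByteArray, indexByteArray, newByteArray, unsafeFreezeByteArray, writeByteArray)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Either GC-managed bytes or memory reached through a ForeignPtr.
data Location
  = LocBA ByteArray
  | LocFPtr (ForeignPtr Word8)

-- Hypothetical accessor: read the byte at index i from either backing store.
-- The ForeignPtr case is only pure if nobody mutates the memory afterwards.
indexByte :: Location -> Int -> Word8
indexByte (LocBA ba) i = indexByteArray ba i
indexByte (LocFPtr fp) i =
  unsafeDupablePerformIO (withForeignPtr fp (\p -> peekByteOff p i))

-- Build one value of each kind and read a byte back from both.
demo :: IO (Word8, Word8)
demo = do
  mba <- newByteArray 1
  writeByteArray mba 0 (7 :: Word8)
  ba <- unsafeFreezeByteArray mba
  fp <- mallocForeignPtrBytes 1
  withForeignPtr fp (\p -> pokeByteOff p 0 (9 :: Word8))
  pure (indexByte (LocBA ba) 0, indexByte (LocFPtr fp) 0)
```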
Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.
This response is a couple years after the fact, but it is indeed possible to get a pointer to bytearray contents. It's problematic as the GC likes to move data in the heap around, and things outside of the GC heap can leak, which isn't necessarily ideal. GHC solves this with:
    newPinnedByteArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
Primitive bytearrays (internally typedef'd C char arrays) can be statically pinned to an address. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:
    byteArrayContents# :: ByteArray# -> Addr#
The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.
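As a minimal sketch of the pinned-array idea, the following allocates a pinned byte array, fills it, and reads it back through a raw pointer. It assumes the primitive package for the lifted wrappers (newPinnedByteArray, writeByteArray, unsafeFreezeByteArray, touch); pinnedRoundTrip is a hypothetical name. The touch call keeps the array alive until the pointer reads are done, since the GC does not treat a bare address as a reference.

```haskell
{-# LANGUAGE MagicHash #-}

import Control.Monad.Primitive (touch)
import Data.Primitive.ByteArray
  (ByteArray (..), newPinnedByteArray, unsafeFreezeByteArray, writeByteArray)
import Data.Word (Word8)
import Foreign.Storable (peekByteOff)
import GHC.Exts (Ptr (..), byteArrayContents#)

pinnedRoundTrip :: IO [Word8]
pinnedRoundTrip = do
  -- Pinned: the GC will not move this array, so its address stays valid.
  mba <- newPinnedByteArray 4
  mapM_ (\i -> writeByteArray mba i (fromIntegral (i * 10) :: Word8)) [0 .. 3]
  ba@(ByteArray ba#) <- unsafeFreezeByteArray mba
  -- Wrap the raw Addr# in a Ptr so the usual Storable functions apply.
  let ptr = Ptr (byteArrayContents# ba#) :: Ptr Word8
  xs <- mapM (peekByteOff ptr) [0 .. 3]
  -- Keep the array reachable until every read through ptr has finished.
  touch ba
  pure xs
```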
Disclaimer: This will only work if your ByteString was built in Haskell. Otherwise, you can't get a reference to the bytearray. You cannot dereference an arbitrary addr. Don't try to cast or coerce your way to a bytearray; that way lies segfaults.
To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.
Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.
In GHC, ByteString is built with PlainPtrs which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. Addrs don't count, though. GHC assumes they point to things outside of the GC heap. If the bytearray itself falls out of the scope, you're left with a dangling pointer.
MutableByteArrays have the same runtime representation as ByteArrays. If you want true zero-copy construction, make sure you either unsafeCoerce# or unsafeFreezeByteArray# to get the immutable bytearray. A freeze that copies would leave you with a duplicate.
And now you have the raw contents of the ByteString ready to be turned into a vector.
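The steps above can be put together roughly like this. It is a sketch under stated assumptions, not the original answer's code: it assumes the bytestring, primitive, and vector packages, uses Data.ByteString.Internal.toForeignPtr instead of matching on the ByteString constructor (whose shape differs between bytestring versions), and byteStringToPrimVector is a hypothetical name. It returns Nothing when the ByteString is not backed by a PlainPtr (for example, one wrapping foreign memory) or when the data does not start on an element boundary.

```haskell
{-# LANGUAGE MagicHash, ScopedTypeVariables #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.Primitive.ByteArray
  (ByteArray (..), MutableByteArray (..), unsafeFreezeByteArray)
import Data.Primitive.Types (Prim, sizeOf)
import qualified Data.Vector.Primitive as VP
import Data.Word (Word8)
import Foreign.Ptr (minusPtr)
import GHC.Exts (Ptr (..), byteArrayContents#)
import GHC.ForeignPtr (ForeignPtr (..), ForeignPtrContents (PlainPtr))

byteStringToPrimVector
  :: forall a. Prim a => B.ByteString -> IO (Maybe (VP.Vector a))
byteStringToPrimVector bs =
  case BI.toForeignPtr bs of
    -- PlainPtr means the bytes live in a pinned, GC-managed byte array.
    (ForeignPtr addr (PlainPtr mba), off, len) -> do
      -- Freeze without copying; the ByteString must not be mutated afterwards.
      ba@(ByteArray ba#) <- unsafeFreezeByteArray (MutableByteArray mba)
      let base       = Ptr (byteArrayContents# ba#)
          -- Byte offset of the ByteString's first byte within the array.
          startBytes = (Ptr addr `minusPtr` base) + off
          elemSize   = sizeOf (undefined :: a)
      pure $
        if startBytes `mod` elemSize == 0
          -- Offset and length of a primitive Vector are counted in elements.
          then Just (VP.Vector (startBytes `div` elemSize) (len `div` elemSize) ba)
          else Nothing
    -- Any other backing store: no byte array to share.
    _ -> pure Nothing
```

The resulting Vector holds a reference to the byte array itself, so the data stays alive for as long as the Vector does, independent of the original ByteString.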
You might be able to hack together something :: ForeignPtr -> Maybe ByteArray#, but there is nothing you can do in general.
You should look at the Data.Vector.Storable module. It includes a function unsafeFromForeignPtr :: Storable a => ForeignPtr a -> Int -> Int -> Vector a. It sounds like what you want. There is also a Data.Vector.Storable.Mutable variant.
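A minimal sketch of the Storable route, assuming the bytestring and vector packages and base 4.10 or later for plusForeignPtr; byteStringToStorableVector is a hypothetical name. It uses unsafeFromForeignPtr0 after shifting the pointer by the byte offset, because the offset argument of unsafeFromForeignPtr is counted in elements rather than bytes. The sketch does not check alignment, so for element types wider than a byte it relies on the platform tolerating unaligned reads, and trailing bytes that do not fill a whole element are dropped.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.Vector.Storable as VS
import Data.Word (Word8)
import Foreign.ForeignPtr (plusForeignPtr)
import Foreign.Storable (Storable, sizeOf)

-- Zero-copy view of a strict ByteString as a Storable vector.
-- The vector shares the ByteString's buffer and keeps it alive.
byteStringToStorableVector
  :: forall a. Storable a => B.ByteString -> VS.Vector a
byteStringToStorableVector bs =
  VS.unsafeFromForeignPtr0 (plusForeignPtr fptr off) n
  where
    -- fptr points at the buffer, off is a byte offset, len a byte length.
    (fptr, off, len) = BI.toForeignPtr bs
    -- Number of whole elements that fit in the byte range.
    n = len `div` sizeOf (undefined :: a)
```

For example, byteStringToStorableVector (B.pack [1, 2, 3, 4]) :: VS.Vector Word8 shares the four bytes with the ByteString rather than copying them.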