Passing a JavaScript string to a Rust function compiled to WebAssembly

Posted 2019-01-24 12:18

Question:

I have this simple Rust function:

#[no_mangle]
pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
    match operator {
        "SUM" => n1 + n2,
        "DIFF" => n1 - n2,
        "MULT" => n1 * n2,
        "DIV" => n1 / n2,
        _ => 0
    }
}

I am compiling this to WebAssembly successfully, but I can't manage to pass the operator parameter from JS to Rust.

The JS line which calls the Rust function looks like this:

instance.exports.compute(operator, n1, n2);

operator is a JS String and n1, n2 are JS Numbers.

n1 and n2 are passed properly and can be read inside the compiled function so I guess the problem is how I pass the string around. I imagine it is passed as a pointer from JS to WebAssembly but can't find evidence or material about how this works.

I am not using Emscripten and would like to keep it standalone (compilation target wasm32-unknown-unknown), but I see they wrap their compiled functions in Module.cwrap, maybe that could help?

Answer 1:

To transfer string data between JavaScript and Rust, you need to decide:

  1. The encoding of the text: UTF-8 (Rust native) or UTF-16 (JS native).
  2. Who will own the memory buffer: the JS (caller) or Rust (callee).
  3. How to represent the string's data and length: NUL-terminated (C-style) or distinct length (Rust-style).
  4. How to communicate the data and length, if they are separate.
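
To make the first and third decisions concrete, here is a small sketch in plain Rust (no WASM involved). The helper names are made up for illustration. It shows that the UTF-8 byte count and the UTF-16 code unit count of the same text can differ, and contrasts reading a NUL-terminated buffer with reading a buffer whose length is supplied separately.

```rust
/// Number of bytes the text occupies as UTF-8 (Rust's native encoding).
pub fn utf8_len(text: &str) -> usize {
    text.len()
}

/// Number of 16-bit code units the text occupies as UTF-16
/// (this is what a JS string's `length` counts).
pub fn utf16_len(text: &str) -> usize {
    text.encode_utf16().count()
}

/// C-style: the length is implicit and found by scanning for the first NUL byte.
/// Returns `None` if there is no terminator or the bytes are not valid UTF-8.
pub fn read_nul_terminated(bytes: &[u8]) -> Option<&str> {
    let end = bytes.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&bytes[..end]).ok()
}

/// Rust-style: the length travels next to the data, so no terminator is needed.
/// Returns `None` if `len` is out of bounds or the bytes are not valid UTF-8.
pub fn read_with_len(bytes: &[u8], len: usize) -> Option<&str> {
    std::str::from_utf8(bytes.get(..len)?).ok()
}
```

For ASCII operator names such as "SUM" the two encodings have the same number of units, which is why the difference is easy to overlook until non-ASCII text shows up.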

Solution 1

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The caller should own the memory buffer.
  3. To have the length be a separate value.
  4. Another struct and allocation should be made to hold the pointer and length.

src/lib.rs

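// Inform Rust that memory will be provided by the host.
// (This nightly feature gate existed when the answer was written; later
// toolchains dropped it in favor of the linker flag `--import-memory`.)
#![feature(wasm_import_memory)]
#![wasm_import_memory]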

// A struct with a known memory layout that we can pass string information in
#[repr(C)]
pub struct JsInteropString {
    data: *const u8,
    len: usize,
}

// Our FFI shim function    
#[no_mangle]
pub unsafe extern "C" fn compute(s: *const JsInteropString, n1: i32, n2: i32) -> i32 {
    // Check for NULL (see corresponding comment in JS)
    let s = match s.as_ref() {
        Some(s) => s,
        None => return -1,
    };

    // Convert the pointer and length to a `&[u8]`.
    let data = std::slice::from_raw_parts(s.data, s.len);

    // Convert the `&[u8]` to a `&str`    
    match std::str::from_utf8(data) {
        Ok(s) => real_code::compute(s, n1, n2),
        Err(_) => -2,
    }
}

// I advocate that you keep your interesting code in a different
// crate for easy development and testing. Have a separate crate
// with the FFI shims.
mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}
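
One benefit of keeping the shim this thin is that it can be exercised natively, without compiling to WASM at all. The following is only a sketch: it repeats the shim (minus `#[no_mangle]`) so that it is self-contained, and `call_shim` is a hypothetical helper that plays the role of the JS caller by building the interop struct from a byte slice.

```rust
#[repr(C)]
pub struct JsInteropString {
    data: *const u8,
    len: usize,
}

/// Same operator dispatch as `real_code::compute`.
fn compute_str(operator: &str, n1: i32, n2: i32) -> i32 {
    match operator {
        "SUM" => n1 + n2,
        "DIFF" => n1 - n2,
        "MULT" => n1 * n2,
        "DIV" => n1 / n2,
        _ => 0,
    }
}

/// Same logic as the FFI shim: -1 for a NULL struct pointer, -2 for invalid UTF-8.
pub unsafe extern "C" fn compute(s: *const JsInteropString, n1: i32, n2: i32) -> i32 {
    let s = match unsafe { s.as_ref() } {
        Some(s) => s,
        None => return -1,
    };
    let data = unsafe { std::slice::from_raw_parts(s.data, s.len) };
    match std::str::from_utf8(data) {
        Ok(op) => compute_str(op, n1, n2),
        Err(_) => -2,
    }
}

/// Hypothetical stand-in for the JS side: wraps native bytes in the interop
/// struct and calls the shim with a pointer to it.
pub fn call_shim(bytes: &[u8], n1: i32, n2: i32) -> i32 {
    let arg = JsInteropString { data: bytes.as_ptr(), len: bytes.len() };
    unsafe { compute(&arg, n1, n2) }
}
```

Calling `call_shim(b"MULT", 42, 100)` goes through the same pointer-and-length path that the WASM build uses, so both error codes can be checked from an ordinary test.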

It's important to build the crate as a C dynamic library (cdylib) for WASM, which helps keep the output small.

Cargo.toml

[package]
name = "quick-maths"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]

[lib]
crate-type = ["cdylib"]

For what it's worth, I'm running this code in Node, not in the browser.

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

// Allocate some memory.
const memory = new WebAssembly.Memory({ initial: 20, maximum: 100 });

// Connect these memory regions to the imported module
const importObject = {
  env: { memory }
};

// Create an object that handles converting our strings for us
const memoryManager = (memory) => {
  var base = 0;

  // NULL is conventionally at address 0, so we "use up" the first 4
  // bytes of address space to make our lives a bit simpler.
  base += 4;

  return {
    encodeString: (jsString) => {
      // Convert the JS String to UTF-8 data
      const encoder = new TextEncoder();
      const encodedString = encoder.encode(jsString);

      // Organize memory with space for the JsInteropString at the
      // beginning, followed by the UTF-8 string bytes.
      const asU32 = new Uint32Array(memory.buffer, base, 2);
      const asBytes = new Uint8Array(memory.buffer, asU32.byteOffset + asU32.byteLength, encodedString.length);

      // Copy the UTF-8 into the WASM memory.
      asBytes.set(encodedString);

      // Assign the data pointer and length values.
      asU32[0] = asBytes.byteOffset;
      asU32[1] = asBytes.length;

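      // Update our memory allocator base address for the next call. It is
      // rounded up to a multiple of 4 so that the next Uint32Array view
      // starts at an aligned offset.
      const originalBase = base;
      base = (asBytes.byteOffset + asBytes.byteLength + 3) & ~3;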

      return originalBase;
    }
  };
};

const myMemory = memoryManager(memory);

fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm')
  .then(bytes => WebAssembly.instantiate(bytes, importObject))
  .then(({ instance }) => {
    const argString = "MULT";
    const argN1 = 42;
    const argN2 = 100;

    const s = myMemory.encodeString(argString);
    const result = instance.exports.compute(s, argN1, argN2);

    console.log(result);
  });
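
The layout that encodeString writes can be modeled in plain Rust over a byte vector, which may help when debugging it. This is a sketch under a few assumptions: pointers and lengths are 4-byte little-endian values (as on wasm32), and the function names are invented for the illustration. The next free address is rounded up to a multiple of 4, because a Uint32Array view must start at a 4-byte-aligned offset.

```rust
/// Reads a little-endian u32 at `at`, or `None` if it would run out of bounds.
fn read_u32(memory: &[u8], at: usize) -> Option<u32> {
    let b = memory.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical model of `encodeString`: writes an 8-byte header
/// (data pointer, then length) at `base`, followed by the UTF-8 bytes.
/// Returns the header address and the next free 4-byte-aligned address.
/// Panics if `memory` is too small, like an out-of-range typed array view would.
pub fn encode_string(memory: &mut [u8], base: usize, text: &str) -> (usize, usize) {
    let data_addr = base + 8;
    let len = text.len();
    memory[base..base + 4].copy_from_slice(&(data_addr as u32).to_le_bytes());
    memory[base + 4..base + 8].copy_from_slice(&(len as u32).to_le_bytes());
    memory[data_addr..data_addr + len].copy_from_slice(text.as_bytes());
    let next = (data_addr + len + 3) & !3;
    (base, next)
}

/// Reads the header back the way the Rust shim does: pointer, length, then
/// the bytes they describe, validated as UTF-8.
pub fn decode_string(memory: &[u8], header: usize) -> Option<&str> {
    let ptr = read_u32(memory, header)? as usize;
    let len = read_u32(memory, header + 4)? as usize;
    std::str::from_utf8(memory.get(ptr..ptr.checked_add(len)?)?).ok()
}
```

Starting at base 4, encoding "SUM" puts the header at 4 and the bytes at 12..15, and the next allocation starts at 16 rather than 15.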

Solution 2

I decided:

  1. To convert JS strings to UTF-8, which means that the TextEncoder JS API is the best fit.
  2. The module should own the memory buffer.
  3. To have the length be a separate value.
  4. To use a Box<String> as the underlying data structure. This allows the allocation to be further used by Rust code.

src/lib.rs

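// `repr(transparent)` was feature-gated when this was written. It has been
// stable since Rust 1.28, so newer compilers don't need this line.
#![feature(repr_transparent)]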

// Very important to use `transparent` to prevent ABI issues 
#[repr(transparent)]
pub struct JsInteropString(*mut String);

impl JsInteropString {
    // Unsafe because we create a string and say it's full of valid
    // UTF-8 data, but it isn't!
    unsafe fn with_capacity(cap: usize) -> Self {
        let mut d = Vec::with_capacity(cap);
        d.set_len(cap);
        let s = Box::new(String::from_utf8_unchecked(d));
        JsInteropString(Box::into_raw(s))
    }

    unsafe fn as_string(&self) -> &String {
        &*self.0
    }

    unsafe fn as_mut_string(&mut self) -> &mut String {
        &mut *self.0
    }

    unsafe fn into_boxed_string(self) -> Box<String> {
        Box::from_raw(self.0)
    }

    unsafe fn as_mut_ptr(&mut self) -> *mut u8 {
        self.as_mut_string().as_mut_vec().as_mut_ptr()
    }
}

#[no_mangle]
pub unsafe extern "C" fn stringPrepare(cap: usize) -> JsInteropString {
    JsInteropString::with_capacity(cap)
}

#[no_mangle]
pub unsafe extern "C" fn stringData(mut s: JsInteropString) -> *mut u8 {
    s.as_mut_ptr()
}

#[no_mangle]
pub unsafe extern "C" fn stringLen(s: JsInteropString) -> usize {
    s.as_string().len()
}

#[no_mangle]
pub unsafe extern "C" fn compute(s: JsInteropString, n1: i32, n2: i32) -> i32 {
    let s = s.into_boxed_string();
    real_code::compute(&s, n1, n2)
}

mod real_code {
    pub fn compute(operator: &str, n1: i32, n2: i32) -> i32 {
        match operator {
            "SUM"  => n1 + n2,
            "DIFF" => n1 - n2,
            "MULT" => n1 * n2,
            "DIV"  => n1 / n2,
            _ => 0,
        }
    }
}
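
The ownership hand-off in this solution can be tried natively as well. Below is a minimal sketch, not the code above: it works on a bare `*mut String` instead of the wrapper struct, the function names are hypothetical, and it zero-fills the buffer instead of calling `set_len`, so the String holds valid UTF-8 at every point. `round_trip` stands in for the JS caller.

```rust
/// Allocates a zero-filled `String` of `cap` bytes and leaks it as a raw pointer.
/// NUL bytes are valid UTF-8, so the String is well-formed from the start.
pub fn string_prepare(cap: usize) -> *mut String {
    let s = String::from_utf8(vec![0u8; cap]).expect("NUL bytes are valid UTF-8");
    Box::into_raw(Box::new(s))
}

/// Address of the first byte of the string's buffer.
/// Safety: `s` must come from `string_prepare` and not have been taken yet.
pub unsafe fn string_data(s: *mut String) -> *mut u8 {
    unsafe {
        let string: &mut String = &mut *s;
        string.as_mut_vec().as_mut_ptr()
    }
}

/// Takes ownership back from the raw pointer; the heap box is freed here.
/// Safety: `s` must come from `string_prepare` and be taken only once.
pub unsafe fn take_string(s: *mut String) -> String {
    unsafe { *Box::from_raw(s) }
}

/// Hypothetical native stand-in for the JS caller: prepare, copy bytes in
/// through the data pointer, then let Rust reclaim the allocation.
pub fn round_trip(text: &str) -> String {
    let handle = string_prepare(text.len());
    unsafe {
        let dst = string_data(handle);
        // `text` is valid UTF-8, so the String stays valid after the copy.
        std::ptr::copy_nonoverlapping(text.as_ptr(), dst, text.len());
        take_string(handle)
    }
}
```

As in `compute` above, the allocation is consumed exactly once. If the caller prepared a string and never passed it on, it would leak.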

index.js

const fs = require('fs-extra');
const { TextEncoder } = require('text-encoding');

class QuickMaths {
  constructor(instance) {
    this.instance = instance;
  }

  difference(n1, n2) {
    const { compute } = this.instance.exports;
    const op = this.copyJsStringToRust("DIFF");
    return compute(op, n1, n2);
  }

  copyJsStringToRust(jsString) {
    const { memory, stringPrepare, stringData, stringLen } = this.instance.exports;

    const encoder = new TextEncoder();
    const encodedString = encoder.encode(jsString);

    // Ask Rust code to allocate a string inside of the module's memory
    const rustString = stringPrepare(encodedString.length);

    // Get a JS view of the string data
    const rustStringData = stringData(rustString);
    const asBytes = new Uint8Array(memory.buffer, rustStringData, encodedString.length);

    // Copy the UTF-8 into the WASM memory.
    asBytes.set(encodedString);

    return rustString;
  }
}

async function main() {
  const bytes = await fs.readFile('./target/wasm32-unknown-unknown/release/quick_maths.wasm');
  const { instance } = await WebAssembly.instantiate(bytes);
  const maffs = new QuickMaths(instance);

  console.log(maffs.difference(100, 201));
}

main();

Note that this process can be used for other types. You "just" have to decide how to represent data as a set of bytes that both sides agree on then send it across.
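
As an illustration of "other types", here is a sketch that flattens a list of i32 values into bytes and reads them back. It assumes both sides agree on consecutive 4-byte little-endian words, which matches how an Int32Array view over WASM memory would see them. The function names are hypothetical.

```rust
/// Serializes the values as consecutive 4-byte little-endian words.
pub fn encode_i32s(values: &[i32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Reverses `encode_i32s`. Returns `None` if the byte count is not a
/// multiple of 4, since that cannot be a whole number of values.
pub fn decode_i32s(bytes: &[u8]) -> Option<Vec<i32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

The transport is the same as for strings: copy the bytes into memory the module can see, then pass the offset and the length.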

See also:

  • Using the WebAssembly JavaScript API
  • TextEncoder API
  • Uint8Array / Uint32Array / TypedArray
  • WebAssembly.Memory
  • Hello, Rust! — Import memory buffer
  • How to return a string (or similar) from Rust in WebAssembly?


Answer 2:

A WebAssembly program has its own memory space. This space is often managed by the WebAssembly program itself, with the help of an allocator library such as wee_alloc.

JavaScript can see and modify that memory space, but it has no way of knowing how the allocator's internal structures are organized. So if we simply write to the WASM memory from JavaScript, we'll likely overwrite something important and corrupt the heap. Therefore the WebAssembly program itself must allocate the memory region first and pass its address to JavaScript, which can then fill that region with the data.

In the following example we do just that: allocate a buffer in the WASM memory space, copy the UTF-8 bytes there, pass the buffer location to a Rust function, then free the buffer.

Rust:

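// The nightly-only `std::heap` module and `allocator_api` feature gate used
// in the original version were replaced by the stable `std::alloc` module.
use std::alloc::Layout;

#[no_mangle]
pub fn alloc(len: i32) -> *mut u8 {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    // `len` must be non-zero; the global allocator returns null on failure.
    let ptr = unsafe { std::alloc::alloc(layout) };
    assert!(!ptr.is_null(), "!alloc");
    ptr
}

#[no_mangle]
pub fn dealloc(ptr: *mut u8, len: i32) {
    let layout = Layout::from_size_align(len as usize, 1).expect("!from_size_align");
    // `ptr` and `len` must be the values from the matching `alloc` call.
    unsafe { std::alloc::dealloc(ptr, layout) }
}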

#[no_mangle]
pub fn is_foobar(buf: *const u8, len: i32) -> i32 {
    let js = unsafe { std::slice::from_raw_parts(buf, len as usize) };
    let js = unsafe { std::str::from_utf8_unchecked(js) };
    if js == "foobar" {
        1
    } else {
        0
    }
}
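
The allocate, copy, call, free sequence can also be run natively to see it in one place. This is a rough sketch using the stable `std::alloc` functions. `with_raw_copy` is a hypothetical Rust counterpart of the JS-side helper, and this `is_foobar` validates the bytes and returns a bool instead of an i32.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Copies `text` into a freshly allocated buffer, hands the raw pointer and
/// length to `callback`, then frees the buffer and returns the callback's result.
pub fn with_raw_copy<R>(text: &str, callback: impl FnOnce(*const u8, usize) -> R) -> R {
    let len = text.len();
    if len == 0 {
        // The global allocator must not be asked for zero bytes, so pass a
        // dangling (but non-null and aligned) pointer with length 0 instead.
        return callback(std::ptr::NonNull::<u8>::dangling().as_ptr(), 0);
    }
    let layout = Layout::from_size_align(len, 1).expect("valid layout");
    unsafe {
        let buf = alloc(layout);
        assert!(!buf.is_null(), "allocation failed");
        std::ptr::copy_nonoverlapping(text.as_ptr(), buf, len);
        let result = callback(buf, len);
        // Free with the same layout that was used to allocate.
        dealloc(buf, layout);
        result
    }
}

/// Checked variant of the exported function: rebuilds the byte slice and
/// compares it after UTF-8 validation.
pub fn is_foobar(buf: *const u8, len: usize) -> bool {
    let bytes = unsafe { std::slice::from_raw_parts(buf, len) };
    std::str::from_utf8(bytes) == Ok("foobar")
}
```

The callback must not keep the pointer around, because the buffer is gone as soon as `with_raw_copy` returns.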

TypeScript:

// cf. https://github.com/Microsoft/TypeScript/issues/18099
declare class TextEncoder {constructor (label?: string); encode (input?: string): Uint8Array}
declare class TextDecoder {constructor (utfLabel?: string); decode (input?: ArrayBufferView): string}
// https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/webassembly-js-api/index.d.ts
declare namespace WebAssembly {
  class Instance {readonly exports: any}
  interface ResultObject {instance: Instance}
  function instantiateStreaming (file: Promise<Response>, options?: any): Promise<ResultObject>}

var main: {
  memory: {readonly buffer: ArrayBuffer}
  alloc (size: number): number
  dealloc (ptr: number, len: number): void
  is_foobar (buf: number, len: number): number}

function withRustString (str: string, cb: (ptr: number, len: number) => any): any {
  // Convert the JavaScript string to an array of UTF-8 bytes.
  const utf8 = (new TextEncoder()).encode (str)
  // Reserve a WASM memory buffer for the UTF-8 array.
  const rsBuf = main.alloc (utf8.length)
  // Copy the UTF-8 array into the WASM memory.
  new Uint8Array (main.memory.buffer, rsBuf, utf8.length) .set (utf8)
  // Pass the WASM memory location and size into the callback.
  const ret = cb (rsBuf, utf8.length)
  // Free the WASM memory buffer.
  main.dealloc (rsBuf, utf8.length)
  return ret}

WebAssembly.instantiateStreaming (fetch ('main.wasm')) .then (results => {
  main = results.instance.exports
  // Prints "foobar is_foobar? 1".
  console.log ('foobar is_foobar? ' +
    withRustString ("foobar", function (buf, len) {return main.is_foobar (buf, len)}))
  // Prints "woot is_foobar? 0".
  console.log ('woot is_foobar? ' +
    withRustString ("woot", function (buf, len) {return main.is_foobar (buf, len)}))})

P.S. Emscripten's Module._malloc might be semantically equivalent to the alloc function we implemented above. Under the "wasm32-unknown-emscripten" target you can use Module._malloc with Rust.



Answer 3:

As pointed out by Shepmaster, only numbers can be passed to WebAssembly, so we need to convert the string into a Uint16Array.

To do so we can use this str2ab function:

function str2ab(str) {
  var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i=0, strLen=str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

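The call then looks like this:

instance.exports.compute(
    str2ab(operator), 
    n1, n2
);

Be aware that this does not actually hand the bytes to Rust. A WebAssembly export only accepts numeric arguments, so the ArrayBuffer is coerced to a number (0 here) rather than passed by reference. The encoded data still has to be copied into the module's linear memory, and its offset and length passed as integers, as in the answers above.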
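
If UTF-16 is the encoding you settle on, the Rust side has to decode it once the code units are in its memory. A minimal sketch, assuming the units arrive as little-endian byte pairs (which is how a Uint16Array lays them out on common platforms). Both function names are made up for illustration.

```rust
/// Rust analogue of `str2ab`: one 16-bit code unit per `charCodeAt` value,
/// stored as little-endian byte pairs.
pub fn str_to_utf16_bytes(text: &str) -> Vec<u8> {
    text.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Decodes such a buffer back into a Rust `String`. Returns `None` for an
/// odd byte count or for invalid UTF-16 such as an unpaired surrogate.
pub fn utf16_bytes_to_string(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

Unlike UTF-8 input, this always allocates a new String on the Rust side, because Rust strings cannot borrow UTF-16 data directly.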