I have a struct Foo
:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo>
and also create an index for this Vec<Foo>
on the field Foo::v
.
I want to use a HashMap<&str, usize>
for the index, where the keys will be &Foo::v
and the value is the position in the Vec<Foo>
, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
- allocate a
String
only once per one data stream reading - not search the index twice, once to check that the key does not exist, once for inserting new key.
- not increase the run time by using
Rc
orRefCell
.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry
borrows the key sos
must have a "bigger" lifetime thanhash
- I want to move
s
at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone
or calling HashMap::get
after calling HashMap::insert
?
@Shepmaster already demonstrated accomplishing this using
unsafe
, once you have I would encourage you to check how muchRc
actually would cost you. UnfortunatelyRc<str>
is not easy to create, as it would neatly solve the issue here.Nonetheless, just in case, here is a full version with
Rc
(the dumb one). It's safe and likely quite efficient. A more efficient version can be made by creating aRc<[u8]>
and transmuting it into aRc<str>
, with only a tiny bit ofunsafe
inFoo
's constructor.In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a
Vec<u8>
. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.In this case, there's two extra pieces of information that the compiler is unaware of but the programmer isn't:
String
is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.String
will never be changed. If it were, then it might reallocate, invalidating the referred-to address.In cases like this, it is OK to use
unsafe
code, so long as you properly document why it's not unsafe.However, my guess is that you don't want this code in your
main
method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
unsafe
code, preferably thoroughly tested and only exposing a safe API.For example, I'd probably rewrite the code to have the index be the primary owner of the key:
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
The error is:
The
note:
at the end is where the answer is.s
must outlivehash
because you are using&s
as a key in theHashMap
. This reference will become invalid whens
is dropped. But, as the note says,hash
will be dropped afters
. A quick fix is to swap the order of their declarations:But now you have another problem:
This one is more obvious.
s
is borrowed by theEntry
, which will live to the end of the block. Clonings
will fix that:But the type of
Foo.v
isString
, so it will own its own copy of thestr
anyway. Just that type means you have to copy thes
.You can replace it with a
&str
instead which will allow it to stay as a reference intos
:Note that, previously I had to move the declaration of
s
to beforehash
, so that it would outlive it. But now,l
holds a reference tos
, so it has to be declared even earlier, so that it outlivesl
.