
I have the following problem: I have a data structure that is parsed from a buffer and contains some references into this buffer, so the parsing function looks something like

fn parse_bar<'a>(buf: &'a [u8]) -> Bar<'a>

So far, so good. However, to avoid certain lifetime issues I'd like to put the data structure and the underlying buffer into a struct as follows:

struct BarWithBuf<'a> {bar: Bar<'a>, buf: Box<[u8]>}
// not even sure if these lifetime annotations here make sense,
// but it won't compile unless I add some lifetime to Bar

However, now I don't know how to actually construct a BarWithBuf value.

fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {
    let my_bar = parse_bar(&*buf);
    BarWithBuf {buf: buf, bar: my_bar}
}

doesn't work, since buf is moved in the construction of the BarWithBuf value, but we borrowed it for parsing.

I feel like it should be possible to do something along the lines of

fn make_bar_with_buf<'a>(buf: Box<[u8]>) -> BarWithBuf<'a> {

    let mut bwb = BarWithBuf {buf: buf};
    bwb.bar = parse_bar(&*bwb.buf);
    bwb
}

to avoid moving the buffer after parsing the Bar, but I can't do that because the whole BarWithBuf struct has to be initialised in one go. Now I suspect that I could use unsafe code to partially construct the struct, but I'd rather not do that. What would be the best way to solve this problem? Do I need unsafe code? If I do, would it be safe to do this here? Or am I completely on the wrong track here and there is a better way to tie a data structure and its underlying buffer together?

  • I never figured out if it is possible to have internal references to another member of a struct without unsafe code. I can't see how the borrow checker could follow a borrow like this one...
    – Levans
    Commented Nov 23, 2014 at 19:10
  • This question is old enough that I don't want to close it as a duplicate, but most people visiting this should probably check out "Why can't I store a value and a reference to that value in the same struct?" instead.
    – Shepmaster
    Commented Feb 18, 2020 at 13:23

2 Answers


I think you're right in that it's not possible to do this without unsafe code. I would consider the following options:

  1. Change the reference in Bar to an index. The contents of the box won't be protected by a borrow, so the index might become invalid if you're not careful. However, an index might convey the meaning of the reference in a clearer way.

  2. Move Box<[u8]> into Bar, and add a function buf() -> &[u8] to the implementation of Bar; instead of references, store indices in Bar. Now Bar is the owner of the buffer, so it can control its modification and keep the indices valid (thereby avoiding the problem of option #1).

  3. As per DK's suggestion below, store indices in BarWithBuf (or in a helper struct BarInternal) and add a function fn bar(&self) -> Bar to the implementation of BarWithBuf, which constructs a Bar on-the-fly.

Which of these options is the most appropriate one depends on the actual problem context. I agree that some form of "member-by-member construction" of structs would be immensely helpful in Rust.
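As a rough sketch of option 3: the names spans and the comma-splitting toy parser below are assumptions made up for illustration, not part of the question's real parser. The struct owns the buffer and stores byte ranges instead of references, and bar() builds a borrowed Bar on demand whose lifetime is tied to &self.

```rust
use std::ops::Range;

/// Borrowed view handed out to callers (stands in for the question's `Bar`).
struct Bar<'a> {
    refs: Vec<&'a [u8]>,
}

/// Owns the buffer and keeps index ranges instead of references.
struct BarWithBuf {
    buf: Box<[u8]>,
    // Hypothetical "portable" form of the parse result: byte ranges into `buf`.
    spans: Vec<Range<usize>>,
}

impl BarWithBuf {
    /// Toy parser (an assumption for illustration): splits the buffer on b','.
    fn new(buf: Box<[u8]>) -> Self {
        let mut spans = Vec::new();
        let mut start = 0;
        for (i, &byte) in buf.iter().enumerate() {
            if byte == b',' {
                spans.push(start..i);
                start = i + 1;
            }
        }
        spans.push(start..buf.len());
        BarWithBuf { buf, spans }
    }

    /// Builds a `Bar` on the fly; its slices borrow from `&self`,
    /// so the borrow checker keeps them from outliving the buffer.
    fn bar(&self) -> Bar<'_> {
        Bar {
            refs: self.spans.iter().map(|r| &self.buf[r.clone()]).collect(),
        }
    }
}
```

No unsafe code is needed here, at the cost of rebuilding the Bar on each call. Since buf is private and never mutated after construction, the stored ranges stay valid.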

  • I've considered the first option, but I don't particularly like it. I have trouble understanding your second suggestion; could you maybe elaborate on that?
    – fjh
    Commented Nov 23, 2014 at 22:15
  • @fjh I believe Adrian is suggesting that you wrap the Box in a type which has a method that returns a temporary Bar. The idea is that you store a "portable" form of the Bar (like indices into the boxed data), and the method constructs a Bar on-demand, using the lifetime of &self to keep things safe.
    – DK.
    Commented Nov 24, 2014 at 2:52
  • @DK. Not quite, but I clarified option #2 and added your suggestion as option #3. Commented Nov 24, 2014 at 8:22
  • Thank you! I'll probably just bite the bullet and use String and Vec instead of referring to the buffer to sidestep this whole issue. Index-based solutions feel a bit hacky and wouldn't really work for me since some of the references into the buffer are &str, so I'd have to repeatedly verify the UTF-8 encoding or use unsafe code.
    – fjh
    Commented Nov 24, 2014 at 19:33

Here's an approach that will work through a little bit of unsafe code. This approach requires that you are okay with putting the referred-to thing (here, your [u8]) on the heap, so it won't work for direct reference of a sibling field.

Let's start with a toy Bar<'a> implementation:

struct Bar<'a> {
    refs: Vec<&'a [u8]>,
}

impl<'a> Bar<'a> {
    pub fn parse(src: &'a [u8]) -> Self {
        // placeholder: the actual parsing goes here
        Self { refs: vec![src] }
    }
}

We'll make BarWithBuf that uses a Bar<'static>, as 'static is the only lifetime with an accessible name. The buffer we store things in can be anything that doesn't move the target data around on us. I'm going to go with a Vec<u8>, but Box, Pin, whatever will work fine.

struct BarWithBuf {
    buf: Vec<u8>,
    bar: Bar<'static>,
}

The implementation requires a tiny bit of unsafe code.

impl BarWithBuf {
    pub fn new(buf: Vec<u8>) -> Self {
        // The `&'static [u8]` is inferred, but writing it here for demo
        let buf_slice: &'static [u8] = unsafe {
            // Going through a pointer is a "good" way to get around lifetime checks
            // `as_ptr()` also works for an empty buffer, where `&buf[0]` would panic
            std::slice::from_raw_parts(buf.as_ptr(), buf.len())
        };
        let bar = Bar::parse(buf_slice);
        Self { buf, bar }
    }

    /// Access to Bar should always come through this function.
    pub fn bar(&self) -> &Bar {
        &self.bar
    }
}

BarWithBuf::bar is an important function: it re-associates the proper lifetimes with the references. Rust's lifetime elision rules make the function equivalent to pub fn bar<'a>(&'a self) -> &'a Bar<'a>, which turns out to be exactly what we want. The lifetimes of the slices in bar.refs are tied to the lifetime of the borrow of BarWithBuf, rather than being exposed as 'static.

WARNING: You have to be very careful with your implementation here. You cannot use #[derive(Clone)] for BarWithBuf, since the derived clone implementation will clone buf, but the elements of bar.refs will still point into the original buffer. It is only one line of unsafe code, but the safety still depends on the "safe" bits around it.
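If cloning is needed, one possible way around the derive problem is a hand-written Clone that copies the buffer and re-parses it, so the clone's references point into its own buffer. This is only a sketch built on the toy Bar from above: it assumes re-parsing is cheap and deterministic, and it uses buf.as_ptr() so that an empty buffer does not panic.

```rust
struct Bar<'a> {
    refs: Vec<&'a [u8]>,
}

impl<'a> Bar<'a> {
    fn parse(src: &'a [u8]) -> Self {
        // placeholder: the actual parsing goes here
        Self { refs: vec![src] }
    }
}

struct BarWithBuf {
    buf: Vec<u8>,
    bar: Bar<'static>,
}

impl BarWithBuf {
    fn new(buf: Vec<u8>) -> Self {
        // SAFETY (relies on the rest of this type): `buf` is never mutated or
        // reallocated after this point, and `bar` is only handed out through
        // `bar()`, which shortens the lifetime to that of `&self`.
        let buf_slice: &'static [u8] =
            unsafe { std::slice::from_raw_parts(buf.as_ptr(), buf.len()) };
        let bar = Bar::parse(buf_slice);
        Self { buf, bar }
    }

    fn bar(&self) -> &Bar<'_> {
        &self.bar
    }
}

impl Clone for BarWithBuf {
    fn clone(&self) -> Self {
        // Copy the bytes, then parse again so that the new `bar`
        // refers to the new buffer rather than to `self.buf`.
        Self::new(self.buf.clone())
    }
}
```

The trade-off is that every clone pays for a full parse; if that is too expensive, the index-based approaches from the other answer avoid the issue entirely.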


For larger bits of self-referencing structures, there's the ouroboros crate, which wraps up a lot of unsafe bits for you. The techniques are similar to the one I described above, but they live behind macros, which is a more pleasant experience if you find yourself making a number of self-references.
