I'm writing an MQTT 5 library. To send a packet, I need to know the size of the payload before writing it. My solution for determining the size has the following constraints, ordered by importance:

  1. be easy to maintain
  2. should not create copies of the data
  3. should be fairly performant (avoid double calculations)

To determine the size, I see the following options:

  1. do the calculations by hand, which is fairly annoying
  2. hold a copy of the data to send in memory, which I want to avoid
  3. build a std::iter::ExactSizeIterator for the payload, itself composed of std::iter::Chain values, which quickly leads to ugly type signatures unless you create wrapper types

I decided to go with option 3.

The example below shows my attempt at writing an MQTT string iterator. An MQTT string consists of two bytes holding the length of the string (big-endian), followed by the data as UTF-8.

use std::iter::*;
use std::slice::Iter;

pub struct MQTTString<'a> {
    chain: Chain<Iter<'a, u8>, Iter<'a, u8>>,
}

impl<'a> MQTTString<'a> {
    pub fn new(s: &'a str) -> Self {
        let u16_len = s.len() as u16;
        let len_bytes = u16_len.to_be_bytes();
        let len_iter = len_bytes.iter(); // len_bytes is borrowed here

        let s_bytes = s.as_bytes();
        let s_iter = s_bytes.iter();

        let chain = len_iter.chain(s_iter);

        MQTTString { chain }
    }
}

impl<'a> Iterator for MQTTString<'a> {
    type Item = &'a u8;
    fn next(&mut self) -> Option<&'a u8> {
        self.chain.next()
    }
}

impl<'a> ExactSizeIterator for MQTTString<'a> {}

pub struct MQTTStringPait<'a> {
    chain: Chain<std::slice::Iter<'a, u8>, std::slice::Iter<'a, u8>>,
}

This implementation doesn't compile because I borrow len_bytes instead of moving it, so it'd get dropped before the Chain can consume it:

error[E0515]: cannot return value referencing local variable `len_bytes`
  --> src/lib.rs:19:9
   |
12 |         let len_iter = len_bytes.iter(); // len_bytes is borrowed here
   |                        --------- `len_bytes` is borrowed here
...
19 |         MQTTString { chain }
   |         ^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

Is there a nice way to do this? Adding len_bytes to the MQTTString struct doesn't help. Is there a better fourth option for solving the problem?

1 Answer

The root problem is that iter borrows the array. In nightly Rust, you can use array::IntoIter, but it does require that you change your iterator to return u8 instead of &u8:

#![feature(array_value_iter)]

use std::array::IntoIter;
use std::iter::*;
use std::slice::Iter;

pub struct MQTTString<'a> {
    chain: Chain<IntoIter<u8, 2_usize>, Copied<Iter<'a, u8>>>,
}

impl<'a> MQTTString<'a> {
    pub fn new(s: &'a str) -> Self {
        let u16_len = s.len() as u16;
        let len_bytes = u16_len.to_be_bytes();
        let len_iter = std::array::IntoIter::new(len_bytes);

        let s_bytes = s.as_bytes();
        let s_iter = s_bytes.iter().copied();

        let chain = len_iter.chain(s_iter);

        MQTTString { chain }
    }
}

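impl<'a> Iterator for MQTTString<'a> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        self.chain.next()
    }

    // Forward the hint: the default is (0, None), which makes
    // ExactSizeIterator::len panic.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.chain.size_hint()
    }
}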

impl<'a> ExactSizeIterator for MQTTString<'a> {}
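
For what it's worth, newer stable toolchains should no longer need the feature gate: to my knowledge, arrays have implemented IntoIterator by value since Rust 1.53. The following is a sketch of the same idea under that assumption. It calls IntoIterator::into_iter explicitly so the result does not depend on the edition, and it forwards size_hint so that len() reports the chain's exact length.

```rust
use std::array;
use std::iter::{Chain, Copied};
use std::slice;

pub struct MQTTString<'a> {
    chain: Chain<array::IntoIter<u8, 2>, Copied<slice::Iter<'a, u8>>>,
}

impl<'a> MQTTString<'a> {
    pub fn new(s: &'a str) -> Self {
        // Same truncating cast as above; strings longer than u16::MAX
        // would need separate handling.
        let len_bytes = (s.len() as u16).to_be_bytes();
        // By-value iteration moves the array into the iterator,
        // so nothing local is borrowed.
        let len_iter = IntoIterator::into_iter(len_bytes);
        MQTTString {
            chain: len_iter.chain(s.as_bytes().iter().copied()),
        }
    }
}

impl<'a> Iterator for MQTTString<'a> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        self.chain.next()
    }

    // Without this, the default hint of (0, None) would make len() panic.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.chain.size_hint()
    }
}

impl<'a> ExactSizeIterator for MQTTString<'a> {}
```

With this, MQTTString::new("abc").len() yields the two length bytes plus the three data bytes.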

You could do the same thing in stable Rust by using a Vec, but that'd be overkill. Instead, since you know the exact size of the array, you can destructure it into its two values and chain each one individually:

use std::iter::{self, *};
use std::slice;

pub struct MQTTString<'a> {
    chain: Chain<Chain<Once<u8>, Once<u8>>, Copied<slice::Iter<'a, u8>>>,
}

impl<'a> MQTTString<'a> {
    pub fn new(s: &'a str) -> Self {
        let u16_len = s.len() as u16;
        let [a, b] = u16_len.to_be_bytes();

        let s_bytes = s.as_bytes();
        let s_iter = s_bytes.iter().copied();

        let chain = iter::once(a).chain(iter::once(b)).chain(s_iter);

        MQTTString { chain }
    }
}

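impl<'a> Iterator for MQTTString<'a> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        self.chain.next()
    }

    // Forward the hint: the default is (0, None), which makes
    // ExactSizeIterator::len panic.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.chain.size_hint()
    }
}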

impl<'a> ExactSizeIterator for MQTTString<'a> {}

An iterator of &u8 is not a good idea from the point of view of pure efficiency. On a 64-bit system, a &u8 takes up 64 bits, as opposed to the 8 bits that the u8 itself would take. Additionally, dealing with this data on a byte-by-byte basis will likely defeat common optimizations for bulk memory copies.
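
The size difference can be checked directly with std::mem::size_of. A small sketch; the helper name item_sizes is made up for illustration:

```rust
use std::mem::size_of;

/// Returns (bytes per `&u8` item, bytes per `u8` item).
fn item_sizes() -> (usize, usize) {
    // A reference is pointer-sized: 8 bytes on a 64-bit target,
    // 4 bytes on a 32-bit one. The byte itself is always 1 byte.
    (size_of::<&u8>(), size_of::<u8>())
}
```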

Instead, I'd recommend creating something that can write itself to something implementing Write. One possible implementation:

use std::{
    convert::TryFrom,
    io::{self, Write},
};

pub struct MQTTString<'a>(&'a str);

impl MQTTString<'_> {
    pub fn write_to(&self, mut w: impl Write) -> io::Result<()> {
        let len = u16::try_from(self.0.len()).expect("length exceeded 16-bit");
        let len = len.to_be_bytes();
        w.write_all(&len)?;
        w.write_all(self.0.as_bytes())?;
        Ok(())
    }
}
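
As a sketch of how this might tie back to the original requirement of knowing the size before writing: the encoded length is just two bytes plus the string's byte length, so it can be computed without touching the data. The encoded_len method and the encode helper below are hypothetical additions for illustration; the struct and write_to are repeated so the snippet compiles on its own.

```rust
use std::{
    convert::TryFrom,
    io::{self, Write},
};

pub struct MQTTString<'a>(&'a str);

impl MQTTString<'_> {
    /// Hypothetical helper: size on the wire
    /// (2-byte length prefix plus the UTF-8 data).
    pub fn encoded_len(&self) -> usize {
        2 + self.0.len()
    }

    pub fn write_to(&self, mut w: impl Write) -> io::Result<()> {
        let len = u16::try_from(self.0.len()).expect("length exceeded 16-bit");
        w.write_all(&len.to_be_bytes())?;
        w.write_all(self.0.as_bytes())?;
        Ok(())
    }
}

/// Hypothetical usage: size the buffer first, then write into it.
pub fn encode(s: &str) -> io::Result<Vec<u8>> {
    let value = MQTTString(s);
    let mut buf = Vec::with_capacity(value.encoded_len());
    // `&mut Vec<u8>` implements `Write`, so it can be passed directly.
    value.write_to(&mut buf)?;
    Ok(buf)
}
```

Because write_to accepts anything implementing Write, the same method would work with a TcpStream or a BufWriter in place of the Vec.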

  • This is really nice and helped a lot, but using a copied iterator creates a copy of all data if I'm not mistaken. I know there's a problem with u8 and &u8 and they can't be mixed, but a solution only referencing the original data would be nice. Commented Jan 15, 2021 at 16:48
  • I took a look at your linked answer and I think I will create a struct which converts a string into an iterator like you demonstrated there for the pixel. Commented Jan 15, 2021 at 16:50
  • @Snapstromegon I plan on addressing this in the post after my lunch, but a &u8 is a really poor choice. It's 64 bits big and requires an indirect memory access, and then the consumer of the iterator still has to copy it to the network buffer anyway. – Shepmaster Commented Jan 15, 2021 at 16:51
  • Your 64bit comment made me rethink. Storing the payload as chunks in a Vec<&[u8]> and using that as a stack would probably solve all issues... Commented Jan 15, 2021 at 17:30
