An answer to How do I read the entire body of a Tokio-based Hyper request? suggests:
you may wish to establish some kind of cap on the number of bytes read [when using futures::Stream::concat2]
How can I actually achieve this? For example, here's some code that mimics a malicious user who is sending my service an infinite amount of data:
extern crate futures; // 0.1.25

use futures::{prelude::*, stream};

fn some_bytes() -> impl Stream<Item = Vec<u8>, Error = ()> {
    stream::repeat(b"0123456789ABCDEF".to_vec())
}

fn limited() -> impl Future<Item = Vec<u8>, Error = ()> {
    some_bytes().concat2()
}

fn main() {
    let v = limited().wait().unwrap();
    println!("{}", v.len());
}
One solution is to create a stream combinator that ends the stream once some threshold of bytes has passed. Here's one possible implementation:
struct TakeBytes<S> {
    inner: S,
    seen: usize,
    limit: usize,
}

impl<S> Stream for TakeBytes<S>
where
    S: Stream<Item = Vec<u8>>,
{
    type Item = Vec<u8>;
    type Error = S::Error;

    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        if self.seen >= self.limit {
            return Ok(Async::Ready(None)); // Stream is over
        }

        let inner = self.inner.poll();
        if let Ok(Async::Ready(Some(ref v))) = inner {
            self.seen += v.len();
        }

        inner
    }
}
trait TakeBytesExt: Sized {
    fn take_bytes(self, limit: usize) -> TakeBytes<Self>;
}

impl<S> TakeBytesExt for S
where
    S: Stream<Item = Vec<u8>>,
{
    fn take_bytes(self, limit: usize) -> TakeBytes<Self> {
        TakeBytes {
            inner: self,
            limit,
            seen: 0,
        }
    }
}
This can then be chained onto the stream before concat2:
fn limited() -> impl Future<Item = Vec<u8>, Error = ()> {
    some_bytes().take_bytes(999).concat2()
}
This implementation has caveats:
- it only works for Vec<u8>. You can introduce generics to make it more broadly applicable, of course; a sketch of one such generalization follows this list.
- it allows more bytes than the limit to come in; it just stops the stream after that point. Those kinds of decisions are application-dependent.
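For illustration, here is a minimal sketch of that generalization, replacing the Stream implementation above. It only assumes that the stream's items can be viewed as byte slices via AsRef<[u8]> (which Vec<u8> already satisfies); the TakeBytes struct itself is unchanged, and the extension trait's bound would be relaxed the same way.

impl<S> Stream for TakeBytes<S>
where
    S: Stream,
    S::Item: AsRef<[u8]>,
{
    type Item = S::Item;
    type Error = S::Error;

    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        if self.seen >= self.limit {
            return Ok(Async::Ready(None)); // Limit reached: end the stream
        }

        let inner = self.inner.poll();
        if let Ok(Async::Ready(Some(ref v))) = inner {
            // Count bytes regardless of the concrete item type
            self.seen += v.as_ref().len();
        }

        inner
    }
}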
Another thing to keep in mind is that you want to tackle this problem as low in the stack as you can: if the source of the data has already allocated a gigabyte of memory, placing a limit at this point won't help as much.