How Rust reads stdin() line by line -- en trois façons
- reading stdin by lines
- reading stdin by lines ... again
- creating a benchmark
- creating a benchmark take 2
- reading stdin by lines ... again again
- some concluding thoughts
- TIL, flubs, and other minutia
- endnotes
This post was written in the order of my work. Skip sections two and three for a faster read.
reading stdin by lines
Learning std::io has taken me about three, maybe four separate attempts. The Read, Write, and BufRead traits are dense. A simple task like reading a stream line by line becomes a real chore. Let's look at stdin.
Skip to the end for benchmarking results on how three typical stdin-by-line approaches compare.
Consider a simple echo program:
// echo.rs
use std::io;
fn main() {
for line in io::stdin().lines() {
println!("{}", line.unwrap());
}
}
Connect to stdin, get a Lines iterator over the stdin stream, and iterate. Simple.
Even a simple cargo run has a lot going on.
- ./echo in a terminal attaches the keystrokes of your keyboard to the stdin stream of the program
- the iterator calls next(), which leads to a blocking read system call; all this usually happens much faster than a human operator has time to press the first key
- a human (you!) types characters, which are buffered by your terminal's line discipline
- hitting <ENTER> sends this line of characters from the discipline into a kernel buffer for your process
- the kernel buffer holds characters until they are consumed by a blocking read (remember, the read is often invoked before the first data arrives)
- the read unblocks and consumes data from the kernel buffer
- the Rust loop continues with the print statement[1]
- iterate until a plain EOF (input with ^D on POSIX shells) is read as the totality of a line; sending a nonempty line that also contains the sequence ^D will not close the stream[2]
As for the code itself, std::io::Stdin is defined as a handle to the standard input stream of a process [docs]. In echo.rs, lines(self) consumes Stdin and produces an iterator Lines<StdinLock<'static>> over individual lines.
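To see the Lines iterator in isolation, here is a minimal, runnable sketch that swaps stdin for an in-memory io::Cursor, which also implements BufRead and therefore offers the same lines() method as StdinLock:

```rust
use std::io::{BufRead, Cursor};

fn main() {
    // Cursor implements BufRead, so it provides the same lines()
    // iterator that StdinLock<'static> does.
    let input = Cursor::new("first\nsecond\nthird\n");
    let lines: Vec<String> = input.lines().map(|l| l.unwrap()).collect();
    // Each item is a line with the trailing '\n' stripped.
    assert_eq!(lines, ["first", "second", "third"]);
}
```

The same code works on stdin by replacing the Cursor with io::stdin().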
reading stdin by lines ... again
Fun fact: a ChatGPT (5.2, no less) session in 2026 may tell you the code I wrote above does not work. Historically, Rust recommended a recipe involving an explicit lock() call and a BufRead trait import.[3]
use std::io;
use std::io::BufRead;
fn main() {
let stdin = io::stdin();
for line in stdin.lock().lines() {
println!("{}", line.unwrap());
}
}
From the quill of George Lucas: "This is getting out of hand! Now, there are two of them!"
At a glance, this can appear more expensive, which raises an obvious question: is stdin().lines() doing something cheaper under the hood? Is the difference purely syntactic? (Later we'll peek into source)
creating a benchmark
I wanted to compare these two methods using a trial harness.
use std::io;
use std::time;
fn old_style() -> time::Duration {
use std::io::BufRead;
let now = time::Instant::now();
let lines = io::stdin().lock().lines();
for line in lines {
println!("got a line: {}", line.unwrap());
}
now.elapsed()
}
fn new_style() -> time::Duration {
let now = time::Instant::now();
let lines = io::stdin().lines();
for line in lines {
println!("got a line: {}", line.unwrap());
}
now.elapsed()
}
fn tabulate(times: Vec<time::Duration>) -> f64 {
let num_trials = times.len();
let total = times
.into_iter()
.fold(0, |acc, x: time::Duration| acc + x.as_nanos());
total as f64 / (num_trials as f64 * 10_i32.pow(6) as f64)
}
fn main() {
let results: Vec<time::Duration> = (0..5).map(|_| old_style()).collect();
let results2: Vec<time::Duration> = (0..5).map(|_| new_style()).collect();
// finish by tabulating and reporting results to an out stream
}
My input file consists of tens of thousands of lines of asdfkj;1234567890. I was very excited.
cargo r < 1mb_ascii_file.txt > out
[src/main.rs:45:5] &results = [
89.156916ms,
416ns,
291ns,
333ns,
334ns,
]
[src/main.rs:46:5] &results2 = [
334ns,
292ns,
334ns,
333ns,
333ns,
]
Right. So, this is wrong. Only the first iteration appears to be doing any real processing.
Now, what if I told you that redirecting a file into standard input has different semantics from reading that file directly?
Today's central lesson: io::Stdin is a reader. It implements io::Read. It does not io::Seek, however. It is closer to a stream than a file handle, even when the stdin file descriptor is in fact a file. Stdin is only read forward.
In particular, issuing read() system calls--including on stdin--requests bytes from the kernel, which copies them from the file into the requesting process. Note that in a BufRead-backed operation, Rust minimizes syscalls by issuing fewer, larger read()s. Even when only one line is requested, the remaining bytes are buffered in memory, and iteration proceeds over that buffered payload. Meanwhile, the file descriptor tracks a single offset: "the number of bytes read so far." It updates whenever any read call is made. When the file's end is reached, the kernel hands back no more bytes; in the Rust process' space, subsequent read calls return 0 bytes, which Lines reports as None.
The numbers capture this. The first iteration reads stdin to exhaustion; subsequent iterations immediately hit EOF and return instantly.
Benchmark pros would also ask for removal of the println! calls. They add noise and dominate the elapsed time.
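The forward-only, offset-tracking behavior described above can be demonstrated without stdin at all. A byte slice is the simplest Read implementor; this sketch shows each read() advancing the offset until every subsequent call returns Ok(0):

```rust
use std::io::Read;

fn main() {
    // A &[u8] reader models the forward-only stream described above:
    // each read() advances an internal offset, and once the end is
    // reached, every later read() returns Ok(0).
    let mut reader: &[u8] = b"abcdef";
    let mut buf = [0u8; 4];

    assert_eq!(reader.read(&mut buf).unwrap(), 4); // consumes "abcd"
    assert_eq!(reader.read(&mut buf).unwrap(), 2); // consumes "ef"
    assert_eq!(reader.read(&mut buf).unwrap(), 0); // EOF
    assert_eq!(reader.read(&mut buf).unwrap(), 0); // still EOF, forever
}
```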
creating a benchmark take 2
To test stdin repeatedly, you must operate from a layer above the process. Bash it is.
#!/bin/bash
set -e
# add -O as needed to these commands
rustc vers1.rs
rustc vers2.rs
tabulation1=0
tabulation2=0
TRIALS=10 # warning bash can't parameterize brace expansion
for _ in {0..9}; do
time1="$(./vers1 < a.txt)"
time2="$(./vers2 < a.txt)"
tabulation1="$(( tabulation1 + time1))"
tabulation2="$(( tabulation2 + time2))"
done
# bc handles the division below, since bash lacks floating-point arithmetic
printf "Average time for BufRead + stdin().lock().lines(): "
bc <<< "scale=10; $tabulation1 / ($TRIALS * 1000000)"
printf "Average time for stdin().lines(): "
bc <<< "scale=10; $tabulation2 / ($TRIALS * 1000000)"
with the rust scripts being
// vers1.rs
use std::io;
use std::time;
use std::io::BufRead;
fn main() {
let now = time::Instant::now();
let lines = io::stdin().lock().lines();
for line in lines {
let _ = line.unwrap();
}
println!("{}", now.elapsed().as_nanos())
}
and
// vers2.rs
use std::io;
use std::time;
fn main() {
let now = time::Instant::now();
let lines = io::stdin().lines();
for line in lines {
let _ = line.unwrap();
}
println!("{}", now.elapsed().as_nanos())
}
The results of 10 trials:
Average time ms for BufRead + stdin().lock().lines(): 5.5714832000
Average time ms for stdin().lines(): 5.2496541000
The explicit lock-and-BufRead method runs about 3-11% faster across repeated trial runs. Even reversing the order of the two scripts did not alter this. But a higher trial count completely smooths out the difference.
Average time ms for BufRead + stdin().lock().lines(): 3.9031291500
Average time ms for stdin().lines(): 3.8735687600
This is when I read the source. Here's Stdin::lines()'s source:
pub fn lines(self) -> Lines<StdinLock<'static>> {
self.lock().lines()
}
Inside std, the BufRead trait is already in scope, so lines() on the lock resolves without ceremony. stdin().lines() is just a thin abstraction for convenience. There was no substantive difference after all.
What explains the explicit BufRead method running faster at lower trial counts when compiled in debug mode? The stdin().lines() convenience wrapper adds a tiny bit of call-path overhead, and the 3-11% difference vanishes at higher (e.g. 100+) trial counts.
Oh, fun fact: when compiled with -O, the 3-11% difference flips in favor of the lines() API, whether for 10-run or 100-run trials. It might be due to generic churn and compiler inlining. Without a more controlled environment, it isn't fair to characterize this further; the two methods are within the same reasonable epsilon.
So much for this benchmark.
reading stdin by lines ... again again
Oh yeah, there is one other canonical method for reading input. In fact, this is the style I first learned.
use std::time;
use std::io::{self, BufRead};
fn main() {
let stdin = io::stdin();
let mut stdin = stdin.lock();
let now = time::Instant::now();
let mut line = String::new();
// alternatively, use unwrap() to propagate panic from stdin errors
while let Ok(n) = stdin.read_line(&mut line) {
if n == 0 {
break;
}
line.clear();
}
println!("{}", now.elapsed().as_nanos())
}
Here, we create a buffer that read_line appends to.[4] clear()-ing the buffer between iterations enables reuse.
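The same read_line loop can be exercised without a terminal. This sketch substitutes an io::Cursor for the locked stdin (an assumption purely for testability; the loop body is unchanged):

```rust
use std::io::{BufRead, Cursor};

fn main() {
    // Cursor stands in for a locked stdin. read_line appends to the
    // buffer (keeping the trailing '\n') and returns Ok(0) at EOF.
    let mut reader = Cursor::new("one\ntwo\n");
    let mut line = String::new();
    let mut seen = Vec::new();

    while let Ok(n) = reader.read_line(&mut line) {
        if n == 0 {
            break; // 0 bytes read means EOF
        }
        seen.push(line.trim_end().to_string());
        line.clear(); // reuse the same heap buffer next iteration
    }
    assert_eq!(seen, ["one", "two"]);
}
```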
We benchmark this by using -O and obtaining the average over 100 trials:
Average time ms for BufRead + stdin().lock().lines(): 1.6897650300
Average time ms for stdin().lines(): 1.6459237700
Average time ms for stdin.read_line(): .7772113000
This finding--that read_line is roughly twice as fast--holds for both smaller and larger trial sets. For debug builds, this method is up to four times faster. The difference is most clearly attributable to the per-line String allocations and deallocations hidden inside the iterator-based APIs. For a file with similar-sized lines, there is even less pressure to reallocate the buffer String's heap payload: it reallocates only on overflow, and clear() merely resets len to 0.
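That clear() keeps the allocation is easy to verify directly. A small sketch using the post's own test line:

```rust
fn main() {
    let mut buf = String::new();
    buf.push_str("asdfkj;1234567890");
    let cap = buf.capacity();

    // clear() resets len to 0 but keeps the heap allocation...
    buf.clear();
    assert_eq!(buf.len(), 0);
    assert_eq!(buf.capacity(), cap);

    // ...so a same-sized "line" fits without reallocating.
    buf.push_str("asdfkj;1234567890");
    assert_eq!(buf.capacity(), cap);
}
```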
some concluding thoughts
One millisecond here, half a millisecond there--call it noise. It is on the scale of human speed. But dismissiveness precludes deeper understanding. There are several ways to perform this very basic task with Rust's standard library, and we have characterized tangible differences between them. The iterator APIs layer on syntactic sugar, while managing a buffer directly can yield meaningful performance gains when processing input. Indeed, String allocations and deallocations are a frequent topic on my team for this reason.
Another reason I wished to write this: the idea of standard input has always been slippery to me. On the one hand, it's file descriptor 0. Sometimes it is a file. Sometimes it is backed by a PTY or even a TTY. In college, I found you can trick C++ into reading from stdin like a file and seek over it by masking it with a macro. In Python, you just call input(). All these interfaces, just to grab lines. Rust's traits at least make its API concrete.
I take away from this the insight that standard input is not actually quite that standard. There is a range of expectations. A language exposes capabilities. Better to understand the runtime's POV than to arrive with another language's worldview.
TIL, flubs, and other minutia
Precision loss in fold()
I reached for fold in my original tabulate() function.
fn tabulate(times: Vec<time::Duration>) -> f64 {
let num_trials = times.len();
let total = times
.into_iter()
.fold(0, |acc, x: time::Duration| acc + x.as_nanos());
total as f64 / (num_trials as f64 * 10_i32.pow(6) as f64)
}
In plain words, sum durations to find the average runtime and convert to millisecond scale.
Notice the mistake?[5] We are losing precision by coercing a u128 to f64. I assumed without thinking that f64 would be a safe target type.
It is subtle what information is lost. I passed a syntactically untyped 0 as the init parameter of fold, so type inference is in play. I assumed it would be treated as 0i32 or even 0u32; f64 can represent any 4-byte integer exactly. However, Duration::as_nanos returns u128. Converting a 16-byte integer to f64 must lose more than half its precision (75 bits discarded to fit a 53-bit mantissa).
It's easy to miss this. At first, I wondered if this was a quirk of generics when using something as flexible as fold. Its source is simple. It rules out my hypothesis and indicates the real answer.
fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
let mut accum = init;
while let Some(x) = self.next() {
accum = f(accum, x);
}
accum
}
All fold does is apply our closure. When init is an untyped {integer}, the typed term (given by as_nanos()) constrains it: 0 is thereby inferred as u128. And since f64 has only a 53-bit mantissa, the final as f64 cast carries a truncation risk.
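The truncation risk is observable with two adjacent u128 values that straddle the mantissa limit. A quick sketch:

```rust
fn main() {
    // f64 carries a 53-bit mantissa, so a u128 needing more than 53
    // significant bits cannot survive the cast intact.
    let exact: u128 = 1 << 63;
    let off_by_one: u128 = exact + 1; // needs 64 significant bits

    assert_eq!(exact as f64, off_by_one as f64); // the low bit is gone

    // Values that fit within 53 bits round-trip exactly.
    let small: u128 = (1 << 53) - 1;
    assert_eq!(small as f64 as u128, small);
}
```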
truncation and precision
What does as do on numeric types? Widening is straightforward: the value is extended by infilling the high bits with zeros or sign bits (two's complement compliant). But do you know how truncation acts? A few tests answer this.
// Simple untyped integer coercion
0u16 --> 0u8
1u16 --> 1u8
127u16 --> 127u8
128u16 --> 128u8
255u16 --> 255u8
256u16 --> 0u8
257u16 --> 1u8
// Showing the same behavior with twos complement
0u16 --> 0i8
1u16 --> 1i8
127u16 --> 127i8
128u16 --> -128i8
255u16 --> -1i8
256u16 --> 0i8
257u16 --> 1i8
// A u128 to f64 for good measure
123456789022345678931234567890u128 --> 123456789022345670000000000000f64
For integers, a narrowing cast preserves the LSB bits as-is. For integer-to-float conversion, the mantissa preserves only the MSB bits that fit, and the float's exponent encodes the scale. Precision is traded for magnitude.
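The table above can be condensed into runnable assertions (the exact decimal rendering of the u128-to-f64 case is omitted here; the sketch only asserts that low bits are lost):

```rust
fn main() {
    // Narrowing integer casts keep only the low (LSB) bits.
    assert_eq!(256u16 as u8, 0);
    assert_eq!(257u16 as u8, 1);

    // The same truncated bits, reinterpreted as two's complement.
    assert_eq!(128u16 as i8, -128); // 0b1000_0000
    assert_eq!(255u16 as i8, -1);   // 0b1111_1111

    // Integer-to-float keeps the high (MSB) bits instead; the low
    // bits of this ~97-bit value are rounded away.
    let big: u128 = 123456789022345678931234567890;
    assert_ne!(big as f64 as u128, big);
}
```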
endnotes
[1] Which is its own can of worms.
[2] stdin may close before hitting EOF due to system failure.
[3] In this version, lines() consumes the StdinLock guard; once the iterator is dropped, the lock is released and you can lock stdin again.
[4] Note, there are many variations on how to handle read_line's return type of std::io::Result<usize>. This is just my favourite all-purpose method.
[5] Some would argue there's a second mistake due to possible overflow of u128. One mitigation is to divide each term by the number of trials before summing, reducing the peak magnitude of the accumulator (or, more honestly, to stop worrying and accept that I am not benchmarking for geological time scales).