
utils-std

Collection of commonly available Unix tools
git clone https://anongit.hacktivis.me/git/utils-std.git/
commit: e8e526bdb58ea53fafea5fd82e3535b7fe218b23
parent a146d09884ac03cd824fbb4b32f35d2f4d17a683
Author: Haelwenn (lanodan) Monnier <contact@hacktivis.me>
Date:   Fri, 31 Oct 2025 15:03:27 +0100

cmd/shuf.1: document shuffling method

Diffstat:

M cmd/shuf.1 | 34 +++++++++++++++++++++++++++-------
M cmd/shuf.c |  1 +
2 files changed, 28 insertions(+), 7 deletions(-)

```diff
diff --git a/cmd/shuf.1 b/cmd/shuf.1
@@ -1,7 +1,7 @@
 .\" utils-std: Collection of commonly available Unix tools
 .\" Copyright 2017 Haelwenn (lanodan) Monnier <contact+utils@hacktivis.me>
 .\" SPDX-License-Identifier: MPL-2.0
-.Dd January 17, 2025
+.Dd October 31, 2025
 .Dt SHUF 1
 .Os
 .Sh NAME
@@ -18,17 +18,23 @@
 .Op Fl n Ar num
 .Op Ar string...
 .Sh DESCRIPTION
+In it's first form,
 .Nm
-reads each
+reads lines from each
 .Ar file
-in sequence and writes it on the standard output with some shuffling applied to each line.
-If no
+or if unspecified or when
 .Ar file
-is given or if
-.Ar file is
+is
 .Qq - ,
+lines are read from standard input.
+And are then shuffled and printed using a reservoir shuffle, see
+.Sx SHUFFLING
+for details.
+.Pp
+In it's second form,
 .Nm
-reads from the standard input.
+uses a Fisher-Yates shuffle to swap-shuffle all the strings,
+and then prints them out as lines.
 .Sh OPTIONS
 .Bl -tag -width _n_num
 .It Fl e
@@ -42,8 +48,22 @@ lines.
 .It Fl z
 Use NULL as line delimiter, not newline.
 .El
+.Sh SHUFFLING
+In it's first form,
+.Nm
+.\" LINES_LEN
+uses a reservoir of 512 lines.
+It picks a random location,
+prints a line if present,
+then inserts a newly read line.
+Once all lines are read it prints the lines still present in the reservoir.
+.br
+While this isn't truly a random sort as lines beyond 512 won't be printed first,
+it allows to use a bounded amount of memory.
 .Sh EXIT STATUS
 .Ex -std
+.Sh SEE ALSO
+.Xr sort 1
 .Sh HISTORY
 An
 .Nm
diff --git a/cmd/shuf.c b/cmd/shuf.c
@@ -19,6 +19,7 @@
 // Not a full shuffle, if there is more than 512 lines then last lines are never going to be printed first.
 // But this allows bounded memory usage.
+// /!\ Make sure to modify the manpage as well if this gets changed /!\
 // FIXME: handle newline-less lines
```
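The reservoir shuffle documented in the new SHUFFLING section can be sketched in C as below. This is a minimal illustration, not the code from cmd/shuf.c: the helper name `reservoir_shuffle`, the use of `rand()`, taking the input as an in-memory array, and flushing the leftover lines in slot order are all assumptions. Only the 512-line reservoir and the pick/print/insert steps come from the manpage.

```c
#include <assert.h>
#include <stdlib.h>

/* Reservoir size; shuf.1 documents 512 lines (LINES_LEN in shuf.c). */
#define LINES_LEN 512

/*
 * Hypothetical helper, not the actual shuf.c code: passes the `n` lines
 * of `in` through a bounded reservoir and stores them, in output order,
 * into `out` (which must have room for `n` pointers).
 * Returns the number of lines written to `out`.
 */
static size_t
reservoir_shuffle(char *const *in, size_t n, char **out)
{
	char *reservoir[LINES_LEN] = {NULL};
	size_t emitted = 0;

	for (size_t i = 0; i < n; i++) {
		/* Pick a random slot (rand() is an assumption of this sketch). */
		size_t slot = (size_t)rand() % LINES_LEN;

		/* If the slot already holds a line, output it first... */
		if (reservoir[slot] != NULL)
			out[emitted++] = reservoir[slot];

		/* ...then store the newly read line in its place. */
		reservoir[slot] = in[i];
	}

	/* All input read: flush the remaining lines (slot order is an assumption). */
	for (size_t slot = 0; slot < LINES_LEN; slot++)
		if (reservoir[slot] != NULL)
			out[emitted++] = reservoir[slot];

	/* Every input line leaves the reservoir exactly once. */
	assert(emitted == n);
	return emitted;
}
```

In a streaming version, where each line is read with getline(3) and printed instead of being stored in `out`, only the 512 reservoir slots have to be held in memory. The bias the manpage mentions also follows from this loop: a line is only output once something displaces it, and with 512 slots the first displacement happens no later than the 513th insertion, so the first line printed is always one of the first 512 read.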
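For the second form, a Fisher-Yates (Durstenfeld) swap-shuffle over the string operands could look like the sketch below. It is an illustration under assumptions, not the shuf.c implementation: `fisher_yates` is a hypothetical name, and `rand() % i` is used for brevity even though it carries a small modulo bias whenever `RAND_MAX + 1` is not a multiple of `i`.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the actual shuf.c code: shuffles `n` strings
 * in place with the Fisher-Yates (Durstenfeld) algorithm.
 */
static void
fisher_yates(char **strs, size_t n)
{
	assert(strs != NULL || n == 0);

	/* Walk down from the last element, swapping each one with a random
	 * element at the same or a lower index. */
	for (size_t i = n; i > 1; i--) {
		/* j is drawn from [0, i-1]; modulo bias is ignored in this sketch. */
		size_t j = (size_t)rand() % i;
		char *tmp = strs[i - 1];
		strs[i - 1] = strs[j];
		strs[j] = tmp;
	}
}
```

A caller could pass the operands left after option parsing, e.g. `fisher_yates(argv + optind, argc - optind)`, and then print each string followed by a newline. Since the strings are already in memory as `argv`, a full uniform permutation costs nothing extra here, unlike the line-reading form, which trades uniformity for a bounded reservoir.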