As of version 1.8.10[1], which includes my merge request[2] to add an '--output' option, it has even completely replaced my use of 'dd' for writing disk images: 'sudo pv -Yo /dev/mmcblk0 whatever.img' is nicer, has much better progress indication, automatically selects a more sensible buffer size, and begets fewer groans from UNIX neckbeards than the old 'sudo dd of=/dev/mmcblk0 if=whatever.img'. (The '-Y' causes pv to sync after each write, which greatly improves progress indication on Linux.)
Though it's useful for much more, of course. I use it for progress when compressing files ('pv blah | gzip ...'), when uploading files to the web ('pv blah | curl --upload-file - ...'; curl doesn't show progress when uploading, for whatever reason), or just when I wanna see that something is happening with an operation that would otherwise take a while (even a slow 'du -h /some/path | sort -h' benefits from a 'pv' squeezed in the middle just to indicate that something is happening).
There's also `progress`, which works for tools that mainly operate on a single file; unlike `pv`, you don't have to start the tool differently. It would work nicely for the `gzip` example, for instance: just call `progress` in a different terminal while `gzip` is running.
dividuum 62 days ago [-]
I was curious how that's supposed to work, so I took a quick look: it scans /proc for known commands, looks up file descriptor information via their fd/ and fdinfo/ directories to get sizes and seek positions, and then shows a percentage for the largest file.
fwip 62 days ago [-]
pv also allows this, but you have to look up the process id manually, and pass it to the -d flag.
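A sketch of that workflow (the gzip job and the pgrep lookup are illustrative):

```shell
# Start some long-running job normally...
gzip -k huge.log &

# ...then attach pv to it from another shell. -d PID shows progress
# for each file descriptor the process has open.
pv -d "$(pgrep -n gzip)"
```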
arghwhat 62 days ago [-]
> This example shows that the access.log file is being read at the speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!
No, you cannot:
1. Compression algorithms buffer a lot and tend to perform large burst writes, in particular a large write on the final flush. Instantaneous measurements will therefore not be particularly useful.
2. Compression ratio refers to the average across an entire file, as entropy is not uniform across the input unless you are compressing just noise or just zeros. Some sections might compress extremely well while others do not and end up consuming more space than the input due to overhead.
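The honest way to get the ratio is from the final sizes, once the whole file has been compressed, e.g.:

```shell
gzip -c access.log > access.log.gz

# Average compression ratio over the entire file, not an instantaneous guess:
orig=$(wc -c < access.log)
comp=$(wc -c < access.log.gz)
echo "scale=1; $orig / $comp" | bc
```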
Narishma 62 days ago [-]
3. Your hardware may not have similar performance characteristics for read and write operations.
jcul 62 days ago [-]
pv is great.
It has a limit parameter so you can limit the speed.
Great if you don't want to saturate some link or have additional costs for uploading above a certain rate per hour/day.
Also useful for testing behaviour on slow filesystem / connections.
It can take a pid argument too, -d IIRC, which will get it to display progress info for all the open file descriptors of a running process.
Really useful as a quick way to check what an I/O process is doing if it appears to be stuck.
darkwater 62 days ago [-]
Pipe viewer? What's that? Let me check the post...oh, it's good old pv! Never noticed it had a full name, damn Unix utilities with their short names!
fuzztester 62 days ago [-]
There is a shell built-in called alias.
You can use it to map the short name to the long name if you prefer, although people usually do it the other way around, to save on typing.
;)
remram 62 days ago [-]
I didn't understand, how would that help with discovering the full name of "pv"?
fuzztester 60 days ago [-]
it wouldn't. it was a joke.
fuzztester 62 days ago [-]
[flagged]
NelsonMinar 62 days ago [-]
I love pv but how much does adding the pipe affect overhead? I feel like most of the big jobs I want to measure are ones where you want the program to have direct access to the underlying file or storage. `pv somefile | dd` is going to be slower than `dd if=somefile`. At least I think so? I have no idea what modern Linux I/O can optimize.
Also, does pv necessitate doing single-threaded I/O?
codetrotter 62 days ago [-]
I used to love pv, but I had ZFS send and recv hang many times when I was using pv in the pipeline, and I was never sure why. After I stopped using pv and started using the verbose flag of the ZFS command on the receive side instead, which provides enough output for me to see that it's progressing, I haven't had those kinds of problems since.
Searching now, it seems this was indeed a problem other people were having until recently. For example, in the recent forum thread https://forums.freebsd.org/threads/zfs-send-hanging.94994/ they were discussing which version this was fixed in, with someone saying that the latest version available to them from packages was still a bit too old.
throwaway127482 62 days ago [-]
I like to use pv as a quick and dirty operations-per-second counter. Sometimes I will write a program or script that does a bunch of things in parallel (e.g. RPCs to a service I'm working on) and prints one line of output for every operation completed. Then I pipe that output to pv using the -l (line mode) option to count only lines. It shows how many lines are being printed per second, which roughly counts operations per second. (IIRC, you also need to pipe to /dev/null to prevent pv's fancy output from clobbering the tool's output.)
Fun stuff! Especially when combined with GNU parallel, in cases where the thing I'm measuring isn't already parallelized, and I want to be lazy.
6c696e7578 62 days ago [-]
A little more typing, but I find dd present on most systems already, so I tend to do this:
tar ... | dd status=progress | ...
Aachen 62 days ago [-]
I've used pv for longer than dd has had this option, but that's fair! I also don't use find's options, for example, since find piped into the tool everyone already knows anyway (grep) is much easier.
Sadly, dd will not give you an estimated time or allow you to limit the transfer rate, which are two features I use a lot in pv.
flyinghamster 62 days ago [-]
One problem I've noticed with status=progress is that systems can sometimes have gigabytes of buffer space waiting to be filled, so the transfer spends most of its time in a "nearly done" state while (for instance) the SD card gets slowly filled at its real speed.
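One way around that, assuming the device and filesystem support it, is to bypass the page cache with O_DIRECT so the progress numbers track the device rather than RAM:

```shell
# oflag=direct makes status=progress reflect the card's real write speed
# instead of the kernel's dirty buffers filling up; bs must be a size
# the device accepts for direct I/O.
sudo dd if=whatever.img of=/dev/mmcblk0 bs=4M oflag=direct status=progress
```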
fulafel 62 days ago [-]
That's slowish, bottlenecking disk-based I/O. (Yes, you can improve it with dd options, if you are versed in the language...)
fuzztester 62 days ago [-]
dd conv=swab is a cool and useful option. 'swab' stands for 'swap bytes', IIRC. Guess what it is used for, those who don't already know.
codetrotter 62 days ago [-]
> guess what it is used for, those who don't already know.
Changing the endianness of the data?
fuzztester 62 days ago [-]
Exactly!
That's just what I used it for, early in my career, in a software troubleshooting case.
I was a newbie dev, tasked with converting some data from another machine / OS format on tape to a Unix machine format.
Was fairly new to Unix too.
Looked through the man pages, found dd, and could read and copy the data from the tape drive to disk with it. But the data, while ASCII and English letters, seemed garbled. Seemed like actual words, but not quite, if you know what I mean. Something seemed a bit off.
I puzzled over it for a while, then checked the dd man page again. I saw the conv=swab option and its meaning, which is to swap adjacent bytes. (Typical Unix cryptic syntax, and I say that as a long-term Unix guy.) I was probably new to endianness at that time. This was in pre-Internet times, so I could not google what endianness meant.
I added that option to my dd command line, and what was my shock and pleasure to see all the words coming out as proper English words, now, on the screen!
Then I knew that the tape contained actual data, not garbage. I saved it to disk under some file name.
I mean, if you’re the type of person who considers using tar and nc to be the obvious way to transfer a directory between two computers…
vbezhenar 62 days ago [-]
I might be weird, but for me the most obvious way to transfer a small directory is to do
tar -cz dir | base64
Copy output into clipboard
base64 -d | tar -xz
Paste from clipboard into input
Works flawlessly to move configs and stuff between servers.
I actually love the blend between terminal and GUI. For this example I'm using CLI tools to produce text and I'm using GUI to scroll, select and copy&paste the text between two terminal tabs. I wish developers put more emphasis on empowering terminal with GUI capabilities.
Towaway69 62 days ago [-]
On Mac you can use pbcopy to copy something to your pasteboard, aka clipboard.
So the first command becomes:
tar -cz dir | base64 | pbcopy
zoky 62 days ago [-]
And of course you can use pbpaste for the inverse. Doesn’t work over SSH though, which is almost certainly how you’d be using this.
Towaway69 61 days ago [-]
True, unless ssh tunnels the pasteboard nowadays ;)
GP was doing this using GUI and TUI so it's a case of having two terminals open to two different machines I assume.
zoky 61 days ago [-]
> GP was doing this using GUI and TUI so it's a case of having two terminals open to two different machines I assume.
Yeah, but that’s exactly the case where it won’t work. pbcopy and pbpaste run locally, so if you’re connected to a remote Mac they won’t work properly (they will just cut/paste to the remote machine’s local pasteboard), and if you’re connected to a non-Mac the commands won’t exist at all. The only case where they would work is, say, connecting from one Mac to another via something like VNC that does do pasteboard sharing, and you’re trying to copy a file structure from a remote system to a local one. If you’ve got two terminal windows open to two different machines, you’ll have to cut and paste manually.
zoky 62 days ago [-]
Under certain circumstances, like where machines are not directly accessible to each other and you're only able to connect by Remote Desktop or something, that's not actually a bad way to do it. But for machines that are on the same network, using a weird invocation of tar and nc instead of just using, say, rsync is an odd choice. And for machines connected to each other over the Internet, it's positively insane.
Yes, moreutils has a lot to offer.
vidir to quickly reorganize files and folders (even delete files),
chronic to run commands silently... unless they fail,
sponge to be able to pipe from a file and back to that file without a temporary one
ts to add a timestamp to the output
...
ElevenLathe 61 days ago [-]
I find myself using sponge, ts, and pee (tee for pipes: split stdin and send it to two or more different command lines) the most (and in that order).
sirjaz 62 days ago [-]
We have that in PowerShell also: show-progress.
gerdesj 62 days ago [-]
Ah yes, PowerShell. Never have so many owed so much to ... tab completion 8)
thanatos519 62 days ago [-]
Yes! My `,pv` is approximately: (probably a better way to make the k, but I stop once something is adequate; maybe I just need to make a `,,kof`)
[1] https://codeberg.org/a-j-wood/pv/releases/tag/v1.8.10
[2] https://codeberg.org/a-j-wood/pv/pulls/90
Job done.
$ echo "ehll,ow rodl" | dd conv=swab
# Should give:
hello, world
A bit like rot13, but with 2 as the length :)
https://en.m.wikipedia.org/wiki/ROT13