Search and filter large text files with CLI find command

I picked up the find command when looking into disabling hardware with a shortcut. Since then it has come in handy a few times. Getting a filtered version of a text file sounds like an easy task, but surprisingly few text editors handle it well and with acceptable performance.

What it does

find is a native Windows command line utility that searches through a text file or stream and outputs all lines that match (or optionally don’t match) a search string.
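To spell that logic out, here is a rough Python sketch of the matching rule. It is only an illustration of the idea, not how find is implemented; the function name find_lines and the sample lines are made up for this example.

```python
def find_lines(lines, needle, invert=False):
    """Yield lines that contain needle, or lines that do not when invert is True."""
    for line in lines:
        # Plain, case-sensitive substring test; invert plays the role of the /v switch.
        if (needle in line) != invert:
            yield line


# Made-up sample lines, not real log data.
sample = [
    "GET /about 200",
    "GET /missing 404",
    "GET /feed 200",
]

print(list(find_lines(sample, "404")))
print(list(find_lines(sample, "404", invert=True)))
```

The first call keeps only the line with 404 in it, the second keeps everything else.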

Being a command line utility, it is no usability marvel, but it has no graphical interface overhead. It works snappily and is easily scriptable.

For info on the command, run:

find /?

How to use

Let’s take server logs as an example. I took archived logs for the last ten days from the server, which ended up as a rarst.net-Jul-2010 file of 46MB in size and ~190,000 lines. Few editors will open this reliably and even fewer will help to filter it or make sense of it.

Let’s say I want to check for 404 errors, that is, lines that contain 404 surrounded by spaces.

find " 404 " rarst.net-Jul-2010

That gives a really long list, including some things I am not interested in, like requests for icons by Apple devices that assume the whole world should maintain separate icons for them. Meh.

Since find also accepts streams, it can be chained with itself or other commands. So I want to filter further, down to lines that don’t contain apple-touch-icon requests.

find " 404 " rarst.net-Jul-2010 | find /v "apple-touch-icon"

The pipe symbol streams the output of our first find to one more find. The second has the /v switch that reverses the logic: only lines that don’t contain the string will pass.

And there can be more finds. I am also not interested in comment spammers.

find " 404 " rarst.net-Jul-2010 | find /v "apple-touch-icon" | find /v "wp-comments-post.php"

I got my result, but a console window is hardly a convenient viewing area. Luckily, the results are easily redirected into a text file instead with the > operator at the end.

find " 404 " rarst.net-Jul-2010 | find /v "apple-touch-icon" | find /v "wp-comments-post.php" > 404.log

And with a single-line command my ~190,000-line log is reduced to the ~800 lines I am interested in.
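For comparison, the same include-then-exclude chain can be sketched in Python. This is a minimal illustration under assumptions: filter_log is a hypothetical helper, and the sample lines are made up to stand in for the real log, which is not reproduced here.

```python
import os
import tempfile


def filter_log(src_path, dst_path, include, excludes):
    """Copy lines containing include and none of excludes; return how many were kept."""
    kept = 0
    # Read line by line so a large log never has to fit in memory at once.
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if include in line and not any(x in line for x in excludes):
                dst.write(line)
                kept += 1
    return kept


# Made-up sample lines standing in for the real server log.
sample = (
    '"GET /old-page HTTP/1.1" 404 512\n'
    '"GET /apple-touch-icon.png HTTP/1.1" 404 512\n'
    '"POST /wp-comments-post.php HTTP/1.1" 404 512\n'
    '"GET / HTTP/1.1" 200 4040\n'
)

workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "sample.log")
out_path = os.path.join(workdir, "404.log")
with open(log_path, "w", encoding="utf-8") as f:
    f.write(sample)

kept = filter_log(log_path, out_path, " 404 ",
                  ["apple-touch-icon", "wp-comments-post.php"])
print(kept)
```

Each exclusion string does the job of one find /v stage, and writing to dst_path corresponds to the > redirect.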

More complex findstr version

If find is not enough, there is also the similar findstr utility that does the same thing but supports regular expressions. And pretty much anything is better with regular expressions. :)
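To show what pattern matching adds over plain substrings, here is a small Python sketch that keeps lines with any 4xx status code. It only illustrates the idea: findstr supports a smaller regular expression syntax than Python's re module, so patterns will not always carry over unchanged. The name grep_lines and the sample lines are made up for this example.

```python
import re


def grep_lines(lines, pattern, invert=False):
    """Yield lines where pattern matches somewhere, or where it does not when invert is True."""
    regex = re.compile(pattern)
    for line in lines:
        if bool(regex.search(line)) != invert:
            yield line


# Made-up sample lines, not real log data.
sample = [
    '"GET /old-page HTTP/1.1" 404 512',
    '"GET /private HTTP/1.1" 403 128',
    '"GET / HTTP/1.1" 200 4040',
]

# A space, a 4, two more digits, a space: any 4xx status code.
client_errors = list(grep_lines(sample, r" 4\d\d "))
print(client_errors)
```

One pattern here covers what would otherwise take a separate plain-text search per status code.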

Overall

Not flashy, but a solid, scriptable and high-performance method to filter large text files.

Manuals and examples at SS64

find: http://ss64.com/nt/find.html

findstr: http://ss64.com/nt/findstr.html

2 Comments

  • Simakuutio #

    Still quite far away from that usability *nix shells have... but good to have at least something, better than nothing, right? :)
  • Rarst #

    @Simakuutio Nixes are about headless servers and stuff, so evolutionary they have tricked out command line. I think more fair Windows analogue would be Power Shell. Still even basic Windows command line can perform some highly useful stuff. We just tend to forget that and go with GUI out of habit. :)