Counting strings in a file: Ruby vs Windows Command shell

This is not the usual material that I put up, but I’d like to immortalize an event that demonstrated yet again the beauty of Ruby for basic file manipulation, especially in contrast to doing the same in a Windows command shell. Here goes:

“Nat, I need a script that displays a count of the number of instances of a string in a file. The output must be a number and nothing else.”
“No worries, that won’t take 2 seconds.”
“Stop right there – I don’t want any of your Ruby nonsense – it must be a batch file.”
“Hmmm… Can the batch file call a Ruby script?”
“No.”
“Err… ok… I’ll see what I can do.”

So off I went trawling Google, Stack Overflow, random blogs, and websites which can’t have seen hits since 1995. One hour, some frustration, and several cups of tea later, this is what I came up with:

findstr /C:"search string" "c:\my\file.txt" | find /C /V "nonsense"

And that, ladies and gentlemen, works! Let me explain what’s going on… The script uses 2 commands: findstr and find. findstr is used for finding strings in files, and find is also used for finding strings in files. It of course makes perfect sense to have two commands that do the same thing – the very definition of the word “intuitive”.

In the above example, findstr returns the lines from the file that contain the search string. These lines are piped to find, where /V selects the lines that don’t contain a particular string (in the above case: "nonsense") and /C displays a count of those lines instead of the lines themselves. As long as "nonsense" never appears in a matching line, that returns a number and nothing else. Strictly speaking it is the number of lines containing the search string, so a line that contains it twice still counts once. It’s the only way I found to get find, findstr or a combination of the two to return a-number-and-only-a-number for a string in a file. I would love to see this improved – leave a comment if you know a better way to do it.
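
To make the pipeline’s behaviour concrete, here is a rough Ruby sketch of what the two commands do between them. The helper name pipeline_count and the sample text are made up for illustration, and plain substring matching stands in for findstr’s and find’s own matching rules, so treat it as an approximation rather than an exact emulation:

```ruby
# Hypothetical helper approximating:
#   findstr /C:"needle" file | find /C /V "nonsense"
def pipeline_count(text, needle, filler = "nonsense")
  # Step 1 (findstr): keep only the lines that contain the search string.
  matching = text.lines.select { |line| line.include?(needle) }
  # Step 2 (find /C /V): count the kept lines that do NOT contain the filler.
  matching.count { |line| !line.include?(filler) }
end

# Made-up sample text: the last line contains the search string twice.
sample = "search string here\nnothing to see\nsearch string and search string again\n"

puts pipeline_count(sample, "search string")  # prints 2
```

The result is 2 rather than 3 because the pipeline counts matching lines, not individual occurrences.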

To demonstrate to myself why doing the above in a batch file is crazy, I wrote the same line in Ruby:

puts File.read("c:/my/file.txt").scan(/search string/).count

It doesn’t take much explanation: it opens a file, reads it, scans it for the search string and then returns the number of instances it found.
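
For anyone who wants to drop the one-liner into a script, a minimal sketch might look like the following. The helper name count_occurrences and the temporary sample file are assumptions for the example; passing a String to scan (rather than a Regexp) makes it match the text literally. Note that scan counts every occurrence, so a line containing the string twice contributes 2 here, whereas the batch version counts that line once:

```ruby
require "tempfile"

# Hypothetical helper: counts non-overlapping occurrences of a literal
# string in the file at the given path.
def count_occurrences(path, needle)
  # A String argument to scan is matched literally, so characters such
  # as "." or "*" in the search string are not treated as regex syntax.
  File.read(path).scan(needle).count
end

# Made-up sample file so the example runs anywhere.
Tempfile.create("sample") do |f|
  f.write("search string here\nsearch string and search string again\n")
  f.flush
  puts count_occurrences(f.path, "search string")  # prints 3
end
```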

Now. Can we all start using the right tool for the right job please? I know it may involve a bit of learning, but that never hurt anyone. That is all.

3 thoughts on “Counting strings in a file: Ruby vs Windows Command shell”

  1. Was using PowerShell an acceptable solution? If so you could probably have had a similar looking script (to the Ruby one) using .net-iness. It would probably work out longer than the crazy batch file (and the Ruby one too), but infinitely more readable (than the batch file), and therefore more maintainable when the requirement changes.

  2. So, in the end I moved to PowerShell… which wasn’t as painful as using cmd but is quite clunky. A classic example: when you call ‘get-childitem’ on a directory you get null if there are no items, a file object if the dir contains only one file, and an array of files if the dir contains more than one file. I know that you can wrap the whole call in @(get-childitem…) to make it return an array regardless of how many files there are (or aren’t), but isn’t that what get-childitem should do in the first place? That sort of unintuitive and clunky thing left a bad taste.

    Ruby still wins. By a long way.
