This is not the usual material that I put up, but I’d like to immortalize an event that demonstrated yet again the beauty of Ruby for basic file manipulation, especially in contrast to doing the same in a Windows command shell. Here goes:
“Nat, I need a script that displays a count of the number of instances of a string in a file. The output must be a number and nothing else.”
“No worries, that won’t take 2 seconds.”
“Stop right there – I don’t want any of your ruby nonsense – it must be a batch file.”
“Hmmm… Can the batch file call a ruby script?”
“No.”
“Err… ok… I’ll see what I can do.”
So off I went trawling Google, Stack Overflow, random blogs, and websites which can’t have seen hits since 1995. One hour, some frustration, and several cups of tea later, this is what I came up with:
findstr /C:"search string" "c:\my\file.txt" | find /C /V "nonsense"
And that, ladies and gentlemen, works! Let me explain what’s going on… The script uses 2 commands: findstr and find. findstr is used for finding strings in files, and find is also used for finding strings in files. It of course makes perfect sense to have two commands that do the same thing – the very definition of the word “intuitive”. In the above example, findstr returns the lines from the file that contain the search string. These lines are piped to find, where /V selects the lines that don’t contain a particular string (in the above case: "nonsense") and /C prints a count of those lines instead of the lines themselves. That returns a number. Strictly speaking it is the number of lines that contain the search string, so a line with two matches counts once, and a matching line that happens to contain "nonsense" isn’t counted at all. It’s the only way I found to get find, findstr or a combination of the two to return a-number-and-only-a-number. I would love to see this improved – leave a comment if you know a better way to do it.
To demonstrate to myself why doing the above in DOS is crazy, I wrote the same line in Ruby:
File.read("c:/my/file.txt").scan(/search string/).count
It doesn’t take much explanation: It opens a file, reads it, scans it for a search string and then returns the number of instances it found.
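For anyone who wants to try it without touching a real file, here is a minimal sketch of the same idea as a small script. The helper name count_in_file and the throwaway temp file are made up for illustration and are not part of the original one-liner. It passes the search text to scan as a plain string, so nothing needs regex escaping.

```ruby
require "tempfile"

# Hypothetical helper (not from the original post): count non-overlapping
# occurrences of a literal string in a file.
def count_in_file(path, needle)
  # File.read opens the file, reads it all and closes the handle again.
  # Passing a String to scan matches it literally, so characters such as
  # "<" or "." need no regex escaping.
  File.read(path).scan(needle).count
end

# Demo with a throwaway file so the sketch runs anywhere.
Tempfile.create("count_demo") do |file|
  file.write("search string here\nand search string again, search string\n")
  file.flush
  puts count_in_file(file.path, "search string") # prints 3
end
```

The puts is what makes it print a number and nothing else when run as a script.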
Now. Can we all start using the right tool for the right job please? I know it may involve a bit of learning, but that never hurt anyone. That is all.
Was using PowerShell an acceptable solution? If so you could probably have had a similar looking script (to the Ruby one) using .net-iness. It would probably work out longer than the crazy batch file (and the Ruby one too), but infinitely more readable (than the batch file), therefore more maintainable when the requirement changes.
So, in the end I moved to PowerShell… which wasn’t as painful as using cmd but is quite clunky. A classic example: when you call ‘get-childitem’ on a directory you get null if there are no items, a file object if the dir contains only one file, and an array of files if the dir contains more than one file. I know that you can wrap the whole call in @(get-childitem…) to make it return an array regardless of how many files there are (or aren’t), but isn’t that what get-childitem should do in the first place? That sort of unintuitive and clunky thing left a bad taste.
Ruby still wins. By a long way.
I install unixutils for windows on every machine as a matter of course, and then I have access to wc and grep.
Your code works fine if the occurrences of the search string are on different lines. What if I want to get the number of occurrences of a particular string throughout my text file, irrespective of whether they are on the same line or on different ones?
For example, if I want to get the number of times “abc” occurs in this file:
“I am abc doing abc going abc and am amused
abc on the something something of abc
blah blah abc blah abc”
Mohsin,
Are you talking about the DOS stuff, or the Ruby one-liner? The Ruby version will give you what you want; but about the DOS thing… well… the less I know about DOS, the happier I am.
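To make the difference concrete, here is a rough sketch run against your example text. The helper names count_occurrences and count_matching_lines are mine, purely for illustration. The second one only imitates counting lines that contain the string, which is roughly what a line-based search gives you; it is not a port of the batch file.

```ruby
# Hypothetical helpers for illustration only.

# Count every non-overlapping occurrence of needle in text.
def count_occurrences(text, needle)
  text.scan(needle).count
end

# Count the lines that contain needle at least once.
def count_matching_lines(text, needle)
  text.each_line.count { |line| line.include?(needle) }
end

sample = <<~TEXT
  I am abc doing abc going abc and am amused
  abc on the something something of abc
  blah blah abc blah abc
TEXT

puts count_occurrences(sample, "abc")    # prints 7
puts count_matching_lines(sample, "abc") # prints 3
```

So scan counts each "abc" wherever it sits, while a per-line count stops at one per line.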
Thank you for posting this. Windows Command Shell is kind of sucky but if you find yourself in a situation where you have no other options it is good to have in a back pocket.
Hello everybody,
Nice post and site. But I am still wondering how to count the occurrences of a particular string (it contains a special character, “<”) in a bunch of files with a batch script. I have XML files; I want to count the occurrences, log the result and then build statistics by feeding it into an Excel file.
Greetings from France,
Ric