Counting strings in a file: Ruby vs Windows Command shell

This is not the usual material that I put up, but I’d like to immortalize an event that demonstrated yet again the beauty of Ruby for basic file manipulation, especially in contrast to doing the same in a Windows command shell. Here goes:

“Nat, I need a script that displays a count of the number of instances of a string in a file. The output must be a number and nothing else.”
“No worries, that won’t take 2 seconds.”
“Stop right there – I don’t want any of your ruby nonsense – it must be a batch file.”
“Hmmm… Can the batch file call a ruby script?”
“No.”
“Err… ok… I’ll see what I can do.”

So off I went trawling Google, Stack Overflow, random blogs, and websites which can’t have seen hits since 1995. One hour, some frustration, and several cups of tea later, this is what I came up with:

findstr /C:"search string" "c:\my\file.txt" | find /C /V "nonsense"

And that, ladies and gentlemen, works! Let me explain what’s going on… The script uses 2 commands: findstr and find. findstr is used for finding strings in files, and find is also used for finding strings in files. It of course makes perfect sense to have two commands that do the same thing – the very definition of the word “intuitive”. In the above example, findstr returns lines from the file that contain the search string. These lines are piped to find which then displays the number of lines that don’t contain a particular string, in the above case: "nonsense". That will return a number. It’s the only way you can get find, findstr or a combination of the two to return a-number-and-only-a-number of the instances of a string in a file. I would love to see this improved – leave a comment if you know a better way to do it.

To demonstrate to myself why doing the above in DOS is crazy, I wrote the same line in Ruby:

File.read("c:/my/file.txt").scan(/search string/).count

It doesn’t take much explanation: It opens a file, reads it, scans it for a search string and then returns the number of instances it found.
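
One caveat worth spelling out: the two one-liners are not quite equivalent. The batch pipeline counts the lines that contain the string, while scan counts every occurrence, so the numbers differ as soon as a line contains the string twice. Here is a minimal sketch of both counts. The helper names count_occurrences and count_matching_lines and the sample text are made up for illustration, and the search string is assumed to be a literal rather than a pattern.

```ruby
# Hypothetical helpers for illustration; not part of the original one-liner.

# Counts every occurrence of a literal string in the given text.
def count_occurrences(text, needle)
  text.scan(needle).count
end

# Counts the lines that contain the string at least once
# (roughly what the findstr | find pipeline reports).
def count_matching_lines(text, needle)
  text.each_line.count { |line| line.include?(needle) }
end

# Made-up sample: the first line contains the string twice.
sample = "search string, search string\nno match here\nsearch string\n"

puts count_occurrences(sample, "search string")    # prints 3
puts count_matching_lines(sample, "search string") # prints 2
```

Against a real file, the text would come from something like File.read("c:/my/file.txt"), which also closes the file once it has been read.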

Now. Can we all start using the right tool for the right job please? I know it may involve a bit of learning, but that never hurt anyone. That is all.

Finding the balance between hacky and over-engineered UI test automation frameworks

There are very few real requirements for a UI test automation framework:

  1. It should provide accurate test results
  2. It should provide accurate test results every time you ask it for results
  3. It should make it easy to write tests
  4. It should require little maintenance – time should be spent writing tests and analyzing results, not on coding the framework
  5. It should be easy to tweak in order to deal with last minute changes to the app being tested.

Projects rarely have the time or patience to deal with “we can’t run the tests just now, we’ve got a framework issue”, so frameworks tend to get built alongside the tests; and unless you’re careful, frameworks written under these (quite common) conditions normally die in one of 2 ways:

  1. Due to time constraints, any changes that have to be made to the test framework tend to be band-aids/hacks. “oh,-didn’t-we-tell-you-about-[insert-new-feature-that-will-break-lots-of-tests]-oh-and-can-you-kick-off-a-run-in-5-minutes?-Just-make-it-work!”, etc. That’s just the nature of the job. But, as the many dead UI frameworks that litter IT shops will attest, there are only so many band-aids you can stick onto a framework before it collapses under its own weight. Eventually, a change comes along that can’t be fixed just by “adding another band-aid” – a big refactor is required to deal with the new feature, which in turn causes other framework instability problems. Test runs become unreliable, resulting in the framework being abandoned.
  2. The other way frameworks die is when the test automation team are given time and money and are told to come back with a test automation framework… they have lots of time, so they spend lots of it on making things super-abstract, modeling business entities, writing test parsers etc. The tests that are written using the framework are all ‘semantic’, but they can’t deal with those “oh,-didn’t-we-tell-you…” changes to the app being tested. The super-abstracted nature of the framework makes it difficult to “just make it work” – there’s no one place to stick the band-aid, it needs to be spread across the framework. Many files need updating, the beautiful (but ultimately useless) business model is broken, and major refactors are required to ‘fix’ the model. During this time the tests can’t run. The framework ends up on a shelf gathering dust.

Like most things, a middle ground needs to be found:

  • A framework should be flexible and simple enough to be able to deal with last minute changes in the application under test. But small chunks of time should then be given to allow small refactors of the framework to deal with the change ‘properly’ so that the quick hack can be removed. This way, the framework stays lean and can deal with new changes on a whim.
  • The framework shouldn’t be over-engineered – simplicity is key. Abstraction for abstraction’s sake is an utter waste of time. Modeling business entities in classes usually isn’t required, and when it is, only small elements of the model are usually needed for testing purposes. Doubtless, often it makes sense to model fundamental things like users, but rarely have I needed to keep track of more than the username, password and a few other simple fields. Keep business model classes simple – that way they’ll deal with application changes without much work.
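
To give an idea of how little a business model class often needs, here is a minimal sketch of a user model for UI tests. Everything in it is hypothetical: the TestUser name, the email field (standing in for the “few other simple fields”), the default values and the build_test_user helper are assumptions for illustration, not part of any particular framework. It assumes Ruby 2.5 or later for keyword_init.

```ruby
# Hypothetical, deliberately small user model: only the fields the tests need.
TestUser = Struct.new(:username, :password, :email, keyword_init: true)

# Hypothetical helper: builds a user from made-up defaults, letting a test
# override only the fields it cares about.
def build_test_user(overrides = {})
  defaults = {
    username: "test_user",
    password: "secret",
    email: "test_user@example.com"
  }
  TestUser.new(**defaults.merge(overrides))
end

admin = build_test_user(username: "admin")
puts admin.username # prints admin
puts admin.password # prints secret
```

If the application later grows a new mandatory field, the change is one extra Struct member and one extra default, which is about as cheap as a model change gets.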

Hacks for hacks’ sake aren’t good. Abstractions for abstraction’s sake aren’t good either. Write what needs to be written, don’t write what doesn’t need to be written, keep things simple, and tidy up after yourself when things get hacky.

Test Case Interdependency

One of the most common ways of structuring a series of test cases is to make one test case dependent on the outcome of another. For example, Test Case ‘A’ verifies the functionality surrounding the ability to create an account. Test Case ‘B’ verifies functionality surrounding account deletion, but instead of stating that the required data is an account in a particular state, it states that the account generated by test case ‘A’ should be the one to test for deletion. The mistake cascades through the test cycle: in execution of the test suite, if test case ‘A’ fails then test case ‘B’ cannot be executed and so it is marked as ‘failed’.

This test case interdependency causes problems for automation. It’s also a bad thing to do in general. Why?

In the above example, when it comes down to it, test case ‘B’ is not dependent on test case ‘A’ at all. If ‘B’ is testing deletion, it should test deletion. Deletion is dependent on an account, not necessarily a specific one (i.e. the one generated by test case ‘A’). OK, the account to test deletion against may need to be in a specific state (e.g. not already marked for deletion, etc…) but that’s not the same as dictating a specific account number.

As well as being, er, “philosophically” wrong, interdependency of test cases leads to testers incorrectly failing tests. Marking test case ‘B’ as failed just because test case ‘A’ failed puts incorrect data in the test report. Why? Marking a test as failed when it hasn’t been executed is wrong, no matter what the reason is. The tester executing test case ‘B’ should have picked one of the (possibly) large number of valid accounts to use instead of being limited to the account created by test case ‘A’. That way, the ‘delete’ functionality can be tested even if the ‘create’ functionality is broken.

How is this a problem for automation? Well, an automated test should be just that: an automatic version of a manual test. Hard-wiring data into automated tests is common (and sold as a ‘feature’ of many packages), but makes the tests very fragile. If the data doesn’t exist (due to other tests failing), some tests won’t be able to run even though there may be plenty of valid data to use!

An easy fix is to make a slight modification to your tests: change them to be dependent on data in a particular state rather than specific data. Subtle difference with a large impact on test case management and execution. You’ll still be testing the same functionality, but the tests are much less interdependent. You’ll be able to execute all your tests (not just a subset) and your automated tests will be much more reliable and maintainable.
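
As a rough illustration of “data in a particular state rather than specific data”, here is a minimal sketch in Ruby. The Account struct, the status values, and the find_account_in_state and run_delete_test helpers are all hypothetical, and the block passed to run_delete_test stands in for whatever actually drives the application. Note that when no suitable account exists the test reports :not_run rather than :failed.

```ruby
# Hypothetical account record; only the fields needed to pick test data.
Account = Struct.new(:id, :status, keyword_init: true)

# Returns the first account in the required state, or nil if there is none.
def find_account_in_state(accounts, status)
  accounts.find { |account| account.status == status }
end

# Runs the deletion check against any account in a usable state.
# The block performs the deletion and returns true if it worked.
def run_delete_test(accounts)
  account = find_account_in_state(accounts, :active)
  return :not_run if account.nil? # no valid data: not executed, not failed

  yield(account) ? :passed : :failed
end

# Made-up data: no account is tied to the 'create' test case.
accounts = [
  Account.new(id: 1, status: :pending_deletion),
  Account.new(id: 2, status: :active)
]

result = run_delete_test(accounts) do |account|
  account.status = :deleted # stand-in for driving the real UI
  account.status == :deleted
end
puts result # prints passed
```

The deletion test no longer cares where the account came from, only that one exists in the right state, so a broken ‘create’ test cannot block it.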