Engineering Village 2 - Minor Anomalies With Truncation
In a previous post, I waxed eloquent about the new changes to EV2, unveiled last month by Engineering Information Inc. One of the (very welcome) upgrades I noted was the ability to truncate to a single character using the wildcard character, "?": search equation?, and EV2 would return results with equation or equations in the records.
Upon closer inspection, however, I discovered that the wildcard is used to replace a single character only, rather than allow for zero-to-one character replacement. From the EV2 site:
Use wildcard (?) to replace a single character.
This morning, I was helping a chem eng student search for the phrase, osmotic virial equation, using Easy Search. The phrase returned 117 records in a combined Compendex and Inspec search. Assuming the "?" would return both equation and equations in the results, we searched the phrase, osmotic virial equation?, in Easy Search. The resulting set was 80 records, much to my surprise. We checked the 80 records, and discovered that each of them had the word "equations" somewhere in the record. However, the remaining 37 records did not, confirming that the "?" is always searching for one extra character, not zero or one extra character.
To add to this equation (no pun intended, I think!), EV2 has an autostemming feature that can be turned on or off - it defaults to on - which seems to mimic the truncation symbol, "*". EV2 describes the truncation function as:
Use truncation (*) to search for words that begin with the same letters.
comput* returns computer, computers, computerize, computerization
EV2 describes the autostemming function as:
Terms are automatically stemmed, except in the author field, unless the "Autostemming off" feature is checked.
management returns manage, managed, manager, managers, managing, management
I don't see a difference between the two functions, which I believe could cause some confusion for the user.
Using a different example, consider the word cat, which can cause all kinds of problems for the searcher. In a database where the "?" truncates zero or one character, a search on cat? would return cat or cats. Where the asterisk returns zero-to-unlimited characters, search cat*, and the results would include cat, cats, cathode, catalysis, catastrophe, catch, catalogue, catatonic, cattle, etc.
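To make the distinction concrete, here is a minimal sketch in Python, not anything EV2 itself runs. It leans on the standard library's fnmatch module, whose "?" happens to mean exactly one character and whose "*" means any number of characters. The word list and the three function names are made up for illustration, and matching single words is a simplification of how a database matches terms in records.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary, borrowed from the cat example in the post.
WORDS = ["cat", "cats", "cathode", "catalysis", "cattle"]

def exactly_one(stem, words):
    """'?' must stand for exactly one character (what EV2 appears to do)."""
    return [w for w in words if fnmatchcase(w, stem + "?")]

def zero_or_one(stem, words):
    """'?' may stand for zero or one character (emulated by also accepting the bare stem)."""
    return [w for w in words if w == stem or fnmatchcase(w, stem + "?")]

def zero_or_more(stem, words):
    """'*' stands for any number of characters, including none."""
    return [w for w in words if fnmatchcase(w, stem + "*")]

print(exactly_one("cat", WORDS))   # ['cats']
print(zero_or_one("cat", WORDS))   # ['cat', 'cats']
print(zero_or_more("cat", WORDS))  # ['cat', 'cats', 'cathode', 'catalysis', 'cattle']
```

Under the exactly-one reading, the bare word cat is never matched by cat?, which is the same effect seen with equation? in the osmotic virial equation search.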
I searched cat on EV2 (Quick Search, Compendex/Inspec combined) in the following ways, with the following results:
- cat - Autostemming on: 166108 records found in Compendex & Inspec for: ((cat) WN All fields), 1969-2005
- cat - Autostemming off: 164303 records found in Compendex & Inspec for: ((cat) WN All fields), 1969-2005
- cat? - Autostemming on: 9893 records found in Compendex & Inspec for: ((cat?) WN All fields), 1969-2005
- cat? - Autostemming off: 9893 records found in Compendex & Inspec for: ((cat?) WN All fields), 1969-2005
- cat* - Autostemming on: 818875 records found in Compendex & Inspec for: ((cat*) WN All fields), 1969-2005
- cat* - Autostemming off: 818875 records found in Compendex & Inspec for: ((cat*) WN All fields), 1969-2005
Of note is that the use of the wildcard function as a single character replacement, rather than a zero-to-one character replacement, is not unique to EV2. CSA – Cambridge Scientific Abstracts – uses it the same way, as does Web of Science. However, Web of Science allows for all three options:
The asterisk (*) represents zero to multiple characters. The question mark stands for one character. The dollar sign stands for one character or no characters.
The SilverPlatter WebSPIRS platform uses "*" for zero-to-unlimited truncation, and the "?" for zero-to-one character truncation. The OVID platform also allows for the three options, but with a different character set (dollar sign, question mark, hash mark).
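The three-symbol scheme quoted from Web of Science can be sketched as a small translator from search term to regular expression. This is only an illustration of the quoted definitions, not Web of Science's actual implementation. The SYMBOLS table and the to_regex and matches names are assumptions of mine, and real platforms match against indexed records rather than single words.

```python
import re

# Assumed mapping, taken directly from the symbol descriptions quoted above:
#   *  zero to multiple characters
#   ?  exactly one character
#   $  one character or no characters
SYMBOLS = {"*": ".*", "?": ".", "$": ".?"}

def to_regex(term):
    """Translate a search term with wildcard symbols into a compiled regex."""
    parts = [SYMBOLS.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word fits the wildcard term."""
    return to_regex(term).fullmatch(word) is not None

for term in ("cat*", "cat?", "cat$"):
    hits = [w for w in ("cat", "cats", "cathode") if matches(term, w)]
    print(term, hits)
```

With all three symbols available, the searcher who wants cat or cats, but not cathode, has a symbol that says exactly that. Swapping the keys in SYMBOLS would adapt the sketch to a platform that uses a different character set.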
Comments: Truncation and wildcard functionality are important options for searchers. In my experience, though, most students and researchers seldom use truncation, because generally they aren't thinking of plurals or variant spellings of words, or are not aware the option exists in the database they are searching. As such, I'd like to see a simplification of truncation/wildcard functionality in EV2, and by extension, in most if not all databases. (I know, that is truly wishful thinking!)
Options to consider for EV2:
- Allow the wildcard symbol, “?”, to work as a zero-to-one character function, or introduce a third symbol to do this, if it is considered important to retain single-character truncation;
- Reconsider the autostemming function. How valuable is it to the user if the user does not know it is working, or does not know what its function is from the outset of the search? I don't believe the average user twigs to this option, even if it is already on;
- Eliminate left-side, or prefix truncation. It would never occur to me that $catal would return catalysis, catalyses, catalytic, etc.;
- Allow for the use of the same truncation/wildcard functions across all three EV2 search options, Easy, Quick and Expert.
Despite the foregoing observations, I very much like the new changes to EV2, especially the faceted searching, which will expand to Quick Search and Expert Search sometime in the near future. I demonstrated faceted searching yesterday afternoon to 70+ graduates and faculty in Chemical and Materials Engineering on campus, and they were suitably impressed. I have more suggestions for improvements to the search function on EV2, but that can wait for another post sometime soon.