Hi All
I read a post of Hilary's somewhere that proposed replacing the hyphens in reference codes with the word HYPHEN, so that the indexer treats each code as a phrase (provided the search string gets the same replacement). That would mean changing our data and then converting it back to hyphens on output in every one of our applications, which is a serious amount of work.
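To make the cost of that proposal concrete, here is a minimal sketch of the round trip it implies. It assumes the hyphen is replaced with the literal string HYPHEN (no spaces) only when it sits between two letters or digits. The names `PLACEHOLDER`, `encode` and `decode` are hypothetical and do not belong to any search product's API.

```python
import re

# Hypothetical placeholder; the proposal only says "HYPHEN".
PLACEHOLDER = "HYPHEN"

# A hyphen is treated as part of a reference code only when a letter or digit
# sits on both sides of it (e.g. "2-6.1-7"), so free-standing dashes in prose
# are left untouched.
_CODE_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])")


def encode(text: str) -> str:
    """Apply to the data before indexing AND to every search string."""
    return _CODE_HYPHEN.sub(PLACEHOLDER, text)


def decode(text: str) -> str:
    """Undo the substitution before the data is shown in any application."""
    return text.replace(PLACEHOLDER, "-")


if __name__ == "__main__":
    stored = encode("326 IAC 2-6.1-7")
    print(stored)          # 326 IAC 2HYPHEN6.1HYPHEN7
    print(decode(stored))  # 326 IAC 2-6.1-7
```

Note that `decode` has to run on every output path, and it would also corrupt any text that genuinely contains the word HYPHEN, which is part of why this approach means so much work.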
Would a workable alternative be to stop the word breakers from breaking words on a hyphen in certain cases? I see that word breakers are classes that could be rewritten or overridden, so it might be possible to prevent breaking when the pattern x-x is found. The phrase search ("x-x") only fails when there is a single character before or after the hyphen.
Here's an example of the type of reference we need to search for:
326 IAC 2-6.1-7 (this is a reference to a law)
It is important that changing a single character in the search (e.g. searching for 2-6.1-8) behaves as expected and does not find the reference above. At the moment, the search only matches on the text up to the hyphen and therefore returns many incorrect results. (A Google search finds this reference without problems.)
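The splitting rule suggested above can be illustrated outside any search engine. This is a sketch of the rule only, not a real word breaker (which would have to plug into the search engine through its own interface). The assumptions are mine: a token is a run of letters or digits, and a hyphen or a period is kept inside the token when a letter or digit sits on both sides (the period is included so that "6.1" does not split the code). The function names are hypothetical.

```python
import re

# Assumed rule: letters/digits, optionally joined by "-" or "." when the
# characters on both sides are letters or digits, so "2-6.1-7" is one token.
_KEEP_CODES = re.compile(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*")

# For comparison: split on every non-alphanumeric character, hyphens included.
_SPLIT_ALL = re.compile(r"[A-Za-z0-9]+")


def break_words(text: str) -> list[str]:
    """Tokenize while keeping hyphenated reference codes intact."""
    return _KEEP_CODES.findall(text)


def naive_break_words(text: str) -> list[str]:
    """Tokenize by breaking on every hyphen and period."""
    return _SPLIT_ALL.findall(text)


if __name__ == "__main__":
    doc = "326 IAC 2-6.1-7"
    print(break_words(doc))        # ['326', 'IAC', '2-6.1-7']
    print(naive_break_words(doc))  # ['326', 'IAC', '2', '6', '1', '7']
    # With the rule, a one-character change gives a different token,
    # so the query no longer matches the document.
    print("2-6.1-8" in break_words(doc))  # False
```

With naive splitting, "2-6.1-8" becomes 2, 6, 1, 8 and shares most of its tokens with the document, which is the false-match behaviour described above. This rule keeps codes of any length together; it could be narrowed to single characters around the hyphen if that turns out to be the only failing case.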
I am moving from a Google Search Appliance (GSA) that indexes these references correctly, so I don't want to lose that functionality in the changeover. The GSA is 3 years old now and I don't want to renew its licence at the current price.