Hi Eric,
Thanks very much (and thanks to Han for the original response too!).
I’ll have to play around with variable match. I hadn’t noticed that new feature (that seems to be happening a lot lately! Many new features). Is it greedy? i.e. if I search across > N multiple annotations, can I make it stop searching for any subsequent $1 if something else comes up in the meantime (e.g. a ‘.’)?
The actual search I am trying to do is a little more complex. I just simplified it for the question, because I was primarily concerned about the functionality. I am actually looking for instances of what’s called ‘tail-head linkage’. So for instance in a narrative text, the end of one sentence starts the next, e.g.:
‘He ties the rope and tests it. Having tested it, he climbs down and…’
The words I am matching might be slightly different (e.g. ‘tests’ versus ‘tested’) but predictable in the language I am matching. And so I’m not just blind matching words at the end of sentences, and in fact I may be looking for portions of words with predictable mutations. And what gets repeated and where is a little variable too, so actually it could be a number of complex searches.
So your second trick doesn’t quite work for what I’m after either, but I can see how it works and might think of uses for it, thanks. Am I right in thinking that variable mode basically allows me equivalence/non-equivalence matching, and then I can fine tune this using the options in the drop down boxes? I can’t at any point also search for a string? The simplest version of the search that I actually want is that the last word from one sentence is the second last word from the next. Even better if I can actually specify that last word as it’s always the same. I can’t see how to do this using this trick.
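To make the ‘simplest version’ concrete, here is roughly what I mean, written as a plain regex with a backreference outside of ELAN. This is only a sketch in Java: the class and method names are made up, it assumes sentences end in a full stop, and it assumes the whole text is available as a single string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from ELAN.
class TailHeadSketch {
    // Group 1 is the last word of a sentence. The lookahead requires that the
    // same word is the second-last word of the following sentence. Using a
    // lookahead (rather than consuming the next sentence) lets chains of
    // linked sentences all be reported.
    private static final Pattern LINK = Pattern.compile(
            "\\b(\\w+)\\.(?=\\s+[^.]*?\\b\\1\\s+\\w+\\.)",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every sentence-final word that reappears as the second-last
    // word of the next sentence.
    static List<String> repeatedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "He walked to the river. He crossed the river slowly. Then he slept.";
        System.out.println(repeatedWords(text));
    }
}
```

If the repeated word is always the same, the `(\w+)` group and the `\1` could simply both be replaced by that word.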
As for the interface, that’s a tricky one. It is already quite busy. And I’m no fan of right-clicking to turn features on and off. It would be a bit of a kludge, but what about changing the input fields to combo boxes with text input where the drop down menu gave you options to set the match type?
Being a regex user, I’m happy enough only using regexps! But I realise that it’s not for everyone. It would be better in my opinion if ‘variable’ matches were incorporated into regexps, but I admit it would make simple equivalence matching pretty complex for the average user.
I can see that carrying over regex matched variables to other fields would be a nightmare to program too. e.g. if there are capturing parentheses in multiple fields then where do you start numbering the matches? And could a match in one field apply in another while a match in that other field applies in the first? Hence my suggestion of treating a single tier as being separated by newline characters (or null characters?) - that way it’s a non-complex search domain and you wouldn’t need to extract portions of one match and insert them into others. Of course, it would be a problem if people added newlines to their annotations… and you wouldn’t be able to search for aligned annotations on multiple tiers (or it would be a pain to code matching up the annotations again).
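A rough sketch of the newline idea, in case it helps the discussion. It assumes the annotations of one tier are available as a list of strings with no embedded newlines, and it has nothing to do with how ELAN actually stores or searches them. The class and method names are invented. It joins the tier into one string, runs a single regex with a backreference across the newline, and maps the match offset back to an annotation index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not ELAN's API, just the "one tier, one string" idea.
class TierSearchSketch {
    // Last word of an annotation (optionally followed by punctuation), then
    // the newline separator, then the same word starting the next annotation.
    private static final Pattern LINK =
            Pattern.compile("\\b(\\w+)[^\\w\\n]*\\n(?=\\1\\b)");

    // Returns the indices of annotations whose last word is repeated as the
    // first word of the following annotation.
    static List<Integer> linkedAnnotations(List<String> annotations) {
        StringBuilder tier = new StringBuilder();
        int[] starts = new int[annotations.size()];
        for (int i = 0; i < annotations.size(); i++) {
            starts[i] = tier.length();          // offset where annotation i begins
            tier.append(annotations.get(i)).append('\n');
        }
        List<Integer> hits = new ArrayList<>();
        Matcher m = LINK.matcher(tier);
        while (m.find()) {
            // Map the match offset back to the annotation that contains it.
            int index = 0;
            for (int i = 0; i < starts.length; i++) {
                if (starts[i] <= m.start(1)) {
                    index = i;
                }
            }
            hits.add(index);
        }
        return hits;
    }
}
```

The offset table is the ‘matching up the annotations again’ part. It is simple for one tier, but it does nothing for aligned annotations on other tiers.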
But what about using regex named capturing groups? e.g. (?<NAME>X). That way the user could explicitly design the search with respect to matching between fields. And only named groups would be accessible across fields? I think named groups need Java 7 or later though. I think you’d probably have to restrict named variables to one field and make the match available in others, or it would be too complex to program, and at any rate it could be very, very slow.
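Something like the following is what I have in mind for the restricted version: one field defines the named group, and the other field only consumes it. It is purely a sketch. The class, the method, and the convention of writing `\k<NAME>` in the second field as a placeholder are my own assumptions (in Java, `\k<NAME>` normally refers back to a group within the same pattern). The captured text is quoted and substituted into the second field’s pattern before that pattern is compiled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of carrying a named group from one search field to another.
class CrossFieldSketch {
    // firstRegex defines the named group; secondTemplate may contain
    // \k<groupName>, which is replaced by the literal text captured in the
    // first field. Returns true if any capture makes the second field match.
    static boolean linked(String firstRegex, String firstText, String groupName,
                          String secondTemplate, String secondText) {
        Matcher m = Pattern.compile(firstRegex).matcher(firstText);
        while (m.find()) {
            String captured = m.group(groupName);
            if (captured == null) {
                continue; // the group did not take part in this match
            }
            // Substitute the quoted capture for the placeholder (literal replace).
            String secondRegex = secondTemplate.replace(
                    "\\k<" + groupName + ">", Pattern.quote(captured));
            if (Pattern.compile(secondRegex).matcher(secondText).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 'tests' in one sentence, 'tested' in the next: a predictable mutation.
        System.out.println(linked(
                "\\b(?<stem>\\w+)s\\b", "He ties the rope and tests it.",
                "stem",
                "\\b\\k<stem>ed\\b", "Having tested it, he climbs down"));
    }
}
```

Note that the second pattern is recompiled for every candidate match in the first field, which is exactly why I suspect this could get slow on a large corpus.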