Boosting Search Terms

Posted: Thu Dec 30, 2010 8:59 pm
by johnjhufnagle
I have a search query and am calling SearchPrx.byFullText().
I get an error when I try to use the '^' for boosting a term.
I have 2 index fields:

cil.image_description
ncbi_id

I am using both in the query, with a value for each. I would like to boost the ncbi_id term at search time, since it is more specific than the free-text cil.image_description.

The query I pass into byFullText is:

( (cil.image_description:Nephrotoma) || (ncbi_id:NCBITaxon\:46210^) )

So the value I'm searching for in the "cil.image_description" field is "Nephrotoma",
and the value I'm searching for in the "ncbi_id" field is "NCBITaxon:46210" (I escape the ':' in that value string).
Then I append the '^' to boost the ncbi_id term. If I submit the query without the boosting caret '^', it works fine.
I've tried putting quotes around the NCBITaxon value and many other permutations, but they all produce a parse exception:

Caused by: omero.ApiUsageException
serverStackTrace = "ome.conditions.ApiUsageException: ((cil.image_description:Nephrotoma)||(ncbi_id:NCBITaxon\:46210^)) caused a parse exception.
at ome.services.search.FullText.<init>(FullText.java:93)
at ome.services.SearchBean.byFullText(SearchBean.java:156)
at sun.reflect.GeneratedMethodAccessor995.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:310)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:106)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy69.byFullText(Unknown Source)
at sun.reflect.GeneratedMethodAccessor995.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:310)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:79)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:35)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy69.byFullText(Unknown Source)
at sun.reflect.GeneratedMethodAccessor1012.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:179)
at ome.services.throttling.Callback.run(Callback.java:55)
at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:37)
at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:126)
at ome.services.blitz.impl.SearchI.byFullText_async(SearchI.java:129)
at omero.api._SearchTie.byFullText_async(_SearchTie.java:106)
at omero.api._SearchDisp.___byFullText(_SearchDisp.java:1115)
at omero.api._SearchDisp.__dispatch(_SearchDisp.java:1487)
at IceInternal.Incoming.invoke(Incoming.java:159)
at Ice.ConnectionI.invokeAll(ConnectionI.java:2037)
at Ice.ConnectionI.message(ConnectionI.java:972)
at IceInternal.ThreadPool.run(ThreadPool.java:577)
at IceInternal.ThreadPool.access$100(ThreadPool.java:12)
at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:971)
"
serverExceptionClass = "ome.conditions.ApiUsageException"
message = "((cil.image_description:Nephrotoma)||(ncbi_id:NCBITaxon\:46210^)) caused a parse exception."
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at IceInternal.BasicStream$DynamicUserExceptionFactory.createAndThrow(BasicStream.java:2243)
at IceInternal.BasicStream.throwException(BasicStream.java:1632)
at IceInternal.Outgoing.throwUserException(Outgoing.java:442)
at omero.api._SearchDelM.byFullText(_SearchDelM.java:307)
at omero.api.SearchPrxHelper.byFullText(SearchPrxHelper.java:461)
at omero.api.SearchPrxHelper.byFullText(SearchPrxHelper.java:433)
at edu.mbl.omeroSearch.ImageUtil.search(ImageUtil.java:362)
... 29 more

Re: Boosting Search Terms

Posted: Fri Dec 31, 2010 5:29 pm
by jmoore
John,

I think the problem is that you need a boost factor after the caret. From http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Boosting%20a%20Term:

To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.


Looking in the server logs, you should see something like:
Cannot parse '( (cil.image_description:Nephrotoma) || (ncbi_id:NCBITaxon\:46210^) )': Lexical error at line 1, column 67.  Encountered: ")" (41), after : ""

for the query as you wrote it.
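
If it helps to catch this on the client before calling byFullText, here is a minimal sketch of a pre-check. The class and method names (BoostCheck, hasDanglingBoost) are made up for illustration and are not part of OMERO or Lucene. It only looks for an unescaped '^' that is not followed by a digit, so it will misjudge edge cases such as a caret after an escaped backslash or a caret inside a quoted phrase.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OMERO or Lucene.
class BoostCheck {
    // Matches a caret that is not preceded by a backslash and is not
    // followed by a digit, i.e. a boost operator with no boost factor.
    private static final Pattern DANGLING_BOOST =
            Pattern.compile("(?<!\\\\)\\^(?!\\d)");

    /** Returns true if the query contains a '^' without a numeric boost factor. */
    static boolean hasDanglingBoost(String query) {
        return DANGLING_BOOST.matcher(query).find();
    }

    public static void main(String[] args) {
        String broken = "( (cil.image_description:Nephrotoma) || (ncbi_id:NCBITaxon\\:46210^) )";
        String fixed = "( (cil.image_description:Nephrotoma) || (ncbi_id:NCBITaxon\\:46210^1) )";
        System.out.println(hasDanglingBoost(broken)); // true
        System.out.println(hasDanglingBoost(fixed));  // false
    }
}
```

A check like this only gives an earlier, friendlier error; the server-side parser remains the authority on what is valid.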

Adding an integer after the caret:
( (cil.image_description:Nephrotoma) || (ncbi_id:NCBITaxon\:46210^1) )

seems to work fine and returns 2 entries on my copy of the ASCB FullText directory.

As a tip, you might like to look at Luke for testing queries. I copied OMERO/FullText and then pointed Luke at it:
java -jar lukeall-0.9.9.jar -index /tmp/FullText

Under the "Search" tab, I can click on "Explain structure" and get a breakdown of the query.

Hope that helps & a happy new year.
~Josh

Re: Boosting Search Terms

Posted: Tue Jan 04, 2011 3:58 pm
by johnjhufnagle
Thanks Josh!
Worked like a charm...sorry I didn't read that more carefully.

I have run into a different problem though.

Using Luke, if I run the query:

cellular_component_id:GO\:n1

I get back 13 image documents.

If I run

cellular_component_id:GO\:n2 (a different id)

I get back 2 unrelated image documents.

Both work as expected.

If I then combine those two queries with an OR,

I get the set of 15, with the 2 from the n2 result coming first.
If I boost the n1 term with ^4 and the n2 term with ^1 to force the 13 to come first, that works fine... in Luke only.
When I run the same boosted query against OMERO, the boosting doesn't seem to matter:
I still get the default ordering rather than the boosted ordering.
I looked in the Blitz log and saw my query with the boosting carets, but unfortunately it doesn't show the ordering of the results.
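
For reference, the boosted OR query described above can be assembled roughly like this. This is only a sketch: BoostedQuery and its methods are hypothetical names, the escaping is hand-rolled from the special characters listed in the Lucene query syntax page rather than taken from Lucene itself, and the boost is restricted to whole numbers of 1 or more for simplicity. GO:n1 and GO:n2 are the same placeholders used above.

```java
// Hypothetical helper, not part of OMERO or Lucene.
class BoostedQuery {
    // Characters assumed to need a backslash escape in Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes each special character in the value with a backslash. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds field:value^factor; the factor is always written out after the caret. */
    static String boosted(String field, String value, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("boost factor must be >= 1");
        }
        return field + ":" + escape(value) + "^" + factor;
    }

    /** Wraps each clause in parentheses and joins them with ||. */
    static String anyOf(String... clauses) {
        return "( (" + String.join(") || (", clauses) + ") )";
    }

    public static void main(String[] args) {
        String query = anyOf(
                boosted("cellular_component_id", "GO:n1", 4),
                boosted("cellular_component_id", "GO:n2", 1));
        // Prints: ( (cellular_component_id:GO\:n1^4) || (cellular_component_id:GO\:n2^1) )
        System.out.println(query);
    }
}
```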

Thanks
John

Re: Boosting Search Terms

Posted: Tue Jan 04, 2011 7:48 pm
by jmoore
Hi John,

It appears you've now found a true-blue bug: http://trac.openmicroscopy.org.uk/omero/ticket/3721

At some point, changes to the search logic began re-ordering results. I have a fix which I'll commit to trunk momentarily, and we'll see what we can do about back-porting it.

As always, thanks for the heads up.
~Josh.