Lucene and null values

Mar 21, 2012 at 3:49 PM

I'm curious if anyone has had trouble searching for non-existent values using the Lucene.Net implementation in Simol?

I have a lucene query that might look something like this:

+Segments:test +IsContact:true +(Address:"200 sierra vista" City:"200 sierra vista" Region:"200 sierra vista") -ParentId:[* TO *]

I was under the impression that -ParentId:[* TO *] should exclude any documents that have a ParentId (we use a ParentId to hide related contacts in this case, so we want to exclude them from search results).
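To make the intended semantics concrete, here is a toy sketch in plain Python. It does not use Lucene or Simol; the sample documents and the helper names (has_any_value, matches) are made up for illustration. It only models the idea that "+" clauses are required, that at least one clause inside the "+(...)" group must match, and that "-ParentId:[* TO *]" drops every document holding any value in ParentId.

```python
# Toy model of the boolean clauses in the query above. NOT the Lucene API.
# The documents and helper names are hypothetical, for illustration only.
docs = [
    {"id": 1, "Segments": "test", "IsContact": "true", "City": "200 sierra vista"},
    {"id": 2, "Segments": "test", "IsContact": "true", "City": "200 sierra vista",
     "ParentId": "42"},
    {"id": 3, "Segments": "other", "IsContact": "true", "City": "200 sierra vista"},
]


def has_any_value(doc, field):
    """Stand-in for field:[* TO *]: true when the field holds any value."""
    return bool(doc.get(field))


def matches(doc, phrase):
    # +Segments:test +IsContact:true  -> both clauses are required
    required = doc.get("Segments") == "test" and doc.get("IsContact") == "true"
    # +(Address:"..." City:"..." Region:"...") -> at least one must match
    optional = any(doc.get(f) == phrase for f in ("Address", "City", "Region"))
    # -ParentId:[* TO *] -> prohibited when ParentId has any value
    prohibited = has_any_value(doc, "ParentId")
    return required and optional and not prohibited


hits = [d["id"] for d in docs if matches(d, "200 sierra vista")]
print(hits)
```

Under this model only document 1 should come back: document 2 is dropped because it has a ParentId, and document 3 fails the required Segments clause.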

I can fire up Luke (http://www.getopt.org/luke/) and get the expected results, but running the same query from Simol yields results that have a value in ParentId.

I did note that when I switch from the KeywordAnalyzer (in Luke) to the StandardAnalyzer, I get the same results as our Simol implementation. However, if I switch the query parser to KeywordAnalyzer in Simol (using a custom SearchIndexer implementation), I still get the results with the ParentId.

I'm probably doing something stupid, but before I spend hours on it, I thought I would pose the question.

Thanks a ton for such a great project. Ashley, I know you are headlong into DynamoDB, and I'm eager to hear how it's going.

Best regards,

 

Hal

Coordinator
Mar 22, 2012 at 1:59 PM

Hi Hal,

I'm not much of a Lucene expert (learned just enough to do a basic Simol/SimpleDB integration), but here are a couple of thoughts:

  • Are you using the latest Lucene release? The version bundled with Simol is getting a bit old.
  • Have you logged the search/indexing calls to be sure your queries or indexed text aren't getting munged up in transit?
  • Is it possible there are functional differences (bugs?) between the .NET version of Lucene and the Java version used by Luke?

I'm glad to hear you are getting some use out of the full-text indexing. That's one area of Simol I haven't gotten a lot of feedback on and wasn't sure how many folks actually found it useful.

- Ashley