Searching data on GIN

The GIN data search provides a set of tools for searching the gin-index (gindex). Gindex analyses repositories for files of content types it knows (Text, XML, PDF, JSON, odML, and NEV, to name a few) that are not too large, and from those files builds a representation suited to (full-text) searching.
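This page does not describe how gindex stores its representation. As a rough illustration only, full-text search engines commonly use an inverted index, which maps each term to the documents it occurs in. The Python sketch below builds such an index from plain-text strings. The tokenizer (lowercasing and splitting on non-alphanumeric characters) and the names `tokenize` and `build_inverted_index` are hypothetical and not part of gindex.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to {document id: number of occurrences of the term in that document}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


if __name__ == "__main__":
    docs = {
        "a.txt": "Spike sorting of spike trains",
        "b.txt": "Spike train analysis",
    }
    index = build_inverted_index(docs)
    print(index["spike"])  # {'a.txt': 2, 'b.txt': 1}
```

Looking up a term is then a dictionary access rather than a scan of every file, which is what makes searching a large collection fast.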

There are several ways to search the gindex.

  • Match search: finds the terms you provide (exactly) and returns the matching documents sorted by a score that depends on the number of matches and the size of the document they occur in. This is probably the best option for most people.

  • Fuzzy term matching: matches individual terms, including terms that look similar to the one provided (e.g. bla also matches blu).

  • Wildcard term matching: matches individual terms that may contain wildcards. ? stands for exactly one character, and * for zero or more characters (e.g. bl* matches blu, blo, bl, etc.).

  • Query string: the most powerful search, which combines the three above. A query string can use, among other things, the wildcards introduced above and the fuzzy operator ~, which finds all terms within a Damerau-Levenshtein distance of 2 (e.g. "Sp?ke Sortung~" matches Spike Sorting and a lot of other things). Query strings also support the boolean operators AND, OR, and NOT. For example, "Spike NOT Sorting" matches documents that contain Spike but not Sorting. You can even group expressions: "(Spike NOT Sorting) AND train".
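The exact scoring formula used by match search is not given here. A minimal sketch that captures the two ingredients named above, the number of matches and the document size, counts the query-term occurrences in a document and divides by the square root of the document's length, a length normalization in the spirit of classic TF-IDF scoring. The formula and the names `match_score` and `match_search` are assumptions for illustration, not gindex's implementation.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and keep runs of alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_score(query: str, text: str) -> float:
    """Count exact occurrences of the query terms and damp the count by document length."""
    terms = set(tokenize(query))
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    matches = sum(1 for token in tokens if token in terms)
    # Longer documents get a lower score for the same number of matches.
    return matches / math.sqrt(len(tokens))


def match_search(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (document id, score) pairs for documents with at least one match, best first."""
    scored = [(doc_id, match_score(query, text)) for doc_id, text in docs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "short.txt": "spike sorting",
        "long.txt": "spike sorting of extracellular recordings from many electrodes",
        "other.txt": "odML metadata",
    }
    for doc_id, score in match_search("spike sorting", docs):
        print(f"{doc_id}: {score:.3f}")
```

With these sample documents, short.txt ranks above long.txt: both contain the same two matches, but long.txt is four times as long. other.txt has no matches and is not returned.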
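Fuzzy term matching and the ~ operator can be illustrated with the optimal string alignment variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and swaps of two adjacent characters. The sketch below keeps the indexed terms within distance 2 of the query term. The term list and the names `osa_distance` and `fuzzy_terms` are hypothetical, and a real search engine avoids this brute-force scan over every term.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent characters, e.g. "sotring" -> "sorting".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_terms(query: str, terms: list[str], max_distance: int = 2) -> list[str]:
    """Return the indexed terms within max_distance edits of the query term (case-insensitive)."""
    q = query.lower()
    return [t for t in terms if osa_distance(q, t.lower()) <= max_distance]


if __name__ == "__main__":
    indexed = ["sorting", "porting", "sortie", "spike", "string"]
    print(fuzzy_terms("Sortung", indexed))  # ['sorting', 'porting']
```

Sortung matches sorting (one substitution) and porting (two substitutions), but not sortie or string, which each need three edits.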
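Wildcard terms can be understood by translating them into regular expressions: ? becomes "any single character" and * becomes "any run of characters, possibly empty". The sketch below performs this translation and filters a hypothetical list of indexed terms. The helper names are assumptions, and every character other than ? and * is matched literally.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term: '?' = exactly one character, '*' = zero or more characters."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_terms(pattern: str, terms: list[str]) -> list[str]:
    """Return the indexed terms matched by the wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    # fullmatch: the pattern must cover the whole term, not just part of it.
    return [t for t in terms if regex.fullmatch(t)]


if __name__ == "__main__":
    indexed = ["bl", "blo", "blu", "blue", "able", "spike"]
    print(wildcard_terms("bl*", indexed))  # ['bl', 'blo', 'blu', 'blue']
    print(wildcard_terms("bl?", indexed))  # ['blo', 'blu']
```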
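The boolean operators behave like set operations on the documents that contain each term: OR is a union, AND an intersection, and NOT removes the documents that contain the negated term. The sketch below evaluates the example queries by hand on a few hypothetical documents. It does not parse query strings, and `docs_containing` is a hypothetical helper.

```python
import re


def docs_containing(term: str, docs: dict[str, str]) -> set[str]:
    """IDs of the documents whose text contains the term as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


if __name__ == "__main__":
    docs = {
        "1": "Spike train statistics",
        "2": "Spike sorting of a spike train",
        "3": "Spike waveform features",
        "4": "Sorting algorithms",
    }
    spike = docs_containing("spike", docs)
    sorting = docs_containing("sorting", docs)
    train = docs_containing("train", docs)

    print(sorted(spike | sorting))            # Spike OR Sorting -> ['1', '2', '3', '4']
    print(sorted(spike - sorting))            # Spike NOT Sorting -> ['1', '3']
    print(sorted((spike - sorting) & train))  # (Spike NOT Sorting) AND train -> ['1']
```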

Wildcard and fuzzy searches can be computationally costly, so give gindex some time to retrieve the results.

Achilleas Koutsou edited this page 2 years ago