Scheduled service maintenance on November 22


On Friday, November 22, 2024, between 06:00 CET and 18:00 CET, GIN services will undergo planned maintenance. Extended service interruptions should be expected. We will try to keep downtimes to a minimum, but recommend that users avoid critical tasks, large data uploads, or DOI requests during this time.

We apologize for any inconvenience.

lsh.m 3.2 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687
  1. function T = lsh(type,l,k,d,x,varargin)
  2. % T = LSH(TYPE,L,K,D,...)
  3. %
  4. % Initializes the LSH dat structure and returns it in T
  5. % TYPE defines the LSH scheme to use: 'lsh' / 'e2lsh'
  6. % D is the dimension of the data
  7. % L,K - LSH params (# of tables and length of keys)
  8. %
  9. % NOTE: LSH will only index the data - not store it! You need to keep
  10. % around the original data, in order to go back from indices to actual
  11. % points, if that's what you want to do.
  12. %
  13. % Optional args (pairs of name/value):
  14. % 'B' - the maximum bucket capacity. If given (default is infinite
  15. % capacity), no bucket is allowed to contain more than B
  16. % elements. When indexing data, once B elements are inserted in a
  17. % bucket, any further elements that would be added to it are simply discarded.
  18. % 'range' - range description (see below)
  19. % 'W' - parameter of e2lsh scheme (see below)
  20. % 'verb' - verbosity level (can be changed later)
  21. % 'data' - if given, the columns of data will be inserted in the table.
  22. % 'ind' - indices of the examples in data (only relevant if 'data'
  23. % argument is given). I.e., if the 'data' argument has 5 columns, and
  24. % 'ind' is [1,2,8,10,55], then the index entered for the 4-th column
  25. % of data will be 10, for the 5-th it will be 55, etc.
  26. %
  27. % Output:
  28. % a struct array T, with L tables, each with the following fields:
  29. % type: string (currently 'lsh' or 'e2lsh'
  30. % Args: arguments for hash function
  31. % I: struct containing hash functions (see lshfunc.m)
  32. % B: integer limit on maximum bucket capacity (could be inf)
  33. % count: # of distinct elements indexed by the table
  34. % buckets: matrix with hash keys (row per occupied bucket)
  35. % Index: for each bucket i, Index{i} contains indices of data occupying
  36. % the bucket
  37. % verbose: verbosity value
  38. % bhash: secondary hash table used for quick access to buckets (see lshhash.m)
  39. %
  40. %
  41. % The hashing works as follows for the two schemes (the description
  42. % refers to a single hash table; each of the L tables is created
  43. % independently)
  44. %
  45. % **** 'LSH':
  46. % older algorithm, described in Gionis et al. paper
  47. % http://theory.csail.mit.edu/~indyk/vldb99.ps
  48. % For k=1,...,K, a single dimension of the data is chosen uniformly at random,
  49. % and a single threshold value is drawn uniformly over the data range in
  50. % that dimension.
  51. %
  52. % **** 'E2LSH':
  53. % more recent algorithm, described in
  54. % http://theory.lcs.mit.edu/~indyk/nips-nn.ps
  55. % For each k, a random line is drawn (with independent, normally
  56. % distributed coefficients); data are projected to the line, and shifted
  57. % by a value drawn between 0 and W. Then, the range is divided into
  58. % "cells" of width W; the hash value is determined by the cell into which
  59. % a projected+shifted value falls.
  60. %
  61. % (C) Greg Shakhnarovich, TTI-Chicago (2008)
  62. b=inf;
  63. range=[];
  64. verb=0;
  65. ind=1:size(x,2);
  66. for a=1:2:length(varargin)
  67. eval(sprintf('%s = varargin{a+1};',lower(varargin{a})));
  68. end
  69. % make sure the range in a convenient, 2 x d format
  70. range = processRange(d,range);
  71. % create the LSH functions (functions that compute L K-bit hash keys)
  72. Is = lshfunc(type,l,k,d,varargin{:});
  73. % index the data in X using these LSH functions
  74. T = lshprep(type,Is,b);
  75. if (~isempty(x))
  76. T = lshins(T,x,ind);
  77. end
  78. for i=1:length(T)
  79. T(i).verbose=verb;
  80. end