Inverted index in Scala
07 Dec 2015
Last week I had to build an inverted index to speed up (a lot) a program doing parallel document identification. And I was quite impressed by how simple it was in Scala!
Inverted index?
It’s simple. Let’s say you have this index:
On each line you have a document identifier followed by the words appearing in that document. Here, word1 is in document0.txt, but not in document1.txt.
And you want to turn this index into:
For each word you have the documents in which it appears. Here, word0 appears in document0.txt and document1.txt.
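To make the two shapes concrete, here is a small illustration of the same data in both directions. The file names and word0/word1 come from the text; word2 and the exact word lists are assumed sample data, and the IndexExample object is just a hypothetical wrapper.

```scala
object IndexExample {
  // Forward index: document -> words it contains (assumed sample data).
  val index: Map[String, Set[String]] = Map(
    "document0.txt" -> Set("word0", "word1"),
    "document1.txt" -> Set("word0", "word2")
  )

  // Inverted index: word -> documents it appears in (same information, flipped).
  val inverted: Map[String, Set[String]] = Map(
    "word0" -> Set("document0.txt", "document1.txt"),
    "word1" -> Set("document0.txt"),
    "word2" -> Set("document1.txt")
  )
}
```

With the inverted form, "which documents contain word0?" becomes a single map lookup instead of a scan over every document.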
This is called an inverted index. And it’s pretty useful.
Let's code
Let’s not waste any more time. Here is the code.
There are literally more comments than code ;)
All you have to do is: InvertedIndex("path-to-my-index.txt")
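For reference, here is a minimal sketch of what such an InvertedIndex object could look like. It assumes each line of the index file is a document identifier followed by its words, separated by whitespace. The invert helper and the Map[String, Set[String]] return type are illustrative choices, so treat this as one possible shape rather than the exact implementation.

```scala
import scala.io.Source

object InvertedIndex {

  // Hypothetical helper: turns lines of the form "docId word word ..."
  // into a map from each word to the set of documents containing it.
  def invert(lines: Iterator[String]): Map[String, Set[String]] =
    lines
      .flatMap { line =>
        line.trim.split("\\s+").toList match {
          // First token is the document id, the rest are its words.
          case doc :: words if doc.nonEmpty => words.map(word => word -> doc)
          // Blank lines produce nothing.
          case _ => Nil
        }
      }
      .toList
      // Group the (word, doc) pairs by word, then keep only the doc ids.
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).toSet }

  // Reads the index file at `path` and returns the inverted index.
  def apply(path: String): Map[String, Set[String]] = {
    val source = Source.fromFile(path)
    try invert(source.getLines()) finally source.close()
  }
}
```

With this sketch, InvertedIndex("path-to-my-index.txt").getOrElse("word0", Set.empty) would give the documents containing word0. Keeping the parsing in invert, separate from the file handling, makes it easy to try on in-memory lines.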
I love Scala <3