Brice Thomas dev, code, nlp and stuff.

Inverted index in Scala

Last week I had to build an inverted index to considerably speed up a program doing parallel document identification. I was quite impressed by how simple it was in Scala!

Inverted index?

It’s simple. Let’s say you have this index:

For each line you have a document identifier followed by the words appearing in the document. Here, word1 is in document0.txt, but not in document1.txt.

And you want to turn this index into:

For each word you have the documents in which it appears. Here, word0 appears in document0.txt and document1.txt.

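This is called an inverted index. It’s pretty useful: to find every document containing a given word, you look the word up directly instead of scanning all the documents.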
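To make the transformation concrete, here is a minimal in-memory sketch. The sample data and the names `InvertExample`, `forward` and `inverted` are made up for illustration; they only follow the shape described above (word0 in both documents, word1 only in document0.txt).

```scala
object InvertExample {
  // Hypothetical sample data: document identifier -> words appearing in it.
  val forward: Map[String, Seq[String]] = Map(
    "document0.txt" -> Seq("word0", "word1"),
    "document1.txt" -> Seq("word0", "word2")
  )

  // Flip every (document, word) pair into (word, document),
  // then group the documents by word.
  val inverted: Map[String, Set[String]] =
    forward.toSeq
      .flatMap { case (doc, words) => words.map(word => word -> doc) }
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) => word -> pairs.map(_._2).toSet }
}
```

With this data, `InvertExample.inverted("word0")` is `Set("document0.txt", "document1.txt")`, while `InvertExample.inverted("word1")` only contains `document0.txt`.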

Let’s code

Let’s not waste any more time. Here is the code.
There are literally more comments than code ;)

All you have to do is: InvertedIndex("path-to-my-index.txt")
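For reference, an object supporting that call could look roughly like the sketch below. It is an illustration under assumptions, not necessarily the exact implementation: it assumes each line is a document identifier followed by its words, all separated by whitespace, that the file is UTF-8, and that Scala 2.13+ is available (for `scala.util.Using`). The helper `fromLines` is a hypothetical name added to keep the parsing testable without a file.

```scala
import scala.io.Source
import scala.util.Using

object InvertedIndex {

  // Builds the word -> documents map from the lines of a forward index.
  // Assumed line format: "<document-id> <word> <word> ..." (whitespace-separated).
  def fromLines(lines: Iterator[String]): Map[String, Set[String]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty) // skip blank lines
      .flatMap { line =>
        val tokens = line.split("\\s+")
        val doc = tokens.head // first token: the document identifier
        tokens.tail.iterator.map(word => word -> doc)
      }
      .foldLeft(Map.empty[String, Set[String]]) { case (index, (word, doc)) =>
        // Add the document to the set already recorded for this word.
        index.updated(word, index.getOrElse(word, Set.empty[String]) + doc)
      }

  // Reads the index file at `path` and inverts it; the file is closed afterwards.
  def apply(path: String): Map[String, Set[String]] =
    Using.resource(Source.fromFile(path, "UTF-8")) { source =>
      fromLines(source.getLines())
    }
}
```

The result is an immutable `Map`, so a lookup such as `InvertedIndex("path-to-my-index.txt").getOrElse("word0", Set.empty)` gives the documents containing `word0`, or an empty set if the word is unknown.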

I love Scala <3