Currently, highlighting is always switched on, which results in all terms being stored in Lucene. We never use highlighting, but turning it off breaks similarity searches.
1) Set highlighting to false by default.
2) Make sure similarity search works whether or not highlighting is switched on.
Further improvement: never store PDF fields!
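A minimal Java sketch of the proposed change. It assumes that switching highlighting on is what currently enables term vectors with positions and offsets, and that similarity search (for example something along the lines of Lucene's `MoreLikeThis`) only needs plain term vectors. `FieldIndexingPolicy`, `optionsFor` and `DEFAULT_HIGHLIGHTING` are hypothetical names, not part of Lucene or of the existing code. The sketch only shows that the similarity requirement is decided separately from the highlighting switch.

```java
/**
 * Hypothetical sketch: decides per-field index options so that similarity
 * search no longer depends on the highlighting switch. Names are illustrative
 * only; they are not an existing API.
 */
final class FieldIndexingPolicy {

    /** Item 1: highlighting is off unless explicitly enabled. */
    static final boolean DEFAULT_HIGHLIGHTING = false;

    /** Per-field options, loosely mirroring the stored/term-vector flags on Lucene's FieldType. */
    record FieldOptions(boolean stored,
                        boolean termVectors,
                        boolean termVectorPositions,
                        boolean termVectorOffsets) {}

    private FieldIndexingPolicy() {}

    static FieldOptions optionsFor(boolean isPdfField,
                                   boolean highlightingEnabled,
                                   boolean similarityEnabled) {
        // Item 2 (assumption): similarity search reads plain term vectors,
        // so they are kept whenever similarity is enabled, even with highlighting off.
        boolean termVectors = highlightingEnabled || similarityEnabled;
        // Positions and offsets are assumed to be needed only by the highlighter.
        boolean positions = highlightingEnabled;
        boolean offsets = highlightingEnabled;
        // Further improvement: never store the raw content of PDF fields.
        boolean stored = !isPdfField;
        return new FieldOptions(stored, termVectors, positions, offsets);
    }
}
```

The resulting flags could then be copied onto a Lucene `FieldType` via `setStored`, `setStoreTermVectors`, `setStoreTermVectorPositions` and `setStoreTermVectorOffsets`. Keeping positions and offsets tied to highlighting means that switching highlighting off still shrinks the index without breaking similarity search.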