We have been chasing our tails with a server that had run really well for a number of years and suddenly started to crash over the last few weeks.
We scoured the logs, domlog, server stats etc. and couldn’t find a pattern. It turns out it was a 4 year old bug that IBM chose not to fix. (I have raised a PMR and sent an example.)
Just like in classic Notes, a lookup can get too big: the 64k limit. Obviously it is good coding practice not to create such errors, but why oh why does such an error crash the server?
The original bug report was here => http://www-01.ibm.com/support/docview.wss?uid=swg1LO59373
There is a Stack Overflow article with some good solutions by Knut Herrmann here => http://stackoverflow.com/questions/23264890/how-to-avoid-the-64k-limit-when-retrieving-data-from-a-view-column
The bug is nasty in that the usual log entries are not written before the crash, making it very difficult to track down. The NSD logs didn’t seem to work with the Lotus LND tool; I need to check if this is still valid.
The problem does occur in 9.01 FP2 and FP3 on Linux but not in XPinC 9.01.

Update 2 : It appears to have been fixed in 9.01 FP3 under the fix list entry below, and I can confirm that it does not fail in 9.01 FP4.

YSAI9CCBYG: Fixes Domino Server crash when executing notesView.getColumnValues method, if there are many documents in a database.

Update 1 : The NSD logs did point to the problem

Looking at the console_acme.com_2015_06_22@15_13_49.log, I can see the following line in the “Stack Backtrace” section. This gives a good hint at the problem, although the individual XPage is not listed. There may be other debug parameters that would help with that.
[03093:00012-97270640] 22.06.2015 15:15:52   HTTP JVM: Java_lotus_domino_local_View_NgetColumnValues+0x205 (0x03BF8241 [liblsxbe.so+0xc7241])
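
If you have a pile of console or NSD logs and want to check whether they contain the same frame, a small scanner saves a lot of scrolling. This is only a minimal sketch: the class name BacktraceScan and the method findFrames are made up for illustration, and the only thing it relies on is the frame name from the log line above. It assumes the log can be read as single-byte text.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class BacktraceScan {
    // Frame name copied from the log line quoted above.
    static final String FRAME = "Java_lotus_domino_local_View_NgetColumnValues";

    // Returns every line that mentions the getColumnValues native frame.
    static List<String> findFrames(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(FRAME)) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BacktraceScan.java <logfile>");
            return;
        }
        // ISO-8859-1 maps every byte to a character, so odd bytes in a log do not abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String hit : findFrames(lines)) {
            System.out.println(hit);
        }
    }
}
```

Run it against one log file at a time. Any output means that log contains the getColumnValues frame, which does not by itself identify the XPage that made the call.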