Overflowing OAuth token cache

I had a weird error last week. My main client is an international company with offices in multiple countries. The Czech office was complaining that some of their users were getting empty widgets where text and images were supposed to be. When they tried to edit the empty blocks, an error appeared: "The page could not be created due to an error".

Neither I nor any of my colleagues were able to reproduce the error. My contact user, however, was able to reproduce it on my machine, logging in to Connections over a remote connection while I watched the logs. This showed that her error appeared in two logs:

The log of the Common application:

[11/07/19 13:50:58:627 CEST] 000002d6 OAuth20Endpoi E security.oauth20.token.limit.error
[11/07/19 13:50:58:627 CEST] 000002d6 webapp E com.ibm.ws.webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]-[OAuth20EndpointServlet]: com.ibm.ws.webcontainer.webapp.WebAppErrorReport: SRVE0295E: Error reported: 500

and the log of the RichTextEditors application:

[11/07/19 13:50:58:642 CEST] 000001a1 LocalTranCoor E WLTC0017E: Resources rolled back due to setRollbackOnly() being called.
[11/07/19 13:50:58:642 CEST] 000001a1 webapp E com.ibm.ws.webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]-[rteServlet]: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error

The user's browser debugger also showed the 500 Internal Server Errors. So I now knew that this was a Connections problem and not related to the browser, the OS or anything else. I was not much closer to solving it, however. This is where the Connections-on-prem community showed its immeasurable value again, as one member had run into the same problem before. Huge thanks to Martin Schmidt for helping me solve this one!

It turns out that the key to the solution was in this little piece: security.oauth20.token.limit.error. The users were running into a limit. The OAuth token limit is set in the <DMGR profile path>/config/cells/<yourCell>/oauth20/connectionsProvider.xml file:

<!-- optional limit for the number of tokens a user/client/provider combination can be issued -->
<parameter name="oauth20.token.userClientTokenLimit" type="ws" customizable="true">
<value>250</value>
</parameter>
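To check (or change) this value without hunting through the XML by hand, a small script can locate the parameter by name. The following is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the parameter elements are not namespaced and are laid out as in the snippet above. The wrapper element in SAMPLE is a hypothetical placeholder, not the real root of connectionsProvider.xml, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PARAM_NAME = "oauth20.token.userClientTokenLimit"

# Hypothetical sample: only the <parameter> block is taken from connectionsProvider.xml;
# the <sampleRoot> wrapper is a placeholder for the real root element.
SAMPLE = """<sampleRoot>
  <parameter name="oauth20.token.userClientTokenLimit" type="ws" customizable="true">
    <value>250</value>
  </parameter>
</sampleRoot>"""


def get_token_limit(root: ET.Element) -> Optional[int]:
    """Return the configured user/client/provider token limit, or None if not set."""
    for param in root.iter("parameter"):
        if param.get("name") == PARAM_NAME:
            value = param.find("value")
            if value is not None and value.text:
                return int(value.text.strip())
    return None


def set_token_limit(root: ET.Element, new_limit: int) -> bool:
    """Set the limit in place; return True if the parameter element was found."""
    for param in root.iter("parameter"):
        if param.get("name") == PARAM_NAME:
            value = param.find("value")
            if value is None:
                value = ET.SubElement(param, "value")
            value.text = str(new_limit)
            return True
    return False


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(get_token_limit(root))  # 250
    set_token_limit(root, 500)
    print(get_token_limit(root))  # 500
```

If you point this at the real file with ET.parse() and write it back, make a backup first: by default ElementTree discards XML comments, such as the one describing this parameter, unless you parse with ET.XMLParser(target=ET.TreeBuilder(insert_comments=True)) (Python 3.8+).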

The default value is ‘250’. The recommended value is ‘500’. You can check whether your users were indeed running into this limit by running the following SQL query (written for MS SQL Server, but pretty similar in DB2 or Oracle):

select count(hc.LOOKUPKEY) as count, hc.USERNAME, emp.PROF_MAIL from HOMEPAGE.HOMEPAGE.OH2P_CACHE As hc 
LEFT JOIN PEOPLEDB.EMPINST.EMPLOYEE emp on hc.USERNAME=emp.PROF_GUID
group by hc.USERNAME, emp.PROF_MAIL order by count desc;
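Adding a HAVING clause to the query returns only the users who have reached the limit, instead of everyone sorted by count. The sketch below demonstrates that variant against an in-memory SQLite database. The tables are hypothetical stand-ins that model only the columns the query touches (the real OH2P_CACHE and EMPLOYEE tables have more), the three-part HOMEPAGE.HOMEPAGE / PEOPLEDB.EMPINST names are dropped because SQLite does not support them, and the demo data is made up.

```python
import sqlite3

TOKEN_LIMIT = 250  # default value of oauth20.token.userClientTokenLimit

# Same grouping as the query above, plus HAVING to keep only users at or over the limit.
QUERY = """
SELECT COUNT(hc.LOOKUPKEY) AS token_count, hc.USERNAME, emp.PROF_MAIL
FROM OH2P_CACHE AS hc
LEFT JOIN EMPLOYEE AS emp ON hc.USERNAME = emp.PROF_GUID
GROUP BY hc.USERNAME, emp.PROF_MAIL
HAVING COUNT(hc.LOOKUPKEY) >= ?
ORDER BY token_count DESC
"""


def users_at_limit(conn: sqlite3.Connection, limit: int = TOKEN_LIMIT):
    """Return (token_count, username, mail) rows for users with >= limit cached tokens."""
    return conn.execute(QUERY, (limit,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical demo data: one user over the limit, one well under it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OH2P_CACHE (LOOKUPKEY TEXT PRIMARY KEY, USERNAME TEXT)")
    conn.execute("CREATE TABLE EMPLOYEE (PROF_GUID TEXT PRIMARY KEY, PROF_MAIL TEXT)")
    conn.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?)",
        [("guid-a", "a@example.com"), ("guid-b", "b@example.com")],
    )
    rows = [(f"a-{i}", "guid-a") for i in range(260)]
    rows += [(f"b-{i}", "guid-b") for i in range(10)]
    conn.executemany("INSERT INTO OH2P_CACHE VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(users_at_limit(build_demo_db()))  # [(260, 'guid-a', 'a@example.com')]
```

Passing the limit as a parameter makes it easy to rerun the check with ‘500’ after raising the setting.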

It turned out that I had 35 users who had crossed the limit. Raising the token limit would require a restart of the WebSphereOauth20SP application. As I couldn't do that during business hours, I opted instead to remove the cached tokens of the affected users:

DELETE FROM HOMEPAGE.HOMEPAGE.OH2P_CACHE WHERE USERNAME='<affected username>'
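With 35 affected users, running that statement once per user by hand gets tedious. Below is a minimal sketch of batching it with a parameterized query via Python's DB-API, demonstrated on a hypothetical in-memory SQLite table (only two columns modeled, made-up GUIDs). Against the real HOMEPAGE database you would need a driver for your DBMS, and the placeholder style (? versus %s and so on) depends on that driver.

```python
import sqlite3
from typing import Iterable


def clear_token_cache(conn: sqlite3.Connection, usernames: Iterable[str]) -> int:
    """Delete cached OAuth tokens for each given user; return the total rows removed."""
    with conn:  # commits on success, rolls back if any DELETE fails
        cur = conn.executemany(
            "DELETE FROM OH2P_CACHE WHERE USERNAME = ?",
            [(u,) for u in usernames],
        )
    # For executemany(), sqlite3 sums the modified rows of all executions in rowcount.
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OH2P_CACHE (LOOKUPKEY TEXT PRIMARY KEY, USERNAME TEXT)")
    conn.executemany(
        "INSERT INTO OH2P_CACHE VALUES (?, ?)",
        [("k1", "guid-a"), ("k2", "guid-a"), ("k3", "guid-b")],
    )
    print(clear_token_cache(conn, ["guid-a"]))  # 2
```

The usernames list could be fed from the USERNAME column of the count query above, so that only users who actually hit the limit lose their cached tokens.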

This had no negative side effects, and it did solve the problem for my Czech users. The remaining question is why this problem specifically hit my Czech population. I think the answer lies in the relatively large number of Rich Text widgets (three) that the Czech communications department was using on their main community, which every Czech user opened multiple times a day. This is just a hunch, however; I did not investigate further to prove it.