Overflowing OAuth token cache
I had a weird error last week. My main client is an international company with offices in multiple countries. The Czech office was complaining that some of their users were getting empty widgets where text and images were supposed to be. When they tried to edit the empty blocks, an error would appear: "The page could not be created due to an error".
Neither I nor any of my colleagues could reproduce the error. My contact user, however, was able to reproduce it by logging in to Connections on my machine over a remote connection while I watched the logs. Her error turned up in two logs:
The log of the Common application:
[11/07/19 13:50:58:627 CEST] 000002d6 OAuth20Endpoi E security.oauth20.token.limit.error
[11/07/19 13:50:58:627 CEST] 000002d6 webapp E com.ibm.ws.webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]-[OAuth20EndpointServlet]: com.ibm.ws.webcontainer.webapp.WebAppErrorReport: SRVE0295E: Error reported: 500
and the log of the RichTextEditors application:
[11/07/19 13:50:58:642 CEST] 000001a1 LocalTranCoor E WLTC0017E: Resources rolled back due to setRollbackOnly() being called.
[11/07/19 13:50:58:642 CEST] 000001a1 webapp E com.ibm.ws.webcontainer.webapp.WebApp logServletError SRVE0293E: [Servlet Error]-[rteServlet]: org.springframework.web.client.HttpServerErrorException: 500 Internal Server Error
The user's browser also showed the 500 Internal Server Error responses in the debugger. So now I knew that this was a Connections problem and not something related to the browser, the OS or anything else on the client. I was not much closer to solving it, however. This is where the Connections-on-prem community showed its immeasurable value again, as one user had had the same problem before. So a huge thanks to Martin Schmidt for helping me solve this one!
It turns out that the key to the solution was in this little piece: security.oauth20.token.limit.error. The users were running into a limit on the number of OAuth tokens they could be issued. That limit is set in the <DMGR profile path>/config/cells/<yourCell>/oauth20/connectionsProvider.xml file:
<!-- optional limit for the number of tokens a user/client/provider combination can be issued -->
<parameter name="oauth20.token.userClientTokenLimit" type="ws" customizable="true">
<value>250</value>
</parameter>
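To check which limit is currently configured without opening the file by hand, something like the following Python sketch could be used. It only relies on the parameter name and the <value> element shown above. The wrapper element <config> in the sample is a placeholder of mine (the real file has its own root element and many more parameters), and the sketch assumes the file does not use XML namespaces.

```python
import xml.etree.ElementTree as ET

# Placeholder sample that mirrors the fragment above. The <config> wrapper is
# an assumption for illustration only, not the real root element of the file.
SAMPLE = """<config>
  <parameter name="oauth20.token.userClientTokenLimit" type="ws" customizable="true">
    <value>250</value>
  </parameter>
</config>"""

LIMIT_PARAM = "oauth20.token.userClientTokenLimit"


def read_token_limit(xml_text):
    """Return the configured token limit as an int, or None if it is not set."""
    root = ET.fromstring(xml_text)
    # Walk all <parameter> elements, wherever they sit in the tree.
    for param in root.iter("parameter"):
        if param.get("name") == LIMIT_PARAM:
            value = param.find("value")
            if value is not None and value.text:
                return int(value.text.strip())
    return None


if __name__ == "__main__":
    print(read_token_limit(SAMPLE))
```

For the real file you would read its contents from the path mentioned above and pass the text to read_token_limit.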
The default value is ‘250’. The recommended value is ‘500’. You can check whether your users are indeed running into this limit with the following SQL query (written for MS SQL Server, but it should look much the same on DB2 or Oracle):
select count(hc.LOOKUPKEY) as count, hc.USERNAME, emp.PROF_MAIL
from HOMEPAGE.HOMEPAGE.OH2P_CACHE As hc
LEFT JOIN PEOPLEDB.EMPINST.EMPLOYEE emp on hc.USERNAME=emp.PROF_GUID
group by hc.USERNAME, emp.PROF_MAIL
order by count desc;
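The query returns every user with cached tokens, so you still have to pick out the ones at the limit. As a small sketch, assuming the result rows have been fetched into Python as (count, username, mail) tuples in the column order of the query, the filtering could look like this. The sample rows are made up for illustration.

```python
def users_at_limit(rows, limit):
    """Return (username, mail, count) for every user at or above the limit.

    rows: iterable of (count, username, mail) tuples, in the column order of
    the query above. mail may be None because of the LEFT JOIN.
    """
    return [(username, mail, count)
            for count, username, mail in rows
            if count >= limit]


# Made-up rows for illustration; real values come from the database.
SAMPLE_ROWS = [
    (250, "guid-1", "user1@example.com"),
    (12, "guid-2", "user2@example.com"),
    (251, "guid-3", None),
]

if __name__ == "__main__":
    for username, mail, count in users_at_limit(SAMPLE_ROWS, 250):
        print(username, mail, count)
```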
It turned out that I had 35 users who had crossed the limit. Upping the token limit would need a restart of the WebSphereOauth20SP application. As I couldn’t do that during business hours, I instead opted for removing the cache for the affected users:
DELETE FROM HOMEPAGE.HOMEPAGE.OH2P_CACHE WHERE USERNAME='<affected username>'
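With 35 affected users, running this by hand gets tedious, so here is a hedged sketch of how the delete could be scripted with a parameterized statement. It assumes a Python DB-API connection whose driver uses "?" placeholders (pyodbc and sqlite3 both do). The function name clear_user_tokens is mine, and the demo runs against an in-memory SQLite table with a simplified layout, not against a real Connections database. Note that, going by the join in the query above, the USERNAME column holds the same value as PROF_GUID in the Profiles database.

```python
import sqlite3

# Table name as used in the statements above (MS SQL Server naming).
CACHE_TABLE = "HOMEPAGE.HOMEPAGE.OH2P_CACHE"


def clear_user_tokens(conn, username, table=CACHE_TABLE):
    """Delete all cached OAuth tokens of one user; return the rows removed."""
    cur = conn.cursor()
    # The user value is passed as a bound parameter, never pasted into the SQL.
    cur.execute("DELETE FROM " + table + " WHERE USERNAME = ?", (username,))
    removed = cur.rowcount
    conn.commit()
    return removed


def demo():
    """Run the delete against an in-memory stand-in table (simplified layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OH2P_CACHE (LOOKUPKEY TEXT, USERNAME TEXT)")
    conn.executemany(
        "INSERT INTO OH2P_CACHE VALUES (?, ?)",
        [("k1", "guid-a"), ("k2", "guid-a"), ("k3", "guid-b")],
    )
    removed = clear_user_tokens(conn, "guid-a", table="OH2P_CACHE")
    remaining = conn.execute("SELECT COUNT(*) FROM OH2P_CACHE").fetchone()[0]
    conn.close()
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Returning the number of removed rows gives a quick sanity check against the count the select query reported for that user.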
This did not have any negative side effects, and these actions did solve the problem for my Czech users. The remaining question is why this problem specifically hit my Czech population. The answer, I think, lies in the relatively large number of Rich Text widgets (3) that the Czech communication department was using on their main community, which every Czech user would open multiple times a day. This is just a hunch, however; I did not investigate further to try to prove it.
Well, I’m going to have to thank you and Martin Schmidt, because I’ve been trying to solve this exact issue off and on for a bit. Then I stumbled across your blog.
I have only some users affected, and I have not been able to understand why. I knew it had something to do with the RTE and Common as well, because I tailed every log while attempting to load the community. I also knew it was on the server side, because if the user opened the URL the widget was configured to use in another tab, it pulled up just fine. So it was something in the processing of this on the back end. The error was just not descriptive enough for me to figure it out. My limit was set at 350 and I found 350 entries for the affected user. My question is: how sure are you that it is safe to delete these cache entries with SQL in the manner you did? Did you later find any adverse effects from doing this?
Thank you for your reaction, Bryan. I didn’t get any incidents from those users afterwards, and they would have contacted me if they had run into any problems. Based on that, I think it’s safe to delete those entries straight from the SQL tables.
Thank you.
We are getting a similar error. However, it is not limited to certain users: it affects every community that has a Rich Text widget.
There was no issue before; it suddenly popped up today. Can someone help here?
Do you mean you get the “the page could not be created” error in all rich text widgets in communities? I have not seen that problem. Best to try in the Connections forum or open a case with HCL.