Removing old revisions from IBM Connections

Today a question popped up in the IBM Connections on premises Community on whether it is possible to remove old revisions from IBM Connections. There’s no easy way to do this, but the various suggestions set me thinking. Does it make sense to remove old revisions? Not sure. It can be handy to look back at the history of a wiki article or a shared file. But if we leave that discussion for now, how should you do this and how much storage would you actually save?

To answer this question, we have to check the SQL database. Both the WIKIS and FILES databases have a table called MEDIA_REVISION, which contains the information of the several revisions per article. This query will give you an indication on whether you have a lot of revisions in your environment regarding files:

SELECT COUNT(ID) FROM FILES.FILES.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0'

This counts the number of revisions which are no longer visible (not the current revision). In the environment where I ran the query (22K users. about 6 years old) that led to 71829 file revisions. We could also calculate how much space we’re talking about:

SELECT SUM(CAST(STORAGE_FILE_SIZE AS BIGINT)) AS TOTAL_STORAGE FROM FILES.FILES.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0'

That returned a number of roughly 91GB of storage space. That number in itself doesn’t say much if we don’t know how much space in total is used. That’s about 1,2TB in this environment, so the 91GB is about 7,5% of the total space. Not that much, but also not that little that it’s irrelevant. Do we want to risk removing files that people are working on though? Let’s lower the risk by just taking files which haven’t been updated in the past 3 months:

SELECT SUM(CAST(STORAGE_FILE_SIZE AS BIGINT)) AS TOTAL_STORAGE FROM FILES.FILES.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0' AND LAST_UPDATE < '2018-07-01'

That leaves 84,5GB of old revisions which will probably never be looked at.

We could do the same trick for the wiki articles:

SELECT COUNT(ID) FROM WIKIS.WIKIS.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0' AND SYS_LAST_UPDATE < '2018-07-01'

This query gave over 212K of wiki revisions that are not the current version. That’s a lot, but if we look at the space that takes, it turns out to just 3,6GB of storage. That’s 0,3% of the total. I would say that’s far too little to put in any effort in recovering that space.

 

So how do we get to the files we want to target? For this we need the MEDIA_FILE_ID of the revision files. We can get this from the SQL database. However, the database will contain this ID in binary format, while we need to have this ID as a UUID. Luckily you can convert the one to the other. This command will give you a list of all UUIDs of the old revisions when you have MS SQL as your backend:

SELECT CONVERT(UNIQUEIDENTIFIER, CONVERT(BINARY(16),MEDIA_FILE_ID)) AS MEDIA_FILE_ID,MEDIA_LABEL FROM FILES.FILES.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0' AND LAST_UPDATE < '2018-07-01'

I’ve added the MEDIA_LABEL here as it’s nice to have an idea of what file you’re looking at, but of course if you want to automate this a bit, you omit this column.

On DB2 this would look like:

SELECT VARCHAR_FORMAT_BIT(CAST(MEDIA_FILE_ID AS CHAR(16) FOR BIT DATA),'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX') FROM FILES.MEDIA_REVISION WHERE IS_CURRENT_REVISION='0' AND LAST_UPDATE < '2018-07-01'

To get the actual path of this file, we need to run a wsadmin script:

FilesUtilService.getFileById("fileId", "isRendition", "checkExistence")

which would look something like:

wsadmin>FilesUtilService.getFileById("5765E347-97BB-2143-91A4-C34B08840705", "true", "false")
'C:\\IBM\\Connections\\data\\shared\\files\\upload\\files\\67\\123\\5765e347-97bb-2143-91a4-c34b08840705'

(if you’re still on Connections 5.5, the command is without FilesUtilService.getFileById(“fileId”) )

Now that you know the location, you can decide what to do. You could either replace the file by a small file telling the user that the file was removed by the administrator, or you can simply delete the file in which case you should also delete the row for the SQL database as described in this article. When you’ve made your decision, you can easily automate the above commands to execute this action on all files from your query, but I’ll leave that up to you. If you do write a script for this, sharing is caring… 🙂

Thanks to Jan Valdman, Andreas Ponte and Nico Meisenzahl for the inspiration and information for this article.

Resources:
Disabling file versioning
Finding the location of a stored file
Deleting user files from the system (for the sake of housecleaning, though not directly related to this article)