Friday, March 4, 2011

Purge Alfresco archived nodes

I was looking for a way to automatically purge the Alfresco trashcan and, after a while, I think I came up with what looks like a decent solution.

DISCLAIMER: The procedure described in this article has not been tested intensively and comes without any implied warranty of fitness for a particular purpose. You should check the code, test it and decide for yourself whether it fits your needs, backing up all your data before any experiment.

The problem
Over time, deleted content fills up Alfresco's trashcan, and removing nodes manually through the UI is impractical (users always forget about this). Alfresco does not actually delete content: it moves deleted nodes into the archive store, which works like a trashcan. Deleted content stays there forever, until users decide to clean up the trashcan. In a big repository this can lead to a huge waste of resources.

I need a service I can invoke programmatically to empty the trashcan, for example by scheduling a task with an external job. I would rather not deploy into Alfresco a scheduled task controlled by the embedded Quartz: I think it's cleaner to move the scheduling outside and always deploy the bare minimum into Alfresco.

Even emptying the trashcan only marks nodes as "orphans": they are moved into alf_data/contentstore.deleted and can be physically removed by the asynchronous contentStoreCleaner task. So Alfresco has a safety net to avoid accidental deletions at all costs.


Cleaning-up archived nodes
I have developed a simple Java-backed Web Script for Alfresco 3.4 (it should work with Alfresco 3.2+) which can be invoked to clean up the archived nodes. Its major components are listed below:

purge.get.desc.xml
Web Script descriptor

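    <webscript>
      <shortname>Purge all</shortname>
      <description>Purge all archived nodes</description>
      <url>/purge</url>
      <authentication>user</authentication>
      <transaction>none</transaction>
    </webscript>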

purge.get.html.ftl
Freemarker template

purge-context.xml
Spring bean's configuration

I created the it/alfresco/utils folders under /Company Home/Data Dictionary/Web Scripts Extensions and placed both purge.get.desc.xml and purge.get.html.ftl there.

The Spring context file purge-context.xml goes under /tomcat/shared/classes/alfresco/extension in the main Alfresco installation folder.

Our bean makes use of nodeArchiveService.

Here's the Java Code:


The Purge project under Netbeans 6.9.1


The bean is injected with nodeArchiveService and calls its purgeAllArchivedNodes method.
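As a rough illustration of that flow, here is a minimal sketch of the timing logic only. It is not the original bean: the `ArchivePurger` interface is a hypothetical stand-in for the injected nodeArchiveService, so that the snippet compiles without the Alfresco jars, and the `elapsed` model key is an assumed name for the value shown by the template.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the injected nodeArchiveService (not an Alfresco type). */
interface ArchivePurger {
    void purgeAll();
}

/** Sketch of the bean logic: run the purge, time it, expose the result to the template. */
class PurgeSketch {
    private final ArchivePurger purger;

    PurgeSketch(ArchivePurger purger) {
        this.purger = purger;
    }

    /** Returns the model a template could render; "elapsed" is an assumed key name. */
    Map<String, Object> execute() {
        long start = System.nanoTime();
        // In the real bean this is where purgeAllArchivedNodes is called
        // on the injected nodeArchiveService.
        purger.purgeAll();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        Map<String, Object> model = new HashMap<>();
        model.put("elapsed", elapsedMs);
        return model;
    }
}
```

Keeping the service behind a small interface also makes the timing logic easy to exercise outside the repository, for example with a lambda that only counts invocations.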

The single most important line of code is:
this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);

We are passing the STORE_REF_WORKSPACE_SPACESSTORE constant, which is "the store that the items originally came from", as per JavaDocs:

purgeAllArchivedNodes

void purgeAllArchivedNodes(org.alfresco.service.cmr.repository.StoreRef originalStoreRef)
Permanently delete all archived nodes.
Parameters:
originalStoreRef - the store that the items originally came from

Calling the WebScript
After starting Alfresco, to get a list of the available Web Scripts and check that the new one has been installed correctly, point the browser to http://localhost:8080/alfresco/service/index and then click the "Browse all Web Scripts" link. Remember to authenticate as admin, so that the Web Script can be run with administrator privileges.

The "Purge" Web Script should be the first one

To invoke its execution and clean-up the trashcan you can call:
http://localhost:8080/alfresco/service/purge

If everything went fine you should see the following response page:
Alfresco Community Edition v3.4.0 (c 3335) :
Purged all archived nodes. Elapsed time: 438 ms.
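
If the Web Script is called from an external job rather than a browser, the caller may want to check this response text. The following is a small sketch, assuming the body contains the exact sentence shown above; the `PurgeResponse` class and its method name are hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that looks for the success message in the response body. */
class PurgeResponse {
    // Matches the sentence rendered by the template and captures the milliseconds.
    private static final Pattern ELAPSED =
            Pattern.compile("Purged all archived nodes\\. Elapsed time: (\\d{1,18}) ms\\.");

    /** Returns the elapsed time in ms, or empty if the success message is absent. */
    static OptionalLong elapsedMillis(String body) {
        Matcher m = ELAPSED.matcher(body);
        return m.find()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }
}
```

An empty result can then be treated as a failed purge by the calling job.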

Then verify that all users' trashcans are now empty:


Now that our RESTful purge Web Script is in place, it's easy to call it from an external script, perhaps scheduled via a cron job for a periodic clean-up. Alternatively, it's possible to use the Quartz engine embedded in Alfresco, but my personal preference is to avoid giving Alfresco too many responsibilities: if you need to change the scheduling, an external scheduler is easier to maintain.
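
As one possible shape for such an external caller, here is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). It assumes the Web Script accepts HTTP Basic authentication and that Alfresco is reachable at the URL used above; the `PurgeClient` class name, the five-minute timeout and the command-line arguments are illustrative choices, not part of the original project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

/** Hypothetical command-line caller for the purge Web Script. */
class PurgeClient {

    /** Builds a GET request for <baseUrl>/service/purge with a Basic auth header. */
    static HttpRequest buildRequest(String baseUrl, String user, String password) {
        String credentials = user + ":" + password;
        String token = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/service/purge"))
                .timeout(Duration.ofMinutes(5)) // assumed limit; purging may take a while
                .header("Authorization", "Basic " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PurgeClient <user> <password>");
            System.exit(2);
        }
        HttpRequest request = buildRequest("http://localhost:8080/alfresco", args[0], args[1]);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
        // A non-zero exit code lets cron (or any other scheduler) notice failures.
        if (response.statusCode() != 200) {
            System.exit(1);
        }
    }
}
```

A cron entry can then run this class with an administrator's credentials at whatever interval suits the repository.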