Friday, March 4, 2011

Purge Alfresco archived nodes

I was looking for a way to purge the Alfresco trashcan automatically and, after a while, I think I came up with what looks like a decent solution.

DISCLAIMER: The procedure described in this article has not been tested intensively and comes without any implied warranty of fitness for a particular purpose. You should check the code, test it and decide for yourself whether it fits your needs, backing up all your data before any experiment.

The problem
Over time, deleted content can fill Alfresco's trashcan, and removing nodes manually through the UI can be impractical (users always forget about this). Alfresco does not actually delete content, but moves deleted nodes into the archive store, which works like a trashcan. Deleted content can stay there forever, until users decide to clean up the trashcan. In a big repository this could lead to a huge waste of resources.

I need a service I can invoke programmatically to empty the trashcan, for example from a task scheduled by an external job. I don't like deploying into Alfresco a scheduled task controlled by the embedded Quartz: I think it's cleaner to move the scheduling outside and always deploy the bare minimum into Alfresco.

Even after the trashcan has been emptied, the nodes are only marked as "orphans", moved into alf_data/contentstore.deleted, and can then be physically removed by the asynchronous contentStoreCleaner task. So there is a safety net in Alfresco to avoid accidental deletions at all costs.


Cleaning-up archived nodes
I have developed a simple Java-backed Web Script for Alfresco 3.4 (it should work with Alfresco 3.2+) which can be invoked to clean up the archived nodes. Its major components are listed below:

purge.get.desc.xml
Web Script descriptor

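    <webscript>
      <shortname>Purge all</shortname>
      <description>Purge all archived nodes</description>
      <url>/purge</url>
      <authentication>user</authentication>
      <transaction>none</transaction>
    </webscript>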

purge.get.html.ftl
Freemarker template

purge-context.xml
Spring bean configuration

I created the it/alfresco/utils folders under /Company Home/Data Dictionary/Web Scripts Extensions and placed both purge.get.desc.xml and purge.get.html.ftl there.

The Spring context file purge-context.xml goes under /tomcat/shared/classes/alfresco/extension in the main Alfresco installation folder.

Our bean makes use of nodeArchiveService.

Here's the Java Code:


The Purge project under Netbeans 6.9.1


The bean is injected with nodeArchiveService and calls its purgeAllArchivedNodes method.

The single most important line of code is:
this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);

We are passing the STORE_REF_WORKSPACE_SPACESSTORE constant, which is "the store that the items originally came from", as per JavaDocs:

purgeAllArchivedNodes

void purgeAllArchivedNodes(org.alfresco.service.cmr.repository.StoreRef originalStoreRef)
Permanently delete all archived nodes.
Parameters:
originalStoreRef - the store that the items originally came from
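The bean described above can be sketched roughly as follows. This is only a minimal, standalone illustration and not the original source: StoreRef and NodeArchiveService are tiny stand-ins for the real Alfresco classes (so that the sketch compiles without the Alfresco jars), and PurgeAction with its execute() method is a hypothetical name. In a real deployment the bean would extend one of Alfresco's Web Script base classes and receive the real nodeArchiveService through purge-context.xml.

```java
import java.util.ArrayList;
import java.util.List;

public class PurgeSketch {

    /** Stand-in for org.alfresco.service.cmr.repository.StoreRef (assumption: protocol + identifier only). */
    static final class StoreRef {
        static final StoreRef STORE_REF_WORKSPACE_SPACESSTORE = new StoreRef("workspace", "SpacesStore");

        final String protocol;
        final String identifier;

        StoreRef(String protocol, String identifier) {
            this.protocol = protocol;
            this.identifier = identifier;
        }

        @Override
        public String toString() {
            return protocol + "://" + identifier;
        }
    }

    /** Stand-in exposing only the method quoted from the JavaDocs. */
    interface NodeArchiveService {
        void purgeAllArchivedNodes(StoreRef originalStoreRef);
    }

    /** Hypothetical bean: the service is injected through a setter, as Spring would do. */
    static final class PurgeAction {
        private NodeArchiveService nodeArchiveService;

        public void setNodeArchiveService(NodeArchiveService nodeArchiveService) {
            this.nodeArchiveService = nodeArchiveService;
        }

        /** Purges the archive of the workspace store and reports the elapsed time. */
        public String execute() {
            long start = System.currentTimeMillis();
            this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
            long elapsed = System.currentTimeMillis() - start;
            return "Purged all archived nodes. Elapsed time: " + elapsed + " ms.";
        }
    }

    public static void main(String[] args) {
        // A fake service that only records which store it was asked to purge.
        List<String> purgedStores = new ArrayList<>();
        PurgeAction action = new PurgeAction();
        action.setNodeArchiveService(ref -> purgedStores.add(ref.toString()));

        System.out.println(action.execute());
        System.out.println("Stores purged: " + purgedStores);
    }
}
```

Keeping the service behind a setter makes the bean easy to try out with a fake service, as main does here, before wiring it to a live repository.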

Calling the Web Script
After starting Alfresco, to get a list of available Web Scripts and check that the new one has been installed correctly, point the browser to http://localhost:8080/alfresco/service/index and then click the "Browse all Web Scripts" link. Remember to authenticate as admin, so that the Web Script runs with administrator privileges.

The "Purge" Web Script should be the first one

To run it and clean up the trashcan, call:
http://localhost:8080/alfresco/service/purge

If everything went fine you should see the following response page:
Alfresco Community Edition v3.4.0 (c 3335) :
Purged all archived nodes. Elapsed time: 438 ms.

Then verify that all users' trashcans are now empty:


Now that we have our RESTful purge Web Script in place, it's easy to call it from an external script, perhaps scheduled via a cron job for periodic clean-up. Alternatively, it's possible to use the Quartz engine embedded in Alfresco, but my personal preference is to avoid giving Alfresco too many responsibilities: if you need to change the schedule, an external scheduler is easier to maintain.
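As an illustration of the external call, here is a minimal sketch of a small Java client that a cron job could launch. It is not part of the original project: the class name PurgeClient and its methods are hypothetical, and it assumes the Web Script accepts HTTP Basic authentication on the /alfresco/service URL shown above. Adjust the URL, credentials and timeouts to your installation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PurgeClient {

    /** Builds the value of the HTTP Basic "Authorization" header. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Sends a GET to the purge Web Script and returns the HTTP status code. */
    static int callPurge(String serviceUrl, String user, String password) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(serviceUrl).openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Authorization", basicAuthHeader(user, password));
            connection.setConnectTimeout(10_000);   // 10 seconds to connect
            connection.setReadTimeout(600_000);     // purging a large archive may take minutes
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            // No network call is made unless all three arguments are given.
            System.out.println("Usage: java PurgeClient <purge-url> <user> <password>");
            return;
        }
        int status = callPurge(args[0], args[1], args[2]);
        System.out.println("HTTP status: " + status);
        if (status != HttpURLConnection.HTTP_OK) {
            System.exit(1); // non-zero exit code so that cron can report the failure
        }
    }
}
```

It could be scheduled, for example, as: java PurgeClient http://localhost:8080/alfresco/service/purge admin yourpassword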

Thursday, February 10, 2011

My Career Path

At least twice a year I think about my career path; I know it's a masochistic habit I can't shake. I have changed companies many times in the last few years and I have spent a lot of time traveling and consulting abroad, almost to the point of complete exhaustion. This year I'm settling down a little, and this helps me think clearly and plan the next move.

However, you don't have many options if you live in a mid-technical environment like Italy, where my natural inclination toward freelance consulting is not easily sustainable. On the other hand, taking an airplane each Monday morning and sleeping in hotels five days out of seven is no longer on my list of wet dreams... I realized that after suddenly waking up in the middle of the night without knowing what city or even country I was in: if you have done consulting for more than a few weeks per year, you know what I mean.

In the end, even if you are in the middle of a transition, the most important thing is to know who you are and where you want to go in the next few years. I was re-reading an old, beautiful article by Dan North when, in his blog, I found the best possible definition of where I want to go (back) next:
[...] In particular I found I had moved away from the things I really enjoyed – writing software that matters and building high-performing software teams – more towards big organisational change, which, while it arguably has a bigger impact on an organisation, isn’t really where I wanted to be. So my criteria for what to do next came down to: writing business-critical software in a small, high-performing team, in an organisation that trusts its people and encourages them to excel. Having a great relationship with the consumers of that software and having them closely engaged with its delivery would be a huge plus.
The bold sentence above should go straight into my CV under the section "Career Objectives".

Sunday, January 16, 2011

It's time to apply consumer's design models to Enterprise systems

I have been a happy Apple Mac user for a year now. I had never used a Mac until a year ago, when I decided to make the quantum leap, partly because most of my Alfresco colleagues run a Mac and I had always been curious to try one. My idea before was that Macs were cool but too much of a closed platform; now I think a Mac is simply the best way to get my work done. I do not want to judge some of Apple's very restrictive policies here, as everybody has a different opinion (I think the idea of forcing Objective-C on the iPhone is moving millions of Java developers toward Google Android, for example).

I had my MacBook Pro stolen a few months ago, but I also have an older PC-compatible laptop as a backup: after two days of using it I felt the urge to run to the first Apple store I could find to buy a new Mac. Back home, I opened the box and switched it on, and after a few seconds it recognized that there was a NAS with Time Machine backups: it offered to restore the last backup and, after a few hours, it was as if my previous Mac had never been stolen, with every piece of software in its place, even the same desktop wallpaper and icons.

I never have to run anti-virus software, and I never have to run disk defragmentation or registry maintenance. I do not waste time maintaining hardware drivers: Apple designs both hardware and software to work together, so that the user experience is always the smoothest. I'm also a happy user of Ubuntu Linux, which in my opinion is by far the best desktop Linux distribution. Ubuntu also runs daily on my Mac in a VirtualBox image, when I have to run some Linux software for my work. Although Ubuntu software is (almost) as easy as a Mac's to install, configure and run, Canonical (the company behind Ubuntu) is a software firm only, so they have to run their beautiful O.S. on a multitude of different hardware devices and software device drivers. This is the same problem Windows has. Each computer comes with different internal devices (motherboard, CPU, hard disk, network cards, Wi-Fi cards, graphics card, ...), so there is always some combination of hardware devices and software drivers which could cause a compatibility issue. It is always a race for Canonical, Microsoft and the others to test their software on a plethora of hardware and sometimes broken drivers. So, when we blame Microsoft because Windows is occasionally unstable, we should actually blame the real source, which is usually some sub-component of our PC and almost always a buggy software driver, which is out of the control of Microsoft and the others.

Of course, the above was especially true in the early days of personal computing, when enthusiasts like me used to assemble their PCs by hand. Now, if you buy a computer, especially a laptop, from, say, Dell, HP, IBM, etc., you can be pretty sure all components are pre-tested, so everything will (almost) work just fine. However, nobody designs and produces beautiful hardware and client software like Apple does so, despite all competitors' efforts, my position is that the overall Apple user experience is still far superior, precisely because every single component, together with communication, is designed with integration and the user's experience in mind. It is interaction design at its best.

This positive and viable integration, by contrast, is very hard to find nowadays when moving from consumer computing to Enterprise systems. Let's be honest: distributed enterprise systems are a total mess of hardware and software combinations. The level of incompatibility that we could have in a single consumer device is multiplied by an order of magnitude. The major problems with Enterprise systems implemented, for example, in JEE or .NET are due to issues integrating applications with each other and with external systems. The application layer usually runs on different application servers, which run on different operating systems and usually must connect to different relational databases, external information systems (ERP, CRM, ECM, etc.) and messaging systems. There is an explosive combinatorial matrix of configurations in need of testing and QA. In the end each customer needs to run on their existing infrastructure with few variations, so most of the time spent maintaining enterprise applications is actually spent trying to force software into a specific and unique infrastructure: pitfalls are everywhere. Support centers spend most of their time just chasing issues with specific software + hardware configurations. No wonder some crucial business processes still run today on very old mainframes whose HW + SW are well integrated.

Apple's positive integration model should also work for enterprise-class solutions. In my mind Sun Microsystems was one of the companies that came closest to applying Apple's integrated consumer design model to the enterprise world. Sun designed both hardware and software (Solaris O.S., Java, etc. were tuned to Sun's specific servers and components). Buying a Sun server was usually a positive integrated experience, and running Oracle on Sun servers was a rock-solid decision. The main problem, in my humble opinion, is that Sun was not ready or not willing to go the extra mile: moving the software stack up to the application layer and moving their business out of the commoditized server business. I mean, Sun's management perceived they had to, as the late open-source initiatives testify, but the company was never structured to take this vital step forward and execute it in the right way (McNealy almost acknowledges this), so most of the ideas remained on paper or stuck without execution until Oracle's acquisition was the only way to save the company.

I think strong infrastructural integration alone is no longer enough for a vendor to escape commoditization trends or for a customer to solve the integration madness. Some Cloud Computing initiatives are a clear step in the direction of moving up the stack from Infrastructure as a Service to actual Software as a Service. In the end, moving only the infrastructure into the cloud is not going to solve most of the application-level problems of modern enterprise solutions: if, for example, I run my software on Amazon EC2, I can solve many infrastructure provisioning problems in one go, but at the application layer my J2EE solution still runs on different operating systems and application servers which need to be integrated and tested for compatibility and performance.

I guess that, even if existing platforms are perhaps too young and immature to move all solutions there, the right direction for better isolating applications and application development from infrastructure problems is to provide a complete, uniform application platform where most reliability and scalability problems are solved by the model's definition. I think initiatives like the Google App Engine, VMware's Cloud Application Framework and Salesforce Platform are good examples of what could soon bring Apple's positive consumer-side experience into the enterprise applications ecosystem. Companies should eventually show some perspective, intelligence and courage and start experimenting as soon as possible.

Update (20 Jan 2011): Amazon has just announced the new AWS Elastic Beanstalk, which I think should be included in the above list of advanced PaaS solutions. It looks much like GAE and, at first sight, Amazon's new platform shows an interesting degree of flexibility for developers. The first released version is for Java.

Thursday, December 16, 2010

Do you mean I should work for free?

Ok, so you want to drive a luxurious BMW, because it's charming, fast and reliable. But then you want to pay the same price as for a Fiat Panda, and expect the same maintenance costs and fuel consumption. Are you kidding?

Not really, this is what happens daily in the software industry. Customers want the same level of services, regardless of the price. So they buy a cheaper, open-source solution from an emerging technology shop, but then they ask for a free PoC, like they are used to get from big vendors.

In terms of consulting, it's more or less the same (or even worse). You should have years of experience and be able to manage a team and drive the development of a project from its foundations to the end. But your rates should be the same as those of a kid just out of college.

This policy has almost destroyed the freelance market in Italy, but I can see the symptoms in other countries.

Somebody once wrote: "if you pay peanuts you'll get monkeys". That's valid for both software products and developers.

The Vendor Client Relationship In Real World Situations [VIDEO]

Good luck.

Thursday, October 21, 2010

XA Transactions

Reading the MySQL documentation I found a good description of how XA distributed transactions and two-phase commit work. I'd like to share it because it's short, clear and applies to general situations. It also teaches us why distributed transactions, being much more complex, should be managed very carefully to avoid severe performance penalties.

Applications that use global transactions involve one or more Resource Managers and a Transaction Manager:

A Resource Manager (RM) provides access to transactional resources. A database server is one kind of resource manager. It must be possible to either commit or roll back transactions managed by the RM.

A Transaction Manager (TM) coordinates the transactions that are part of a global transaction. It communicates with the RMs that handle each of these transactions. The individual transactions within a global transaction are “branches” of the global transaction. Global transactions and their branches are identified by a naming scheme described later.

The MySQL implementation of XA enables a MySQL server to act as a Resource Manager that handles XA transactions within a global transaction. A client program that connects to the MySQL server acts as the Transaction Manager.

To carry out a global transaction, it is necessary to know which components are involved, and bring each component to a point when it can be committed or rolled back. Depending on what each component reports about its ability to succeed, they must all commit or roll back as an atomic group. That is, either all components must commit, or all components must roll back. To manage a global transaction, it is necessary to take into account that any component or the connecting network might fail.

The process for executing a global transaction uses two-phase commit (2PC). This takes place after the actions performed by the branches of the global transaction have been executed.

In the first phase, all branches are prepared. That is, they are told by the TM to get ready to commit. Typically, this means each RM that manages a branch records the actions for the branch in stable storage. The branches indicate whether they are able to do this, and these results are used for the second phase.

In the second phase, the TM tells the RMs whether to commit or roll back. If all branches indicated when they were prepared that they will be able to commit, all branches are told to commit. If any branch indicated when it was prepared that it will not be able to commit, all branches are told to roll back.

In some cases, a global transaction might use one-phase commit (1PC). For example, when a Transaction Manager finds that a global transaction consists of only one transactional resource (that is, a single branch), that resource can be told to prepare and commit at the same time.
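The two phases described above can be sketched in a few lines of Java. This is only a toy, in-memory illustration of the protocol and not MySQL's implementation or the JTA/XA API: the Branch interface, RecordingBranch and runGlobalTransaction are hypothetical names, and failure handling (crashes, timeouts, recovery) is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPhaseCommitSketch {

    /** One branch of the global transaction, as seen by the Transaction Manager. */
    interface Branch {
        boolean prepare();   // phase 1: returns true if the branch is able to commit
        void commit();       // phase 2, success path
        void rollback();     // phase 2, failure path
    }

    /** Plays the Transaction Manager role: commits all branches or rolls all of them back. */
    static boolean runGlobalTransaction(List<? extends Branch> branches) {
        // Phase 1: every branch is told to get ready to commit.
        boolean allPrepared = true;
        for (Branch branch : branches) {
            if (!branch.prepare()) {
                allPrepared = false;
            }
        }
        // Phase 2: the same decision is sent to every branch.
        for (Branch branch : branches) {
            if (allPrepared) {
                branch.commit();
            } else {
                branch.rollback();
            }
        }
        return allPrepared;
    }

    /** Test double that records every call it receives in a shared log. */
    static final class RecordingBranch implements Branch {
        private final String name;
        private final boolean canCommit;
        private final List<String> log;

        RecordingBranch(String name, boolean canCommit, List<String> log) {
            this.name = name;
            this.canCommit = canCommit;
            this.log = log;
        }

        @Override
        public boolean prepare() {
            log.add(name + ":prepare");
            return canCommit;
        }

        @Override
        public void commit() {
            log.add(name + ":commit");
        }

        @Override
        public void rollback() {
            log.add(name + ":rollback");
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean committed = runGlobalTransaction(Arrays.asList(
                new RecordingBranch("db1", true, log),
                new RecordingBranch("db2", false, log)));
        System.out.println("Committed: " + committed);
        System.out.println(log);
    }
}
```

Because db2 reports in the first phase that it cannot commit, both branches are rolled back in the second phase, which is the atomic "all or nothing" behaviour the text describes.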

Sunday, October 3, 2010

Mounting Alfresco as a WebDAV Network Folder

I like Alfresco Share's beautiful UI and I like Alfresco's ability to be mounted as a remote SMB/CIFS network drive. However, one of the few shortcomings of Share is that you cannot download multiple files from the Web UI (you can upload multiple files), something you can easily do by mounting Alfresco as an SMB/CIFS drive and dragging and dropping files in and out.

Note: I think one of the coolest features we should add to Share's UI would be the ability to download a selection of folders as a ZIP file in one shot.

CIFS is not enabled by default, so there are Alfresco deployments where you cannot mount it locally on your workstation, for security reasons or because of a lazy sysadmin. However, you always have an option for interacting with Alfresco easily: mount it as a WebDAV network folder, which is easy to do from any operating system and is enabled by default in Alfresco (unless your sysadmin disabled it on purpose).

On the Alfresco Wiki you can find instructions for mounting WebDAV on Windows. Here I want to quickly show you the same thing on Mac OS X with Finder, which is even easier (of course, it's a Mac...).

First of all, Alfresco's WebDAV is available at the address: http(s)://hostname:port/alfresco/webdav/
In my case I'm using a local Alfresco server, within my home network, so it is: http://myalfresco.it:8080/alfresco/webdav
Beware that in most cases Alfresco is configured to be accessible to the external world via HTTPS and not plain HTTP, so write your URL accordingly.

Now open the Finder and press ⌘K to open the server connection dialog, then enter your Alfresco WebDAV URL like this:


After pressing "connect" (sorry, my screenshots are in Italian...) and entering your Alfresco username and password, you'll get this:


Now you can go to your Alfresco User Home (in my case it's mturatti) and start dragging and dropping files into Alfresco, or from Alfresco onto your desktop.

Thursday, September 23, 2010

BPMN 2.0 process modeling on the iPad

Have a look at this blog post and video: the Signavio BPMN modeler can be used from the iPad as well. The modeler works with Activiti, as Signavio donated it to the open source project.