Thursday, November 24, 2011

Starting with CMIS and Maven

This post aims to be a short how-to for setting up a CMIS development environment based on Maven and Apache Chemistry, specifically the OpenCMIS Java API, part of the Chemistry project.

I won't cover Maven installation and configuration here; I assume you have Maven 2 or 3 up and running. With Maven you'll be independent of any specific IDE, so you can manage your development cycle from the command line alone.

Glossary
  • CMIS (Content Management Interoperability Services) =>"is a specification for improving interoperability between Enterprise Content Management systems. OASIS, a web standards consortium, approved CMIS as an OASIS Specification on May 1, 2010. CMIS provides a common data model covering typed files, folders with generic properties that can be set or read. In addition there may be an access control system, and a checkout and version control facility, and the ability to define generic relations. There is a set of generic services for modifying and querying the data model, and several protocol bindings for these services, including SOAP and Representational State Transfer (REST), using the Atom convention. The model is based on common architectures of document management systems."
  • Apache Chemistry => "Apache Chemistry provides open source implementations of the Content Management Interoperability Services (CMIS) specification."
  • OpenCMIS => "Apache Chemistry OpenCMIS is a collection of Java libraries, frameworks and tools around the CMIS specification. The goal of OpenCMIS is to make CMIS simple for Java client and server developers. It hides the binding details and provides APIs and SPIs on different abstraction levels. It also includes test tools for content repository developers and client application developers."
  • Apache Maven => "Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information."
Ingredients
  1. A simple text editor or any decent Java IDE 
  2. Maven 2 or 3
  3. A CMIS server for real-world testing
In my case I'm using IntelliJ IDEA, which is excellent. I'm an old NetBeans guy, and both IDEs offer first-class Maven integration, but it happens that I'm taking a look at IntelliJ these days.

To cover point #3 I have selected the reference CMIS server implementation so far, which is Alfresco. OpenCMIS offers a basic CMIS server implementation for self-contained unit tests, but for end-to-end integration testing I prefer to link to a real ECM system.

You can download the latest Alfresco Community Edition for free from here. At present the brand new 4.0 is available.

Setup

Note: I won't cover Alfresco's installation and configuration here because it's not in the scope of this post. You can already find plenty of excellent online resources for that.

Just open a shell, change into a folder of your choice and run the following Maven command to create a very basic Java project through the quickstart archetype:

mvn archetype:generate -DgroupId=com.myapps \
                       -DartifactId=my-first-cmis \
                       -Dversion=1.0-SNAPSHOT \
                       -DarchetypeArtifactId=maven-archetype-quickstart \
                       -DinteractiveMode=false

You'll end up with the usual project structure:

project
|-- pom.xml
`-- src
    |-- main
    |   `-- java
    |       `-- App.java
    `-- test
        `-- java
            `-- AppTest.java

The pom.xml file is the center of the Maven universe. We need to edit it, adding a few lines of XML, so that we can build against the OpenCMIS libraries.
Here is the default pom.xml created by the archetype:

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myapps</groupId>
  <artifactId>my-first-cmis</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-first-cmis</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Now we need to add the following snippet to the dependencies section of pom.xml to activate the OpenCMIS libraries:

<dependency>
   <groupId>org.apache.chemistry.opencmis</groupId>
   <artifactId>chemistry-opencmis-client-impl</artifactId>
   <version>0.6.0</version>
</dependency>

At present the latest stable OpenCMIS release is 0.6.0; you can update the pom.xml file accordingly whenever a new version is released.

This is the final POM file:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myapps</groupId>
  <artifactId>my-first-cmis</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-first-cmis</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.apache.chemistry.opencmis</groupId>
      <artifactId>chemistry-opencmis-client-impl</artifactId>
      <version>0.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Now that your development environment is ready and you can build both from the shell and the IDE, you can start exploring some examples.

Issuing a mvn clean compile command in your shell will start the build.
If it's the first time you run Maven it will download many dependencies; don't worry and be patient, as all subsequent runs will be much faster.
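As a first smoke test, a minimal client that opens a CMIS session against the local Alfresco server and prints the repository name could look like the following sketch. The AtomPub URL and the admin/admin credentials are assumptions that depend on your Alfresco version and setup (Alfresco 3.x exposes the AtomPub binding under /alfresco/service/cmis):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.Repository;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.enums.BindingType;

public class App {
    public static void main(String[] args) {
        SessionFactory factory = SessionFactoryImpl.newInstance();

        // connection parameters: URL and credentials depend on your installation
        Map<String, String> parameters = new HashMap<String, String>();
        parameters.put(SessionParameter.USER, "admin");
        parameters.put(SessionParameter.PASSWORD, "admin");
        parameters.put(SessionParameter.ATOMPUB_URL,
                "http://localhost:8080/alfresco/service/cmis");
        parameters.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());

        // pick the first repository exposed by the server and open a session
        Repository repository = factory.getRepositories(parameters).get(0);
        Session session = repository.createSession();

        System.out.println("Connected to: " + session.getRepositoryInfo().getName());
    }
}
```

Drop this into src/main/java in place of the generated App.java, run mvn clean compile, and you can verify both the Maven build and the connection to the server in one go.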


Friday, March 4, 2011

Purge Alfresco archived nodes

I was looking for a way to automatically purge the Alfresco trashcan and, after a while, I think I came up with what looks like a decent solution.

DISCLAIMER: The procedure described in this article has not been tested intensively and comes without any implied warranty of fitness for a particular purpose. You should check the code, test it and decide yourself if fits your needs, saving all your data before any experiment.

The problem
After some time, deleted content can fill Alfresco's trashcan, and removing nodes manually through the UI can be impractical (users always forget about this). Alfresco does not actually delete content, but moves deleted nodes into the archive store, which acts like a trashcan. Deleted content can stay there forever, until users decide to clean up the trashcan. In a big repository this can lead to a huge waste of resources.

I need a service I can invoke programmatically to empty the trashcan, for example by scheduling a task with an external job. I'd rather not deploy into Alfresco a scheduled task controlled by the embedded Quartz; I think it's cleaner to move the scheduling outside and always deploy into Alfresco the bare minimum.

Even after the trashcan has been emptied, nodes are only marked as "orphans", moved into alf_data/contentstore.deleted and physically removed later by the asynchronous contentStoreCleaner task. So Alfresco has a safety net to avoid accidental deletions at all costs.


Cleaning-up archived nodes
I have developed a simple Java-backed Web Script for Alfresco 3.4 (it should work with Alfresco 3.2+) which can be invoked to clean up the archived nodes. Below are its major components:

purge.get.desc.xml
Web Script descriptor

<webscript>
  <shortname>Purge all</shortname>
  <description>Purge all archived nodes</description>
  <url>/purge</url>
  <authentication>user</authentication>
  <transaction>none</transaction>
</webscript>

purge.get.html.ftl
Freemarker template
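The template body was not reproduced in the post; a minimal sketch, assuming the Java controller puts an elapsed entry into the model (consistent with the response page shown later), could be as simple as:

```
Purged all archived nodes. Elapsed time: ${elapsed} ms.
```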

purge-context.xml
Spring bean's configuration

I created the it/alfresco/utils folder hierarchy under /Company Home/Data Dictionary/Web Scripts Extensions, and placed both purge.get.desc.xml and purge.get.html.ftl there.

The Spring context file purge-context.xml goes under /tomcat/shared/classes/alfresco/extension in the main alfresco installation folder.

Our bean makes use of nodeArchiveService.
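A sketch of purge-context.xml could look like the following; the bean id follows the webscript.<package>.<script>.<method> naming convention that Alfresco uses to match Java-backed Web Scripts to their descriptors, while the PurgeWebScript class name is my assumption:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Java-backed Web Script bean, matched to purge.get.desc.xml by its id -->
  <bean id="webscript.it.alfresco.utils.purge.get"
        class="it.alfresco.utils.PurgeWebScript"
        parent="webscript">
    <!-- inject Alfresco's out-of-the-box archive service -->
    <property name="nodeArchiveService" ref="nodeArchiveService"/>
  </bean>
</beans>
```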

Here's the Java Code:


The Purge project under Netbeans 6.9.1
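Since the code appears above only as an IDE screenshot, here is a minimal sketch of the Java-backed Web Script controller; the package and class names are my assumption, and the imports target the Spring Surf web scripts framework shipped with Alfresco 3.4:

```java
package it.alfresco.utils;

import java.util.HashMap;
import java.util.Map;

import org.alfresco.repo.node.archive.NodeArchiveService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.springframework.extensions.webscripts.Cache;
import org.springframework.extensions.webscripts.DeclarativeWebScript;
import org.springframework.extensions.webscripts.Status;
import org.springframework.extensions.webscripts.WebScriptRequest;

public class PurgeWebScript extends DeclarativeWebScript {

    private NodeArchiveService nodeArchiveService;

    // setter injection, wired up in purge-context.xml
    public void setNodeArchiveService(NodeArchiveService nodeArchiveService) {
        this.nodeArchiveService = nodeArchiveService;
    }

    @Override
    protected Map<String, Object> executeImpl(WebScriptRequest req,
                                              Status status, Cache cache) {
        long start = System.currentTimeMillis();

        // purge everything that was archived from the main workspace store
        this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);

        // expose the elapsed time to the Freemarker template
        Map<String, Object> model = new HashMap<String, Object>();
        model.put("elapsed", System.currentTimeMillis() - start);
        return model;
    }
}
```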


The bean is injected with the nodeArchiveService and calls its purgeAllArchivedNodes method.

The single most important line of code is:
this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);

We are passing the STORE_REF_WORKSPACE_SPACESSTORE constant, which is "the store that the items originally came from", as per JavaDocs:

purgeAllArchivedNodes

void purgeAllArchivedNodes(org.alfresco.service.cmr.repository.StoreRef originalStoreRef)
Permanently delete all archived nodes.
Parameters:
originalStoreRef - the store that the items originally came from
Calling the WebScript
After starting Alfresco, to get a list of the available Web Scripts and check whether this new one has been installed correctly, point your browser to http://localhost:8080/alfresco/service/index and click the "Browse all Web Scripts" link. Remember to authenticate as admin, so that the Web Script can be run with administrator privileges.

The "Purge" Web Script should be the first one

To invoke its execution and clean-up the trashcan you can call:
http://localhost:8080/alfresco/service/purge

If everything went fine you should see the following response page:
Alfresco Community Edition v3.4.0 (c 3335) :
Purged all archived nodes. Elapsed time: 438 ms.
Then verify all users' trashcans are now empty:


As we now have our RESTful purge Web Script in place, it's easy to call it from an external script, perhaps scheduled via a cron job for a periodical clean-up. Alternatively, it's possible to use the Quartz engine embedded in Alfresco, but my personal preference is to avoid giving Alfresco too many responsibilities: if you need to change the scheduling, an external scheduler is easier to maintain.
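For example, a crontab entry invoking the Web Script with curl every night could look like this (the schedule, credentials and log path are only illustrative):

```
# purge the Alfresco trashcan every night at 02:30, authenticating
# as admin via HTTP basic auth, and log the HTML response
30 2 * * * curl -s -u admin:admin "http://localhost:8080/alfresco/service/purge" >> /var/log/alfresco-purge.log 2>&1
```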

Thursday, February 10, 2011

My Career Path

At least twice a year I find myself thinking about my career path; I know it's a masochistic habit I can't prevent. I have changed companies many times in the last few years and have spent a lot of time traveling and consulting abroad, almost to the point of complete exhaustion. This year I'm settling down a little, and this helps with thinking clearly and planning the next move.

However, you don't have many options if you live in a mid-technical environment like Italy, where my natural inclination toward freelance consulting is not easily sustainable. On the other hand, taking an airplane every Monday morning and sleeping in hotels five days out of seven is no longer at the top of my wish list... I realized that after suddenly waking up in the middle of the night without knowing which city, or even which country, I was in: if you have done consulting for more than a few weeks per year, you know what I mean.

In the end, even if one is in the middle of a transition, the most important thing is to know who you are and where you want to go in the next few years. I was re-reading an old, beautiful article by Dan North when, in his blog, I found the best possible definition of where I want to go (back to) next:
[...] In particular I found I had moved away from the things I really enjoyed – writing software that matters and building high-performing software teams – more towards big organisational change, which, while it arguably has a bigger impact on an organisation, isn’t really where I wanted to be. So my criteria for what to do next came down to: writing business-critical software in a small, high-performing team, in an organisation that trusts its people and encourages them to excel. Having a great relationship with the consumers of that software and having them closely engaged with its delivery would be a huge plus.
The above bold sentence should fly into my CV under the section "Career Objectives".

Sunday, January 16, 2011

It's time to apply consumer's design models to Enterprise systems

I've been a happy Apple Mac user for a year now. I had never used a Mac before last year, when I decided to make the quantum leap, partly because most of my Alfresco colleagues run a Mac and I had always been curious to try one. Before, my idea was that Macs were cool but too much of a closed platform; now I think Macs are simply the best way to get my work done. I don't want to judge some of Apple's very restrictive policies here, as everybody has a different opinion (I think the idea of forcing Objective-C on the iPhone is moving millions of Java developers toward Google Android, for example).

I had my MacBook Pro stolen a few months ago. I also have an older PC-compatible laptop as a backup, but after two days using it I felt the urge to run to the nearest Apple store and buy a new Mac. Back home, I opened the box and switched it on; after a few seconds it recognized that there was a NAS with Time Machine backups and offered to restore the latest one. After a few hours it was as if my previous Mac had never been stolen: every piece of software was in its place, even the same desktop wallpaper and icons.

I never have to run anti-virus software, disk defragmentation or registry maintenance, and I don't waste time maintaining hardware drivers: Apple designs both the hardware and the software to work together, so the user experience is always the smoothest. I'm also a happy user of Ubuntu Linux, which in my opinion is by far the best desktop Linux distribution. Ubuntu also runs daily on my Mac in a VirtualBox image, for when I have to run some Linux software for work. Although Ubuntu is (almost) as easy as a Mac to install, configure and run, Canonical - the company behind Ubuntu - is a software firm only, so it has to run its beautiful OS on a multitude of different hardware devices and software drivers. This is the same problem Windows has. Each computer comes with different internal components (motherboard, CPU, hard disk, network cards, WiFi cards, graphics card, ...), so there is always some combination of hardware and drivers that can cause a compatibility issue. It is a never-ending race for Canonical, Microsoft and the others to prove their software on a plethora of hardware and sometimes broken drivers. So, when we blame Microsoft because Windows can be occasionally unstable, we should actually blame the real source, which is usually some sub-component of our PC and almost always a buggy software driver, outside the control of Microsoft and the others.

Of course, the above was especially true in the early days of personal computing, when enthusiasts like me used to assemble their PCs by hand. Now, if you buy a laptop from, say, Dell, HP or IBM, you can be pretty sure all the components are pre-tested, so everything will (almost) work just fine. However, nobody designs and produces beautiful hardware and client software like Apple does; despite all the competitors' efforts, my position is that the overall Apple user experience is still by far superior, exactly because every single component, communication included, is designed with integration and user experience in mind. It is interaction design at its best.

This positive, viable integration is, by contrast, very hard to find when moving from consumer computing to today's enterprise systems. Let's be honest: distributed enterprise systems are a total mess of hardware and software combinations. The level of incompatibility we can have in a single consumer device is multiplied by an order of magnitude. The major problems with enterprise systems implemented, for example, in JEE or .NET are due to inter-application and external-system integration issues. The application layer usually runs on different application servers, which run on different operating systems and usually must connect to different relational databases, external information systems (ERP, CRM, ECM, etc.) and messaging systems. There is an explosive combinatorial matrix of configurations in need of testing and QA. In the end each customer needs to run on the existing infrastructure with few variations, so most of the time spent maintaining enterprise applications is actually spent trying to force software into a specific and unique infrastructure: pitfalls are everywhere. Support centers spend most of their time just chasing specific software-plus-hardware configuration issues. No wonder some crucial business processes still run today on very old but well-integrated mainframes.

Apple's positive integration model should work for enterprise-class solutions too. In my mind Sun Microsystems was one of the companies that came closest to applying Apple's consumer-oriented integrated design model to the enterprise world. Sun designed both hardware and software (the Solaris OS, Java, etc. were tuned to specific Sun servers and components). Buying a Sun server was usually a positive integrated experience; running Oracle on Sun servers was a rock-solid decision. The main problem, in my humble opinion, is that Sun was not ready or not willing to go the extra mile: moving the software stack up to the application layer and moving its business out of the commoditized server market. I mean, Sun's management perceived they had to, as the late open-source initiatives testify, but the company was never structured to execute this vital step forward in the right way (McNealy almost acknowledges this), so most of the ideas remained on paper or stuck without execution, until Oracle's acquisition was the only way to save the company.

I think strong infrastructural integration is no longer enough for a vendor to escape commoditization trends, or for a customer to solve this integration madness. Some Cloud Computing initiatives are a clear step in the direction of moving the stack up from Infrastructure as a Service to actual Software as a Service. In the end, moving only the infrastructure into the cloud is not going to solve most of the application-level problems of modern enterprise solutions: if, for example, I run my software on Amazon EC2, I solve many infrastructural provisioning problems in one stroke, but at the application layer my J2EE solution is still running on different operating systems and application servers which need to be integrated and tested for compatibility and performance.

I guess, even if the existing platforms are perhaps too young and immature to move all solutions there, the right direction for better isolating applications and application development from infrastructural problems will be to provide a complete, uniform application platform where most reliability and scalability problems are solved by the model's definition. I think initiatives like Google App Engine, VMware's Cloud Application Framework and the Salesforce Platform are good examples of what could soon bring Apple's positive consumer-side experience to the enterprise application ecosystem. Companies should eventually show some perspective, intelligence and courage, and start experimenting as soon as possible.

Update (20 Jan 2011): Amazon has just announced the new AWS Elastic Beanstalk, which I think should be included in the above list of advanced PaaS solutions. It looks much like GAE, and at first sight Amazon's new platform shows an interesting degree of flexibility for developers. The first released version is for Java.