Saturday, August 25, 2007

java.lang.OutOfMemoryError: unable to create new native thread

Symptoms


Some days ago a flow running in JCAPS 5.1 produced this exception:
java.lang.OutOfMemoryError: unable to create new native thread

The first attempt, especially if you are used to ICAN 5.0, would be to add memory to the JVM's heap with the -Xmx flag. But forget for a while about the misleading "OutOfMemoryError" and focus on the rest of the message: it clearly says that the JVM asked the O.S. to create a native thread, and that was not possible. It does not mean you don't have enough heap. In fact, the mentioned flow was already running in a Logical Host with 1024 MB of memory and there was no sign that it was not enough.

Diagnosis


Depending on your operating system and JVM version, you can have a pretty different per-thread stack size which affects both the maximum number of native threads you can start and the overall consumed memory. See the Java HotSpot VM Options:

Thread Stack Size (in Kbytes). (0 means use default stack size)
- Sparc: 512
- Solaris x86: 320 (was 256 prior in 5.0 and earlier)
- Sparc 64 bit: 1024
- Linux amd64: 1024 (was 0 in 5.0 and earlier)
- all others: 0

When your application tries to start too many threads you may need to decrease the default stack assigned to each thread using the -Xss parameter, so that each single thread gets less stack but you can create more of them. On some operating systems this is not enough: you must also decrease the O.S. stack size using the "ulimit -s" command.
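The same trade-off can also be seen from Java code: the four-argument Thread constructor accepts a stackSize hint (some JVMs honor it, others silently ignore it, so -Xss remains the reliable knob). A minimal standalone sketch, unrelated to JCAPS:

```java
public class SmallStackThreads {

    // Starts 'count' threads, each with an explicitly requested stack size,
    // and waits for them all to finish. Returns the number started.
    static int startAndJoin(int count, long stackSize) throws InterruptedException {
        Thread[] threads = new Thread[count];
        Runnable task = new Runnable() {
            public void run() { /* shallow call depth: a small stack is enough */ }
        };
        for (int i = 0; i < count; i++) {
            // The stackSize argument is only a hint: the JVM may round it up
            // or ignore it entirely, depending on platform.
            threads[i] = new Thread(null, task, "worker-" + i, stackSize);
            threads[i].start();
        }
        for (int i = 0; i < count; i++) {
            threads[i].join();
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        // Request 64 KB per thread instead of the platform default (e.g. 512 KB).
        System.out.println("started " + startAndJoin(50, 64 * 1024) + " threads");
    }
}
```

With smaller per-thread stacks the same amount of address space accommodates more threads, which is exactly what the -Xss flag achieves process-wide.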


Currently, some stack sizes are:

             ThreadSS  VMThreadSS  CompilerThreadSS  default_stack_size

SPARC 32     512K      0           C2:2048K C1:0     not used
SPARC 64     1024K     0           C2:2048K C1:0     not used
Solaris i486 256K      0           C2:2048K C1:0     not used
Linux i486   0         0           0                 512K
Linux ia64   0         0           0                 1024K
Win32 i486   0         0           0                 ASSERTs:1024K 0
Win32 ia64   0         0           0                 ASSERTs:1024K 0

Notes:
1) 0 for VMThreadSS and CompilerThreadSS implies use of the ThreadSS value
2) 0 for ThreadSS implies use of default_stack_size
Generally speaking, you should start testing your flows with a small JVM heap; the JCAPS default of 512 MB is normally enough. Then increase this value step by step, and only if your processes allocate big data structures that actually require more heap. A good step size could be 256 MB. It is never a brilliant idea to set your domain's heap to, say, 1.5 GB by default just because this seems to let you sleep well: in addition to the thread problem mentioned above, a bigger heap leads to longer and more complex garbage collection cycles, penalizing performance in the medium term. It is also a good idea to set the same value for -Xms and -Xmx to help the GC.
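As a quick sanity check that your -Xms/-Xmx settings actually reached the JVM, you can log the heap the runtime reports. A minimal standalone sketch, not JCAPS-specific:

```java
public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory() reflects -Xmx; totalMemory() is the heap currently
        // committed (equal to -Xms right after startup).
        System.out.println("max heap (-Xmx): " + rt.maxMemory() / mb + " MB");
        System.out.println("committed heap : " + rt.totalMemory() / mb + " MB");
    }
}
```

If the two numbers match at startup, -Xms and -Xmx were set to the same value, as suggested above.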

Tuesday, August 14, 2007

Java CAPS: Processing Large XML Payloads Using a SAX Parser

Introduction


In Java CAPS the standard way to deal with XML files is to parse them through Object Type Definitions (OTDs). An OTD represents an XML file as a Java object: it provides marshal and unmarshal methods, plus setters and getters for each element of the XML document.
The OTD is a smart way to create a DOM tree in memory, starting from the XML source document. However, if the XML file is large, loading it entirely into memory through a DOM representation is generally not a great idea. In this case it is common to use a SAX parser, which allows you to process the XML file as a stream instead of loading the entire document into memory. SAX parsing is of course easy to implement in Java CAPS, as this article will briefly show.
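Stripped of the Java CAPS machinery, the plain JAXP version of the idea looks like this: a minimal standalone sketch, with made-up element names, where the parser streams the input and fires one callback per element instead of building a tree.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Counts the elements of an XML document without ever materializing it:
    // memory use stays roughly constant however large the document is.
    public static int countElements(String xml) throws Exception {
        final int[] elements = { 0 };
        DefaultHandler handler = new DefaultHandler() {
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                elements[0]++; // fired once per element, no tree is built
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return elements[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("elements: "
                + countElements("<order><item/><item/><item/></order>")); // → elements: 4
    }
}
```

The JCD shown later in this article applies exactly this pattern, with the input stream coming from a BatchLocalFile instead of a byte array.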

Implementation


The implementation is straightforward: it is just plain Java code. In this example the eGate flow is triggered by an event in the form of a JMS message containing the filename. As the XML file we'd like to process is presumably large (otherwise why bother with SAX...) it probably resides on some filesystem, so in this case a BatchLocalFile (part of the optional Batch eWay) can be used to read it. You are not doing such a stupid thing as sending multi-megabyte payloads through your JMS server, are you? As a general rule of thumb, it is wise to keep your JMS payloads below 1 MB, to avoid overloading your JMS server. As already explained in other posts, I think moving bigger payloads through JMS is a clear indicator of some flaw in your process design and, sooner or later, it will lead to trouble.

Connectivity Map


Below is the simple CM for this example:

The queIn channel receives triggering events for the svcSaxParser service, which makes use of a BatchLocalFile external application to read the file from disk. The JCD, as described below, is really trivial and logs some elements using the standard logger.

Java Collaboration Definition


The SAX parsing service is implemented through a JCD called jcdSaxParser. It receives the input JMS message containing the filename, opens an InputStream from disk and assigns it to the SAX parser. A SAX DefaultHandler inner class, called (with some lack of imagination...) MyHandler, is defined and used to intercept SAX events:

package SamplesprjSAXJCD;

import java.io.InputStream;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

public class jcdSaxParser
{
    public com.stc.codegen.logger.Logger logger;
    public com.stc.codegen.alerter.Alerter alerter;
    public com.stc.codegen.util.CollaborationContext collabContext;
    public com.stc.codegen.util.TypeConverter typeConverter;

    public void receive( com.stc.connectors.jms.Message input, com.stc.eways.batchext.BatchLocal BatchLocalFile_1 )
        throws Throwable
    {
        try {
            BatchLocalFile_1.getConfiguration().setTargetDirectoryName( "D:\\Projects" );
            BatchLocalFile_1.getConfiguration().setTargetFileName( input.getTextMessage() );
            InputStream istream = BatchLocalFile_1.getClient().getInputStreamAdapter().requestInputStream();
            // Create a handler to handle SAX events
            DefaultHandler handler = new MyHandler( logger );
            // Parse the stream
            parseXmlStream( istream, handler, false );
            BatchLocalFile_1.getClient().getInputStreamAdapter().releaseInputStream( true );
        } catch ( Exception ex ) {
            logger.error( "@@@ ", ex );
        }
    }

    // Parses an XML stream using a SAX parser.
    public static void parseXmlStream( InputStream istream, DefaultHandler handler, boolean validating )
        throws Exception
    {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating( validating );
        factory.newSAXParser().parse( istream, handler );
    }

    // DefaultHandler contains no-op implementations for all SAX events.
    // This class overrides only the methods for the events of interest.
    static class MyHandler extends DefaultHandler
    {
        private final com.stc.codegen.logger.Logger _logger;
        private final StringBuffer _buff = new StringBuffer( 1024 );

        public MyHandler( com.stc.codegen.logger.Logger logger )
        {
            _logger = logger;
        }

        public void startElement( String uri, String localName, String qName, Attributes attributes )
            throws SAXException
        {
            _buff.append( "startElement: uri=" ).append( uri ).append( ", localName=" ).append( localName ).append( ", qName=" ).append( qName ).append( "\n" );
        }

        public void characters( char[] cbuf, int start, int len )
            throws SAXException
        {
            _buff.append( "Characters: " ).append( new String( cbuf, start, len ) );
        }

        public void endElement( String uri, String localName, String qName )
            throws SAXException
        {
            if ( _buff.length() > 0 ) {
                _logger.info( "@@@ " + _buff.toString() );
                _buff.delete( 0, _buff.length() );
            }
        }
    }
}

After creating a proper Deployment Profile you can run this flow by sending a JMS message containing the filename to the queIn queue (you can use the eManager for that). Then you just need to add some more useful functionality to the MyHandler class.

The source stream is obtained from the InputStreamAdapter of the BatchLocalFile:
InputStream istream = BatchLocalFile_1.getClient().getInputStreamAdapter().requestInputStream();
Then the parsing is done by passing both the InputStream and the handler to the SAXParser's parse method:
factory.newSAXParser().parse( istream, handler );

Conclusions


If you are struggling with 100 MB XML files and, using OTDs, you get plenty of OutOfMemory errors, you could try to implement a SAX parsing process as described in this article. But before implementing this technique, ask yourself why on earth you are producing such big XML files, and then try to fix your data model or your process, because to me this is using XML the wrong way.
A typical case where dealing with large XML files may be unavoidable is HL7 v3.0 XML messaging: the specs define huge XML Schemas for that standard, and it may even be impossible to generate an OTD for them with the eDesigner.

Friday, August 10, 2007

The CAP Theorem

In this InfoQ video presentation Amazon's CTO Dr. Werner Vogels discusses availability and consistency for distributed systems. The central topic is the "CAP theorem", which Dr. Vogels introduces starting from this question:

What goals might you want from a shared-data system?

- Strong Consistency: all clients see the same view, even in presence of updates
- High Availability: all clients can find some replica of the data, even in the presence of failures
- Partition-tolerance: the system properties hold even when the system is partitioned

The theorem states that you can have at most two of the three CAP properties at the same time. The first property, Consistency, is what ACID systems provide, usually implemented through the two-phase commit protocol (XA transactions).

In his presentation Dr. Vogels explains why big shops like Amazon and Google, handling an incredibly huge number of transactions and amounts of data, always need some kind of system partitioning. Amazon must also provide high availability: for example, a customer must always have access to the shopping cart, because it obviously means that the customer is committing to buy something. Since for Amazon the Partition-tolerance and Availability properties are fixed, they need to sacrifice Consistency. This means they prefer to compensate for or reconcile inconsistencies rather than sacrifice high availability, because their primary need is to scale well and allow for a smooth user experience.

This IMHO leads to some easy conclusions: most legacy application servers and relational database systems are built with consistency as their primary goal, while big shops really need high availability. That's why firms like Google or Amazon have developed their own application infrastructure, and that's why, as Dr. Vogels' presentation explains well, a two-phase commit protocol is never an appropriate choice when you have big scalability needs. On this subject you can also read this article by Gregor Hohpe: Your Coffee Shop Does Not Use Two-Phase Commit

To scale up, what you really need are asynchronous, stateless services, together with a good reconciliation and compensation mechanism in case of errors. Second, your data model has a dramatic impact on performance: that's why Amazon has implemented a simple put/get API instead of running complex database queries, and why much of Google's performance comes from the MapReduce algorithm. Simplicity rules.