Batch processing is widely used in back-end data processing, data roll-up, and offline processing. In part 1 of this series, I will talk about the WebSphere Feature Pack for Modern Batch. In part 2, I will talk about Spring Batch, and in part 3 we will develop our own simple batch processing framework.
If you are using WebSphere Application Server V7, you can install the Feature Pack for Modern Batch at no charge. The feature pack provides a Java batch programming model, along with tools to control and execute batch applications. In typical J2EE applications, an individual request usually completes in seconds, but batch work may run for hours or even days. The Feature Pack for Modern Batch extends WebSphere Application Server to accommodate these resource-intensive, long-running applications.
Major parts of Feature Pack for Modern Batch
Job Scheduler: The job scheduler provides job management functions such as submit, cancel, and restart. It stores the full job history (jobs waiting to run, jobs currently running, and jobs already finished) in a relational database. These management functions are exposed through a web-based management console, a command-line shell, web services, and an EJB interface. The job scheduler can be hosted on a single WAS server or on a cluster.
Batch Container: The batch container provides the execution environment for batch jobs. It uses a relational database to store checkpoint information for transactional batch jobs.
Java EE batch application: Batch applications are standard Java EE applications that are deployed as EAR files and implement either the transactional or the compute-intensive programming model provided by Modern Batch.
xJCL: Batch jobs are described using xJCL, an XML-based job control language. An xJCL file identifies which application to run and the job's inputs and outputs.
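To make the Job Scheduler's role more concrete, here is a minimal in-memory sketch of the job lifecycle it tracks. This is not the Modern Batch API: `JobScheduler`, `JobState`, and every method name below are hypothetical, a `LinkedHashMap` stands in for the relational database that holds job history, and the allowed state transitions are a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical job states mirroring the history the scheduler keeps:
// waiting to run, currently running, finished (plus cancelled).
enum JobState { WAITING, RUNNING, FINISHED, CANCELLED }

// Hypothetical in-memory scheduler; the real Job Scheduler persists
// this information in a relational database instead of a map.
class JobScheduler {
    private final Map<String, JobState> jobs = new LinkedHashMap<>();
    private int nextId = 1;

    // Submit a job; the xJCL text is ignored in this sketch.
    String submit(String xjcl) {
        String id = "job-" + nextId++;
        jobs.put(id, JobState.WAITING);
        return id;
    }

    // A waiting job is handed to a batch container and starts running.
    void dispatch(String id) {
        transition(id, JobState.WAITING, JobState.RUNNING);
    }

    // A running job reports that it has finished.
    void complete(String id) {
        transition(id, JobState.RUNNING, JobState.FINISHED);
    }

    // Cancel any job that has not finished yet.
    void cancel(String id) {
        if (state(id) == JobState.FINISHED) {
            throw new IllegalStateException(id + " already finished");
        }
        jobs.put(id, JobState.CANCELLED);
    }

    // Restart a cancelled job by putting it back in the waiting queue.
    void restart(String id) {
        transition(id, JobState.CANCELLED, JobState.WAITING);
    }

    JobState state(String id) {
        JobState s = jobs.get(id);
        if (s == null) throw new IllegalArgumentException("unknown job " + id);
        return s;
    }

    // Apply a state change only if the job is currently in the expected state.
    private void transition(String id, JobState from, JobState to) {
        JobState s = state(id);
        if (s != from) {
            throw new IllegalStateException(id + " is " + s + ", expected " + from);
        }
        jobs.put(id, to);
    }
}
```

Keeping the allowed transitions in one `transition` helper makes illegal requests, such as restarting a job that is still running, fail loudly instead of silently corrupting the job history.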
Batch Programming Models
The Feature Pack for Modern Batch provides two programming models: transactional and compute-intensive. Applications written to either model are implemented as Java objects and packaged in EAR files for deployment.
A transactional batch job is composed of one or more batch steps, which are processed sequentially. The runtime uses a checkpoint algorithm to decide how often to commit transactions, and the checkpoint algorithm is specified in the job's xJCL file. The Feature Pack provides time-based and record-based checkpoint algorithms, as well as an API for creating your own. Batch jobs return system-defined return codes as well as application-defined result codes, and optional result algorithms can be defined to act on these return codes.
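The idea behind checkpoint algorithms can be sketched as a policy that the step loop consults after each record. The sketch below is an assumption-laden illustration, not the Feature Pack's checkpoint API: `CheckpointPolicy`, `RecordCheckpoint`, `TimeCheckpoint`, and `TransactionalStep` are hypothetical names, and a real commit (plus persisting the checkpoint in the batch container's database) is simulated by recording how many records each commit covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical checkpoint policy: decides when the runtime should commit.
interface CheckpointPolicy {
    boolean shouldCheckpoint(int recordsSinceLastCommit);
    void checkpointTaken(); // called after each commit to reset state
}

// Record-based: commit after every `interval` records.
class RecordCheckpoint implements CheckpointPolicy {
    private final int interval;
    RecordCheckpoint(int interval) { this.interval = interval; }
    public boolean shouldCheckpoint(int records) { return records >= interval; }
    public void checkpointTaken() { }
}

// Time-based: commit once `intervalMillis` have elapsed since the last commit.
// The clock is injected so the policy can be tested without sleeping.
class TimeCheckpoint implements CheckpointPolicy {
    private final long intervalMillis;
    private final LongSupplier clock;
    private long lastCommit;

    TimeCheckpoint(long intervalMillis, LongSupplier clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.lastCommit = clock.getAsLong();
    }
    public boolean shouldCheckpoint(int records) {
        return records > 0 && clock.getAsLong() - lastCommit >= intervalMillis;
    }
    public void checkpointTaken() { lastCommit = clock.getAsLong(); }
}

// Hypothetical step: processes records and commits whenever the policy says so.
class TransactionalStep {
    final List<Integer> commitSizes = new ArrayList<>(); // records per commit

    int run(List<String> records, CheckpointPolicy policy) {
        int pending = 0;
        for (String r : records) {
            process(r);
            pending++;
            if (policy.shouldCheckpoint(pending)) {
                commit(pending);
                policy.checkpointTaken();
                pending = 0;
            }
        }
        if (pending > 0) commit(pending); // commit the remaining tail
        return 0; // 0 means success in this sketch
    }

    private void process(String record) { /* business logic goes here */ }
    private void commit(int n) { commitSizes.add(n); }
}
```

With `new RecordCheckpoint(4)` and ten input records, the step commits after records 4 and 8 and then once more for the final two. Injecting the clock as a `LongSupplier` (for example `System::currentTimeMillis`) keeps the time-based policy testable; a custom algorithm only needs to implement the same two methods.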
Compute-intensive batch jobs are not divided into multiple steps; each job has exactly one job step. These jobs are submitted asynchronously and can run for extended periods of time. A packaged compute-intensive application can contain multiple work objects.
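A compute-intensive job reduces to one long-running unit of work that is started asynchronously and can be asked to stop. The following is a minimal sketch under stated assumptions: `ComputeWork`, `SumWork`, and `ComputeContainer` are hypothetical names rather than the Feature Pack's interfaces, and a plain `java.util.concurrent` thread pool stands in for the threads the batch container would manage.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical work-object contract: one long-running unit of work
// that can be asked to stop early (for example, when the job is cancelled).
interface ComputeWork extends Runnable {
    void release();
}

// Example work object: sums 1..limit, checking a stop flag on each iteration.
class SumWork implements ComputeWork {
    private final long limit;
    private volatile boolean released = false;
    final AtomicLong result = new AtomicLong();

    SumWork(long limit) { this.limit = limit; }

    @Override
    public void run() {
        long sum = 0;
        for (long i = 1; i <= limit && !released; i++) {
            sum += i;
        }
        result.set(sum);
    }

    @Override
    public void release() { released = true; }
}

// Hypothetical container: the job has a single step, so the whole job is one
// submission. submit() returns a Future immediately instead of blocking.
class ComputeContainer {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    Future<?> submit(ComputeWork work) {
        return pool.submit(work);
    }

    void shutdown() { pool.shutdown(); }
}
```

Because `submit` returns right away, the caller is free while each work object runs on its own pooled thread. Calling `release()` from another thread sets the volatile flag, and the loop stops at its next check, leaving a partial result.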