As Hadoop adoption continues to expand, Hadoop jobs are growing in volume and complexity. To optimize their execution, multiple Hadoop jobs can be organized into a single logical unit of work, called a workflow. Apache Oozie is a powerful tool for creating and managing complex workflows of Hadoop jobs. However, these workflows must also be integrated with the rest of the business process flow. With the new job plug-in for Oozie, IBM Workload Automation can schedule, monitor, and control Oozie workflows and Hadoop jobs exactly as it does any other job, extending central management of the company's workload to the Hadoop environment.

Business scenario

Marnie works as a workload scheduler in the IT department of the Weather Office, which collects weather data every minute of every day. The data is processed, and weather forecasts are published on a website every 10 minutes. Emergency alerts are sent by SMS when severe weather events are predicted. To analyze the large volume of raw data, the Weather Office chose Apache Hadoop. To further optimize data processing, application developers used Oozie to organize different Hadoop jobs into a single workflow. With the plug-in for Oozie, Marnie was able to extend workload automation and control to the Oozie workflow. As a result, the Weather Office improved the timeliness and accuracy of its forecast service. In particular, by analyzing the output variables of the Oozie workflow, Marnie was able to optimize the alerting system for severe weather conditions.