java.lang.Object
com.amazonaws.services.elasticmapreduce.util.StepFactory

public class StepFactory extends Object
This class provides helper methods for creating common Elastic MapReduce step types. To use StepFactory, you should construct it with the appropriate bucket for your region. The official bucket format is "<region>.elasticmapreduce", so us-east-1 would use the bucket "us-east-1.elasticmapreduce".

Example usage: creating an interactive Hive job flow with debugging enabled:

   // Build an EMR client from your AWS credentials.
   AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
   AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

   // The no-arg constructor uses the default us-east-1.elasticmapreduce bucket;
   // pass a "<region>.elasticmapreduce" bucket name for other regions.
   StepFactory stepFactory = new StepFactory();

   // Run debugging first so the Hadoop debugging UI is available in the console.
   StepConfig enableDebugging = new StepConfig()
       .withName("Enable Debugging")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

   // Install the default Hive version for interactive use.
   StepConfig installHive = new StepConfig()
       .withName("Install Hive")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newInstallHiveStep());

   RunJobFlowRequest request = new RunJobFlowRequest()
       .withName("Hive Interactive")
       .withSteps(enableDebugging, installHive)
       .withLogUri("s3://log-bucket/")
       .withInstances(new JobFlowInstancesConfig()
           .withEc2KeyName("keypair")
           .withHadoopVersion("0.20")
           .withInstanceCount(5)
           // Keep the cluster alive after the steps finish so it can be used interactively.
           .withKeepJobFlowAliveWhenNoSteps(true)
           .withMasterInstanceType("m1.small")
           .withSlaveInstanceType("m1.small"));

   RunJobFlowResult result = emr.runJobFlow(request);
 
  • Constructor Details

    • StepFactory

      public StepFactory()
      Creates a new StepFactory using the default Elastic MapReduce bucket, us-east-1.elasticmapreduce, for the us-east-1 region.
    • StepFactory

      public StepFactory(String bucket)
      Creates a new StepFactory using the specified Amazon S3 bucket to load resources.

      The official bucket format is "<region>.elasticmapreduce", so if you're using the us-east-1 region, you should use the bucket "us-east-1.elasticmapreduce".

      Parameters:
      bucket - The Amazon S3 bucket from which to load resources.
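
      For example, a job flow running in a region other than us-east-1 can point the factory at that region's bucket (us-west-2 is used here purely as an illustration):

        StepFactory stepFactory = new StepFactory("us-west-2.elasticmapreduce");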
  • Method Details

    • newScriptRunnerStep

      public HadoopJarStepConfig newScriptRunnerStep(String script, String... args)
      Runs a specified script on the master node of your cluster.
      Parameters:
      script - The script to run.
      args - Arguments that get passed to the script.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
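
      A minimal sketch of wrapping this step in a StepConfig; the S3 script path and argument are hypothetical placeholders:

        // Script location and argument below are illustrative only.
        StepConfig runScript = new StepConfig()
            .withName("Run Setup Script")
            .withActionOnFailure("CANCEL_AND_WAIT")
            .withHadoopJarStep(stepFactory.newScriptRunnerStep(
                "s3://my-bucket/scripts/setup.sh", "--verbose"));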
    • newEnableDebuggingStep

      public HadoopJarStepConfig newEnableDebuggingStep()
      When run as the first step in your job flow, this step enables the Hadoop debugging UI in the AWS Management Console.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
    • newInstallHiveStep

      public HadoopJarStepConfig newInstallHiveStep(StepFactory.HiveVersion... hiveVersions)
      Step that installs the specified versions of Hive on your job flow.
      Parameters:
      hiveVersions - the versions of Hive to install
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
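
      A short sketch using the enum overload, assuming StepFactory.HiveVersion exposes constants named after the supported versions (Hive_0_5 here is such an assumption):

        // Hive_0_5 is assumed to be one of the HiveVersion enum constants.
        StepConfig installHive = new StepConfig()
            .withName("Install Hive 0.5")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newInstallHiveStep(
                StepFactory.HiveVersion.Hive_0_5));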
    • newInstallHiveStep

      public HadoopJarStepConfig newInstallHiveStep(String... hiveVersions)
      Step that installs the specified versions of Hive on your job flow.
      Parameters:
      hiveVersions - the versions of Hive to install
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
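
      The String overload takes version numbers directly; the versions shown are illustrative:

        // Installs two Hive versions side by side (version strings are examples).
        HadoopJarStepConfig installHives = stepFactory.newInstallHiveStep("0.5", "0.7.1");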
    • newInstallHiveStep

      public HadoopJarStepConfig newInstallHiveStep()
      Step that installs the default version of Hive on your job flow. This is 0.4 for Hadoop 0.18 and 0.5 for Hadoop 0.20.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
    • newRunHiveScriptStepVersioned

      public HadoopJarStepConfig newRunHiveScriptStepVersioned(String script, String hiveVersion, String... scriptArgs)
      Step that runs a Hive script on your job flow using the specified Hive version.
      Parameters:
      script - The script to run.
      hiveVersion - The Hive version to use.
      scriptArgs - Arguments that get passed to the script.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
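
      A sketch of running a versioned Hive script; the S3 paths, version string, and the -d variable are hypothetical:

        // Paths, version, and the INPUT variable are illustrative placeholders.
        StepConfig runHiveScript = new StepConfig()
            .withName("Run Hive Script")
            .withActionOnFailure("CANCEL_AND_WAIT")
            .withHadoopJarStep(stepFactory.newRunHiveScriptStepVersioned(
                "s3://my-bucket/scripts/query.q",
                "0.7.1",
                "-d", "INPUT=s3://my-bucket/input/"));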
    • newRunHiveScriptStep

      public HadoopJarStepConfig newRunHiveScriptStep(String script, String... args)
      Step that runs a Hive script on your job flow using the default Hive version.
      Parameters:
      script - The script to run.
      args - Arguments that get passed to the script.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
    • newInstallPigStep

      public HadoopJarStepConfig newInstallPigStep()
      Step that installs the default version of Pig on your job flow.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
    • newInstallPigStep

      public HadoopJarStepConfig newInstallPigStep(String... pigVersions)
      Step that installs Pig on your job flow.
      Parameters:
      pigVersions - the versions of Pig to install.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
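
      For example (the version string is illustrative):

        HadoopJarStepConfig installPig = stepFactory.newInstallPigStep("0.3");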
    • newRunPigScriptStep

      public HadoopJarStepConfig newRunPigScriptStep(String script, String pigVersion, String... scriptArgs)
      Step that runs a Pig script on your job flow using the specified Pig version.
      Parameters:
      script - The script to run.
      pigVersion - The Pig version to use.
      scriptArgs - Arguments that get passed to the script.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
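
      A sketch of a versioned Pig script step; the script path, version, and -p parameter are hypothetical:

        // All S3 paths and values below are illustrative placeholders.
        StepConfig runPigScript = new StepConfig()
            .withName("Run Pig Script")
            .withActionOnFailure("CANCEL_AND_WAIT")
            .withHadoopJarStep(stepFactory.newRunPigScriptStep(
                "s3://my-bucket/scripts/report.pig",
                "0.3",
                "-p", "INPUT=s3://my-bucket/input/"));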
    • newRunPigScriptStep

      public HadoopJarStepConfig newRunPigScriptStep(String script, String... scriptArgs)
      Step that runs a Pig script on your job flow using the default Pig version.
      Parameters:
      script - The script to run.
      scriptArgs - Arguments that get passed to the script.
      Returns:
      HadoopJarStepConfig that can be passed to your job flow.
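
      Putting the Pig helpers together: a minimal sketch (the bucket and script names are placeholders) that installs the default Pig version and then runs a script, reusing the RunJobFlowRequest from the class-level example above:

        StepConfig installPig = new StepConfig()
            .withName("Install Pig")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newInstallPigStep());

        StepConfig runPig = new StepConfig()
            .withName("Run Pig Script")
            .withActionOnFailure("CANCEL_AND_WAIT")
            .withHadoopJarStep(stepFactory.newRunPigScriptStep(
                "s3://my-bucket/scripts/report.pig"));

        // Adds both steps to the request built in the class-level example.
        request.withSteps(installPig, runPig);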