Class StepFactory

java.lang.Object
    com.amazonaws.services.elasticmapreduce.util.StepFactory

public class StepFactory extends Object
This class provides helper methods for creating common Elastic MapReduce step types. To use StepFactory, construct it with the appropriate bucket for your region. The official bucket format is "<region>.elasticmapreduce", so us-east-1 would use the bucket "us-east-1.elasticmapreduce".

Example usage: create an interactive Hive job flow with debugging enabled:
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

StepFactory stepFactory = new StepFactory();

StepConfig enableDebugging = new StepConfig()
    .withName("Enable Debugging")
    .withActionOnFailure("TERMINATE_JOB_FLOW")
    .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

StepConfig installHive = new StepConfig()
    .withName("Install Hive")
    .withActionOnFailure("TERMINATE_JOB_FLOW")
    .withHadoopJarStep(stepFactory.newInstallHiveStep());

RunJobFlowRequest request = new RunJobFlowRequest()
    .withName("Hive Interactive")
    .withSteps(enableDebugging, installHive)
    .withLogUri("s3://log-bucket/")
    .withInstances(new JobFlowInstancesConfig()
        .withEc2KeyName("keypair")
        .withHadoopVersion("0.20")
        .withInstanceCount(5)
        .withKeepJobFlowAliveWhenNoSteps(true)
        .withMasterInstanceType("m1.small")
        .withSlaveInstanceType("m1.small"));

RunJobFlowResult result = emr.runJobFlow(request);
-
-
Nested Class Summary
static class StepFactory.HiveVersion
    The available Hive versions.
-
Constructor Summary
StepFactory()
    Creates a new StepFactory using the default Elastic MapReduce bucket (us-east-1.elasticmapreduce) for the default (us-east-1) region.
StepFactory(String bucket)
    Creates a new StepFactory using the specified Amazon S3 bucket to load resources.
-
Method Summary
HadoopJarStepConfig newEnableDebuggingStep()
    When run as the first step in your job flow, enables the Hadoop debugging UI in the AWS Management Console.
HadoopJarStepConfig newInstallHiveStep()
    Step that installs the default version of Hive on your job flow.
HadoopJarStepConfig newInstallHiveStep(StepFactory.HiveVersion... hiveVersions)
    Step that installs the specified versions of Hive on your job flow.
HadoopJarStepConfig newInstallHiveStep(String... hiveVersions)
    Step that installs the specified versions of Hive on your job flow.
HadoopJarStepConfig newInstallPigStep()
    Step that installs the default version of Pig on your job flow.
HadoopJarStepConfig newInstallPigStep(String... pigVersions)
    Step that installs the specified versions of Pig on your job flow.
HadoopJarStepConfig newRunHiveScriptStep(String script, String... args)
    Step that runs a Hive script on your job flow using the default Hive version.
HadoopJarStepConfig newRunHiveScriptStepVersioned(String script, String hiveVersion, String... scriptArgs)
    Step that runs a Hive script on your job flow using the specified Hive version.
HadoopJarStepConfig newRunPigScriptStep(String script, String... scriptArgs)
    Step that runs a Pig script on your job flow using the default Pig version.
HadoopJarStepConfig newRunPigScriptStep(String script, String pigVersion, String... scriptArgs)
    Step that runs a Pig script on your job flow using the specified Pig version.
HadoopJarStepConfig newScriptRunnerStep(String script, String... args)
    Runs a specified script on the master node of your cluster.
-
-
-
Constructor Detail
-
StepFactory
public StepFactory()
Creates a new StepFactory using the default Elastic MapReduce bucket (us-east-1.elasticmapreduce) for the default (us-east-1) region.
-
StepFactory
public StepFactory(String bucket)
Creates a new StepFactory using the specified Amazon S3 bucket to load resources. The official bucket format is "<region>.elasticmapreduce", so if you're using the us-east-1 region, you should use the bucket "us-east-1.elasticmapreduce".
Parameters:
    bucket - The Amazon S3 bucket from which to load resources.
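For example, a minimal sketch of pointing the factory at a non-default region's resource bucket (the eu-west-1 choice here is illustrative, not a requirement):

import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RegionalStepFactory {
    public static void main(String[] args) {
        // Follow the "<region>.elasticmapreduce" convention described above,
        // so steps load their resources from the same region as the cluster.
        StepFactory stepFactory = new StepFactory("eu-west-1.elasticmapreduce");
    }
}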
-
-
Method Detail
-
newScriptRunnerStep
public HadoopJarStepConfig newScriptRunnerStep(String script, String... args)
Runs a specified script on the master node of your cluster.
Parameters:
    script - The script to run.
    args - Arguments that get passed to the script.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
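As a sketch of typical usage, the returned config can be wrapped in a StepConfig and added to a job flow, as in the class-level example; the S3 script path and arguments below are hypothetical:

import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class ScriptRunnerExample {
    public static void main(String[] args) {
        StepFactory stepFactory = new StepFactory();

        // "s3://my-bucket/scripts/prepare-data.sh" is a hypothetical script
        // location; the trailing arguments are passed through to the script
        // when it runs on the master node.
        HadoopJarStepConfig scriptStep = stepFactory.newScriptRunnerStep(
                "s3://my-bucket/scripts/prepare-data.sh",
                "--input", "s3://my-bucket/raw/");

        StepConfig runScript = new StepConfig()
                .withName("Prepare Data")
                .withActionOnFailure("CANCEL_AND_WAIT")
                .withHadoopJarStep(scriptStep);
    }
}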
-
newEnableDebuggingStep
public HadoopJarStepConfig newEnableDebuggingStep()
When run as the first step in your job flow, enables the Hadoop debugging UI in the AWS Management Console.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newInstallHiveStep
public HadoopJarStepConfig newInstallHiveStep(StepFactory.HiveVersion... hiveVersions)
Step that installs the specified versions of Hive on your job flow.
Parameters:
    hiveVersions - The versions of Hive to install.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
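A sketch using the StepFactory.HiveVersion enum rather than raw version strings; Hive_Latest is assumed to be among the enum's constants, so check StepFactory.HiveVersion in your SDK build for the exact values:

import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class InstallHiveExample {
    public static void main(String[] args) {
        StepFactory stepFactory = new StepFactory();

        // Hive_Latest is an assumed constant; consult StepFactory.HiveVersion
        // for the set of versions available in your SDK release.
        HadoopJarStepConfig installHive =
                stepFactory.newInstallHiveStep(StepFactory.HiveVersion.Hive_Latest);
    }
}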
-
newInstallHiveStep
public HadoopJarStepConfig newInstallHiveStep(String... hiveVersions)
Step that installs the specified versions of Hive on your job flow.
Parameters:
    hiveVersions - The versions of Hive to install.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newInstallHiveStep
public HadoopJarStepConfig newInstallHiveStep()
Step that installs the default version of Hive on your job flow. This is 0.4 for Hadoop 0.18 and 0.5 for Hadoop 0.20.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newRunHiveScriptStepVersioned
public HadoopJarStepConfig newRunHiveScriptStepVersioned(String script, String hiveVersion, String... scriptArgs)
Step that runs a Hive script on your job flow using the specified Hive version.
Parameters:
    script - The script to run.
    hiveVersion - The Hive version to use.
    scriptArgs - Arguments that get passed to the script.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
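A sketch pinning a Hive script to a specific installed version; the script location, the "0.5" version string, and the -d variable are all illustrative, and the version must match one installed on the job flow:

import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RunHiveScriptExample {
    public static void main(String[] args) {
        StepFactory stepFactory = new StepFactory();

        // Hypothetical script location; "-d INPUT=..." defines a variable
        // the script can reference, and "0.5" selects the Hive version.
        HadoopJarStepConfig hiveStep = stepFactory.newRunHiveScriptStepVersioned(
                "s3://my-bucket/hive/daily-report.q",
                "0.5",
                "-d", "INPUT=s3://my-bucket/input/");

        StepConfig runReport = new StepConfig()
                .withName("Daily Report")
                .withActionOnFailure("CANCEL_AND_WAIT")
                .withHadoopJarStep(hiveStep);
    }
}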
-
newRunHiveScriptStep
public HadoopJarStepConfig newRunHiveScriptStep(String script, String... args)
Step that runs a Hive script on your job flow using the default Hive version.
Parameters:
    script - The script to run.
    args - Arguments that get passed to the script.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newInstallPigStep
public HadoopJarStepConfig newInstallPigStep()
Step that installs the default version of Pig on your job flow.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newInstallPigStep
public HadoopJarStepConfig newInstallPigStep(String... pigVersions)
Step that installs the specified versions of Pig on your job flow.
Parameters:
    pigVersions - The versions of Pig to install.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
newRunPigScriptStep
public HadoopJarStepConfig newRunPigScriptStep(String script, String pigVersion, String... scriptArgs)
Step that runs a Pig script on your job flow using the specified Pig version.
Parameters:
    script - The script to run.
    pigVersion - The Pig version to use.
    scriptArgs - Arguments that get passed to the script.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
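A sketch that installs a specific Pig version and then runs a script against it; the "0.6" version string, the S3 paths, and the -p parameter are illustrative, so use a version your EMR AMI actually supports:

import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RunPigScriptExample {
    public static void main(String[] args) {
        StepFactory stepFactory = new StepFactory();

        // Install a specific Pig version, then run a script pinned to it.
        // "-p INPUT=..." passes a parameter the Pig script can reference.
        HadoopJarStepConfig installPig = stepFactory.newInstallPigStep("0.6");
        HadoopJarStepConfig pigStep = stepFactory.newRunPigScriptStep(
                "s3://my-bucket/pig/wordcount.pig",
                "0.6",
                "-p", "INPUT=s3://my-bucket/input/");

        StepConfig installStep = new StepConfig()
                .withName("Install Pig")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(installPig);
        StepConfig runStep = new StepConfig()
                .withName("Run Pig Script")
                .withActionOnFailure("CANCEL_AND_WAIT")
                .withHadoopJarStep(pigStep);
    }
}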
-
newRunPigScriptStep
public HadoopJarStepConfig newRunPigScriptStep(String script, String... scriptArgs)
Step that runs a Pig script on your job flow using the default Pig version.
Parameters:
    script - The script to run.
    scriptArgs - Arguments that get passed to the script.
Returns:
    HadoopJarStepConfig that can be passed to your job flow.
-
-