Class StreamingStep


  • public class StreamingStep
    extends Object
    Helper class that makes it easy to define Hadoop Streaming steps and convert them to a HadoopJarStepConfig.

    See also: Hadoop Streaming

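     import com.amazonaws.auth.AWSCredentials;
     import com.amazonaws.auth.BasicAWSCredentials;
     import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
     import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
     import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
     import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
     import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
     import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
     import com.amazonaws.services.elasticmapreduce.model.StepConfig;
     import com.amazonaws.services.elasticmapreduce.util.StreamingStep;

     // accessKey and secretKey are assumed to hold your AWS credentials.
     AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
     AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

     // Describe the streaming step: input and output paths, mapper and reducer.
     HadoopJarStepConfig config = new StreamingStep()
         .withInputs("s3://elasticmapreduce/samples/wordcount/input")
         .withOutput("s3://my-bucket/output/")
         .withMapper("s3://elasticmapreduce/samples/wordcount/wordSplitter.py")
         .withReducer("aggregate")
         .toHadoopJarStepConfig();

     // Wrap the step configuration in a named step.
     StepConfig wordCount = new StepConfig()
         .withName("Word Count")
         .withActionOnFailure("TERMINATE_JOB_FLOW")
         .withHadoopJarStep(config);

     // Define the job flow that runs the step, including its instances.
     RunJobFlowRequest request = new RunJobFlowRequest()
         .withName("Word Count")
         .withSteps(wordCount)
         .withLogUri("s3://log-bucket/")
         .withInstances(new JobFlowInstancesConfig()
             .withEc2KeyName("keypair")
             .withHadoopVersion("0.20")
             .withInstanceCount(5)
             .withKeepJobFlowAliveWhenNoSteps(true)
             .withMasterInstanceType("m1.small")
             .withSlaveInstanceType("m1.small"));

     // Submit the job flow.
     RunJobFlowResult result = emr.runJobFlow(request);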
     
    • Constructor Detail

      • StreamingStep

        public StreamingStep()
        Creates a new default StreamingStep.
    • Method Detail

      • getInputs

        public List<String> getInputs()
        Get the list of step input paths.
        Returns:
        List of step input paths.
      • setInputs

        public void setInputs(Collection<String> inputs)
        Set the list of step input paths.
        Parameters:
        inputs - Collection of step input paths.
      • withInputs

        public StreamingStep withInputs(String... inputs)
        Add more input paths to this step.
        Parameters:
        inputs - One or more input paths to add to this step.
        Returns:
        A reference to this updated object so that method calls can be chained together.
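
The chaining described for withInputs can be illustrated with a small stand-alone sketch. InputsSketch is a hypothetical class written for this example, not the SDK's StreamingStep; it assumes, based on the "add more input paths" wording above, that each call appends to the paths already set and returns the same object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not the SDK's StreamingStep.
class InputsSketch {
    private final List<String> inputs = new ArrayList<>();

    // Appends the given paths to those already set and returns this for chaining.
    InputsSketch withInputs(String... newInputs) {
        inputs.addAll(Arrays.asList(newInputs));
        return this;
    }

    List<String> getInputs() {
        return inputs;
    }

    public static void main(String[] args) {
        // The bucket and paths below are placeholders.
        InputsSketch step = new InputsSketch()
            .withInputs("s3://my-bucket/input-a")
            .withInputs("s3://my-bucket/input-b", "s3://my-bucket/input-c");
        System.out.println(step.getInputs());
    }
}
```

Under that assumption, calling withInputs twice leaves all three paths in the list, in the order they were added.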
      • getOutput

        public String getOutput()
        Get the output path for this step.
        Returns:
        Output path.
      • setOutput

        public void setOutput(String output)
        Set the output path for this step.
        Parameters:
        output - Output path.
      • withOutput

        public StreamingStep withOutput(String output)
        Set the output path for this step.
        Parameters:
        output - Output path.
        Returns:
        A reference to this updated object so that method calls can be chained together.
      • getMapper

        public String getMapper()
        Get the mapper for this step.
        Returns:
        The mapper, for example the S3 location of a mapper script.
      • setMapper

        public void setMapper(String mapper)
        Set the mapper for this step.
        Parameters:
        mapper - The mapper, for example the S3 location of a mapper script.
      • withMapper

        public StreamingStep withMapper(String mapper)
        Set the mapper for this step.
        Parameters:
        mapper - The mapper, for example the S3 location of a mapper script.
        Returns:
        A reference to this updated object so that method calls can be chained together.
      • getReducer

        public String getReducer()
        Get the reducer for this step.
        Returns:
        The reducer.
      • setReducer

        public void setReducer(String reducer)
        Set the reducer for this step.
        Parameters:
        reducer - The reducer.
      • withReducer

        public StreamingStep withReducer(String reducer)
        Set the reducer for this step.
        Parameters:
        reducer - The reducer.
        Returns:
        A reference to this updated object so that method calls can be chained together.
      • getHadoopConfig

        public Map<String,String> getHadoopConfig()
        Get the Hadoop config overrides (-D values).
        Returns:
        Hadoop config.
      • setHadoopConfig

        public void setHadoopConfig(Map<String,String> hadoopConfig)
        Set the Hadoop config overrides (-D values).
        Parameters:
        hadoopConfig - Hadoop config.
      • withHadoopConfig

        public StreamingStep withHadoopConfig(String key,
                                              String value)
        Add a Hadoop config override (-D value).
        Parameters:
        key - Hadoop configuration key.
        value - Configuration value.
        Returns:
        A reference to this updated object so that method calls can be chained together.
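
To show what a "-D value" looks like, the sketch below turns a map of overrides into Hadoop generic options of the form -D key=value. HadoopConfigSketch and toArgs are hypothetical names used only here, and whether StreamingStep emits its arguments in exactly this shape is an assumption; mapred.reduce.tasks is used as a sample Hadoop property.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the SDK.
class HadoopConfigSketch {
    // Emits one "-D", "key=value" pair per override, in map iteration order.
    static List<String> toArgs(Map<String, String> hadoopConfig) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : hadoopConfig.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the overrides in insertion order.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("mapred.reduce.tasks", "2");
        System.out.println(toArgs(overrides));
    }
}
```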
      • toHadoopJarStepConfig

        public HadoopJarStepConfig toHadoopJarStepConfig()
        Creates the final HadoopJarStepConfig once you are done configuring the step. You can use this as you would any other HadoopJarStepConfig.
        Returns:
        HadoopJarStepConfig representing this streaming step.
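
As a rough picture of what the resulting step carries, the sketch below assembles the standard Hadoop Streaming command-line options (-input, -output, -mapper, -reducer) from the same four settings. StreamingArgsSketch and buildArgs are hypothetical names, and the exact argument list that toHadoopJarStepConfig produces is not stated in this page, so treat the ordering here as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the SDK's implementation.
class StreamingArgsSketch {
    // Builds Hadoop Streaming options: one -input per path, then output, mapper, reducer.
    static List<String> buildArgs(List<String> inputs, String output,
                                  String mapper, String reducer) {
        List<String> args = new ArrayList<>();
        for (String input : inputs) {
            args.add("-input");
            args.add(input);
        }
        args.add("-output");
        args.add(output);
        args.add("-mapper");
        args.add(mapper);
        args.add("-reducer");
        args.add(reducer);
        return args;
    }

    public static void main(String[] args) {
        // Values taken from the word count example at the top of this page.
        List<String> streamingArgs = buildArgs(
            Arrays.asList("s3://elasticmapreduce/samples/wordcount/input"),
            "s3://my-bucket/output/",
            "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
            "aggregate");
        System.out.println(streamingArgs);
    }
}
```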