How Do You Connect to Your Apache Spark Deployment in AWS?

Problem scenario
You have recently deployed Apache Spark to AWS.  You can see that the EC2 instances were created, but you cannot reach them through the web UI (not even on ports 4040, 8088, or 50070), and you cannot connect to them with PuTTY.  You have already changed your normal Security Group to allow TCP traffic from your workstation's IP address.  What should you do to connect to your new Spark instance for the first time?

Solution
The root cause is that the Apache Spark deployment creates its own security groups behind the scenes.  The new instances use those groups, so changing your normal Security Group has no effect on them.  Go to Security Groups and look for the new security groups.  The "Group Name" will be something like "ElasticMapReduce-master" or "ElasticMapReduce-slave".  The "Description" will be something like "Master group for Elastic MapReduce..." or "Slave group for Elastic MapReduce..."  Edit the inbound rules of these security groups to allow connections from your workstation's IP address.  You should now be able to connect to your Spark instance.
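
If you would rather script the change than click through the console, the following is a minimal sketch using boto3, not a definitive implementation. It assumes boto3 is installed, that AWS credentials and a region are already configured, and that the groups carry exactly the names shown above. The function names, the default port list, and the example address 203.0.113.10 are hypothetical placeholders chosen for illustration.

```python
import ipaddress

# Group names as described above; adjust if your deployment names them differently.
EMR_GROUP_NAMES = ("ElasticMapReduce-master", "ElasticMapReduce-slave")


def build_ingress_permissions(workstation_ip, ports=(22, 8088, 50070)):
    """Build one inbound TCP rule per port, limited to a single workstation IP."""
    # ipaddress raises ValueError if the string is not a valid IPv4 address.
    cidr = f"{ipaddress.IPv4Address(workstation_ip)}/32"
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": "workstation access"}],
        }
        for port in ports
    ]


def open_emr_groups(workstation_ip, ports=(22, 8088, 50070), region_name=None):
    """Add the inbound rules to every security group matching EMR_GROUP_NAMES."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_security_groups(
        Filters=[{"Name": "group-name", "Values": list(EMR_GROUP_NAMES)}]
    )
    for group in response["SecurityGroups"]:
        ec2.authorize_security_group_ingress(
            GroupId=group["GroupId"],
            IpPermissions=build_ingress_permissions(workstation_ip, ports),
        )
        print(f"Opened ports {list(ports)} on {group['GroupName']} ({group['GroupId']})")
```

Calling open_emr_groups("203.0.113.10") would then add rules for SSH (port 22) and the two web UI ports to both groups. Restricting each rule to a /32 keeps access limited to the one workstation rather than opening the ports to the internet. Note that AWS rejects a rule that already exists, so running the script twice with the same arguments will raise an error on the second run.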
