I currently have one Elastic Beanstalk instance running a Java application deployed to Tomcat. I deploy the application using the web interface, but the application uses a data file (a Lucene index) referenced in web.xml, which I copy to the underlying EC2 instance by ssh-ing to it and fetching the data file from my S3 bucket.
So far so good.
But if I changed my EB setup to an autoscalable environment, so that it automatically creates new instances as required, these new EC2 instances would not have the data file. How do I deal with this?
- Can I preconfigure each EC2 instance with the data file before it is actually used?
- Can I have a shared filesystem that each server can refer to (the data files are read-only)?
* Update *
I think I've worked out the answer in principle. I was uploading my application from my local machine and then adding the large data files later from Amazon. What I need to do is build my WAR on my data-processing EC2 instance, add the data file to the WAR somewhere, then put this WAR onto S3; then, when I create my EB environment, I need to load the WAR from the S3 bucket.
So I just need to work out where the data file should go in the WAR and how to add it via the Maven build process.
* Update 2 *
Actually it's not clear that the data files should go in the WAR file after all. I cannot see where to put them, and the application expects them to be real files, so if they were contained within the WAR and the WAR was not expanded/unjarred (I don't know what EB does), the application would not work anyway.
* Update 3 *
I could certainly put the data in S3 (in fact it will probably be there to start with). So I wonder if, on server initialization, I could fetch the S3 data, put it somewhere, and then use it? Guidance please.
* Update 4 *
So using the S3 idea I nearly have it working: within the servlet init() method I fetch the compressed file, save it to the current working directory (/usr/share/tomcat7/), and then uncompress it. Trouble is, the compressed file is 2.7 GB and the folder it uncompresses to is 5 GB, while the micro instance used by EB offers 8 GB, of which 2 GB is used. So I have 6 GB, which is enough space for the uncompressed data, but not enough to save the compressed file and then uncompress it, because I need 2.7 GB + 5 GB during the uncompression process.
I loaded the compressed version to S3 because the original data is not a single file but a folder full of files, which would be difficult to manage as a list of files. I cannot change the size of the root dir in EB. I could try changing to a more powerful instance, but that would be unnecessarily expensive, and it is not clear what disk space is provided with the instances used by EB. Any ideas?
These were the dependencies I added to my Maven pom:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.8.2</version>
</dependency>
<dependency>
    <groupId>org.rauschig</groupId>
    <artifactId>jarchivelib</artifactId>
    <version>0.6.0</version>
</dependency>
And this is the code:
@Override
public void init()
{
    try
    {
        log.severe("Retrieving Indexes from S3");
        // Keys redacted; see the answer below for a better way to supply credentials
        AWSCredentials credentials = new BasicAWSCredentials("***********", "***********");
        AmazonS3Client ac = new AmazonS3Client(credentials);

        // Fetch a small test object first to verify S3 connectivity
        log.severe("datalength-testfile:" + ac.getObjectMetadata("widget", "test.txt").getContentLength());
        File testFile = new File("test.txt");
        ac.getObject(new GetObjectRequest("widget", "test.txt"), testFile);
        log.severe("datalength-testfile:retrieved");

        // Fetch the real index archive (2.7 GB compressed)
        log.severe("datalength-largefile:" + ac.getObjectMetadata("widget", "indexes.tar.gz").getContentLength());
        File largeFile = new File("indexes.tar.gz");
        ac.getObject(new GetObjectRequest("widget", "indexes.tar.gz"), largeFile);
        log.severe("datalength-largefile:retrieved");
        log.severe("Retrieved Indexes from S3");

        // Extract the archive into the index directory referenced in web.xml
        log.severe("Unzipping Indexes");
        File indexDirFile = new File(indexDir).getAbsoluteFile();
        indexDirFile.mkdirs();
        Archiver archiver = ArchiverFactory.createArchiver(largeFile);
        archiver.extract(largeFile, indexDirFile);
        log.severe("Unzipped Indexes");
    }
    catch (Exception e)
    {
        log.log(Level.SEVERE, e.getMessage(), e);
    }
}
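(Note: in principle the 2.7 GB + 5 GB problem above could be avoided by streaming the archive straight out of S3 through a gzip/tar decoder, so the compressed copy never touches disk. A rough, untested sketch, assuming org.apache.commons:commons-compress were added as a dependency; the S3StreamingExtractor class is hypothetical:)

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.S3Object;

public class S3StreamingExtractor
{
    public static void extract(AmazonS3Client s3, String bucket, String key, File destDir) throws Exception
    {
        S3Object object = s3.getObject(bucket, key);
        // Decompress and untar directly from the S3 download stream:
        // only the extracted files ever consume disk space
        try (InputStream raw = object.getObjectContent();
             TarArchiveInputStream tar = new TarArchiveInputStream(
                     new GZIPInputStream(new BufferedInputStream(raw))))
        {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null)
            {
                File target = new File(destDir, entry.getName());
                if (entry.isDirectory())
                {
                    target.mkdirs();
                    continue;
                }
                target.getParentFile().mkdirs();
                // Copy the current tar entry to its destination file
                try (FileOutputStream out = new FileOutputStream(target))
                {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = tar.read(buffer)) != -1)
                    {
                        out.write(buffer, 0, read);
                    }
                }
            }
        }
    }
}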
* Update 5 *
Having realized the micro EC2 instance only provides 0.6 GB, not 6 GB, I needed to move to a larger machine anyway, and that provided two disks, so I could copy the compressed file to one disk and then uncompress it to the root disk successfully. So, ready to go.
* Update 6 *
EB does not respect the init() method, so in an autoscaled EB configuration it starts up other EC2 instances, believing the first one to be overloaded when in fact it is just getting ready. And I suspect that if it starts new ones when genuinely busy, the load balancer will start feeding requests to these instances before they are ready, causing failed requests.
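(A possible workaround, sketched below under assumptions: expose a health-check servlet that returns 503 until the index has been loaded, and point the environment's health check URL at it, so the load balancer only routes traffic once the init() work is done. The READY flag and the servlet itself are illustrations, not part of my app:)

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HealthCheckServlet extends HttpServlet
{
    // Set to true by the index-loading code once extraction has finished
    public static final AtomicBoolean READY = new AtomicBoolean(false);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException
    {
        if (READY.get())
        {
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.getWriter().write("OK");
        }
        else
        {
            // 503 keeps the instance out of the load balancer rotation
            resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}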
* Update 7 *
Since EB doesn't respect init(), instead of trying to get the indexes from S3 within the init() method I just put the indexes directly into the WAR file under WEB-INF/classes and pointed the parameter in my web.xml there. Although they are not actually classes, this does not cause a problem for Tomcat, and I have tested deployment against a local Tomcat installation without problem.
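(For reference, a sketch of how such a web.xml parameter could be resolved to a real filesystem path once Tomcat has expanded the WAR; the indexDir parameter name is an assumption based on the code above:)

// Sketch: resolve the web.xml location to a real file once the WAR is expanded
private File resolveIndexDir() throws ServletException
{
    // e.g. <context-param>
    //        <param-name>indexDir</param-name>
    //        <param-value>/WEB-INF/classes/indexes</param-value>
    //      </context-param>
    String relativePath = getServletContext().getInitParameter("indexDir");

    // getRealPath returns null if the container did not expand the WAR,
    // which is exactly the "application expects real files" concern above
    String absolutePath = getServletContext().getRealPath(relativePath);
    if (absolutePath == null)
    {
        throw new ServletException("WAR was not expanded; index files are not real files");
    }
    return new File(absolutePath);
}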
Unfortunately, having uploaded this larger WAR file containing the indexes to S3, the attempt to deploy it to EB from the S3 location fails with:
Could not launch environment: Source bundle is empty or exceeds maximum allowed size: 524288000.
Why has Amazon imposed this arbitrary limit? (524288000 bytes is exactly 500 MB.)
* Update 8 *
So the possible options are:
- ebextensions
- Docker deployment
- Create a custom Amazon image (AMI) for use with EB
The third option seems very hacky; I'm not at all keen on that, or very keen on the others really.
* Update 9 *
I got it working with ebextensions in the end. It wasn't too bad; I document it here in case it is useful.
If using Maven, create the folder src/main/ebextensions. Add the following to pom.xml (so that the ebextensions content goes into the right place in the final WAR):
<plugin>
    <artifactId>maven-war-plugin</artifactId>
    <configuration>
        <webResources>
            <resource>
                <directory>src/main/ebextensions</directory>
                <targetPath>.ebextensions</targetPath>
                <filtering>true</filtering>
            </resource>
        </webResources>
    </configuration>
</plugin>
Create a .config file in the ebextensions folder (I called mine copyindex.cfg); mine had this information:
commands:
  01_install_cli:
    command: wget https://s3.amazonaws.com/aws-cli/awscli-bundle.zip; unzip awscli-bundle.zip; ./awscli-bundle/install -b ~/bin/aws
  02_get_index:
    command: aws s3 cp --region eu-west-1 s3://jthink/release_index.tar.gz /dev/shm/release_index.tar.gz; cd /usr/share/tomcat7; tar -xvf /dev/shm/release_index.tar.gz
Go to the IAM console (https://console.aws.amazon.com/iam/home?#home) and attach the Power User policy to the Elastic Beanstalk role user.
Deploy your application.
There are multiple ways of achieving this. You do not need to ssh to the instance and copy your files.
I would recommend the approach in your "Update 3".
You can configure your Elastic Beanstalk environment to execute commands before deploying the application. You can do this using ebextensions. Read the documentation on commands here.
Essentially you create a folder named .ebextensions in your app source. This folder can contain one or more files with a .config extension; these files are processed in lexicographical order of their name. You can execute shell commands using ebextensions, for example commands like the ones shown in Update 9 above.
You will need to install the AWS CLI on your EC2 instances first. This can again be done with a command similar to the above; instructions on how to install the AWS CLI using the bundled installer are available here. You can run more than one command: the commands within a config file are executed in lexicographical order, so you can name your commands 01_install_awscli, 02_download_index, etc.
Now if you plan to use the AWS CLI on the EC2 instance, you will also need credentials. Most likely you are using an IAM instance profile (if not, read about it here). You can give your instance profile permission to access your S3 object using IAM; that way your instances will have an IAM instance profile associated with them and will be able to download the file from S3. Alternatively, you can also directly get the ACCESS_KEY_ID and SECRET_KEY using environment properties, as shown here.
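A minimal sketch of the instance-profile approach with the Java SDK used above (the S3ClientFactory class name is just for illustration):

import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;

public class S3ClientFactory
{
    public static AmazonS3Client create()
    {
        // With an IAM instance profile attached, the SDK resolves temporary
        // credentials from the EC2 instance metadata service, so no access
        // keys need to be hard-coded in the servlet
        return new AmazonS3Client(new InstanceProfileCredentialsProvider());
    }
}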
All new instances that come up should execute the commands in your ebextensions. Thus your instances can be preconfigured with the software that you want.