I am a newbie to Flume and Hadoop. We are developing a BI module where we can store all the logs from different servers in HDFS.
For this I am using Flume, and I have just started trying it out. I successfully created a node, but now I want to set up an HTTP source and a sink that will write incoming HTTP requests to a local file.
Any suggestions?
Thanks in advance.
Try this:
curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{"username":"xrqwrqwryzas","password":"12124sfsfsfas123"}]' http://yourdomain.com:81/
It's a bit hard to tell exactly what you want from the way the question is worded, but I'm operating on the assumption that you want to send JSON to Flume using HTTP POST requests and then have Flume dump those JSON events to HDFS (not the local file system). If that's what you want to do, this is what you need to do.
First, make sure you create a directory in HDFS for Flume to send the events to. For example, if you want to send events to /user/flume/events in HDFS, you'll probably have to create that directory (and make it writable by the Flume user) before starting the agent.

Next, configure Flume to use an HTTP Source and an HDFS Sink. Make sure to add the Host and Timestamp interceptors, otherwise your events will cause exceptions in the HDFS Sink, because that sink expects a host and a timestamp in the event headers. Also make sure to expose the port on the server that the Flume HTTPSource is listening on.
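The directory setup mentioned above might look like the following (a sketch; the flume user/group, and whether you need to change ownership at all, are assumptions about your setup):

```
hadoop fs -mkdir -p /user/flume/events
hadoop fs -chown -R flume:flume /user/flume/events
```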
Here's a sample Flume config that works for the Cloudera Quickstart Docker container for CDH-5.7.0
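A minimal sketch of that kind of config (the agent name tier1, port 5140, and the HDFS path pattern are assumptions; adjust them for your setup):

```
tier1.sources  = http-source
tier1.channels = channel1
tier1.sinks    = hdfs-sink

# HTTP source: listens for POSTed JSON events
tier1.sources.http-source.type = http
tier1.sources.http-source.bind = 0.0.0.0
tier1.sources.http-source.port = 5140
tier1.sources.http-source.channels = channel1

# Host and Timestamp interceptors, so the HDFS sink finds the
# headers it expects on every event
tier1.sources.http-source.interceptors = i1 i2
tier1.sources.http-source.interceptors.i1.type = host
tier1.sources.http-source.interceptors.i2.type = timestamp

# In-memory channel
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

# HDFS sink: writes plain-text events under /user/flume/events
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = channel1
tier1.sinks.hdfs-sink.hdfs.path = /user/flume/events/%y-%m-%d
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.writeFormat = Text
```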
It's necessary to create a Flume client that can send the JSON events to the Flume HTTP Source in the format that it expects (this client could be as simple as a curl request). The most important thing about the format is that the "body" key must have a value that is a String. "body" cannot be a JSON object - if it is, the Gson library that the Flume JSONHandler uses to parse the JSON events will throw exceptions, because it won't be able to parse the JSON - it is expecting a String. This is the JSON format you need:
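A concrete sketch (the header values and body text here are made up; the key point is that "body" is one plain string):

```shell
# Write a sample event file in the format the JSONHandler expects:
# a JSON array of events, each with "headers" (a map of strings)
# and "body" (one plain string, NOT a nested JSON object).
cat <<'EOF' > event.json
[{
  "headers": {
    "timestamp": "1465000000000",
    "host": "app-server-01.example.com"
  },
  "body": "the log line goes here, as a single string"
}]
EOF

# Sanity-check that the file is valid JSON before POSTing it to Flume:
python3 -m json.tool < event.json
```

If you need to ship structured data, serialize it yourself and send it as an escaped string inside "body".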
Troubleshooting

If events aren't making it through, check the Flume logs (by default under /var/log/flume-ng/). One problem you may run into is the channel filling up faster than the sink can drain it. To fix this problem, increase tier1.channels.channel1.capacity.

Hopefully this helps you get started. I'm having some problems testing this on my machine and don't have time to fully troubleshoot it right now, but I'll get to that...
Assuming you have Flume up and running right now, this should be what your flume.conf file needs to look like to use an HTTP POST source and local file sink (note: this goes to a local file, not HDFS)
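A sketch of such a flume.conf (the agent name a1, the port, and the sink directory are all assumptions; adjust them for your setup):

```
a1.sources  = http-source
a1.channels = mem-channel
a1.sinks    = file-sink

# HTTP source listening for POSTed JSON events
a1.sources.http-source.type = http
a1.sources.http-source.port = 5140
a1.sources.http-source.channels = mem-channel

# In-memory channel
a1.channels.mem-channel.type = memory

# file_roll sink: writes events to local files, rolling every 30 seconds
a1.sinks.file-sink.type = file_roll
a1.sinks.file-sink.channel = mem-channel
a1.sinks.file-sink.sink.directory = /var/log/flume
a1.sinks.file-sink.sink.rollInterval = 30
```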
Start Flume with the flume-ng agent command, and tweak the config for your needs (port, sink.directory, and rollInterval especially). This is a pretty bare-minimum config file; there are more options available, so check out the Flume User Guide. Now, as far as this goes, the agent starts and runs fine for me...
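The launch command might look like this (the agent name a1 and the config path /etc/flume-ng/conf/flume.conf are both assumptions):

```
flume-ng agent --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/flume.conf \
  --name a1 -Dflume.root.logger=INFO,console
```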
Here's what I don't have time to test. The HTTP source, by default, accepts data in JSON format. You -should- be able to test this agent by sending a cURL request shaped something like this:
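Something along these lines (the host and port are assumptions; the important part is the [{"headers": ..., "body": ...}] envelope):

```
curl -X POST -H 'Content-Type: application/json' \
  -d '[{"headers": {}, "body": "hello from curl"}]' \
  http://localhost:5140
```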
-X sets the request to POST, -H sends headers, -d sends data (valid JSON), and then the host:port. The problem for me is that I get an error in my Flume client complaining about invalid JSON, so something is being sent wrong. The fact that an error is popping up, though, shows that the Flume source is receiving data. Whatever you have that's POSTing should work, as long as it's in a valid format.