Case Study: AP Log Collector
Hadoop is a widely used framework for cloud computing environments and for Big Data processing.
In this case study, we look at how to collect real-world wireless AP traffic records and store them in Hadoop HDFS.
With this system in place, we can run further big data analysis on these real-world records.
Without real-world, good-quality data, there is no effective big data processing or analysis!
Introduction
Our DDWRT wireless AP is based on the TP-Link WR941ND v2/v3. It has only 4 MB of flash storage and 30 MB of memory. It also lacks powerful tools such as “Rflow”, which offers comprehensive traffic detail, and “curl”, which can post data.
To simplify the tasks of each machine, we need to prepare some servers to handle the jobs.
We consider the “Enterprise Integration Application” scenario for general purposes.
Limitation
We consider three stages for collecting the traffic data:
1st: The wireless AP sends its traffic records to the 2nd-stage server.
2nd: This server collects the records and calls Avro-RPC on the 3rd-stage server.
3rd: This server puts the data into its HDFS and returns a response to the 2nd-stage server.
Optionally, the 2nd-stage server can keep logs in order to trace failed events.
Limitation (cont.)
The Process for Collecting Traffic Data
Use Hadoop HDFS to collect DDWRT Wireless Traffic Data
DDWRT Wireless Traffic Data --[GET] HTTP Access--> JSP Server for Transmission Data with Avro --Apache Avro RPC--> Hadoop HDFS Server
The Details of the Methods
Crontab
* * * * * root [ ! -f /tmp/postdata.sh ] && wget http://your_domain/postdata.sh -O /tmp/postdata.sh && chmod +x /tmp/postdata.sh
* * * * * root [ ! -f /tmp/wrtbwmon ] && wget http://your_domain/wrtbwmon -O /tmp/wrtbwmon && chmod +x /tmp/wrtbwmon
* * * * * root /tmp/wrtbwmon setup br0
*/30 0-3 * * * root /tmp/wrtbwmon update /tmp/usage.db peak
10,40 0-3 * * * root /tmp/postdata.sh
*/30,59 4-8 * * * root /tmp/wrtbwmon update /tmp/usage.db offpeak
10,40 4-8 * * * root /tmp/postdata.sh
*/30 9-23 * * * root /tmp/wrtbwmon update /tmp/usage.db peak
10,40 9-23 * * * root /tmp/postdata.sh
Wireless AP: DDWRT Settings
Post Data Script
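#!/bin/sh
# Flatten the usage database into a single query-string value:
# newlines become "$" (record separator) and spaces become "_".
data=`tr "\\n" "$" < /tmp/usage.db | tr " " "_"`
# Send the records to the collector JSP and discard the response body.
wget "http://your_domain/ddwrt_collector.jsp?data=${data}" -O /dev/null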
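To make the wire format concrete, here is a minimal Java sketch of what the script's two tr calls do to the file contents, and how a receiver can split the result back into records. The class name PayloadCodec and the sample records are hypothetical. The sample assumes each line of usage.db carries the six comma-separated fields that the JSP code reads (MAC address, four counters, timestamp).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the script's two `tr` calls; not part of the original system.
class PayloadCodec {
    // Same effect as: tr "\n" "$" | tr " " "_"
    static String encode(String fileContent) {
        return fileContent.replace('\n', '$').replace(' ', '_');
    }

    // Split the payload back into records; empty pieces (e.g. after the trailing "$") are skipped.
    static List<String> decode(String payload) {
        List<String> records = new ArrayList<String>();
        for (String piece : payload.split("\\$")) {
            if (!piece.isEmpty()) {
                records.add(piece);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed six-field layout.
        String file = "00:11:22:33:44:55,10,20,30,40,01-01-2014 10:30\n"
                    + "66:77:88:99:aa:bb,1,2,3,4,01-01-2014 10:31\n";
        String payload = encode(file);
        System.out.println(payload);
        for (String record : decode(payload)) {
            System.out.println(record);
        }
    }
}
```

Note that the underscore stays in the timestamp after decoding, so the receiver stores that field exactly as it was sent.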
Wireless AP: DDWRT Settings (cont.)
Put the libraries that Apache Tomcat needs into <CATALINA>/libs:
jackson-core-asl-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
avro-1.7.5.jar
avro-ipc-1.7.5.jar
netty-3.4.0.Final.jar
slf4j-api-1.7.5.jar
Put your compiled Avro-RPC classes into <CATALINA>/libs as well.
Receive the [GET] data from the DDWRT AP.
Call Avro-RPC to put the data.
Wait for the data to be put into Hadoop HDFS…
JSP Server
Do NOT include avro-tools.jar
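<%@ page import="java.io.IOException, java.net.InetSocketAddress, java.util.StringTokenizer" %>
<%@ page import="org.apache.avro.ipc.NettyTransceiver, org.apache.avro.ipc.specific.SpecificRequestor, org.apache.avro.util.Utf8" %>
<body>
<%
  // NubLookup and Usage are the classes generated from your Avro protocol;
  // import their package here as well.
  String mes = "";
  try {
    String inputData = request.getParameter("data");
    if (inputData != null) {
      // Records are separated by "$" (see the post data script).
      StringTokenizer line = new StringTokenizer(inputData, "$");
      // Open one connection for the whole request instead of one per record.
      NettyTransceiver client = new NettyTransceiver(
          new InetSocketAddress(<your_host>, <your_port>));
      try {
        NubLookup proxy = (NubLookup) SpecificRequestor.getClient(NubLookup.class, client);
        while (line.hasMoreTokens()) {
          String[] items = line.nextToken().split(",");
          Usage req = new Usage();
          req.mac_addr = new Utf8(items[0]);
          req.upload_peak_on = Integer.parseInt(items[1]);
          req.download_peak_on = Integer.parseInt(items[2]);
          req.upload_peak_off = Integer.parseInt(items[3]);
          req.download_peak_off = Integer.parseInt(items[4]);
          req.time = new Utf8(items[5]);
          mes += "Result:" + proxy.send(req).is_ok + "<br />";
        }
      } finally {
        // Release the connection even if a record fails.
        client.close();
      }
    }
  } catch (IOException e) {
    mes = e.toString();
  } catch (RuntimeException e) {
    // Malformed records (missing fields, non-numeric counters) end up here.
    mes = e.toString();
  }
%>
<h1><%=mes %></h1>
</body>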
JSP Server: procedure code
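The JSP code reads six comma-separated fields from each record. The following is a minimal sketch of checking a record before building the RPC request, so that one malformed record can be skipped instead of aborting the whole batch. UsageRecord is a hypothetical stand-in for the Avro-generated Usage class (so the sketch runs without Avro), and RecordParser is likewise an illustrative name, not part of the original system.

```java
// Hypothetical stand-in for the Avro-generated Usage class.
class UsageRecord {
    String macAddr;
    int uploadPeakOn;
    int downloadPeakOn;
    int uploadPeakOff;
    int downloadPeakOff;
    String time;
}

// Illustrative parser for one "mac,up_on,down_on,up_off,down_off,time" record.
class RecordParser {
    // Returns the parsed record, or null when the field count or a counter is invalid.
    static UsageRecord parse(String record) {
        String[] items = record.split(",");
        if (items.length != 6) {
            return null;
        }
        try {
            UsageRecord r = new UsageRecord();
            r.macAddr = items[0];
            r.uploadPeakOn = Integer.parseInt(items[1]);
            r.downloadPeakOn = Integer.parseInt(items[2]);
            r.uploadPeakOff = Integer.parseInt(items[3]);
            r.downloadPeakOff = Integer.parseInt(items[4]);
            r.time = items[5];
            return r;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up sample record.
        UsageRecord ok = parse("00:11:22:33:44:55,10,20,30,40,01-01-2014_10:30");
        System.out.println(ok.macAddr + " uploaded " + ok.uploadPeakOn);
        System.out.println(parse("broken,record") == null);
    }
}
```

In the JSP loop, a null result could be reported in the response message and the loop could continue with the next token.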
Avro Serializable RPC call
Wait for the Avro RPC call.
When an RPC call arrives, write the data carried in its parameters into HDFS.
You need the following libraries (generated source code):
org.apache.avro.data.*
org.apache.avro.generic.*
org.apache.avro.ipc.*
…
Hadoop HDFS Server
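import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.InetSocketAddress;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.avro.AvroRemoteException;
import org.apache.avro.ipc.NettyServer;
import org.apache.avro.ipc.specific.SpecificResponder;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// NubLookup, Usage and Response are the classes generated from the Avro protocol.

public class RPC
{
  private static NettyServer server;

  // A mock implementation
  public static class NubLookupImpl implements NubLookup {
    public Response send(Usage request) throws AvroRemoteException {
      Response r = new Response();
      try {
        // Name each file after the time the record arrived.
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-S");
        Path pt = new Path("ddwrt_log/" + sdf.format(new Date()));
        Configuration conf = new Configuration();
        conf.addResource(new Path("core-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        String record = request.mac_addr + " " + request.upload_peak_on + " "
            + request.download_peak_on + " " + request.upload_peak_off + " "
            + request.download_peak_off + " " + request.time;
        System.out.println(record);
        BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
        try {
          br.write(record);
        } finally {
          // Close the writer so the buffered record is flushed to HDFS.
          br.close();
        }
        r.is_ok = true;
      }
      catch (Exception exp)
      {
        System.out.println(exp);
        r.is_ok = false;
      }
      return r;
    }
  }

  public static void main( String[] args )
  {
    server = new NettyServer(
        new SpecificResponder(NubLookup.class, new NubLookupImpl()),
        new InetSocketAddress(<your_host>,<your_port>));
    System.out.println("Avro RPC server listening on port " + server.getPort());
  }
}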
NettyServer with Custom Avro-RPC
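The server code names each file after the arrival time and writes one record per file. The following is a minimal sketch of that naming and writing step, using the local filesystem as a stand-in for HDFS so that it runs without a Hadoop installation. RecordFileWriter and its methods are hypothetical names; only the "yyyy-MM-dd-HH-mm-ss-S" pattern and the space-separated record layout come from the server code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.SimpleDateFormat;
import java.util.Date;

// Local-filesystem stand-in for the HDFS write; a sketch, not the original server code.
class RecordFileWriter {
    // Builds a file name in the same pattern the server uses.
    static String fileNameFor(Date when) {
        return new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-S").format(when);
    }

    // Writes one record; try-with-resources closes (and flushes) the writer even on failure.
    static Path write(Path dir, Date when, String record) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(fileNameFor(when));
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(record);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ddwrt_log");
        // Made-up sample record in the space-separated layout.
        Path file = write(dir, new Date(), "00:11:22:33:44:55 10 20 30 40 01-01-2014_10:30");
        String stored = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(file + " -> " + stored);
    }
}
```

Because the name only has millisecond resolution, two records arriving in the same millisecond would map to the same file. Appending the MAC address or a counter to the name is one possible way around that, if it turns out to matter in practice.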
Libraries
RPC Call Implementation
Stored Records in HDFS
Avro
Avro-Tools
Avro-IPC
NettyServer/Netty Project
Jackson JSON Packages
Hadoop
Apache Tomcat/JSP
Simple Logging Facade for Java (SLF4J)
Wrtbwmon
DDWRT
Maven
The most important thing: be patient.
Reference: Needed Packages or Software Packs
Thank you!
D-Link AP Array: the integrated software for controlling a number of D-Link APs. (Each AP costs NT$3600.)
Related Work: Current Techniques
RFlow
RFlow (cont.)