Hive & HDFS to HBase Phoenix using Pig scripts

A few days ago I had to import data from Hive into an HBase Phoenix database for a project. The scripts I used at the time are given below.

Hive table schema

col_name      data_type
roadid        int
year          int
month         int
day           int
hour          int
total_trips   bigint
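
For reference, here is a Hive DDL sketch that would yield this schema. The ORC storage clause is inferred from the table name zurich_waypoints_orc_cum, so treat it as an assumption rather than the exact original definition:

-- Hypothetical Hive DDL; STORED AS ORC is inferred from the table name
CREATE TABLE zurich.zurich_waypoints_orc_cum (
    roadid      INT,
    year        INT,
    month       INT,
    day         INT,
    hour        INT,
    total_trips BIGINT
)
STORED AS ORC;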

Here is a snapshot of the Hive table data that was stored in the Phoenix table:

roadid	year	month	day	hour	total_trips
1612342	2016	4	1	3	3
1612342	2016	4	1	4	1
1612342	2016	4	1	5	8
1612342	2016	4	2	4	1
1612342	2016	4	2	7	4
1612342	2016	4	2	8	2
1612342	2016	4	2	9	2
1612342	2016	4	2	10	2
1612342	2016	4	2	11	2
1612342	2016	4	2	12	4
1612342	2016	4	3	11	1
1612342	2016	4	3	12	2
1612342	2016	4	3	13	4
1612342	2016	4	3	14	2
1612342	2016	4	3	15	3
1612342	2016	4	3	16	2
1612342	2016	4	3	17	4
1612342	2016	4	4	18	4
1612342	2016	4	6	2	1
1612342	2016	4	7	2	1

Pig Script

REGISTER /usr/hdp/current/phoenix-client/phoenix-client.jar

-- Read the Hive table through HCatalog so the schema comes along automatically
waypoints = LOAD 'zurich.zurich_waypoints_orc_cum' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Write into the Phoenix table waypoint_cum; the first argument is the
-- ZooKeeper quorum host, and -batchSize sets the rows upserted per batch
STORE waypoints INTO 'hbase://waypoint_cum' USING
org.apache.phoenix.pig.PhoenixHBaseStorage('namenode.selise.ch', '-batchSize 5000');

Here you can see that I have used the phoenix-client.jar that ships with my Hortonworks (HDP) distribution; the REGISTER statement simply puts it on the Pig job's classpath so the Phoenix storage classes can be found.
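
One prerequisite the script does not show: PhoenixHBaseStorage only upserts into a table that already exists in Phoenix, so the target table has to be created first (for example from sqlline.py). Below is a minimal DDL sketch matching the Hive schema above; the composite primary key is my assumption, not taken from the original setup:

-- Hypothetical Phoenix DDL; the primary key choice is an assumption
CREATE TABLE IF NOT EXISTS waypoint_cum (
    roadid      INTEGER NOT NULL,
    year        INTEGER NOT NULL,
    month       INTEGER NOT NULL,
    day         INTEGER NOT NULL,
    hour        INTEGER NOT NULL,
    total_trips BIGINT,
    CONSTRAINT pk PRIMARY KEY (roadid, year, month, day, hour)
);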

Here is the sample HDFS data that I stored in my Phoenix DB using a second Pig script:

1887574,2016-04-05T10:26:43.000Z,8.6782,47.4279,c5a3d5c06036818187068cbf82d81f87,2016,04,05,10
1887574,2016-04-05T10:27:29.000Z,8.6829,47.4375,c5a3d5c06036818187068cbf82d81f87,2016,04,05,10
1887574,2016-04-05T10:28:54.000Z,8.7012,47.451,c5a3d5c06036818187068cbf82d81f87,2016,04,05,10
1822482,2016-04-07T16:26:50.000Z,8.5743,47.417,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:27:32.000Z,8.5743,47.4221,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:27:37.000Z,8.5742,47.4222,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:27:42.000Z,8.5742,47.4223,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:27:47.000Z,8.5742,47.4224,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:27:52.000Z,8.5741,47.4225,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:17.000Z,8.5741,47.4225,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:18.000Z,8.574,47.4226,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:21.000Z,8.5739,47.4227,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:22.000Z,8.5738,47.4227,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:27.000Z,8.5735,47.423,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16
1974545,2016-04-07T16:28:28.000Z,8.5734,47.4231,36b454bb2f66270d192d55fb35f68a59,2016,04,07,16

And here is the script I used to import the HDFS data into Phoenix/HBase:

REGISTER /usr/hdp/current/phoenix-client/phoenix-client.jar

-- Load the comma-separated waypoint files from HDFS, declaring the schema explicitly
waypoints = LOAD 'hdfs://datanode1.selise.ch/home/admin/pyspark/zurich-road-waypoints-trimmed/part-m-000[0-9]*'
USING PigStorage(',')
AS (roadid:int, capture_date:chararray, longitude:double, latitude:double, tripid:chararray, year:int, month:int, day:int, hour:int);

-- Upsert into the Phoenix table waypoint in batches of 5000 rows
STORE waypoints INTO 'hbase://waypoint' USING
org.apache.phoenix.pig.PhoenixHBaseStorage('namenode.selise.ch', '-batchSize 5000');
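
As with the first job, the waypoint table must already exist in Phoenix before the script runs. Here is a possible DDL matching the nine-column load schema; the key columns and the VARCHAR type for capture_date (loaded as chararray in Pig) are my assumptions:

-- Hypothetical Phoenix DDL; the key columns are an assumed choice
CREATE TABLE IF NOT EXISTS waypoint (
    roadid       INTEGER NOT NULL,
    capture_date VARCHAR NOT NULL,
    longitude    DOUBLE,
    latitude     DOUBLE,
    tripid       VARCHAR NOT NULL,
    year         INTEGER,
    month        INTEGER,
    day          INTEGER,
    hour         INTEGER,
    CONSTRAINT pk PRIMARY KEY (roadid, tripid, capture_date)
);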

Now let's run the scripts to test:

pig -x mapreduce hive2phoenix.pig

pig -x mapreduce hdfs2phoenix.pig
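
When both jobs finish, a quick sanity check from sqlline.py confirms the rows landed (assuming the tables were created as sketched above):

-- Run in sqlline.py after the jobs complete
SELECT COUNT(*) FROM waypoint_cum;
SELECT COUNT(*) FROM waypoint;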
