
HCatalog Installation and Configuration Notes

 

When reposting, please credit the source, SpringsSpace: http://springsfeng.iteye.com

 

1. Building the HCatalog Installation Package

   (0) JDK: jdk1.6.0_43
   (1) Download the source package hcatalog-trunk.zip from https://github.com/apache/hcatalog
   and extract it to /home/kevin/Downloads/hcatalog-trunk
   (2) Install Ant and set the ANT_HOME and PATH environment variables; for the complete
   environment variable configuration see Part 5 of this article: 5. HCatalog Environment Variables
   (3) Download Apache Forrest:
   apache-forrest-0.9-sources.tar.gz, apache-forrest-0.9-dependencies.tar.gz
   and extract both archives into the same directory: /usr/custom/apache-forrest-0.9
   (4) Build

   Note: because this test runs HCatalog 0.6.0-SNAPSHOT on top of YARN (hadoop-2.0.2-alpha),
   hive-0.11.0-SNAPSHOT and pig-0.11.0, build with Ant rather than Maven, and modify the
   following files before building:

             A. build.properties: note in particular the mvn.hadoop.profile=hadoop23 setting (the Hadoop 2 / YARN profile).

             mvn.deploy.repo.id=apache.snapshots.https
             mvn.deploy.repo.url=https://repository.apache.org/content/repositories/snapshots
             maven-ant-tasks.version=2.1.3
             mvn.local.repo=${user.home}/.m2/repository
             mvn.hadoop.profile=hadoop23

             B. build.xml: remove docs from the depends attribute of the package target, as shown below.
<!--
    ===============================================================================
    Distribution Section
    ===============================================================================
    -->
    <target name="package" depends="jar" description="Create an HCatalog release">
        <mkdir dir="${dist.dir}"/>
        <mkdir dir="${dist.dir}/share/${ant.project.name}/lib"/>
        <mkdir dir="${dist.dir}/etc/hcatalog"/>
        <mkdir dir="${dist.dir}/bin"/>
        <mkdir dir="${dist.dir}/sbin"/>
        <mkdir dir="${dist.dir}/share/${ant.project.name}/scripts"/>
        <mkdir dir="${dist.dir}/share/${ant.project.name}/templates/conf"/>
        <mkdir dir="${dist.dir}/share/doc/${ant.project.name}"/>
        <mkdir dir="${dist.dir}/share/doc/${ant.project.name}/api"/>
        <mkdir dir="${dist.dir}/share/doc/${ant.project.name}/jdiff"/>
        <mkdir dir="${dist.dir}/share/doc/${ant.project.name}/license"/>
               C. pom.xml
...
<properties>
      <activemq.version>5.5.0</activemq.version>
      <commons-exec.version>1.1</commons-exec.version>
      <commons-io.version>2.4</commons-io.version>
      <guava.version>11.0.2</guava.version>
      <hadoop20.version>1.0.3</hadoop20.version>
      <hadoop23.version>0.23.3</hadoop23.version>
      <hbase.version>0.92.0</hbase.version>
      <hcatalog.version>${project.version}</hcatalog.version>
      <hive.version>0.11.0</hive.version>
      <jackson.version>1.8.3</jackson.version>
      <jersey.version>1.14</jersey.version>
      <jetty.webhcat.version>7.6.0.v20120127</jetty.webhcat.version>
      <jms.version>1.1</jms.version>
      <pig.version>0.11.0</pig.version>
      <slf4j.version>1.6.1</slf4j.version>
      <wadl-resourcedoc-doclet.version>1.4</wadl-resourcedoc-doclet.version>
      <zookeeper.version>3.4.3</zookeeper.version>
  </properties>
...
   Note: regarding the <hive.version>0.11.0</hive.version> setting, see the Hive build notes:
   http://springsfeng.iteye.com/admin/blogs/1734517
   Also change the Hadoop 2 version property:
<hadoop23.version>2.0.4-alpha</hadoop23.version>
   cd /home/kevin/Downloads/hcatalog-trunk

   /usr/custom/apache-ant-1.8.4/bin/ant -Dhcatalog.version=0.6.0-SNAPSHOT -Dforrest.home=/usr/custom/apache-forrest-0.9 package
   After the build completes, the following archive is generated under /home/kevin/Downloads/hcatalog-trunk/build:

   hcatalog-0.6.0-SNAPSHOT.tar.gz

   (5) Errors encountered during the build

   Error 1:
compile:
     [echo] hcatalog-core
    [mkdir] Created dir: /home/kevin/Downloads/hcatalog-trunk/core/build/classes
    [javac] Compiling 75 source files to /home/kevin/Downloads/hcatalog-trunk/core/build/classes
    [javac] /home/kevin/Downloads/hcatalog-trunk/core/src/main/java/org/apache/hcatalog/cli/SemanticAnalysis/CreateDatabaseHook.java:54: cannot access org.antlr.runtime.tree.CommonTree
    [javac] class file for org.antlr.runtime.tree.CommonTree not found
    [javac]         int numCh = ast.getChildCount();

Fix:
Copy antlr-2.7.7.jar and antlr-runtime-3.4.jar from Hive's lib directory to /home/kevin/Downloads/hcatalog-trunk/core/build/lib/compile and re-run the build.
Error 2:
compile:
     [echo] hcatalog-server-extensions
    [mkdir] Created dir: /home/kevin/Downloads/hcatalog-trunk/server-extensions/build/classes
    [javac] Compiling 19 source files to /home/kevin/Downloads/hcatalog-trunk/server-extensions/build/classes
    [javac] /home/kevin/Downloads/hcatalog-trunk/server-extensions/src/main/java/org/apache/hcatalog/listener/NotificationListener.java:96: cannot access com.facebook.fb303.FacebookBase
    [javac] class file for com.facebook.fb303.FacebookBase not found
    [javac]                 .get_table(partition.getDbName(), partition.getTableName())
    [javac]                 ^
Fix:
Copy libfb303-0.9.0.jar from Hive's lib directory to /home/kevin/Downloads/hcatalog-trunk/server-extensions/build/lib/compile and re-run the build.
Error 3:
compile-src:
    [javac] Compiling 35 source files to /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/build/classes
    [javac] /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java:27: package org.apache.hadoop.hbase.client does not exist
    [javac] import org.apache.hadoop.hbase.client.Put;
    [javac]                                      ^
Fix:
Copy hbase-0.92.0.jar from Hive's lib directory to /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/build/lib/compile and re-run the build.
Error 4:
compile-src:
    [javac] Compiling 35 source files to /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/build/classes
    [javac] /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java:73: package com.facebook.fb303 does not exist
    [javac] import com.facebook.fb303.FacebookBase;
    [javac]                          ^
Fix:
Copy libfb303-0.9.0.jar from Hive's lib directory to /home/kevin/Downloads/hcatalog-trunk/storage-handlers/hbase/build/lib/compile and re-run the build.
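
All four fixes follow the same pattern and can be applied in one pass before re-running the build. A minimal sketch, assuming $HIVE_HOME points at the Hive installation and the source tree is at /home/kevin/Downloads/hcatalog-trunk:

cd /home/kevin/Downloads/hcatalog-trunk
cp $HIVE_HOME/lib/antlr-2.7.7.jar $HIVE_HOME/lib/antlr-runtime-3.4.jar core/build/lib/compile/
cp $HIVE_HOME/lib/libfb303-0.9.0.jar server-extensions/build/lib/compile/
cp $HIVE_HOME/lib/libfb303-0.9.0.jar $HIVE_HOME/lib/hbase-0.92.0.jar storage-handlers/hbase/build/lib/compile/
/usr/custom/apache-ant-1.8.4/bin/ant -Dhcatalog.version=0.6.0-SNAPSHOT -Dforrest.home=/usr/custom/apache-forrest-0.9 package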

2. Installing Thrift

   (1) Download thrift-0.9.0.tar.gz and extract it to /home/kevin/Downloads/Software/thrift-0.9.0

   (2) Install the packages Thrift depends on (run as root):

   yum install automake libtool flex bison gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel autoconf crypto-utils pkgconfig openssl-devel

   (3) Build and install Thrift (run as root):

   cd /home/kevin/Downloads/Software/thrift-0.9.0

   ./configure --with-boost=/usr/local    # point configure at the Boost installation
   make

   make check    # optional test step; can be skipped
   make install

   (4) Verify the Thrift installation

   Run thrift in a terminal; if it prints the usage message below, the installation succeeded:

   [kevin@linux-fdc ~]$ thrift
   Usage: thrift [options] file

   Use thrift -help for a list of options
   [kevin@linux-fdc ~]$

3. Installing MySQL and Creating the hive Account

   For this part, see Parts 1 and 2 of the Hive installation and configuration notes: http://springsfeng.iteye.com/blog/1734517
4. Installing HCatalog

   Before installing HCatalog proper, it is recommended to have Thrift, MySQL, Hive and Pig installed first; see:

   http://springsfeng.iteye.com/admin/blogs/1734517

   http://springsfeng.iteye.com/admin/blogs/1734556

  
   (1) Extract hcatalog-0.6.0-SNAPSHOT.tar.gz to /home/kevin/Downloads/hcatalog-0.6.0
   (2) Install

   /usr/custom/hcatalog-0.6.0/share/hcatalog/scripts

   [kevin@linux-fdc scripts]$ chmod 777 *
   cd /home/kevin/Downloads/hcatalog-0.6.0
   [kevin@linux-fdc hcatalog-0.6.0-SNAPSHOT]$ share/hcatalog/scripts/hcat_server_install.sh -r /usr/custom/hcatalog-0.6.0 -d /usr/custom/hive-0.11.0/lib -h /usr/custom/hadoop-2.0.2-alpha -p 8888
   Notes:

         The command above should be entered on a single line; it is wrapped here only for readability.

         -r /usr/custom/hcatalog-0.6.0 specifies the directory HCatalog will be installed into; create this directory first
         -d /usr/custom/hive-0.11.0/lib specifies the directory containing mysql-connector-java-5.1.22-bin.jar
         -h /usr/custom/hadoop-2.0.2-alpha specifies the Hadoop installation home directory
         -p 8888 specifies the port the HCatalog server will listen on
    (3) Start the service (a quick way to check that it is listening is shown after step (5)):
    cd /usr/custom/hcatalog-0.6.0
    sbin/hcat_server.sh start
    (4) Stop the service:
    cd /usr/custom/hcatalog-0.6.0
    sbin/hcat_server.sh stop
    (5) View the logs:
    check the log files under /usr/custom/hcatalog-0.6.0/var/log.
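
    To verify that the metastore server started in step (3) is listening on its configured port (8888 in this setup), a quick check such as the following can be used (a sketch; the exact output varies by system):

    netstat -lnt | grep 8888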

5. HCatalog Environment Variables

    The environment variables set here are intended to make it easier to run the examples from the official documentation:

    http://incubator.apache.org/hcatalog/docs/r0.5.0/loadstore.html

    http://incubator.apache.org/hcatalog/docs/r0.5.0/inputoutput.html

    For the complete environment variable file, see the attachment profile.txt.

JAVA_HOME=/usr/custom/jdk1.6.0_43
ANT_HOME=/usr/custom/apache-ant-1.8.4
M2_HOME=/usr/custom/apache-maven-3.0.5
HADOOP_HOME=/usr/custom/hadoop-2.0.2-alpha
PIG_HOME=/usr/custom/pig-0.11.0
HIVE_HOME=/usr/custom/hive-0.11.0
HCAT_HOME=/usr/custom/hcatalog-0.6.0

HADOOP_MAPRED_HOME=$HADOOP_HOME  
HADOOP_COMMON_HOME=$HADOOP_HOME  
HADOOP_HDFS_HOME=$HADOOP_HOME  
YARN_HOME=$HADOOP_HOME  
HADOOP_LIB=$HADOOP_HOME/lib  
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop  
YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

PATH=$PATH:$M2_HOME/bin:$JAVA_HOME/bin:$PIG_HOME/bin:$ANT_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin

CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar


PIG_OPTS=-Dhive.metastore.uris=thrift://linux-fdc.linux.com:8888


LIB_JARS=$HCAT_HOME/share/hcatalog/hcatalog-core-0.6.0-SNAPSHOT.jar,$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.6.0-SNAPSHOT.jar,$HCAT_HOME/share/hcatalog/hcatalog-server-extensions-0.6.0-SNAPSHOT.jar,$HIVE_HOME/lib/hive-metastore-0.11.0-SNAPSHOT.jar,$HIVE_HOME/lib/libthrift-0.9.0.jar,$HIVE_HOME/lib/hive-exec-0.11.0-SNAPSHOT.jar,$HIVE_HOME/lib/libfb303-0.9.0.jar,$HIVE_HOME/lib/jdo2-api-2.3-ec.jar,$HIVE_HOME/lib/slf4j-api-1.6.1.jar,$HIVE_HOME/lib/datanucleus-core-2.0.3.jar,$HIVE_HOME/lib/datanucleus-rdbms-2.0.3.jar,$HIVE_HOME/lib/datanucleus-connectionpool-2.0.3.jar,$HIVE_HOME/lib/commons-collections-3.2.1.jar,$HIVE_HOME/lib/commons-pool-1.5.4.jar,$HIVE_HOME/lib/commons-dbcp-1.4.jar,$HIVE_HOME/lib/mysql-connector-java-5.1.22-bin.jar
  
HADOOP_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-server-extensions-0.6.0-SNAPSHOT.jar:$HIVE_HOME/lib/hive-metastore-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/hive-exec-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/jdo2-api-2.3-ec.jar:$HIVE_HOME/lib/slf4j-api-1.6.1.jar:$HIVE_HOME/lib/datanucleus-core-2.0.3.jar:$HIVE_HOME/lib/datanucleus-rdbms-2.0.3.jar:$HIVE_HOME/lib/datanucleus-connectionpool-2.0.3.jar:$HIVE_HOME/lib/commons-collections-3.2.1.jar:$HIVE_HOME/lib/commons-pool-1.5.4.jar:$HIVE_HOME/lib/commons-dbcp-1.4.jar:$HIVE_HOME/lib/mysql-connector-java-5.1.22-bin.jar:$HIVE_HOME/conf:$HADOOP_HOME/etc/hadoop
  
PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-server-extensions-0.6.0-SNAPSHOT.jar:$HIVE_HOME/lib/hive-metastore-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/hive-exec-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/jdo2-api-2.3-ec.jar:$HIVE_HOME/lib/slf4j-api-1.6.1.jar:$HIVE_HOME/conf:$HADOOP_HOME/etc/hadoop


export JAVA_HOME M2_HOME PATH CLASSPATH LD_LIBRARY_PATH HADOOP_HOME PIG_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME HADOOP_LIB HADOOP_CONF_DIR YARN_CONF_DIR ANT_HOME HIVE_HOME HCAT_HOME LIB_JARS HADOOP_CLASSPATH PIG_CLASSPATH

    Note: the dependency JARs listed in the official documentation are incomplete; use the configuration above instead.

6. Running the Input/Output Interface (Hive-based) Example

    (1) Start the following processes:

    mysql, hadoop, hive, pig, hcatalog

    (2) Create the age and age_out tables:

   CREATE TABLE age (name STRING, birthday INT)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t'
   LINES TERMINATED BY '\n'
   STORED AS TEXTFILE;
    
   CREATE TABLE age_out (birthday INT, birthday_count INT)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t'
   LINES TERMINATED BY '\n'
   STORED AS TEXTFILE;  

    (3) Load data into the age table

    hive> LOAD DATA LOCAL INPATH '/home/kevin/Documents/age.txt' OVERWRITE INTO TABLE age;
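
    The contents of age.txt are not included in the post; a hypothetical sample in the expected format (tab-separated, a name followed by an integer birthday value) might look like:

Tom	1990
Jerry	1991
Kate	1990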

    (4) Write the MapReduce program

package com.company.example.hcatalog;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hcatalog.common.HCatConstants;
import org.apache.hcatalog.common.HCatUtil;
import org.apache.hcatalog.data.DefaultHCatRecord;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class GroupByBirthday extends Configured implements Tool {

	public static class Map extends Mapper<WritableComparable, HCatRecord, IntWritable, IntWritable> {

		Integer birthday;
		HCatSchema schema;

		@Override
		protected void map(WritableComparable key, HCatRecord value, Context context) throws IOException, InterruptedException {
			//Get our schema from the Job object.
			schema = HCatInputFormat.getTableSchema(context.getConfiguration());
			//Read the birthday field.
			birthday = value.getInteger("birthday",schema);
			context.write(new IntWritable(birthday), new IntWritable(1));
		}
	}
	
	public static class Reduce extends Reducer<IntWritable, IntWritable, WritableComparable, HCatRecord> {

		@Override
		protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
			
			List<HCatFieldSchema> columns = new ArrayList<HCatFieldSchema>(2);
			columns.add(new HCatFieldSchema("birthday", HCatFieldSchema.Type.INT, ""));
			columns.add(new HCatFieldSchema("birthday_count", HCatFieldSchema.Type.INT, ""));
			HCatSchema schema = new HCatSchema(columns);
			
			int sum = 0;
			Iterator<IntWritable> iter = values.iterator();
			while (iter.hasNext()) {
				sum++;
				iter.next().get();
			}
			HCatRecord record = new DefaultHCatRecord(2);
			record.set("birthday", schema, key.get());
			record.set("birthday_count", schema, sum);

			context.write(null, record);
		}
	}

	public int run(String[] args) throws Exception {
		Configuration conf = getConf();
		args = new GenericOptionsParser(conf, args).getRemainingArgs();

		String inputTableName = args[0];
		String outputTableName = args[1];

		Job job = new Job(conf, "GroupByAge");
		//Read the inputTableName table, partition null, initialize the default database.
		HCatInputFormat.setInput(job, null, inputTableName);

		// Configure the input format and job classes
		job.setInputFormatClass(HCatInputFormat.class);
		job.setJarByClass(GroupByBirthday.class);
		job.setMapperClass(Map.class);
		job.setReducerClass(Reduce.class);
		
		job.setMapOutputKeyClass(IntWritable.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(WritableComparable.class);
		job.setOutputValueClass(DefaultHCatRecord.class);
		
		String inputJobString = job.getConfiguration().get(HCatConstants.HCAT_KEY_JOB_INFO);
        job.getConfiguration().set(HCatConstants.HCAT_KEY_JOB_INFO, inputJobString);
		
		//Write into outputTableName table, partition null, initialize the default database.
        OutputJobInfo outputJobInfo = OutputJobInfo.create(null, outputTableName, null);
        job.getConfiguration().set(HCatConstants.HCAT_KEY_OUTPUT_INFO, HCatUtil.serialize(outputJobInfo));
		HCatOutputFormat.setOutput(job, outputJobInfo);
		
		HCatSchema s = HCatOutputFormat.getTableSchema(job.getConfiguration());
		HCatOutputFormat.setSchema(job, s);
		
		System.out.println("INFO: Output scheme explicity set for writing:" + s);
		
		job.setOutputFormatClass(HCatOutputFormat.class);

		return (job.waitForCompletion(true) ? 0 : 1);
	}
	
	public static void main(String[] args) throws Exception {
		int exitcode = ToolRunner.run(new GroupByBirthday(), args);
		System.exit(exitcode);
	}
}

    (5) Run the example

    Run from Eclipse (the original post included a screenshot of this step), or from the command line:

    bin/hadoop --config $HADOOP_HOME/etc/hadoop jar /home/kevin/workspace-eclipse/example-hadoop.jar com.company.example.hcatalog.GroupByBirthday -libjars $LIB_JARS age age_out

    When the project is managed with Maven:

    bin/hadoop --config $HADOOP_HOME/etc/hadoop jar /home/kevin/workspace-eclipse/example-hadoop/target/example-hadoop-0.0.1-SNAPSHOT.jar -libjars $LIB_JARS age age_out

    example-hadoop.jar is provided as an attachment to the original post.
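
    If the jar is built from the command line rather than exported from Eclipse, a minimal sketch is shown below (the src/ layout and working directory are assumptions; hadoop is on the PATH per Part 5, so $(hadoop classpath) already includes the HCatalog/Hive jars from HADOOP_CLASSPATH):

    cd /home/kevin/workspace-eclipse/example-hadoop
    mkdir -p classes
    javac -cp "$(hadoop classpath)" -d classes src/com/company/example/hcatalog/GroupByBirthday.java
    jar cf /home/kevin/workspace-eclipse/example-hadoop.jar -C classes .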
7. Running the Load/Store Interface (Pig-based) Example

    (1) Create the table, from the hive command line:

   CREATE TABLE student (name STRING, age INT)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\t'
   LINES TERMINATED BY '\n'
   STORED AS TEXTFILE; 

    (2) Prepare the data file student.txt:

张三	21
张思	22
王五	23
王六	22
李四	22
小三	21
小四	22

    (3) Start Pig (the hive-metastore and hive-exec jar names must match the Hive build in use, here 0.11.0-SNAPSHOT):
    bin/pig -Dpig.additional.jars=$HCAT_HOME/share/hcatalog/hcatalog-core-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-0.6.0-SNAPSHOT.jar:$HCAT_HOME/share/hcatalog/hcatalog-server-extensions-0.6.0-SNAPSHOT.jar:$HIVE_HOME/lib/hive-metastore-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/hive-exec-0.11.0-SNAPSHOT.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/jdo2-api-2.3-ec.jar:$HIVE_HOME/lib/slf4j-api-1.6.1.jar
    (4) Store data
   grunt> student = LOAD '/input/student.txt' AS (name:chararray, age:int);   -- /input/student.txt is a file in HDFS
   grunt> lmt = LIMIT student 4;
   grunt> DUMP lmt;
   grunt> ILLUSTRATE lmt;
   grunt> STORE student INTO 'student' USING org.apache.hcatalog.pig.HCatStorer();

    (5) Load data

   grunt> temp = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();

   grunt> DUMP temp;

