Installing apache-kylin on a single Mac OS X machine   2017-03-03


I recently started looking into apache-kylin. There are already plenty of introductions to Kylin online, so this post skips them and only covers how to set up a test Kylin environment on Mac OS X.
The software to install and deploy includes: hadoop, mysql, hive, zookeeper, hbase, and kylin. This post walks through installing the following versions:

  • hadoop-2.7.3
  • mysql-5.6.31
  • hive-2.1.1
  • zookeeper-3.4.8
  • hbase-1.3.0
  • kylin-1.6.0-hbase1.x

Installing hadoop-2.7.3

Download the binary tarball from the official site and extract it:

cd /Users/ruifengshan/soft
wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xf hadoop-2.7.3.tar.gz

Go to the etc/hadoop directory and do the related configuration, mainly the following files (mapred-site.xml may first need to be copied from mapred-site.xml.template):
hdfs-site.xml

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/Users/ruifengshan/hadoop/datalog</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/Users/ruifengshan/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<!-- single-node setup, so one replica -->
<value>1</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml

<configuration>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>localhost:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>localhost:8088</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>localhost:8034</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
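
Before the very first start, the HDFS NameNode must be formatted, and start-all.sh expects passwordless SSH to localhost (on macOS, enable Remote Login under System Preferences → Sharing). A minimal sketch of these one-time steps:

cd /Users/ruifengshan/soft/hadoop-2.7.3
# format HDFS on the first start only -- this wipes the metadata directories
bin/hdfs namenode -format
# set up passwordless SSH to localhost if you have not already
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys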

Then go to the sbin directory and start Hadoop:

cd /Users/ruifengshan/soft/hadoop-2.7.3/sbin
./start-all.sh

Hadoop's job history service also needs to be started:

sbin/mr-jobhistory-daemon.sh start historyserver

Check the following pages to see whether everything is normal:

http://localhost:50070/

http://localhost:8088/
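
You can also verify from the command line; a quick sanity check (jps ships with the JDK):

jps   # should list NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
cd /Users/ruifengshan/soft/hadoop-2.7.3
bin/hdfs dfsadmin -report   # should show one live datanode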

Installing mysql-5.6.31

For installation, see my earlier post on installing MySQL on Mac OS X.

Installing hive-2.1.1

Download Hive:

cd /Users/ruifengshan/soft
wget http://mirror.symnds.com/software/Apache/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
tar -xf apache-hive-2.1.1-bin.tar.gz

Configure Hive:

cd /Users/ruifengshan/soft/apache-hive-2.1.1-bin/conf
cp hive-default.xml.template hive-site.xml

Configure MySQL as the metastore database in hive-site.xml by changing the following entries:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>

The metastore database in MySQL is named hive, the driver is com.mysql.jdbc.Driver, and both the username and password are hive. Now set up the Hive metastore database in MySQL:

mysql -u root -p

Create the hive user and grant it full access to the hive database:

mysql> CREATE DATABASE hive;
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> FLUSH PRIVILEGES;
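
To confirm the grant took effect, log in as the new user:

mysql -u hive -phive -e 'SHOW DATABASES;'   # the hive database should be listed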

You also need the MySQL JDBC driver jar: download the archive from the
https://dev.mysql.com/downloads/connector/j/ page, then move the jar into Hive's lib directory:

tar xvzf mysql-connector-java-5.1.41.tar.gz
mv mysql-connector-java-5.1.41/mysql-connector-java-5.1.41-bin.jar /Users/ruifengshan/soft/apache-hive-2.1.1-bin/lib

Initialize the Hive metastore schema:

cd /Users/ruifengshan/soft/apache-hive-2.1.1-bin
bin/schematool -dbType mysql -initSchema
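
If initialization succeeded, the metastore tables now exist in the hive database; a quick check:

mysql -u hive -phive -D hive -e 'SHOW TABLES;'   # should list metastore tables such as DBS and TBLS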

Kylin reads Hive tables through HCatalog, and HCatalog uses the hive.metastore.uris property to create a HiveMetaStoreClient and fetch metadata, so hive-site.xml needs one more change:

<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Start the metastore service:

bin/hive --service metastore -p 9083 &

Hive operations should now work normally.
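
A quick smoke test, run from the Hive directory (the table name here is just an illustration):

cd /Users/ruifengshan/soft/apache-hive-2.1.1-bin
bin/hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT); SHOW TABLES;"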

Installing zookeeper-3.4.8

Download ZooKeeper from the official site, extract it, and copy the sample config under conf:

cd /Users/ruifengshan/soft
tar -xf zookeeper-3.4.8.tar.gz
cd /Users/ruifengshan/soft/zookeeper-3.4.8
cp conf/zoo_sample.cfg conf/zoo.cfg

Edit zoo.cfg:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/ruifengshan/zookeeper
clientPort=2181

Start the ZooKeeper service:

cd /Users/ruifengshan/soft/zookeeper-3.4.8
bin/zkServer.sh start

Check that it is running:

$ bin/zkServer.sh status 
ZooKeeper JMX enabled by default
Using config: /Users/ruifengshan/soft/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: standalone
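
Another quick probe is ZooKeeper's four-letter-word command on the client port:

echo ruok | nc localhost 2181   # a healthy server answers imok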

Installing hbase-1.3.0

cd /Users/ruifengshan/soft
wget http://apache.osuosl.org/hbase/1.3.0/hbase-1.3.0-bin.tar.gz
tar -xf hbase-1.3.0-bin.tar.gz

Edit conf/hbase-site.xml:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>

<property>
<name>hbase.master</name>
<value>localhost:60000</value>
</property>

<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>

<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/ruifengshan/zookeeper</value>
</property>

<property>
<name>hbase.client.scanner.caching</name>
<value>200</value>
</property>

<property>
<name>hbase.balancer.period</name>
<value>300000</value>
</property>

<property>
<name>hbase.client.write.buffer</name>
<value>10485760</value>
</property>

<property>
<name>hbase.hregion.majorcompaction</name>
<value>7200000</value>
</property>

<property>
<name>hbase.hregion.max.filesize</name>
<value>67108864</value>
<description></description>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>1048576</value>
<description></description>
</property>

<property>
<name>hbase.server.thread.wakefrequency</name>
<value>30000</value>
<description></description>
</property>
</configuration>

Start HBase:

cd /Users/ruifengshan/soft/hbase-1.3.0
bin/start-hbase.sh

Check the following page to see whether it is up:

http://localhost:16010/master-status
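
Or check from the HBase shell:

cd /Users/ruifengshan/soft/hbase-1.3.0
echo "status" | bin/hbase shell   # the status output should show a live region server and no dead servers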

Installing kylin-1.6.0-hbase1.x

Download and extract:

cd /Users/ruifengshan/soft
wget https://archive.apache.org/dist/kylin/apache-kylin-1.6.0/apache-kylin-1.6.0-hbase1.x-bin.tar.gz
tar -xf apache-kylin-1.6.0-hbase1.x-bin.tar.gz

I use zsh, so I edit ~/.zshrc; if you use bash, edit ~/.bashrc instead. Add the following:

export HADOOP_HOME=/Users/ruifengshan/soft/hadoop-2.7.3
export HBASE_HOME=/Users/ruifengshan/soft/hbase-1.3.0
export KYLIN_HOME=/Users/ruifengshan/soft/apache-kylin-1.6.0-hbase1.x-bin
export HIVE_HOME=/Users/ruifengshan/soft/apache-hive-2.1.1-bin
export HCAT_HOME=$HIVE_HOME/hcatalog
export PATH=$PATH:$HBASE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin

Then run

source ~/.zshrc

to make the settings take effect.
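
A quick check that the variables resolve:

echo $KYLIN_HOME          # should print /Users/ruifengshan/soft/apache-kylin-1.6.0-hbase1.x-bin
which hadoop hive hbase   # should all resolve into the soft directory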

Use the bundled script to check whether the environment is ready:

cd /Users/ruifengshan/soft/apache-kylin-1.6.0-hbase1.x-bin
bin/check-env.sh

There is a pitfall on the Mac here: the check script relies on GNU variants of some commands, which differ from the BSD versions macOS ships with. Install the GNU tools:

brew install gnu-sed
brew install findutils

See the "Problems encountered during installation" section below for the detailed fixes.
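
Alternatively, you can put Homebrew's unprefixed GNU binaries ahead of the BSD ones so the Kylin scripts pick them up as plain sed and find; a hedged sketch, assuming the default Homebrew prefix of /usr/local:

export PATH="/usr/local/opt/gnu-sed/libexec/gnubin:/usr/local/opt/findutils/libexec/gnubin:$PATH"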

Then start Kylin:

cd /Users/ruifengshan/soft/apache-kylin-1.6.0-hbase1.x-bin
bin/kylin.sh start

http://localhost:7070/kylin

For a simple walkthrough, see the official tutorial: http://kylin.apache.org/docs/tutorial/kylin_sample.html
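
The web UI's default credentials are ADMIN/KYLIN. You can also probe the REST API from the shell; a hedged example:

curl -s -u ADMIN:KYLIN http://localhost:7070/kylin/api/projects   # should return a JSON list of projects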

Problems encountered during installation

1. LocalDirsHandlerService ERROR

2017-03-03 14:32:09,695 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /Users/ruifengshan/soft/hadoop-2.7.3/logs/userlogs error
, used space above threshold of 90.0%, removing from list of valid directories
2017-03-03 14:32:09,696 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs are bad: /tmp/hadoop-ruifengshan/nm-local-dir; 1/1 log-dirs are bad: /Users/ruifengshan/soft/hadoop-2.7.3/logs/userlogs

This happens because disk usage exceeded 90%. You can raise the threshold in yarn-site.xml:

<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>95.0</value>
</property>

Alternatively, clean up the disk and restart the Hadoop services.
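
To see how close you are to the threshold:

df -h /   # the NodeManager marks its dirs bad once utilization passes the configured percentage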

2. Unable to import the sample data

2017-02-27 18:53:55,845 ERROR [http-bio-7070-exec-1] cube.CubeManager:947 : Error during load cube instance, skipping : /cube/kylin_sales_cube.json
java.lang.IllegalStateException: Failed to init CubeDescManager from kylin_metadata@hbase
at org.apache.kylin.cube.CubeDescManager.getInstance(CubeDescManager.java:78)
at org.apache.kylin.cube.CubeManager.reloadCubeLocalAt(CubeManager.java:922)
at org.apache.kylin.cube.CubeManager.loadAllCubeInstance(CubeManager.java:900)
at org.apache.kylin.cube.CubeManager.<init>(CubeManager.java:141)
at org.apache.kylin.cube.CubeManager.getInstance(CubeManager.java:105)
at org.apache.kylin.rest.service.BasicService.getCubeManager(BasicService.java:68)
at org.apache.kylin.rest.service.CubeService.listAllCubes(CubeService.java:100)
at org.apache.kylin.rest.service.CubeService$$FastClassBySpringCGLIB$$17a07c0e.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:64)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
at org.apache.kylin.rest.service.CubeService$$EnhancerBySpringCGLIB$$4612bb79.listAllCubes(<generated>)
at org.apache.kylin.rest.controller.CubeController.getCubes(CubeController.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:842)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
......
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('%' (code 37)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.DataInputStream@1399d6b2; line: 196, column: 20]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:462)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2613)

This happens because the default import script (bin/sample.sh) substitutes the placeholders with sed, and the BSD sed that ships with macOS does not accept the GNU-style in-place flag, so the %...% tokens are left in the JSON verbatim, causing the JsonParseException above. Replace the sed calls with gsed:

gsed -i "s/%default_storage_type%/${default_storage_type}/g" ${KYLIN_HOME}/sample_cube/metadata/cube_desc/kylin_sales_cube_desc.json
gsed -i "s/%default_engine_type%/${default_engine_type}/g" ${KYLIN_HOME}/sample_cube/metadata/cube_desc/kylin_sales_cube_desc.json

Then re-import the sample data.
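
Re-importing just means re-running the bundled sample script:

cd $KYLIN_HOME
bin/sample.sh   # then reload metadata or restart Kylin so the web UI sees the sample cube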

3. File does not exist: hive-exec-2.1.1.jar
Like problem 2, this is caused by command differences between macOS and Linux. Modify the following line in bin/find-hive-dependency.sh to use the GNU tools:

hive_lib=`gfind -L "$(dirname $hive_exec_path)" -name '*.jar' ! -name '*calcite*' -printf '%p:' | gsed 's/:$//'`

Otherwise Kylin cannot find Hive's lib directory, and cube builds fail with:

java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/Users/ruifengshan/soft/apache-hive-2.1.1-bin/lib/hive-exec-2.1.1.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:149)
at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:108)
at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:92)
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Then restart Kylin.

