We're in the process of replacing the VM image with docker images. Stay tuned!
For the ease of exploring the features of WhereHows, we prepared a VM image based on Cloudera Quick Start VM for CDH 5.4.x.
In this VM, we prepared some sample data on HDFS, and set up Azkaban and Oozie to run several sample jobs. And we run the WhereHows Service to collect metadata from these systems, so you can easily try WhereHows within minutes.
Steps to exploring the WhereHows application
-
Download and install the latest version of VMware Player, for example VMware Player 12.0.
-
Download the all partitions of image in WhereHows. (If you are in China, download from here WhereHows china link )
-
Extract by using
7za e WhereHows-VM.7z.001
-
Start VM and run the image. Check VM is running correctly following appendix
-
The WhereHows frontend service should auto start. Open a browser, type: http://localhost:9008
-
Have fun!
Steps to start the backend service
-
Start the WhereHows Backend Service:
cd ~/wherehows; ./runbackend;
-
There are already some jobs set up in VM. You can also add new jobs following the instruction in [Set Up New Metadata ETL Jobs](Set Up New Metadata ETL Jobs).
-
If you want to start the Teradata metadata ETL job, you need to first set up a Teradata VM.
a. Download the image of Teradata. (TDExpress15.00.02_Sles11_40GB is a good choice as it contains some sample data.)
b. Start a VM of Teradata following the Teradata instructions.
c. Add the Teradata configuration as Teradata ETL instruction. (We already have some sample configuration in VM, you need to change to your configuration).
d. Schedule the job through etl-job-schedule API.
e. After the job finished, check the result from UI.
- To test whether the backend service have run successfully, you can add some dataset to hdfs (in avro format) and/or run some pig/hive/mapreduce jobs that read/write to some datasets. Run correspond jobs again and check UI. Example of test backend lineage/execution metadata ETL:
- open azkaban : localhost:8081, login as azkaban, run the sample jobs of 'twitter' and 'retail'
- make sure the backend service is running :
cd ~/wherehows; ./runbackend
- activate the
AZKABAN_LINEAGE_METADATA_ETL
job andAZKABAN_EXECUTION_METADATA_ETL
job by API :
curl -H "Content-Type: application/json" -X PUT -d '{"wh_etl_job_name":"AZKABAN_LINEAGE_METADATA_ETL", "ref_id":1, "control":"activate"}' http://localhost:19001/etl/control
curl -H "Content-Type: application/json" -X PUT -d '{"wh_etl_job_name":"AZKABAN_EXECUTION_METADATA_ETL", "ref_id":1, "control":"activate"} 'http://localhost:19001/etl/control
The schedule will run the back end job in next check, you can also shut the backend service down and bring it back again so it will immediately run these jobs.
You can check the backend log by tail -f /var/tmp/wherehows/wherehows.log
. After you see these two jobs have been finished, refresh the wherehows page, you should be able to see the lineage.
Update to the newest version
As the project is under active development, there are always new changes and bug fix. We will periodically update the VM images, but you definitely don't want to download it again and again. Here is the simple steps to update WhereHows on your VM.
update backend models:
git pull
update to latest versioncd backend-service; gradle build; gradle dist
build backend modelsscp target/universal/backend-service-1.0-SNAPSHOT.zip cloudera@$VM_HOST:~/wherehows/
cp the zip file to your VM.- On your VM:
cd wherehows; rm -rf backend-service-1.0-SNAPSHOT; unzip backend-service-1.0-SNAPSHOT.zip
update to newest version kill -9 $current_wherehows_backend_pid; ./runbackend
restart the backend service.
update frontend models:
git pull
update to latest versioncd web; gradle build; gradle dist
build frontend modelsscp target/universal/wherehows-1.0-SNAPSHOT.zip cloudera@$VM_HOST:~/wherehows/
cp the zip file to your VM.- On your VM:
cd wherehows; rm -rf wherehows-1.0-SNAPSHOT; unzip wherehows-1.0-SNAPSHOT.zip
update to newest version kill -9 $current_wherehows_frontend_pid; ./runfrontend
restart the backend service.
Appendix: Check all service running on VM correctly
To make sure all the Hadoop services are running, refer to /root/bin/enable-all-hadoop-service.sh
chkconfig mysqld on
chkconfig hadoop-hdfs-namenode on
chkconfig hadoop-yarn-resourcemanager on
chkconfig hadoop-hdfs-secondarynamenode on
chkconfig hadoop-yarn-nodemanager on
chkconfig hadoop-hdfs-datanode on
chkconfig hadoop-mapreduce-historyserver on
chkconfig hue on
chkconfig oozie on
chkconfig hive-metastore on
chkconfig hive-server2 on
chkconfig zookeeper-server on
chkconfig azkaban on
chkconfig wherehows-frontend on
chkconfig cloudera-scm-agent off
chkconfig cloudera-scm-server off