Posts

Shell encrypt and decrypt

   #!/bin/sh
   # path to virtualenv
   cd /home/srvc_nextgen_hitg/cp360
   source cp360venv/bin/activate
   echo "Virtualenv started"
   cd /home/cp360/python_scripts

   # Read python code
   export AES_SECRET_KEY="K{;5%A5yHL&^efe-"
   export apiacc="yen.why.saw-99"

   encrypt_pswd=`python - <<END
   import os
   from AES import AES_ENCRYPT
   aes_obj = AES_ENCRYPT()
   encpt = aes_obj.encrypt(os.environ['apiacc'], os.environ['AES_SECRET_KEY'])
   print(encpt)
   END`

   echo "------encrypt_pswd------------"
   echo $encrypt_pswd
   echo "-------------------------------"

   export encrypt_pswd=$encrypt_pswd

   decrypt_pswd=`python - <<END
   import os
   from AES import AES_ENCRYPT
   aes_obj = AES_ENCRYPT()
   dencpt = aes_obj.decrypt(os.environ['encrypt_pswd'], os.environ['AES_SECRET_KEY'])
   print(dencpt)
   END`

   echo "------decrypt_pswd------------"
   echo $decrypt_pswd
   echo "-------------------------------"
   #############################...
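The AES module used above is a custom class that isn't shown. As a minimal runnable sketch of the same pattern — secrets handed to Python through environment variables and round-tripped through an encrypt/decrypt object — the class below substitutes base64 encoding for the real AES_ENCRYPT implementation; base64 is not encryption and is only used to keep the sketch self-contained.

```python
import base64
import os

# Stand-in for the custom AES_ENCRYPT class from the script above.
# base64 is NOT encryption; a real implementation would use the key with AES.
class FakeCipher:
    def encrypt(self, plaintext, key):
        # `key` is accepted to mirror the real interface, but unused here.
        return base64.b64encode(plaintext.encode()).decode()

    def decrypt(self, ciphertext, key):
        return base64.b64decode(ciphertext.encode()).decode()

# Secrets are passed via environment variables, as in the shell script.
os.environ["apiacc"] = "yen.why.saw-99"
os.environ["AES_SECRET_KEY"] = "K{;5%A5yHL&^efe-"

cipher = FakeCipher()
token = cipher.encrypt(os.environ["apiacc"], os.environ["AES_SECRET_KEY"])
restored = cipher.decrypt(token, os.environ["AES_SECRET_KEY"])
print(restored)  # round-trips back to the original value
```

The point of the pattern is that the secret never appears on a command line (visible in `ps`); only the environment carries it into the inline Python block.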

spark union issue

 While doing a union we will face an issue with mixed / mismatched record columns.

 var colList = propObject.getProperty("colList").split(",").map(_.trim)

 or

 1. Create a DataFrame from CSV:
 val df = spark.read.option("header", "true").option("delimiter", "|").option("inferSchema", "true").csv("*")

 2. Get the column list:
 val collist = df.columns
 collist: Array[String] = Array(BU, LEVEL, ranking)

 3. Select with the head-and-tail method:
 val fin = df.select(collist.head, collist.tail:_*).distinct
 fin: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [BU: string, LEVEL: string

 4. Using head and tail against the property-file list:
 val fin = df.select(colList.head, colList.tail:_*).distinct

 ##########################################

 val df2 = spark.read.option("header", "true").option("delimiter", "|").option("inferSchema", "true").csv("*")
 val collist = df2.columns
 val fin = df.select(collist.head, collist.tail:_*...
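The root cause is that a DataFrame union matches columns by position, not by name, so both sides must be projected through one fixed column list (the `select(cols.head, cols.tail:_*)` trick above) before the union. A small plain-Python illustration of the same idea, with hypothetical data and no Spark required:

```python
# Two "DataFrames" as lists of dicts with the same columns in different order.
df1_rows = [{"BU": "b1", "LEVEL": "l1", "ranking": 1}]
df2_rows = [{"ranking": 2, "BU": "b2", "LEVEL": "l2"}]

# Fixed reference column list, like collist = df.columns in the Spark code.
collist = ["BU", "LEVEL", "ranking"]

def to_tuples(rows, cols):
    # Equivalent of df.select(cols.head, cols.tail:_*): project every row
    # into the same column order before a positional union.
    return [tuple(row[c] for c in cols) for row in rows]

unioned = to_tuples(df1_rows, collist) + to_tuples(df2_rows, collist)
print(unioned)
```

Without the projection, the second row's values would land under the wrong columns — the "mix-match records" issue the note describes.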

capacity-scheduler-in-yarn-hadoop

Capacity Scheduler in YARN

In the post YARN in Hadoop we have already seen that it is the scheduler component of the ResourceManager that is responsible for allocating resources to the running jobs. The scheduler component is pluggable in Hadoop, and there are two options: the capacity scheduler and the fair scheduler. This post talks about the capacity scheduler in YARN, its benefits, and how the capacity scheduler can be configured in a Hadoop cluster.

Capacity scheduler

The capacity scheduler in YARN allows multi-tenancy of the Hadoop cluster, where multiple users can share one large cluster. Every organization having its own private cluster leads to poor resource utilization: an organization may provision enough resources to meet its peak demand, but that peak demand may not occur very frequently, leaving resources underutilized the rest of the time. Sharing a cluster among organizations is therefore a more cost-effective idea. However, organizations are concerned ...
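As a sketch of what that configuration looks like, queues and their capacities are declared in capacity-scheduler.xml; the queue names (prod, dev) and percentages below are illustrative only, not taken from the post:

```xml
<!-- Illustrative capacity-scheduler.xml fragment: two queues under root. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
```

The capacities of the queues under a parent must sum to 100; each organization (queue) is guaranteed its share while idle capacity can be borrowed by busier queues.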

cdc

 select
   CONCAT(nvl(TRIM(asset.asset_number),'0'), asset.data_sou) AS cdc_key
   ,CONCAT(nvl(TRIM(asset.asset_number),'0'), asset.data_sou) AS dat_sou
   ,md5(concat(nvl(asset.asset_number,'0'), nvl(asset.dat_sou,'0'), nvl(asset.coln,'0'))) AS cdc_hash
   ,asset.*
 from (
   select distinct
     CONCAT(trim(cus_nu),'-'),
     cast(NULL as string) as vendor_id,
     ACUR.*,
     'N' ISDELETED
   from INERMIDTE.ACUR
 ) ASSET

 =====================================

 WITH ADDTION AS (
   SELECT A.*
     ,current_timestamp AS Date_created1
     ,current_timestamp AS Date_updated1
     ,'N' AS ISDELTED1
   FROM precdc.asset A
   left outer join processed.asset B ON (A.cdc_key = B.cdc_key)
   WHERE B.data_sou = '1002'
     AND (B.cdc_hash IS Null)
 )
 ,DELETION AS (
   SELECT B.*
     ,current_timestamp AS Date_created1
     ,current_timestamp AS Date_updated1
     ,'Y' AS ISDELTED1
   FROM precdc.asset A
   right outer join processed.asset B ON (A.cdc_key = B.cdc_key)
   WHERE B.data_sou = '1002'
     AND (B.cdc_hash IS ...
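The CDC pattern above keys change detection off an md5 hash of the concatenated columns, with NULLs defaulted via nvl. A minimal Python sketch of the same idea, with hypothetical column values:

```python
import hashlib

def cdc_hash(*cols):
    # NULLs (None) are defaulted to '0' before hashing, mirroring nvl(col, '0').
    joined = "".join("0" if c is None else str(c) for c in cols)
    return hashlib.md5(joined.encode()).hexdigest()

old_row = ("A-100", "1002", "x")
new_row = ("A-100", "1002", "y")  # one column changed

# Equal hashes mean the row is unchanged; a differing hash flags an update.
changed = cdc_hash(*old_row) != cdc_hash(*new_row)
print(changed)
```

Comparing one hash per row is cheaper than comparing every column pair-wise, which is why the query materializes cdc_hash alongside cdc_key.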

hive sqoop increment

create database ccdm_mstr;
use ccdm_mstr;

create table clm_cvs_fact(
  clm_cvs_fact_ket varchar(30),
  dw_cret_aud_key int(25),
  dw_updt_aud_key int(30));

insert into clm_cvs_fact (clm_cvs_fact_ket, dw_cret_aud_key, dw_updt_aud_key)
values ('123456700','12347771', '12347771');
insert into clm_cvs_fact (clm_cvs_fact_ket, dw_cret_aud_key, dw_updt_aud_key)
values ('123456701','12347772', '12347772');

select * from clm_cvs_fact;

output:
+------------------+-----------------+-----------------+
| clm_cvs_fact_ket | dw_cret_aud_key | dw_updt_aud_key |
+------------------+-----------------+-----------------+
| 123456700        |        12347771 | 12347771        |
| 123456701        |        12347772 | 12347772        |
+------------------+-----------------+-----------------+

[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://localhost/ccdm_mstr --usern...
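The sqoop command above is cut off. A typical incremental append import against a table like this might look like the following sketch — the username, target directory, --check-column, and --last-value values are illustrative, not recovered from the original note:

```shell
# Illustrative sqoop incremental import (values are assumptions, not from the post):
# pull only rows whose dw_updt_aud_key is greater than the last imported value.
sqoop import \
  --connect jdbc:mysql://localhost/ccdm_mstr \
  --username cloudera -P \
  --table clm_cvs_fact \
  --target-dir /user/cloudera/clm_cvs_fact \
  --incremental append \
  --check-column dw_updt_aud_key \
  --last-value 12347771 \
  -m 1
```

On completion sqoop prints the new --last-value to use for the next run; saving it (or using a sqoop saved job) is what makes the import incremental.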

incrementally update

Incrementally update an imported table In  CDP Private Cloud Base , updating imported tables involves importing incremental changes made to the original table using Sqoop and then merging changes with the tables imported into Hive. After ingesting data from an operational database to Hive, you usually need to set up a process for periodically synchronizing the imported table with the operational database table. The base table is a Hive-managed table that was created during the first data ingestion. Incrementally updating Hive tables from operational database systems involves merging the base table and change records to reflect the latest record set. You create the incremental table as a Hive external table, typically from CSV data in HDFS, to store the change records. This external table contains the changes (INSERTs and UPDATEs) from the operational database since the last data ingestion. Generally, the table is partitioned and only the latest partition is updated, making this pro...
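The reconcile step described above — base table plus change records, with the latest record per key winning — can be sketched in plain Python, with hypothetical record shapes:

```python
# Base table rows and incremental change records, keyed by id;
# each row carries a modification timestamp (higher = newer).
base = {1: {"id": 1, "val": "a", "ts": 10},
        2: {"id": 2, "val": "b", "ts": 10}}
changes = [{"id": 2, "val": "b2", "ts": 20},   # UPDATE to an existing row
           {"id": 3, "val": "c",  "ts": 20}]   # INSERT of a new row

merged = dict(base)
for row in changes:
    cur = merged.get(row["id"])
    # Keep the newer record per key, mirroring the reconciliation view.
    if cur is None or row["ts"] > cur["ts"]:
        merged[row["id"]] = row

print(sorted(r["val"] for r in merged.values()))
```

In Hive this same logic is usually expressed as a view over the union of the base and incremental tables, ranking rows per key by the modification timestamp and keeping rank 1.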

sed, awk, grep

1. Print lines 1-6 with p:
cat demo.txt | sed -n '1,6p'
2. Print everything except lines 6-10 with d (no -n here, or nothing is printed):
cat demo.txt | sed '6,10d'
3. Print multiple ranges with -e:
cat demo.txt | sed -n -e '6,10p' -e '10,13p'
4. Replace a string globally:
cat demo.txt | sed "s/oldword/newword/g"
Ignore character case (GNU sed):
cat demo.txt | sed "s/oldword/newword/gI"
5. Squeeze runs of blanks to a single space:
cat demo.txt | sed 's/  */ /g'
6. Replace only within lines 10-15:
cat demo.txt | sed '10,15 s/oldwrd/new/g'
7. Delete lines 6-10 with d:
cat demo.txt | sed '6,10d'
8. Keep only lines 6-10 (delete all the others) with !d:
cat demo.txt | sed '6,10!d'

Example 1) Displaying partial text of a file
With sed, we can view only some part of a file rather than seeing the whole file. To see some lines of the file, use the following command:
[linuxtechi@localhost ~]$ sed -n 22,29p testfile.txt
Here, option 'n' suppresses printing of the whole file & option 'p' wi...
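A quick runnable check of the range commands above, using a generated demo file (the file name and contents are just for the demonstration):

```shell
# Build a 12-line demo file, one number per line.
seq 1 12 > demo.txt

# Print only lines 1-3 (-n suppresses default output, p prints the range).
sed -n '1,3p' demo.txt

# Print everything EXCEPT lines 4-9 (d deletes the range; note: no -n).
sed '4,9d' demo.txt
```

Combining -n with d is the usual mistake: -n suppresses the auto-print and d produces none of its own, so the pipeline emits nothing.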