Posts

Showing posts from May, 2022

Shell encrypt and decrypt

#!/bin/sh
# path to virtualenv
cd /home/srvc_nextgen_hitg/cp360
source cp360venv/bin/activate
echo "Virtualenv started"
cd /home/cp360/python_scripts

# Read python code
export AES_SECRET_KEY="K{;5%A5yHL&^efe-"
export apiacc="yen.why.saw-99"

encrypt_pswd=`python - <<END
import os
from AES import AES_ENCRYPT
aes_obj = AES_ENCRYPT()
encpt = aes_obj.encrypt(os.environ['apiacc'], os.environ['AES_SECRET_KEY'])
print(encpt)
END`

echo "------encrypt_pswd------------"
echo $encrypt_pswd
echo "-------------------------------"

export encrypt_pswd=$encrypt_pswd

decrypt_pswd=`python - <<END
import os
from AES import AES_ENCRYPT
aes_obj = AES_ENCRYPT()
dencpt = aes_obj.decrypt(os.environ['encrypt_pswd'], os.environ['AES_SECRET_KEY'])
print(dencpt)
END`

echo "------decrypt_pswd------------"
echo $decrypt_pswd
echo "-------------------------------"
#############################...
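The core pattern in the script above is passing secrets to an inline Python snippet through environment variables rather than command-line arguments. Here is a minimal self-contained sketch of that plumbing; `base64` is only a stand-in for the author's `AES_ENCRYPT` class, which is not shown in the post.

```python
import base64
import os
import subprocess
import sys
import textwrap

# Pass the secret via the environment (not argv, so it does not appear in
# `ps` output), then read it inside an inline Python snippet fed on stdin.
env = dict(os.environ, apiacc="yen.why.saw-99")
snippet = textwrap.dedent("""\
    import base64, os
    # stand-in for AES_ENCRYPT().encrypt(...)
    print(base64.b64encode(os.environ['apiacc'].encode()).decode())
""")
result = subprocess.run([sys.executable, "-"], input=snippet, env=env,
                        capture_output=True, text=True, check=True)
encrypt_pswd = result.stdout.strip()
print(encrypt_pswd)
```

The same mechanism works for the decrypt half: export the encrypted value and read it back with `os.environ` inside the second heredoc.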

spark union issue

While doing a union we can face an issue of mixed-up records when the two DataFrames list the same columns in different orders. The fix is to select the columns in one fixed order on both sides before the union:

var colList = propObject.getProperty("colList").split(",").map(_.trim)

1. Create a DataFrame from CSV:
val df = spark.read.option("header", "true").option("delimiter", "|").option("inferSchema", "true").csv("*")

2. Get the column list:
val collist = df.columns
// collist: Array[String] = Array(BU, LEVEL, ranking)

3. Select with the head-and-tail method:
val fin = df.select(collist.head, collist.tail:_*).distinct
// fin: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [BU: string, LEVEL: string ...]

4. Apply the same head-and-tail select to the other DataFrame before the union:
val fin = df.select(colList.head, colList.tail:_*).distinct
##########################################
val df2 = spark.read.option("header", "true").option("delimiter", "|").option("inferSchema", "true").csv("*")
val collist = df2.columns
val fin = df.select(collist.head, collist.tail:_*...
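Spark is not needed to see why this works; here is a pure-Python analogy (plain dicts standing in for DataFrame rows, using the `Array(BU, LEVEL, ranking)` columns shown above): projecting every row onto one fixed column order before concatenating is what the `collist.head, collist.tail:_*` select achieves.

```python
# Two "DataFrames" with the same columns in a different physical order:
rows_a = [{"BU": "b1", "LEVEL": "l1", "ranking": 1}]
rows_b = [{"ranking": 2, "LEVEL": "l2", "BU": "b2"}]

# Fixed column order, like df.columns on the first DataFrame:
collist = ["BU", "LEVEL", "ranking"]

def select(rows, cols):
    # Equivalent of df.select(collist.head, collist.tail:_*):
    # project each row onto the same ordered column list.
    return [tuple(row[c] for c in cols) for row in rows]

# Union after projection: values land in the correct columns.
union = select(rows_a, collist) + select(rows_b, collist)
print(union)
```

On Spark 3.x, `df.unionByName(df2)` resolves columns by name rather than by position and avoids the problem directly.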

Capacity Scheduler in YARN (Hadoop)

In the post YARN in Hadoop we have already seen that it is the scheduler component of the ResourceManager that is responsible for allocating resources to running jobs. The scheduler component is pluggable in Hadoop, and there are two options: the capacity scheduler and the fair scheduler. This post talks about the capacity scheduler in YARN, its benefits, and how it can be configured in a Hadoop cluster.

Capacity scheduler

The capacity scheduler in YARN allows multi-tenancy of the Hadoop cluster, where multiple users can share one large cluster. Every organization running its own private cluster leads to poor resource utilization: an organization may provision enough resources to meet its peak demand, but that peak may not occur frequently, leaving resources idle the rest of the time. Sharing a cluster among organizations is therefore more cost effective. However, organizations are concerned ...
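As a concrete illustration of the configuration mentioned above, here is a minimal capacity-scheduler.xml sketch that splits the cluster between two hypothetical queues, prod and dev (the queue names and percentages are illustrative, not from the post):

```xml
<configuration>
  <!-- Child queues under the predefined root queue -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- Guaranteed share of cluster capacity per queue (percent) -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>40</value>
  </property>
  <!-- Cap on how far dev may grow into idle prod capacity (elasticity) -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>60</value>
  </property>
</configuration>
```

The capacity values of sibling queues must sum to 100; maximum-capacity bounds the elasticity that lets a queue borrow idle capacity from its siblings.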

cdc

select
  CONCAT(nvl(TRIM(asset.asset_number), '0'), asset.data_sou) AS cdc_key
 ,CONCAT(nvl(TRIM(asset.asset_number), '0'), asset.data_sou) AS dat_sou
 ,md5(concat(nvl(asset.asset_number, '0'), nvl(asset.dat_sou, '0'), nvl(asset.coln, '0'))) AS cdc_hash
 ,asset.*
from (
  select distinct
    CONCAT(trim(cus_nu), '-'),
    cast(NULL as string) as vendor_id,
    ACUR.*,
    'N' AS ISDELETED
  from INERMIDTE.ACUR
) ASSET
=====================================
WITH ADDITION AS (
  SELECT A.*
        ,current_timestamp AS Date_created1
        ,current_timestamp AS Date_updated1
        ,'N' AS ISDELETED1
  FROM precdc.asset A
  LEFT OUTER JOIN processed.asset B
    ON (A.cdc_key = B.cdc_key)
  WHERE B.data_sou = '1002'
    AND (B.cdc_hash IS NULL)
)
,DELETION AS (
  SELECT B.*
        ,current_timestamp AS Date_created1
        ,current_timestamp AS Date_updated1
        ,'Y' AS ISDELETED1
  FROM precdc.asset A
  RIGHT OUTER JOIN processed.asset B
    ON (A.cdc_key = B.cdc_key)
  WHERE B.data_sou = '1002'
    AND (B.cdc_hash IS ...
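The change-data-capture idea behind these queries can be sketched in plain Python: build an md5 hash over the nvl'd columns, then compare the incoming (precdc) snapshot with the processed one to classify rows as additions or deletions. The keys, column names, and sample rows below are simplified stand-ins, not the post's real data.

```python
import hashlib

def cdc_hash(row, cols):
    # concat(nvl(col, '0'), ...) then md5, mirroring the SQL expression
    joined = "".join("0" if row.get(c) is None else str(row[c]) for c in cols)
    return hashlib.md5(joined.encode()).hexdigest()

cols = ["asset_number", "data_sou", "coln"]

# Incoming snapshot keyed by cdc_key (hypothetical sample rows):
precdc = {
    "A1-1002": {"asset_number": "A1", "data_sou": "1002", "coln": "x"},
    "A2-1002": {"asset_number": "A2", "data_sou": "1002", "coln": "y"},
}
# Previously processed snapshot:
processed = {
    "A1-1002": {"asset_number": "A1", "data_sou": "1002", "coln": "x"},
    "A3-1002": {"asset_number": "A3", "data_sou": "1002", "coln": "z"},
}

# ADDITION: incoming rows with no matching key, or whose hash changed
# (the left join with a NULL/mismatched cdc_hash on the processed side).
additions = [k for k in precdc
             if k not in processed
             or cdc_hash(processed[k], cols) != cdc_hash(precdc[k], cols)]

# DELETION: processed rows that vanished from the incoming snapshot
# (the right join with a NULL left side).
deletions = [k for k in processed if k not in precdc]

print(additions, deletions)
```

Rows present on both sides with an unchanged hash are skipped entirely, which is what makes the hash comparison cheaper than comparing every column.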