Thursday, April 18, 2019

Reading Data From RDBMS Database using Apache Spark (with Python)

Reading Data From RDBMS Database using Apache Spark (with  Python)

 

#We need to add ojdbc6 to our your_spark_home_path/jars.
#We have used oracle default table HR.EMPLOYEES

from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()
   
empDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:SYSTEM/oracle@//localhost:1521/xe") \
    .option("dbtable", "HR.EMPLOYEES") \
    .option("user", "SYSTEM") \
    .option("password", "oracle") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()
   
empDF.printSchema()
empDF.show()

#How to use join
   
query = "(select employee_id,first_name,Last_name from HR.EMPLOYEES, HR.departments where HR.EMPLOYEES.department_id = HR.departments.department_id) emp"   

empDF1 = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:SYSTEM/oracle@//localhost:1521/xe") \
    .option("dbtable", query) \
    .option("user", "SYSTEM") \
    .option("password", "oracle") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()         
   
empDF1.printSchema()
empDF1.show()   






 Output:

19/04/19 08:15:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/04/19 08:15:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
root
 |-- EMPLOYEE_ID: decimal(6,0) (nullable = true)
 |-- FIRST_NAME: string (nullable = true)
 |-- LAST_NAME: string (nullable = true)
 |-- EMAIL: string (nullable = true)
 |-- PHONE_NUMBER: string (nullable = true)
 |-- HIRE_DATE: timestamp (nullable = true)
 |-- JOB_ID: string (nullable = true)
 |-- SALARY: decimal(8,2) (nullable = true)
 |-- COMMISSION_PCT: decimal(2,2) (nullable = true)
 |-- MANAGER_ID: decimal(6,0) (nullable = true)
 |-- DEPARTMENT_ID: decimal(4,0) (nullable = true)

+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
|EMPLOYEE_ID| FIRST_NAME| LAST_NAME|   EMAIL|PHONE_NUMBER|          HIRE_DATE|    JOB_ID|  SALARY|COMMISSION_PCT|MANAGER_ID|DEPARTMENT_ID|
+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
|        100|     Steven|      King|   SKING|515.123.4567|2003-06-17 00:00:00|   AD_PRES|24000.00|          null|      null|           90|
|        101|      Neena|   Kochhar|NKOCHHAR|515.123.4568|2005-09-21 00:00:00|     AD_VP|17000.00|          null|       100|           90|
|        102|        Lex|   De Haan| LDEHAAN|515.123.4569|2001-01-13 00:00:00|     AD_VP|17000.00|          null|       100|           90|
|        103|  Alexander|    Hunold| AHUNOLD|590.423.4567|2006-01-03 00:00:00|   IT_PROG| 9000.00|          null|       102|           60|
|        104|      Bruce|     Ernst|  BERNST|590.423.4568|2007-05-21 00:00:00|   IT_PROG| 6000.00|          null|       103|           60|
|        105|      David|    Austin| DAUSTIN|590.423.4569|2005-06-25 00:00:00|   IT_PROG| 4800.00|          null|       103|           60|
|        106|      Valli| Pataballa|VPATABAL|590.423.4560|2006-02-05 00:00:00|   IT_PROG| 4800.00|          null|       103|           60|
|        107|      Diana|   Lorentz|DLORENTZ|590.423.5567|2007-02-07 00:00:00|   IT_PROG| 4200.00|          null|       103|           60|
|        108|      Nancy| Greenberg|NGREENBE|515.124.4569|2002-08-17 00:00:00|    FI_MGR|12008.00|          null|       101|          100|
|        109|     Daniel|    Faviet| DFAVIET|515.124.4169|2002-08-16 00:00:00|FI_ACCOUNT| 9000.00|          null|       108|          100|
|        110|       John|      Chen|   JCHEN|515.124.4269|2005-09-28 00:00:00|FI_ACCOUNT| 8200.00|          null|       108|          100|
|        111|     Ismael|   Sciarra|ISCIARRA|515.124.4369|2005-09-30 00:00:00|FI_ACCOUNT| 7700.00|          null|       108|          100|
|        112|Jose Manuel|     Urman| JMURMAN|515.124.4469|2006-03-07 00:00:00|FI_ACCOUNT| 7800.00|          null|       108|          100|
|        113|       Luis|      Popp|   LPOPP|515.124.4567|2007-12-07 00:00:00|FI_ACCOUNT| 6900.00|          null|       108|          100|
|        114|        Den|  Raphaely|DRAPHEAL|515.127.4561|2002-12-07 00:00:00|    PU_MAN|11000.00|          null|       100|           30|
|        115|  Alexander|      Khoo|   AKHOO|515.127.4562|2003-05-18 00:00:00|  PU_CLERK| 3100.00|          null|       114|           30|
|        116|     Shelli|     Baida|  SBAIDA|515.127.4563|2005-12-24 00:00:00|  PU_CLERK| 2900.00|          null|       114|           30|
|        117|      Sigal|    Tobias| STOBIAS|515.127.4564|2005-07-24 00:00:00|  PU_CLERK| 2800.00|          null|       114|           30|
|        118|        Guy|    Himuro| GHIMURO|515.127.4565|2006-11-15 00:00:00|  PU_CLERK| 2600.00|          null|       114|           30|
|        119|      Karen|Colmenares|KCOLMENA|515.127.4566|2007-08-10 00:00:00|  PU_CLERK| 2500.00|          null|       114|           30|
+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
only showing top 20 rows

root
 |-- EMPLOYEE_ID: decimal(6,0) (nullable = true)
 |-- FIRST_NAME: string (nullable = true)
 |-- LAST_NAME: string (nullable = true)

+-----------+-----------+----------+
|EMPLOYEE_ID| FIRST_NAME| LAST_NAME|
+-----------+-----------+----------+
|        100|     Steven|      King|
|        101|      Neena|   Kochhar|
|        102|        Lex|   De Haan|
|        103|  Alexander|    Hunold|
|        104|      Bruce|     Ernst|
|        105|      David|    Austin|
|        106|      Valli| Pataballa|
|        107|      Diana|   Lorentz|
|        108|      Nancy| Greenberg|
|        109|     Daniel|    Faviet|
|        110|       John|      Chen|
|        111|     Ismael|   Sciarra|
|        112|Jose Manuel|     Urman|
|        113|       Luis|      Popp|
|        114|        Den|  Raphaely|
|        115|  Alexander|      Khoo|
|        116|     Shelli|     Baida|
|        117|      Sigal|    Tobias|
|        118|        Guy|    Himuro|
|        119|      Karen|Colmenares|
+-----------+-----------+----------+
only showing top 20 rows

SUCCESS: The process with PID 33856 (child process of PID 4676) has been terminated.
SUCCESS: The process with PID 4676 (child process of PID 18036) has been terminated.
SUCCESS: The process with PID 18036 (child process of PID 11316) has been terminated.


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 comments:

Post a Comment