Reading Data From RDBMS Database using Apache Spark (with Python)
|
#We need to add ojdbc6 to our your_spark_home_path/jars.
#We have used oracle default table HR.EMPLOYEES
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
.getOrCreate()
empDF = spark.read \
.format("jdbc") \
.option("url", "jdbc:oracle:thin:SYSTEM/oracle@//localhost:1521/xe") \
.option("dbtable", "HR.EMPLOYEES") \
.option("user", "SYSTEM") \
.option("password", "oracle") \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.load()
empDF.printSchema()
empDF.show()
#How to use join
query = "(select
employee_id,first_name,Last_name from HR.EMPLOYEES, HR.departments where
HR.EMPLOYEES.department_id = HR.departments.department_id) emp"
empDF1 = spark.read \
.format("jdbc") \
.option("url", "jdbc:oracle:thin:SYSTEM/oracle@//localhost:1521/xe") \
.option("dbtable",
query) \
.option("user", "SYSTEM") \
.option("password", "oracle") \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.load()
empDF1.printSchema()
empDF1.show()
|
Output:
|
19/04/19 08:15:09 WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR,
use setLogLevel(newLevel).
19/04/19 08:15:11 WARN Utils: Service 'SparkUI' could not bind on
port 4040. Attempting port 4041.
root
|-- EMPLOYEE_ID:
decimal(6,0) (nullable = true)
|-- FIRST_NAME: string
(nullable = true)
|-- LAST_NAME: string
(nullable = true)
|-- EMAIL: string
(nullable = true)
|-- PHONE_NUMBER: string
(nullable = true)
|-- HIRE_DATE: timestamp
(nullable = true)
|-- JOB_ID: string
(nullable = true)
|-- SALARY: decimal(8,2)
(nullable = true)
|-- COMMISSION_PCT:
decimal(2,2) (nullable = true)
|-- MANAGER_ID:
decimal(6,0) (nullable = true)
|-- DEPARTMENT_ID:
decimal(4,0) (nullable = true)
+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
|EMPLOYEE_ID| FIRST_NAME| LAST_NAME| EMAIL|PHONE_NUMBER| HIRE_DATE| JOB_ID|
SALARY|COMMISSION_PCT|MANAGER_ID|DEPARTMENT_ID|
+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
| 100| Steven| King|
SKING|515.123.4567|2003-06-17 00:00:00| AD_PRES|24000.00| null| null| 90|
| 101| Neena|
Kochhar|NKOCHHAR|515.123.4568|2005-09-21 00:00:00| AD_VP|17000.00| null| 100| 90|
| 102| Lex|
De Haan| LDEHAAN|515.123.4569|2001-01-13 00:00:00| AD_VP|17000.00| null| 100| 90|
| 103| Alexander| Hunold| AHUNOLD|590.423.4567|2006-01-03
00:00:00| IT_PROG| 9000.00| null| 102| 60|
| 104| Bruce| Ernst|
BERNST|590.423.4568|2007-05-21 00:00:00| IT_PROG| 6000.00| null| 103| 60|
| 105| David| Austin| DAUSTIN|590.423.4569|2005-06-25
00:00:00| IT_PROG| 4800.00| null| 103| 60|
| 106| Valli|
Pataballa|VPATABAL|590.423.4560|2006-02-05 00:00:00| IT_PROG| 4800.00| null| 103| 60|
| 107| Diana|
Lorentz|DLORENTZ|590.423.5567|2007-02-07 00:00:00| IT_PROG| 4200.00| null| 103| 60|
| 108| Nancy|
Greenberg|NGREENBE|515.124.4569|2002-08-17 00:00:00| FI_MGR|12008.00| null| 101| 100|
| 109| Daniel| Faviet| DFAVIET|515.124.4169|2002-08-16
00:00:00|FI_ACCOUNT| 9000.00|
null| 108| 100|
| 110| John| Chen|
JCHEN|515.124.4269|2005-09-28 00:00:00|FI_ACCOUNT| 8200.00| null| 108| 100|
| 111| Ismael|
Sciarra|ISCIARRA|515.124.4369|2005-09-30 00:00:00|FI_ACCOUNT|
7700.00| null| 108| 100|
| 112|Jose
Manuel| Urman| JMURMAN|515.124.4469|2006-03-07
00:00:00|FI_ACCOUNT| 7800.00|
null| 108| 100|
| 113| Luis| Popp|
LPOPP|515.124.4567|2007-12-07 00:00:00|FI_ACCOUNT| 6900.00| null| 108| 100|
| 114| Den|
Raphaely|DRAPHEAL|515.127.4561|2002-12-07 00:00:00| PU_MAN|11000.00| null| 100| 30|
| 115| Alexander| Khoo|
AKHOO|515.127.4562|2003-05-18 00:00:00| PU_CLERK| 3100.00| null| 114| 30|
| 116| Shelli| Baida|
SBAIDA|515.127.4563|2005-12-24 00:00:00| PU_CLERK| 2900.00| null| 114| 30|
| 117| Sigal| Tobias| STOBIAS|515.127.4564|2005-07-24
00:00:00| PU_CLERK| 2800.00| null| 114| 30|
| 118| Guy| Himuro| GHIMURO|515.127.4565|2006-11-15
00:00:00| PU_CLERK| 2600.00| null| 114| 30|
| 119|
Karen|Colmenares|KCOLMENA|515.127.4566|2007-08-10 00:00:00| PU_CLERK| 2500.00| null| 114| 30|
+-----------+-----------+----------+--------+------------+-------------------+----------+--------+--------------+----------+-------------+
only showing top 20 rows
root
|-- EMPLOYEE_ID: decimal(6,0)
(nullable = true)
|-- FIRST_NAME: string
(nullable = true)
|-- LAST_NAME: string
(nullable = true)
+-----------+-----------+----------+
|EMPLOYEE_ID| FIRST_NAME| LAST_NAME|
+-----------+-----------+----------+
| 100| Steven| King|
| 101| Neena|
Kochhar|
| 102| Lex|
De Haan|
| 103| Alexander| Hunold|
| 104| Bruce| Ernst|
| 105| David| Austin|
| 106| Valli| Pataballa|
| 107| Diana|
Lorentz|
| 108| Nancy| Greenberg|
| 109| Daniel| Faviet|
| 110| John| Chen|
| 111| Ismael|
Sciarra|
| 112|Jose
Manuel| Urman|
| 113| Luis| Popp|
| 114| Den|
Raphaely|
| 115| Alexander| Khoo|
| 116| Shelli| Baida|
| 117| Sigal| Tobias|
| 118| Guy| Himuro|
| 119| Karen|Colmenares|
+-----------+-----------+----------+
only showing top 20 rows
SUCCESS: The process with PID 33856 (child process of PID 4676)
has been terminated.
SUCCESS: The process with PID 4676 (child process of PID 18036)
has been terminated.
SUCCESS: The process with PID 18036 (child process of PID 11316)
has been terminated.
|



0 comments:
Post a Comment