Monday, February 27, 2012

Box and Whiskers Plot using R and BIP

 I’ve just started experimenting with R in conjunction with BI-Publisher. R provides various statistical packages and plotting utilities and there are numerous resources on the web dedicated to it. The goal is to take a generate a plot using R and render it in BI-Publisher. The installation steps for R enterprise are covered in the Oracle® R Enterprise User's Guide, the main steps are: 


  • Install the R server ((LINUX/UNIX)

To connect to R enterprise from R client run the following:
ore.connect(user = "",sid = "",host = "",password = "",port = PORTNUMBER)
# Synchronize R with user's schema in Oracle
ore.sync()
# Attach environment containing visible Oracle tables and views
# to the search path
ore.attach()
Modify (user, sid, host, password, and port) for the database where the R Server is installed

  • Create the R script

As an example, I’ll generate a box and whiskers plot using the ONTIME_S airline on-time performance data. Box and whiskers plots are not available out of the box in BI-Publisher.  The plot will use a subset of the dataset to focus on the DMV Airports: Reagan Airport (DCA), Dulles Airport (IAD) and Baltimore Washington airport (BWI) for 2007 and 2008.
Below is the R script that generates the plot:

f <-function(){
  ontime <- ONTIME_S
  delay <- ontime$ARRDELAY[ontime$DEST %in%  c("BWI", "IAD", "DCA") & ontime$YEAR %in% c(2007,2008)]
  dest <- ontime$DEST[ontime$DEST %in%  c("BWI", "IAD", "DCA") & ontime$YEAR %in% c(2007,2008)]
  dest <- reorder(dest, delay, FUN = median)
  bd <- split(delay, dest)
  res <- boxplot(bd, ylim=c(-50,60),col=c( "darkolivegreen","coral3","cornflowerblue"),main="Flight Delays by Airport",  xlab="Airport", ylab="Delay (minutes)")
}
f()




The R script has to be modified a bit to "escape" the "&" sign (in red), if not ontime will be considered a prompted parameter. This script can be run using SQL*Plus.
 BEGIN
sys.rqScriptCreate('DMVAirports',
'f <-function(){
  ontime <- ONTIME_S
  delay <- ontime$ARRDELAY[ontime$DEST %in%  c("BWI", "IAD", "DCA") '||'&'||' ontime$YEAR %in% c(2007,2008)]
  dest <- ontime$DEST[ontime$DEST %in%  c("BWI", "IAD", "DCA") '||'&'||' ontime$YEAR %in% c(2007,2008)]
  dest <- reorder(dest, delay, FUN = median)
  bd <- split(delay, dest)
  res <- boxplot(bd, ylim=c(-50,60),col=c( "darkolivegreen","coral3","cornflowerblue"),main="Flight Delays by Airport",  xlab="Airport", ylab="Delay (minutes)")
}  ');
END;
/

  • Create a BI-Publisher data model using the following query:


select value from table(rqEval(NULL,'XML','DMVAirports'))



You can list all the available R functions in the database by running:
select name from  sys.rq_scripts

  • Change the VALUE field from CLOB to XML
  • Generate some sample XML
  • Create a new template using BI-Publisher Desktop and insert the field img with the following properties:  

   


  •  Create a PDF output et voila!!!



Too bad I live 10 mins from the airport with the worse performance!

For more information, tutorials visit:

a+
Fiston