New Plugin to Refresh Data Flows and Data Sets



As you may know, Oracle is one of the largest companies to have recently focused on moving to the cloud; in fact, the cloud is its priority in its bid to return to the Leaders quadrant of Gartner’s Magic Quadrant for BI. One of its most widely used tools is Oracle Analytics Cloud, known as OAC.

OAC is a visualization tool with the same functionality as Oracle Data Visualization Desktop (Oracle DVD), but embedded in a cloud solution. As with many products, some specific features could still be enhanced, and others don’t exist yet.

In May 2017 Oracle developed a brand new plugin to automatically refresh the data used in DV projects in both OAC and DVD. However, it could not refresh data sets created by a data flow. At synvert we have enhanced the plugin so that it also refreshes the data flows that the project’s data sources depend on. We are excited to share this enhancement and its benefits, which directly improve the user experience.

1. The original plugin

Oracle’s original plugin manages data source updates, either on demand or periodically, which makes it useful for analysing changing or streaming data. This section presents the original plugin and its main features. Remember that a plugin creates a new visualization and makes it available directly in the tool’s data discovery panel. This is the original visualization:

Figure 1: Refresh plugin – original visualization.

In the main dashboard panel there are two options for reloading the data manually: Refresh Data and Refresh Data Sets.

Figure 2: Options to refresh data manually.

The first option does the same as Refresh Data in the plugin, while the second does the same as Refresh Data Sources. To see the difference, imagine that a single XLS file feeds the project and we add extra rows to it: to make those rows available in the visualization panel, we only need Refresh Data, the plugin’s first option. If we add an extra column to the XLS file, however, the metadata changes, so we need the second option, Refresh Data Sources. To use the plugin, we only have to choose one of these two options and decide whether to refresh on demand or periodically. This makes the plugin useful for real-time analysis, or simply for adding new rows to a data source and seeing their impact on the visualization panel.

However, the plugin falls short in some common cases. For example, when a project is fed by a data set created by a data flow, the data flow has to run before the data source can be refreshed. With the original plugin, that’s not going to happen!

As we needed a way to automate refreshing the data flows behind a data source, we developed a new plugin that extends Oracle’s.

2. Our customized solution

In this section we present the solution developed at synvert. It focuses on updating data flows: it detects which data flows need updating and, when the user clicks the refresh button, automatically takes the steps needed to refresh them.

When we started developing the plugin, our main objective, driven by a business request, was to execute data flows automatically from a command line. Remember that to run a data flow in OAC, the user must go to the data panel, choose Data Flows, right-click the desired data flow and run it.

Figure 3: OAC – How to run a data flow.

This process is time-consuming and even a little irritating, which is why we started working on this plugin. There are also other aspects to consider when refreshing: what if our data source is generated by a chain of data flows, so that the upstream data flows must run first for it to contain consistent data? Yes, that’s a tricky one…
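To make the ordering problem concrete, here is a minimal sketch of how the data flows behind a data source could be put into execution order. It assumes that, for each data flow, we know which other data flows produce the data sets it reads; the `DependencyMap` type, the `executionOrder` function and the data flow names are hypothetical, not OAC identifiers. The technique is a standard depth-first topological sort that also rejects circular dependencies.

```typescript
// Hypothetical model: data flow name -> names of the data flows whose output it reads.
type DependencyMap = Map<string, string[]>;

// Returns every data flow needed to refresh `target`, upstream flows first.
// Depth-first topological sort; throws if the dependencies form a cycle.
function executionOrder(target: string, deps: DependencyMap): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  const inProgress = new Set<string>();

  const visit = (flow: string): void => {
    if (done.has(flow)) return; // already placed in the order
    if (inProgress.has(flow)) {
      throw new Error(`Circular dependency involving data flow "${flow}"`);
    }
    inProgress.add(flow);
    for (const upstream of deps.get(flow) ?? []) {
      visit(upstream); // upstream flows must run first
    }
    inProgress.delete(flow);
    done.add(flow);
    order.push(flow); // added only after all of its upstream flows
  };

  visit(target);
  return order;
}

// Example: DF_Sales reads the outputs of DF_Orders and DF_Customers,
// and DF_Orders itself reads the output of DF_Customers.
const salesDeps: DependencyMap = new Map([
  ["DF_Sales", ["DF_Orders", "DF_Customers"]],
  ["DF_Orders", ["DF_Customers"]],
]);
console.log(executionOrder("DF_Sales", salesDeps)); // → ["DF_Customers", "DF_Orders", "DF_Sales"]
```

Each data flow appears once, after everything it depends on, so DF_Customers runs a single time even though two other flows need it.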

Let’s look at the types of data source that exist and the actions needed to refresh each of them:

  1. Data Set. A data set has to be refreshed manually, because the tool requires us to provide a new file. This is the only case the plugin can’t cover, as it requires manual action.
  2. Connection. Once a connection is created, it doesn’t need refreshing: each time we refresh a data set provided by the connection, the tool sends a query through the connection to retrieve the latest information.
  3. Data Flow. The process for refreshing a data flow, described above, is quite laborious.
  4. Sequence. A sequence is a process that chains multiple data flows and executes them one after another. It is executed in the same way as a data flow.
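The four cases above can be summarized as a mapping from the kind of data source to the refresh action it needs. This is an illustrative model only: the `SourceKind` and `RefreshAction` types and the `refreshActionFor` function are hypothetical names, not part of OAC, and the mapping describes the cases themselves rather than which of them a given version of the plugin handles.

```typescript
// Hypothetical labels for the four kinds of data source listed above.
type SourceKind = "dataSet" | "connection" | "dataFlow" | "sequence";

type RefreshAction =
  | { kind: "manual"; note: string } // a user must provide a new file
  | { kind: "none"; note: string }   // refreshing the data set re-queries the connection
  | { kind: "run"; target: string }; // the data flow or sequence must be executed first

function refreshActionFor(source: SourceKind, name: string): RefreshAction {
  switch (source) {
    case "dataSet":
      return { kind: "manual", note: `Provide a new file for data set "${name}"` };
    case "connection":
      return { kind: "none", note: `Connection "${name}" is queried again on refresh` };
    case "dataFlow":
    case "sequence":
      // A sequence runs its data flows one after another, so it is executed like a data flow.
      return { kind: "run", target: name };
    default: {
      // Compile-time exhaustiveness check: adding a kind without handling it fails to type-check.
      const unhandled: never = source;
      throw new Error(`Unknown source kind: ${String(unhandled)}`);
    }
  }
}
```

Only the `run` case is something a plugin can automate; the `manual` case is exactly the one the list says the plugin can’t cover.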

As mentioned, our plugin focuses only on updating data flows, so our first step towards automatic refreshing was to look at OAC’s option for scheduling data flows. As Figure 3 shows, right-clicking a data flow offers the option to schedule it. But a schedule suits a very regular data load, and that’s not our case: what happens when the data is loaded at 2 am on Monday and at 3 pm on Tuesday? What if the timing changes every week? Keeping schedules in line with that could be a nightmare!

That’s why we decided to look at how OAC runs a data flow through JavaScript commands, in order to replicate that logic and add an error handler that tells us when a run fails and why.
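The idea of replicating the run logic with an error handler can be sketched as follows. OAC’s internal JavaScript calls are not reproduced here; instead, the sketch assumes a hypothetical `runDataFlow(name)` function that returns a Promise which resolves when the run succeeds and rejects with an error when it fails. The wrapper (`runInOrder`, also a hypothetical name) runs the data flows in order, stops at the first failure and reports which data flow failed and why.

```typescript
// Hypothetical stand-in for the call that actually triggers a data flow run in OAC.
type DataFlowRunner = (name: string) => Promise<void>;

interface RunReport {
  succeeded: string[];                        // flows that finished, in order
  failed?: { flow: string; message: string }; // first flow that failed, if any
}

// Runs the flows strictly one after another and stops at the first failure.
async function runInOrder(flows: string[], runDataFlow: DataFlowRunner): Promise<RunReport> {
  const succeeded: string[] = [];
  for (const flow of flows) {
    try {
      await runDataFlow(flow); // wait for this flow before starting the next one
      succeeded.push(flow);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Do not run downstream flows: they would read stale or partial data.
      return { succeeded, failed: { flow, message } };
    }
  }
  return { succeeded };
}
```

Stopping at the first failure is a design choice: running the remaining flows would refresh the visualization with inconsistent data, whereas the report tells the user exactly which data flow to fix.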

Since the plugin is not meant only for analysing changing or streaming data, we have modified the visualization so that it refreshes only when the user wants, leaving a single button to trigger the refresh.

In addition, it doesn’t matter whether a data source has changed because a new column was added or simply because its data was updated: the plugin takes this into account and evaluates whether Refresh Data or Refresh Data Sources is required.
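A minimal sketch of that decision, assuming the plugin can read a data source’s column names before and after its data flow runs (how those names are obtained is not shown, and `chooseRefreshMode` is a hypothetical name): if the columns are the same, only the rows have changed and Refresh Data is enough; otherwise the metadata has changed and Refresh Data Sources is needed.

```typescript
type RefreshMode = "Refresh Data" | "Refresh Data Sources";

// Compares the column names of a data source before and after its data flow runs.
// Same set of columns -> only the rows changed; anything else -> the metadata changed.
function chooseRefreshMode(columnsBefore: string[], columnsAfter: string[]): RefreshMode {
  const before = new Set(columnsBefore);
  const after = new Set(columnsAfter);
  // Equal sizes plus "every old column still exists" means the two sets are identical.
  const sameColumns = before.size === after.size && columnsBefore.every((c) => after.has(c));
  return sameColumns ? "Refresh Data" : "Refresh Data Sources";
}

// Extra rows only, as in the XLS example: the columns are unchanged.
console.log(chooseRefreshMode(["Region", "Sales"], ["Region", "Sales"])); // → Refresh Data
// An extra column was added: the metadata changed.
console.log(chooseRefreshMode(["Region", "Sales"], ["Region", "Sales", "Margin"])); // → Refresh Data Sources
```

This sketch only compares column names; a column whose data type changed would also alter the metadata, so a fuller check would compare types as well.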

Our plugin’s visualization looks like this:

Figure 4: Our synvert plugin visualization.

What’s more, our custom plugin can be used in both OAC and Oracle DVD!

Conclusion

At synvert we have developed a plugin that refreshes data sources by first executing the data flows they depend on, allowing users to make sure the data they analyse is as up to date as possible.

Unfortunately, as nothing in life is ever truly perfect, there are still some features the plugin doesn’t cover, but remember that this is only our first enhancement: in coming versions we will cover the whole set of data source elements, such as sequences, and the dependencies between them.

Stay tuned for what’s new in the next versions of the plugin. We are looking forward to seeing what’s next in the Oracle portfolio, and we will keep you updated as soon as we can. We’d be happy to help with any issue related to this article, so feel free to contact us at any time.