In production data environments, auditing is key to tracking user executions and accesses, especially on highly secure platforms where you need to know who is executing what, when, and how.

In the context of a KNIME Server deployment for a customer with strict security standards, we developed a solution for auditing KNIME a few years ago, which we can now tag as KNIME Auditing Version 1. In this blog post, we are going to explain KNIME Auditing Version 2, an updated solution that has replaced and improved the previous approach.

KNIME Auditing Version 1 – The Previous Solution with Open-Source KNIME Plugins

KNIME Auditing Version 1 is fully explained in the blog post Adding Auditing Capabilities to KNIME with open-source plugins. In short, this approach consisted of adding two open-source KNIME plugins to each KNIME Executor VM to extend the KNIME Executor logging capabilities and to send audit events as XMLs to an ActiveMQ (AMQ) queue. For each executed node in a KNIME job, four audit events were sent to detail the following: when the node started, its execution status on finishing, its dependencies on other nodes, and the parameters used for execution.

The decision to develop two KNIME plugins, rather than a Python script to process the KNIME log file, was driven by three key reasons, listed below and elaborated on in the above-mentioned blog post:

  1. The information available in the log is limited. The first plugin (Logging Extended) solved this.
  2. The log file requires a parsing tool that runs in the background and converts the log lines into audit events for the audit system. The second plugin (AMQ Logger) processed the log lines and sent the audit events directly from the JVM, even before the lines were written to the log file, thus eliminating the need to parse the log file.
  3. The log file is owned by the user or service account running KNIME. In the KNIME Executor, the job owner also owns the logs, making them susceptible to changes and unreliable for auditing. The combination of the two plugins meant that we no longer needed to parse the log, so it did not matter if it could be tampered with.

While that solution worked and initially satisfied the customer's needs, we encountered a volume problem while handling the events sent to the audit system. Sending four events for each node basically meant that complex workflows, or workflows containing loops, were generating hundreds or even thousands of events, flooding the entire audit server with hard-to-track information.

KNIME Auditing Version 2 – Solving the Volume Problem & Simplifying the Approach

To solve the volume problem and simplify the solution, we have designed a new audit solution for KNIME Server. Rather than sending four audit events for each node executed in a KNIME job, the new solution sends just one audit event per job execution, drastically reducing the number of audit events sent. Moreover, for each executed job, instead of sending detailed information about the entire executed workflow to the audit system, we send only basic job information in the audit event.

More extended information is stored in an audit folder on the network share used for the KNIME workflow repository. This folder includes the workflow and job execution details, as well as the workflow configuration files in KNWF format. The audit folder is only accessible to admins and auditors.

This new approach not only alleviates the audit server load by sending only job information (and a link to the audit folder for more details if needed), but also allows administrators and auditors to recreate the executed workflow in the KNIME UI with the exact same configuration, thanks to the KNWF file in the audit folder. This is possible even if the user has deleted the job or the workflow in the KNIME Server repository.

Note that we have implemented this solution with Python, which in principle contradicts the three reasons that justified our previous solution, listed above. However, there is an explanation for each point:

  • With reference to point 1 from above, we developed the previous solution three years ago, and since then KNIME has added more features. One of these features is the KNIME Server REST API, which can be used to extract job and workflow information (including downloading a copy of the workflow), so there is no need for extra plugins to add more information to the logs.
  • Point 2 remains valid, as developing some Python logic to tail the log file and send the audit event was necessary. However, our customer realised that a Python application is easier to maintain than two custom KNIME plugins written in Java.
  • Regarding point 3, in our new approach we use the Tomcat server log, which is not accessible from the KNIME Executor and therefore completely inaccessible to users, ensuring that the log file cannot be tampered with.

In the following section we'll explain in more detail how this Python solution was developed to extract the job ID from the Tomcat logs, generate the audit folder with the workflow configuration, and then send the audit event.

A Deeper Dive into KNIME Auditing Version 2

Figure 1: KNIME Auditing Version 2

KNIME Auditing Version 2 is a multithreaded Python application. One thread, the Log Reader, reads the KNIME Server logs to extract the job ID from finished jobs. It reads the new lines of the current daily log file and searches for lines containing the strings

EXECUTION_FINISHED or EXECUTION_FAILED

to identify the end of the job execution, which includes the job ID. This job ID is then sent to a thread-safe FIFO queue to be retrieved by the main thread, which is responsible for generating the backup and sending the audit event:

25-Apr-2024 16:58:53.721 INFO [https-jsse-nio2-8443-exec-23] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.loadWorkflow Loading workflow '/TeamA/TestWorkflow' for user 'victor'
25-Apr-2024 16:58:53.727 INFO [https-jsse-nio2-8443-exec-23] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.loadWorkflow Loading workflow '/TeamA/TestWorkflow (TestWorkflow 2024-04-25 16.58.53; f9ff25f9-6ac8-489b-ad8e-73c421f8b119)' via executor group 'knime-jobs'
25-Apr-2024 16:59:00.351 INFO [https-jsse-nio2-8443-exec-17] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.resetWorkflow Resetting workflow of job '/TeamA/TestWorkflow (TestWorkflow 2024-04-25 16.58.53; f9ff25f9-6ac8-489b-ad8e-73c421f8b119)' via executor group 'knime-jobs', executor '01f57cec-85de-496f-81d2-b24548e8f47e@node003.clearpeaks.com'
25-Apr-2024 16:59:00.352 INFO [https-jsse-nio2-8443-exec-17] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.execute Executing job '/TeamA/TestWorkflow (TestWorkflow 2024-04-25 16.58.53; f9ff25f9-6ac8-489b-ad8e-73c421f8b119)' (UUID f9ff25f9-6ac8-489b-ad8e-73c421f8b119)
25-Apr-2024 16:59:00.352 INFO [https-jsse-nio2-8443-exec-17] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.execute Executing job '/TeamA/TestWorkflow (TestWorkflow 2024-04-25 16.58.53; f9ff25f9-6ac8-489b-ad8e-73c421f8b119)' via executor group 'knime-jobs', executor '01f57cec-85de-496f-81d2-b24548e8f47e@node003.clearpeaks.com'
25-Apr-2024 16:59:00.390 INFO [Distributed executor message handler for Envelope(deliveryTag=740, redeliver=false, exchange=knime-executor2server, routingKey=job.f9ff25f9-6ac8-489b-ad8e-73c421f8b119)] com.knime.enterprise.server.executor.msgq.StatusMessageHandler.updateJob Job: TestWorkflow 2024-04-25 16.58.53 (f9ff25f9-6ac8-489b-ad8e-73c421f8b119) of owner victor finished with state EXECUTION_FAILED
25-Apr-2024 16:59:00.749 INFO [https-jsse-nio2-8443-exec-7] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.doSendGenericRequest Sending generic message to job '/TeamA/TestWorkflow (TestWorkflow 2024-04-25 16.58.53; f9ff25f9-6ac8-489b-ad8e-73c421f8b119)' via executor group 'knime-jobs', executor '01f57cec-85de-496f-81d2-b24548e8f47e@node003.clearpeaks.com'
25-Apr-2024 16:59:11.277 INFO [KNIME-Job-Lifecycle-Handler_1] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.discardInternal Discarding job '/Groups/tst_dev_application_knime_AP1_User/TestWorkflow (TestWorkflow 2024-04-18 16.54.05; f4660d17-5fbf-4440-a418-73772b6e86a6)' (UUID f4660d17-5fbf-4440-a418-73772b6e86a6)
25-Apr-2024 16:59:11.981 INFO [KNIME-Executor-Watchdog_1] com.knime.enterprise.server.util.ExecutorWatchdog.scheduleThreads Checking for vanished executors in 100000ms

Figure 2: Portion of a KNIME Server log during a job execution failure (notice the highlighted log line ending with EXECUTION_FAILED)
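The Log Reader logic described above can be sketched in a few lines of Python. Note that the function names, the exact regular expression, and the polling interval below are illustrative assumptions based on the log lines in Figure 2, not the actual implementation:

```python
import queue
import re
import threading
import time

# Matches the "finished with state ..." lines shown in Figure 2; the job ID
# is the UUID in parentheses before "of owner".
FINISHED_RE = re.compile(
    r"Job: .+ \((?P<job_id>[0-9a-f-]{36})\) of owner (?P<owner>\S+) "
    r"finished with state (?P<state>EXECUTION_FINISHED|EXECUTION_FAILED)"
)

def extract_job_id(line: str):
    """Return (job_id, state) if the line marks the end of a job, else None."""
    match = FINISHED_RE.search(line)
    if match:
        return match.group("job_id"), match.group("state")
    return None

def log_reader(log_path: str, job_queue: queue.Queue, stop: threading.Event):
    """Tail the Tomcat log and push finished job IDs to a thread-safe FIFO queue."""
    with open(log_path, "r", encoding="utf-8") as log:
        log.seek(0, 2)  # start at the end of the file: only new lines matter
        while not stop.is_set():
            line = log.readline()
            if not line:
                time.sleep(1)  # no new data yet, poll again shortly
                continue
            result = extract_job_id(line)
            if result:
                job_queue.put(result)
```

The main thread simply blocks on `job_queue.get()` and processes each job ID as it arrives.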

Once the job ID has been retrieved and shared with the main thread of the Python application, it is used to perform three API calls to the KNIME Server REST API to obtain all the job and workflow information:

  • GET https://<serverurl>:<port>/knime/rest/v4/jobs/{job_id}

to retrieve job information. This information is stored in a JSON file called job-summary.json.

  • GET https://<serverurl>:<port>/knime/rest/v4/jobs/{job_id}/workflow-summary?format=JSON&includeExecutionInfo=true

to retrieve the workflow information. This information is stored in a JSON file called workflow-summary.json.

  • GET https://<serverurl>:<port>/knime/rest/v4/repository/{workflow_path}:data

to download the workflow KNWF file, which can be used to recreate the workflow by importing it. The file is stored with the name {job_id}.knwf; note that the KNWF file is actually a ZIP file.
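As a rough sketch using only the standard library, the three calls could be wrapped as below; the helper names, the basic-auth scheme, and the handling of the leading slash in the workflow path are our assumptions:

```python
import base64
import urllib.parse
import urllib.request

API_ROOT = "/knime/rest/v4"

def job_summary_url(base: str, job_id: str) -> str:
    """First call: basic job information (stored as job-summary.json)."""
    return f"{base}{API_ROOT}/jobs/{job_id}"

def workflow_summary_url(base: str, job_id: str) -> str:
    """Second call: workflow details (stored as workflow-summary.json)."""
    return (f"{base}{API_ROOT}/jobs/{job_id}"
            "/workflow-summary?format=JSON&includeExecutionInfo=true")

def workflow_download_url(base: str, workflow_path: str) -> str:
    """Third call: the KNWF file itself (stored as {job_id}.knwf)."""
    path = urllib.parse.quote(workflow_path.lstrip("/"))
    return f"{base}{API_ROOT}/repository/{path}:data"

def fetch(url: str, user: str, password: str) -> bytes:
    """GET a URL with basic authentication (the auth scheme is an assumption)."""
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return response.read()
```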

Figure 3: KNIME Server REST API Swagger

The information retrieved from all the API calls is stored in the audit folder in the file system. For a KNIME Server deployment with High Availability (as is the case for our customer), we advise locating the audit folder on the same network share used as the KNIME repository, mounted as a file system on both KNIME Server nodes. The audit folder is organised by days, making it easy to apply retention policies by time if necessary. Each job execution is stored in a folder named using the job ID and a timestamp.

If the same job is executed twice, which is possible in KNIME, there will be two job folders with the same job ID but different timestamps. Moreover, to minimise the volume of the audit folder, the workflow KNWF file is unzipped and any files unnecessary for recreating the workflow with the exact same configuration are removed. Any intermediate data is also wiped during this process:

[root@node003|development|logs]# cd /knime/knimerepo/audit/
[root@node003|development|audit]# ls
20240416  20240418  20240424  20240425
[root@node003|development|audit]# cd 20240425/
[root@node003|development|20240425]# ls
2da5bf15-57ca-487f-99cc-9e67f9136be1-20240425145128  ca044773-bc26-4584-9c80-6c80c2585f6d-20240425095001
67c55db2-39e9-483f-90e7-9acfb952df07-20240425095103  f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900
9b21efa4-2b70-42b5-8206-419c16c5b32b-20240425145215
[root@node003|development|20240425]# cd f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900/
[root@node003|development|f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900]# ls
f9ff25f9-6ac8-489b-ad8e-73c421f8b119.knwf  job-summary.json  workflow-summary.json

Figure 4: Example of content stored in the audit folder
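The folder layout in Figure 4 can be reproduced with a small helper; the function name and timestamp format below are assumptions inferred from the listing above:

```python
import datetime
import pathlib

# Network-share location of the audit folder, as shown in Figure 4.
AUDIT_ROOT = pathlib.Path("/knime/knimerepo/audit")

def job_audit_dir(job_id: str, when: datetime.datetime,
                  root: pathlib.Path = AUDIT_ROOT) -> pathlib.Path:
    """Build <root>/<YYYYMMDD>/<job_id>-<timestamp>: the day folder makes
    time-based retention easy, and the timestamp suffix means re-running
    the same job never collides with an earlier execution."""
    day = when.strftime("%Y%m%d")
    return root / day / f"{job_id}-{when.strftime('%Y%m%d%H%M%S')}"
```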

In addition to the cleaning mentioned above, we also process all the settings.xml files within the KNWF to read the 'path' entries of nodes such as CSVReader, in order to extract the datasets being accessed by each workflow node. This allows the administrator to quickly identify which datasets are being queried by each user:

[root@node003|development|f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900]# unzip f9ff25f9-6ac8-489b-ad8e-73c421f8b119.knwf
Archive: f9ff25f9-6ac8-489b-ad8e-73c421f8b119.knwf
creating: TestWorkflow/
creating: TestWorkflow/.artifacts/
creating: TestWorkflow/CSV Reader (#3)/
creating: TestWorkflow/Credentials Configuration (#2)/
creating: TestWorkflow/Kerberos Initializer (#1)/
creating: TestWorkflow/tmp/
inflating: TestWorkflow/workflow.svg
inflating: TestWorkflow/workflow.knime
inflating: TestWorkflow/workflowset.meta
inflating: TestWorkflow/Kerberos Initializer (#1)/settings.xml
inflating: TestWorkflow/Credentials Configuration (#2)/settings.xml
inflating: TestWorkflow/.artifacts/workflow-configuration-representation.json
inflating: TestWorkflow/.artifacts/workflow-configuration.json
inflating: TestWorkflow/CSV Reader (#3)/settings.xml

Figure 5: Example of unzipping a cleaned KNWF file in the audit folder
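A minimal sketch of the path extraction is shown below. The sample XML in the usage note is illustrative: real KNIME settings.xml files use a namespace and deeper nesting, which is why the tag comparison ignores any namespace prefix:

```python
import xml.etree.ElementTree as ET

def extract_paths(settings_xml: str) -> list:
    """Collect the value of every <entry> element whose key is 'path'."""
    root = ET.fromstring(settings_xml)
    paths = []
    for element in root.iter():
        # Tags may be namespace-qualified, e.g. '{...}entry', so compare
        # only the local part of the tag name.
        if element.tag.split("}")[-1] == "entry" and element.get("key") == "path":
            value = element.get("value")
            if value:
                paths.append(value)
    return paths
```

For instance, feeding it a fragment such as `<config key="model"><entry key="path" type="xstring" value="/knime/data/test.csv"/></config>` would return `["/knime/data/test.csv"]`.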

Finally, an audit event constructed from the information extracted via the API calls is sent as an XML to an ActiveMQ queue. The information sent contains the ID of the user who executed the workflow, the host where the KNIME Server is running, the status of the workflow (success or failure), the job ID, the execution timestamp, the dataset paths accessed in nodes such as CSVReader, the error message if the execution failed, and the path within the audit folder where auditors and administrators can find more information about the job, including the KNWF file for recreating the workflow for this specific job execution:

2024-04-25 16:59:00,636 knime_audit INFO Encountered job f9ff25f9-6ac8-489b-ad8e-73c421f8b119
2024-04-25 16:59:00,637 knime_audit INFO Processing job: f9ff25f9-6ac8-489b-ad8e-73c421f8b119
2024-04-25 16:59:00,903 knime_audit INFO Storing f9ff25f9-6ac8-489b-ad8e-73c421f8b119 information in: /knime/knimerepo/audit/20240425/f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900
2024-04-25 16:59:00,977 knime_audit INFO Extract knwf files into: /knime/knime_audit/temp_wf/f9ff25f9-6ac8-489b-ad8e-73c421f8b119
2024-04-25 16:59:00,980 knime_audit INFO Processing all settings.xml to extract paths
2024-04-25 16:59:00,988 knime_audit INFO Remove temp folder /knime/knime_audit/temp_wf/f9ff25f9-6ac8-489b-ad8e-73c421f8b119
2024-04-25 16:59:00,989 knime_audit INFO Send audit info for job f9ff25f9-6ac8-489b-ad8e-73c421f8b119
2024-04-25 16:59:01,003 knime_audit INFO Establishing SSL connectivity with AMQ
2024-04-25 16:59:01,019 knime_audit INFO Connection established
2024-04-25 16:59:01,033 knime_audit INFO Send audit:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<auditEventList xmlns="http://www.clearpeaks.com/AuditEvent">
  <AuditEvent>
   <actor>
     <id>victor</id>
     <name>victor</name>
   </actor>
   <application>
     <component>KNIME Server</component>
     <hostName>node003.clearpeaks.com</hostName>
     <name>KNIME</name>
   </application>
     <action>
     <actionType>EXECUTION_FAILED</actionType>
     <additionalInfo name="jobId">f9ff25f9-6ac8-489b-ad8e-73c421f8b119</additionalInfo>
     <additionalInfo name="errorMessage">Execute failed: Invoking Kerberos init (kinit) failed.</additionalInfo>
     <additionalInfo name="paths">/knime/data/test.csv</additionalInfo>
     <additionalInfo name="audit_path">/knime/knimerepo/audit/20240425/f9ff25f9-6ac8-489b-ad8e-73c421f8b119-20240425165900</additionalInfo>
     <timestamp>2024-04-25T16:59:00.638955+02:00</timestamp>
   </action>
  </AuditEvent>
</auditEventList>
2024-04-25 16:59:01,035 knime_audit INFO Accepted message

Figure 6: Portion of the KNIME Auditing Version 2 log during the auditing of a failed job. Note that the log shows the content of the XML sent.
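The audit event itself can be assembled with the standard library before handing it to the messaging client; the helper below mirrors the element layout of Figure 6, although the function name and argument list are our assumptions:

```python
import xml.etree.ElementTree as ET

NS = "http://www.clearpeaks.com/AuditEvent"  # namespace from Figure 6

def build_audit_event(user, host, status, job_id, error, paths,
                      audit_path, timestamp) -> str:
    """Serialise one audit event following the structure shown in Figure 6."""
    ET.register_namespace("", NS)  # emit NS as the default namespace
    event_list = ET.Element(f"{{{NS}}}auditEventList")
    event = ET.SubElement(event_list, f"{{{NS}}}AuditEvent")

    actor = ET.SubElement(event, f"{{{NS}}}actor")
    ET.SubElement(actor, f"{{{NS}}}id").text = user
    ET.SubElement(actor, f"{{{NS}}}name").text = user

    application = ET.SubElement(event, f"{{{NS}}}application")
    ET.SubElement(application, f"{{{NS}}}component").text = "KNIME Server"
    ET.SubElement(application, f"{{{NS}}}hostName").text = host
    ET.SubElement(application, f"{{{NS}}}name").text = "KNIME"

    action = ET.SubElement(event, f"{{{NS}}}action")
    ET.SubElement(action, f"{{{NS}}}actionType").text = status
    for name, value in [("jobId", job_id), ("errorMessage", error),
                        ("paths", ",".join(paths)), ("audit_path", audit_path)]:
        info = ET.SubElement(action, f"{{{NS}}}additionalInfo", name=name)
        info.text = value
    ET.SubElement(action, f"{{{NS}}}timestamp").text = timestamp
    return ET.tostring(event_list, encoding="unicode")
```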

The XML is sent over AMQP using the Python qpid-proton client, which is well suited to sending messages to an AMQ queue, even if the server is configured with high availability and requires SSL connectivity.

KNIME Auditing Version 2 is easily configurable via a JSON configuration file. The code for this new solution, along with all the necessary instructions, can be found in this GitHub repository. It is designed to run as a background Linux service on your KNIME Server machines.

Conclusion

In this blog post, we have presented an approach to auditing jobs executed on KNIME Server (specifically on the KNIME Executors controlled by the KNIME Server) using its REST API and the KNIME Server logs. This solution replaces the previous version and is both easier to use and to maintain, whilst also avoiding flooding the auditing system with numerous events by providing fewer but more valuable pieces of information. It also protects the auditing information in case jobs are deleted, and even offers the possibility of recreating the entire workflow with the exact configuration used during its execution.

As readers familiar with KNIME might know, any KNIME Server deployment will eventually be replaced by KNIME Business Hub. Note that the auditing solution presented here is not yet compatible with KNIME Business Hub because it relies on the KNIME Server logs, which are not present on KNIME Business Hub. However, with a few minor modifications (to adapt to KNIME Business Hub logging and its REST API), the solution should also work in that new offering.

Here at ClearPeaks we pride ourselves on our extensive experience in implementing tailored solutions that meet our customers' exact standards and needs. Our team is adept at navigating and adapting to the ever-evolving landscape of data technologies, ensuring that our solutions remain cutting-edge and effective. Don't hesitate to contact us if you have any questions or if you find yourselves facing a situation similar to that described in this blog post. Our experts are ready to provide the support and insights you need to achieve your data management goals!