Data Engineering
Python – Pass by object: Practical pitfall
Inside a loop, I was accessing an object within a dictionary multiple times, transforming and visualizing it. The intention was to keep all transformations isolated from each other. What actually happened, though, was that those transformations accumulated because of Python’s…
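The excerpt is cut off, so the exact cause the post describes is not shown here. As a minimal sketch, assuming the object in the dictionary is mutable and the transformation changes it in place, the accumulation can be reproduced like this (the `datasets` dictionary and the `double_in_place` function are hypothetical names used only for illustration):

```python
import copy

# Hypothetical data: a dictionary holding a mutable object (a list).
datasets = {"sales": [1, 2, 3]}


def double_in_place(values):
    """Hypothetical transformation that mutates the list it receives."""
    for i, v in enumerate(values):
        values[i] = v * 2
    return values


# Pitfall: each iteration fetches the *same* list object from the dict,
# so every transformation builds on the previous one.
accumulated = []
for _ in range(3):
    result = double_in_place(datasets["sales"])
    accumulated.append(list(result))  # snapshot of the current state

# One possible way to isolate iterations: transform a deep copy instead.
datasets = {"sales": [1, 2, 3]}
isolated = []
for _ in range(3):
    result = double_in_place(copy.deepcopy(datasets["sales"]))
    isolated.append(result)

print(accumulated)  # [[2, 4, 6], [4, 8, 12], [8, 16, 24]]
print(isolated)     # [[2, 4, 6], [2, 4, 6], [2, 4, 6]]
```

`copy.deepcopy` is only one option; for flat lists a shallow copy would be enough, and a transformation that returns a new object avoids the problem without copying.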
Duplicate Keys when Generating JSON from a Dictionary in Python
TLDR: JSON treats all object keys as strings, while a Python dict distinguishes keys not only by their content but also by their datatype (see Stack Overflow). When saving a dictionary to JSON and reloading the dictionary from it, you…
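A small sketch of the behavior described above, using only the standard-library `json` module. The dictionary contents are made up for illustration; the point is that the integer key `1` and the string key `"1"` are distinct in Python but map to the same key once serialized:

```python
import json

# Two distinct keys in Python: the integer 1 and the string "1".
original = {1: "int key", "1": "str key"}

# json.dumps converts the integer key to a string, so the output
# contains the key "1" twice.
encoded = json.dumps(original)
print(encoded)  # {"1": "int key", "1": "str key"}

# On reload the duplicate keys collapse; the last value wins.
reloaded = json.loads(encoded)
print(reloaded)  # {'1': 'str key'}
```

One possible safeguard is to compare `len(original)` with `len(reloaded)` after a round trip, or to normalize all keys to strings before writing.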
How To Create A Superset Guest Token With Python To Embed Dashboards
The ultimate goal is to embed a Superset dashboard into, e.g., a React application. One step toward this is the creation of guest tokens (service accounts). This process is (in my opinion) not sufficiently well documented, which is why…
Airflow – Fill Dagbag takes too long
TLDR: It is possible to dynamically create DAGs with only one DAG script. However, at task execution the original DAG script is parsed once again. This results in unnecessary parsing iterations of DAGs which are not the parent DAG…
Migrating existing OCI Kubernetes to VCN-Native Cluster with Terraform
Your OCI Kubernetes cluster might show a little tooltip that states “migration required”. This is because “in earlier releases (before March 16, 2021), Container Engine for Kubernetes provisioned clusters with Kubernetes API endpoints that were not integrated into your…
Using pushdataset in PowerBI to create near real time logging dashboard
Recently I participated in a hackathon in which the goal was to create a near-real-time monitoring dashboard using Microsoft Power BI. The data was already generated and persisted in SQL Server and needed to be queried efficiently. Since I am…
Reminder to update statistics
The other day we had a moderately complex query which involved around 270,000 rows but ran for over an hour. After updating the statistics, the query finished in only 4 seconds. The query looked like this: WITH input_rows as (…
Kubernetes pod stuck in pending status. Nodes had no available volume
Observation: A deployed pod is stuck in Pending status. Describing the pod gives the following warning: Warning FailedScheduling [..] 0/3 nodes are available: 3 node(s) had no available volume zone. What happened: We already had a PVC (PVC_A), which we…
Airflow tasks do not run at specified time as scheduled
We observed a problem where DAGs did not run at the specified time at all but consistently started at a seemingly random time. Let’s dig into it. Expected Behavior: We have a job chain of three DAGs which are scheduled for…
How to attach block volume to VM instance in OCI
As a lot of cloud providers now offer a free tier of their products, I wanted to run some tests on Oracle Cloud Infrastructure (OCI) and ran into some actually trivial problems around how to increase the volume of…