TidyBlocks Framework

Dragging the colors dataset block into the workspace and hitting run produces a table in the bottom right corner. This only requires three user movements: a click, a drag and drop, and another click. We can leverage these three movements to understand the basic architecture of how a pipeline of blocks within TidyBlocks runs.

In my post laying out how Blockly works, I described how each block has block code that defines its design (the data block is yellow and doesn’t accept inputs, for example) and, behind the scenes, generator code. When the user clicks the run button, the generator code string is evaluated. But what if you wanted to create a data frame that is the combination of two other data frames?

Doing this required rethinking the entire framework we use to evaluate blocks. Instead of using the run button to evaluate a single string, we needed to add a layer of abstraction.

What run does now is register all the block pipelines within the workspace, giving each one a name and a queue of functions. These names can be used for more complex operations, like combining pipeline left with pipeline right only once all the transformations in each pipeline’s queue (import the colors data, then filter) are complete.

We do all of this using the TidyBlocksManagerClass within tidyblocks/tidyblocks.js. The TidyBlocksManagerClass is the air traffic controller of all our pipelines. This class makes sure all the pipelines are complete (with names when there’s more than one pipeline in the workspace) and runs each pipeline’s functions in order. The TidyBlocksManagerClass also keeps track of whether one pipeline depends on another, waiting for the first one to finish before running the second.
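To make the air traffic controller idea concrete, here is a minimal sketch of a manager that queues ready pipelines and holds back the ones with dependencies. SketchManager and everything inside it are my own stand-ins for illustration; this is not the code in tidyblocks/tidyblocks.js, only a simplified model of the behavior described above.

```javascript
// Hypothetical stand-in for the manager, written for illustration only.
class SketchManager {
  constructor () {
    this.queue = []          // pipelines that are ready to run
    this.waiting = new Map() // pipeline function -> Set of names it still needs
    this.results = new Map() // pipeline name -> finished result
  }

  // Pipelines without dependencies are queued; the rest wait.
  register (depends, func) {
    if (depends.length === 0) {
      this.queue.push(func)
    } else {
      this.waiting.set(func, new Set(depends))
    }
  }

  // Store a finished result and release any pipeline that needed only this one.
  notify (name, result) {
    this.results.set(name, result)
    for (const [func, needed] of this.waiting) {
      needed.delete(name)
      if (needed.size === 0) {
        this.waiting.delete(func)
        this.queue.push(func)
      }
    }
  }

  getResult (name) {
    return this.results.get(name)
  }

  // Run queued pipelines in order until none are left.
  run () {
    while (this.queue.length > 0) {
      const func = this.queue.shift()
      func()
    }
  }
}

// The dependent pipeline is registered first but still runs last.
const manager = new SketchManager()
const order = []
manager.register(['left', 'right'], () => {
  order.push('join')
})
manager.register([], () => {
  order.push('left')
  manager.notify('left', [1, 2])
})
manager.register([], () => {
  order.push('right')
  manager.notify('right', [3])
})
manager.run()
console.log(order)
```

The point of the sketch is the ordering: the pipeline that depends on left and right only reaches the queue once both have announced their results.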

Function Bookends

Now we can discuss the mechanics of how and what the TidyBlocksManager is actually checking for. In simpler times, the colors dataset block held the instructions to create the actual colors dataset. Now it does that, but is also wrapped within the manager class to make sure it runs when it’s supposed to.

We accomplish this by starting the guts of any hat block (a block that initiates a data transformation/visualization pipeline) with a variable called prefix. The prefix is built by the function registerPrefix, which can be found within tidyblocks/tidyblocks.js. The string it returns calls register, a method located in the same file that takes three arguments: the pipelines this one depends on, a function wrapping the pipeline’s code, and the names the pipeline produces. A single pipeline doesn’t need a name and isn’t contingent on any other pipeline, so the argument we feed to registerPrefix is simply '', which leaves the list of dependencies empty.

Colors Dataset Generator Code

Blockly.JavaScript['data_colors'] = (block) => {
  const prefix = registerPrefix('')
  return `${prefix} new TidyBlocksDataFrame([
  {'name': 'black',   'red':   0, 'green':   0, 'blue':   0},
  {'name': 'red',     'red': 255, 'green':   0, 'blue':   0},
  {'name': 'maroon',  'red': 128, 'green':   0, 'blue':   0},
  {'name': 'lime',    'red':   0, 'green': 255, 'blue':   0},
  {'name': 'green',   'red':   0, 'green': 128, 'blue':   0},
  {'name': 'blue',    'red':   0, 'green':   0, 'blue': 255},
  {'name': 'navy',    'red':   0, 'green':   0, 'blue': 128},
  {'name': 'yellow',  'red': 255, 'green': 255, 'blue':   0},
  {'name': 'fuchsia', 'red': 255, 'green':   0, 'blue': 255},
  {'name': 'aqua',    'red':   0, 'green': 255, 'blue': 255},
  {'name': 'white',   'red': 255, 'green': 255, 'blue': 255}
])`
}

The Built-in Prefix

// defined at the top of tidyblocks/tidyblocks.js
const TIDYBLOCKS_START = '/* tidyblocks start */'

// add the prefix bookend to the beginning of the block's JavaScript string
const registerPrefix = (fill) => {
  return `${TIDYBLOCKS_START} TidyBlocksManager.register([${fill}], () => {`
}

// method of the manager class: queue the pipeline right away if it has no
// dependencies, otherwise hold it until the pipelines it depends on finish
register (depends, func, produces) {
  if (depends.length === 0) {
    this.queue.push(func)
  }
  else {
    this.waiting.set(func, new Set(depends))
  }
}
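To see what the prefix actually looks like, we can call registerPrefix on its own and print the result. The snippet repeats the two definitions so that it runs by itself. The second fill string is an assumption on my part, worked backwards from the code a join pipeline prints.

```javascript
const TIDYBLOCKS_START = '/* tidyblocks start */'

const registerPrefix = (fill) => {
  return `${TIDYBLOCKS_START} TidyBlocksManager.register([${fill}], () => {`
}

// A pipeline with no dependencies gets an empty array.
const single = registerPrefix('')
console.log(single)
// /* tidyblocks start */ TidyBlocksManager.register([], () => {

// Assumed fill for a pipeline that waits on two named pipelines.
const joined = registerPrefix("'left', 'right'")
console.log(joined)
// /* tidyblocks start */ TidyBlocksManager.register(['left', 'right'], () => {
```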

Suffix

Note that the prefix ends in an open brace {. Right now we have a slice of bread and a single filling, the colors dataset. We can’t add the other slice of bread directly into the colors block’s generator code because we might want to add a slice of tomato (a filter block) or lettuce (a plotting block) to our sandwich. That’s why only blocks that can’t have another block attached after them can include a suffix, and we need to dynamically add a suffix to pipelines that don’t end with one of these blocks.
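As a rough sketch of that dynamic step, the helper below closes the wrapper only when the code doesn’t already end with the closing marker. The name addSuffixIfMissing and the constant TIDYBLOCKS_END are hypothetical; only the marker text and the shape of the suffix are taken from the code the manager prints.

```javascript
// Assumed constant name; the marker text matches the printed pipeline code.
const TIDYBLOCKS_END = '/* tidyblocks end */'

// Hypothetical helper: add the second slice of bread only if it is missing.
const addSuffixIfMissing = (code) => {
  const trimmed = code.trimEnd()
  if (trimmed.endsWith(TIDYBLOCKS_END)) {
    return trimmed
  }
  // An empty array means the pipeline does not produce a named result.
  return `${trimmed} }, []) ${TIDYBLOCKS_END}`
}

const open = '/* tidyblocks start */ TidyBlocksManager.register([], () => { new TidyBlocksDataFrame([])'
const closed = addSuffixIfMissing(open)
console.log(closed)

// Running the helper a second time leaves the code unchanged.
const closedAgain = addSuffixIfMissing(closed)
```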

When the user clicks run, they’re really firing off the pictured method of the manager:

  • Clear existing errors in the console
  • Check for a prefix
  • fixcode looks for a suffix and adds one if it isn’t included
  • The code is evaluated
  • This registers our pipeline
  • The functions of the pipeline are queued in order
  • AND NOW we can run the queue, the actual sandwich fillings
  • Any issues with running the blocks should raise an error

The method works with the variables held in the environment: the code to be run, a table to display, a plot (if any), and error messages (if any). The environment class can be found in test/utils.js. Putting the components we’d like to display on the webpage inside their own class, instead of calling them directly, allows for neat and concise testing, a topic for another blog post.
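The steps above can be sketched with a toy environment and a toy run function. Everything here is hypothetical: SketchEnvironment is not the class in test/utils.js, stubManager only queues functions, and new Function stands in for however the real code is evaluated. The sketch also assumes the code already carries its suffix.

```javascript
const START = '/* tidyblocks start */'

// Hypothetical environment: holds the code plus whatever the page would show.
class SketchEnvironment {
  constructor (code) {
    this.code = code
    this.table = null
    this.plot = null
    this.errors = []
  }
}

// Hypothetical manager stub: every registered pipeline goes straight to the queue.
const stubManager = () => ({
  queue: [],
  register (depends, func) { this.queue.push(func) }
})

const runSketch = (environment) => {
  environment.errors = []                       // clear existing errors
  if (!environment.code.startsWith(START)) {    // check for a prefix
    environment.errors.push('pipeline does not start with a data block')
    return
  }
  const manager = stubManager()
  try {
    // Evaluating the code registers the pipeline with the manager.
    const evaluate = new Function('TidyBlocksManager', 'environment', environment.code)
    evaluate(manager, environment)
    // Now run the queued functions in order.
    while (manager.queue.length > 0) {
      manager.queue.shift()()
    }
  } catch (err) {
    environment.errors.push(err.message)        // surface any problem as an error
  }
}

const good = new SketchEnvironment(
  `${START} TidyBlocksManager.register([], () => { environment.table = [{ name: 'black' }] }, []) /* tidyblocks end */`
)
runSketch(good)

const bad = new SketchEnvironment('environment.table = []')
runSketch(bad)
```

Because the table and the errors live on the environment object, a test can inspect good.table and bad.errors without touching a webpage.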

Wrapping our pipeline within a function introduces many more functions, and data now passes through the manager before any of it reaches the GUI. But it’s worth it, as it allows for optimal sandwich customization, and who doesn’t want to add their own fillings?

The Join Block

Adding an empty prefix and suffix to a single pipeline is as interesting as, well, white bread. In the picture at the top of the post there are two datasets named left and right. Join combines these datasets, which requires (1) generator code that actually performs an inner_join and (2), on a higher level, the finished products of both left AND right. Printing the variable code to the console from the run function helps to trace the manager’s workflow:

/* tidyblocks start */ 
TidyBlocksManager.register([], () => { 
  new TidyBlocksDataFrame ([
  {'name': 'black',   'red':   0, 'green':   0, 'blue':   0},
  {'name': 'red',     'red': 255, 'green':   0, 'blue':   0},
  {'name': 'maroon',  'red': 128, 'green':   0, 'blue':   0},
  {'name': 'lime',    'red':   0, 'green': 255, 'blue':   0},
  {'name': 'green',   'red':   0, 'green': 128, 'blue':   0},
  {'name': 'blue',    'red':   0, 'green':   0, 'blue': 255},
  {'name': 'navy',    'red':   0, 'green':   0, 'blue': 128},
  {'name': 'yellow',  'red': 255, 'green': 255, 'blue':   0},
  {'name': 'fuchsia', 'red': 255, 'green':   0, 'blue': 255},
  {'name': 'aqua',    'red':   0, 'green': 255, 'blue': 255},
  {'name': 'white',   'red': 255, 'green': 255, 'blue': 255}
  ])
  .filter(1, (row) => tbGt(2, row, (row) => tbGet(3, row, 'red'), (row) => (0)))
  .notify((name, frame) => TidyBlocksManager.notify(name, frame), 'left') 
}, 
['left']) 
/* tidyblocks end */

/* tidyblocks start */ 
TidyBlocksManager.register([], () => { 
  new TidyBlocksDataFrame([
  {'name': 'black',   'red':   0, 'green':   0, 'blue':   0},
  {'name': 'red',     'red': 255, 'green':   0, 'blue':   0},
  {'name': 'maroon',  'red': 128, 'green':   0, 'blue':   0},
  {'name': 'lime',    'red':   0, 'green': 255, 'blue':   0},
  {'name': 'green',   'red':   0, 'green': 128, 'blue':   0},
  {'name': 'blue',    'red':   0, 'green':   0, 'blue': 255},
  {'name': 'navy',    'red':   0, 'green':   0, 'blue': 128},
  {'name': 'yellow',  'red': 255, 'green': 255, 'blue':   0},
  {'name': 'fuchsia', 'red': 255, 'green':   0, 'blue': 255},
  {'name': 'aqua',    'red':   0, 'green': 255, 'blue': 255},
  {'name': 'white',   'red': 255, 'green': 255, 'blue': 255}
])
  .filter(6, (row) => tbGt(undefined, row, (row) => tbGet(undefined, row, 'green'), (row) => (0)))
  .notify((name, frame) => TidyBlocksManager.notify(name, frame), 'right') 
}, 
['right']) 
/* tidyblocks end */
  
/* tidyblocks start */ 
TidyBlocksManager.register(['left', 'right'], () => { 
    new TidyBlocksDataFrame([])
    .join((name) => TidyBlocksManager.getResult(name), 'left', 'red', 'right', 'green')
    .plot(environment, {}) 
}, []) 
/* tidyblocks end */

Each pipeline is wrapped in its own bread, /* tidyblocks start */ and /* tidyblocks end */. left has an empty array in its prefix since it doesn’t need any other pipeline in order to run, but it is called left within its suffix (we use the notify block to give this pipeline its name). We repeat this for the pipeline we named right.

Conversely, the join pipeline has left and right included in its prefix; it needs to wait for both of those pipelines before it can run, and it uses the function getResult to grab these two newly created datasets. The suffix of the join block is an empty array because we’re done here: it will not be the input of another pipeline.
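For readers who want to see what the join itself does once both results are available, here is a small inner join over plain arrays of rows. The function innerJoin and the left_ and right_ column prefixes are my own choices for illustration, not necessarily how TidyBlocksDataFrame names its output columns.

```javascript
// Hypothetical inner join: keep every pair of rows whose key values match.
const innerJoin = (leftRows, leftKey, rightRows, rightKey) => {
  const result = []
  for (const leftRow of leftRows) {
    for (const rightRow of rightRows) {
      if (leftRow[leftKey] === rightRow[rightKey]) {
        const joined = {}
        // Prefix the column names so the two sides cannot collide.
        for (const [key, value] of Object.entries(leftRow)) {
          joined[`left_${key}`] = value
        }
        for (const [key, value] of Object.entries(rightRow)) {
          joined[`right_${key}`] = value
        }
        result.push(joined)
      }
    }
  }
  return result
}

// Tiny stand-ins for the filtered left and right pipelines.
const left = [
  { name: 'red', red: 255 },
  { name: 'maroon', red: 128 }
]
const right = [
  { name: 'lime', green: 255 },
  { name: 'yellow', green: 255 }
]

// Match left.red against right.green, as in the join pipeline above.
const rows = innerJoin(left, 'red', right, 'green')
console.log(rows)
```

With these rows, red pairs with both lime and yellow because all three share the value 255, while maroon has no partner and drops out.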

Understanding that there is a layer of functions on top of the actual functions each block is named after isn’t just a fun exercise in code complexity; it is the first line of defense in debugging. Up next: writing unit tests to make sure your code produces kosher sandwiches.

Maya Gans
Data Scientist

Maya’s work as a Master’s student was focused in quantitative biology. She loves coding and is extremely passionate about data science and data visualization.
