
Have CTExplainer leverage TupleBundler


As of right now, CTExplainer’s explainRow and explainCell methods rely on a hack to compute statistical metrics for values. Now that we have TupleBundler and compileForSamples() (at least in the TupleBundler branch), it would make sense to start leveraging this functionality to produce samples in the backend.

One quick caveat: most databases will freak out if the schema changes in the middle of a query, so this task really depends on #193 happening first.

Updated 17/04/2017 21:47

Convert TypeInference lens into an Adaptive Schema


When Arindam first created the TypeInference lens several years back, we didn’t have Adaptive Schemas. We do now. TypeInference is really something that belongs as an AdaptiveSchema. One nifty consequence is that we might be able to do knowledge inference entirely in the backend db, something along the lines of:

  CREATE TABLE TYPES(name varchar, regexp varchar);
  CREATE LENS TYPE_CHOICE AS SELECT 'col' as col,, COUNT(*) AS count FROM (SELECT col FROM input LIMIT 10000) input, types WHERE regexp_match(types.regexp, input.col)) WITH KEY_REPAIR(col, SCORE_BY(count))
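The scoring idea behind that query can be sketched outside the database. The following is a minimal illustration, not Mimir code: the type names, the regular expressions, and the `varchar` fallback are all assumptions made for the example. It counts how many of the first 10000 values match each candidate type's pattern and picks the type with the highest count, which is roughly what SCORE_BY(count) would be ranking.

```python
import re
from collections import Counter

# Hypothetical contents of the TYPES(name, regexp) table; illustrative only.
TYPES = [
    ("int", r"^[+-]?\d+$"),
    ("float", r"^[+-]?\d*\.\d+$"),
    ("date", r"^\d{4}-\d{2}-\d{2}$"),
]


def score_types(values, types=TYPES, limit=10000):
    """Count, per candidate type, how many of the first `limit` values match."""
    counts = Counter()
    for value in values[:limit]:
        for name, pattern in types:
            if re.match(pattern, value):
                counts[name] += 1
    return counts


def pick_type(values, types=TYPES, limit=10000, default="varchar"):
    """Return the best-scoring type, or `default` when nothing matches."""
    counts = score_types(values, types, limit)
    if not counts:
        return default
    # Ties go to whichever type was counted first.
    return max(counts.items(), key=lambda kv: kv[1])[0]


print(pick_type(["1", "2", "3.5", "x"]))  # int
```

In a lens, the losing candidates would presumably stay available as alternatives rather than being discarded; the sketch only shows the vote itself.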

Updated 17/04/2017 21:46



Mimir should be able to replicate both MayBMS and MCDB query semantics.

I propose the following semantics:

  * SELECT POSSIBLE
  * SELECT N WORLDS

MayBMS Possible Queries

SELECT POSSIBLE A, B, ... FROM ...

Computes and returns the full set of results that could appear from SELECT A, B, ... FROM ... in any possible world. Each tuple (each distinct lineage formula) is emitted exactly once, regardless of how many possible worlds it appears in (although different tuples with the same values may be emitted). An optional CONF() function call emits the confidence of the row.


  1. This isn’t always possible. For Discrete models, we could use something like a Table-Generating Function to emit all possible values of a given VGTerm. However, for queries that depend on continuous models (and can’t be discretized, like if the model only influences a selection predicate), we might need to do something like emit a placeholder value instead.
  2. This is going to completely mess with aggregation…
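For the discrete case, the intended behaviour can be sketched by brute-force enumeration of possible worlds. This is only an illustration under several assumptions: variables are independent, each has a small finite distribution, a row id stands in for the lineage formula, and `possible_results` and its argument shapes are hypothetical names, not part of Mimir. The summed world probability plays the role of CONF().

```python
from collections import defaultdict
from itertools import product


def possible_results(variables, rows):
    """
    variables: {name: {value: probability}}, assumed independent and discrete.
    rows: {row_id: fn(world) -> tuple, or None if the row is filtered out}.
    Returns {(row_id, tuple): confidence}, one entry per distinct result.
    """
    names = list(variables)
    conf = defaultdict(float)
    # Enumerate every combination of variable values (every possible world).
    for choice in product(*(variables[n].items() for n in names)):
        world = {n: value for n, (value, _) in zip(names, choice)}
        prob = 1.0
        for _, p in choice:
            prob *= p
        for row_id, evaluate in rows.items():
            result = evaluate(world)
            if result is not None:
                # Emitted once per (row, values); worlds only add confidence.
                conf[(row_id, result)] += prob
    return dict(conf)


variables = {"x": {1: 0.25, 2: 0.75}}
rows = {
    "r1": lambda w: (w["x"],),                             # value depends on x
    "r2": lambda w: ("fixed",) if w["x"] == 2 else None,   # presence depends on x
}
for key, confidence in possible_results(variables, rows).items():
    print(key, confidence)
```

The enumeration is exponential in the number of variables and has no answer for continuous models, which is exactly the first caveat above.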

MCDB Sample Queries

SELECT 10 WORLDS A, B, ... FROM ...

Computes the values of 10 possible results for the query SELECT A, B, ... FROM .... In contrast to SELECT POSSIBLE, SELECT WORLDS emits 10 distinct sets of (possibly overlapping) results. An optional WORLD_ID() function call emits the world ID (a value from 0 to the number of worlds requested).
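The sampling semantics can be sketched in the same style. This is an illustration only: `sample_worlds`, its argument shapes, and the seeded generator are assumptions for the example, and it numbers worlds 0 to N-1, which is one reading of the WORLD_ID() range described above.

```python
import random


def sample_worlds(variables, rows, n_worlds, seed=0):
    """
    variables: {name: {value: probability}}, assumed independent and discrete.
    rows: {row_id: fn(world) -> tuple, or None if the row is filtered out}.
    Returns a list of (world_id, row_id, tuple), one result set per sampled world.
    """
    rng = random.Random(seed)
    out = []
    for world_id in range(n_worlds):  # assumption: ids run from 0 to n_worlds - 1
        # Draw one value per variable to fix a single possible world.
        world = {
            name: rng.choices(list(dist), weights=list(dist.values()))[0]
            for name, dist in variables.items()
        }
        for row_id, evaluate in rows.items():
            result = evaluate(world)
            if result is not None:
                out.append((world_id, row_id, result))
    return out


variables = {"x": {1: 0.25, 2: 0.75}}
rows = {"r1": lambda w: (w["x"],)}
for world_id, row_id, values in sample_worlds(variables, rows, n_worlds=10):
    print(world_id, row_id, values)
```

Unlike the enumeration used for SELECT POSSIBLE, the same tuple can show up in several worlds here, so results overlap and aggregates can be computed per world id.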

Updated 12/03/2017 16:32

Either Lens


Create a new lens to support expressions of the form PICK_ONE(…)


Optional thoughts…

  * PICK_ONE : Allow a different choice for each row
  * PICK_ONE_GLOBALLY : Effectively a form of Schema Matching
  * PICK_ONE_BY : Maybe a group-by formulation of the same problem

Use case:

  * We have a dataset with columns A, B that conceptually represent the same thing. In other words, you want to assume that A = B everywhere, but if that assumption is violated you want to be notified.
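A per-row version of that use case can be sketched as follows. It is a guess at the behaviour, not a design: the function name `pick_one`, the "first non-null value wins" policy, and the list of flagged row indexes are all assumptions made for the illustration.

```python
def pick_one(rows, columns):
    """
    rows: list of dicts; columns: names that should conceptually agree.
    Returns (picked_values, flagged_row_indexes).
    """
    picked, flagged = [], []
    for i, row in enumerate(rows):
        candidates = [row[c] for c in columns if row[c] is not None]
        # The A = B assumption is violated when non-null candidates disagree.
        if len(set(candidates)) > 1:
            flagged.append(i)
        # Assumed policy: take the first non-null candidate as the best guess.
        picked.append(candidates[0] if candidates else None)
    return picked, flagged


rows = [
    {"A": 1, "B": 1},      # agree
    {"A": 2, "B": 3},      # disagree: flagged
    {"A": None, "B": 4},   # only one candidate
]
print(pick_one(rows, ["A", "B"]))  # ([1, 2, 4], [1])
```

In a lens, the flagged rows would be the ones whose picked value is marked uncertain, so the violation surfaces to the user instead of being silently resolved.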

Updated 30/03/2017 00:16

Box for explanation of missing value lens result is not showing at all


When I ran Mimir and replaced a null value produced by a type inference lens (A3 in a numeric column is replaced with null), the estimated value was shown, but the explanation box was not. Below are the steps I followed with the product, rating1, and rating2 tables.

  1. Load ratings1.csv and ratings2.csv (the rating cell for product id P2345 is A3).
  2. Apply a Missing Value lens to the rating column of rating1 (the rating for product id P2345 is now 4).
  3. Click on the 4; nothing is shown.

Updated 04/02/2017 17:45
