this is more of a design query, so please move if required.. we have a very high volume of logistics supply chain data being stored and analysed within Azure Data Lake service… it’s then written out to Azure Data Warehouse once ready to ‘serve it up’.
we have a couple of situations where we need to ‘lookup’ the history of some items in this supply chain and see what has happened to them at some unknown, prior time in the past.. the lookup table has the propensity to get very very large.
a question might be.. given that this item x is being packed into item y, has item y ever previously been marked as defective? We don’t know at which point in time it could have been defective..
maybe yesterday, maybe last week, maybe 6 months ago – and this is where i am struggling a bit… i keep returning, in my mind, to a vast table in a database that i can simply scan (with suitable indexes) and verify if we have ever seen the prior event behaviour…
how is this problem best solved? i can’t really partition my data on anything as the we will need to scan over everything anyway as we don’t know when the previous event happened or really where it happened – which rules out partition on time or physical location.