This is really good, especially the hint about sho...
# functional
j
This is really good, especially the hint about short circuiting the failure. My question is about efficiency and the next steps. My structure is the full data set and I am trying to pick out and label element pairs that match each set of predicates. So I have a set of set of predicates. The algorithm above will allow me to process a single set. I really need the short circuiting - for example, if the size is not the same there's no reason to compute the SHA512 hash (which is a lazy hash property above). If the hash matches I want to label the elements as IDENTICAL. Now for the next step, imagine I have fuzzy matchers based on different logical combinations of predicates, and if all the combinations in a matcher return true I apply the label associated with that matcher to the data. I currently traverse the structure for each matcher. What I want to do instead is traverse the structure only once as a sequence. So one approach would be to have a 6-tuple structure (imagining that I have six matchers).. And the short-circuited operations also share their constituent tests with each other (perhaps on the actual data through memoization as is currently the case with hash). Keep in mind my algorithm is already O(n^2) as I need to compare every data element against every other data element.. So I want to minimize both the necessary calculations as well as the traversals, creating no intermediate data structures. Here's is how my analyse function above is being called:
Copy code
structure.map {data -> analyse(data, structure)}.zip(structure.map{ it.reference}).filter{ it.first.isNotEmpty() }.forEach(::println)
And here's what I have inside the function (without any of the functional predicates/fold):
Copy code
val identical = structure.filter {
        data.size == it.size &&
        it != data &&
        it.hash == data.hash
    }.map { data ->
        Record(data, EnumSet.of(Similarity.IDENTICAL))
    }
The last step would be to merge the 6-tuples so that the resulting EnumSet has 0-6 elements. Also, what is the alternative to the zip which necessitates creating a lot of unnecessary data.. I guess I would have to change my signature?
@pakoito Should have started a thread on this, channel just not that busy.