So, what counts as a reasonable, or at least comfortable, amount of boilerplate for setting up data?
Does anyone have notebook samples they'd be willing to share that get the job done? Whether a given setup has too much or too little scaffolding seems like an interesting topic to discuss.
I've built a pipeline bottom-up, and there's a bit of boilerplate in the data load: basically parallel arrays of column names, types, and sometimes character sizes, not unlike other toolkits. I haven't really made a study of alternatives, beyond seeing some equally bulky but more curated method chaining that appears to do the same thing (sketched below). In ages past, I had the pleasure of working with ZooKeeper, HDFS, and the rest of the Hadoop trappings, and decided I would donate none of my attention to pursuing those conventions, as a matter of personal bias.
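To make the question concrete, here's a rough sketch of the kind of boilerplate I mean; the column names, types, and widths are made up for illustration, and the loader assumes plain pandas over a CSV:

```python
import pandas as pd

# Parallel arrays describing the columns -- the boilerplate in question.
# (Illustrative names/types only; a real pipeline has many more.)
COL_NAMES = ["user_id", "event_type", "created_at", "note"]
COL_TYPES = {"user_id": "int64", "event_type": "category", "note": "string"}
COL_WIDTHS = {"note": 256}  # max character sizes, where they matter

def load_events(path: str) -> pd.DataFrame:
    df = pd.read_csv(path, usecols=COL_NAMES)
    df = df.astype(COL_TYPES)
    df["created_at"] = pd.to_datetime(df["created_at"])
    # Enforce declared character widths by truncating over-long strings.
    for col, width in COL_WIDTHS.items():
        df[col] = df[col].str.slice(0, width)
    return df
```

The method-chaining style I've seen folds the same declarations into a single pipeline, something like `pd.read_csv(path, usecols=...).astype(...).assign(created_at=lambda d: pd.to_datetime(d["created_at"]))`. Same information either way; the question is which arrangement of it people actually find comfortable.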
As discussions about "better practices" seem to come up in here often enough, this seems like the time and place to ask.