# datascience
The first thing I would investigate when working with such large data sets is whether the whole set is actually needed at once. Python programs often load the entire dataset into memory and work on it there, partly because much of the ecosystem is built around in-memory data structures. But if your processing is map-reduce style, you do not need to load everything; you can read and process only the parts you need. Even better, load only the structure and metadata up front and fetch the data lazily, on demand.
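For example, a map-reduce style aggregation can be done with pandas by reading the file in chunks instead of loading it whole. This is just a sketch; the file name and column names (`large_data.csv`, `category`, `value`) are placeholders:

```python
import pandas as pd

# Process the file in fixed-size chunks instead of loading it all at once.
# "large_data.csv", "category" and "value" are placeholder names.
partial_sums = {}

for chunk in pd.read_csv("large_data.csv", chunksize=100_000):
    # "Map" step: aggregate within the current chunk only.
    grouped = chunk.groupby("category")["value"].sum()
    # "Reduce" step: merge the chunk result into the running totals.
    for key, val in grouped.items():
        partial_sums[key] = partial_sums.get(key, 0) + val

print(partial_sums)
```

Peak memory here is bounded by the chunk size rather than the size of the whole file, which is the main point: only the data needed for the current step is ever resident in memory.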