#partitioning

[ follow ]
Data science
fromInfoQ
1 day ago

Time-Series Storage: Design Choices That Shape Cost and Performance

Normalizing series identity into a metadata table and referencing it by compact IDs reduces time-series storage by about 42% in the experiment.
Data science
frommedium.com
1 year ago

Spark Scala Exercise 22: Custom Partitioning in Spark RDDsLoad Balancing and Shuffle

Custom partitioners in Spark Scala enable optimal control over data distribution for RDDs.
[ Load more ]