Welcome to DDS!
Distributed Data Set (DDS) is a lightweight library that eases the development of PyCOMPSs applications. It provides an interface (inspired by PySpark) through which programmers can load data from basic Python data structures, generators, or files, distribute the data across the available nodes, and run some of the most common big data operations on it. Using DDS can reduce the number of lines of code, but no performance improvement should be expected compared with regular PyCOMPSs applications.
DDS is implemented on top of the PyCOMPSs programming model and is being developed by the Workflows and Distributed Computing group of the Barcelona Supercomputing Center.
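The load, distribute, and operate pattern described above can be illustrated with a minimal pure-Python sketch. It does not use DDS or PyCOMPSs: the class `ToyDataSet` and its methods are hypothetical names introduced only for illustration, and the "nodes" are simulated as in-memory lists. Refer to the DDS documentation for the actual interface.

```python
from collections import Counter
from functools import reduce


class ToyDataSet:
    """Hypothetical stand-in for a distributed data set; NOT the DDS API."""

    def __init__(self, data, num_partitions=2):
        items = list(data)  # accepts lists, tuples, generators, ...
        # A round-robin split mimics distributing elements across nodes.
        self.partitions = [items[i::num_partitions] for i in range(num_partitions)]

    def map(self, func):
        # Apply func to every element, one partition at a time.
        new = ToyDataSet([], len(self.partitions))
        new.partitions = [[func(x) for x in part] for part in self.partitions]
        return new

    def count_by_value(self):
        # Count within each partition, then merge the partial counts.
        partial = [Counter(part) for part in self.partitions]
        return dict(reduce(lambda a, b: a + b, partial, Counter()))


words = ["a", "B", "a", "c", "b"]
counts = ToyDataSet(words, num_partitions=2).map(str.lower).count_by_value()
print(counts)
```

In a real distributed setting each partition would be processed by a separate task on a different node; here the per-partition work runs sequentially, which is enough to show why chained operations keep the user code short.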
Source code
The source code of DDS is available online on GitHub.
Support
If you have questions about DDS or run into issues, you can join us on Slack.
Alternatively, you can send an e-mail to support-compss@bsc.es.