Data Organization and Workflow Practices for Efficient Search and Retrieval of Big Data using Apache Spark
Citation
McCombs, James. "Data Organization and Workflow Practices for Efficient Search and Retrieval of Big Data using Apache Spark." Statewide IT 2024, Bloomington, IN. 23 April 2024.
Description
Data organization, processing workflow, and cluster configuration must be considered for efficient processing of large datasets using Apache Spark. Appropriate data organization enables more robust use of the Spark API and more efficient workflows. Proper cluster configuration is essential for the best allocation of resources. We present ways to address these considerations in the context of the search and retrieval of social media data analyzed by the IU Observatory on Social Media (OSoMe).
Date
Apr 2024
Staff
Type
Presentation