Data Organization and Workflow Practices for Efficient Search and Retrieval of Big Data using Apache Spark

Citation

McCombs, James. "Data Organization and Workflow Practices for Efficient Search and Retrieval of Big Data using Apache Spark." Statewide IT 2024, Bloomington, IN. 23 April 2024.

Description

Efficient processing of large datasets with Apache Spark depends on three things: data organization, processing workflow, and cluster configuration. Appropriate data organization enables more robust use of the Spark API and more efficient workflows. Proper cluster configuration is essential for allocating resources well. We present ways to address these considerations in the context of searching and retrieving social media data analyzed by the IU Observatory on Social Media (OSoMe).

Date

Apr 2024

Type

Presentation