A Hybrid On-premises and Public Cloud Attention Clustering Workflow
Citation
McCombs, James. "A Hybrid On-premises and Public Cloud Attention Clustering Workflow." July 2020. Presented at the Practice and Experience in Research Computing 2020 workshop on Facilitation Strategies and Experiences for Research Use of Cloud Computing.
Description
Research information technology professionals frequently develop solutions for researchers who need to analyze large data sets. There is a strong cost incentive to use existing on-premises resources, but those resources carry challenges and risks that can make public cloud infrastructure the preferable option. Furthermore, the challenges and risks are not always apparent until an on-premises solution is attempted. We present a case study from a research project on attention clustering in macro and financial economics. We first developed an on-premises solution, but realized it would not deliver the performance needed to ingest and analyze the data at the rate at which the data was being generated. As an alternative, we developed a solution that combines Google Cloud Platform’s BigQuery data analysis platform with an on-premises analysis front end. This hybrid design simplified the data ingestion and analysis workflow, making it easier for project members to explore the data, make discoveries, and further refine their attention clustering methods. Our experience demonstrates that, under certain conditions, a hybrid solution that strategically leverages public cloud resources such as BigQuery is a compelling option for analyzing large data sets.
Date
July 2020
Staff
Services
Cloud for Research
Type
Workshop