PDB-Hadoop: Parallelising Legacy applications on the Protein Databank using Apache Hadoop

Jamie AlNasir, Hugh Shanahan

Research output: Contribution to journal › Article › peer-review

Abstract

We provide a framework for running protein structure analysis tools in parallel over the entire Protein Databank (PDB), or large subsets of it, using the Apache Hadoop platform. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that Hadoop would otherwise require. The framework scales easily and uses a mapper architecture that can function stand-alone or be extended with further map-reduce operations.
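To illustrate the mapper idea described in the abstract, here is a minimal sketch of a map-only wrapper around a legacy command-line tool. It assumes a Hadoop Streaming style of execution, where the mapper reads one PDB identifier per line from standard input and writes tab-separated key/value records to standard output. The abstract does not say that the framework works this way, and the names `run_tool`, `map_lines` and `main` are hypothetical.

```python
import subprocess
import sys


def run_tool(command, pdb_id):
    """Run a legacy command-line tool on one PDB identifier.

    `command` is the tool invocation as a list of arguments; the
    identifier is appended as the final argument (an assumption about
    how the tool takes its input). Returns stdout, or None on failure.
    """
    result = subprocess.run(
        command + [pdb_id],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def map_lines(lines, command):
    """Yield one 'pdb_id<TAB>output' record per input identifier."""
    for line in lines:
        pdb_id = line.strip()
        if not pdb_id:
            continue  # ignore blank input lines
        output = run_tool(command, pdb_id)
        if output is None:
            continue  # skip structures the tool could not process
        # Collapse whitespace so each record stays on a single line.
        yield f"{pdb_id}\t{' '.join(output.split())}"


def main(command):
    """Entry point for a streaming mapper: stdin in, records out."""
    for record in map_lines(sys.stdin, command):
        print(record)
```

A script built on this sketch would end with a call to `main([...])` naming the actual tool. Under Hadoop Streaming it could then be supplied as the `-mapper`, with the number of reducers set to zero (`-D mapreduce.job.reduces=0`) for a stand-alone map-only job, or followed by a reducer when further map-reduce steps are needed.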
Original language: English
Journal: Bioinformatics
Publication status: In preparation - 2015
