PDB-Hadoop: Parallelising Legacy applications on the Protein Databank using Apache Hadoop

Jamie AlNasir; Hugh Shanahan


Research output: Contribution to journal › Article › peer-review

Abstract

We provide a framework that uses the Apache Hadoop platform to run protein structure analysis tools in parallel over the entire Protein Databank (PDB) or large subsets of it. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that Hadoop jobs normally require. It scales readily and uses a mapper architecture that can run stand-alone or be extended with further MapReduce operations.
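To make the mapper idea concrete, the sketch below shows one way an existing command-line tool could be wrapped as a Hadoop mapper without writing Java. It assumes Hadoop Streaming (which runs any executable as a mapper, feeding it input lines on standard input) and an input file listing one PDB file path per line, with those files reachable from every node (for example on a shared filesystem). The tool command (`wc -l`), the function names and the file-naming convention are hypothetical placeholders, not the framework's actual interface.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper that runs a legacy tool on each PDB file."""
import os
import subprocess
import sys
from typing import Iterable, Iterator, List

# Hypothetical placeholder: replace with the real legacy tool and its arguments.
DEFAULT_TOOL: List[str] = ["wc", "-l"]


def pdb_id_from_path(path: str) -> str:
    """Derive a PDB identifier from a file name such as 'pdb1abc.ent' or '1abc.pdb'."""
    name = os.path.basename(path).lower()
    for suffix in (".ent", ".pdb"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    # Archive-style names look like 'pdb' + 4-character ID.
    if name.startswith("pdb") and len(name) == 7:
        name = name[3:]
    return name.upper()


def run_tool(tool: List[str], pdb_path: str) -> str:
    """Run the tool on one file and return its trimmed standard output.

    check=True raises on a non-zero exit code, so a failing file fails the task
    visibly instead of silently producing an empty record.
    """
    result = subprocess.run(tool + [pdb_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def mapper(lines: Iterable[str], tool: List[str]) -> Iterator[str]:
    """Emit one tab-separated 'PDB_ID<TAB>output' record per input path."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the path list
        # Flatten multi-line tool output so each record stays on one line.
        output = run_tool(tool, path).replace("\n", " ")
        yield f"{pdb_id_from_path(path)}\t{output}"


if __name__ == "__main__":
    for record in mapper(sys.stdin, DEFAULT_TOOL):
        print(record)
```

Passed to Hadoop Streaming via its `-mapper` option together with `-numReduceTasks 0`, a script like this would run as a map-only job, which corresponds to the stand-alone mode described above; supplying a `-reducer` as well would correspond to extending it with further MapReduce operations. The tab-separated key/value output is simply Streaming's default record convention.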

Original language	English
Journal	Bioinformatics
Publication status	In preparation - 2015

Cite this

@article{40212625afb94657b74d90a609ead549,
  title = "PDB-Hadoop: Parallelising Legacy applications on the Protein Databank using Apache Hadoop",
  abstract = "We provide a framework that uses the Apache Hadoop platform to run protein structure analysis tools in parallel over the entire Protein Databank (PDB) or large subsets of it. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that Hadoop jobs normally require. It scales readily and uses a mapper architecture that can run stand-alone or be extended with further MapReduce operations.",
  author = "Jamie AlNasir and Hugh Shanahan",
  year = "2015",
  language = "English",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
}

TY  - JOUR
T1  - PDB-Hadoop: Parallelising Legacy applications on the Protein Databank using Apache Hadoop
AU  - AlNasir, Jamie
AU  - Shanahan, Hugh
PY  - 2015
Y1  - 2015
N2  - We provide a framework that uses the Apache Hadoop platform to run protein structure analysis tools in parallel over the entire Protein Databank (PDB) or large subsets of it. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that Hadoop jobs normally require. It scales readily and uses a mapper architecture that can run stand-alone or be extended with further MapReduce operations.
AB  - We provide a framework that uses the Apache Hadoop platform to run protein structure analysis tools in parallel over the entire Protein Databank (PDB) or large subsets of it. The framework is designed so that structural biologists can use Hadoop without having to write the relatively complex Java code that Hadoop jobs normally require. It scales readily and uses a mapper architecture that can run stand-alone or be extended with further MapReduce operations.
M3  - Article
SN  - 1367-4803
JO  - Bioinformatics
JF  - Bioinformatics
ER  - 