Map-reduce for Repy

Creating a MapReduce framework for Repy from scratch!

This is the personal home of my distributed systems capstone project, Map-reduce for Repy.  This project was submitted for CSE 490H in Winter 2009.

Project Name: Map-reduce for Repy

Team Members: Alper Sarikaya

Google Mentors: Charlie Garrett

Quick Synopsis: This project aims to develop a map-reduce platform for Seattle, a UW CSE project, using the Repy (restricted Python) language, and explores the ramifications of working over a WAN with a multitude of vessels.  A primary node controls a given set of vessels (units of computational resource, acquired for free): it allocates data, runs pre-prepared map() and reduce() methods on the nodes in parallel, and collects and aggregates all of the computed data back on the primary node.  For fault tolerance, all lines of communication are timeout sockets; in some instances it may be necessary to bring up an extra node to take over a downed node’s work.
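
To make the control flow in the synopsis concrete, here is a minimal sketch in plain Python. It is not the code from mapred.repy or mappri.repy, and it does no networking: each vessel is simulated as a local function, and a WorkerDown exception stands in for a socket timeout. All names here (run_job, make_worker, WorkerDown, map_words, reduce_counts) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class WorkerDown(Exception):
    """Assumption: stands in for a socket timeout when a vessel stops responding."""


def make_worker(name, healthy=True):
    """Build a simulated vessel; a real one would be reached over a timeout socket."""
    def run(map_fn, chunk):
        if not healthy:
            raise WorkerDown(name)
        return map_fn(chunk)
    return run


def map_words(chunk):
    """Example map(): emit a (word, 1) pair for every word in a chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(key, values):
    """Example reduce(): sum the counts collected for one word."""
    return sum(values)


def run_job(chunks, workers, spares, map_fn, reduce_fn):
    """Primary-node loop: hand out chunks, replace downed workers, aggregate."""
    workers = list(workers)  # copy so the caller's lists are left untouched
    spares = list(spares)
    grouped = defaultdict(list)

    for i, chunk in enumerate(chunks):
        slot = i % len(workers)  # round-robin allocation of data to vessels
        try:
            pairs = workers[slot](map_fn, chunk)
        except WorkerDown:
            if not spares:
                raise  # nobody left to take over the downed node's work
            workers[slot] = spares.pop(0)  # bring up an extra node
            pairs = workers[slot](map_fn, chunk)
        for key, value in pairs:
            grouped[key].append(value)

    # In this sketch the reduce step runs on the primary after all map output arrives.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    vessels = [make_worker("v1"), make_worker("v2", healthy=False)]
    backup = [make_worker("v3")]
    print(run_job(["a b a", "b c", "a"], vessels, backup, map_words, reduce_counts))
```

In this sketch the second vessel is down from the start, so its chunk is handed to the spare and the job still completes. A real deployment would also have to deal with a vessel that fails partway through a task, which this simplification ignores.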

The code is available from Seattle’s SVN in /trunk/repy/apps/mapreduce/.  The two main files are mapred.repy and mappri.repy.

The report can be found here! (docx format; pdf format also available)

The slide deck from the presentation is now available (pptx, pdf).
