A Word Frequency

This section contains a program that counts the number of occurrences of each unique word in a set of input files specified on the command line.
#include "mapreduce/mapreduce.h"

// User's map function
class WordCounter : public Mapper {
 public:
  virtual void Map(const MapInput& input) {
    const string& text = input.value();
    const int n = text.size();
    for (int i = 0; i < n; ) {
      // Skip past leading whitespace
      while ((i < n) && isspace(text[i]))
        i++;

      // Find word end
      int start = i;
      while ((i < n) && !isspace(text[i]))
        i++;
      if (start < i)
        Emit(text.substr(start,i-start),"1");
    }
  }
};
REGISTER_MAPPER(WordCounter);

// User's reduce function
class Adder : public Reducer {
  virtual void Reduce(ReduceInput* input) {
    // Iterate over all entries with the
    // same key and add the values
    int64 value = 0;
    while (!input->done()) {
      value += StringToInt(input->value());
      input->NextValue();
    }

    // Emit sum for input->key()
    Emit(IntToString(value));
  }
};
REGISTER_REDUCER(Adder);

int main(int argc, char** argv) {
  ParseCommandLineFlags(argc, argv);

  MapReduceSpecification spec;

  // Store list of input files into "spec"
  for (int i = 1; i < argc; i++) {
    MapReduceInput* input = spec.add_input();
    input->set_format("text");
    input->set_filepattern(argv[i]);
    input->set_mapper_class("WordCounter");
  }

  // Specify the output files:
  //   /gfs/test/freq-00000-of-00100
  //   /gfs/test/freq-00001-of-00100
  //   ...
  MapReduceOutput* out = spec.output();
  out->set_filebase("/gfs/test/freq");
  out->set_num_tasks(100);
  out->set_format("text");
  out->set_reducer_class("Adder");

  // Optional: do partial sums within map
  // tasks to save network bandwidth
  out->set_combiner_class("Adder");

  // Tuning parameters: use at most 2000
  // machines and 100 MB of memory per task
  spec.set_machines(2000);
  spec.set_map_megabytes(100);
  spec.set_reduce_megabytes(100);

  // Now run it
  MapReduceResult result;
  if (!MapReduce(spec, &result)) abort();

  // Done: 'result' structure contains info
  // about counters, time taken, number of
  // machines used, etc.

  return 0;
}

To appear in OSDI 2004
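The listing above depends on the header mapreduce/mapreduce.h, which is not part of the standard library, so it cannot be compiled on its own. As a rough illustration of the same data flow, the following single-process sketch reproduces the map, group-by-key, and reduce steps using only the C++ standard library. The names KeyValue, MapWords, GroupByKey, SumValues, and CountWords are hypothetical and do not come from the library. The sketch does no distribution, partitioning, or combining, and it is not the library's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an intermediate key/value pair.
using KeyValue = std::pair<std::string, std::string>;

// Map step: the same whitespace tokenizer as WordCounter::Map, except that
// each (word, "1") pair is appended to a vector instead of passed to Emit().
void MapWords(const std::string& text, std::vector<KeyValue>* out) {
  assert(out != nullptr);
  const int n = static_cast<int>(text.size());
  for (int i = 0; i < n; ) {
    // Skip past leading whitespace
    while (i < n && std::isspace(static_cast<unsigned char>(text[i]))) i++;
    // Find word end
    const int start = i;
    while (i < n && !std::isspace(static_cast<unsigned char>(text[i]))) i++;
    if (start < i) out->emplace_back(text.substr(start, i - start), "1");
  }
}

// Grouping step: collect every value emitted for the same key.
// std::map keeps the keys in sorted order.
std::map<std::string, std::vector<std::string>> GroupByKey(
    const std::vector<KeyValue>& pairs) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const KeyValue& kv : pairs) groups[kv.first].push_back(kv.second);
  return groups;
}

// Reduce step: the same summation as Adder::Reduce, over one key's values.
std::string SumValues(const std::vector<std::string>& values) {
  std::int64_t total = 0;
  for (const std::string& v : values) total += std::stoll(v);
  return std::to_string(total);
}

// Driver: runs the three steps in sequence inside one process and returns
// a map from each word to its count (as a decimal string).
std::map<std::string, std::string> CountWords(
    const std::vector<std::string>& inputs) {
  std::vector<KeyValue> intermediate;
  for (const std::string& text : inputs) MapWords(text, &intermediate);

  std::map<std::string, std::string> result;
  for (const auto& group : GroupByKey(intermediate)) {
    result[group.first] = SumValues(group.second);
  }
  return result;
}
```

Calling CountWords with the contents of each input file as one string yields a word-to-count map. Because SumValues both consumes and produces decimal strings and addition is associative, the same function could also be applied to partial groups before the final reduce, which is the property the optional combiner setting in the listing relies on.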