Map-Reduce Platform
|
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137–150, 2004.
- S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43, 2003.
- T. White. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 3rd edition, 2012.
|
Map-Reduce
High-Level Languages
|
- K. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Y.
Eltabakh, C.-C. Kanne, F. Ozcan, and E. Shekita. Jaql: A scripting
language for large scale semi-structured data analysis. In PVLDB,
volume 4, 2011.
- E. Friedman, P. Pawlowski, and J. Cieslewicz.
SQL/MapReduce: a practical approach to self-describing, polymorphic,
and parallelizable user-defined functions. Proc. VLDB Endow.,
2(2):1402–1413, 2009.
- A. Gates, O. Natkovich, S. Chopra, P. Kamath, S.
Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava.
Building a high-level dataflow system on top of mapreduce: The pig
experience. PVLDB, 2(2):1414–1425, 2009.
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka,
S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A Warehousing
Solution Over a Map-Reduce Framework. PVLDB, 2(2):1626–1629, 2009.
|
Map-Reduce
Workflow Management
|
- Y. Amsterdamer, S. B. Davidson, D. Deutch, T. Milo,
J. Stoyanovich, and V. Tannen. Putting lipstick on pig: Enabling
database-style workflow provenance. PVLDB, pages 346–357, 2011.
- H. Lim, H. Herodotou, and S. Babu. Stubby: A
Transformation-based Optimizer for MapReduce Workflows. PVLDB,
5(11):1196–1207, 2012.
- K. Morton, M. Balazinska, and D. Grossman.
Paratimer: a progress indicator for mapreduce dags. In Proceedings of
the 2010 international conference on Management of data, pages 507–518,
2010.
- K. Morton, A. Friesen, M. Balazinska, and D.
Grossman. Estimating the progress of mapreduce pipelines. In ICDE,
pages 681–684, 2010.
- C. Olston, G. Chiou, L. Chitnis, F. Liu, Y. Han, M.
Larsson, A. Neumann, V. B. N. Rao, V. Sankarasubramanian, S. Seth, C.
Tian, T. ZiCornell, and X. Wang. Nova: continuous pig/hadoop workflows.
In SIGMOD Conference, pages 1081–1090, 2011.
- Oozie.
http://incubator.apache.org/oozie/map-reduce-cookbook.html.
|
Map-Reduce
Indexing and Query Optimization
|
- D. J. Abadi. Tradeoffs between parallel database
systems, hadoop, and hadoopdb as platforms for petabyte-scale analysis.
In SSDBM, pages 1–3, 2010.
- A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Silberschatz, and
A. Rasin. HadoopDB: An Architectural Hybrid of MapReduce
and DBMS Technologies for Analytical Workloads. In VLDB, pages 922–933,
2009.
- F. N. Afrati and J. D. Ullman. Optimizing joins in a
map-reduce environment. In Proceedings of the 13th International
Conference on Extending Database Technology, pages 99–110, 2010.
- S. Blanas, J. M. Patel, V. Ercegovac, J. Rao, E. J.
Shekita, and Y. Tian. A comparison of join algorithms for log
processing in mapreduce. In Proceedings of the 2010 international
conference on Management of data, pages 975–986, 2010.
- Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst.
Haloop: efficient iterative data processing on large clusters. Proc.
VLDB Endow., 3(1-2):285–296, 2010.
- S. Chen. Cheetah: a high performance, custom data
warehouse on top of mapreduce. Proc. VLDB Endow., pages 1459–1468, 2010.
- J. Dittrich, J.-A. Quiane-Ruiz, A. Jindal, Y.
Kargin, V. Setty, and J. Schad. Hadoop++: Making a yellow elephant run
like a cheetah (without it even noticing). In VLDB, volume 3, pages
518–529, 2010.
- J. Dittrich, J.-A. Quiane-Ruiz, S. Richter, S.
Schuh, A. Jindal, and J. Schad. Only Aggressive Elephants are Fast
Elephants. PVLDB, 5(11):1591–1602, 2012.
- I. Elghandour and A. Aboulnaga. Restore: reusing
results of mapreduce jobs. Proc. VLDB Endow., 5(6):586–597, 2012.
- H. Herodotou and S. Babu. Profiling, what-if
analysis, and cost-based optimization of mapreduce programs. PVLDB,
4(11):1111–1122, 2011.
- D. Jiang, B. C. Ooi, L. Shi, and S. Wu. The
performance of mapreduce: an in-depth study. Proc. VLDB Endow., pages
472–483, 2010.
- D. Jiang, A. K. H. Tung, and G. Chen.
Map-join-reduce: Toward scalable and efficient data analysis on large
clusters. IEEE Trans. on Knowl. and Data Eng., pages 1299–1311, 2011.
- B. Li, E. Mazur, Y. Diao, A. McGregor, and P.
Shenoy. A platform for scalable one-pass analytics using mapreduce. In
SIGMOD, pages 985–996, 2011.
- T. Nykiel, M. Potamias, C. Mishra, G. Kollios, and
N. Koudas. Mrshare: sharing across multiple queries in mapreduce. Proc.
VLDB Endow., pages 494–505, 2010.
- A. Pavlo et al. A comparison of approaches to
large-scale data analysis. In SIGMOD, pages 165–178, 2009.
- R. Vernica, M. J. Carey, and C. Li. Efficient
parallel set-similarity joins using mapreduce. In Proceedings of the
2010 international conference on Management of data, pages 495–506,
2010.
- H.-c. Yang, A. Dasdan, R.-L. Hsiao, and D. S.
Parker. Map-reduce-merge: simplified relational data processing on
large clusters. In Proceedings of the 2007 ACM SIGMOD international
conference on Management of data, pages 1029–1040, 2007.
- M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and
I. Stoica. Improving mapreduce performance in heterogeneous
environments. In Proceedings of the 8th USENIX conference on Operating
systems design and implementation, pages 29–42, 2008.
- J. Jestes, K. Yi, and F. Li. Building wavelet
histograms on large data in mapreduce. PVLDB, pages 109–120, 2011.
- R. Vernica, A. Balmin, K. S. Beyer, and V. Ercegovac.
Adaptive MapReduce using situation-aware mappers. In EDBT, pages
420–431, 2012.
|
Map-Reduce
Physical Layout Optimizations
|
- J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H.
Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In
Proceedings of the 19th ACM International Symposium on High Performance
Distributed Computing, pages 810–818, 2010.
- A. Jindal, J.-A. Quiane-Ruiz, and J. Dittrich.
Trojan data layouts: right shoes for a running elephant. In Proceedings
of the 2nd ACM Symposium on Cloud Computing (SOCC), pages 1–14, 2011.
- M. Y. Eltabakh, Y. Tian, F. Ozcan, R. Gemulla, A. Krettek, and J. McPherson. CoHadoop: Flexible data
placement and its exploitation in hadoop. PVLDB, 4(9):575–585, 2011.
- H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong,
F. B. Cetin, and S. Babu. Starfish: A self-tuning system for big data
analytics. In CIDR, pages 261–272, 2011.
- Y. Lin, D. Agrawal, C. Chen, B. C. Ooi, and S. Wu.
Llama: leveraging columnar storage for scalable join processing in the
mapreduce framework. In Proceedings of the 2011 international
conference on Management of data, pages 961–972, 2011.
- Y. Xu, P. Kostamaa, and L. Gao. Integrating hadoop
and parallel dbms. In Proceedings of the 2010 international conference
on Management of data, pages 969–974, 2010.
- A. Floratou, J. M. Patel, E. J. Shekita, and S. Tata.
Column-Oriented Storage Techniques for MapReduce. PVLDB,
4(7):419–429, 2011.
|
Map-Reduce
Statistical, Mining, and Approximation Algorithms
|
- B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest
subgraph in streaming and mapreduce. PVLDB, pages 454–465, 2012.
- S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J.
Haas, and J. McPherson. Ricardo: integrating R and Hadoop. In SIGMOD,
pages 987–998, 2010.
- R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis. Large-scale matrix factorization with distributed
stochastic gradient descent. In KDD, pages 69–77, 2011.
- R. Grover and M. J. Carey. Extending Map-Reduce for Efficient Predicate-Based Sampling. In ICDE,
pages 486–497, 2012.
- N. Laptev, K. Zeng, and C. Zaniolo. Early accurate
results for advanced analytics on MapReduce. Proc. VLDB Endow.,
5(10):1028–1039, 2012.
- N. Pansare, V. R. Borkar, C. Jermaine, and T.
Condie. Online Aggregation for Large MapReduce Jobs. PVLDB,
4(11):1135–1145, 2011.
- The Apache Software Foundation. Mahout.
http://mahout.apache.org/.
- Revolution Analytics. RHadoop.
https://github.com/RevolutionAnalytics/RHadoop.
|
Map-Reduce Online
Processing and Provenance Management
|
- T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein,
K. Elmeleegy, and R. Sears. Mapreduce online. In NSDI, pages 313–328,
2010.
- D. Crawl, J. Wang, and I. Altintas. Provenance for
MapReduce-based data-intensive workflows. In Proceedings of the 6th
workshop on Workflows in support of large-scale science (WORKS), pages
21–30, 2011.
- R. Ikeda, H. Park, and J. Widom. Provenance for
generalized map and reduce workflows. In CIDR, pages 273–283, 2011.
- N. Khoussainova, M. Balazinska, and D. Suciu. PerfXplain: debugging MapReduce job performance.
Proc. VLDB Endow., 5(7):598–609, 2012.
- H. Park, R. Ikeda, and J. Widom. RAMP: A system for capturing and tracing provenance in MapReduce
workflows. In VLDB. Stanford InfoLab, August 2011.
|