Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
-
Updated
Dec 3, 2024 - HTML
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
《大数据挖掘技术》@复旦 课程项目,试图从搜狗实验室用户查询日志数据(2008)中找出搜索记录中有较高支持度关键词的频繁二项集。在实现层面上,我搭建了一个由五台服务器组成的微型 Hadoop 集群,并且用 Python 实现了 Parallel FP-Growth 算法中的三个 MapReduce 过程。
KMeans, Cure and Canpoy algorithms are demonstrated using Pyspark.
基于Item-based CF和XGBRegressor完成的用户对商品的推荐系统
Using hadoop to utilize data from an automobile tracking platform that tracks the history of important incidents after the initial sale of a new vehicle.
Lambda to start EMR and run a map reduce job
A REST-based service that translates the SQL query into MapReduce and Spark jobs. It runs these jobs and provides the JSON object. SQL to MapReduce and Spark translator.
This repository have codes that extracts meaningful information from News headline data-set.
A repository containing the source codes for the assignments done as a part of the Big Data course (UE18CS322) at PES University.
Hadoop MapReduce Python
⚡️공개용 맵리듀스 플랫폼인 Spark를 사용하여 데이��마이닝을 해보자⚡️
Lightweight and extensible library to execute MapReduce-like jobs in Python
Pulled 10GB ofYelp Business data through the terminal via Kaggle API. The data was then pushed to and AWS S3 Bucket bucket for storage and analyzed on a Elastic MapReduce Cluster on a Jupyter Notebook using PySpark
Market basket analysis of finding frequent itemsets using SON algorithm in Spark
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks���
Performing Map reduce to get the page rank on the WDC data.
Multiprocessing can be an effective way to speed up a time-consuming workflow via parallelization. This article illustrates how multiprocessing can be utilized in a concise way when implementing MapReduce-like workflows.
Understand how map reduce works for parsing a text data with parallel processing of sub tasks using multi threading
Apache Hadoop docker image | Running Python MapReduce
Add a description, image, and links to the mapreduce-python topic page so that developers can more easily learn about it.
To associate your repository with the mapreduce-python topic, visit your repo's landing page and select "manage topics."