Project #1: US Election
Industry: Government
Technologies Used:
- HDFS (for storage)
- Spark SQL (for transformation)
- Spark MLlib (for machine learning)
- Zeppelin (for visualization)
Problem Statement: In the 2016 US primary elections, Hillary Clinton won the Democratic nomination over Bernie Sanders, while Donald Trump won the Republican nomination for the presidency. As an analyst, you have been tasked with identifying, from demographic features, the factors that led to Hillary Clinton's and Donald Trump's primary wins, so that these insights can guide the next initiatives and campaigns.
Project #2: Design a system to replay real-time transactions in HDFS using Spark.
Technologies Used:
- Spark Streaming
- Kafka (for messaging)
- HDFS (for storage)
- Core Spark API (for aggregation)
Project #3: Instant Cabs
Industry: Transportation
Technologies Used:
- HDFS (for storage)
- Spark SQL (for transformation)
- Spark MLlib (for machine learning)
- Zeppelin (for visualization)
Problem Statement: A US cab service start-up, Instant Cabs, wants to meet customer demand efficiently and maximize profit. It has hired you as a data analyst to interpret the available Uber data set and identify the busiest customer pick-up points and the peak hours, so that demand can be met profitably.
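In the project itself, finding peak hours and busy pick-up points comes down to two Spark SQL group-by-and-count aggregations. The sketch below shows the same logic in plain Python so it runs without a cluster. The record layout (pickup timestamp, latitude, longitude), the sample values, and the helper names `peak_hours` and `hot_spots` are hypothetical; bucketing locations by rounding coordinates to a grid is one simple choice among several (clustering with Spark MLlib, for example, is another).

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (pickup timestamp, latitude, longitude).
# This layout is an assumption, not the actual schema of the Uber data set.
pickups = [
    ("2014-04-01 17:05:00", 40.7510, -73.9940),
    ("2014-04-01 17:20:00", 40.7512, -73.9941),
    ("2014-04-01 17:45:00", 40.7230, -73.9980),
    ("2014-04-02 08:10:00", 40.7508, -73.9938),
    ("2014-04-02 08:30:00", 40.6413, -73.7781),
    ("2014-04-02 22:00:00", 40.7230, -73.9981),
]


def peak_hours(records, top_n=3):
    """Count pickups per hour of day and return the busiest hours."""
    hours = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts, _, _ in records
    )
    return hours.most_common(top_n)


def hot_spots(records, precision=2, top_n=3):
    """Bucket pickups into a lat/lon grid by rounding, then return the densest cells."""
    cells = Counter(
        (round(lat, precision), round(lon, precision)) for _, lat, lon in records
    )
    return cells.most_common(top_n)


if __name__ == "__main__":
    print(peak_hours(pickups, top_n=2))
    print(hot_spots(pickups, top_n=2))
```

Rounding to two decimal places yields grid cells roughly 1 km across at New York's latitude. A coarser or finer `precision` trades locality for larger counts per cell.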
Project #4: Signal Drops During Roaming
Industry: Telecom
Technologies Used:
- HDFS (for storage)
- Spark SQL (for transformation)
Problem Statement: Given a CDR (Call Detail Record) file, find the top 10 customers who experience frequent call drops while roaming. Telecom companies rely on this report to prevent customer churn: they call the affected customers back and, at the same time, contact their roaming partners to fix connectivity issues in specific areas.
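At its core, the report is a filter followed by a grouped count: keep roaming calls that were dropped, count them per customer, and take the 10 highest counts. The plain-Python sketch below illustrates that logic under stated assumptions. The `CallRecord` fields (`customer_id`, `is_roaming`, `dropped`) are hypothetical, because real CDR layouts vary by operator, and a dropped call is often encoded as a termination cause code rather than a boolean. In Spark SQL, the equivalent would be a `WHERE` filter, `GROUP BY` customer, `ORDER BY` count descending, and `LIMIT 10`.

```python
from collections import Counter
from typing import NamedTuple


class CallRecord(NamedTuple):
    # Hypothetical CDR fields; actual CDR schemas differ between operators.
    customer_id: str
    is_roaming: bool
    dropped: bool


def top_roaming_droppers(records, n=10):
    """Return (customer_id, drop_count) pairs for the n customers with the most
    dropped calls while roaming. Ties are broken by customer_id for determinism."""
    drops = Counter(r.customer_id for r in records if r.is_roaming and r.dropped)
    return sorted(drops.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


records = [
    CallRecord("A", True, True),
    CallRecord("A", True, True),
    CallRecord("A", False, True),   # dropped on the home network: ignored
    CallRecord("B", True, True),
    CallRecord("B", True, False),   # roaming call that completed normally
    CallRecord("C", True, True),
    CallRecord("D", False, False),
]

if __name__ == "__main__":
    print(top_roaming_droppers(records, n=2))
```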