From the course: Hands-On Data Science using SQL, Tableau, Python, and Spark

Aggregating data in Spark

- Now let's actually perform some aggregations on the data that we just uploaded to Databricks. To get started, I'm going to go to Create and Notebook, and here we have a new notebook that's running Python by default. We already have our table there, so I can just punch in some commands and start working with this data.

The first command that I want to run, and you can download these in the exercise files, is just a simple select *. This is a SQL statement that returns everything from a table. Notice that all I had to do was punch in %sql and the cell changed from Python to SQL. I'll go ahead and run this cell, and we have our data down below. So, easy enough.

Now let's try that again by creating a new cell, using the plus sign there. I'll paste this command in so you don't have to see me fumble through it. But what we're doing here is pretty interesting. We're doing spark.sql, giving it a SQL command, which is the same one we actually just…
