November 10, 2020

Embracing the new Analytics Platform: Databricks

Following the huge success of Apache Spark (the de facto standard processing engine for big data), its creators went on to found Databricks. Databricks, founded in 2013, is a software-as-a-service company that offers a Unified Data Analytics Platform (UDAP) for accelerating innovation across Data Science, Data Engineering and Business Analytics. Today, Databricks is one of the fastest-growing data services on AWS and Azure, with 5,000+ customers and 450+ partners across the globe.

The current Databricks release, Runtime 7.3 LTS, runs on Apache Spark 3.0.1 and supports a broad pool of analytical capabilities that can enhance the outcome of your data pipeline. In this post we will introduce some of Databricks' primary features and show you how to get started.

Here are some helpful links, placed right at the top rather than buried deep in the post:

The Databricks documentation is a good place to explore further: https://docs.databricks.com/getting-started/index.html

You can also read about the Databricks architecture here: https://docs.databricks.com/getting-started/overview.html

Background knowledge:

Databricks leverages Apache Spark for its computational capabilities and supports several programming languages for writing code, such as Python, R, Scala and SQL. It is therefore important for developers to have a sound understanding of these in order to make full use of the available Databricks capabilities.
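As a quick illustration of how these languages coexist in a single notebook, here is a minimal sketch assuming a notebook whose default language is Python, where Databricks pre-creates the spark session; data registered from Python can then be queried with SQL in the same session.

    # Python cell: create a small DataFrame and expose it to SQL as a temporary view
    df = spark.range(5)                      # spark is the SparkSession Databricks provides
    df.createOrReplaceTempView("numbers")

    # The same data can be queried with SQL, either in a %sql cell or from Python:
    spark.sql("SELECT id FROM numbers WHERE id > 2").show()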

About Apache Spark: Spark is a lightning-fast, open-source, distributed processing system designed for big data workloads. Its main features are in-memory caching and optimized query execution, which increase the processing speed of an application.
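To make the in-memory caching point concrete, here is a minimal PySpark sketch (runnable in a Databricks Python notebook; the DataFrame and column names are made up for illustration): after the first action populates the cache, repeated queries read from memory instead of recomputing the data.

    from pyspark.sql import functions as F

    # A small synthetic DataFrame of one million rows
    events = spark.range(1000000).withColumn("bucket", F.col("id") % 10)

    # Mark it for in-memory caching; the cache is filled on the first action
    events.cache()
    events.count()                           # first action: computes and caches the rows

    # Later queries against the cached data avoid recomputation
    events.groupBy("bucket").count().show()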

You can read more about Apache Spark here:

https://docs.databricks.com/getting-started/spark/index.html

https://spark.apache.org/docs/latest/

https://spark.apache.org/

A peep into the Unified Data Analytics Platform:

The Platform can be broadly divided into the following major constituents:
[Screenshot: major constituents of the Unified Data Analytics Platform]

  1. Data Science Workspace: From data ingestion to data analysis, the Workspace gives your Data Science team a shared environment for collaborative work. Team members can use different functionalities depending on their roles as data practitioners, and each Workspace is connected to the organization's cloud data store to facilitate data munging and analysis (a small data-access sketch follows this list). The Workspace has 3 major components, as follows:

[Screenshot: the 3 major components of the Data Science Workspace]

  2. Unified Data Service: The engine powering the work data practitioners perform in the Data Science Workspace. Its 3 major components are as follows:

[Screenshot: the 3 major components of the Unified Data Service]

  3. Enterprise Cloud Service: Allows organizations to set up, secure, manage and scale their platform. The major components include:

[Screenshot: major components of the Enterprise Cloud Service]
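To show what being connected to an organization's cloud data store looks like in practice, here is a small sketch of reading a CSV file from a notebook. The path below is a placeholder, not a real location; point it at a file in your own storage (for example a DBFS mount).

    # Placeholder path (assumption): replace with a file in your own cloud data store,
    # e.g. a DBFS mount such as /mnt/<your-storage>/sales.csv
    csv_path = "/mnt/my-storage/sales.csv"

    sales = (spark.read
                  .option("header", "true")       # first line holds the column names
                  .option("inferSchema", "true")  # let Spark infer column types
                  .csv(csv_path))

    display(sales)                                # Databricks' built-in table display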

Wish to explore more? Below is a user guide to help you set up Databricks and dive into the vast analytical suite it offers.

You should have a working Databricks account. If not, sign up for the free Community Edition now at https://databricks.com/try-databricks

Getting Started:

These steps are illustrated in the screenshots that follow; this is the summary:

1. Copy the courseware URL.

2. Import the courseware into your Databricks account per the instructions below.

3. Create a cluster, choosing Databricks Runtime 4.0 (also illustrated below).

Congratulations! You have successfully created your account. We will now guide you through logging into your account.

  1. Once you have successfully registered, this is how your profile looks:

[Screenshot: profile view after registration]

  2. Creating Notebooks: Notebooks provide a collaborative workspace for data practitioners (a quick sanity-check cell is sketched after the screenshots below).

[Screenshot: creating a notebook (1 of 2)]

[Screenshot: creating a notebook (2 of 2)]
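Once the new notebook is attached to a running cluster, a minimal first cell (Python, purely as a sanity check) confirms that everything is wired up:

    # Quick sanity check for a freshly created Python notebook
    print(spark.version)                     # Spark version backing the attached cluster

    # Build and display a tiny DataFrame
    df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
    display(df)                              # Databricks' rich table display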

  3. Importing Notebooks: Alternatively, notebooks can be imported for further manipulation or simply to reuse existing code (a programmatic alternative is sketched after the screenshots).

[Screenshot: importing a notebook (1 of 2)]

[Screenshot: importing a notebook (2 of 2)]
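Notebooks can also be imported programmatically rather than through the UI. The sketch below calls the Workspace API's /api/2.0/workspace/import endpoint from Python; the workspace URL, token, file name and target path are placeholders to replace with your own.

    import base64
    import requests

    # Placeholders (assumptions): use your own workspace URL, token and paths
    HOST = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # Read the notebook source and base64-encode it, as the API expects
    with open("my_notebook.py", "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        f"{HOST}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "path": "/Users/you@example.com/my_notebook",
            "format": "SOURCE",              # import as plain notebook source code
            "language": "PYTHON",
            "content": content,
            "overwrite": True,
        },
    )
    resp.raise_for_status()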

  4. Finding your Notebook: This is where you can see the notebooks you have created.

[Screenshot: locating your notebooks in the Workspace]

The following link gives a detailed overview of Databricks notebooks: Documentation- Notebooks

  5. Creating a Cluster (a programmatic alternative is sketched after the screenshot).

[Screenshot: creating a cluster]
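For completeness, clusters can also be created programmatically. This is a minimal sketch against the Clusters API's /api/2.0/clusters/create endpoint; the runtime version and node type shown are examples and depend on your cloud provider and subscription.

    import requests

    # Placeholders (assumptions): use your own workspace URL and token
    HOST = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    resp = requests.post(
        f"{HOST}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "getting-started",
            "spark_version": "7.3.x-scala2.12",  # e.g. Databricks Runtime 7.3 LTS
            "node_type_id": "i3.xlarge",         # example AWS node type; differs on Azure
            "num_workers": 1,
            "autotermination_minutes": 30,       # terminate the cluster when idle
        },
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])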

Interesting, right? So why wait?

Unified Data Analytics is a new category of solutions that unifies data processing with AI technologies. The central theme behind adopting a Unified Data Analytics approach is to make AI much more achievable while extracting hidden and meaningful insights from the data available.

Explore how Databricks can help individuals and organizations adopt a Unified Data Analytics approach for better performance and staying ahead of the competition.

Sign up for the free Community Edition of Databricks and explore.

Databricks Sign Up

Curious enough? Read more on Databricks here:

Databricks Concepts
Video Content for Databricks
