An HDInsights cluster consists of several nodes. A PySpark script may run on any one of those nodes.
In order to install a package on HDInsights we need to use Script Actions
Script Action allows you to run a shell script on all nodes of an Azure cluster. You may run the script action when you create the cluster or while the cluster is running.
You may also persist a Script Action in the cluster, so that it is run on every new node that is added to the cluster.
This script will install and update the given packages in the parameters section.
For example, you may install a new version of
Plotly to use in your
Your script action must be in a location accessible to the cluster. It could be anywhere, but let’s upload it to storage account.
Upload your Script Action to a Storage Account that your chosen HDInsights have access to.
Bash script URI https://
- Select the cluster that you want to run the Script action in Azure portal
- On the blade under Configuration, you can find Script Actions
- Select that and then click Submit New
- Select custom
- Give a name
- Select nodes to install
- In the parameters section give list of packages to install separated by spaces
We can also add this script to HDInsights ARM template in the
__NB: You may create other Script Actions to do any kind of customisations on your HDInsights Cluster_
Author: Deepu Mohan Puthrote
This work by Deepu Mohan Puthrote is licensed under a Creative Commons Attribution 4.0 International License