Job Definition
Defining specifications for a compute job

Job definitions can be created by visiting the Jobs tab in the left-hand sidebar and clicking the + icon next to New Job. Currently, Kaspian supports Python and PySpark jobs.
Python Jobs
The following parameters can be modified:
- Name: User-friendly name of the job
- Description: A longer text field for describing the purpose of the job
- Code Source: Provide the Git HTTPS URL that points to the entrypoint file. Kaspian will clone the repo in the job and run the file from its directory. The file can be on any branch in any repository so long as the connected Git integration has read access to it.
- Cluster: Select the Cluster to use for running the job. Python Jobs can only be configured to run on Single-Node Clusters.
- Environment: Select the Environment to use for running the job. Take care to ensure all necessary dependencies are included.
- Environment Variables: Environment variables can be set for the job by providing them as a JSON object. Keys and values must be strings (see the sketch below).
An example job configuration is shown here.
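For concreteness, the Environment Variables field might hold an object like the following. The variable names and bucket path are hypothetical placeholders:

```json
{
  "DATA_BUCKET": "s3://example-bucket/raw",
  "LOG_LEVEL": "INFO"
}
```

And a minimal entrypoint sketch, assuming Kaspian exposes the configured variables to the job's process as ordinary environment variables:

```python
import os

# DATA_BUCKET and LOG_LEVEL are the hypothetical keys from the JSON above;
# this assumes Kaspian injects them into the process environment.
data_bucket = os.environ["DATA_BUCKET"]
log_level = os.environ.get("LOG_LEVEL", "INFO")

print(f"Reading input from {data_bucket} (log level: {log_level})")
```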
PySpark Jobs
The following parameters can be modified:
- Name: User-friendly name of the job
- Description: A longer text field for describing the purpose of the job
- Code Source: Provide the Git HTTPS URL that points to the entrypoint file. Kaspian will clone the repo in the job and run the file from its directory. The file can be on any branch in any repository so long as the connected Git integration has read access to it.
- Cluster: Select the Cluster to use for running the job. PySpark Jobs can only be configured to run on Spark Clusters.
- Environment: Select the Environment to use for running the job. Take care to ensure all necessary dependencies are included.
- Environment Variables: Environment variables can be set for the job by providing them as a JSON object. Keys and values must be strings.
- Spark Configuration Override: Spark jobs allow additional SparkConf parameters to be provided, or parameters defined at the cluster level to be overridden. The conf parameters must be provided as a JSON object (see the sketch below).
An example job configuration is shown here.
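For concreteness, a Spark Configuration Override might look like the following. The keys are standard SparkConf parameters; the values are placeholders, not tuning recommendations:

```json
{
  "spark.executor.memory": "8g",
  "spark.sql.shuffle.partitions": "200"
}
```

And a minimal PySpark entrypoint sketch. The "date" column and the INPUT_PATH/OUTPUT_PATH environment variables are hypothetical, and it assumes Kaspian applies the merged SparkConf (cluster settings plus any overrides) before the entrypoint runs, so getOrCreate() picks it up:

```python
import os

from pyspark.sql import SparkSession


def main():
    # getOrCreate() is assumed to inherit the SparkConf assembled by Kaspian,
    # i.e. the cluster-level settings plus any Spark Configuration Override.
    spark = SparkSession.builder.appName("example-job").getOrCreate()

    # INPUT_PATH and OUTPUT_PATH are hypothetical variables set through the
    # job's Environment Variables field.
    df = spark.read.parquet(os.environ["INPUT_PATH"])

    # Toy aggregation over a hypothetical "date" column.
    df.groupBy("date").count().write.mode("overwrite").parquet(os.environ["OUTPUT_PATH"])

    spark.stop()


if __name__ == "__main__":
    main()
```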