1. Posts/

Querying AWS Athena with Clojure using HugSQL

···
clojure aws hugsql

HugSQL is handsdown the best experience I've had with relational database frameworks.

I've used it with Postgres and AWS Athena. Getting started with HugSQL and postgres is simple as it is documented in https://hugsql.org.

This blog is about HugSQL configuration to query Athena. Since you are here, I assume you already know HugSQL or you are using it.

Athena JDBC my Simba

AWS provides a JDBC driver for Athena. Note that this driver is working on top of http API's provided by AWS, so we don't have jdbc connection session that is open all the time. This has some limitations.

You can read about Athena JDBC driver features in this link https://www.simba.com/drivers/athena-odbc-jdbc/

You can find the jdbc driver documentation here.

Below are the two types of configurations that worked for me with hugsql version [com.layerware/hugsql "0.5.1"]

Install the drivers

Install Athena JDBC Driver in your local repository

Athena JDBC driver is not directly available from Maven central, although third party one's exist.

The official jar can be downloaded and installed in your local maven repository with following.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
  # Download official athena jdbc driver
  wget https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.9/AthenaJDBC42_2.0.9.jar

  # Install in local maven repository
  mvn install:install-file \
  -Dfile=AthenaJDBC42_2.0.9.jar \
  -DgroupId=com.simba.athena \
  -DartifactId=athena-jdbc \
  -Dversion=2.0.9 \
  -Dpackaging=jar

Configuration

Using dbhost

1
2
3
4
5
6
7
8
9
  (def athena-db {:dbtype "awsathena"
                  :host "athena.amazonaws.com"
                  :port 443
                  :dbname "my_database"
                  :classname "com.simba.athena.jdbc.Driver"
                  :AwsCredentialsProviderClass "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
                  :AWSRegion "eu-west-1"
                  :S3OutputLocation "s3://athena-query-results-eu-west-1/"
                  :LogLevel 3})

Using subprotocol

1
2
3
4
5
6
7
  (def athena-db {:dbtype "awsathena"
                  :subprotocol "awsathena"
                  :subname "//athena.eu-west-1.amazonaws.com:443/my_database;"
                  :classname "com.simba.athena.jdbc.Driver"
                  :AwsCredentialsProviderClass "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
                  :S3OutputLocation "s3://athena-query-results-eu-west-1/"
                  :LogLevel 4})

This driver will look for credentials in the AWS credentials prodiver chain.

In my case I have the credentials in environment variables, and I rotate them periodically.

Running a query

1
2
3
4
5
  -- :name get-10-items-from-my-table
  -- :doc Get 10 items for my_table
  SELECT * FROM
  my_database.my_table
  LIMIT 10
1
2
3
4
5
6
7
8
9
  (ns query-runner
    (:require [clojure.string :as str]
              [hugsql.core :as hugsql]
              [clj-time.jdbc]))

  (def map-of-db-fns (hugsql/map-of-db-fns "myqueries.sql"))

  (when-some [query-fn (get-in map-of-db-fns [:get-10-items-from-my-table :fn])]
    (query-fn athena-db))

That should do!

Let me know if this is useful in comments, thanks.