Eureka User Guide

CHANGES TO NOTE IN EUREKA V4

Connecting to Eureka App VM

Each user is provided with a unique URL for connecting to their own Eureka App VM. To connect to your Eureka App VM:

If you abandon a session (for example, by closing the web browser instead of disconnecting from Eureka), the session remains active for 30 minutes before it disconnects automatically. To reconnect to an abandoned session, log back in at your custom App VM URL.

If you have forgotten your password, please contact Compass to reset your account credentials. Any reset links will be sent to the e-mail address you used to request your Compass User Account.

Disconnecting from Eureka App VM

When you finish working in Eureka, end your login session as a security precaution. You can either log out of Guacamole manually, or shut down the VM if no other users are logged in.

Manually logging out of Guacamole (without shutting down the VM):

Manually shutting down your VM:

Accessing Compass Data Marts in Eureka

If you have been authorized to access a Health Data Compass data mart in BigQuery, you can safely view it from your Eureka App VM. You can also download it to your Eureka App VM for further analysis there.

Via the Web User Interface

From your Eureka App VM, you can work with BigQuery through its web user interface: open a web browser to https://console.cloud.google.com/bigquery. The interface is largely self-explanatory, and Google's documentation is available at https://cloud.google.com/bigquery/docs/bigquery-web-ui

Important: The BigQuery web UI will also let you access Compass data marts in BigQuery from your local workstation. This access is not approved and will trigger an alert with our security monitoring team. Only access Compass data marts in BigQuery from your Eureka App VM.

Via the Command Line

You can also access BigQuery datasets with the “bq” command-line tool. This is a powerful utility; full documentation is available at https://cloud.google.com/bigquery/docs/bq-command-line-tool
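As a sketch of typical bq usage, the shell functions below wrap a few common subcommands (ls, show, head, and query). The function names and the project and dataset names are hypothetical placeholders; substitute the data mart you have been authorized to use, and run these only on your Eureka App VM.

```shell
# Hypothetical identifiers: replace with the data mart you are authorized to use.
BQ_PROJECT="my-compass-project"
BQ_DATASET="my_data_mart"

# List the tables in the data mart.
bq_list_tables() {
  bq ls "${BQ_PROJECT}:${BQ_DATASET}"
}

# Show a table's schema as formatted JSON.
bq_show_schema() {
  bq show --schema --format=prettyjson "${BQ_PROJECT}:${BQ_DATASET}.$1"
}

# Preview the first 10 rows of a table without running a query.
bq_preview() {
  bq head --max_rows=10 "${BQ_PROJECT}:${BQ_DATASET}.$1"
}

# Run a standard SQL query ($1) and print up to $2 rows (default 100) as CSV.
bq_query_csv() {
  bq --format=csv query --use_legacy_sql=false --max_rows="${2:-100}" "$1"
}

# Example (hypothetical table name), saving the result on your App VM:
#   bq_query_csv 'SELECT * FROM `my-compass-project.my_data_mart.encounters`' 50000 > encounters.csv
```

bq query returns at most 100 rows unless --max_rows is raised, which is why the sketch exposes the row limit as a second argument.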

Moving Files In & Out of Eureka App VM

Eureka is designed first and foremost to protect sensitive data files. As a result, you cannot reach the Internet directly from your Eureka App VM to upload or download files through the usual mechanisms such as FTP, email, or websites. Instead, you use a specially configured Google Cloud Storage location called your Eureka Staging Bucket to transfer files between your local workstation and your Eureka App VM.

This is a two-step process: 

There are three options for doing this: Google Cloud Console, gsutil, and GCSFuse.
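A minimal gsutil and GCSFuse sketch follows. It assumes gsutil (part of the Google Cloud SDK) is installed wherever you run it and that GCSFuse is available on your App VM; the bucket name, mount point, and function names are hypothetical. To bring a file into Eureka, for example, you might call staging_upload on your local workstation and then staging_download on your App VM.

```shell
# Hypothetical bucket name: replace with your Eureka Staging Bucket.
STAGING_BUCKET="gs://my-eureka-staging-bucket"

# List the contents of the staging bucket.
staging_list() {
  gsutil ls "${STAGING_BUCKET}"
}

# Upload a local file to the staging bucket.
staging_upload() {
  gsutil cp "$1" "${STAGING_BUCKET}/"
}

# Download a file from the staging bucket into the current directory.
staging_download() {
  gsutil cp "${STAGING_BUCKET}/$1" .
}

# Mount the bucket as a directory with GCSFuse (which expects the bare
# bucket name, without gs://), and unmount it when finished.
STAGING_MOUNT="$HOME/staging"
staging_mount() {
  mkdir -p "$STAGING_MOUNT"
  gcsfuse "${STAGING_BUCKET#gs://}" "$STAGING_MOUNT"
}
staging_unmount() {
  fusermount -u "$STAGING_MOUNT"
}
```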

Important: 

 - You can use your Eureka Staging Bucket to download data to your local workstation. 

 - Just because you can do this does not mean that you should. 

 - Sensitive data such as PHI may only be downloaded to workstations or servers that comply with your institutional HIPAA policies. 

Please contact us if you have any questions.

Preinstalled Software

Each Eureka App VM comes with the following software preinstalled:

Limited Internet Access from Eureka App VM

In Eureka V4, the App VM can connect to the URLs listed below through the Eureka Limited Internet App. Google Chrome is the only browser optimized for use with the limited internet access feature in Eureka App VM.

The first time you use the Eureka Limited Internet App, run the following command from your Eureka App VM and follow the prompts. If it gives you a long URL, paste that URL into a web browser in your Eureka App VM and authenticate with your Eureka credentials.

gcloud auth login 

There are two ways to interact with the Eureka Limited Internet App. 

Option #1: Locate the Eureka Limited Internet App in the applications directory and select the website to which you wish to connect.

Option #2: Open a terminal window and use one of the eureka-internet commands listed below.

After you select a site in the Eureka Limited Internet App, the connection to that site is allowed after a short delay, usually around 5 seconds but up to 15 seconds. Access to the site lasts 30 minutes; if you need the connection open longer, select the site again in the Eureka Limited Internet App to add another 30 minutes. The available options for internet connectivity are:


Limited Internet App Button: CRAN & Bioconductor

Console Command: eureka-internet-CRAN-Bioconductor

Sites:

https://cloud.r-project.org

https://www.bioconductor.org


Limited Internet App Button: Github.com

Console Command: eureka-internet-GitHub.com

Sites:

https://github.com

https://raw.githubusercontent.com


Limited Internet App Button: PyPi.org & Python.org

Console Command: eureka-internet-Python.org

Sites:

https://pypi.org

https://www.python.org


Limited Internet App Button: REDCap

Console Command: eureka-internet-RedCap

Sites:

https://redcap.ucdenver.edu


NOTE: Some R packages require access to GitHub at the same time as CRAN, so select both sites in the Eureka Limited Internet App to ensure those packages install completely.
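The sequence for an R package that needs both CRAN and GitHub might be scripted as below. This sketch assumes you have already run gcloud auth login as described above and that each eureka-internet command returns once access has been requested; the helper names and the example package are hypothetical.

```shell
# Hypothetical helper: open CRAN/Bioconductor and GitHub access together,
# then wait for the connections to become active.
open_r_package_sites() {
  eureka-internet-CRAN-Bioconductor
  eureka-internet-GitHub.com
  # Access usually starts within about 5 seconds but can take up to 15.
  sleep 15
}

# Hypothetical helper: install a CRAN package while access is open.
install_r_package() {
  Rscript -e "install.packages('$1', repos = 'https://cloud.r-project.org')"
}
```

For example, run open_r_package_sites, then install_r_package data.table, and finish within the 30-minute window or re-select the sites to extend it.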

When you no longer need the Eureka Limited Internet App, log out of gcloud by running the following command from your Eureka App VM:

gcloud auth revoke

Internet Security & Eureka App VM

Security is a group effort between you and Compass. We cannot do it without you. Please be sure to follow all rules in the Eureka User Agreement.

Some common problems with software downloaded from the internet include:

You must carefully review software from any source for these problems. Be especially careful with container hubs (such as Docker Hub) and with GitHub software that is not widely used. Because it is difficult to determine the trustworthiness of software on container hubs, we discourage their use. You are responsible for vetting any software you upload to Eureka.

You must not store confidential information on sites outside Eureka unless you have received specific permission. You must never store confidential information on GitHub.

Idle Shutdown of Eureka App VM

Each Eureka instance is pre-configured to shut down the VM after 30 minutes with no detected activity. To temporarily disable the idle shutdown, run the following command from a terminal window on your VM:

If you disable the idle shutdown, you are responsible for manually shutting down the VM when you are no longer using it.

The pre-configured idle shutdown is re-enabled whenever the VM is rebooted; until then, you must shut down the VM manually.

Using Eureka HPC

Jobs are submitted to the SLURM workload manager. You will primarily use the sbatch command to submit jobs and the squeue command to monitor them. Any valid shell script can be submitted as a job with sbatch. Once a job is submitted, Eureka HPC creates a temporary compute node just for that job, and deletes the node as soon as it has no more work to do. This is the core cost-saving feature of Eureka HPC.
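A minimal batch script is sketched below. The #SBATCH lines are standard SLURM directives that sbatch reads (bash treats them as comments); the job name, log file pattern, and two-hour limit are illustrative values, not Eureka defaults.

```shell
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --output=example-job-%j.log   # %j is replaced with the job ID
#SBATCH --time=02:00:00               # wall-clock limit, HH:MM:SS

# Everything below runs on the temporary compute node created for this job.
# SLURM_JOB_ID is set by SLURM; "not-under-slurm" appears if run outside it.
msg="Job ${SLURM_JOB_ID:-not-under-slurm} running on ${HOSTNAME}"
echo "$msg"
```

Save it as example-job.sh, submit it with "sbatch example-job.sh", and monitor it with "squeue -u $USER"; "scancel" followed by the job ID cancels it.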

If you submit a job with sbatch and provide no options to SLURM, your job will be submitted with the following defaults:

To run for more than 23 hours, you must submit to a non-default SLURM partition. Each partition corresponds to a particular Google Cloud machine type. You can list the existing partitions by typing "sinfo" and get information about their hardware resources by typing "man eureka-queues". If your job can finish in under 23 hours, consider submitting it to a preemptible partition so that it runs at lower cost. Otherwise, submit to a partition with the _nonpre suffix to get a standard Google Cloud instance without the spot/preemptible cost-saving feature.
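The partition rule above can be expressed as a small helper. This sketch assumes partition names follow the pattern described here, with a preemptible partition BASE and a standard counterpart BASE_nonpre; check the real names with sinfo and man eureka-queues. The helper and the partition name in the example are hypothetical.

```shell
# Hypothetical helper: pick a partition from the expected runtime in whole hours.
# Assumes a preemptible partition named BASE and a standard one named BASE_nonpre.
choose_partition() {
  local base="$1" hours="$2"
  if (( hours < 23 )); then
    echo "$base"            # under 23 hours: cheaper preemptible/spot instance
  else
    echo "${base}_nonpre"   # 23 hours or more: standard instance
  fi
}

# Example (hypothetical partition name), for a two-day job:
#   sbatch --partition="$(choose_partition n2-standard-8 48)" --time=2-00:00:00 long-job.sh
```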

Running interactive SLURM jobs

Do not run interactive processes that need more than minimal CPU on the login node. Text editors are fine, but commands like sort or R should run as a SLURM interactive job so that they execute on more powerful hardware. To start an interactive job, type interact. After a delay of about 1 minute, you will receive a shell prompt on a compute node.

SLURM Best Practices on Eureka 

The ideal batch job runs longer than a few minutes but shorter than a day. SLURM schedules jobs in this range more efficiently, so keeping your jobs within these limits lets you get more work through the system and, on average, spend less time waiting in the queue.

Batch jobs should be able to be killed and restarted without losing much progress. You can achieve this by writing your code to checkpoint its progress, or simply by breaking a long-running job into multiple shorter jobs.
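One simple way to make a job restartable is sketched below: the work is split into numbered chunks, and each completed chunk is recorded in a checkpoint file in $HOME, so a resubmitted job skips chunks that are already finished. The chunk list, checkpoint file name, and process_chunk function are hypothetical placeholders for your own work.

```shell
# Hypothetical checkpoint file; $HOME outlives the compute node, unlike /tmp.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-$HOME/example-job.checkpoint}"
touch "$CHECKPOINT_FILE"

# Placeholder for the real per-chunk work.
process_chunk() {
  echo "processing chunk $1"
}

for chunk in 1 2 3 4 5; do
  # Skip chunks recorded by an earlier run that was killed or preempted.
  if grep -qx "$chunk" "$CHECKPOINT_FILE"; then
    continue
  fi
  # Stop on failure so an unfinished chunk is never marked as done.
  process_chunk "$chunk" || exit 1
  echo "$chunk" >> "$CHECKPOINT_FILE"
done
```

Delete the checkpoint file to start over from the first chunk.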

Do not hard-code the UNIX path of your home directory in your SLURM scripts. To reference a file in your home directory, use the shell’s $HOME variable instead.
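The sketch below builds file paths from $HOME rather than from a hard-coded path such as /home/<username>. The project directory layout is hypothetical. SLURM does not expand shell variables inside #SBATCH lines, so $HOME works only in the body of the script.

```shell
#!/bin/bash
#SBATCH --job-name=home-paths-example

# Build paths from $HOME so the script works for any user.
# The "project" directory layout is hypothetical.
PROJECT_DIR="$HOME/project"
INPUT_FILE="$PROJECT_DIR/data/input.csv"
LOG_DIR="$PROJECT_DIR/logs"

mkdir -p "$LOG_DIR"
echo "Input: $INPUT_FILE" > "$LOG_DIR/run-info.txt"
```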

Storage options on Eureka 

To minimize computing and storage costs, you must be aware of the different types of storage available in Eureka HPC:

Google Cloud Storage

This is Google’s most cost-effective option for long-term storage. Ideally, store both your input data and your results here. Move data in and out of Google Cloud Storage using the gsutil tool or the Google Cloud web console; instructions appear in the Moving Files In & Out of Eureka App VM section above.

/home

Your home directory exists only to hold files that your Linux account needs in order to function. It is small and slow, but it is fine to let SLURM logs or other small files be written there temporarily.

/tmp on compute nodes

Each compute node has temporary storage attached to it. It is faster and larger than /home, but it exists only while the node is running and is destroyed when the node shuts down. It is therefore useful only as working storage while your job runs: as the last step of your job, you must copy any results you want to keep from /tmp to a permanent storage location, such as Google Cloud Storage.
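The staging pattern described above might look like the helpers below: copy input from Cloud Storage into a private directory under /tmp, work there, and copy results back as the job's last step. The bucket name, directory layout, and function names are hypothetical, and the sketch assumes gsutil is available on the compute nodes.

```shell
# Hypothetical bucket name; replace with a Cloud Storage bucket you can write to.
DATA_BUCKET="gs://my-eureka-data"

# Create a private scratch directory on the compute node's local /tmp.
make_scratch() {
  mktemp -d /tmp/job.XXXXXX
}

# Copy an input object from Cloud Storage into a scratch directory.
stage_in() {
  gsutil cp "${DATA_BUCKET}/input/$1" "$2/"
}

# Copy a result file back to Cloud Storage. This must happen before the job
# ends, because /tmp is destroyed when the node shuts down.
stage_out() {
  gsutil cp "$1" "${DATA_BUCKET}/results/"
}

# Typical order inside a batch job (file names are hypothetical):
#   SCRATCH="$(make_scratch)"
#   stage_in data.csv "$SCRATCH"
#   sort "$SCRATCH/data.csv" > "$SCRATCH/sorted.csv"
#   stage_out "$SCRATCH/sorted.csv"
```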

Google Cloud Source Repository

Each Eureka App VM instance has a Google Cloud Source Repository set up and enabled so that multiple users of a shared Eureka instance can share code files.
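A possible workflow for sharing code through the repository is sketched below, using the gcloud source repos commands and plain git. The helper names, repository name, and file names are hypothetical. The share_changes helper adds only the files you name, which helps keep data files out of the repository.

```shell
# Hypothetical helper: clone a shared repository into ~/repos/<name>.
clone_shared_repo() {
  mkdir -p "$HOME/repos"
  gcloud source repos clone "$1" "$HOME/repos/$1"
}

# Hypothetical helper: commit only the named code files and push the
# current branch to the branch of the same name on the remote.
share_changes() {
  local message="$1"
  shift
  git add -- "$@"
  git commit -m "$message"
  git push origin HEAD
}

# Example (repository and file names are hypothetical):
#   gcloud source repos list
#   clone_shared_repo analysis-code
#   cd "$HOME/repos/analysis-code"
#   share_changes "Add cohort script" cohort.R
```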

Sensitive data such as PHI should never be included in code files, including files shared on other code-sharing platforms such as GitHub.