Using Eureka App VM
Connecting to Eureka App VM
Each user has been provided a unique URL for connecting to their Eureka App VM. To connect:
1. Using the current version of the Chrome or Firefox web browser, go to your custom App VM URL. HDC recommends using a private or incognito window.
2. Use your Compass User Account to authenticate with Google (including 2-factor authentication).
3. Use your Compass User Account to log into NoMachine (use all lowercase for the username). If you get a Server Error message, Google is still starting the VM, which may take up to 3 minutes. Refresh the browser and you will be in.
4. In the NoMachine window, select 'Create a New Virtual Desktop' and then select Continue. If you have a session saved from before, select that saved session instead.
5. You're in your App VM once you see the CentOS 7 background image!
If you are going to use RStudio to connect to your App VM, connect to your App VM first as this will create the home directory needed for RStudio to run.
If you abandon a session (this can happen by closing the web browser instead of disconnecting from Eureka), your session will remain active for 90 minutes before automatically disconnecting. To reconnect to your abandoned session, log back in to the custom App VM URL and, at Step 4, select 'Connect to local...' instead of 'Create a New Virtual Desktop', then select Continue.
Disconnecting from Eureka App VM
When you are done working in your Eureka App VM it is best to manually log out of NoMachine. By manually logging out of NoMachine you free up the license (each Eureka instance comes with 4 user licenses).
Manually Logging Out of NoMachine (without turning off the App VM):
In the upper right hand corner of your NoMachine screen, click on the triangle and a drop down window will appear.
Click on your name and another sub-menu will appear.
Select 'Log Out' and this will log you out of your NoMachine session.
Manually Turning off App VM from NoMachine:
In the upper right hand corner of your NoMachine screen, click on the triangle and a drop down window will appear.
Click on the power off symbol and a new pop up window will appear.
If there are multiple users on the App VM their names will display. If you want to power off the App VM select Power Off (this will apply to all users on the App VM). You can also restart the App VM by selecting Restart.
Frequently Asked Questions About Connecting to Eureka
What if I forgot my Compass User Account name and/or password?
Please contact Compass to reset your account credentials. These will be sent to the e-mail that you used to request your Compass User Account.
I received a Server Error message after logging into NoMachine using my Compass User Account (step 3 above). What should I do?
This means that Google is starting up the App VM, which may take up to 3 minutes. After this time has passed, refresh the browser and you will be in.
What does the message ‘Reached the maximum number of concurrent sessions on this server’ mean?
By default, each App VM allows up to 4 active sessions at one time. This message means that all 4 sessions are occupied. To free up a session on the App VM, we recommend first coordinating access with users on your team and encouraging users to properly disconnect from the App VM when their sessions are no longer needed. If you need help beyond this, contact Compass.
Additional Eureka App VM Features
Accessing Compass Data Marts in Eureka
If you have been authorized to access a Health Data Compass data mart in BigQuery, you can safely view it from your Eureka App VM. You can also download it to your Eureka App VM for further analysis.
Via the Web User Interface
You can interact with BigQuery through the BigQuery user interface via your Eureka App VM. Simply open a web browser to https://console.cloud.google.com/bigquery. The user interface should be fairly self-explanatory. Some helpful documentation here: https://cloud.google.com/bigquery/docs/bigquery-web-ui
Important: The BigQuery Web UI at the above links will allow you to access Compass data marts in BigQuery from your local workstation. This access is not approved and will fire an alert with our security monitoring team. Only access Compass data marts in BigQuery from your Eureka App VM.
Via the Command Line
You can access BigQuery datasets using the “bq” command line utility. This is a powerful utility, and full documentation can be found here: https://cloud.google.com/bigquery/docs/bq-command-line-tool. Below are a few simple examples for common uses:
Examples using Command Line to access data
Examples: Exploring Data
See what datasets you can access in a project:
bq --project_id [project-name] ls
See what tables are in a dataset:
bq --dataset_id [project-name]:[dataset-name] ls
Show the schema of a table:
bq show [project-name]:[dataset-name].[table-name]
Show the first few rows of a table:
bq head [project-name]:[dataset-name].[table-name]
Examples: Querying Data
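*Note 1: In this and all SELECT examples, if the name of the project that contains the data you are querying has a hyphen in it, you may need to surround any table identifiers with backticks and separate the project, dataset, and table with periods, as follows: `[project-name].[dataset-name].[table-name]`. When a query contains backticks, enclose it in single quotes on the command line so that the shell does not treat the backticks as command substitution.
*Note 2: In the examples below, [PROJECT] and [DATASET] refer to the project and dataset that contain the data you wish to query, not necessarily your own Eureka project.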
Execute a SELECT query from the command line and view the results:
bq query --use_legacy_sql=false 'select * from `[PROJECT].[DATASET].[TABLE]`'
Execute a SELECT query from a query that’s stored in a file (for more complex queries) and view the results:
cat [LOCAL-SQL-FILENAME] | bq query --use_legacy_sql=false
Examples: Downloading Data
*Note 1: In the examples below, [PROJECT] and [DATASET] refer to the project and dataset that contain the data you wish to query, not necessarily your own Eureka project.
*Note 2: The “bq query” command will return a maximum of 16,000 rows. For larger datasets, see the example for “bq extract”
Output the results of a SELECT command to a CSV file:
bq query --use_legacy_sql=false --format=csv 'select * from `[PROJECT].[DATASET].[TABLE]`' > result.csv
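As a rough safeguard against the row limit described in Note 2, the following sketch counts the data rows in a CSV file written by bq query and warns when the count reaches the limit. It assumes the first line of the file is a header and that no field contains an embedded line break. The function name check_truncated and the ROW_LIMIT variable are illustrative and are not part of bq.

```shell
# Sketch: warn when a CSV from "bq query" may have hit the row limit.
# ROW_LIMIT defaults to the 16,000-row limit stated in Note 2.
ROW_LIMIT="${ROW_LIMIT:-16000}"

check_truncated() {
  # Count lines in the file, then subtract 1 for the CSV header row.
  lines=$(wc -l < "$1" | tr -d ' ')
  rows=$((lines - 1))
  if [ "$rows" -ge "$ROW_LIMIT" ]; then
    echo "WARNING: $1 has $rows rows; results may be truncated, consider bq extract"
    return 1
  fi
  echo "OK: $1 has $rows rows"
}
```

For example, running check_truncated result.csv after the command above reports the row count, and a warning suggests switching to the bq extract approach.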
Export a table to a file in your Google Cloud Storage Staging Bucket:
bq extract --destination_format CSV --field_delimiter "," [PROJECT]:[DATASET].[TABLE] gs://[EUREKA-PROJECT]-staging/[FILENAME]
Copy a file from your Google Cloud Storage Staging Bucket to your App VM:
gsutil cp gs://[EUREKA-PROJECT]-staging/[FILENAME] [FILENAME]
Export the results of a large query (>16,000 resulting rows) to your BigQuery Staging Dataset:
1. Query the data and store the results in a new BigQuery table.
bq query --use_legacy_sql=false --destination_table [EUREKA-PROJECT-ID]:staging.[TABLE] 'select * from `[PROJECT].[DATASET].[TABLE]`'
2. Use the instructions above to export the new table to a file in Google Cloud Storage.
3. Use the instructions above to copy the file from your Google Cloud Storage Staging Bucket to your App VM.
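The three steps above can be chained in one script. This is a minimal sketch, not an official Compass tool: the project ID, table name, and file name are hypothetical placeholders, and the run helper only prints each command unless DRY_RUN is set to 0.

```shell
# Sketch: export a large query result via the staging dataset and bucket.
# All default values are hypothetical placeholders; substitute your own.
EUREKA_PROJECT="${EUREKA_PROJECT:-my-eureka-project}"
STAGING_TABLE="${STAGING_TABLE:-my_results}"
OUT_FILE="${OUT_FILE:-my_results.csv}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print a command; execute it only when DRY_RUN is 0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

export_large_query() {
  # Step 1: store the query results in a new table in the staging dataset.
  run bq query --use_legacy_sql=false \
      --destination_table "${EUREKA_PROJECT}:staging.${STAGING_TABLE}" "$1" &&
  # Step 2: export the new table to the staging bucket as CSV.
  run bq extract --destination_format CSV \
      "${EUREKA_PROJECT}:staging.${STAGING_TABLE}" \
      "gs://${EUREKA_PROJECT}-staging/${OUT_FILE}" &&
  # Step 3: copy the exported file from the staging bucket to the App VM.
  run gsutil cp "gs://${EUREKA_PROJECT}-staging/${OUT_FILE}" "${OUT_FILE}"
}
```

Call export_large_query with the query text as its only argument. Because the steps are joined with &&, a failure in one step stops the later steps from running.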
Moving Files In & Out of Eureka App VM
Eureka is designed first and foremost to protect sensitive data files. One important aspect of this is that you cannot access the Internet directly from your app server in order to upload or download files via the usual mechanisms such as FTP, email, or web sites. Instead, you will use a specially configured location on Google Cloud Storage, called your Eureka Staging Bucket. Your Eureka Staging Bucket can be used to transfer files between your local workstation and your Eureka App VM.
This is a two-step process:
Upload files from your app server or local workstation to your staging bucket
Download the files from your staging bucket to your app server or local workstation
There are three options for doing this: Google Cloud Console, gsutil, and GCSFuse. See below for how to use each of these options.
Important:
- You have the ability to use your Eureka Staging Bucket to download data to your local workstation.
- Just because you can do this does not mean that you should.
- Sensitive data such as PHI may only be downloaded to workstations or servers that comply with your institutional HIPAA policies.
Please contact us if you have any questions.
Using the Google Cloud Console
The Google Cloud Console provides a point-and-click graphical user interface to your staging bucket. This is a good option for ad-hoc transfer of a few small files at a time.
For scripted transfers, transfers of very large files or many files at once, use one of the other options.
Open a web browser to https://console.cloud.google.com/storage
If prompted, authenticate using your Compass User Account credentials.
Make sure the dropdown list to the right of the "Google Cloud Platform" logo on the top-left of the screen contains the name of your Eureka project. If it does not, click the down-arrow, and select your project.
Within the table called "Buckets," you will see the name of your Eureka Staging Bucket, in the format [projectname-staging]. Click the name of the bucket to open the bucket.
To upload files into your staging bucket, use the "Upload Files" or "Upload Folders" buttons. Alternatively, you can drag and drop files onto the whitespace on the bottom-right quadrant of the page to upload.
To download files from your staging bucket, click on file names to download files via your web browser and select the location you want to download the file to.
Using the gsutil Command-Line Interface
The gsutil command-line interface is extremely useful for transferring large files, large groups of files, or for scripting file transfer.
Configuring Your Credentials
gsutil is already installed on your Eureka App VM. To install gsutil on the local workstation from which you connect to your Eureka App VM, follow the gsutil installation instructions for Mac or for Windows. You will need to configure your Google credentials on your Eureka App VM if you have not already done so. Run the following command from your Eureka App VM and follow the prompts. It may provide you with a long URL, which you should paste into a web browser and authenticate using your Eureka credentials.
gcloud auth login
Transferring Files
The basic syntax for transferring a file using gsutil is as follows:
gsutil cp [source] [destination]
Local files are specified following usual syntax, for example ~/myfile.txt. Your bucket will be specified as gs://[projectid-staging].
Examples, assuming a project id of hdcekaxmp:
To copy a local file to your staging bucket (this works from your Eureka App VM too):
gsutil cp myfile.txt gs://hdcekaxmp-staging
To copy a file from your staging bucket to a local file:
gsutil cp gs://hdcekaxmp-staging/myfile.txt .
To copy a file from one bucket to another bucket:
gsutil cp gs://hdcekaxmp1-staging/obj gs://hdcekaxmp2-staging/obj2
More Examples
The gsutil cp command is powerful, supporting wildcards, simultaneous file transfers, resumable transfers, and more. For examples, see the gsutil cp documentation.
To synchronize entire folder hierarchies with your staging bucket, see the gsutil rsync command.
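As an illustration of the bulk options mentioned above, the following sketch wraps gsutil rsync and a wildcard gsutil cp in small helper functions. The bucket name reuses the example project id hdcekaxmp, and the function names and folder names are hypothetical. The run helper only prints each command unless DRY_RUN is set to 0.

```shell
# Sketch: bulk transfers between a local folder and the staging bucket.
BUCKET="${BUCKET:-gs://hdcekaxmp-staging}"   # example bucket; use your own
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

# Print a command; execute it only when DRY_RUN is 0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Mirror a local folder into a folder in the bucket.
# -m runs transfers in parallel; -r recurses into subfolders.
sync_up() {
  run gsutil -m rsync -r "$1" "$BUCKET/$2"
}

# Mirror a folder in the bucket down to a local folder.
sync_down() {
  run gsutil -m rsync -r "$BUCKET/$1" "$2"
}

# Download every object matching a wildcard pattern, e.g. "results/*.csv".
# The pattern is passed quoted so that gsutil, not the shell, expands it.
download_matching() {
  run gsutil -m cp "$BUCKET/$1" "$2"
}
```

For example, sync_up ./results results prints the rsync command that would copy the local results folder into the bucket. Review the printed command before setting DRY_RUN to 0.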
Using GCSFuse
GCSFuse allows you to mount your staging bucket as a folder within a Linux or macOS filesystem. (This feature is not available on Windows systems.) You can use GCSFuse to mount your staging bucket on your Eureka App VM, your local workstation, or both.
Setting Up GCSFuse on Your Eureka App VM
GCSFuse is already installed on your Eureka App VM -- you only need to configure it.
1. (One time only) Execute the following two commands to authenticate to Google Cloud:
gcloud auth login [your-compass-user@account.org]
gcloud auth application-default login
2. (One time only) Create a folder at which to mount the bucket:
mkdir ~/gcs
3. (Each time you start your VM) Mount the folder, using the name of your staging bucket:
gcsfuse [projectid-staging] ~/gcs
Advanced users may wish to explore modifying fstab to mount their staging bucket by default at startup, thereby skipping Step 3. See the GCSFuse documentation for details.
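As a lighter alternative to editing fstab, the per-startup mount can be wrapped in a helper that mounts the bucket only when it is not already mounted. This is a sketch under stated assumptions: it relies on the Linux /proc/mounts file to detect existing mounts, and the bucket name, the function names, and the MOUNTS_FILE variable are hypothetical.

```shell
# Sketch: mount the staging bucket with gcsfuse unless it is already mounted.
MOUNT_DIR="${MOUNT_DIR:-$HOME/gcs}"
BUCKET_NAME="${BUCKET_NAME:-projectid-staging}"   # placeholder; use your bucket
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"        # Linux list of active mounts

# Succeeds if MOUNT_DIR is listed as a mount point (second column).
is_mounted() {
  [ -r "$MOUNTS_FILE" ] &&
    awk -v d="$MOUNT_DIR" '$2 == d { found = 1 } END { exit !found }' "$MOUNTS_FILE"
}

mount_staging() {
  if is_mounted; then
    echo "already mounted: $MOUNT_DIR"
  else
    # Create the folder if needed, then mount the bucket on it.
    mkdir -p "$MOUNT_DIR" && gcsfuse "$BUCKET_NAME" "$MOUNT_DIR"
  fi
}
```

Running mount_staging after each VM start mounts the bucket once and leaves an existing mount untouched.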
Setting Up GCSFuse on Your Local Workstation
Configuring GCSFuse on your local workstation is nontrivial, but can be very useful. By mounting your staging bucket on both your Eureka App VM and your local workstation, you can seamlessly move files between systems without making calls to gsutil. See the following links for more information:
Frequently Asked Questions: Moving data in/out of Eureka App VM
What types of data or objects can I move out of Eureka to a workspace that does not comply with my Institution's HIPAA policies?
Data that is not sensitive (that is, data that does not contain PHI) can be moved outside of Eureka. This includes data that is de-identified, or that is summarized statistically or in graphical visualizations. Also make sure you are following your Institution's policies about PHI data and safe storage outside of Eureka.
What should I do if I need to move PHI out of Eureka to a workspace that does not comply with my Institution's HIPAA policies?
Before doing this please contact Compass and your Institution for further directions.
Preinstalled Applications
Each Eureka App-VM is preinstalled with the following default suite of analytical tools and applications:
Ant - Java application build tool
Apache HTTP Server - Web server
Apache Maven - Build automation tool
Apache Tomcat - Web application platform
Atom Text Editor - Code editor
Dbeaver - Database management tool
Docker - Application container manager
Eclipse Oxygen - Software development IDE
GCSFuse - Utility to mount GCS buckets to local filesystem
GNU Make - Code compilation manager
GNU Octave - Statistical analysis toolkit
Google Cloud SDK - Utilities to access & manage Google Cloud Platform resources
Java - Programming language
Keras - Machine Learning toolkit
LaTeX - Document prep system for typesetting in CLI
LibreOffice - Office productivity suite
Google Data Studio - Analytic dashboards and reporting tool
Microsoft Cognitive Toolkit - Machine Learning toolkit
Neo4j - Graph database service
Pandas - Data analysis toolkit
pgAdmin3 - Administrative tool for PostgreSQL
PostgreSQL - Database server
PyCharm - Python IDE
R - Statistical analysis toolkit
R Studio Desktop - Desktop-based IDE for R
R Studio Server - Browser-based IDE for R
Standard Linux Dev Tools - Default packages installed from yum groupinstall "development tools"
TensorFlow - Machine Learning toolkit
Texmaker - LaTeX editor with a user interface
Valgrind - Debugging/memory management tool
Visual Studio Code - Code editor
Tip: To discover which libraries are preinstalled on your Eureka App-VM run the following in the Eureka terminal:
rpm -qa | grep devel
Updating Preinstalled Applications
Users may choose to update preinstalled applications on their Eureka App-VMs. Below you will find guidance on updating commonly used applications using the Eureka terminal.
R Studio Desktop
Run the following once
sudo yum -y install /srv/repos/eureka/7/v2/files/rstudio-2021.09.1-372-x86_64.rpm
Linux Compiler Dev Tools
Run the following once
sudo yum -y install devtoolset-10
Run the following each time before using the tool(s)
source /opt/rh/devtoolset-10/enable
Git-gui and gitk
Run the following once
sudo yum install rh-git218-git-gui rh-git218-gitk
Using R
Each Eureka instance also comes with a dedicated URL for your custom RStudio on Eureka. Before logging in to RStudio for the first time, make sure you have logged in to your App VM, as this establishes the user home directory that RStudio also uses. You can connect directly to RStudio using this URL (unlike NoMachine, RStudio has no restriction on the number of concurrent active sessions). Use your Compass User Account to log in to RStudio.
R 3.6 is currently preinstalled on each App VM. If you need to update the version of R installed on your Eureka App VM, run the following command in the Eureka terminal:
sudo yum upgrade R
Frequently Asked Questions: Using R
How can I access a Compass Data Mart using R?
bigrquery, a package available from CRAN, lets you connect from R on your Eureka App VM directly to data in Google BigQuery.
Which popular R package repositories can I access within Eureka?
See our section “Limited Internet Access from Eureka” for details on accessing sites like CRAN, Github and others for downloading packages into Eureka.
What if I need a package or library that is not available via limited internet access in Eureka presently?
Here is some guidance on how to install additional packages and libraries:
Installing R packages from CRAN
Access to CRAN is available through the Limited Internet Access feature. Follow the Limited Internet Access steps below to get connected to CRAN. Once CRAN is accessible you can install packages from CRAN using install.packages() in the usual way.
Installing R packages from GitHub
Many R packages are hosted on GitHub. With Limited Internet Access from Eureka, you can reach GitHub directly. For older versions of Eureka that do not have access to GitHub, the usual install_github() command in R will return an error. In addition, there is an incompatibility between GitHub and R in the way that .zip files are handled, which requires some additional steps.
Do the following:
Locate the repository containing the package you wish to install on github.com
Use the green button on the home page of the repository to download the repository in .zip format
Follow the instructions in Moving Files Into Eureka to copy the .zip file to your Eureka virtual machine. The file can be stored anywhere on the virtual machine, but you may wish to place it in a folder to contain packages you install in this way, e.g., ~/mypackages.
From the command prompt of your Eureka virtual machine, unzip the .zip file, e.g.:
unzip github-repo.zip
From R, use install.packages(), specifying the path to the folder containing the package, e.g.:
install.packages('~/mypackages/github-repo-master', repos=NULL)
Installing R packages from other Sources
Installing R Packages from other sources is possible. Take special care that you only download and install packages from trusted sources.
For .zip files, do the following:
Download the .zip file containing the package you wish to install to your local workstation.
Follow the instructions in Moving Files Into Eureka to copy the .zip file to your Eureka virtual machine. The file can be stored anywhere on the virtual machine, but you may wish to place it in a folder to contain packages you install in this way, e.g., ~/mypackages.
From R, use install.packages(), specifying the path to the .zip file containing the package, e.g.:
install.packages('~/mypackages/the-package.zip', repos=NULL)
*Note: If you get an error stating "embedded nul in string", then the .zip file is probably suffering from the same incompatibility as described in Installing Packages from GitHub. Follow those instructions to unzip the repository and install it from its unzipped subfolder.
Stringi R Package:
This popular R package requires ICU4C code that cannot be compiled on the present Eureka OS. To install the OS-compatible version of stringi (2020 version) please execute the following code:
install.packages("https://cloud.r-project.org/src/contrib/Archive/stringi/stringi_1.5.3.tar.gz", repos=NULL, type="source", configure.vars="ICUDT_DIR=/srv/repos/eureka/7/v2/files")
Installing missing Linux Libraries
Some R packages depend on Linux operating system libraries that may not be installed on your Eureka virtual machine by default. If install.packages returns errors about missing libraries, you can install these from the CentOS mirror maintained by Health Data Compass.
Do the following:
Identify the name of the package you wish to install, e.g., "curl"
From the command prompt of your Eureka App VM, install it using the yum package manager, e.g.:
sudo yum install curl
Manually Installing Dependencies
Many R packages are dependent on other R packages. Dependencies in CRAN will be resolved and installed automatically through Health Data Compass's CRAN mirror.
Unfortunately, dependencies hosted in other locations, such as GitHub, will need to be manually installed. You can simply attempt to install the base package using install.packages(), wait for an error complaining of a missing package, install the missing package, attempt to install the base package again, and repeat until all dependencies are found. But if the base package has many dependencies, it may be more efficient to view the DESCRIPTION file found within the base package .zip file. Look for the Imports and Suggests tags, which will list any required and suggested dependencies, respectively. You can then proactively install each of these dependent packages one at a time, using the instructions above, and then install the base package when all dependencies are in place.
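To list a package's dependencies without reading the DESCRIPTION file by hand, a small shell helper can print the entries of one field. This is a minimal sketch: the function name description_field is hypothetical, and it assumes the DESCRIPTION file has already been extracted from the .zip file and follows the usual layout, in which continuation lines start with whitespace.

```shell
# Sketch: print the package names listed under one DESCRIPTION field
# (for example Imports or Suggests), one name per line.
# Usage: description_field FIELD PATH-TO-DESCRIPTION
description_field() {
  awk -v f="$1" '
    # Start of the requested field: keep the text after "Field:".
    $0 ~ "^" f ":"     { grab = 1; sub("^" f ":", ""); print; next }
    # Indented lines continue the field that is being collected.
    grab && /^[ \t]/   { print; next }
    # Any other line starts a different field.
                       { grab = 0 }
  ' "$2" |
    tr ',' '\n' |
    # Drop version constraints such as "(>= 1.0.0)", trim spaces, skip blanks.
    sed -e 's/([^)]*)//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'
}
```

For example, description_field Imports ~/mypackages/github-repo-master/DESCRIPTION prints each required package on its own line, which can then be installed one at a time before the base package.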
Limited Internet Access from Eureka App VM
Eureka App VM v3 can connect to the following URLs from within Eureka via the Eureka Limited Internet App. Google Chrome is the only browser in the Eureka App VM that is optimized for the limited internet access functionality.
The first time you use the Eureka Limited Internet App you will need to run the following command from your Eureka App VM and follow the prompts. It may provide you with a long URL which you should paste into a web browser in Eureka App VM and authenticate using your Eureka credentials.
gcloud auth login
There are two ways to interact with the Eureka Limited Internet App.
Option #1: Open a terminal window and type eureka-internet and then hit the tab key twice. You will then see all the possible choices you can type.
Option #2: Locate the Eureka Limited Internet App in the applications directory and select the URL you wish to connect.
After you select the site from the Eureka Limited Internet App, open a Chrome browser within the Eureka App VM and type in the URL. It may take up to 15 minutes to connect, because layered security enables access to each URL independently. Access to the site is limited to 30 minutes. If you need the connection open for longer, re-select the site from the Eureka Limited Internet App to add another 30 minutes of connection.
https://code.visualstudio.com/
NOTE: Some R packages require access to GitHub at the same time as CRAN, so make sure you select both sites from the Eureka Limited Internet App to ensure complete installation of those packages.
When you are done with your session and no longer need to use the Eureka Limited Internet App, you can log out of gcloud by running the following command from your Eureka App VM:
gcloud auth revoke
Internet Security & Eureka App VM
Security is a group effort between you and Compass. We cannot do it without you. Please be sure to follow all rules in the Eureka User Agreement.
Some common problems with software downloaded from the internet include:
Outdated software with known security vulnerabilities
Software that includes poor programming or security practices
Malicious software such as viruses
You must ensure that you have carefully reviewed software from any source for these problems, but be particularly careful with container hubs (such as Docker Hub) and software from GitHub that is not widely used. Due to the difficulty of determining the trustworthiness of software on container hubs, we discourage their use. You are responsible for vetting software you upload to Eureka.
You must not store confidential information on sites outside Eureka, unless you have received specific permission. You must never store confidential information on GitHub.
Frequently Asked Questions: Limited Internet Access
How does enabling limited internet access work?
Eureka users can enable connections to any of the pre-defined URLs above from their Eureka instance using the Chrome browser installed on the Eureka App VM. After you select a site from the Eureka Whitelist App, access to that site from Eureka persists for 30 minutes, and then the connection automatically terminates. If you need the connection open for longer, reselect the site from the Eureka Whitelist App to reset the timer for another 30 minutes.
I'm in need of access to a URL not on the list above, what can I do?
Please contact Compass with the specific URL and details about why access to this site is needed. Compass will then complete an analysis and, if the site passes, add it to the Eureka Whitelist App and update the list of sites above.
Using Python with Eureka App VM
Compass highly recommends using Python through PyCharm with a personal Python virtual environment. Follow these steps:
From your Eureka App VM, confirm you are logged into gcloud by opening a terminal window and entering: gcloud auth list
If you are not logged in, enter gcloud auth login and follow the prompts provided.
In the search bar, type white and an orange icon will appear. Click on the website that you will use to download packages.
In a Eureka App VM terminal, type pycharm and press Enter. A GUI will appear; accept the PyCharm agreement.
PyCharm is set up to create a virtual environment. It will ask for a name and location for the virtual environment.
Once the virtual environment is created, locate the terminal icon in the bottom-left corner and click on it.
You can now install packages into this virtual environment from the terminal by running pip install package_name
Google Cloud Source Repository
Each Eureka App VM instance has Google Cloud Source Repository set up and enabled for sharing code files between multiple users on a shared Eureka instance.
Note that sensitive data like PHI should never be included in code files. This includes those shared on other code sharing platforms like GitHub.
Idle Shutdown of Eureka App VM
Each Eureka instance is pre-configured to shut down the VM after 30 minutes without detected usage. To temporarily disable the idle shutdown, run the following command from your VM terminal window:
sudo systemctl stop idleshutdown
If you disable the idle shutdown, you are responsible for manually shutting down the VM when you are no longer using it.
The pre-configured idle shutdown is re-enabled whenever the VM is rebooted. Until then, you will need to shut down the VM manually.