Globus
Author: Fu Yin
Update: Dec 18, 2022
Reading: 30 min
Introduction
Globus is a secure, reliable research data management service. It acts as a bridge between a remote endpoint and a local endpoint, and it is a convenient alternative to rsync for moving research data.
What is Globus?
With Globus, subscribers can move, share, publish, and discover data through a single interface. Whether your files live on a supercomputer, a lab cluster, a tape archive, a public cloud, or your laptop, you can manage them from anywhere, using your existing identities, with just a web browser.
Create Account
Create Globus ID
Globus needs to know your identity. You can sign in with your Google or ORCID iD credentials, or with your organization's credentials if your organization has been set up with Globus, as shown in Fig. 25. Otherwise, create a Globus ID and sign in to Globus with it.
PS: if you are a student in Jonathan's group, just email the Rice IT team and ask them to add condo from Jonathan's storage to your collection.

Fig. 25 Globus Login with Google or ORCID iD
Download Globus Connect Personal for Mac
Most users will want to transfer data between a cluster's endpoint and their personal computer's endpoint, so they need to create a new personal endpoint by following the Globus Connect Personal tutorial.

Fig. 26 Globus Personal App
Transfer Data
1. Web App
After logging in to Globus, you will see a web page like Fig. 27, called the Globus Web App. You can add different collections and mark them in BOOKMARKS, then switch to the two-panel layout to show your two endpoints side by side and transfer data between them. After Globus accepts your transfer, you can track its progress in the ACTIVITY section.

Fig. 27 Globus Web App
2. Globus CLI
When you have many files (not just a single file) to transfer, you may want to write a script to do the work. globus-cli provides a command-line tool for transferring data; its source code is open on GitHub. Install it with pip:
pip install globus-cli
Login:
Run the command below. After you log in successfully, it creates the file .globus/cli/storage.db in your home directory, which remembers your login information so you do not have to log in again next time.
globus login

Fig. 28 Globus Login in Terminal

Fig. 29 Globus Login in Terminal with Authorization Code
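Because the login is cached in that file, a script can check up front whether a login has happened on this machine before it starts calling globus-cli. The sketch below is a minimal illustration, assuming the default location ~/.globus/cli/storage.db; `cli_login_cached` is a hypothetical helper, and the file's presence only shows that tokens were saved, not that they are still valid (`globus whoami` remains the authoritative check).

```python
from pathlib import Path
from typing import Optional


def cli_login_cached(home: Optional[Path] = None) -> bool:
    """Return True if globus-cli's token storage file exists.

    Hypothetical helper: assumes the default path ~/.globus/cli/storage.db.
    A present file means tokens were saved, not that they are still valid.
    """
    home = Path.home() if home is None else Path(home)
    return (home / ".globus" / "cli" / "storage.db").is_file()


if __name__ == "__main__":
    if cli_login_cached():
        print("Found saved globus-cli tokens; confirm with `globus whoami`.")
    else:
        print("No saved tokens found; run `globus login` first.")
```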
Endpoint ID:
Approach 1: get the Endpoint ID via globus-cli, taking Fu's M1-max (my personal endpoint name) as an example:
# search for the endpoint by its display name
globus endpoint search "Fu's M1-max"
# copy and paste the desired Endpoint ID from the search results
ep1=f34fbbd4-6708-11ed-8422-xxxxxxxxxx
# now the endpoint can be referred to by the variable $ep1; show its details
globus endpoint show $ep1

Fig. 30 Globus M1-max Endpoint ID
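In a script, copying the ID by hand can be avoided by asking the CLI for JSON output (`globus endpoint search "Fu's M1-max" -F json`) and picking the exact match in Python. This is a sketch under the assumption that the JSON has a top-level `DATA` list whose records carry `id` and `display_name` fields, as in the Transfer API's endpoint search response; `find_endpoint_id` and `search_endpoint_json` are hypothetical helpers.

```python
import json
import subprocess
from typing import Optional


def find_endpoint_id(search_json: str, display_name: str) -> Optional[str]:
    """Return the id of the endpoint whose display_name matches exactly.

    Hypothetical helper: assumes a top-level "DATA" list of endpoint
    records with "id" and "display_name" keys.
    """
    for record in json.loads(search_json).get("DATA", []):
        if record.get("display_name") == display_name:
            return record.get("id")
    return None


def search_endpoint_json(query: str) -> str:
    """Run `globus endpoint search QUERY -F json` and return its stdout."""
    result = subprocess.run(
        ["globus", "endpoint", "search", query, "-F", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage: `find_endpoint_id(search_endpoint_json("Fu's M1-max"), "Fu's M1-max")` returns the ID, or `None` if no endpoint has exactly that display name. Matching the full display name avoids picking up other endpoints whose names merely contain the search text.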
Approach 2: get the Endpoint ID via the Globus Web App. Click the collection to show its details; the UUID shown there is the Endpoint ID:

Fig. 31 Globus M1-max Endpoint ID in Globus Web App
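With either approach, the Endpoint ID is a UUID, so a pasted value can be sanity-checked before it goes into a transfer command. The following small sketch uses Python's standard `uuid` module; `normalize_endpoint_id` is a hypothetical helper, not part of globus-cli.

```python
import uuid


def normalize_endpoint_id(text: str) -> str:
    """Return the canonical lowercase form of an Endpoint ID.

    Hypothetical helper: raises ValueError if the text is not a valid
    UUID, catching truncated or mistyped IDs before `globus transfer`.
    """
    return str(uuid.UUID(text.strip()))


if __name__ == "__main__":
    # Globus Tutorial Endpoint ID used in the examples of this guide
    print(normalize_endpoint_id(" DDB59AEF-6D04-11E5-BA46-22000B92C6EC "))
    # prints ddb59aef-6d04-11e5-ba46-22000b92c6ec
```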
Filesystem Operations
Demonstrates the synchronous commands mkdir, rename, and ls.
# Tutorial Endpoint ID found from 'globus endpoint search Tutorial'
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
# Make a new directory
$ globus mkdir $ep1:~/cli_example_dir
The directory was created successfully
# Rename the directory
$ globus rename $ep1:~/cli_example_dir $ep1:~/cli_example_dir_renamed
File or directory renamed successfully
# Show the directory contents after the changes
# (assuming ~/ was empty before these commands)
$ globus ls $ep1:~/
cli_example_dir_renamed/
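If these filesystem steps are driven from Python rather than an interactive shell, passing an argument list to `subprocess.run` avoids shell quoting altogether, so `~` reaches Globus unchanged and is resolved on the endpoint. Here is a minimal sketch; `globus_fs` is a hypothetical helper, and with `dry_run=True` it only returns the command it would run.

```python
import subprocess
from typing import List

# Globus Tutorial Endpoint ID from the example above
EP1 = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"


def globus_fs(op: str, *paths: str, dry_run: bool = False) -> List[str]:
    """Build (and optionally run) `globus mkdir|rename|ls` commands.

    Hypothetical helper: each path must already be ENDPOINT_ID:PATH.
    """
    if op not in {"mkdir", "rename", "ls"}:
        raise ValueError(f"unsupported operation: {op}")
    cmd = ["globus", op, *paths]
    if not dry_run:
        # No shell is involved, so "~" is passed through unchanged
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: print the commands that mirror the shell session above
for cmd in (
    globus_fs("mkdir", f"{EP1}:~/cli_example_dir", dry_run=True),
    globus_fs("rename", f"{EP1}:~/cli_example_dir",
              f"{EP1}:~/cli_example_dir_renamed", dry_run=True),
    globus_fs("ls", f"{EP1}:~/", dry_run=True),
):
    print(" ".join(cmd))
```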
Single Item Transfers
Submits transfer requests for a file and a directory from one Globus Tutorial Endpoint to another.
# Tutorial Endpoint IDs found from 'globus endpoint search Tutorial'
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
# transfer file1.txt from one endpoint to another
$ globus transfer $ep1:/share/godata/file1.txt $ep2:~/file1.txt --label "CLI single file"
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 466a5962-dda0-11e6-9d11-22000a1e3b52
# recursively transfer the godata folder from one endpoint to another
$ globus transfer $ep1:/share/godata $ep2:~/godata --recursive --label "CLI single folder"
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 47477b62-dda0-11e6-9d11-22000a1e3b52
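`globus transfer` returns as soon as the task is queued, so a script that needs the data afterwards has to wait for the task to finish. One way to do this is sketched below, assuming the output contains the `Task ID: ...` line shown above: extract the ID with a regular expression and hand it to `globus task wait`. `parse_task_id` and `wait_for_task` are hypothetical helpers.

```python
import re
import subprocess

# Matches the "Task ID: <uuid>" line printed by `globus transfer`
TASK_ID_RE = re.compile(r"^Task ID:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def parse_task_id(output: str) -> str:
    """Extract the task ID from `globus transfer` text output (hypothetical helper)."""
    match = TASK_ID_RE.search(output)
    if match is None:
        raise ValueError("no 'Task ID:' line found in output")
    return match.group(1)


def wait_for_task(task_id: str) -> None:
    """Block until the task completes, using `globus task wait`."""
    subprocess.run(["globus", "task", "wait", task_id], check=True)
```

Usage: capture the transfer's stdout with `subprocess.run([...], capture_output=True, text=True, check=True).stdout`, then call `wait_for_task(parse_task_id(stdout))`. Parsing human-readable text is fragile if the message format changes; the CLI's JSON output (`-F json`) is a sturdier source for the ID.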
Batch Transfers
Uses an input.txt file to request multiple files in a single transfer request.
# this is the contents of input.txt:
# a list of source paths followed by destination paths
file1.txt file1.txt
file2.txt file2.txt # inline-comments are also allowed
file3.txt file3.txt
Use --batch on input.txt:
# Tutorial Endpoint IDs found from 'globus endpoint search Tutorial'
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
# pass an input .txt file to `--batch`
# all paths in input.txt are relative to the paths supplied here
$ globus transfer $ep1:/share/godata/ $ep2:~/ --label "CLI Batch" --batch input.txt
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 306900e0-dda1-11e6-9d11-22000a1e3b52
Only One Task?
Note that only one task was needed even though there are multiple files to be transferred.
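A malformed line in the batch file is easier to catch before submission than after. The sketch below assumes each line is split shell-style, so `#` starts a comment as in the example above and paths containing spaces would need quoting, and that each line should yield a source path, a destination path, and optional flags such as `--recursive`. `check_batch_lines` is a hypothetical helper, not a globus-cli feature.

```python
import shlex
from typing import Iterable, List, Tuple


def check_batch_lines(lines: Iterable[str]) -> List[Tuple[str, str, List[str]]]:
    """Parse batch-file lines into (source, destination, flags).

    Hypothetical helper: blank and comment-only lines are skipped; a line
    without exactly two paths raises ValueError with its line number.
    """
    entries = []
    for lineno, line in enumerate(lines, start=1):
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank line or comment
        paths = [t for t in tokens if not t.startswith("--")]
        flags = [t for t in tokens if t.startswith("--")]
        if len(paths) != 2:
            raise ValueError(f"line {lineno}: expected SOURCE DEST, got {tokens}")
        entries.append((paths[0], paths[1], flags))
    return entries
```

Usage: `with open("input.txt") as f: check_batch_lines(f)` returns the parsed entries, or raises on the first bad line before any task is created.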
3. Globus SDK
This SDK provides a convenient Pythonic interface to the Globus web APIs, including the Transfer API and the Globus Auth API (tools for authenticating logins). Documentation for the APIs is available at https://docs.globus.org/api/. Install it first:
pip install globus-sdk
Some example apps using the Python SDK are available at https://github.com/globus/native-app-examples. However, I find the SDK harder to use, so I recommend globus-cli for transferring data.
4. Python Script
Here is an example that calls globus-cli from Python to transfer data (a single file and a directory) between two endpoints. Note that input.txt has two columns: the first is the source path and the second is the destination path; a directory line also carries the --recursive flag.
import subprocess

input_txt = "input.txt"
label = "CLI batch single file and directory"
source_endpoint_id = "xxxx-xxxx..."  # jonathan's condo
dest_endpoint_id = "xxxx-xxxx..."    # nots's scratch

# Each item: (source path, destination path, recursive?)
items = [
    # single file
    ("/hpc-demo/docs/getbatchError.m", "/fy21/hpc-demo/docs/getbatchError.m", False),
    # directory: needs --recursive
    ("/hpc-demo/docs", "/fy21/hpc-demo/docs_new", True),
]

# Write the batch file: one "SOURCE<TAB>DEST[<TAB>--recursive]" line per item
with open(input_txt, "w") as fout:
    fout.write("# [source_file_path]\t[dest_file_path]\n")
    for src, dst, recursive in items:
        line = f"{src}\t{dst}"
        if recursive:
            line += "\t--recursive"  # recursive transfer for this batch line
        fout.write(line + "\n")

# Pass arguments as a list so the label needs no shell quoting
command = ["globus", "transfer", source_endpoint_id, dest_endpoint_id,
           "--label", label, "--batch", input_txt]
subprocess.run(command, check=True)
The resulting input.txt file looks like this:
# [source_file_path] [dest_file_path]
/hpc-demo/docs/getbatchError.m /fy21/hpc-demo/docs/getbatchError.m
/hpc-demo/docs /fy21/hpc-demo/docs_new --recursive