Idtoken Authentication

Why idtoken authentication is needed for remote commands launched by users

Idtoken authentication has been implemented on the HTCondor pool to:

  • allow users to remotely authenticate to certain HTCondor pool services,
  • and thus allow these users to execute commands that contact machines in the HTCondor pool other than the one they are logged into (e.g., a "lappui" machine).

Access to certain commands, and the underlying interactions with certain HTCondor services, were indeed impossible with the only authentication method used until then: the "FS" method, which only allows communication with the HTCondor daemons running locally on the machine the user is connected to.

For example, until now, if you had submitted jobs from several different Access Points (APs) (e.g., lappui9a and lappui9b), it was impossible to view the list of all these jobs from one of the APs using the condor_q command:

  • launched without parameters, this command only returns the jobs submitted from the local machine you are connected to (lappui9a, for example);
  • and launched with the -global option (condor_q -global), it could not succeed, because the processes generating its output query HTCondor services residing on machines in the pool other than the one from which the command was launched. This explains the error shown below:
$ condor_q -global
Error: Couldn't contact the condor_collector on
cm01.tests.local.fr?sock=collector, cm02.tests.local.fr?sock=collector.

Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.

If you are the system administrator, check that the condor_collector is
running on cm01.tests.local.fr?sock=collector,
cm02.tests.local.fr?sock=collector, check the ALLOW/DENY configuration in
your condor_config, and check the MasterLog and CollectorLog files in your
log directory for possible clues as to why the condor_collector is not
responding. Also see the Troubleshooting section of the manual.

Since the introduction of idtokens, this command returns information on all the jobs of the user who launches it, even those submitted from a machine other than the one on which the command is executed.

$ condor_q -global

-- Schedd: ce04.tests.local.fr : <192.168.96.172:9618?... @ 11/12/25 10:20:31
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for john: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 1904 jobs; 0 completed, 0 removed, 316 idle, 1588 running, 0 held, 0 suspended


-- Schedd: ce06.tests.local.fr : <192.168.96.174:9618?... @ 11/12/25 10:20:31
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for john: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 1837 jobs; 0 completed, 0 removed, 290 idle, 1547 running, 0 held, 0 suspended


-- Schedd: lapthui9d.tests.local.fr : <192.168.10.188:9618?... @ 11/12/25 10:20:31
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for john: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended


-- Schedd: lappui9a.tests.local.fr : <192.168.10.162:9618?... @ 11/12/25 10:20:31
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
john ID: 13906   11/12 10:16      _      _      1      _      1 13906.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for john: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended


-- Schedd: lappusmb9a.tests.local.fr : <192.168.96.231:9618?... @ 11/12/25 10:20:31
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for john: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 7 jobs; 0 completed, 0 removed, 6 idle, 1 running, 0 held, 0 suspended


-- Schedd: lappusmb9c.tests.local.fr : <192.168.96.228:9618?... @ 11/12/25 10:20:31
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for john: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 4 jobs; 0 completed, 0 removed, 3 idle, 1 running, 0 held, 0 suspended

Using idtokens

Generating an idtoken

Having seen the usefulness of "idtoken" authentication, let's see how to create an idtoken. From a "lappui" server, an authenticated user can create an idtoken on their own with the following command:

$ condor_token_fetch -token <tokenname>

This command will create:

  • a file named after the parameter passed to the command (here 'tokenname'),
  • and automatically place it in the user's ~/.condor/tokens.d/ directory.

The identity contained in the token is identical to that of the user connected to the machine where the token creation command was launched.
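
For example, to create a token named 'idtokentest' (the name is arbitrary) and check that the file has indeed been created, you could run something like:

$ condor_token_fetch -token idtokentest
$ ls ~/.condor/tokens.d/
idtokentest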

The user can view the list of tokens in their possession with the command "condor_token_list". The output displays the metadata associated with the token (notably: identity and file location), as shown below.

$ condor_token_list
Header: {"alg":"HS256","kid":"POOL"} Payload: {"iat":1758549431,"iss":"tests.local.fr","jti":"22bd332c9b38f7541a84b56735f93a4e","sub":"john@tests.local.fr"} File: /home1/john/.condor/tokens.d/idtoken_maq
Header: {"alg":"HS256","kid":"POOL"} Payload: {"iat":1758548684,"iss":"tests.local.fr","jti":"b3ade2dd0f1607109d110c8c38df47fb","sub":"john@tests.local.fr"} File: /home1/john/.condor/tokens.d/idtokentest
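
Note that the iat ("issued at") field in the payload is a Unix timestamp; if you want to know when a token was issued, you can convert it with the date command (GNU coreutils), for example:

$ date -d @1758549431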

Commands made possible by idtokens

Get the list of submitted jobs, regardless of the machine they were submitted from

In the output of the command below (condor_q -global), launched on machine 'ui02', a job launched on machine 'ui03' appears in the list of jobs.

$ condor_q -global


-- Schedd: ui03.tests.local.fr : <192.168.136.3:9618?... @ 06/30/25 11:40:43
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
john ID: 141579        6/30 11:40      _      1      _      _      1 1.0

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for john: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended

Detailed information about a job

To get detailed information about a particular job (below, the one with id '41579'), you can use the condor_q command with the -better-analyze option, which makes it easy to find out why a job is not being matched to a compute node, for example, or why it fails.

To get this information even if the job was launched from a machine other than the one where you run the command, you must add the -global option.

$ condor_q -global -better-analyze 41579
(...)
-- Schedd: lappui9f.tests.local.fr : <192.168.10.175:9618?...
The Requirements expression for job 41579.000 is

    ((regexp(".*k80.*",TARGET.GPU_MODEL,"i")) ||
      (regexp(".*p6000.*",TARGET.GPU_MODEL,"i")) ||
      (regexp(".*1g.5gb.*",TARGET.GPU_MODEL,"i"))) && GpuUsageRequirements

Job 41579.000 defines the following attributes:

    FileSystemDomain = "condor.must.tests.local.fr"
    GpuUsageRequirements = (Machine =!= LastRemoteHost) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.GPUs >= RequestGPUs) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
    RequestCpus = 8
    RequestDisk = 102400 (kb)
    RequestGPUs = 1
    RequestMemory = 8192 (mb)

The Requirements expression for job 41579.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[1]          14  regexp(".*p6000.*",TARGET.GPU_MODEL,"i")
[5]           3  GpuUsageRequirements
[6]           0  [1] && [5]

No successful match recorded.
Last failed match: Fri Sep 19 16:50:31 2025

Reason for last match failure: no match found

41579.000:  Run analysis summary ignoring user priority.  Of 206 machines,
    206 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are able to run your job

Get information about the status of the HTCondor pool

With the condor_status command, it is now possible to get information on the status of the pool: the names of the GPU machines, the types of GPUs, occupancy rates, the characteristics of a given machine, and so on.

For example, to list the servers equipped with graphics cards (GPUs), the type of each card, and the details of their availability, you can use the following command:

$ condor_status -gpu
Name                           ST User            GPUs GPU-Memory GPU-Name

slot1@wngpu004.local.fr   Ui _                      1   15.8 GB    Tesla V100-PCIE-16GB
slot1@wngpu005.local.fr   Ui _                      1   23.9 GB    Quadro P6000
slot1@wngpu006.local.fr   Ui _                      4   14.6 GB    Tesla T4
slot1@wngpu007.local.fr   Ui _                      2   39.5 GB    NVIDIA A100-PCIE-40GB
slot1_2@wngpu007.local.fr Cb lafloud@local.fr       1   39.5 GB    NVIDIA A100-PCIE-40GB
slot1@wngpu008.local.fr   Ui _                      0   39.5 GB    NVIDIA A100-PCIE-40GB
slot1_1@wngpu008.local.fr Cb thiboup@local.fr       1   39.5 GB    NVIDIA A100-PCIE-40GB
slot1_2@wngpu008.local.fr Cb lafloud@local.fr       1   39.5 GB    NVIDIA A100-PCIE-40GB
slot1_3@wngpu008.local.fr Cb lafloud@local.fr       1   39.5 GB    NVIDIA A100-PCIE-40GB
slot1@wngpu009.local.fr   Ui _                      0   39.5 GB    NVIDIA A100-PCIE-40GB
slot1_1@wngpu009.local.fr Cb versoil@local.fr       1   39.5 GB    NVIDIA A100-PCIE-40GB
slot1@wngpu010.local.fr   Ui _                      3   79.3 GB    NVIDIA A100 80GB PCIe
slot1@wngpu011.local.fr   Ui _                      2   79.3 GB    NVIDIA A100 80GB PCIe
slot1_1@wngpu011.local.fr Cb aliseapo@local.fr      1   79.3 GB    NVIDIA A100 80GB PCIe
slot1@wngpu012.local.fr   Ui _                      1   79.3 GB    NVIDIA A100 80GB PCIe

In this output, we can see, for example, that the 3 GPU slots of the 'wngpu008' machine are occupied by 2 different users and that none remain available, as indicated by the 'slot1@wngpu008.local.fr' line showing 0 GPUs. Conversely, all 4 GPUs of the 'wngpu006' machine are available, as indicated by the 'slot1@wngpu006' line and the absence of any 'slot1_N@wngpu006' line.
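
Building on this, if you only want to see the slots that still have at least one free GPU, you can filter with a constraint. The sketch below assumes the GPUs and GPU_MODEL attribute names seen elsewhere on this pool (they appear in the requirements expression above), and uses the standard PartitionableSlot attribute to keep only the parent slots:

$ condor_status -constraint 'PartitionableSlot && GPUs >= 1' -af Name GPUs GPU_MODEL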

To get detailed information on a given machine:

$ condor_status -l wn02.tests.must-dcc.fr
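
If you only need a few attributes rather than the full ClassAd, the -autoformat (-af) option can project them; a minimal sketch, assuming the standard machine attributes Cpus and Memory:

$ condor_status -af Name Cpus Memory wn02.tests.must-dcc.fr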

To get an overview of the scheduler's occupancy status:

$ condor_status -sched
Name                Machine             RunningJobs   IdleJobs   HeldJobs

lapce24.local.fr    lapce04.local.fr         2002        243          0
lapce25.local.fr    lapce05.local.fr          706         20          0
lapui9a.local.fr    lapui9a.local.fr            2          0          0
lapui9b.local.fr    lapui9b.local.fr            0          0          0
lappusmb9a.local.fr lappusmb9a.local.fr         1          5          0
lappusmb9b.local.fr lappusmb9b.local.fr         0          0          0
lapthui9c.local.fr  lapthui9c.local.fr          0          0          0
lapthui9d.local.fr  lapthui9d.local.fr          3          0          0

Security

An idtoken thus allows you to authenticate to the HTCondor pool and execute commands under your identity: as seen above, the information contained in the idtoken refers to your identity. Possessing this idtoken is therefore equivalent to knowing your password, so it is very important to secure access to it, to prevent someone from getting hold of it and executing commands as you.

By default, when an idtoken is created, access to it is secured: it is stored in your personal directory with restrictive permissions, since:

  • you are the owner,
  • and you are the only one who can read it,

as seen below:

$ ls -al ~/.condor/tokens.d/
total 16
-rw------- 1 john calcul  244 Sep 22 16:18 idtokentest

Once an idtoken has been created, the simplest approach is to leave it where it is and keep these default permissions, so that no one else can access it.
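
If these permissions have been loosened by mistake, you can restore them with chmod (using the token file from the listing above as an example):

$ chmod 600 ~/.condor/tokens.d/idtokentest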

However, to further minimize security risks, we have decided to regularly regenerate the key used to sign these tokens. This operation invalidates the idtokens signed with the previous key, leading to authentication failures when launching commands that require a valid idtoken. To avoid confusion over the error messages that may then appear, we invite you to delete your idtokens at the end of each session and generate a new one at the beginning of the next: this way, you will always have a valid idtoken.
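
In practice, a session could therefore look like this (a sketch; 'idtokentest' is just an example token name):

$ condor_token_fetch -token idtokentest   # beginning of session: create a fresh token
(... condor_q, condor_status, job submissions ...)
$ rm ~/.condor/tokens.d/idtokentest       # end of session: delete the token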