32

If I have a script that I need to run against multiple computers, or with multiple different arguments, how can I execute it in parallel, without having to incur the overhead of spawning a new PSJob with Start-Job?

As an example, I want to re-sync the time on all domain members, like so:

$computers = Get-ADComputer -filter * |Select-Object -ExpandProperty dnsHostName
$creds = Get-Credential domain\user
foreach($computer in $computers)
{
    $session = New-PSSession -ComputerName $computer -Credential $creds
    Invoke-Command -Session $session -ScriptBlock { w32tm /resync /nowait /rediscover }
}

But I don't want to wait for each PSSession to connect and invoke the command. How can this be done in parallel, without Jobs?

Mathias R. Jessen
  • 24,907
  • 4
  • 62
  • 95

4 Answers4

58

Update - While this answer explains the process and mechanics of PowerShell runspaces and how they can help you multi-thread non-sequential workloads, fellow PowerShell aficionado Warren 'Cookie Monster' F has gone the extra mile and incorporated these same concepts into a single tool called Invoke-Parallel - it does what I describe below, and he has since expanded it with optional switches for logging and prepared session state including imported modules, really cool stuff - I strongly recommend you check it out before building you own shiny solution!


With Parallel Runspace execution:

Reducing inescapable waiting time

In the original specific case, the executable invoked has a /nowait option which prevents blocking the invoking thread while the job (in this case, time re-synchronization) finishes on its own.

This greatly reduces the overall execution time from the issuers perspective, but connecting to each machine is still done in sequential order. Connecting to thousands of clients in sequence may take a long time depending on the number of machines that are for one reason or another inaccessible, due to an accumulation of timeout waits.

To get around having to queue up all subsequent connections in case of a single or a few consecutive timeouts, we can dispatch the job of connecting and invoking commands to separate PowerShell Runspaces, executing in parallel.

What is a Runspace?

A Runspace is the virtual container in which your powershell code executes, and represents/holds the Environment from the perspective of a PowerShell statement/command.

In broad terms, 1 Runspace = 1 thread of execution, so all we need to "multi-thread" our PowerShell script is a collection of Runspaces that can then in turn execute in parallel.

Like the original problem, the job of invoking commands multiple runspaces can be broken down into:

  1. Creating a RunspacePool
  2. Assigning a PowerShell script or an equivalent piece of executable code to the RunspacePool
  3. Invoke the code asynchronously (ie. not having to wait for the code to return)

RunspacePool template

PowerShell has a type accelerator called [RunspaceFactory] that will assist us in the creation of runspace components - let's put it to work

1. Create a RunspacePool and Open() it:

$RunspacePool = [runspacefactory]::CreateRunspacePool(1,8)
$RunspacePool.Open()

The two arguments passed to CreateRunspacePool(), 1 and 8 is the minimum and maximum number of runspaces allowed to execute at any given time, giving us an effective maximum degree of parallelism of 8.

2. Create an instance of PowerShell, attach some executable code to it and assign it to our RunspacePool:

An instance of PowerShell is not the same as the powershell.exe process (which is really a Host application), but an internal runtime object representing the PowerShell code to execute. We can use the [powershell] type accelerator to create a new PowerShell instance within PowerShell:

$Code = {
    param($Credentials,$ComputerName)
    $session = New-PSSession -ComputerName $ComputerName -Credential $Credentials
    Invoke-Command -Session $session -ScriptBlock {w32tm /resync /nowait /rediscover}
}
$PSinstance = [powershell]::Create().AddScript($Code).AddArgument($creds).AddArgument("computer1.domain.tld")
$PSinstance.RunspacePool = $RunspacePool

3. Invoke the PowerShell instance asynchronously using APM:

Using what is known in .NET development terminology as the Asynchronous Programming Model, we can split the invocation of a command into a Begin method, for giving a "green light" to execute the code, and an End method to collect the results. Since we in this case are not really interested in any feedback (we don't wait for the output from w32tm anyways), we can make due by simply calling the first method

$PSinstance.BeginInvoke()

Wrapping it up in a RunspacePool

Using the above technique, we can wrap the sequential iterations of creating new connections and invoking the remote command in a parallel execution flow:

$ComputerNames = Get-ADComputer -filter * -Properties dnsHostName |select -Expand dnsHostName

$Code = {
    param($Credentials,$ComputerName)
    $session = New-PSSession -ComputerName $ComputerName -Credential $Credentials
    Invoke-Command -Session $session -ScriptBlock {w32tm /resync /nowait /rediscover}
}

$creds = Get-Credential domain\user

$rsPool = [runspacefactory]::CreateRunspacePool(1,8)
$rsPool.Open()

foreach($ComputerName in $ComputerNames)
{
    $PSinstance = [powershell]::Create().AddScript($Code).AddArgument($creds).AddArgument($ComputerName)
    $PSinstance.RunspacePool = $rsPool
    $PSinstance.BeginInvoke()
}

Assuming that the CPU has the capacity to execute all 8 runspaces at once, we should be able to see that the execution time is greatly reduced, but at the cost of readability of the script due to the rather "advanced" methods used.


Determining the optimum degree of parallism:

We could easily create a RunspacePool that allows for the execution of a 100 runspaces at the same time:

[runspacefactory]::CreateRunspacePool(1,100)

But at the end of the day, it all comes down to how many units of execution our local CPU can handle. In other words, as long as your code is executing, it does not make sense to allow more runspaces than you have logical processors to dispatch execution of code to.

Thanks to WMI, this threshold is fairly easy to determine:

$NumberOfLogicalProcessor = (Get-WmiObject Win32_Processor).NumberOfLogicalProcessors
[runspacefactory]::CreateRunspacePool(1,$NumberOfLogicalProcessors)

If, on the other hand, the code you are executing itself incurs a lot of wait time due to external factors like network latency, you can still benefit from running more simultanous runspaces than you have logical processors, so you'd probably want to test of range possible maximum runspaces to find break-even:

foreach($n in ($NumberOfLogicalProcessors..($NumberOfLogicalProcessors*3)))
{
    Write-Host "$n: " -NoNewLine
    (Measure-Command {
        $Computers = Get-ADComputer -filter * -Properties dnsHostName |select -Expand dnsHostName -First 100
        ...
        [runspacefactory]::CreateRunspacePool(1,$n)
        ...
    }).TotalSeconds
}
mklement
  • 514
  • 4
  • 11
Mathias R. Jessen
  • 24,907
  • 4
  • 62
  • 95
5

Adding to this discussion, what's missing is a collector to store the data that is created from the runspace, and a variable to check the status of the runspace, i.e. is it completed or not.

#Add an collector object that will store the data
$Object = New-Object 'System.Management.Automation.PSDataCollection[psobject]'

#Create a variable to check the status
$Handle = $PSinstance.BeginInvoke($Object,$Object)

#So if you want to check the status simply type:
$Handle

#If you want to see the data collected, type:
$Object
Mathias R. Jessen
  • 24,907
  • 4
  • 62
  • 95
Nate Stone
  • 49
  • 1
  • 3
3

Check out PoshRSJob. It provides same/similar functions as the native *-Job functions, but uses Runspaces which tend to be much quicker and more responsive than the standard Powershell jobs.

Rosco
  • 31
  • 1
3

@mathias-r-jessen has a great answer though there are details I'd like to add.

Max Threads

In theory threads should be limited by the number of system processors. However, while testing AsyncTcpScan I achieved far better performance by choosing a much larger value for MaxThreads. Thus why that module has a -MaxThreads input parameter. Keep in mind that allocating too many threads will hinder performance.

Returning Data

Getting data back from the ScriptBlock is tricky. I've updated the OP code and integrated it into what was used for AsyncTcpScan.

WARNING: I wasn't able to test the following code. I made some changes to the OP script based on my experience working with the Active Directory cmdlets.

# Script to run in each thread.
[System.Management.Automation.ScriptBlock]$ScriptBlock = {

    $result = New-Object PSObject -Property @{ 'Computer' = $args[0];
                                               'Success'  = $false; }

    try {
            $session = New-PSSession -ComputerName $args[0] -Credential $args[1]
            Invoke-Command -Session $session -ScriptBlock { w32tm /resync /nowait /rediscover }
            Disconnect-PSSession -Session $session
            $result.Success = $true
    } catch {

    }

    return $result

} # End Scriptblock

function Invoke-AsyncJob
{
    [CmdletBinding()]
    param(
        [parameter(Mandatory=$true)]
        [System.Management.Automation.PSCredential]
        # Credential object to login to remote systems
        $Credentials
    )

    Import-Module ActiveDirectory

    $Results = @()

    $AllJobs = New-Object System.Collections.ArrayList

    $AllDomainComputers = Get-ADComputer -Filter * -Properties dnsHostName

    $HostRunspacePool = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspacePool(2,10,$Host)

    $HostRunspacePool.Open()

    foreach($DomainComputer in $AllDomainComputers)
    {
        $asyncJob = [System.Management.Automation.PowerShell]::Create().AddScript($ScriptBlock).AddParameters($($($DomainComputer.dnsName),$Credentials))

        $asyncJob.RunspacePool = $HostRunspacePool

        $asyncJobObj = @{ JobHandle   = $asyncJob;
                          AsyncHandle = $asyncJob.BeginInvoke()    }

        $AllJobs.Add($asyncJobObj) | Out-Null
    }

    $ProcessingJobs = $true

    Do {

        $CompletedJobs = $AllJobs | Where-Object { $_.AsyncHandle.IsCompleted }

        if($null -ne $CompletedJobs)
        {
            foreach($job in $CompletedJobs)
            {
                $result = $job.JobHandle.EndInvoke($job.AsyncHandle)

                if($null -ne $result)
                {
                    $Results += $result
                }

                $job.JobHandle.Dispose()

                $AllJobs.Remove($job)
            } 

        } else {

            if($AllJobs.Count -eq 0)
            {
                $ProcessingJobs = $false

            } else {

                Start-Sleep -Milliseconds 500
            }
        }

    } While ($ProcessingJobs)

    $HostRunspacePool.Close()
    $HostRunspacePool.Dispose()

    return $Results

} # End function Invoke-AsyncJob
phbits
  • 206
  • 1
  • 8