Scaling sipXacd to hundreds of agents

The problem

I want to scale sipXacd to 300 talking agents!

But the sipXecs ACD is not designed to handle anything beyond 30 people:

  • built-in limit for 50 agents
  • built-in limit for 30 connections to a single server (agents + queued)
  • built-in limit for 12 queues
  • random behaviour with login/logout
  • unstable reporting under heavy traffic (in-memory event handling eats your RAM)
  • some features present in the GUI but not implemented (e.g. queue priority)
  • new config activation restarts the whole ACD
    • terminates all current calls
    • loses agent presence info until agents do a full logout/login
  • ACD is a B2BUA with media anchoring
    • G.711 only
    • starves your CPU (see limits)
    • difficult to follow the original SIP Call-ID
    • no way to transparently forward customer data in SIP headers
  • No HA clustering

The solution (1st attempt)

Solved issues:

  • removed the sipXconfig limit of 50 agents
  • hacked the sipXacd C++ code to lift the 30-call limit
  • hacked the sipXportlib C++ code to raise the internal message queue size to 1000 for all processes
  • hacked the sipXacd agent scheduling with my patch

Remaining issues:

  • Activation causes restarting
  • Media anchoring
  • Unstable sipXpresence
  • Agents half logged out after activation/restart of the ACD
  • Only partial clustering using additional native Windows sipXcti for Agents
    • redundant sipXpresence login
    • SIP switchover to the hot-standby sipXacd with same configuration

End result

  • stability issues with testbeds (not performance related)
    • every second call has no audio when the testbed is above 200 connected calls to sipXacd
    • Testbed
      • 1000 logged-in users, 200 of them being agents
      • 11 queues with different priorities
      • inbound call generator with calls lasting between 60 and 120 seconds, 90 seconds on average
      • self-adjusting inbound flow to keep 30 calls queued at all times
      • 15-second wrap-up time, no overflows, no media while queued
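
The self-adjusting inbound flow of the testbed can be sketched roughly as below. This is only an illustration in Python, not the actual call generator: the function name, the one-second tick and the uniform 60 to 120 second talk time (which averages 90 seconds) are assumptions, and no SIP traffic is produced.

```python
import random

def simulate(agents=200, target_queued=30, wrap_up=15, seconds=600, seed=1):
    """Toy model of a generator that keeps `target_queued` calls waiting."""
    rng = random.Random(seed)
    busy_until = [0.0] * agents   # time at which each agent is free again
    queued = 0                    # calls currently waiting in the queue
    offered = 0                   # total calls injected by the generator
    history = []                  # queue depth at the end of every tick

    for now in range(seconds):
        # Hand waiting calls to every agent that is free at this tick.
        for i in range(agents):
            if queued and busy_until[i] <= now:
                talk = rng.uniform(60, 120)          # assumed distribution
                busy_until[i] = now + talk + wrap_up
                queued -= 1

        # Feedback step: inject just enough calls to refill the queue.
        new_calls = max(0, target_queued - queued)
        queued += new_calls
        offered += new_calls
        history.append(queued)

    return offered, history

offered, history = simulate()
```

The point of the feedback step is that the offered load follows the speed at which agents drain the queue, so the ACD is always tested with a full queue regardless of how many agents are free.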

The solution (2nd attempt)

I was able to convince some people at Pingtel to switch over to FreeSwitch for media handling.
sipXecs 4.0 switches media to FS for IVR and Conferencing, leaving Paging, Bridge, Presence, ACD and Voicemail to other dedicated components.

How about moving ACD to FreeSwitch as well?

FS has built-in FIFO queues, but they are not compatible with the way sipXacd works.
So let's just use FS as a softswitch driven through the Event Socket Library, the same as with sipXivr, but do it the clustered way so it can handle 5000 agents in a multi-site / HA manner.
Let it:

  • remove all shortcomings of sipXacd
  • record calls
  • do intelligent inbound routing (ODBC queries, customer info/authentication)
  • publish presence to IM platforms (e.g. OpenFire)
  • allow real-time statistics for the cluster
  • do call (customer) based priority
  • scale by adding hardware, with no need for any agent/queue partitioning
  • be Plug&Play by replacing the sipXreport, sipXacd and sipXpresence RPMs
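
As an illustration of call (customer) based priority, a waiting list ordered by a per-call priority could look like the sketch below. It is a minimal Python example under stated assumptions: the class name is hypothetical, a lower number means "served first", and calls with equal priority are served in arrival order. It says nothing about how the real ACD engine will be written.

```python
import heapq
import itertools

class PriorityCallQueue:
    """Hypothetical waiting list ordered by per-call priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker: arrival order

    def push(self, call_id, priority):
        # Tuples compare element by element, so priority wins first and
        # the arrival counter keeps equal-priority calls in FIFO order.
        heapq.heappush(self._heap, (priority, next(self._arrival), call_id))

    def pop(self):
        # Return the call that should be offered to the next free agent.
        _priority, _arrival, call_id = heapq.heappop(self._heap)
        return call_id

    def __len__(self):
        return len(self._heap)

queue = PriorityCallQueue()
queue.push("call-a", priority=5)
queue.push("call-b", priority=1)   # e.g. a premium customer
queue.push("call-c", priority=5)
next_call = queue.pop()            # the priority 1 call comes out first
```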

Solved issues

  • Found a sponsor
  • Designed a state machine
  • Sketched a simple drawing
  • Found some people willing to write the JTAPI, ACD and CallState engines

Remaining issues

  • We have several FS roles here: inbound, ACD and softphone.
    • have to decide which one is the transfer point
    • agree on SIP header semantics for marking record, transfer and border points (FS instances)
    • how should JTAPI reach the other FS roles beyond softphone: ESL, SIP MESSAGE, XML-RPC?
  • agree whether we need persistent storage in ACD nodes (hope to avoid that)
  • choose MemCacheDB or Tokyo Tyrant for Presence clustering (sync vs async replication)
  • design ACID agent reservation based on hash DBs like MemC (blah)
  • design storing lists in hash DBs like MemC (yuck!)
    • memcachedb supports rget, which would be cool if supported by PHP/Java clients
  • dispatch all this work to people.
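
One possible way to approach agent reservation on a hash DB is to rely on an atomic "store only if the key does not exist" operation, which the memcached protocol offers as the add command. The sketch below is an assumption-laden illustration in Python, not a design decision: FakeHashStore is a hypothetical in-memory stand-in for the real store, the key naming is made up, and this only gives exclusive reservation, not full ACID behaviour.

```python
import threading

class FakeHashStore:
    """Hypothetical in-memory stand-in for a memcached-like hash DB."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        # Mimics the "add" semantics: succeed only if the key is absent.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def delete(self, key):
        with self._lock:
            return self._data.pop(key, None) is not None

def reserve_agent(store, agent_ids, call_id):
    """Try agents in order; the first successful add wins the agent."""
    for agent_id in agent_ids:
        if store.add("reserved:" + agent_id, call_id):
            return agent_id
    return None   # nobody free, the call stays queued

def release_agent(store, agent_id, call_id):
    """Release a reservation, but only if this call holds it.

    Note: get followed by delete is not atomic on a real store, so a
    production design would need a compare-and-swap style operation.
    """
    key = "reserved:" + agent_id
    if store.get(key) == call_id:
        return store.delete(key)
    return False

store = FakeHashStore()
agent = reserve_agent(store, ["agent-1", "agent-2"], "call-1")
```

Because the reservation is a single atomic write, two ACD nodes racing for the same agent cannot both win, which is the property needed to avoid any agent/queue partitioning.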