Transcript for Section 4 - Intro to SIP Servlet Programming

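The SIP Servlet specification defines an application programming interface for writing server-side components (in Java) that can converse in SIP. In other words, it’s a component that can accept, send, inspect, create, and modify SIP messages. With those capabilities, the component can act as a UAC (user agent client), a UAS (user agent server), or a proxy. Its design is quite similar to the HTTP servlet that’s been widely used for implementing web applications. In fact, the specification was derived from the HTTP servlet, as is apparent from the definition of the class SipServlet in the SIP Servlet spec: it extends javax.servlet.GenericServlet, the same base class that HttpServlet extends.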

Just like an HTTP servlet, a SIP Servlet executes inside a container: the SIP Servlet container. This container takes care of various lower-level tasks, for instance keeping a socket open on the container host for receiving SIP messages. Another example, and this one is specific to SIP Servlet, is managing the lifecycle of SIP transactions.

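You know, according to the protocol specification (of SIP), a request such as INVITE or BYE must be answered with a final response, such as a 200 OK. A request, its final response, and all the provisional responses exchanged in between (for example 100 Trying or 180 Ringing) make up a single SIP transaction; an INVITE and its OK, for example, belong to the same transaction. (Strictly speaking, the ACK that confirms a 200 OK to an INVITE is a transaction of its own.)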

In the transport layer, things can go wrong (of course): messages may not get delivered, or not delivered in time, and that may lead to some ugly consequences. To compensate for those circumstances, the designers of SIP came up with a message retransmission mechanism. If you read the SIP protocol specification, you’ll understand it’s quite a difficult task to get it right (on our own). Thankfully, the SIP container takes care of it, so it doesn’t distract us.
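To get a feel for what the container handles for us here, the following Java sketch computes when an INVITE sent over UDP would be retransmitted, following the client-transaction timers described in RFC 3261: Timer A starts at T1 = 500 ms and doubles after each retransmission, and Timer B gives up after 64·T1 = 32 s. The class name InviteRetransmission is made up for illustration; this is a model of the timer arithmetic only, not code from the SIP Servlet API or from any container.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the INVITE client-transaction timers of RFC 3261 (section 17.1.1.2)
 * over UDP: Timer A starts at T1 and doubles after each retransmission;
 * Timer B ends the transaction after 64*T1.
 */
class InviteRetransmission {
    static final long T1_MS = 500;               // default round-trip-time estimate
    static final long TIMER_B_MS = 64 * T1_MS;   // 32 000 ms

    /** Times (ms after the original send) at which the INVITE is sent again. */
    static List<Long> retransmitTimes() {
        List<Long> times = new ArrayList<>();
        long interval = T1_MS;   // current Timer A value
        long t = interval;       // Timer A first fires T1 after the original send
        while (t < TIMER_B_MS) {
            times.add(t);
            interval *= 2;       // INVITE retransmission intervals keep doubling
            t += interval;
        }
        return times;
    }

    public static void main(String[] args) {
        System.out.println(retransmitTimes()); // [500, 1500, 3500, 7500, 15500, 31500]
    }
}
```

So, if no response ever comes back, the caller’s phone sends the same INVITE seven times in total (the original plus six retransmissions) before giving up.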

Please allow me to take you through a little detour. You just picked up a new term, SIP transaction. Now I’d like to introduce you to another important term, SIP dialog. Simply put, a SIP dialog represents a call-leg. An INVITE and the corresponding success response (e.g. 200 OK) mark the beginning of a dialog. On the other hand, a BYE message marks the end of it.
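How does a SIP stack tell which dialog a message belongs to? RFC 3261 identifies a dialog by three values: the Call-ID, the local tag, and the remote tag, where the tags travel in the From and To headers. The sketch below models just that identifier; the class DialogId and its factory methods are hypothetical names, and this is only roughly the bookkeeping a container does for us when it ties a message to a SipSession.

```java
import java.util.Objects;

/**
 * Hypothetical model of a SIP dialog identifier (RFC 3261, section 12):
 * Call-ID + local tag + remote tag.
 */
final class DialogId {
    private final String callId;
    private final String localTag;
    private final String remoteTag;

    private DialogId(String callId, String localTag, String remoteTag) {
        this.callId = callId;
        this.localTag = localTag;
        this.remoteTag = remoteTag;
    }

    /** We sent the request: our tag is in From, the peer's tag is in To. */
    static DialogId forSender(String callId, String fromTag, String toTag) {
        return new DialogId(callId, fromTag, toTag);
    }

    /** We received the request: our tag is in To, the peer's tag is in From. */
    static DialogId forReceiver(String callId, String fromTag, String toTag) {
        return new DialogId(callId, toTag, fromTag);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DialogId)) return false;
        DialogId other = (DialogId) o;
        return callId.equals(other.callId)
                && localTag.equals(other.localTag)
                && remoteTag.equals(other.remoteTag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(callId, localTag, remoteTag);
    }

    public static void main(String[] args) {
        // Caller's view after INVITE / 200 OK: From tag "a" (caller), To tag "b" (callee)
        DialogId atCaller = forSender("call-1", "a", "b");
        // Later the callee hangs up: its BYE carries From tag "b" and To tag "a"
        DialogId byeAtCaller = forReceiver("call-1", "b", "a");
        System.out.println(atCaller.equals(byeAtCaller)); // true: same dialog, same call-leg
    }
}
```

Two answers to the same INVITE with different To tags (for example a forked INVITE picked up on two phones) would produce two different DialogId values, that is, two call-legs.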

So that was the SIP dialog, or call-leg. What about the “call” itself? Well, in case the user agents are directly interconnected, we wouldn’t really be able to distinguish a call from a call-leg, because the call-leg is the call itself. The distinction is more visible in cases where the call is being brokered by a B2B user agent (B2BUA). Let’s have a look at the following diagram:

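This line stretching from party 1 to party 2 represents what people would normally refer to as a “call”, while the line connecting party 1 and the B2BUA is a call-leg, and so is the line connecting party 2 and the B2BUA. Throughout the lifetime of a call-leg, there can be several transactions, as shown in the following diagram: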


All right, that was the little detour, but please keep the definitions of those terms in mind. I can’t stress enough the importance of using the correct terminology to express your domain.

Let’s go back to SIP Servlet. Now that most lower-level details are handled by the SIP Servlet container, what is left to us as servlet programmers is writing code that reacts to incoming SIP messages, be it a request or a response. It’s a relatively simple task: yet another kind of event-driven component programming.

It’s time for us to learn where to handle those events in our code. So, let’s implement our first SIP Servlet. This servlet is going to answer every call request it receives. It’s not very useful, but it’s as simple as we can get.

[Open NetBeans]

Open up NetBeans and create a project of type Web Application. [Click-click]. Now let’s add the additional library required to compile SIP Servlet code. The jar comes with the installation of BEA WLSS (WebLogic SIP Server) [show it, click, click].

Now, create a class that extends SipServlet [click-click]. Let’s examine what we have in the SipServlet class by reading the API doc.

[Scroll to service(…)]. The method service(…) is the entry point to the SIP Servlet; that is to say, the container passes incoming messages to the SIP Servlet through this method. We normally wouldn’t want to override this method, whose default behavior is to determine whether the message is a request or a response, and then pass it on to doRequest(…) or doResponse(…) accordingly.

Then either doRequest(…) or doResponse(…) inspects the message further to figure out its type: for a request it looks at the method (e.g. INVITE), for a response at the status code. For example, if the request is an INVITE, then doInvite(…) will be invoked, passing the request along to it. Similarly, if the response is a 2xx such as 200 OK, then doSuccessResponse(…) will be invoked.

So now you can see that these doXXX methods are the slots we fill in with our message-handling logic. [Show all the doXXX methods in the API].
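To make this dispatch concrete, here is a self-contained Java model of the routing that the default SipServlet implementation performs before any doXXX method runs. It is not the javax.servlet.sip code: the class DispatchModel is hypothetical, it works on the first line of a raw SIP message, and it lists only a handful of request methods.

```java
/**
 * Hypothetical model (not the javax.servlet.sip implementation) of how the
 * default SipServlet routes an incoming message to a doXXX method.
 */
class DispatchModel {

    /** Request: the method name in the request line decides (subset only). */
    static String handlerForRequest(String method) {
        switch (method) {
            case "INVITE":   return "doInvite";
            case "ACK":      return "doAck";
            case "BYE":      return "doBye";
            case "CANCEL":   return "doCancel";
            case "OPTIONS":  return "doOptions";
            case "REGISTER": return "doRegister";
            default:         return "unmapped in this model";
        }
    }

    /** Response: the class of the status code decides. */
    static String handlerForResponse(int status) {
        if (status >= 100 && status < 200) return "doProvisionalResponse";
        if (status >= 200 && status < 300) return "doSuccessResponse";
        if (status >= 300 && status < 400) return "doRedirectResponse";
        if (status >= 400 && status < 700) return "doErrorResponse";
        throw new IllegalArgumentException("not a SIP status code: " + status);
    }

    /** Decides request vs. response from the start line, then picks the handler. */
    static String route(String startLine) {
        String[] parts = startLine.trim().split("\\s+");
        if (parts[0].equals("SIP/2.0")) {
            // status line, e.g. "SIP/2.0 180 Ringing"
            return handlerForResponse(Integer.parseInt(parts[1]));
        }
        // request line, e.g. "INVITE sip:alice@192.168.22.34 SIP/2.0"
        return handlerForRequest(parts[0]);
    }

    public static void main(String[] args) {
        System.out.println(route("INVITE sip:alice@192.168.22.34 SIP/2.0")); // doInvite
        System.out.println(route("SIP/2.0 180 Ringing"));                  // doProvisionalResponse
        System.out.println(route("SIP/2.0 200 OK"));                       // doSuccessResponse
    }
}
```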

So, since we want to deal with INVITE requests, here’s what we have to do: [write down the doInvite(…) method, starting with comments / pseudo-code]:

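// imports needed: java.io.IOException, javax.servlet.ServletException, javax.servlet.sip.*
protected void doInvite(SipServletRequest req) throws ServletException, IOException {
    // log the message (to demonstrate the structure of a message object)
    log(req.toString());
    // send 180 Ringing, then wait 5 seconds (demo only: sleeping blocks a container thread)
    req.createResponse(SipServletResponse.SC_RINGING).send();
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    // send 200 OK to answer the call
    req.createResponse(SipServletResponse.SC_OK).send();
}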

We also want to handle the BYE message, because we have to answer a BYE with an OK so that the sender can do a proper cleanup. Here’s what to do [write down doBye(…)]:

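protected void doBye(SipServletRequest req) throws ServletException, IOException {
    // log the message (to demonstrate the structure of a message object)
    log(req.toString());
    // answer the BYE with 200 OK so the sender can clean up
    req.createResponse(SipServletResponse.SC_OK).send();
}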

That’s all the Java code needed for now, but we still have one more thing to do: writing the deployment descriptor. Think of it as the SIP Servlet equivalent of web.xml in an HTTP Servlet application. The descriptor’s filename is sip.xml, and it must be placed in the WEB-INF folder. Let me just copy and paste it and I’ll explain the content.

[Create file, copy-paste]

In sip.xml you define the SIP servlets that should be made available. We do it the same way as we would for an HTTP servlet in web.xml:

[Highlight the <servlet> section]

We also have to define a mapping for the servlet.

[Highlight the <servlet-mapping> section]

The difference from the mapping in web.xml is this: instead of specifying a URL pattern for invoking the servlet, we specify the pattern of the SIP message that should invoke the (SIP) servlet. The pattern is a Boolean expression that is matched against the values of various fields in the initial SIP message. Please take notice: the pattern only filters __initial__ SIP messages, which are messages that don’t belong to an existing dialog.

In this example we’re pretty loose with our filtering; we accept any INVITE.
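The matching rule itself can be pictured as a predicate over request variables. In sip.xml, accepting any INVITE is typically written as a pattern like <equal><var>request.method</var><value>INVITE</value></equal>. The Java sketch below models such rules with java.util.function.Predicate; MappingModel and its helpers are hypothetical and only imitate the semantics described above, in particular that mapping rules are consulted for initial requests only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical model of sip.xml servlet-mapping rules: each rule is a predicate
 * over request variables such as "request.method" or "request.uri.user".
 */
class MappingModel {

    /** Counterpart of <equal><var>var</var><value>value</value></equal> (case-sensitive). */
    static Predicate<Map<String, String>> equal(String var, String value) {
        return vars -> value.equals(vars.get(var));
    }

    /**
     * The mapping applies to initial requests only; a request inside an existing
     * dialog goes to the servlet already handling that dialog (not modeled here).
     */
    static boolean invokesServlet(Predicate<Map<String, String>> rule,
                                  Map<String, String> vars, boolean initial) {
        return initial && rule.test(vars);
    }

    /** Builds the variables of a request (just two of them, for the example). */
    static Map<String, String> request(String method, String uriUser) {
        Map<String, String> vars = new HashMap<>();
        vars.put("request.method", method);
        vars.put("request.uri.user", uriUser);
        return vars;
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> anyInvite = equal("request.method", "INVITE");
        // an <and> of two <equal> conditions maps naturally onto Predicate.and
        Predicate<Map<String, String>> inviteToAlice =
                anyInvite.and(equal("request.uri.user", "alice"));
        System.out.println(invokesServlet(anyInvite, request("INVITE", "bob"), true));     // true
        System.out.println(invokesServlet(anyInvite, request("INVITE", "bob"), false));    // false: in-dialog
        System.out.println(invokesServlet(inviteToAlice, request("INVITE", "bob"), true)); // false
    }
}
```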

Now let’s build the WAR. Yes, I mean it, let’s make war. [Click-click]. Finally, we can deploy our application to BEA WLSS; watch me do it: [Click-click].

It’s time for a call [start X-Lite]. Remember, this is a demo of a P2P, direct call from the phone to the SIP application server. Therefore, we don’t need the requests coming out of our SIP phone to go through any proxy. Let’s make sure of it by checking its configuration [open X-Lite account config].

Hang on, almost there. We just need to add the SIP URI (of the application server) to the SIP phone’s address book. [Click-click]…, and fire.

[Look at the log and explain]

Now let’s make another call, this time with Wireshark running on the machine that hosts the application server.

[Do it again, now with Wireshark] The trace here is similar to the one we’ve seen before (in the part where the SIP protocol was briefly explained, for the direct-call case), so I won’t explain it again. (Scroll down for 1 minute; fill in narration for the 1.5 minutes of clicking through the messages.)

Let’s move on to our second SIP Servlet. This time, our SIP Servlet will act as a B2B user agent. So any call coming in to this SIP Servlet will be bridged to, uhm… let’s say a person named Alice, who has her SIP user agent running on 192.168.22.34.

A common practice here is to draw signaling diagrams that depict the possible scenarios. For now we’ll just consider one scenario, the normal one, where everything goes smoothly. So here it is.

[Draw the signaling diagram with ArtRage]

Now we’re facing the design challenge. First, we should notice that we have two SIP dialogs running side by side. Let’s label the one on the left here CallerDialog (because it’s facing the caller). And… the one on the right, we’ll name CalleeDialog.

In SIP Servlet programming, the object that roughly represents a SIP dialog is SipSession.

[Show the API doc of SipSession].

What’s the use of it? Well, it can be used to store session-wide data…, akin to the use of

HttpSession in HTTP servlet programming.

The other use, which we will employ here, is the creation of requests. You see, at one point (here) [hover your mouse over the BYE], the SIP Servlet will have to create and send a BYE request. That can be achieved by calling the createRequest(…) method [show the method doc in the API doc] on the SipSession object that represents the dialog you want to tear down.

Now back to the diagram: by now it’s clear that our SIP Servlet will have to handle two SipSessions throughout the scenario. Further, we should also realize that we need to logically link the two sessions. By this I mean: we must be able to navigate from one session to the other. Just keep that in mind for now, as we jump on to the next step, that is, mapping between the diagram and the various doXXX methods of SipServlet.

We start from the top, the INVITE request. We already know that we deal with it in the doInvite(…) method. What exactly do we do there? Well, first we have to send a response back to the caller, to tell him that the INVITE has been received and that the SIP Servlet is trying to fulfill it. The name of that response is…, surprise, TRYING (code 100).

Ok, another little detour…, regarding sending this TRYING response. It’s a good thing to do in order to avoid network congestion. You know, the caller, after sending an INVITE (over an unreliable transport such as UDP), starts a timer. When it fires and no response has been received, the caller re-sends the INVITE. So let’s keep that from happening; sending the TRYING response is the way to go.
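Here is why the TRYING helps, modeled as a tiny state machine after the INVITE client transaction in RFC 3261: over UDP the caller retransmits only while it is in the Calling state, and any 1xx response moves it to Proceeding. The class InviteClientTransaction is a simplified, hypothetical illustration (no transport, no real timers), not part of any SIP API.

```java
/**
 * Simplified model of the INVITE client transaction (RFC 3261, section 17.1.1.2)
 * over an unreliable transport: only the state changes are modeled.
 */
class InviteClientTransaction {
    enum State { CALLING, PROCEEDING, COMPLETED, TERMINATED }

    private State state = State.CALLING;

    State state() { return state; }

    /** Timer A retransmissions of the INVITE only happen in the Calling state. */
    boolean retransmitsInvite() { return state == State.CALLING; }

    /** Timer B (64*T1) only times the transaction out while it is still Calling. */
    void onTimerB() {
        if (state == State.CALLING) state = State.TERMINATED;
    }

    void onResponse(int status) {
        if (state != State.CALLING && state != State.PROCEEDING) return; // ignored in this model
        if (status >= 100 && status < 200) {
            state = State.PROCEEDING;   // e.g. 100 Trying: stop retransmitting
        } else if (status >= 200 && status < 300) {
            state = State.TERMINATED;   // 2xx goes up to the user agent, which sends the ACK
        } else if (status >= 300 && status < 700) {
            state = State.COMPLETED;    // the transaction itself ACKs the failure response
        }
    }

    public static void main(String[] args) {
        InviteClientTransaction tx = new InviteClientTransaction();
        System.out.println(tx.retransmitsInvite()); // true
        tx.onResponse(100);                          // our TRYING arrives
        System.out.println(tx.retransmitsInvite()); // false
        tx.onTimerB();                               // no timeout in Proceeding
        System.out.println(tx.state());              // PROCEEDING
    }
}
```

Note that Timer B only matters in the Calling state, so once a 1xx has arrived the transaction itself no longer times out after 32 seconds; it simply waits for the final response.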

That’s all for doInvite(…)…, let me mark this part of the diagram, denoting that it falls within the

scope of the doInvite(…) method.

Next…, when the RINGING response is received from the callee, we have to relay it to the caller. We do that inside the doProvisionalResponse(…) method.

[Again, highlight]

Later, as soon as the callee picks up, our SIP Servlet will receive an OK response. This one also has to be relayed to the caller. We do that inside the doSuccessResponse(…) method.

[Again, highlight]

Finally, after a while, either the caller or the callee will hang up first. Yep, time for total annihilation. So, what we have here is an incoming BYE request. Again, what we have to do on that event is relay it to the other side…, and we do it inside the doBye(…) method.

[Again, highlight]

All righty, we’re done with the mapping; now it’s time to do the actual coding.

[Start from doInvite:]

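protected void doInvite(SipServletRequest inviteFromCaller) throws ServletException, IOException {
    // tell the caller we received the INVITE and are working on it
    inviteFromCaller.createResponse(SipServletResponse.SC_TRYING).send();
    SipFactory sipFactory = (SipFactory) getServletContext().getAttribute(SIP_FACTORY);
    // To: Alice's user agent (the user part "alice" is an assumption); From: copied from the caller
    Address to = sipFactory.createAddress(sipFactory.createSipURI("alice", "192.168.22.34"));
    SipServletRequest inviteToCallee = sipFactory.createRequest(
            inviteFromCaller.getSession().getApplicationSession(), "INVITE",
            inviteFromCaller.getFrom(), to);
    // …………….
}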

Notice a new type just came up, SipApplicationSession [show the SipApplicationSession API doc].

Basically it’s an object that binds inter-related SipSessions together. The reason we pass an instance of SipApplicationSession when creating the request is that we want the request to be in a completely new SipSession (different from that of inviteFromCaller), but… (emphasis) that new session must belong to the same application session as that of inviteFromCaller. Think of it as a common storage for both sessions.

SipFactory is a helper (a factory object). Another helper method it has is the one for creating SIP URIs (like this one… highlight the code that creates the To header).

As for the From header of inviteToCallee, we simply copy the value of the From header of inviteFromCaller. The idea is to let the callee know the identity of the originator of the call…, not that of the application server.

In these lines [show the lines where I copy the message] we copy the value of the Content-Type header from inviteFromCaller to inviteToCallee. The same goes for the message body here. That body (an SDP session description) carries the caller’s media address, so the callee will send the voice data __directly__, without passing through the application server, to the caller’s user agent when the conversation finally takes place. This is an important aspect of SIP: there’s a separation between the signaling space and the media space. The voice data can take a different route, through a completely different network, from the signaling messages.
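To see why copying the body is enough, recall that the body of such an INVITE is normally an SDP session description, whose c= line carries the media IP address and whose m=audio line carries the RTP port. The sketch below pulls those two values out of an SDP body; the class SdpMediaTarget and the sample addresses are made up, and the parser is deliberately simplified (first audio stream only, no handling of multicast or other address details).

```java
/** Hypothetical, simplified SDP reader: where should audio for this party be sent? */
class SdpMediaTarget {

    /** Returns "address:port" of the first audio stream in an SDP body. */
    static String audioTarget(String sdp) {
        String sessionAddr = null;   // c= line before any m= line (session level)
        String audioAddr = null;     // c= line inside the audio m= section (overrides)
        Integer audioPort = null;
        boolean seenMedia = false;
        boolean inAudio = false;
        for (String raw : sdp.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("m=")) {
                // m=<media> <port> <proto> <fmt> ...
                seenMedia = true;
                String[] f = line.substring(2).split("\\s+");
                inAudio = f[0].equals("audio") && audioPort == null;
                if (inAudio) audioPort = Integer.parseInt(f[1]);
            } else if (line.startsWith("c=")) {
                // c=<nettype> <addrtype> <connection-address>
                String addr = line.substring(2).split("\\s+")[2];
                if (inAudio) audioAddr = addr;
                else if (!seenMedia) sessionAddr = addr;
            }
        }
        String addr = audioAddr != null ? audioAddr : sessionAddr;
        if (addr == null || audioPort == null) {
            throw new IllegalArgumentException("SDP has no audio stream with a connection address");
        }
        return addr + ":" + audioPort;
    }

    public static void main(String[] args) {
        String offer = String.join("\r\n",
                "v=0",
                "o=caller 2890844526 2890844526 IN IP4 192.168.1.10",
                "s=-",
                "c=IN IP4 192.168.1.10",
                "t=0 0",
                "m=audio 8000 RTP/AVP 0",
                "a=rtpmap:0 PCMU/8000");
        System.out.println(audioTarget(offer)); // 192.168.1.10:8000
    }
}
```

Nothing in that body points at the application server, which is why the RTP stream can bypass it entirely.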

Then here, we simply send the invite.

At this point we have two SIP sessions; our task now is to link them. We do this because at some points in the scenario we will need to cross over. For example: when we receive a BYE from the caller, what we’ll have is the SipSession that corresponds to the dialog with the caller. However, we also need the SipSession of the dialog with the callee in order to create a BYE request to be sent to the callee.

One way to link them is through the SipSessions themselves: in the caller’s SipSession we store a reference to the callee’s SipSession, and vice versa. Like this [type the two lines of code].

Another approach is storing both sessions in the SipApplicationSession, this way [show code].
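Both approaches boil down to storing references in attribute maps. The sketch below uses a hypothetical FakeSession class with setAttribute/getAttribute, standing in for SipSession and SipApplicationSession (which offer methods with those names), so that it runs without a container; the attribute names are arbitrary choices for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for SipSession / SipApplicationSession: an id plus attributes. */
class FakeSession {
    final String id;
    private final Map<String, Object> attributes = new HashMap<>();

    FakeSession(String id) { this.id = id; }

    void setAttribute(String name, Object value) { attributes.put(name, value); }

    Object getAttribute(String name) { return attributes.get(name); }
}

class SessionLinking {
    /** Approach 1: each leg's session points at the other leg's session. */
    static void linkPeers(FakeSession callerLeg, FakeSession calleeLeg) {
        callerLeg.setAttribute("peer", calleeLeg);
        calleeLeg.setAttribute("peer", callerLeg);
    }

    static FakeSession otherLeg(FakeSession leg) {
        return (FakeSession) leg.getAttribute("peer");
    }

    /** Approach 2: both legs are registered in the shared application session. */
    static void linkViaApplicationSession(FakeSession appSession,
                                          FakeSession callerLeg, FakeSession calleeLeg) {
        appSession.setAttribute("callerSession", callerLeg);
        appSession.setAttribute("calleeSession", calleeLeg);
    }

    static FakeSession otherLeg(FakeSession appSession, FakeSession leg) {
        FakeSession caller = (FakeSession) appSession.getAttribute("callerSession");
        FakeSession callee = (FakeSession) appSession.getAttribute("calleeSession");
        return leg == caller ? callee : caller;
    }

    public static void main(String[] args) {
        FakeSession caller = new FakeSession("CallerDialog");
        FakeSession callee = new FakeSession("CalleeDialog");
        linkPeers(caller, callee);
        // a BYE arrived on the caller's leg: where do we relay it?
        System.out.println(otherLeg(caller).id); // CalleeDialog

        FakeSession app = new FakeSession("application session");
        linkViaApplicationSession(app, caller, callee);
        System.out.println(otherLeg(app, callee).id); // CallerDialog
    }
}
```

With the real API, the same setAttribute/getAttribute calls are made on the SipSession and SipApplicationSession objects the container hands us.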

Well, which approach is better? In this case, there’s no technical merit of one over the other. But of course, in other cases, it depends ☺.

That’s all; we’re done with doInvite(…), so we can move on to doProvisionalResponse(…). A provisional response is any response whose status code starts with 1. The TRYING response we just made, for example, has status code 100. Other examples: 180 for Ringing, and 183 (Session Progress), used for early media.

Here we want to adhere strictly to the diagram, so we won’t consider anything that isn’t in it. We will be selective about the type of provisional response we handle…; we’ll only deal with RINGING.

[Type the if block]

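Now, to create the response to be sent to the caller, we need the INVITE request we received from the caller: createResponse(…) is a method of SipServletRequest, not of SipSession, because a response is always created from the request it answers. So the caller’s INVITE has to be kept somewhere we can reach later, for example as an attribute of the caller’s SipSession.

[show the API doc of createResponse].

This is where the linking we did comes in handy; we simply look up the caller’s session in the SipApplicationSession, and take the stored INVITE from there.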

[Type response.getApplicationSession().getAttribute(“callerSession”)]

[Type createResponse]

And finally we send it [type send response].

Next…, the doSuccessResponse(…) method. A success response is any response whose status code starts with 2. However, here we’re only interested in the OK response, whose code is 200.

[type if-block code]

The first thing to do is send an acknowledgement (ACK) to the callee; otherwise the callee will keep re-sending the OK to us.
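How long would the callee keep re-sending that OK? RFC 3261 has the answering user agent retransmit a 2xx to an INVITE starting at T1 (500 ms), doubling the interval up to T2 (4 s), and giving up after 64·T1 (32 s). The sketch below computes that schedule for a given ACK arrival time; OkRetransmission is a hypothetical name, and this is the protocol arithmetic only, not container code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of RFC 3261 section 13.3.1.4: the UAS retransmits its 2xx to an INVITE
 * at intervals starting at T1 and doubling up to T2, until the ACK arrives or
 * 64*T1 have passed.
 */
class OkRetransmission {
    static final long T1_MS = 500;
    static final long T2_MS = 4000;
    static final long GIVE_UP_MS = 64 * T1_MS; // 32 s

    /** Retransmission times (ms after the first 200 OK) if the ACK arrives at ackAtMs. */
    static List<Long> retransmitTimes(long ackAtMs) {
        List<Long> times = new ArrayList<>();
        long interval = T1_MS;
        long t = interval;
        while (t < GIVE_UP_MS && t < ackAtMs) {
            times.add(t);
            interval = Math.min(2 * interval, T2_MS); // double, but never beyond T2
            t += interval;
        }
        return times;
    }

    public static void main(String[] args) {
        System.out.println(retransmitTimes(Long.MAX_VALUE)); // no ACK: 10 retransmissions
        System.out.println(retransmitTimes(2000));           // ACK after 2 s: [500, 1500]
    }
}
```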

[type send acknowledge]

Then we create the OK response to be sent to the caller…, the same way as we did it for the

TRYING response.

[Type create response].

Also, we copy the message body of the OK from the callee into the OK for the caller. This way, during the conversation, the caller will send the voice data directly to the callee’s user agent.

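Finally, and this time I mean it, the doBye(…) method.

Let me just type it first. It’s very simple, just a repetition of what we’ve done in the other methods: we answer the incoming BYE with an OK (just like in the first example), and relay it by creating a new BYE from the other leg’s SipSession and sending it to the other side.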

That’s all for the second example. Now we’re ready to touch on an interesting subject in the VoIP domain: convergence. It looks like many people in this domain are trying to profit from it.

To me, convergence is quite an overloaded term. I thought I knew what it meant: take the fact that a SIP Servlet application can be viewed as “just another TCP/IP application” that can be made to work together with other TCP/IP applications, add a little bit of imagination, and tada, we have “convergence”. That was my initial impression of convergence. However, preparing the material for this video forced me to re-evaluate my (old) understanding.

I found an interesting note in a whitepaper published by a company named AudioCodes, where

the author mentions three types of convergence:

- Device convergence

- Network convergence

- and Service convergence.

Here, let me show you the whitepaper. [show the whitepaper on the screen].

Network convergence is about enabling access to a common core IP network through various types of access networks (such as GSM, PSTN, etc.).

Device convergence… uhm… I guess it can be understood as the ability of a single device to use multiple access networks.

Lastly, service convergence is more about enabling access to a common service in the core IP network through various access networks and various devices. Well, uhm, I believe so.

It should be obvious that we, in this video, are more interested in the service convergence. So,

let’s think of a scenario of a “converged service”….. [pause 15 seconds].

All right, I have to admit I had difficulties finding a real-world example of such a service. Well, my only resource is Google, and I couldn’t find anything (weird, huh?). I already asked some of my friends who live in Japan about this; I thought this kind of thing was already common there. One of them commented that the realization of “converged services” requires a more advanced infrastructure, that is 4G, which will be available in Japan starting in 2010. I was like: what? More waiting? Why? I thought that with the availability of 3G networks, half of the world’s problems, including converged services, had been solved?!

Aaannyway, if we see this in a positive light: we’re just in time for it. We start investing time in learning the concepts and the development skills now, and we’ll reap the benefits within one or two years ☺.

Ok, let’s track back to the “definition” of service convergence: “access to a common service through multiple access networks”.

Well, if that’s what it’s all about, maybe it’s not that new after all. I mean, remember WAP? To my understanding, it was introduced to complement its big brother (the WWW) by extending the audience of web applications to mobile phone users, right? That’s access to a common service through multiple access networks, I believe.

Additionally, VoiceXML. Many applications with user interfaces developed in VoiceXML have been in operation since 2002, at least. For example, back in 2003, my friends and I wrote a VoiceXML application that lets people listen to the emails in their inbox over the phone (including from a regular PSTN phone). Can we call it a “converged service”? I don’t know for sure, but I know I’m not content with that.

Search-search-search, and I found a whitepaper from Cisco suggesting that the emphasis of service convergence (nowadays) is on continuity across access networks, for customer loyalty and stickiness.

So I guess the really interesting aspect of converged services, which we need to explore more in our search for killer services, is continuity.

Let’s imagine the following scenario: you’re a model employee. Just like any other model employee, you play a massively multiplayer online role-playing game in the office. It’s 5 PM, and you’re in the middle of a long campaign. But it’s 5 PM; you have to go home. Yet you can’t just scrap your adventure, it’s six hours’ worth of work! Oh, dilemma.

But, thanks to service convergence, you can switch on your clunky old cellphone and dial in to the game. Once you’re in, you turn off your workstation and leave your desk. Nothing is lost; everything you had in the session while playing on the PC is intact. You continue playing, in that session, on your drive home. Isn’t that sweet?

Yeah, I know, the example sounds rather stupid, with no economic value whatsoever (except for the game publisher). But I hope you got the gist of it: it’s the continuity.

Now let’s try to draw a line between what I just described and SIP Servlet technology. You see, continuity of service, especially real-time service, requires a mechanism for sharing or transferring the data stored in one session to another session (for example, from the HTTP session to the call session, a SIP session in this case). This mechanism exists in SIP Servlet technology. That’s what we’re going to learn in this third and last example.

Remember the SipApplicationSession that we learned about in the second example? It binds multiple inter-related SipSessions together. Now, an additional fact for you: a SipApplicationSession can actually hold a mix of SipSessions and HttpSessions (and some others). That is the characteristic we’re going to learn to exploit.

So here’s the general overview of the application. The application is composed of two parts: the web part and the SIP servlet part. The user starts by accessing the web part, which displays a web form containing two text fields and a button, like in this picture.

Pressing the “Make call” button causes the data in the fields to be sent to a plain HTTP Servlet, which initializes the HTTP session for that user. Additionally, the HTTP Servlet triggers the creation of a SipSession by initiating a call to the “destination sip-uri” specified in one of those text fields. Now we have two sessions of different protocols, and we bind them together in the same SipApplicationSession.

What happens next is that, in the web browser, the page shows the progress of the call. The page refreshes every second. It should look something like this:

[show page]

It’s easy to imagine that from the JSP page / HTTP servlet that generates the page, you need to access the SipSessions whose progress you’d like to display. That can be done by navigating from the HttpSession to the SipApplicationSession (in SIP Servlet 1.0, for example, by keeping a reference to it as an HttpSession attribute), and finally to the SipSessions you’re interested in.

Now, let’s turn our attention a little to the SIP servlet. This SIP servlet will play the B2BUA role, bridging two call-legs: one to the “user sip-uri”, and the other to the “destination sip-uri”. Once those two legs are bridged, the user will be presented, on the web page, with a page that contains a “Drop” button, like this.

[show page]

Clicking the “drop” button will cause the call to be dropped.

That is all. Conceptually it’s fairly simple, but this is very useful in IP call-center applications, for example. Now let’s just code it:

[Todo: coding]

As you could see, converged applications are still a bit messy in SIP Servlet 1.0. Fortunately, this has been improved in SIP Servlet 1.1. For example, in a converged container you can cast a plain HttpSession to ConvergedHttpSession (only available in SIP Servlet 1.1), which gives you a cleaner, direct path to the SipApplicationSession. Newer concepts in J2EE such as injection by annotation are also supported, so you can have easy access to SipFactory from your J2EE components such as EJBs.

That’s all for the introduction to SIP Servlet programming. Please feel free to post your doubts / questions / corrections / suggestions here. The next section will be the introduction to VoiceXML programming. Zaijian (goodbye)!
