Metrics to gather when doing performance exercises for Oracle SOA Suite

In many of my performance gigs, the one question that keeps coming up is: what are you trying to achieve, i.e. what is the end goal?

Throughput (# of messages or # completed transactions, TPS)
Response Time
Reliability
Data Resiliency (also known as no loss of data)
Any combination of the above
All of the above?

Once the objective has been identified, the next question is diagnostics: which logs and data need to be gathered during a performance run to drive your performance tuning? This topic was also the subject of my presentation at OOW 2008, which to my surprise sparked a lot of post-presentation conversations, so I thought I should put it up on my blog.

Metrics can be broken down into these main sections:

Box/OS
BPEL/ESB
JVM
DB

Box/OS

Machine where SOA Suite is running

CPU
vmstat
prstat
iostat
VisualGC
top

Machine where the database is running

CPU
AWR Reports
Redo Logs
iostat

BPEL

Collect performance metrics from the Metrics page --> they can show you bottlenecks and help you troubleshoot.
You can rely on the response time shown on the BPEL console for synchronous processes only!
For asynchronous processes use the CUBE_INSTANCE table in the orabpel schema.

Use the start time of the first asynchronous BPEL process and the end time of the last BPEL process in a given window (completed processes only) --> this gives you the total time taken to complete
Count the number of completed BPEL instances
Divide the number of instances by the total time in seconds to calculate TPS
To get a typical per-instance time that is not skewed by outliers, also calculate the median of the individual instance durations

Thread dumps if required

Example of BPEL SQL Script for asynchronous processes:

-- TPS and median duration per asynchronous BPEL process (completed instances only).
-- Replace <process_name> with a quoted LIKE pattern for your process.
-- No blank lines inside the statement, so it can be pasted into SQL*Plus as-is.
select PROCESS_ID, COUNT, BEGIN_TIME, END_TIME, DURATION_IN_SECOND,
       (COUNT/DURATION_IN_SECOND) TPS, MEDIAN
from (
  select count(*) COUNT,
         process_id PROCESS_ID,
         max(modify_date) END_TIME,
         min(creation_date) BEGIN_TIME,
         -- elapsed seconds between the first start and the last completion
         (extract(day from max(modify_date) - min(creation_date))*86400
          + extract(hour from max(modify_date) - min(creation_date))*3600
          + extract(minute from max(modify_date) - min(creation_date))*60
          + extract(second from max(modify_date) - min(creation_date))) DURATION_IN_SECOND,
         -- median duration in seconds of the individual instances
         median(extract(day from modify_date - creation_date)*86400
                + extract(hour from modify_date - creation_date)*3600
                + extract(minute from modify_date - creation_date)*60
                + extract(second from modify_date - creation_date)) MEDIAN
  from cube_instance
  where state = 5  -- completed instances
    and process_id like <process_name>
  group by process_id
);

Results of the SQL Script:

Process name	Count	Begin Time	End Time	Duration in seconds	TPS	Median
Process1
Process2

I have not shown the data here, but you can run the above script after every performance run to get the TPS for your asynchronous BPEL processes and chart the table above as graphs (TPS and Median on the Y axis, Process on the X axis). The graphs will show how your system behaves after each tuning exercise.
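
If you export the raw rows instead of running the aggregate query, the same arithmetic can be checked offline. The following is only a minimal sketch in Python, assuming you have already pulled (process_id, creation_date, modify_date) for completed instances into memory; the function name `summarize` and the dictionary keys are my own choices for illustration and are not part of SOA Suite.

```python
from collections import defaultdict
from statistics import median


def summarize(rows):
    """Compute count, elapsed seconds, TPS and median duration per process.

    rows: iterable of (process_id, creation_date, modify_date) tuples for
    completed instances, with the two dates as datetime objects.
    """
    by_process = defaultdict(list)
    for process_id, created, modified in rows:
        by_process[process_id].append((created, modified))

    summary = {}
    for process_id, spans in by_process.items():
        begin = min(created for created, _ in spans)
        end = max(modified for _, modified in spans)
        duration = (end - begin).total_seconds()
        summary[process_id] = {
            "count": len(spans),
            "begin_time": begin,
            "end_time": end,
            "duration_in_second": duration,
            # TPS = completed instances / elapsed seconds; undefined for a zero window
            "tps": len(spans) / duration if duration > 0 else None,
            # median of the individual instance durations, in seconds
            "median": median((m - c).total_seconds() for c, m in spans),
        }
    return summary
```

Feeding the result of each run into a spreadsheet or plotting tool gives you the TPS and Median charts described above.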

ESB

Metrics from the ESB Console.
Metrics gathered from log.xml: not intuitive, but you can get a lot of information from this log, e.g. the time taken for the ESB to complete a transaction.
Monitor iostat for the ESB process; depending on the OS you may get different results from iostat (AIX being the most IO intensive)

Visual GC

VisualGC helps you monitor how your JVM is behaving and helps in spotting thread deadlocks. If you see a flat line in your Eden Space while the test is running, it's usually a sign of a deadlock, and it's a good time to gather thread dumps at regular intervals to see what is going on in the JVM.
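
Gathering thread dumps at regular intervals is easy to script. This is only a sketch, assuming a Unix host and a JVM that writes a thread dump to its standard output when it receives SIGQUIT (the same as running kill -3 against the process by hand); the function name and its default values are hypothetical, so pick a count and interval that suit your test.

```python
import os
import signal
import time


def take_thread_dumps(pid, count=5, interval_seconds=10.0,
                      send=os.kill, sleep=time.sleep):
    """Ask the JVM with process id `pid` for `count` thread dumps.

    Each dump is requested by sending SIGQUIT; the JVM itself writes the dump
    to its own stdout or log, so nothing is captured here. `send` and `sleep`
    are parameters only so the schedule can be tried without a real JVM.
    Returns the number of signals sent.
    """
    sent = 0
    for i in range(count):
        send(pid, signal.SIGQUIT)
        sent += 1
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval_seconds)
    return sent
```

Several dumps a few seconds apart are more useful than a single one, since a thread that sits in the same place across all of them is the one to look at.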

Analysis of Thread Dump

Once you have gathered the thread dumps you can analyse them using a free tool called Samurai.

The red colours are signs of potential deadlocks; by clicking on them you can see where the deadlock is.

Another great tool that has been added in SOA Suite 10.1.3.4 for BPEL is the new statistics page, which provides information about the number of threads that BPEL is using, adapter threads, and various other statistics that can help provide a better picture.

All of the above metrics should be gathered regardless of the performance objectives. The OS and DB level metrics can be automated so that they are gathered by scripts. The thread dumps depend entirely on whether VisualGC is showing any flat lines, while the BPEL/ESB stats pages and the BPEL script can provide a first-hand view of what is happening in your system.

The next question is how you tune your system. That ties back to the first question of what you are trying to achieve: throughput or response time. Since the two are inversely related, there is always a price to pay, so there is always a choice to be made.
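
As an illustration of automating the OS level collection, here is a rough sketch that starts vmstat and iostat in the background for the length of a run and writes each one to its own log file. It assumes both tools are on the PATH and accept the usual "interval count" arguments; the function names, the log file naming and the defaults are my own, and any extra flags you want will differ by OS.

```python
import os
import subprocess


def collector_commands(interval_seconds=5, samples=120):
    """Build the command lines: one sample every `interval_seconds`, `samples` times."""
    args = [str(interval_seconds), str(samples)]
    return {
        "vmstat": ["vmstat", *args],
        "iostat": ["iostat", *args],
    }


def start_collectors(output_dir, interval_seconds=5, samples=120):
    """Start each collector in the background, logging to <output_dir>/<name>.log.

    Returns the Popen objects so the caller can wait() on them after the run.
    """
    procs = {}
    for name, cmd in collector_commands(interval_seconds, samples).items():
        log_path = os.path.join(output_dir, name + ".log")
        # The child process keeps its own handle on the log file,
        # so the parent can close its copy as soon as the child has started.
        with open(log_path, "w") as log:
            procs[name] = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    return procs
```

With the defaults this covers a ten minute run (120 samples, 5 seconds apart); start it just before the load test and keep the logs alongside the BPEL script output for that run.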

Happy gathering!

3 comments:

Raph said...: Hi Deepak,

your post reminds me lots of souvenirs... :)

Cheers.; November 27, 2008 at 6:28 PM
tjain said...: Hi Deepak,

Your post is very straight forward and has helped me lot. I have developed some experiments using Design of Experiments to fine tune BPEL PM. Any body can get them from by blog.

http://oracle-fusion-middlware.blogspot.com/2008/12/oracale-bpel-process-manager.html

Tushar; December 17, 2008 at 7:48 AM
CK said...: Hi Deepak,
I came across your blog and this interesting article on SOA suite performance, as I just starting to learn SOA suite from Administrator perspective.
From what I read or learnt Response time and throughput are inversely related for a given set of conditions only but not always (for ex zero latency).

Thanks,
Rai; October 25, 2012 at 4:30 PM