In a lot of my performance gigs, the one question that keeps coming up is: what are you trying to achieve, i.e. what is the end goal?
- Throughput (# of messages or # completed transactions, TPS)
- Response Time
- Reliability
- Data Resiliency (also known as no loss of data)
- Any combination of the above
- All of the above?
Once the objective has been established, the next question is diagnostics: which logs and data need to be gathered during a performance run so that you can do your performance tuning? This topic was also the subject of my presentation at OOW 2008, which to my surprise sparked a lot of post-presentation conversations, so I thought I should put it up on my blog.
Metrics can be broken down into these main sections:
- Box/OS
- BPEL/ESB
- JVM
- DB
Box/OS
Machine where SOA Suite is running
- CPU
- vmstat
- prstat
- iostat
- VisualGC
- top
DB
- CPU
- AWR Reports
- Redo Logs
- iostat
BPEL
- Collect performance metrics from the Metrics page; they can show you bottlenecks and help you troubleshoot.
- You can rely on the response time shown on the BPEL console for synchronous processes only!
- For asynchronous processes, use the CUBE_INSTANCE table in the orabpel schema.
- Take the start time of the first asynchronous BPEL instance and the end time of the last BPEL instance in a given period (completed instances only); the difference is the time taken to complete the run.
- Count the number of completed BPEL instances.
- Divide the number of instances by that time difference to calculate TPS.
- To reduce the effect of outliers, also calculate the median duration of the individual instances.
- Thread dumps if required
Example of BPEL SQL Script for asynchronous processes:
-- <process_name> is a placeholder for the process name pattern to report on
select PROCESS_ID, COUNT, BEGIN_TIME, END_TIME, DURATION_IN_SECOND,
       (COUNT/DURATION_IN_SECOND) TPS, MEDIAN
from (
  select count(*) COUNT,
         process_id PROCESS_ID,
         max(modify_date) END_TIME,
         min(creation_date) BEGIN_TIME,
         -- elapsed seconds between the first start and the last end
         (extract(day from max(modify_date) - min(creation_date))*86400
          + extract(hour from max(modify_date) - min(creation_date))*3600
          + extract(minute from max(modify_date) - min(creation_date))*60
          + extract(second from max(modify_date) - min(creation_date))) duration_in_second,
         -- median duration in seconds of the individual instances
         median(extract(day from modify_date - creation_date)*86400
                + extract(hour from modify_date - creation_date)*3600
                + extract(minute from modify_date - creation_date)*60
                + extract(second from modify_date - creation_date)) MEDIAN
  from cube_instance
  where state = 5  -- completed instances only
    and process_id like <process_name>
  group by process_id
);
Results of the SQL Script:
Process name | Count | Begin Time | End Time | Duration in seconds | TPS | Median |
Process1 | | | | | | |
Process2 | | | | | | |
I have not shown the data here, but you can run the above script after every performance run to get the TPS for your asynchronous BPEL processes and chart the results as graphs (TPS and median on the Y axis, process on the X axis). The graphs will show how your system is behaving after each tuning exercise.
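If you export the creation and modify timestamps of the completed instances, the same arithmetic as the SQL script can be cross-checked outside the database. The following is only a sketch: the function name `summarize` and the sample rows are made up for illustration, and it assumes each row is a (creation_date, modify_date) pair for one completed instance of a single process.

```python
from datetime import datetime
from statistics import median


def summarize(instances):
    """Summarize (creation_date, modify_date) pairs of completed instances."""
    if not instances:
        raise ValueError("no completed instances")
    begin = min(created for created, _ in instances)    # first start
    end = max(modified for _, modified in instances)    # last end
    duration = (end - begin).total_seconds()
    count = len(instances)
    return {
        "count": count,
        "begin": begin,
        "end": end,
        "duration_s": duration,
        # TPS = completed instances / elapsed seconds (None if no time elapsed)
        "tps": count / duration if duration > 0 else None,
        # median of the individual instance durations, in seconds
        "median_s": median(
            (modified - created).total_seconds() for created, modified in instances
        ),
    }


# Hypothetical sample data: three instances lasting 4 s, 6 s and 8 s.
rows = [
    (datetime(2008, 10, 1, 10, 0, 0), datetime(2008, 10, 1, 10, 0, 4)),
    (datetime(2008, 10, 1, 10, 0, 1), datetime(2008, 10, 1, 10, 0, 7)),
    (datetime(2008, 10, 1, 10, 0, 2), datetime(2008, 10, 1, 10, 0, 10)),
]
summary = summarize(rows)
print(summary["count"], summary["duration_s"], summary["tps"], summary["median_s"])
```

For the sample rows the run spans 10 seconds with 3 completed instances, so the TPS is 0.3 and the median instance duration is 6 seconds.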
ESB
- Metrics from the ESB Console.
- Metrics gathered from log.xml. It is not intuitive, but you can get a lot of information from this log, e.g. the time taken for the ESB to complete a transaction.
- Monitor iostat for the ESB process; depending on the OS you may get different results from iostat (AIX being the most IO intensive).
Visual GC
VisualGC helps you monitor how your JVM is behaving and helps you spot thread deadlocks. If you see a flat line in your Eden Space while the test is running, it is usually a sign of a deadlock, and it is a good time to gather thread dumps at regular intervals to see what is going on in the JVM.
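The flat-line check can also be expressed in code if you are sampling Eden usage yourself rather than watching the graph. This is a rough sketch, not something VisualGC provides: it assumes you already have a list of Eden utilisation percentages taken at a fixed interval (for example from the E column of `jstat -gcutil`), and the function name, window size and tolerance are arbitrary choices for illustration.

```python
def eden_is_flat(samples, window=5, tolerance=0.0):
    """Return True if the last `window` Eden samples vary by at most `tolerance`.

    A busy JVM keeps allocating, so Eden usage normally climbs and then drops
    at each young collection. No movement at all under load is the suspicious case.
    """
    if len(samples) < window:
        return False  # not enough data to call it a flat line
    recent = samples[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical samples: Eden stops moving after the third reading.
stuck = [10.0, 35.0, 60.0, 60.0, 60.0, 60.0, 60.0]
healthy = [10.0, 35.0, 60.0, 85.0, 5.0, 30.0, 55.0]
print(eden_is_flat(stuck), eden_is_flat(healthy))
```

A True result is only a hint that it is time to start taking thread dumps; an idle system will also show a flat Eden, so the check is only meaningful while load is being applied.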
Analysis of Thread Dump
Once you have gathered the thread dumps, you can analyse them using a free tool called Samurai.
The entries shown in red are signs of potential deadlocks; by clicking on them you can see where the deadlock is.
Another great tool added in SOA Suite 10.1.3.4 for BPEL is the new statistics page, which provides information about the number of threads that BPEL is using, adapter threads, and various other statistics that can help provide a better picture.
All of the above metrics should be gathered regardless of the performance objectives. The OS and DB level metrics can be automated so that they are gathered by scripts. Whether you need thread dumps depends entirely on whether VisualGC is showing any flat lines, while the BPEL/ESB statistics pages and the BPEL script provide a first-hand view of what is happening in your system.
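As a starting point for automating the OS-level collection, here is a minimal sketch of a collector that runs a set of commands and appends their output to one log file per tool. The function name `collect`, the log layout and the example command lines are my assumptions; the arguments accepted by vmstat and iostat differ between operating systems, so adjust them for your box.

```python
import subprocess
import time
from pathlib import Path


def collect(commands, out_dir, rounds=1, pause_s=0.0):
    """Run each command `rounds` times, appending output to <out_dir>/<name>.log."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(rounds):
        for name, argv in commands.items():
            result = subprocess.run(argv, capture_output=True, text=True)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(out / f"{name}.log", "a") as log:
                # Timestamp every sample so it can be lined up with the test run.
                log.write(f"--- {stamp} (exit {result.returncode})\n{result.stdout}")
        if i < rounds - 1:
            time.sleep(pause_s)


# Placeholder command lines (interval 5 seconds, 3 reports); check them on your OS.
EXAMPLE_COMMANDS = {
    "vmstat": ["vmstat", "5", "3"],
    "iostat": ["iostat", "5", "3"],
}

# Usage, e.g. for the duration of a test run:
# collect(EXAMPLE_COMMANDS, "perf_run_01", rounds=10, pause_s=60)
```

Keeping one directory per performance run makes it easier to compare the logs against the TPS figures for that run.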
The next question is how you tune your system. That ties back into the first question of what you are trying to achieve: throughput or response time. Since the two pull in opposite directions, there is always a price to pay and always a choice to be made.
Happy gathering!
DA