BPMS in a production environment

I couldn’t find a better topic than “BPMS (ActiveVOS 6.1) in production” to close the whole series.
Although ActiveVOS is certainly a cool product, there is, as usual, room for improvement. A production environment is something special and should be treated as such: if production is down, there is simply no business. And empowering the business is the main objective of a BPMS, isn’t it? So the technology should be ready to cope with that kind of situation. To cut a long story short: every feature that supports maintainability, reliability, security and sustainability in day-to-day operation is highly appreciated.
During the development life cycle it can be hard (especially in the early stages) to foresee how the system will be maintained, what the standard procedures will look like, etc. The goal is to keep the probability of process, human or technical errors as low as possible while also making problems easy to detect.
The following pieces of functionality proved to be highly desirable. For some of them, the impact of their absence can be avoided, or at least reduced, at design time. For the rest, extra development effort has to be taken into account.
  • Different Console modes – the Console has no distinct modes for development and production environments. Such modes come in handy when you need to grant the operations team access for their day-to-day routine without letting them modify all server settings; for example, you may want to limit their permissions to deploying new processes and starting and stopping services.
  • Reliable failover – this is perhaps more of an infrastructure question. As the BPMS lives entirely in a DB, a typical solution is to clone the production DB to a backup DB instance, which is started in case of a failure. But if some inconsistency gets into the DB during the crash of the main instance, it is immediately replicated to the backup instance as well. Does it then make sense to start the backup instance at all?
  • Lack of data archiving procedures – the solution itself doesn’t offer any procedure for archiving completed processes. Because of legal restrictions specific to the business domain you are working in, you cannot simply delete completed processes. As your DB grows, the response time of the BPMS grows as well, and you can easily get into trouble with the time-out policy; data growth of 200GB per month is quite possible. Nor can you simply work around the problem with advanced features of the underlying DB such as partitioning, because you want processes which logically belong together to end up in one archive. You will struggle to find a partitioning criterion that is usable in practice and still fulfils this requirement.
  • Process upgrade – migrating already running processes to an upgraded process version is one of the killer features, but it works only for small changes to the process. Moreover, what if your process consumes an external WS which lives completely on its own? What if someone enhances that service and modifies its interface? Yes, interface versioning comes into play. A process upgrade feature without versioned interfaces is almost nonsense, or at least needs special attention during releases. Even with versioned interfaces it is not applicable in all situations, e.g. when sending a new data field whose presence in the system is not guaranteed. In large companies this feature is a must; otherwise it is hard to manage and coordinate the releases of all the connected applications.
  • Consider the product road map – strictly speaking, this item belongs to the project planning phase, where we decide which technology to use. In some environments, such as banking or insurance, there can be legal requirements for all products in the production environment to be supported by a vendor. If the vendor’s release strategy is a new major version every half a year and the support scope is the current major version plus two major versions back, this could pose a problem for the product maintenance team during the product life cycle. Migrating all non-terminated processes may not be trivial and as such represents an extra cost.
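To make the missing Console modes from the first point more concrete, here is a minimal Python sketch of mode-based permissions. The mode names and the `ConsoleAction` values are hypothetical and are not part of the ActiveVOS Console; the sketch only shows the idea of an operations mode restricted to deploying processes and starting and stopping services.

```python
from enum import Enum, auto

class ConsoleAction(Enum):
    # Hypothetical console actions, for illustration only.
    DEPLOY_PROCESS = auto()
    START_SERVICE = auto()
    STOP_SERVICE = auto()
    MODIFY_SERVER_SETTINGS = auto()

# Hypothetical modes: development may do everything,
# operations only the day-to-day routine.
MODE_PERMISSIONS = {
    "development": frozenset(ConsoleAction),
    "operations": frozenset({
        ConsoleAction.DEPLOY_PROCESS,
        ConsoleAction.START_SERVICE,
        ConsoleAction.STOP_SERVICE,
    }),
}

def is_allowed(mode: str, action: ConsoleAction) -> bool:
    """Return True if the console mode permits the action; unknown modes permit nothing."""
    return action in MODE_PERMISSIONS.get(mode, frozenset())
```

With this split, `is_allowed("operations", ConsoleAction.MODIFY_SERVER_SETTINGS)` is false, while the same call in development mode is true.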
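Since a DB partitioning criterion rarely keeps a logical business case together, one alternative is to archive whole business cases. The sketch below assumes a hypothetical `business_key` (e.g. a contract number) that correlates all process instances of one case; it is not an ActiveVOS feature, only an illustration of how the set of archivable processes could be chosen.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessInstance:
    process_id: int
    business_key: str  # hypothetical correlation key, e.g. a contract number
    state: str         # e.g. "running", "completed", "faulted"

def archivable_cases(instances):
    """Group instances by business key and return only the cases in which
    every instance has completed, so a case is never split between the
    archive and the live DB."""
    cases = defaultdict(list)
    for inst in instances:
        cases[inst.business_key].append(inst)
    return {
        key: sorted(i.process_id for i in group)
        for key, group in cases.items()
        if all(i.state == "completed" for i in group)
    }
```

A case with even one running process stays in the live DB as a whole; only fully completed cases are handed over to the archiving job.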
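The support-window arithmetic from the road map point can be made explicit. Assuming, as in the example above, a new major version every six months and support for the current major plus two back, a given version is supported for (2 + 1) × 6 = 18 months. The hypothetical helpers below compute which majors are supported in a given month after the first release.

```python
RELEASE_INTERVAL_MONTHS = 6  # a new major version every half a year
SUPPORTED_BACK = 2           # current major plus two majors back

def latest_major(month: int, first_major: int = 1) -> int:
    """Latest major version available `month` months after first_major was released."""
    return first_major + month // RELEASE_INTERVAL_MONTHS

def supported_majors(month: int, first_major: int = 1) -> list:
    """Major versions still under vendor support in the given month."""
    latest = latest_major(month, first_major)
    return list(range(max(first_major, latest - SUPPORTED_BACK), latest + 1))

def support_window_months() -> int:
    """A release drops out of support once SUPPORTED_BACK + 1 newer majors exist."""
    return (SUPPORTED_BACK + 1) * RELEASE_INTERVAL_MONTHS
```

For example, `supported_majors(17)` is `[1, 2, 3]` but `supported_majors(18)` is `[2, 3, 4]`, so any process instance started on version 1 and still running after 18 months has to be migrated.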

Testing the BPMS component

I remember a discussion about BPMS testing with one of our QA guys that I want to share. I was asking QA for the requirements on the system and was curious which methodology was being used for this component. The answer, which I will probably never forget, was: “BPMS is a minor part of the system, hence we are not supposed to test it at all.” The motivation behind this article is simply that this approach wasn’t correct, and to provide some insight into what’s going on. There is no ambition to provide a complete methodology or best practices for testing the BPMS component; that is the role of a skilled QA.
A BPMS is a solution for orchestrating your in-house business services; simply put, it drives the workflow. The BPMS usually isn’t a decision maker. Decision-making rules typically need to be flexible and are expected to change frequently, reflecting business changes as quickly as possible. It is therefore not good practice to hard-code them into processes as a “spaghetti code structure” (several levels of nested if-else), which is error-prone and hard to maintain. Those are the reasons for having a separate component responsible for decision making – a BRE (business rule engine). The QA task for functional testing can then be divided into three main objectives. For given input data, verify:
  • Is all the data necessary for making a decision present at the specified point? This can be difficult because of the large number of incoming paths to the decision point. Regardless of the execution path, you are verifying that all the needed data was gathered in the system.
  • Based on the decision results, are the steps executed in the correct order? This verifies the required business process.
  • Do the fault recovery procedures work correctly? Switch the system to fault recovery mode and verify that it stored all data correctly and completely.
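For the first aspect, a test can check, for each incoming execution path, that the data needed by the decision is present. A minimal Python sketch; the field names and the two simulated paths are hypothetical examples, not taken from a real rule contract.

```python
# Hypothetical data items a BRE decision needs; in a real project
# this list would come from the rule's input contract.
REQUIRED_FOR_DECISION = ("customer_id", "insurance_number", "risk_score")

def missing_decision_inputs(process_data: dict) -> list:
    """Return the required items that are absent or empty at the decision point."""
    return [name for name in REQUIRED_FOR_DECISION
            if process_data.get(name) in (None, "")]

# Each entry simulates the data gathered along one incoming execution path.
paths = {
    "online_application": {"customer_id": "42", "insurance_number": "INS-1", "risk_score": 3},
    "branch_application": {"customer_id": "43", "risk_score": 5},
}
for path, data in paths.items():
    print(path, missing_decision_inputs(data))
```

Here the branch path would be reported as missing `insurance_number`, which is exactly the kind of gap that is easy to overlook when only the most common path is tested.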
There can certainly be more aspects, but these are considered the main ones. The main problem of the testing is that these aspects cannot be tested in isolation. By isolation I mean that you cannot use the standard methodologies (e.g. black box, white box, … whatever it is) and simply point somewhere in the system. The BPMS is a system component that has “memory”: you cannot arbitrarily divide a process into parts and test them separately, because each part depends on the state built up by the previous ones. Some systems have something like a “point of synchronization” (a point where the system has a defined data set regardless of the execution path), but this depends on the design and hence isn’t guaranteed.
Let’s have a look at the possibilities. The product itself offers a feature called BUnit, an alternative to JUnit from the Java world, which facilitates process unit testing. All invoke activities within the process are mocked: the XML reply is recorded. XML manipulation expressions and the gathering of data within the flow (aspect 1) can be tested this way, given a correct choice of recorded data. But the tests still take place in artificial conditions. Aspect 3, fault recovery, can be tested relatively easily with this approach, provided no awkward decisions were made during the design phase. The test analyst is the key role in this process, and it goes without saying that the documentation of the system itself matters too. Unit testing of the BRE is a completely separate chapter not discussed here.
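The following Python sketch illustrates the idea behind mocking invoke activities with recorded XML replies. It is not BUnit and does not use its API; the operation name `getCustomer`, the reply structure and the helper functions are hypothetical. It exercises a data-gathering step (aspect 1) against a recorded reply.

```python
import xml.etree.ElementTree as ET

# Recorded reply of a hypothetical "getCustomer" invoke, replayed by the mock.
RECORDED_REPLIES = {
    "getCustomer": "<customer><id>42</id><insuranceNumber>INS-1</insuranceNumber></customer>",
}

def mocked_invoke(operation: str, request: str) -> ET.Element:
    """Stand-in for an invoke activity: ignore the request and replay the recording."""
    return ET.fromstring(RECORDED_REPLIES[operation])

def gather_decision_data(customer_id: str, invoke=mocked_invoke) -> dict:
    """Step under test: call the back end and extract the fields
    that the decision point needs later."""
    reply = invoke("getCustomer", f"<getCustomer><id>{customer_id}</id></getCustomer>")
    return {
        "customer_id": reply.findtext("id"),
        "insurance_number": reply.findtext("insuranceNumber"),
    }

assert gather_decision_data("42") == {"customer_id": "42", "insurance_number": "INS-1"}
```

Swapping in a different recording (for example one without `insuranceNumber`) is how such a test covers the data gaps mentioned above; the conditions are still artificial, since the reply never changes.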
Having verified the basic functionality of the building blocks (processes and sub-processes), we can continue with integration testing. Systems of this kind usually have a high degree of integration, so it is really handy to have all back-end systems under your control. Reason no. 1: it is a data-driven system, so the behavior of your system depends on the data in those back-end systems. Reason no. 2: the BPMS has “memory” (it is stateful). If you want to test from a certain point in a process, you have to bring the system to that point, and you need to do it repeatedly and in a well-defined way. The approach used in web application testing, modifying data in the DB to bring an order, application, etc. to a certain state, is not sufficient here. Having simulators of the real back-end systems proved to be a really good practice: you isolate your system, and the time needed to localize an error is significantly lower. This way you can conduct integration testing of bigger functional blocks, up to end-to-end testing. There is no doubt that a high level of automation is a must.
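A back-end simulator mainly needs to start from a well-defined state and be resettable, so that a test scenario can repeatedly drive a process to the same point. The class below is a hypothetical Python sketch of such a simulator for an imagined policy system; a real one would sit behind the same WS interface as the real back end.

```python
import copy

class PolicySystemSimulator:
    """Hypothetical simulator of a back-end policy system. It keeps its own
    state so that a scenario can bring the BPMS process to a given point
    repeatedly and deterministically."""

    def __init__(self, fixtures: dict):
        self._fixtures = copy.deepcopy(fixtures)
        self.reset()

    def reset(self) -> None:
        # Restore the well-defined starting state before every test run.
        self.policies = copy.deepcopy(self._fixtures)
        self.calls = []

    def get_policy(self, policy_id: str) -> dict:
        self.calls.append(("get_policy", policy_id))
        return self.policies[policy_id]

    def set_status(self, policy_id: str, status: str) -> None:
        self.calls.append(("set_status", policy_id))
        self.policies[policy_id]["status"] = status

sim = PolicySystemSimulator({"P-1": {"status": "draft"}})
sim.set_status("P-1", "approved")
sim.reset()  # back to the fixture state for the next scenario
```

Recording the calls also helps with the second testing aspect: the test can assert the order in which the process contacted the back end.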

BPMS Lessons Learned – Developer’s experience

This part of the BPMS series focuses on the developer’s interaction with the BPMS, its design tools, etc. It describes subjective experience with ActiveVOS Designer version 6.1. At the time of writing, ActiveVOS v. 9 had been announced. I believe that a lot of what is described in this article has been enhanced and that the product has made a great step ahead. Here are some key points which were recognized as limitations.
  • Temper your expectations of the BPMS Designer IDE. Designer is an Eclipse-based IDE, so those who are familiar with Eclipse shouldn’t have a problem starting to work with it. Just expect that not all features are fully elaborated: code completion is completely missing, as are XPath and XQuery error highlighting and code formatting (even tab indentation). Message transformation can be really painful from that point of view. Fortunately, these problems were at least partially addressed in later releases.
  • Team collaboration is a bit difficult. Not because integration with version control systems like SVN, CVS, etc. is missing, but simply because the generated artefacts, such as deployment descriptors (pdd files) and ant scripts, all contain absolute paths to files, which simply don’t work on a different PC. Fortunately, this problem is easy to avoid by replacing the absolute paths with relative ones.
  • Be prepared for some product features being unreliable or not working at all. Nobody is perfect, and ActiveVOS BPMS is no exception. To name those we had to cope with:
    • eventing – on a WebLogic application server running in a cluster, this feature was not reliable.
    • instanceof – the XML processor used by ActiveVOS (the Saxon library) doesn’t support the keyword instanceof used for element identification in an inheritance hierarchy.
    • time-out on the asynchronous callback “receive” was not reliable – once it timed out after 5 minutes (3 minutes required), the next time after 1 hour, …
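Replacing absolute paths with relative ones can be automated as a post-generation or pre-commit step. A minimal Python sketch, assuming POSIX-style paths and that every absolute path in the generated files starts with a known project root; the XML fragment in the test example is hypothetical and not the actual pdd format.

```python
import os
import re

def relativize_paths(text: str, project_root: str, artifact_dir: str) -> str:
    """Rewrite absolute paths under project_root so they are relative to
    artifact_dir, the directory that contains the generated file."""
    root = os.path.normpath(project_root)
    # The root must be followed by a separator or a delimiter (so that
    # ".../claims2" is not matched for root ".../claims"); the path then
    # runs until a quote, whitespace or angle bracket.
    pattern = re.compile(re.escape(root) + r"""(?=[/"'\s<>]|$)[^"'\s<>]*""")

    def to_relative(match):
        rel = os.path.relpath(match.group(0), artifact_dir)
        return rel.replace(os.sep, "/")

    return pattern.sub(to_relative, text)
```

For a file generated into `<root>/deploy`, a reference to `<root>/wsdl/Claims.wsdl` becomes `../wsdl/Claims.wsdl`, which resolves the same way on every developer’s PC.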

BPMS Lessons Learned – Design

This blog post tries to summarize experience gained while designing a BPMS solution with ActiveVOS v. 6.1 and to highlight the key design decisions. It describes the consequences of these decisions in the context of the bigger picture (impact on production, maintainability, day-to-day routines, …). As the overall architecture is a set of design decisions which depend on the business context and the objectives you are trying to achieve, there is no simple copy-paste solution applicable in every situation. To make a long story short, let’s have a look at the key areas:
  • BPMS is not a web flow framework! It can be tempting to use a WS-HT interface for every interaction with a client/user. Besides the fact that you somehow have to initiate the interaction (create a human task) with a client you are not yet aware of, it has a negative impact on DB space consumption. Every human task has several underlying processes consuming space, depending on the log level and persistence settings, and moreover these don’t hold any business-relevant information. Newer versions handle this better.
  • Think through your error handling strategy. The implementation we used offers a “suspend on fail” feature: in case of an error, the process is put into the suspended state and waits for manual interaction, and the operator can then replay it. Be very careful with this feature. Overusing this pattern leads to high database space consumption, because a precondition of the feature is full process logging with persistence enabled. Moreover, what can the operator actually do with the failing process? In a “properly tested” system this can happen for two main reasons: either data is missing, or there is a technical problem such as an unavailable service. In the first case, do you really think the operator will find your client’s insurance number somewhere in the system? Certainly not. Auditing in such a system is another really important question. Failing over to a human task seems like a reasonable solution. In the second case, the implementation offers a feature called “retry policy”, which lets you make the system immune to short-term outages.
  • Structure your processes into smaller blocks. Although I understand why people tend to design long processes, experience proves the opposite approach better. With one long process realizing the whole business you gain readability, but you lose reusability, maintainability and, more importantly, scalability. Not all processes and sub-processes have the same level of importance, from the technical perspective as well as for the business they realize. You can therefore use different levels of logging and persistence, and different policies such as the retry policy, for each process or sub-process. All this helps lower the rate of database consumption and improves the stability and robustness of the system. One important thing we shouldn’t leave aside is spatial scalability. Every sub-process is just another web service, so if we need to improve the throughput of the system, we are free to set up a new instance and deploy those processes there, with complete freedom in terms of infrastructure such as load balancing, clustering, … The only thing to keep in mind is that created human tasks run within the “BPEL engine” (not in a separate component), and the implementation version we used wasn’t able to read human tasks from more than one “BPEL engine”.
  • Sort your data. In every business domain there are legal requirements such as auditing and archiving information for a number of years. A naive approach I’ve met is: “BPMS holds all the info.” It certainly does. But how is the data structured? Is all the data stored in the BPMS relevant to the business and hence worth archiving? The BPMS definitely keeps a lot of system information irrelevant to the business in its own internal structures, e.g. incoming messages as XML objects in variables. That means that to get a specific piece of information, you would need to find the right variable and then locate the information within its XML object. Moreover, this information doesn’t have a guaranteed time scope: if the process doesn’t have full logging enabled, only the latest state is kept, with no track of changes. The best approach is to address the auditing requirements in advance. Store all your audit information in a separate DB schema with a well-defined, readable structure. You save a significant amount of DB space and you don’t have to build a “data mining” solution on top of the BPMS schema, not to mention the expense of additional disk capacity.
  • Reason about every feature you use in human tasks. Human tasks offer many features supporting interaction with users, e.g. email, task escalation, task notification, etc. Pay special attention to the task notification feature. Internally, a notification is implemented as a human task which simply doesn’t show up in the inbox as a new task but only as a notification. We measured that one human task consumes roughly 1MB, so overusing this feature can have a big impact on the disk space consumed by the DB. The most dangerous thing about notifications is that the underlying system processes stay in the running state until the user confirms delivery, which makes them difficult to delete from the console. There is also no feature associating a notification with its human task, so the only way to cope with the problem is to cancel the internal system process manually. This can be really time-consuming when your system generates three thousand messages overnight.
  • Use the “proxy back-end call” pattern. Every back-end system has its own domain model and message structure. It is really tedious and time-consuming to build messages on the BPMS side, where the only tools available are XPath, XQuery, etc. Experience proved that a better and more efficient approach is to call a “proxy method” which is responsible for building the message, sending it out and processing the result passed back to the BPMS.
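The error-handling strategy from the design points (retry technical failures, fail over to a human task instead of suspending the process) can be sketched as follows. This is plain Python, not the ActiveVOS retry policy; the exception classes, the `create_human_task` callback and the back-off values are assumptions.

```python
import time

class TransientError(Exception):
    """Technical failure such as an unavailable service: worth retrying."""

class DataError(Exception):
    """Business data problem such as a missing value: retrying will not help."""

def call_with_retry(call, create_human_task, max_attempts=3, delay_s=2.0, sleep=time.sleep):
    """Retry transient failures a bounded number of times; route data errors
    and exhausted retries to a human task instead of suspending the process."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError as exc:
            if attempt == max_attempts:
                return create_human_task(f"service unavailable after {attempt} attempts: {exc}")
            sleep(delay_s * attempt)  # linear back-off between attempts
        except DataError as exc:
            return create_human_task(f"missing or invalid data: {exc}")
```

The distinction between the two exception types mirrors the two failure reasons in the text: only technical failures are retried, while missing data goes straight to someone who can actually supply it.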
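A separate audit schema can be as simple as one well-defined table keyed by the business case. The sketch uses Python’s built-in `sqlite3` purely for illustration; the table and column names and the sample events are hypothetical, and a production setup would use its own DB schema next to the BPMS one.

```python
import sqlite3
from datetime import datetime, timezone

def create_audit_schema(conn):
    # Hypothetical audit table holding only business-relevant facts.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS audit_event (
            id           INTEGER PRIMARY KEY,
            business_key TEXT NOT NULL,   -- e.g. contract or claim number
            event_type   TEXT NOT NULL,   -- e.g. 'APPLICATION_APPROVED'
            actor        TEXT NOT NULL,   -- user or system that caused the event
            occurred_at  TEXT NOT NULL,   -- ISO-8601 UTC timestamp
            details      TEXT
        )""")

def record_event(conn, business_key, event_type, actor, details=None, occurred_at=None):
    occurred_at = occurred_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO audit_event (business_key, event_type, actor, occurred_at, details) "
        "VALUES (?, ?, ?, ?, ?)",
        (business_key, event_type, actor, occurred_at, details))

conn = sqlite3.connect(":memory:")
create_audit_schema(conn)
record_event(conn, "C-100", "APPLICATION_RECEIVED", "web-portal", occurred_at="2011-03-01T09:00:00+00:00")
record_event(conn, "C-100", "APPLICATION_APPROVED", "jsmith", occurred_at="2011-03-02T14:30:00+00:00")
history = conn.execute(
    "SELECT event_type, actor FROM audit_event WHERE business_key = ? ORDER BY occurred_at",
    ("C-100",)).fetchall()
```

The full history of a case is then one simple query away, with no need to dig through process variables in the BPMS schema.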
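The “proxy back-end call” pattern can be sketched as a small class that takes simple values from the BPMS, builds the back end’s message, sends it and maps the reply to a compact result. The message names, the `PolicySystemProxy` class and the injected `transport` callable are hypothetical; in a real deployment the proxy would itself be exposed as a service that the process invokes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class PolicyResult:
    policy_id: str
    status: str

class PolicySystemProxy:
    """Hypothetical proxy: the BPMS passes a few simple values, the proxy
    builds the back end's message, sends it and maps the reply."""

    def __init__(self, transport):
        # transport: callable taking request XML and returning reply XML
        # (HTTP/SOAP in reality; injected here so it can be simulated).
        self._transport = transport

    def activate_policy(self, policy_id: str, effective_date: str) -> PolicyResult:
        request = ET.Element("ActivatePolicyRequest")
        ET.SubElement(request, "PolicyId").text = policy_id
        ET.SubElement(request, "EffectiveDate").text = effective_date
        reply = ET.fromstring(self._transport(ET.tostring(request, encoding="unicode")))
        return PolicyResult(policy_id=reply.findtext("PolicyId"),
                            status=reply.findtext("Status"))
```

The process itself then only deals with `policy_id`, `effective_date` and the small result, while the back end’s message structure stays in code written with ordinary tooling rather than XPath and XQuery.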