Friday, June 23, 2006

JLAK’s Law of Troubleshooting

Simply stated, “When a complex system is being troubleshot, an expert in a subsystem always suspects the subsystem he knows least about”.

I arrived at this ‘law’ after repeatedly seeing it at play while CNC machine tools were being tested, tuned and qualified. Since a CNC machine is a control system with multiple control loops, the effect of one fault would manifest itself in all the variables – jerky motion of the axes, unsteady speed reference to the drive controller, unsteady tacho voltage, unsteady encoder feedback, etc.

This is how the scenario would look:

All the experts have been called because, during production testing, the movement of the axis was found to be uneven, and the normal fault tree and troubleshooting chart have not helped.

CNC expert: Have you checked if there is stick-slip? (Being a digital electronics guy he knows a little about the drive electrics. So, he does not question that.)

Drives expert: Are you sure there is no backlash? (Notice that the two suspected causes sound pretty impressive and have a ‘certain something’, as Asterix would say, about their sound.)

The Mechanical expert: Have you tuned the drive?

It would go on and on. Meanwhile some ‘poor generalist’, a down-to-earth guy, would be pottering away at finding the root cause of the problem. Shielding that had come off, a dry solder joint or whatever would be found and corrected, and the system would start behaving itself.

Then all the experts would leave shaking their heads and clucking their tongues muttering “these production guys never do anything right”!

Even in the trivial case of troubleshooting a simple motor controller, the test engineer soldiers on, trying to fix the problem all on his own, until he sees a specialist in motors. He then immediately starts suspecting all kinds of esoteric problems with the motor. One of the first things the motor man himself would ask is perhaps “Is the incoming phase sequence OK?” or “Have you checked the tacho coupling?” – even if the drive is a single-phase one or the motor has an integral shaft-mounted tacho.

--Contributed by JLAK

Monday, June 05, 2006

A truly "dynamic" issue

I had been on a servicing trip to a remote part of India. There was this aluminium rod mill, which had one of our thyristor DC drives. The motor operated at very low speeds most of the time and its speed had started drifting. After some observation, all I had to do was change the operational amplifier on the drive control card and things were back to normal. The customer was very happy. With the traditional Indian hospitality made keener by the remoteness of the place, the customer’s maintenance engineer took me to a nearby dam, one of the largest in the world.

By the time we returned from the trip, disaster awaited me. (Incidentally, when we were returning, as we crested a small hillock, the engineer said that there must be a problem at the plant as the plant had stopped. I asked how he knew. He said he could not hear the sound of the plant. The sound of the plant? Over the roar of a diesel-engine jeep? A kilometre away from the factory? I have never been able to fathom this, and the subsequent events did not permit me to investigate the issue.)

Three semiconductor fuses protecting the thyristors had failed and the mill was at a standstill. The customer was very unhappy and concluded that whatever I had done had caused this disaster. In spite of being tired after the trip to the dam site, I started analysing, testing and troubleshooting late into the evening. After much poring over the drawings, I found that there was a serious flaw in the control logic design of the drive.

The drive had a dynamic braking facility, and one of its basic requirements had been overlooked. When a motor is running and it has to be stopped quickly, dynamic braking is used. The sequence of events is: open the power contactor connecting the drive to the motor, then and only then close the contactor connecting the motor armature to a bank of high-power, low-ohmic-value resistances. However, the designed logic sequence had a flaw and, theoretically, it was possible that once in a while the dynamic braking contactor could come on before the main contactor had opened.

When I discovered this and told the maintenance engineer about it, he would not believe me. His argument was simple. This drive has been working without a problem for the last three years or more. Every shift, we turn this drive off at least twice – once at “lunch time” and once at the end of the shift. The drive works three shifts a day. This means that it is turned off 6 times a day. That means it has been turned off at least two thousand times, and this has never happened before.
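The sequencing requirement can be put into a small Python sketch. This is only an illustration and not the logic of the actual drive: the function names and the millisecond delays are assumptions made up for the example. It compares commanding both contactors at the same instant with energising the braking contactor only after the main contactor is confirmed open.

```python
def overlap_without_interlock(main_open_ms, brake_close_ms):
    """Both contactors are commanded at t = 0.

    main_open_ms   -- assumed time for the main contactor to open
    brake_close_ms -- assumed time for the braking contactor to close
    Returns how long (ms) both are closed together; 0 means a clean
    break-before-make.
    """
    return max(0.0, main_open_ms - brake_close_ms)


def overlap_with_interlock(main_open_ms, brake_close_ms):
    """The braking coil is energised only once the main contactor has
    been confirmed open (for instance by an auxiliary contact), so the
    braking contactor closes at main_open_ms + brake_close_ms.
    """
    brake_closed_at = main_open_ms + brake_close_ms
    return max(0.0, main_open_ms - brake_closed_at)


# Hypothetical figures: a healthy main contactor opening in 25 ms and an
# aged, sluggish one opening in 40 ms, with braking closing in 30 ms.
healthy = overlap_without_interlock(25, 30)
aged = overlap_without_interlock(40, 30)
fixed = overlap_with_interlock(40, 30)
```

With these assumed figures the uninterlocked sequence is clean while the main contactor is quick, shows an overlap once it slows down, and the interlocked sequence shows none in either case. That is the sense in which a logic flaw can stay hidden until the timing of the components drifts.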

My arguments about probabilities and the change in the time characteristics of the relays and contacts would not move this man. He did not allow me to change the logic sequence until he was satisfied. I had to set up an experiment to prove that this was happening. I was really worried. What if the rare event does not occur for the next two thousand times?
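The probability argument can be made concrete with a short Python sketch. The per-stop fault probability used here is purely an assumption for illustration, since the real figure was never measured, and the function names are made up for the example. It also assumes every stop is independent with the same probability, which ageing relays would not strictly obey.

```python
def stops_in_service(stops_per_day, years, days_per_year=365):
    """Rough count of stops, using the engineer's own figures."""
    return stops_per_day * days_per_year * years


def prob_no_fault(p_per_stop, n_stops):
    """Chance of seeing no overlap at all in n_stops independent stops,
    each with an assumed overlap probability p_per_stop."""
    return (1.0 - p_per_stop) ** n_stops


total_stops = stops_in_service(6, 3)  # 6 stops a day for three years

# Assumed: one overlap per 2000 stops on average.
p_clean_run = prob_no_fault(1 / 2000, 2000)
```

Under that assumed rate, the chance of two thousand stops passing without a single overlap is roughly 0.37, so a long clean record does not prove the sequence is safe. The same figure explains my worry: an experiment of two thousand trials could just as easily have shown nothing.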

With no other option, I did set up the experiment and, as luck would have it, the fault occurred within a short time and a small number of trials. But you had to carefully observe two sets of lamps. A had to go off before B came on, and the time difference was just a fraction of a second. I saw it and no one else did. After another few trials the maintenance engineer’s assistant saw it, but not the big man himself. So it went on and on. And finally, it occurred with such a huge overlap (A and B both on) that the big man himself saw it. And then he granted me permission to change the logic design. I had to be very careful; I first made the change on paper and then executed it in the panel too.

The experiments were restarted to make sure that there was no repeat. Thus was solved a tricky problem. It is another matter that I had to wait for days to get the replacement parts, as Christmas and New Year’s Day intervened. Finally I got the parts, restored the drive and was allowed to get back home.

--JL Anilkumar