Re: WaitForSingleObject() will not deadlock



So EDEADLCK is a condition that can be detected in no more that 2 CPU clock cycles, right?
I'd like to see the EXACT SEQUENCE OF INSTRUCTIONS issued in the locking sequence, and I
would have to compare that with the EXACT SEQUENCE OF INSTRUCTIONS issued by, say, a
CRITICAL_SECTION, before I'd be willing to opine that one is faster than the other.

My issue about the 2 CPU clock cycles is that once the lock is set, you can test the
is-visited flag in two instructions. So the cost of a successful lock must be identical
to the cost of a failed lock, and this has to be compared to the cost of a successful
recursive lock.

One thing I really learned during my 15 years of doing performance measurement is that
without actual live data to back an opinion about performance, any opinion about
performance is completely useless. Only measurements matter. I'm willing to grant that
the non-recursive mutex MIGHT be faster, but without substantiating data and controlled
experiments to validate that data, I have no reason to believe that this is faster than
the use of a recursive mutex and an is-visible flag. And I have no reason to believe that
the recursive mutex is faster, either.

Note the exact sequence of instructions can still be misleading because of issues of cache
writeback required, so a single instruction with a LOCK prefix can be 20-100 times slower
than the same instruction without the LOCK prefix. Maybe. But instruction counts are not
a bad first approximation.
joe
On Thu, 05 Jul 2007 22:59:25 -0700, Frank Cusack <fcusack@xxxxxxxxxxx> wrote:

On Thu, 05 Jul 2007 16:50:31 -0700 Frank Cusack <fcusack@xxxxxxxxxxx> wrote:
I will post code.

cycle detection using non-recursive mutex:

[frank@shell:~]$ cat cycle.c
/*
* cycle.c
*/

#ifndef _REENTRANT
#define _REENTRANT /* Solaris wants this */
#endif

#include <errno.h> /* annoyingly required for Solaris */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node_t {
pthread_mutex_t mutex;
int val;
struct node_t *next;
};

static void
detect_cycle(struct node_t *node)
{
struct node_t *head = node;

while (node) {
int rc = pthread_mutex_lock(&node->mutex);
if (rc == EDEADLK) {
/* we've already visited this node */
(void) printf("cycle detected\n");
*****
But how do you know the deadlock is your thread deadlocking ITSELF vs. a bad locking
sequence between two different threads creating a deadlock situation?
*****
exit(1);
} else if (rc != 0) {
(void) printf("pthread_mutex_lock: %s\n", strerror(rc));
exit(1);
}
node = node->next;
}

/* since we traversed the entire list, there is no cycle */
(void) printf("no cycle detected\n");
/* now we have to traverse it again to release all mutexes */
node = head;
while (node) {
(void) pthread_mutex_unlock(&node->mutex);
node = node->next;
}
****
How is a double-traversal "faster" than a single traversal with a recursive lock? Perhaps
I'm missing something in the definition of the word "faster"? And how is this algorithm
more robust than one that can do it with just one pass?
****
}

int
main(int argc, char *argv[])
{
int i;
struct node_t *head, *node;
/* we will create a cycle 9->4 */
struct node_t *node4, *node9;

pthread_mutexattr_t attr;
if (pthread_mutexattr_init(&attr))
exit(1);
/* make sure we create mutexes that detect deadlock */
if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK))
exit(1);

head = malloc(sizeof(struct node_t));
(void) pthread_mutex_init(&head->mutex, &attr); /* assume success */
head->val = 0;
head->next = NULL;

node = head;
for (i = 1; i < 10; ++i) {
node->next = malloc(sizeof(struct node_t));
if (!node->next)
exit(1);
node = node->next;
(void) pthread_mutex_init(&node->mutex, &attr); /* assume success */
node->val = i;
if (i == 4)
node4 = node;
else if (i == 9)
node9 = node;
}

detect_cycle(head);
/* create a cycle */
node9->next = node4;
detect_cycle(head);

return 0;
}
[frank@shell:~]$ gcc -o cycle cycle.c -lpthread
[frank@shell:~]$ ./cycle
no cycle detected
cycle detected
[frank@shell:~]$

-frank
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.



Relevant Pages

  • Re: Semi-multithreaded application in VB6
    ... > Those libraries control a motion system and wait most of their time ... The cycle takes about 30 s to ... like your code is generating a series of instructions to the external machine. ... then go back to the send-each-instruction loop. ...
    (comp.lang.basic.visual.misc)
  • Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
    ... cycles to the bus. ... LOCK slowness is not because of the bus. ... maybe 150-200 regular pipelined, superscalar instructions. ...
    (Linux-Kernel)
  • Re: AMD CodeAnalyst MASM only?
    ... that limited when instructions could dispatch together. ... can execute instructions out of order, so it is a little more difficult to ... unitused, decode cycle, execute cycle, and retire/writeback cycle. ... Next I have the decode field. ...
    (comp.lang.asm.x86)
  • Re: Teaching Assembly Language Programming
    ... at the description of the "Test and Set"-ish instructions in the Intel ... With only a mutex or a spin lock, ... is acquire lock, increment counter, release lock. ... lets say you have a pointer to the first element ...
    (alt.lang.asm)
  • Re: Compile Barrier
    ... > Barrier addressing issue with instructions reordering (pipelining feature on ... So a barrier is a mechanism for a programmer ... To address this issue compiler writes recognize some functions ... > - a will be written to the main memory before the lock is released ...
    (microsoft.public.win32.programmer.kernel)