Re: memory leak?
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Tue, 31 Jul 2007 15:43:59 -0400
See below...
On Tue, 31 Jul 2007 12:38:55 -0500, "Doug Harrison [MVP]" <dsh@xxxxxxxx> wrote:
On Tue, 31 Jul 2007 12:36:46 -0400, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:
In the case of the x86 architecture, where sizeof(int) > sizeof(unsigned char), this is
indeed correct. The issue is that on the TMS30C90, with 80-bit character types, this
*would* work, but in fact it is not actually portable.
Though not relevant to what I've been discussing, you're talking about this
example Dan posted:
unsigned char tc= EOF;
unsigned int ti;
ti= tc;
This is absolutely well-defined for all implementations.
Yes, it is well-defined in terms of what happens, but if EOF is a constant that
is out-of-band for all char/unsigned char values (which is how it is done, and the basis
of your argument), then you could not write
if(ti == EOF)
because the value has already been truncated by the assignment tc=EOF. A good compiler (or
one with /W4 or /Wall enabled) would warn about the truncation. The code would nonetheless
work; it just wouldn't produce the apparently desired effect of being able to write the
if-statement above. His comment about "doesn't work" makes no sense, because he never
defines what he means by "works" or why this code fails to meet that
expectation. It may well be that the expectation itself is incorrect.
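To make the disagreement concrete, here is a minimal C sketch of Dan's three lines, wrapped in two hypothetical helper functions (widen_eof_through_uchar and widened_value_compares_equal_to_eof are illustrative names, not anything from the thread). It assumes an ordinary hosted implementation. With a typical x86 compiler, where EOF is -1 and UCHAR_MAX is 255, ti ends up as 255, and ti == EOF is false because EOF is converted to UINT_MAX before the comparison.

```c
#include <stdio.h>   /* EOF */
#include <limits.h>  /* UCHAR_MAX, UINT_MAX */

/* Hypothetical helper: Dan's example. EOF is stored in an unsigned char
   (converted modulo UCHAR_MAX + 1) and then widened to unsigned int. */
unsigned int widen_eof_through_uchar(void)
{
    unsigned char tc = EOF;  /* out-of-band value is lost here */
    unsigned int ti = tc;    /* every unsigned char value fits, so no change */
    return ti;
}

/* Hypothetical helper: can the widened value still be recognized as EOF?
   In the comparison, EOF (an int) is converted to unsigned int, so the
   result is true only if that conversion lands on the same value the
   unsigned char conversion produced, e.g. when UCHAR_MAX == UINT_MAX. */
int widened_value_compares_equal_to_eof(void)
{
    return widen_eof_through_uchar() == EOF;
}
```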
****
****
Define "doesn't work", by the way. What does it do? I would expect that ti gets the
low-order 8 bits of whatever value is used for EOF, which reflects the fact that
a value of int precision cannot be stored in something of char precision, which is no
surprise if sizeof(char) < sizeof(int).
Let me break it down. We start off with:
unsigned char tc= EOF;
The conversion of an integer type to unsigned char obeys the congruence
relation that governs conversion of integer types to unsigned integer
types. (The relation works very naturally for two's complement signed
integers, but that's just a happy coincidence. Note that "congruence
relation" implies it's defined over all such types.) If EOF is -1, this
results in the largest value of the unsigned type. Then we have:
unsigned int ti;
ti= tc;
Because unsigned int can represent every value of unsigned char (its range is
guaranteed to be at least as large), ti receives the value of tc unchanged.
So again, this is absolutely well-defined for all implementations, but it's
not relevant to what I've been talking about.
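The congruence relation Doug describes can be spelled out by hand. The sketch below (reduce_mod_uchar is a hypothetical name) reduces a value modulo UCHAR_MAX + 1 into the range [0, UCHAR_MAX], which is the result the standard requires when converting to unsigned char. It assumes UCHAR_MAX + 1 fits in long long, which holds on ordinary hosted implementations but would not on the hypothetical 80-bit machine.

```c
#include <limits.h>  /* UCHAR_MAX */

/* Hypothetical helper: the conversion rule for unsigned char written out
   explicitly. Assumes UCHAR_MAX + 1 is representable in long long. */
unsigned char reduce_mod_uchar(long long v)
{
    long long m = (long long)UCHAR_MAX + 1;  /* the modulus, 2**CHAR_BIT */
    long long r = v % m;                     /* C's % keeps the sign of v */
    if (r < 0)
        r += m;                              /* shift into [0, m) */
    return (unsigned char)r;                 /* r is in range: value-preserving */
}
```

With -1 (the usual value of EOF) this yields UCHAR_MAX, the "largest value of the unsigned type" mentioned above.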
I agree. I was pointing out it was a red herring.
****
*****
However, the question is whether or not something
for which sizeof(char) == sizeof(int) works, given that int is expected to be able
to represent a value that is out-of-band for any possible "character". Doug is taking a
very formal position: if you have 80 bits to represent char or unsigned
char, there is no way to represent EOF, because it could be any of the 2**80 possible
'char' values. The pragmatic question, then, is whether, in
a practical implementation, any legitimate value that could be returned as a character
could ever occupy all 80 bits.
We are not talking about "character" values. We are talking about unsigned
char, AKA "byte". Let me emphasize that, because it keeps getting lost:
////////// unsigned char //////////
You are unintentionally arguing that some possible byte values are not
legal, which is absurd.
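One defensive pattern that stays correct even if every unsigned char value is a legal byte is to keep getc's result in an int and, when it compares equal to EOF, confirm with feof or ferror before treating it as end-of-file. This is a minimal sketch of that approach (count_bytes is a hypothetical name). On ordinary implementations the extra check never changes the outcome; on one where sizeof(int) == 1 it is what distinguishes a genuine byte from end-of-file.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream. The value
   from getc is kept in an int and never narrowed before the EOF test.
   If a real byte happens to convert to the same int value as EOF (possible
   only when sizeof(int) == 1), feof/ferror are both false and the byte is
   counted instead of ending the loop. */
long count_bytes(FILE *fp)
{
    long n = 0;
    for (;;) {
        int c = getc(fp);
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;  /* genuine end-of-file or read error */
        ++n;        /* an ordinary byte */
    }
    return n;
}
```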
No, I understand where you're coming from. The question now is, is it *ever* possible
to create a conforming implementation on a word-addressed machine. Or, alternatively,
would it be possible to create a conforming implementation if the character code was
UTF-32, and sizeof(char) == sizeof(int).
Formally, I agree with you: since every possible value of the 2**80 bits is legal as
an unsigned char, there is no potential "out-of-band" value that could be used to
represent EOF, because every theoretically possible value is already taken. The
question is whether the pragmatics of a given implementation would ever generate a
negative 80-bit value as a character code. If the limitation is relaxed so that it is
defined not in terms of the raw bits, but in terms of the actual value domain of
characters, then UTF-32 cannot deliver a code point that is a negative 80-bit number, and
for that matter, even if you allowed surrogates, no surrogate would ever need all 80
bits. So the question is: are we arguing fine points of C semantics, or can we accept
the pragmatic position that no 'char' value can be negative?
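The pragmatic position can be illustrated with UTF-32: Unicode code points never exceed 0x10FFFF, so a reader that returns code points in a 32-bit unsigned type has plenty of room for an out-of-band end-of-input marker. The sketch below assumes C99 <stdint.h>; CP_EOF and is_code_point are hypothetical names, and the sentinel value is an arbitrary choice, not something any standard library defines.

```c
#include <stdint.h>

/* Hypothetical end-of-input sentinel: any value above 0x10FFFF would do. */
#define CP_EOF UINT32_C(0xFFFFFFFF)

/* Hypothetical helper: nonzero if v is a Unicode code point. When
   allow_surrogates is zero, the surrogate range U+D800..U+DFFF is
   rejected too. Either way, CP_EOF is never accepted. */
int is_code_point(uint32_t v, int allow_surrogates)
{
    if (v > 0x10FFFF)
        return 0;
    if (!allow_surrogates && v >= 0xD800 && v <= 0xDFFF)
        return 0;
    return 1;
}
```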
Is it possible to build a conforming version of C for wide characters where wide
characters might be 4 bytes and therefore sizeof(wchar_t) == sizeof(int)? Is "wide
character" specifically limited to 16 bits or can it be an "implementation-defined" width?
(For the same reason my Unicode book is not handy, my various C manuals aren't either;
I'm about 70 feet from the books and don't want to walk over to get them.) What
happens if you have a wide-character fgetc that hits EOF? What value can be used?
If these questions can't be answered, how is it possible to create a conforming
implementation of C that isn't tied to the PDP-11?
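For reference, Standard C handles the wide-character case without giving up any character values: since the 1995 Amendment 1, fgetwc returns a wint_t rather than a wchar_t, and signals end-of-file (or an error) with the macro WEOF. The width of wchar_t is implementation-defined, so 4-byte wide characters are permitted. A minimal loop, with a hypothetical function name:

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters remaining in a
   wide-oriented stream. The result of fgetwc is kept in a wint_t,
   which can hold WEOF in addition to the character values. */
long count_wide_chars(FILE *fp)
{
    long n = 0;
    wint_t wc;
    while ((wc = fgetwc(fp)) != WEOF)
        ++n;
    return n;  /* the loop also stops on a read or encoding error */
}
```

As with getc and EOF, the point is that the return type is wider than (or at least distinct from) the character type, so the end-of-file value has somewhere to live.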
By "word-addressed" I mean that all values are stored as n-bit words. Every addressing
operation always references a value n bits wide. n >= 8, but in practice widths like
n==32, n==36, n==64 or n==80 are more likely. Since the only thing an address can name is
a very wide value, the sizeof() of every numeric type is the same. In the case of the
TMS30C90, sizeof(char) == sizeof(int) == sizeof(double) == sizeof(float) == sizeof(void *)
== 1. There is one physical representation of a numeric value or a pointer: an 80-bit
representation. "Word-addressed" refers to the fact that a numeric value has no finer
structure than a single addressable unit. Thus an x86 is not "word-addressed",
because a type like int requires 4 addressable units to represent, and a double requires
8. Examples of word-addressed machines include the IBM 7090 and 1130, the Digital PDP-7,
PDP-10, and PDP-20, and the TI TMSxxCxx DSP chips. If I
went inside and upstairs and grabbed Bell & Newell on Computer Architectures I could
probably identify a dozen more; these are the ones that come to mind right now. (Note the
7090, PDP-10 and PDP-20 had 36-bit words, the 1130 had 16-bit words, and the PDP-7 had
18-bit words.) There are a number of military computers that had odd word sizes like 17,
19, and 23 bits, leading to the joke that if it is a military computer its word width is
a prime number. (Because of the high cost of memory, a computer was designed with no more
instruction and address bits than absolutely required. The only way you could sneak in
enough bits to handle index registers was to argue that the code was often something like
wired ROM and therefore could not be self-modifying. Seriously!)
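What sizeof(int) == 1 means for EOF can be checked from <limits.h>. getc returns an unsigned char converted to int, and EOF is a negative int, so EOF is guaranteed to be out-of-band only when every unsigned char value fits in a non-negative int. This is a sketch (eof_is_out_of_band is a hypothetical name); on an implementation where int occupies a single addressable unit, UCHAR_MAX exceeds INT_MAX and it returns 0.

```c
#include <limits.h>  /* UCHAR_MAX, INT_MAX */

/* Hypothetical helper: nonzero when every unsigned char value converts to
   a non-negative int, so a negative EOF cannot collide with any byte that
   getc returns. If sizeof(int) == 1, int spends one of its bits on the
   sign while unsigned char uses every bit for value, so UCHAR_MAX > INT_MAX
   and this yields 0. */
int eof_is_out_of_band(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```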
joe
Joseph M. Newcomer [MVP]
This is getting into really fine points of C/C++;
that is, is it ever possible to create a conforming implementation of C/C++ on a
word-addressed machine?
joe
Define "word-addressed". If it means what I think it does, the only
constraint is that the word size is at least 8 bits, which is the minimum
number of bits necessary to represent char.
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm