Fast string operations



I've been perf testing an application of mine and I've noticed that there
are a lot (and I mean A LOT -- megabytes and megabytes of 'em) of System.String
instances being created.

I've done some analysis and I'm led to believe (but can't yet quantitatively
establish as fact) that the two basic culprits are a lot of calls to:

1.) if( someString.ToLower() == "somestring" )

and

2.) if( someString != null && someString.Trim().Length > 0 )


ToLower() generates a new string instance as does Trim().

I believe that these are getting called many times and churning up a bunch
of strings faster than the GC can collect them, or perhaps there's some
weird interning/caching thing going on. Regardless, the number of string
instances grows and grows. It gets bumped down occasionally, but it's
basically 5 steps forward, 1 back.

For reference, this is an ASP application calling into .NET ComVisible
objects. So I assume this uses the workstation GC, right?


Anyhow, so I think that I can solve problem (1) with String.Compare() which
can perform in-place case-insensitive comparisons without generating new
string instances.
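
For what it's worth, here's roughly what I have in mind for (1). This is only
a sketch: the StringChecks class and IsSomeString name are made up for
illustration, and it uses the String.Compare(string, string, bool) overload
with ignoreCase set to true. One caveat I'm aware of: that overload compares
using the current culture, so it may not behave identically to the
ToLower() == "..." check for every input.

```csharp
using System;

public static class StringChecks
{
    // Hypothetical helper, named for illustration only.
    public static bool IsSomeString(string someString)
    {
        // ignoreCase = true: compares the two strings directly instead of
        // building a lower-cased copy first. Culture-sensitive comparison.
        // Unlike someString.ToLower(), this does not throw on null.
        return String.Compare(someString, "somestring", true) == 0;
    }
}
```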

Problem (2), however, is more complicated. There doesn't appear to be a
TrimmedLength or any type of method or property that can give me the length
of a string, minus whitespace and without generating a new string instance,
in the BCL.
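
The obvious safe-code fallback would be to walk the characters myself. A
rough sketch of what that might look like is below; StringScan,
HasNonWhitespace and TrimmedLength are names I made up, not anything in the
BCL. It assumes Char.IsWhiteSpace is a good enough definition of whitespace,
which may not match what Trim() strips exactly on every framework version.

```csharp
using System;

public static class StringScan
{
    // Hypothetical helper: true if s has at least one non-whitespace char.
    // Stands in for: s != null && s.Trim().Length > 0
    public static bool HasNonWhitespace(string s)
    {
        if (s == null) return false;
        for (int i = 0; i < s.Length; i++)
        {
            if (!Char.IsWhiteSpace(s[i])) return true;
        }
        return false;
    }

    // Hypothetical helper: length of s with leading and trailing whitespace
    // ignored, computed by index arithmetic with no new string created.
    public static int TrimmedLength(string s)
    {
        if (s == null) return 0;
        int start = 0;
        int end = s.Length - 1;
        while (start <= end && Char.IsWhiteSpace(s[start])) start++;
        while (end >= start && Char.IsWhiteSpace(s[end])) end--;
        return end - start + 1;
    }
}
```

For check (2) only HasNonWhitespace is really needed, since it can bail out
at the first non-whitespace character instead of scanning both ends.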

I suppose I could write some unsafe, or even unmanaged, code (which is what
MSFT did for all their string handling stuff inside System.String, using the
COMString stuff), but I'd like to try to avoid that, or at least use a
library that's already written and well tested.

Any thoughts?

Thanks in advance,
Chad Myers

