Re: Can anyone tell me of any optimations I could do to this to make it faster?
- From: dnncampbell1@xxxxxxxxxxx
- Date: 31 May 2006 09:12:48 -0700
The tables that it grabs the headers from are temporary. I don't have
the rest of the program written yet. It will remove the headers from
the db once they are complete for a single post. Also, I am only doing
specific groups, so the part about the periods is not an issue yet. I
will redo that later; mainly I just want to get this to work faster at
the moment. There are at least 1 million headers in each table right
now. If I pull from just one of them, it takes up around 500 MB of RAM
and about the same in virtual memory. As for the regex, I am not sure
what you mean. It finds a pattern in the subjects that is unique to each
post and varies in size. If there is a way to make that better, please
tell me.
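One possible way to make the matching step cheaper, sketched under stated assumptions rather than as the method any newsreader actually uses: instead of rescanning the whole list for every header, group the headers in a dictionary keyed by the subject with its "(index/total)" marker removed. The class and method names below are hypothetical, the key uses exact equality where the posted code uses a substring test, and the marker pattern requires at least one digit on each side, so results can differ from the posted code on unusual subjects.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PostGrouper
{
    // Matches a segment marker such as "(3/15)" and captures both numbers.
    private static readonly Regex Segment =
        new Regex(@"\(([0-9]+)/([0-9]+)\)", RegexOptions.Compiled);

    // Returns the keys (subject minus its first marker) of posts for which
    // every segment from 1 to the declared total has been seen.
    public static List<string> FindComplete(IEnumerable<string> subjects)
    {
        Dictionary<string, int> totals = new Dictionary<string, int>();
        Dictionary<string, HashSet<int>> seen =
            new Dictionary<string, HashSet<int>>();

        foreach (string subject in subjects)
        {
            Match m = Segment.Match(subject);
            if (!m.Success) continue;

            int index, total;
            if (!int.TryParse(m.Groups[1].Value, out index)) continue;
            if (!int.TryParse(m.Groups[2].Value, out total)) continue;

            string key = subject.Remove(m.Index, m.Length);
            HashSet<int> parts;
            if (!seen.TryGetValue(key, out parts))
            {
                parts = new HashSet<int>();
                seen[key] = parts;
                totals[key] = total; // the first header seen fixes the total
            }

            // Ignore segment 0 and anything beyond the declared total.
            if (index >= 1 && index <= totals[key]) parts.Add(index);
        }

        List<string> complete = new List<string>();
        foreach (KeyValuePair<string, HashSet<int>> entry in seen)
        {
            if (entry.Value.Count == totals[entry.Key]) complete.Add(entry.Key);
        }
        return complete;
    }
}
```

Each header is touched once, so the work grows roughly linearly with the number of headers instead of quadratically. Memory use still depends on how many distinct posts are in flight at once.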
Nicholas Paldino [.NET/C# MVP] wrote:
Extremest,
There are a few things I can see you doing here.
First though, I have to ask about your database structure. You are
storing the different headers in different tables with the name of the group
as the table. I don't know that this is necessarily a good idea. The
reason is that all of the tables share the same structure, and they are all
related, the only thing differentiating messages being the group that they
are in.
Because of that, I think that you should have one single table with
the messages in it, and add a column which has the name of the group that
the message is in. Of course, a message could be in multiple groups
(because of crossposting). In that case, you would have another table which
would have a group id in it, as well as the id of the message. Doing this,
you would have a record in the main table with the message details, and
records in the other table saying which groups the message is in.
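A rough sketch of what that layout might look like, held as SQL strings in C# so it could be sent through the existing command object. Every table name, column name, and column type here is an assumption made for illustration (the thread does not show the real schema), and the parameter prefix in the query may differ between connector versions.

```csharp
public static class SchemaSketch
{
    // One row per message, no matter how many groups carry it.
    public const string CreateMessages =
        "CREATE TABLE messages (" +
        " id INT NOT NULL AUTO_INCREMENT PRIMARY KEY," +
        " msg_id VARCHAR(255) NOT NULL," +
        " subject VARCHAR(255) NOT NULL," +
        " sender VARCHAR(255) NOT NULL," +
        " posted VARCHAR(64) NOT NULL," +
        " bytes INT NOT NULL)";

    // One row per newsgroup; the full dotted name is stored as data,
    // so nothing has to be stripped out of it.
    public const string CreateNewsgroups =
        "CREATE TABLE newsgroups (" +
        " id INT NOT NULL AUTO_INCREMENT PRIMARY KEY," +
        " name VARCHAR(255) NOT NULL)";

    // Link table: one row per (message, group) pair, which covers crossposts.
    public const string CreateMessageGroups =
        "CREATE TABLE message_groups (" +
        " message_id INT NOT NULL," +
        " group_id INT NOT NULL," +
        " PRIMARY KEY (message_id, group_id))";

    // Headers for one group, selected by name through the link table.
    public const string SelectByGroup =
        "SELECT m.id, m.subject FROM messages m" +
        " JOIN message_groups mg ON mg.message_id = m.id" +
        " JOIN newsgroups g ON g.id = mg.group_id" +
        " WHERE g.name = @groupName";
}
```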
Doing it like this also fixes an error in your code. You were removing
the periods from the group names in your tables. This brings up the
following situation. Hypothetically, you could have two groups:
alt.my.stuff
alt.mystuff
In your algorithm, they are treated the same way and end up in the same
table. In MySQL, you should be able to use some sort of escape mechanism to
allow periods in your table names (something like square brackets in SQL
Server).
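To make the collision concrete, here is a small sketch. The helper names are hypothetical. MySQL quotes identifiers with backticks, doubling any backtick inside the name; whether a given server version actually accepts a period inside a table name is something to verify before relying on it.

```csharp
using System;

public static class TableNames
{
    // The approach in the posted code: drop every period.
    public static string StripPeriods(string group)
    {
        return group.Replace(".", "");
    }

    // Quote a name as a MySQL identifier: wrap it in backticks and
    // double any backtick that appears inside it.
    public static string QuoteIdentifier(string name)
    {
        return "`" + name.Replace("`", "``") + "`";
    }
}
```

With these, `TableNames.StripPeriods("alt.my.stuff")` and `TableNames.StripPeriods("alt.mystuff")` both give `altmystuff`, which is the clash described above, while the quoted forms stay distinct.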
Moving on, I would not use regular expressions to perform basic
replacement functions as you are doing. I would use the Replace method on
the string class to do this. I think you will find this MUCH faster. The
same goes for the finding of a string (you match on the subject), as well as
the split functionality. All of this is offered on the string class, and
since you are not using wildcards or patterns, there is no reason to use the
regular expression classes.
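As an illustration of the string-method route, here is a sketch that locates an "(index/total)" marker with IndexOf, Substring, and Split only. The names are hypothetical, and unlike the posted pattern it requires at least one digit on each side of the slash. Whether it beats a compiled Regex on real data would need to be measured; the period removal, by contrast, is a plain `group.Replace(".", "")`.

```csharp
using System;

public static class SegmentParser
{
    // True when s is non-empty and consists only of ASCII digits.
    private static bool IsDigits(string s)
    {
        if (s.Length == 0) return false;
        foreach (char c in s)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // Scans left to right for the first "(digits/digits)" marker.
    public static bool TryFindSegment(string subject, out int index, out int total)
    {
        index = 0;
        total = 0;
        int open = subject.IndexOf('(');
        while (open >= 0)
        {
            int close = subject.IndexOf(')', open + 1);
            if (close < 0) return false; // no closing parenthesis left

            string inner = subject.Substring(open + 1, close - open - 1);
            string[] parts = inner.Split('/');
            int i, t;
            if (parts.Length == 2 && IsDigits(parts[0]) && IsDigits(parts[1])
                && int.TryParse(parts[0], out i) && int.TryParse(parts[1], out t))
            {
                index = i;
                total = t;
                return true;
            }

            // Not a marker; try the next opening parenthesis.
            open = subject.IndexOf('(', open + 1);
        }
        return false;
    }
}
```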
When reading from the data reader, you don't have to call ToString. You
can cast the results to string directly.
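A small sketch of both options, written against the general System.Data.IDataRecord interface rather than the MySQL reader so it stands alone. It assumes that columns 1 and 2 are text columns, which the thread does not confirm; a cast or GetString on a numeric or NULL column throws instead of converting.

```csharp
using System;
using System.Data;

public static class HeaderReader
{
    // Reads two text columns without going through ToString().
    // Column positions follow the posted code: 1 = subject, 2 = from.
    public static string[] ReadTextColumns(IDataRecord record)
    {
        // Direct cast: works only when the stored value really is a string.
        string subject = (string)record[1];

        // Typed accessor: same requirement, no boxing through object.
        string from = record.GetString(2);

        return new string[] { subject, from };
    }
}
```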
Finally, I would recommend selecting all of the messages from all of
the groups at once, then processing them in order. You can sort the
results by group name, and then process them. This will save you from
having to make repeat trips to the database.
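A sketch of the processing side of that idea, assuming the rows arrive already sorted by group name (for example from a single query with an ORDER BY on the group column). The names are hypothetical, and rows are modelled as (group, subject) pairs to keep the example free of any database dependency.

```csharp
using System;
using System.Collections.Generic;

public static class GroupedPass
{
    // Walks rows that are sorted by group and hands each group's subjects
    // to handleGroup as one batch, in a single pass over the input.
    public static void ProcessByGroup(
        IEnumerable<KeyValuePair<string, string>> rows,
        Action<string, List<string>> handleGroup)
    {
        string current = null;
        List<string> batch = new List<string>();

        foreach (KeyValuePair<string, string> row in rows)
        {
            // A change of group name closes the previous batch.
            if (current != null && row.Key != current)
            {
                handleGroup(current, batch);
                batch = new List<string>();
            }
            current = row.Key;
            batch.Add(row.Value);
        }

        // Flush the final group, if there was any input at all.
        if (current != null) handleGroup(current, batch);
    }
}
```

Only one group's subjects are held at a time, which may matter with around a million headers per group.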
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mvp@xxxxxxxxxxxxxxxxxxxxxxxxxxx
"Extremest" <Extremest@xxxxxxxxxxxxx> wrote in message
news:mchfg.128691$9d7.127189@xxxxxxxxxxxxxxxxxxxxxxxx
I know there are ways to make this a lot faster. Any
newsreader does this in seconds. I don't know how they do
it, and I am very new to C#. If anyone knows a faster way,
please let me know. All I am doing is querying the db for
all the headers for a certain group and then going through
them to find all the parts of each post. I only want the ones
that are complete, meaning all segments for that one posted
file are there.
using System;
using System.Collections;
using System.Text;
using MySql.Data;
using System.Text.RegularExpressions;

namespace createfiles
{
    class Program
    {
        static MySql.Data.MySqlClient.MySqlConnection conn =
            new MySql.Data.MySqlClient.MySqlConnection();
        static MySql.Data.MySqlClient.MySqlCommand cmd =
            new MySql.Data.MySqlClient.MySqlCommand();
        static string myConnectionString =
            "server=127.0.0.1;uid=root;pwd=password;database=test;";
        static ArrayList master;
        static string group;
        static string table;
        static string[] groups = {
            "alt.binaries.games.xbox", "alt.binaries.games.xbox360",
            "alt.binaries.vcd" };
        static Regex reg = new Regex("\\.");
        static Regex seg = new Regex("\\([0-9]*/[0-9]*\\)",
            RegexOptions.IgnoreCase);

        struct Header
        {
            public string numb;
            public string subject;
            public string date;
            public string from;
            public string msg_id;
            public string bytes;
        }

        static void Main(string[] args)
        {
            for (int x = 1; x < 2; x++)
            {
                table = reg.Replace(groups[x], "");
                group = groups[x];
                getheaders();
                Console.WriteLine("Have this many headers {0}", master.Count);
                Header one = (Header)master[0];
                Console.WriteLine("first one {0} {1}", one.numb, one.subject);
                find();
                master.Clear();
            }
        }

        static void getheaders()
        {
            conn.ConnectionString = myConnectionString;
            conn.Open();
            cmd.Connection = conn;
            cmd.CommandText = "select * from " + table +
                " where subject like '%(%/%)%'";
            MySql.Data.MySqlClient.MySqlDataReader reader;
            reader = cmd.ExecuteReader();
            Header h = new Header();
            master = new ArrayList();
            while (reader.Read())
            {
                h.numb = reader.GetValue(0).ToString();
                h.subject = reader.GetValue(1).ToString();
                h.from = reader.GetValue(2).ToString();
                h.date = reader.GetValue(3).ToString();
                h.msg_id = reader.GetValue(4).ToString();
                h.bytes = reader.GetValue(5).ToString();
                master.Add(h);
            }
            reader.Close();
            conn.Close();
        }

        static void find()
        {
            while (master.Count > 0)
            {
                Header start = (Header)master[0];
                master.RemoveAt(0);
                Match m = seg.Match(start.subject);
                string segsplit = m.ToString();
                segsplit = segsplit.Replace("(", "");
                segsplit = segsplit.Replace(")", "");
                string[] segments = segsplit.Split('/');
                int max = int.Parse(segments[1]);
                max += 1;
                int counter = 1;
                Header[] found = new Header[max];
                string testsubject = seg.Replace(start.subject, "");
                int index = int.Parse(segments[0]);
                //int temp = master.Count;
                if (index < max)
                {
                    found[index] = start;
                    for (int x = 0; x < master.Count; x++)
                    {
                        Header test = (Header)master[x];
                        if (test.subject.Contains(testsubject))
                        {
                            //master.Remove(test);
                            master.RemoveAt(x);
                            x = x - 1;
                            Match t = seg.Match(test.subject);
                            string tsplit = t.ToString();
                            string tsegsplit = tsplit.Replace("(", "");
                            tsegsplit = tsegsplit.Replace(")", "");
                            string[] tsegments = tsegsplit.Split('/');
                            index = int.Parse(tsegments[0]);
                            //Console.WriteLine(counter);
                            if (index < max)
                            {
                                found[index] = test;
                                counter++;
                            }
                        }
                    }
                    //Console.WriteLine("counter = {0}", counter);
                    int testmax = max - 1;
                    if (counter == testmax)
                    {
                        master.TrimToSize();
                        Console.WriteLine("We Have a Match {0}",
                            found[1].subject);
                    }
                }
            }
        }
    }
}