RE: Scalable Text Mining with SQL Server 2005
- From: "Bogdan Crivat" <bogdanc@xxxxxxxxxxxxx>
- Date: Tue, 28 Nov 2006 18:40:07 GMT
One thing you could try is to sort the initial query (the one that returns the documents) and trust the Term Lookup to preserve that order. The number of rows to sort will then be much smaller: one per document rather than one per keyword.
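A minimal Python sketch of why this helps. It assumes the lookup emits its output rows in the same order as its input rows. `term_lookup` is a hypothetical stand-in that is only loosely modeled on the SSIS Term Lookup and is not its actual implementation. The sample documents and vocabulary are made up:

```python
from itertools import groupby
from typing import Iterable, Iterator

# Hypothetical stand-in for the Term Lookup: for each input document it emits
# one (doc_id, term, frequency) row per matched term, in input order.
# Order preservation is the assumption this approach depends on.
def term_lookup(docs: Iterable[tuple[int, str]], vocabulary: set[str]) -> Iterator[tuple[int, str, int]]:
    for doc_id, text in docs:
        words = text.lower().split()
        for term in sorted(set(words) & vocabulary):
            yield doc_id, term, words.count(term)

def is_sorted_by_doc(rows: list[tuple[int, str, int]]) -> bool:
    ids = [r[0] for r in rows]
    return all(a <= b for a, b in zip(ids, ids[1:]))

docs = [(3, "cheap flights hotel deals"), (1, "hotel booking hotel reviews"), (2, "flights and hotel")]
vocabulary = {"hotel", "flights", "deals", "booking"}

# Sort only the document rows (one per site) before the lookup...
sorted_docs = sorted(docs, key=lambda d: d[0])
rows = list(term_lookup(sorted_docs, vocabulary))

# ...and the expanded term rows come out already ordered by doc_id,
# so the much larger keyword stream never has to be sorted.
assert is_sorted_by_doc(rows)

# groupby only needs equal keys to be adjacent, which the ordering guarantees,
# so one case per document can be assembled in a single pass.
cases = {doc_id: [term for _, term, _ in grp] for doc_id, grp in groupby(rows, key=lambda r: r[0])}
```

Only the small, one-row-per-document stream is sorted here. If the real component does not preserve order, this approach fails, so it is worth checking on a small sample first.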
Alternatively, you could stage the Term Lookup results in a SQL Server table and then build the model on top of that table.
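A rough sketch of the staging approach. It uses Python's built-in sqlite3 purely as a stand-in for SQL Server. The table name `staged_terms`, its columns, and the sample rows are hypothetical:

```python
import sqlite3

# sqlite3 stands in for SQL Server; "staged_terms" is a hypothetical staging
# table holding the Term Lookup output (one row per document/term pair).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged_terms (doc_id INTEGER, term TEXT, frequency INTEGER)")

# Stage the lookup output in whatever order it arrives, with no Sort in the pipeline.
term_rows = [(3, "hotel", 1), (1, "hotel", 2), (2, "flights", 1), (1, "booking", 1), (3, "deals", 1)]
conn.executemany("INSERT INTO staged_terms VALUES (?, ?, ?)", term_rows)

# An index on the case key gives the engine the option of returning rows in
# doc_id order without a separate in-memory sort.
conn.execute("CREATE INDEX ix_staged_terms_doc ON staged_terms (doc_id, term)")

# The model then reads from an ordinary ORDER BY query, so the ordering is
# done by the database engine rather than by the data flow.
ordered = conn.execute(
    "SELECT doc_id, term, frequency FROM staged_terms ORDER BY doc_id, term"
).fetchall()
print(ordered)
conn.close()
```

The trade-off is an extra write and read of the term rows, in exchange for moving the ordering out of the memory-bound pipeline.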
thanks
bogdan
Hi,
I have a large dataset of website content that I am trying to classify using the Logistic Regression Algorithm. I am running into a memory issue with the Sort transform after the Term Lookup. With about 45,000 sites in the source database, the Term Lookup produces 6,790,311 rows, and the Sort transform cannot handle an input this large. If I do not use the Sort transform, the prediction query returns an error.
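For scale, these figures work out to roughly 151 term rows per site. Sorting the document rows instead of the Term Lookup output would therefore shrink the Sort input by about that factor. A quick check (the variable names are only illustrative):

```python
# Figures from the question above.
sites = 45_000
term_rows = 6_790_311

# Average number of Term Lookup output rows generated per site.
rows_per_site = term_rows / sites
print(f"{rows_per_site:.1f}")
```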
Can anyone suggest how to make the text mining example provided at http://www.sqlserverdatamining.com/dmcommunity/_tutorials/688.aspx more scalable?
Thanks,
Mohit