RE: Scalable Text Mining with SQL Server 2005



One thing you could try is to sort the initial query (the one that returns the documents) and trust the Term Lookup to preserve the order. The number of rows to sort will be much smaller: the number of documents, not the number of keywords.
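
To make the idea concrete, here is a rough Python sketch (not SSIS code). It assumes the lookup emits its output rows in the same order the input rows arrive, which is exactly the part you would have to trust or verify in your package. The `documents`, `terms`, and `term_lookup` names are made up for illustration.

```python
# Hypothetical documents, as (doc_id, text) pairs returned by the source query.
documents = [(3, "sql server mining"), (1, "text mining"), (2, "server")]

# Hypothetical dictionary of terms to look up.
terms = {"sql", "server", "mining", "text"}


def term_lookup(rows):
    """Stand-in for Term Lookup (assumption: it preserves input order).

    Emits one (doc_id, term) row per matched term, in arrival order.
    """
    for doc_id, text in rows:
        for word in text.split():
            if word in terms:
                yield (doc_id, word)


# Sort the small document set, not the much larger set of term rows.
sorted_docs = sorted(documents, key=lambda row: row[0])
term_rows = list(term_lookup(sorted_docs))

# The expanded rows come out already ordered by document ID.
doc_ids = [doc_id for doc_id, _ in term_rows]
```

If the order really is preserved, the expanded rows are already grouped by document and no second sort is needed after the lookup.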

Or you could stage the Term Lookup results in a SQL Server table, then build the model on top of that table.
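
A small sketch of the staging idea, using Python's built-in `sqlite3` as a stand-in for SQL Server so it can run anywhere. The `StagedTerms` table, its columns, and the sample rows are hypothetical; in the real package the rows would come from the Term Lookup output.

```python
import sqlite3

# In-memory database standing in for the SQL Server staging database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StagedTerms (DocID INTEGER, Term TEXT, Frequency INTEGER)"
)

# Hypothetical Term Lookup output, arriving in no particular order.
rows = [(3, "sql", 1), (1, "mining", 2), (2, "server", 1), (1, "text", 1)]
conn.executemany("INSERT INTO StagedTerms VALUES (?, ?, ?)", rows)

# An index on the key lets the database engine do the ordering work.
conn.execute("CREATE INDEX IX_StagedTerms_DocID ON StagedTerms (DocID)")

# Read the staged rows back in key order for the model to consume.
ordered = conn.execute(
    "SELECT DocID, Term, Frequency FROM StagedTerms ORDER BY DocID, Term"
).fetchall()
conn.close()
```

The point is that the ordering is done by the database engine when the table is read, rather than by a Sort transform in the data flow.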

thanks
bogdan

Hi,

I have a large dataset of website content that I am trying to classify using the Logistic Regression algorithm. I am running into a memory issue with the Sort transform after the Term Lookup. With about 45,000 sites in the source DB, the number of rows after the Term Lookup is 6,790,311. The Sort transform cannot handle an input this large. If I do not use the Sort transform, the prediction query returns an error.

Can anyone help me out on how to make the text mining example provided at http://www.sqlserverdatamining.com/dmcommunity/_tutorials/688.aspx more scalable?

Thanks,

Mohit
