Search fast in million rows

I have a table .This table has many columns but search performed based on 1 columns ,this table can have more than million rows.
The data in these columns is something like funny,new york,hollywood
User can search with parameters as funny hollywood .I need to take this 2 words and then search on column whether that column contain this words and how many times .It is not possible to index here .If the results return say 1200 results then without comparing each and every column i can't determine no of results.I need to compare for each and every column.This query is very frequent .How can i approach for this problem.What type of architecture,tools is helpful.
I just know that this can be accomplished with distributed system but how can i make this system. I also see in this website that LinkedIn uses Lucene for search .Is Lucene is helpful in my case.My table has also lots of insertion ,however updation in not very frequent.

Re: Search fast in million rows

I don't understand why you cannot use an index. To be blunt, without the use of indexes you cannot search through millions of rows quickly. How do you expect a database engine to do that unless everything is laid out in a structured logical order i.e. accessed using indexes?
If funny or hollywood is a category, then you would normally represent these as categoryid's from a related lookup table. Then you would search the database with the where predicate based on a selected categoryid. With a clustered index on this column this would normally be very fast. In Sql Server 2005, I have a huge indexed view with a similar amount of columns and perform such queries in under a second. Without the use of indexes on the source tables and the view however, the query time can go up to 10 seconds! You cannot query millions of rows without indexes and expect high performance, especially if you have anything other than narrow tables containing only ints.

Re: Search fast in million rows

Yes, this would be a good application of an inverted index like Lucene.

Re: Search fast in million rows

Thanks for your reply
In column ,data is like "funny ;hollywood ;fast ;youtube;".
For every column i need to compare each and every word contained in that column .If i use substring kind of function then in that case database don't use index.I want to store tags in column and use it for searching.

Re: Search fast in million rows

you can use "innodb" for mysql or oracle to billion rows but you need some information to use or download from web site innodb

Re: Search fast in million rows

Is there any other solution like Lucene or Hadoop.I heard Lucene is very good in searching but not sure about its scalability.Is we can use Hadoop for online search or it is only suitable for indexing our data offline.

Re: Search fast in million rows

Check out Sphinx search. I think if you can find a way of pushing your keyword values from that column into attribute fields you'll be able to get it to do what you want.

http://www.sphinxsearch.com/

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd><div ?=?><p ?=?> <img ?=?><h1 ?=?><h2 ?=?><h3 ?=?>
  • Lines and paragraphs break automatically.
  • Glossary terms will be automatically marked with links to their descriptions
  • You may link to webpages through the weblinks registry

More information about formatting options

To combat spam, please enter the code in the image.