I have a table. It has many columns, but searches are performed on only one of them, and the table can have more than a million rows.
The data in this column looks something like "funny,new york,hollywood".
Users can search with parameters such as "funny hollywood". I need to take these two words and check whether the column contains them, and how many times. It is not possible to use an index here. If a search returns, say, 1,200 results, I can't determine the number of results without comparing each and every row's value; I have to check every one. This query is very frequent. How should I approach this problem? What kind of architecture and tools would help?
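For reference, the brute-force approach described above is a full scan: tokenize every row's value and count how many tokens match the query words, so every query costs time proportional to the size of the table. A minimal Python sketch, assuming the column holds comma-separated tags and that matching is case-insensitive on whole words; `scan_count`, `tokenize`, and the sample rows are hypothetical:

```python
import re
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "funny,new york" -> ["funny", "new", "york"].
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_count(rows: Iterable[tuple[int, str]], query: str) -> dict[int, int]:
    """Full scan: for every row, count how many times the query words occur."""
    terms = set(tokenize(query))
    hits: dict[int, int] = {}
    for row_id, value in rows:  # touches every row, matching or not
        count = sum(1 for tok in tokenize(value) if tok in terms)
        if count:
            hits[row_id] = count
    return hits


rows = [
    (1, "funny,new york,hollywood"),
    (2, "drama,hollywood,hollywood"),
    (3, "funny,paris"),
    (4, "news,london"),
]
print(scan_count(rows, "funny hollywood"))  # {1: 2, 2: 2, 3: 1}
```

The number of results is simply `len(...)` of the returned dict, but it is only known after the whole table has been scanned.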
I know this can be done with a distributed system, but I don't know how to build one. I also saw on this website that LinkedIn uses Lucene for search. Would Lucene help in my case? My table also gets lots of inserts; updates, however, are not very frequent.
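Lucene is built around an inverted index, which is a different kind of index from a database column index: instead of scanning every row, it maps each term to the rows that contain it, together with how often the term occurs there. The toy Python sketch below illustrates only that idea under the same tokenization assumptions; `InvertedIndex`, `add`, and `search` are hypothetical names and are not Lucene's API.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Same assumption as a simple word search: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory inverted index: term -> {row_id: occurrences}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, row_id: int, value: str) -> None:
        # An insert only updates the postings for this row's own terms.
        for tok in tokenize(value):
            row_counts = self.postings[tok]
            row_counts[row_id] = row_counts.get(row_id, 0) + 1

    def search(self, query: str) -> dict[int, int]:
        # Only rows containing at least one query term are visited;
        # counts for several query terms in the same row are summed.
        hits: dict[int, int] = defaultdict(int)
        for term in set(tokenize(query)):
            for row_id, count in self.postings.get(term, {}).items():
                hits[row_id] += count
        return dict(hits)


idx = InvertedIndex()
for row_id, value in [
    (1, "funny,new york,hollywood"),
    (2, "drama,hollywood,hollywood"),
    (3, "funny,paris"),
    (4, "news,london"),
]:
    idx.add(row_id, value)

hits = idx.search("funny hollywood")
print(len(hits), sorted(hits.items()))  # 3 [(1, 2), (2, 2), (3, 1)]
```

Because an insert only touches the postings of the new row's terms, this structure fits a table with many inserts and few updates. A real engine such as Lucene adds on-disk segments, scoring, deletions, and (with tools built on top of it) sharding, none of which this toy covers.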
Hi,
I want to implement a search engine with Lucene.
To make it scalable, I would like to execute search jobs asynchronously (with a job queuing system).
But I don't know whether it is a good design. Why?
Search results can be large (e.g. 100+ pages with 25 documents per page).
With an asynchronous system, I need to store the results of each search job.
I can set a short expiration time (~5 min) for each search result, but that is still a lot of data.
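A minimal in-process sketch of the design described above: clients submit a query and get a job id, a worker thread runs the search, and the result is kept for about 5 minutes and served 25 documents per page. It assumes a stand-in `search_fn` that returns an ordered list of document ids; storing ids rather than full documents is an illustrative choice, not something from the post, and `SearchJobs` with its methods is hypothetical.

```python
import queue
import threading
import time
import uuid
from typing import Callable, Optional

PAGE_SIZE = 25       # documents per page, as in the post
RESULT_TTL = 5 * 60  # seconds; the ~5 minute expiration from the post


class SearchJobs:
    """In-process sketch: queue search jobs, run them on a worker thread,
    and keep each job's result for a limited time."""

    def __init__(self, search_fn: Callable[[str], list], ttl: float = RESULT_TTL) -> None:
        self.search_fn = search_fn  # hypothetical: query -> ordered list of doc ids
        self.ttl = ttl
        self.jobs = queue.Queue()
        self.results = {}  # job_id -> (expires_at, doc_ids)
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, query: str) -> str:
        """Enqueue a search and return a job id the client can poll with."""
        job_id = uuid.uuid4().hex
        self.jobs.put((job_id, query))
        return job_id

    def _worker(self) -> None:
        while True:
            job_id, query = self.jobs.get()
            doc_ids = self.search_fn(query)  # keep ids only, not full documents
            with self.lock:
                self.results[job_id] = (time.monotonic() + self.ttl, doc_ids)
            self.jobs.task_done()

    def page(self, job_id: str, page_no: int) -> Optional[list]:
        """Return one page of doc ids, or None if the job is unknown,
        still running, or expired."""
        with self.lock:
            self._evict_expired()
            entry = self.results.get(job_id)
        if entry is None:
            return None
        start = page_no * PAGE_SIZE
        return entry[1][start:start + PAGE_SIZE]

    def _evict_expired(self) -> None:
        # Caller must hold self.lock; eviction happens lazily on each read.
        now = time.monotonic()
        expired = [j for j, (expires_at, _) in self.results.items() if expires_at <= now]
        for job_id in expired:
            del self.results[job_id]


jobs = SearchJobs(lambda q: list(range(60)))  # stand-in search returning 60 ids
job_id = jobs.submit("funny hollywood")
jobs.jobs.join()  # demo only: wait until the worker has finished
print(jobs.page(job_id, 2))  # the last 10 ids: 50 through 59
```

With ids only, a job of 100 pages at 25 documents per page keeps 2,500 ids instead of 2,500 documents, and the documents for a page can be loaded when that page is requested.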
What do you think about it ?
Which design would you use for that ?
Thanks
Mat
Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.
Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn: