Tuesday, November 19, 2013

R and Solr Integration Using Solr's REST APIs


Solr is a popular, fast, and reliable open source enterprise search platform from the Apache Lucene project.  Among many other features, we love its powerful full-text search, hit highlighting, faceted search, and near real-time indexing.  Solr powers the search and navigation features of many of the world's largest internet sites.  Solr, written in Java, uses the Lucene Java search library for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language, including R.

We invested a significant amount of time integrating our R-based data-management platform with Solr using its HTTP/JSON-based REST interface.  This integration allowed us to index millions of data sets in Solr in real time as they are processed by R.  It took us a few days to stabilize and optimize this approach, and we are proud to share it and the source code with you.  The full source code can be downloaded from datadolph.in's git repository.

The script has R functions for:
  • querying Solr and returning matching docs
  • posting a document to Solr (taking a list and converting it to JSON before posting it)
  • deleting all indexes, deleting the indexes for a certain document type, and deleting the indexes for a certain category within a document type
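
The functions in this post call getQueryURL(), getUpdateURL() and getCommitURL(), which live in the full source and are not shown here. As a rough sketch only, they might look something like the following. The host, port, core name (collection1) and the solrBaseURL variable are assumptions for illustration, not the values from our setup; wt=json asks Solr to answer in JSON and commit=true asks it to commit after the update.

```r
# Hypothetical base URL of the Solr core; adjust host, port and core name.
solrBaseURL <- "http://localhost:8983/solr/collection1"

# Query URL: must end in "q=" because querySolr() appends "field:text" to it.
getQueryURL <- function() {
  paste(solrBaseURL, "/select?wt=json&q=", sep = "")
}

# Update URL used for posting and deleting documents (assumed to commit immediately).
getUpdateURL <- function() {
  paste(solrBaseURL, "/update?commit=true&wt=json", sep = "")
}

# Explicit commit URL, only needed if updates are sent without commit=true.
getCommitURL <- function() {
  paste(solrBaseURL, "/update?commit=true&wt=json", sep = "")
}
```

Note that the default queryfield of "all" in querySolr() presumes the schema has a catch-all field with that name.
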
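      # RCurl provides getURL() and postForm(); fromJSON()/toJSON() are assumed here
      # to come from RJSONIO (the full source may load a different JSON package)
      library(RCurl)
      library(RJSONIO)

      # query a field for the text and return docs
      # (returns NULL invisibly when Solr reports a non-zero status)
      querySolr <- function(queryText, queryfield="all") {
        response <- fromJSON(getURL(paste(getQueryURL(), queryfield, ":", queryText, sep="")))
        if(!response$responseHeader$status) #if 0
          return(response$response$docs)
      }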

      # delete all indexes from solr server
      deleteAllIndexes <-function() {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = '{"delete": {"query":"*:*"}}',
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }

      # delete all indexes for a document type from solr server 
      # in this example : type = sports
      deleteSportsIndexes <-function() {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = '{"delete": {"query":"type:sports"}}',
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }

      # delete indexes for a given category (e.g. basketball) in the sports type from solr server 
      # in this example : type = sports and category: basketball
      deleteSportsIndexesForCat <-function(category) {
        response <- postForm(getUpdateURL(),
                             .opts = list(postfields = 
                               paste('{"delete": {"query":"type:sports AND category:', category, '"}}', sep=""),
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )
        ) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
      }
      #deleteSportsIndexesForCat("basketball")

      #Post a new document to Solr
      postDoc <- function(doc) { 
        solr_update_url <- getUpdateURL()
        jsonst <- toJSON(list(doc))
        response <- postForm(solr_update_url,
                             .opts = list(postfields = jsonst,
                                          httpheader = c('Content-Type' = 'application/json', 
                                                         Accept = 'application/json'),
                                          ssl.verifypeer=FALSE
                             )) #end of PostForm
        return(fromJSON(response)$responseHeader[1])
        ########## Commit - only if it doesn't work the other way ###############
        #return(fromJSON(getURL(getCommitURL())))
      }
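
The three delete functions above differ only in the query string they embed in the JSON body. As a hedged sketch (the names escapeForJSON and buildDeleteBody are ours for illustration and are not part of the published script), the body could be built in one place, escaping any double quotes or backslashes so that the result stays valid JSON:

```r
# Escape backslashes and double quotes so the text can sit inside a JSON string.
escapeForJSON <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)   # \  becomes \\
  gsub('"', '\\"', x, fixed = TRUE)          # "  becomes \"
}

# Build the delete-by-query body that the delete functions pass as postfields.
buildDeleteBody <- function(query) {
  paste('{"delete": {"query":"', escapeForJSON(query), '"}}', sep = "")
}

# Same bodies as the hand-written strings in the functions above
allBody    <- buildDeleteBody("*:*")
sportsBody <- buildDeleteBody("type:sports")
catBody    <- buildDeleteBody(paste("type:sports AND category:", "basketball", sep = ""))

# A phrase query with quotes, which would break a naively pasted body
phraseBody <- buildDeleteBody('category:"college basketball"')
cat(phraseBody, "\n")
```

The resulting string can be handed to postForm() as the postfields option in exactly the same way as the literal strings used above.
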

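The functions above decide success by looking at the status value in Solr's responseHeader, where 0 means the request succeeded. The sketch below shows that check against a hand-written mock of a parsed response, so it runs without a Solr server. The mock data and the extractDocs name are illustrative assumptions. Depending on the JSON package, the header may arrive as a list or as a named vector, so the sketch uses [[ ]] access, which handles both.

```r
# Mock of a parsed Solr select response (shape only; values are made up).
mockResponse <- list(
  responseHeader = list(status = 0, QTime = 3),
  response = list(
    numFound = 1,
    start = 0,
    docs = list(list(id = "doc-1", type = "sports", category = "basketball"))
  )
)

# Return the matching docs, or an empty list (with a warning) on failure.
extractDocs <- function(parsed) {
  header <- parsed$responseHeader
  status <- if ("status" %in% names(header)) header[["status"]] else NA
  if (is.na(status) || status != 0) {
    warning("Solr reported a non-zero or missing status")
    return(list())
  }
  parsed$response$docs
}

docs <- extractDocs(mockResponse)
length(docs)
docs[[1]]$category
```
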
Happy Coding!

24 comments:

  1. Excellent post, this has been extremely useful to me. I work with a lot of Russian language texts, and to make this work with utf-8 characters you will want this as the first line in querySolr()
    response <- fromJSON(getURL(paste(getQueryURL(), queryfield, ":", curlEscape(queryText), sep="")))

    Just thought it might save you or someone else a headache!
    R

  2. Thanks, this is a good insight, very useful! We have faced this issue elsewhere too.
