FrankT
Joined: 16 May 2010 Posts: 14 Wed 26 Jan 2011
Fast Download - where is it?
Hello,
I have a database with about 3000 patent families / 12000 patents in total. To screen and sort these quickly, I need the bibliographic data. For about 50% of them I already have it from previous work.
Now, when I want to download the bibliographic data, claims, legal status etc., I select all patents (assuming MP does not request the data again for patents that already have it and only updates the legal status where applicable). I hope I'm not doing anything stupid this way.
For the first 30 seconds the download is "blazing" through and I expect it to finish in about 2 minutes (I'm on a fast 50 Mbit DSL line). But then it suddenly stalls and downloads only 5-10 entries every ~10 seconds, so it will probably take a day or two to finish.
Is this intended behavior (which makes MP, or at least my workflow, unusable)?
What can be done to fix it or what am I doing wrong?
Thanks much,
Frank
FrankT
Joined: 16 May 2010 Posts: 14 Wed 26 Jan 2011
Just a little extra information on the speed:
The queue currently shows 61208 (requests, I assume). 16 minutes ago it was at 61660, which means a download rate of 452 requests in 16 minutes, or ~28 per minute.
With roughly 61,000 "information pieces" remaining, this request will need 1.5 days to complete. MP currently estimates 3 h 20 min. I think the estimate started at about 1.5 minutes but keeps going up (which makes me question what the estimate is good for).
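The arithmetic behind Frank's estimate, as a small Python sketch. It assumes, as he does, that the queue counter counts requests and that the observed rate stays constant:

```python
def rate_per_minute(queue_before: int, queue_now: int, minutes: float) -> float:
    """Requests completed per minute between two readings of the queue counter."""
    return (queue_before - queue_now) / minutes


def eta_days(remaining: int, per_minute: float) -> float:
    """Time to drain the queue at a constant rate, in days."""
    return remaining / per_minute / 60 / 24


rate = rate_per_minute(61660, 61208, 16)    # 452 requests in 16 min
print(f"{rate:.2f} requests/min")           # 28.25 requests/min
print(f"{eta_days(61208, rate):.2f} days")  # 1.50 days
```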
Anyhow, I'm desperately hoping for a solution to this; the problem has existed for the past 2 years, if not longer.
I'm running the very latest version of MP 9.5 on a Core2Duo with 8GB RAM and Windows 7 x64.
Thanks, Frank
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Wed 26 Jan 2011 Location: Marseille
Dear Frank,
Matheo Patent respects the Espacenet Fair Use policy. Consequently, the software is not designed for bulk downloading of Espacenet patent information.
For very large requests, the speed decreases for two main reasons:
- Espacenet considers your computer a bulk-download robot and slows down the download flow;
- to respect the fair use of Espacenet, Matheo Patent inserts timeouts so that it sends only a reasonable number of requests to Espacenet.
You should consider a download strategy with fewer patents per request. In any case, our advice for large downloads is to run them outside EU office hours (at night in Europe) using the fast download option of Matheo Patent.
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique
FrankT
Joined: 16 May 2010 Posts: 14 Wed 26 Jan 2011
Dear Bruno,
Thanks for the reply. I have one thought and one question:
1) Say everyone respects a "fair use policy" (whatever that means in kb/s) and spreads their downloads over a longer period of time by throttling the download speed. How does that reduce the total download volume?
Answer: It doesn't, or only by the percentage of users who get frustrated and give up because they cannot wait that long.
Whether we download the data in 3 minutes and are done, freeing the bandwidth for others, or occupy it for two days at a lower rate, the volume remains the same. If everyone waits two days, there is no reduction in server load or transfer volume at all.
I think the days when servers were connected through low bandwidth have been over for 3 to 5 years. By throttling the bandwidth you can prevent peak transfer rates from individual users (which is OK), but not significantly reduce the total data volume. Throttling to a data rate that an old 56k modem would exceed, however, is certainly not up to today's standards. I wouldn't mind a limitation, but it should certainly be 10 or 100 times faster than it currently is.
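Frank's point is that throttling changes how long a transfer takes, not how much is transferred. A quick Python sketch of that relationship; the 120 MB total is a hypothetical figure (roughly 12,000 patents at ~10 kB each), not a measured value:

```python
def transfer_seconds(volume_bytes: int, rate_bits_per_s: float) -> float:
    """Time needed to move a fixed volume at a given line rate."""
    return volume_bytes * 8 / rate_bits_per_s


# Hypothetical total volume: ~12,000 patents at ~10 kB each.
VOLUME_BYTES = 120_000_000

# 50 Mbit/s DSL line, the 1 Mbit/s OPS figure, and a heavily throttled 10 kbit/s.
for rate in (50e6, 1e6, 10e3):
    hours = transfer_seconds(VOLUME_BYTES, rate) / 3600
    # The volume is identical in every case; only the duration changes.
    print(f"{rate / 1e6:6.2f} Mbit/s: {hours:8.2f} h for {VOLUME_BYTES / 1e6:.0f} MB")
```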
Just out of curiosity - what is the current MP download data rate?
2) You mention the fast download option in MP. Could you explain what it is and how I enable it?
Thanks much, Frank
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Wed 26 Jan 2011 Location: Marseille
Dear Frank,
Quote: | 1) Say everyone respects a "fair use policy" (whatever that means in kb/s) and spreads their downloads over a longer period of time by throttling the download speed. How does that reduce the total download volume? |
This is the only information we currently have about the OPS connection service:
# The maximum traffic volume allowed is approximately 1 Mbit per second. The maximum data volume and connection rates may vary due to operational circumstances.
# It is strongly recommended that data retrieval activities by robots are scheduled at night (19.00 to 07.00 hrs CET) or during the weekend.
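The scheduling recommendation above can be checked before starting a large job. A minimal Python sketch, assuming "CET" means a fixed UTC+1 offset (the quoted text does not say how summer time is handled) and that "weekend" means Saturday and Sunday:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CET" is taken literally as UTC+1 all year round.
CET = timezone(timedelta(hours=1), "CET")


def in_recommended_window(moment: datetime) -> bool:
    """True if a timezone-aware `moment` falls at night (19:00-07:00 CET)
    or on a weekend (Saturday or Sunday, CET)."""
    local = moment.astimezone(CET)
    if local.weekday() >= 5:          # Monday == 0, so 5 and 6 are Sat/Sun
        return True
    return local.hour >= 19 or local.hour < 7
```

For example, Wednesday 26 Jan 2011 at 18:30 UTC is 19:30 CET and therefore inside the window.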
Quote: | Answer: It doesn't, or only by the percentage of users who get frustrated and give up because they cannot wait that long.
Whether we download the data in 3 minutes and are done, freeing the bandwidth for others, or occupy it for two days at a lower rate, the volume remains the same. If everyone waits two days, there is no reduction in server load or transfer volume at all. |
I understand, but we can't do more because Espacenet/OPS has chosen to limit robots.
Quote: | I think the days when servers were connected through low bandwidth have been over for 3 to 5 years. By throttling the bandwidth you can prevent peak transfer rates from individual users (which is OK), but not significantly reduce the total data volume. Throttling to a data rate that an old 56k modem would exceed, however, is certainly not up to today's standards. I wouldn't mind a limitation, but it should certainly be 10 or 100 times faster than it currently is. |
The Matheo team works on this problem every day. We are trying to find the solution that best satisfies all kinds of users (those with big requests and those with small requests).
Users with small requests must get their answer quickly, so we leave the bandwidth at 100%, etc.
Quote: | Just out of curiosity - what is the current MP download data rate? |
Currently, as long as OPS does not send the robot-detection message, MP downloads at 100%. As soon as the message is detected, MP adds a delay between requests for as long as the message keeps appearing.
With this approach, once the message disappears, the rate returns to 100%.
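The behaviour described above amounts to a simple on/off throttle. A Python sketch under stated assumptions: `fetch` is a hypothetical stand-in for one OPS request that reports whether the robot-detection message was present, and the 5-second pause is an arbitrary example value, not MP's actual delay:

```python
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical client call: returns the response body and whether the
# response carried the robot-detection message.
Fetch = Callable[[str], Tuple[str, bool]]


def download_all(queries: Iterable[str], fetch: Fetch, pause: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send requests back to back; while the robot warning keeps appearing,
    wait `pause` seconds before each further request."""
    results: List[str] = []
    delay = 0.0                      # 0.0 means "full speed"
    for query in queries:
        if delay > 0:
            sleep(delay)             # throttled: wait before the next request
        body, warned = fetch(query)
        results.append(body)
        # Keep the pause only while the warning persists; drop it afterwards.
        delay = pause if warned else 0.0
    return results
```

Passing `sleep` as a parameter lets the pacing be tested without actually waiting.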
We are working hard on a new limiting method based on bandwidth limitation. However, the components MP currently uses do not offer a bandwidth-limitation option.
Quote: | 2) You mention the fast download option in MP. Could you explain what it is and how I enable it? |
It is a new method of downloading data from OPS.
Previously, MP had only one way to download bibliographic data: one request per patent. Since August, MP can download the bibliographic data of 100 patents with a single request. This is the "Fast & Light" option.
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique
FrankT
Joined: 16 May 2010 Posts: 14 Thu 27 Jan 2011
Dear Bruno,
1 Mbit/s does not seem bad; that's roughly 100 kByte/s. I assume the bibliographic data per patent contains maybe 10k characters (everything: claims, abstract, references, legal status...), which means MP could download the data for ~10 patents per second. So even if MP ignored that it already has the information for ~50% of the patents, my request should complete within 20 minutes.
As this is obviously not the case, something seems to be wrong!
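Frank's back-of-envelope estimate, redone in Python. It assumes one byte per character (so ~10 kB per patent) and ignores protocol overhead; note that 1 Mbit/s is strictly 125 kB/s, which gives a slightly shorter time than his rounded figures:

```python
def patents_per_second(link_mbit_s: float, bytes_per_patent: int) -> float:
    """How many patent records fit through the link each second."""
    bytes_per_second = link_mbit_s * 1_000_000 / 8   # 1 Mbit/s = 125,000 bytes/s
    return bytes_per_second / bytes_per_patent


def minutes_for(n_patents: int, per_second: float) -> float:
    """Minutes needed to fetch n_patents at a constant rate."""
    return n_patents / per_second / 60


rate = patents_per_second(1.0, 10_000)   # 12.5 patents/s
print(minutes_for(12_000, rate))         # 16.0: all 12,000 patents
print(minutes_for(6_000, rate))          # 8.0: only the ~50% still missing
```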
Just to let you know how the process continued:
I let MP complete the request overnight. Unfortunately it must have stopped downloading: this morning it estimated over a year to complete and no data was coming in anymore. I assume the internet connection dropped and my router got a new IP address assigned. So I clicked "Skip all". The status bar went all the way to the right, but then nothing. I let it sit for 2 h: nothing. It didn't finish up and save the data. Then I wanted to close MP, which resulted in the error message "Could not allocate memory for image". Although I had a few applications running, there were 2 GB of RAM free. I could click OK on the error window, but it popped up again whenever I tried to close MP. Interestingly, I could save the project and then close it.
Now, when I reopened it, I noticed that some WO applications belonging to a family do not even have a title anymore. Instead of immediately advancing the progress bar to ~50% (which I guess would mean that MP realized it had already downloaded that information), it starts from scratch at zero percent. Have I lost all the bibliographic data in my database?
On a completely different note, I had already assigned a pertinence to about 50% of the patents. However, this seems to be lost, and I don't know whether it happened during the upgrade to 9.5 or with a previous version.
I do have a backup, and I can try with 9.4 again. The problem is that I don't have the time to fiddle with backups and different versions of MP, as my client needs an answer by tomorrow. All I want is reliable and affordable patent database software.
Can you feel the pain I am in?
Thanks,
Frank
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Mon 31 Jan 2011 Location: Marseille
Frank,
First, sorry for the late reply.
Quote: | 1 Mbit/s does not seem bad; that's roughly 100 kByte/s. I assume the bibliographic data per patent contains maybe 10k characters (everything: claims, abstract, references, legal status...), which means MP could download the data for ~10 patents per second. So even if MP ignored that it already has the information for ~50% of the patents, my request should complete within 20 minutes.
As this is obviously not the case, something seems to be wrong!
|
You are not wrong: if OPS allowed free access without robot protection, this is exactly the time MP would need to download the documents. BUT OPS now limits connections, and all we can do is add a delay between requests to get a new "session".
Quote: | Just to let you know how the process continued:
I let MP complete the request overnight. Unfortunately it must have stopped downloading -
...
... Instead of immediately advancing the progress bar to ~50% (which I guess would mean that MP realized it had already downloaded that information), it starts from scratch at zero percent. Have I lost all the bibliographic data in my database?
|
This cannot happen with MP, because none of the downloaded information is saved to the hard disk while the download is in progress.
Quote: | On a completely different note, I had already assigned a pertinence to about 50% of the patents. However, this seems to be lost, and I don't know whether it happened during the upgrade to 9.5 or with a previous version.
|
Hmm, maybe the data.mp file has been corrupted. I don't know for sure, but if information is missing, that is what I would deduce.
Quote: | I do have a backup, and I can try with 9.4 again. The problem is that I don't have the time to fiddle with backups and different versions of MP, as my client needs an answer by tomorrow. All I want is reliable and affordable patent database software.
|
Using v9.4 is not a good solution because:
- a database created with version 9.5 is not fully compatible with 9.4;
- several links between MP and OPS no longer work in 9.4:
version 9.4 is obsolete due to changes to the OPS web service.
Quote: | Can you feel the pain I am in? |
I understand, and I'm very sorry.
Every day we try to improve MP's connection with OPS.
This week, we added a new process for downloading pictures in "Fast & Light" mode. This process changes a step of the picture download:
it reduces the number of links needed to pre-select pictures by a factor of 30.
(This revision is not yet available, because we are waiting for feedback on the question below.)
So, my question: do you think it's necessary to download pictures and family members when the user checks the "Fast & Light" option?
Let me explain.
For example, for a request with 300 patents, MP makes:
3 * 1 link for the 300 patents (bibliographic data, 100 patents per link)
300 * 1 link for information concerning family members
10 * 1 link to check whether a picture is available (1 link per 30 patents)
300 * 1 link to get the pictures (the actual number is generally lower)
So currently, for the "Fast & Light" option, MP makes:
3 + 300 + 10 + 300 = 613 links
which is still a lot.
If you think pictures/families can be downloaded later, then
this number can be reduced to only 3 links for 300 patents!
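The link count in the breakdown above can be written as a small function. This Python sketch only reproduces Bruno's arithmetic (100 patents per bibliographic link, one family link per patent, 30 patents per picture check, at most one link per picture); it is not MP code:

```python
from math import ceil


def fast_light_links(n: int, family: bool = True, pictures: bool = True) -> int:
    """Count OPS links for n patents in "Fast & Light" mode."""
    links = ceil(n / 100)          # bibliographic data: 100 patents per link
    if family:
        links += n                 # one family-members link per patent
    if pictures:
        links += ceil(n / 30)      # picture-availability check: 30 patents per link
        links += n                 # picture download: at most one link per patent
    return links


print(fast_light_links(300))                                # 613
print(fast_light_links(300, family=False, pictures=False))  # 3
print(fast_light_links(12000))                              # 24520 for Frank's 12,000 patents
```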
What do you think about the idea of dropping the family and picture downloads?
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique
FrankT
Joined: 16 May 2010 Posts: 14 Mon 07 Feb 2011
Dear Bruno,
After returning from my vacation I was wondering whether there is any status update, but then I realized that my post apparently did not appear. I wrote a detailed reply to your questions; maybe I accidentally sent it by PM? If not, I'll rewrite it. Please let me know if you got my reply.
Thanks, Frank
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Mon 07 Feb 2011 Location: Marseille
Dear Frank,
I haven't received any new PM.
Could you re-send it, or email me directly at support at matheo-software.com?
PS: we are working hard to significantly reduce the number of links MP makes when downloading patents. In the new version of MP (v9.6, not yet available), we are currently working with a bandwidth limit.
And for the "Fast & Light" mode, we are currently working on removing the family requests and building families directly from the OPS family ID.
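Building families from a family ID, instead of sending one family request per patent, boils down to a group-by. A minimal Python sketch, assuming each bibliographic record already carries a family identifier (the record layout and the publication numbers below are made up):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_family(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (publication_number, family_id) pairs into families.

    Assumes the family ID arrives with the bibliographic data, so no extra
    per-patent family request is needed.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for pub, family_id in records:
        families[family_id].append(pub)
    return dict(families)


# Made-up example records.
print(group_by_family([("EP-1", "fam-A"), ("US-1", "fam-A"), ("WO-1", "fam-B")]))
```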
Thanks a lot,
Bruno
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique
FrankT
Joined: 16 May 2010 Posts: 14 Mon 07 Feb 2011
Dear Bruno,
OK, I'll try again:
Quote: | So, my question: do you think it's necessary to download pictures and family members when the user checks the "Fast & Light" option?
...
What do you think about the idea of dropping the family and picture downloads? |
The way I work is this:
1) I search for relevant patents based on keywords and IPC codes. Then I rate them according to their relevance, for which I need the claims, the description and, importantly, the image: ideally the image of each individual patent, not just the image of one family member displayed for all members. This results in maybe 100 very relevant patents.
2) I do a citation search on the relevant patents, including all family members. Depending on the field I may get, say, 1000 patents. I rate these again for relevance, for which I again need image, claims and description. I use filters and IPC classes to sort out patents with little or no relevance and work my way through the others.
I may do some other specific searches by inventor, company name or new keywords to augment the results with other hits.
3) Repeat step 2) until no new patents are discovered.
This results in a fairly large database. I cannot delete the unimportant patents because they would reappear in later searches, so instead they get a pertinence of 0. This way I can be quite sure to have all relevant patents for a given topic. I use keywords only to a certain extent, as I would not have found some relevant patents that way: some inventors/companies try to hide their inventions by using uncommon expressions or IPC classes.
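Frank's iterative procedure (steps 1 to 3) is essentially a breadth-first expansion over the citation graph that stops when no new patents appear. A Python sketch under simplifying assumptions: `neighbours` is a hypothetical function returning cited, citing and family-member patents, relevance is reduced to a yes/no decision, and irrelevant patents are kept with pertinence 0 rather than deleted, as he describes:

```python
from collections import deque
from typing import Callable, Dict, Iterable


def snowball(seeds: Iterable[str],
             neighbours: Callable[[str], Iterable[str]],
             is_relevant: Callable[[str], bool]) -> Dict[str, int]:
    """Expand from the seed patents until no new patents turn up.

    Every patent seen is recorded; irrelevant ones get pertinence 0 instead of
    being deleted, so a later search cannot bring them back as "new".
    Only relevant patents are expanded further.
    """
    pertinence: Dict[str, int] = {}
    queue = deque(seeds)
    while queue:                      # step 3): repeat until nothing new appears
        pub = queue.popleft()
        if pub in pertinence:         # already rated in an earlier round
            continue
        relevant = is_relevant(pub)
        pertinence[pub] = 1 if relevant else 0
        if relevant:
            queue.extend(neighbours(pub))   # step 2): citations + family members
    return pertinence
```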
So, for me, images, claims and description are essential for every single patent.
Instead of individually selecting 457 patents, I mark ALL and choose to download claims, description and image. I hope MP is smart enough to see what is already on disk and what is missing and needs to be downloaded?
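What Frank hopes for here, and asks for explicitly as suggestion 3), is a choice between "download only what is missing" and "download everything again". A hypothetical Python sketch of that selection step; the on-disk layout (a set of stored fields per patent) is an assumption, not how MP is known to store data:

```python
from typing import Dict, Iterable, List, Set

# The parts Frank selects for every patent.
FIELDS = ("claims", "description", "image")


def plan_download(selected: Iterable[str], on_disk: Dict[str, Set[str]],
                  overwrite: bool = False) -> Dict[str, List[str]]:
    """Return the fields to request for each selected patent.

    overwrite=False: skip fields already stored (complete missing data only).
    overwrite=True: request every field again and replace what is stored.
    """
    plan: Dict[str, List[str]] = {}
    for pub in selected:
        stored = set() if overwrite else on_disk.get(pub, set())
        wanted = [f for f in FIELDS if f not in stored]
        if wanted:                    # patents with nothing missing cost no request
            plan[pub] = wanted
    return plan
```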
Three suggestions:
1) It would be nice to have a visual indication of whether the displayed image is from the actual patent or another family member. Maybe a thin green/red border, or any other color-coded information.
2) Download rate. Maybe this is what you meant by a bandwidth limit: as I see it, if a user has fast internet access, MP runs into the OPS limitation, and currently MP's only recourse is to wait and open a new connection. The end result is that such a user effectively gets a much lower download rate than 1 Mbit/s. Possible solutions: a) let the user select a bandwidth; b) determine the maximum bandwidth currently allowed before a stop signal is sent. The latter could be done by starting at 1 Mbit/s; if a stop signal arrives, cut the rate in half to 0.5 Mbit/s; if this goes well for a while, step up to 0.75; if that also goes well, step up to 0.875, if not, go down to 0.625, etc. Always halve the last increment. There are probably other algorithms that work equally well if not better. Either way, MP would download near the maximum allowed bandwidth, which I guess may change with server load.
3) Sometimes things get messed up: patents were merged into the same family by accident, or the legal status needs to be updated, etc. For these cases I would want to tell MP to really download everything and overwrite existing data on disk. Then there are other cases where I just want missing data to be completed. Could you implement an option so the user can choose between downloading only missing data and overwriting existing data?
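Suggestion 2b) describes a bisection search on the download rate. A Python sketch of that idea, assuming a hypothetical probe `ok_at(rate)` that downloads at the given rate for a while and reports whether OPS stayed silent; it also assumes the limit does not change during the search, which Frank notes may not hold under varying server load:

```python
from typing import Callable


def find_max_rate(ok_at: Callable[[float], bool], start: float = 1.0,
                  rounds: int = 6) -> float:
    """Bisection on the download rate in Mbit/s, as described in suggestion 2b).

    Try `start`; if OPS objects, halve it, then move up after a success or down
    after a stop signal, each time by half the previous increment.
    Returns the highest rate that went well (0.0 if none did).
    """
    if ok_at(start):
        return start
    best = 0.0
    rate = start / 2                  # 1.0 -> 0.5 after the first stop signal
    step = start / 2
    for _ in range(rounds):
        step /= 2                     # always halve the last increment
        if ok_at(rate):
            best = max(best, rate)
            rate += step              # went well: step up
        else:
            rate -= step              # stop signal: step down
    return best
```

With a hidden limit of 0.7 Mbit/s the probes go 1.0, 0.5, 0.75, 0.625, 0.6875, ..., matching the sequence in the post, and the function returns 0.6875.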
Keep up the great work Bruno!
Kind regards,
Frank
PS: Different topic: is there an "unmerge" command so accidental merges can be reversed - or what would be the best way to correct any grouping errors in the database?
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Tue 08 Feb 2011 Location: Marseille
Dear Frank,
First, thank you for all your feedback and comments; they are really appreciated.
Quote: | 1) It would be nice to have a visual indication of whether the displayed image is from the actual patent or another family member. Maybe a thin green/red border, or any other color-coded information.
|
OK, I will see if I can add information about which patent the image originally comes from.
Quote: | 2) Download rate. Maybe this is what you meant by a bandwidth limit: as I see it, if a user has fast internet access, MP runs into the OPS limitation, and currently MP's only recourse is to wait and open a new connection. The end result is that such a user effectively gets a much lower download rate than 1 Mbit/s. Possible solutions: a) let the user select a bandwidth; b) determine the maximum bandwidth currently allowed before a stop signal is sent. The latter could be done by starting at 1 Mbit/s; if a stop signal arrives, cut the rate in half to 0.5 Mbit/s; if this goes well for a while, step up to 0.75; if that also goes well, step up to 0.875, if not, go down to 0.625, etc. Always halve the last increment. There are probably other algorithms that work equally well if not better. Either way, MP would download near the maximum allowed bandwidth, which I guess may change with server load.
|
Please contact me at support at matheo-software.com and I will send you a link to the v9.6 beta version with the bandwidth limit.
We chose to fix the bandwidth at a set value.
Quote: | 3) Sometimes things get messed up: patents were merged into the same family by accident, or the legal status needs to be updated, etc. For these cases I would want to tell MP to really download everything and overwrite existing data on disk. Then there are other cases where I just want missing data to be completed. Could you implement an option so the user can choose between downloading only missing data and overwriting existing data?
|
We will see how we can improve MP for that.
To-do list +1
Quote: | PS: Different topic: is there an "unmerge" command so accidental merges can be reversed - or what would be the best way to correct any grouping errors in the database? |
Sorry, there is currently no way to "unmerge" patents.
To-do list +1
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique
Mannina Site Admin
Joined: 06 Jan 2005 Posts: 978 Tue 08 Feb 2011 Location: Marseille
Quote: | 1) It would be nice to have a visual indication of whether the displayed image is from the actual patent or another family member. Maybe a thin green/red border, or any other color-coded information.
|
Added, but only available in the next version, 9.6.
_________________
Regards / Cordialement
Bruno Mannina
Technical Support / Support Technique