User talk:Fæ

From Wikimedia Commons, the free media repository
Notice The mass upload project Fleuron is running. If you have concerns you can raise them at User_talk:Fæ/Project_list/Fleuron. This project uploads a large number of crops from 18th-century books showing printers' decorative marks (fleurons), originally identified by a Cambridge University computer imaging project. The only categories affected are 'bucket categories' created for the project, so the known issues (a small percentage of misidentified images, which may still be in scope, and potentially many prints of the same fleuron type) should not be an inconvenience. Discussion continues on how best to manage later housekeeping of the uploaded images. -- (talk) 12:44, 31 May 2017 (UTC)
Notice If you are concerned that a category gets flooded with automated uploads, check that a template like {{Disambig}}, {{Photographs}}, {{Categorise}}, {{CatDiffuse}} or {{CatCat}} has been applied before complaining. In the case of my batch upload projects, any category marked this way will not be added to new photographs. -- (talk) 17:16, 26 March 2017 (UTC)
Notice I have enabled two-factor authentication for my account. I have started using OAuth credentials for terminal logins, so I can continue to use old housekeeping and uploading scripts. Refer to Pywikibot OAuth and OAuth registration; it's not as complex as it looks. The BotPasswords alternative does not currently work for Pywikibot. -- (talk) 11:39, 15 November 2016 (UTC)
(Backlog task)

After a recent village pump discussion on Biodiversity Heritage Library duplicate files, the batch upload will continue but I shall be spending some time researching how to map the BHL uploaded JPEG files to their zipped/archived JP2 equivalent pages. If this is possible, then lossless PNG equivalents can be generated, or significantly higher quality JPEG versions created than are available from BHL. Keep an eye on User:Fæ/Project list/Biodiversity Heritage Library for the results of the investigation in some months. 11:57, 30 October 2015 (UTC)

2017

FP Promotion

High Trestle Trail Bridge, Madrid, Iowa, United States (Unsplash F9o7u-CnDJk).jpg
This image has been promoted to Featured picture!

The image File:High Trestle Trail Bridge, Madrid, Iowa, United States (Unsplash F9o7u-CnDJk).jpg, that you nominated on Commons:Featured picture candidates/File:High Trestle Trail Bridge, Madrid, Iowa, United States (Unsplash F9o7u-CnDJk).jpg has been promoted. Thank you for your contribution. If you would like to nominate another image, please do so.

/FPCBot (talk) 13:01, 21 May 2017 (UTC)

FP Promotion

Army Athletics Long Jumper at The Inter Corps Athletics Competition at Tidworth, Wiltshire MOD 45152793 (cropped).jpg
This image has been promoted to Featured picture!

The image File:Army Athletics Long Jumper at The Inter Corps Athletics Competition at Tidworth, Wiltshire MOD 45152793 (cropped).jpg, that you nominated on Commons:Featured picture candidates/File:Army Athletics Long Jumper at The Inter Corps Athletics Competition at Tidworth, Wiltshire MOD 45152793 (cropped).jpg has been promoted. Thank you for your contribution. If you would like to nominate another image, please do so.

/FPCBot (talk) 21:01, 30 May 2017 (UTC)

FP Promotion

Abandoned bus in San Pedro de Atacama (Unsplash).jpg
This image has been promoted to Featured picture!

The image File:Abandoned bus in San Pedro de Atacama (Unsplash).jpg, that you nominated on Commons:Featured picture candidates/File:Abandoned bus in San Pedro de Atacama (Unsplash).jpg has been promoted. Thank you for your contribution. If you would like to nominate another image, please do so.

/FPCBot (talk) 05:01, 11 June 2017 (UTC)

Image hashes - tests on PD US collections

Category:Faebot identified duplicates

Page watchers may be interested in knowing that I have been looking at image hashing in odd moments in the last week, along with generating SHA1 values for images based solely on the visual image without any EXIF (as we get a lot of non-identical duplicates with altered EXIF). Here's an interesting small sample test which returned 9 pairs of duplicates based on perceptual hashes, starting with 362 images which happen to all be the same height and width (not important, just a way of getting a set containing possible duplicates). The test run took about 10 minutes.

Note that these are non-identical JPEG duplicates: they look identical, but some have quite different file sizes even though they have the same image resolution. They are not currently 'findable' using any normal tools.

Putting this here as a milestone, with the intention of getting some useful results for maintenance over the next couple of months, in particular for some of my larger historic batch uploads with known duplicate issues. :-) -- (talk) 19:31, 11 June 2017 (UTC)

Continued testing looks promising, so the next step of highlighting duplicates on Commons is starting, with the creation of Category:Faebot identified duplicates. The idea of a non-EXIF SHA1 test has been dropped in favour of relying on perceptual hashes (pHash), which are now calculated not from the full image but from thumbnails 160px wide. -- (talk) 02:29, 12 June 2017 (UTC)

Hi Fæ! Have you got some code for the above for anyone interested in implementing it as part of their own batch upload (post-)processing? /André Costa (WMSE) (talk) 09:10, 12 June 2017 (UTC)
Not yet... but you could run similar experiments, as the main routine is very simple and I'm using off-the-shelf open source modules. See the Fleuron experiment at User:Fæ/Project_list/Fleuron#Imagehash, and https://pypi.python.org/pypi/ImageHash. Generating SHA1 values with the EXIF stripped is as simple as re-saving the image using PIL and then using standard hashlib; however, I have dropped this option for the moment.
The current experiment is not really reusable for other general mass uploads, as to be realistic in the absence of Commons-wide image hash searches, there must be a way of constraining how you find potential duplicates and this will vary massively by the upload project. In the current experiment I'm starting with Category:Images from DoD uploaded by Fæ (check needed) for 'seed' images, then doing a Commons search against each seed (along with 'memory' to avoid running the same search twice) for images of the identical image size which are also PD US licensed. Currently I discard search returns of more than ~9500 matches, something that I may investigate later. Duplicates were a known concern for the specific sample space, as the DoD changed EXIF data routinely and uploads over the last 5 years swapped from a DoD image library service to an agency, DVIDS, and in that process images have both lost traceability and the EXIF is always different. Other ways of finding likely duplicates would probably have far fewer returns per processing run.
I'll consider writing up the experiment as a project page, depending on what is learned.
My intuitive approach is to establish some case studies and then think about resurrecting our long term discussions about having a system wide imagehash working in the same way as SHA1 can be searched for using the API. Processing 33 million images is entirely do-able, even without being on labs: working linearly with just one processing thread on a desktop busy with other stuff, I can create useful hashes for something like 100,000 images in a day. Were we to introduce parallel processing using machines suitable for graphics, the throughput could easily be an order of magnitude larger than the current Wikimedia Commons upload rates, even on our busiest days. -- (talk) 09:35, 12 June 2017 (UTC)
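As a rough check of the figures quoted above (33 million images at about 100,000 per day), the following sketch works out the elapsed time. It assumes the daily rate stays constant and, for the second call, that throughput scales linearly with the number of workers, which is an idealisation; the function name and the worker count are illustrative only.

```python
def days_to_hash(total_images, images_per_day, workers=1):
    """Days needed, assuming throughput scales linearly with workers."""
    return total_images / (images_per_day * workers)


# One desktop thread at the quoted rate.
single = days_to_hash(33_000_000, 100_000)
# Ten hypothetical parallel workers at the same per-worker rate.
parallel = days_to_hash(33_000_000, 100_000, workers=10)

print(single, parallel)
```

On these assumptions a single thread needs a little under a year, which is why parallel processing matters for a Commons-wide hash.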
Thanks for the explanation. In my case it's a batch upload from a GLAM where I know some of the images have previously been uploaded (without the unique id) but generally included in the category for the GLAM. The pre-existing images both include smaller resolution versions and a few higher resolution ones which had been made available outside of the normal routines. I'll keep an eye on your Imagehash sub-pages and give it a spin in the next upload cycle. Thanks /André Costa (WMSE) (talk) 15:22, 14 June 2017 (UTC)
If you can keep batches to be tested under 10,000 (i.e. the sample size where duplicates might exist, like slicing by author, date or resolution), then my current script can help. See User:Fæ/Imagehash. I'm close to releasing a version, but due to other stuff, tidying the code will wait for another week or so. Some smart person will probably be able to solve the 10,000 pagegenerator search limit, it's something I have not looked at, as I preferred to get some case studies out the door first. :-) -- (talk) 16:17, 14 June 2017 (UTC)
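The constrained search described in this thread (seed images, one search per image size with a 'memory' of sizes already searched, and large result sets discarded) could be sketched as below. This is not the released script: `find_candidates`, the `search` callable and the sample titles are hypothetical, and a real run would replace `fake_search` with a Commons query.

```python
def find_candidates(seeds, search, max_matches=9500):
    """Collect possible duplicates by image size, searching each size once.

    seeds  -- iterable of (title, width, height) tuples
    search -- hypothetical callable (width, height) -> list of titles,
              standing in for a Commons search
    """
    seen = {}  # 'memory': (width, height) -> titles, or None if discarded
    for _title, width, height in seeds:
        key = (width, height)
        if key in seen:
            continue  # this size has already been searched
        matches = search(width, height)
        # Discard result sets above the cap, as described in the thread.
        seen[key] = matches if len(matches) <= max_matches else None
    return {key: titles for key, titles in seen.items() if titles is not None}


calls = []


def fake_search(width, height):
    """Stand-in search that records how often it is called."""
    calls.append((width, height))
    if width == 800:
        return ["File:A.jpg", "File:B.jpg"]
    return ["File:X.jpg"] * 10000  # too many matches to be useful


seeds = [
    ("File:A.jpg", 800, 600),
    ("File:B.jpg", 800, 600),    # same size: no second search
    ("File:C.jpg", 3000, 2000),
]
result = find_candidates(seeds, fake_search)
print(result)
```

Each returned group would then be passed to the image hash comparison; the cap keeps any single group small enough to process.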

Code has been swapped to using a difference hash, based on the description at hackerfactor. This may result in slightly faster processing, and for searching out near identical matches it may be just as useful. pHash might be better at finding near matches, but that's a guess. -- (talk) 13:50, 12 June 2017 (UTC)
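A minimal sketch of the difference hash idea, assuming the image has already been reduced to a small greyscale grid (9 columns by 8 rows gives 64 bits). It is written in plain Python to show the principle and is not the ImageHash module's implementation; the bit direction (left brighter than right) and the sample grids are assumptions for illustration.

```python
def dhash(grid):
    """Difference hash of a greyscale grid (rows of equal length).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so a grid 9 wide and 8 high gives a 64-bit value.
    """
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical grids: a gradient, the same gradient uniformly brightened
# (as a re-encoded duplicate might be), and an unrelated pattern.
original = [[x * 20 + y * 3 for x in range(9)] for y in range(8)]
brighter = [[value + 5 for value in row] for row in original]
different = [[(x * 7919 + y * 104729) % 256 for x in range(9)] for y in range(8)]

same_distance = hamming(dhash(original), dhash(brighter))
other_distance = hamming(dhash(original), dhash(different))
print(same_distance, other_distance)
```

Because only the relative brightness of neighbours is recorded, a uniform brightness change leaves the hash unchanged, while an unrelated image differs in many bits.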

Illustrative triple duplicate, showing how the unique ID (VIRIN) [1] gets lost for PD US military related images. In this case all have a correct {{PD-USGov-Military-Navy}} license and each was created by a different Commons uploader:

[1] Hammering the point about using a unique ID: any collection with a consistent unique ID does not need hashes to check future batch uploads, as a simple Commons text search for the ID (via the API or another tool) will highlight non-identical versions already on Commons.
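To illustrate the unique ID point, here is a small sketch that groups file titles by an embedded VIRIN. The pattern is an assumption inferred from Navy filenames of the form "US Navy 020514-N-2686W-001 ..." and real identifiers may vary; the titles in the example are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed VIRIN shape: six-digit date, service letter, five-character
# identifier, then a three- or four-digit sequence number.
VIRIN = re.compile(r"\b\d{6}-[A-Z]-[A-Z0-9]{5}-\d{3,4}\b")


def group_by_virin(titles):
    """Return {virin: [titles]} for IDs that appear in more than one title."""
    groups = defaultdict(list)
    for title in titles:
        match = VIRIN.search(title)
        if match:
            groups[match.group(0)].append(title)
    return {vid: names for vid, names in groups.items() if len(names) > 1}


# Hypothetical titles; only the first two share an ID.
titles = [
    "File:US Navy 020514-N-2686W-001 The Helmsman Club.jpg",
    "File:Helmsman Club dining facility (020514-N-2686W-001).jpg",
    "File:US Navy 020612-N-0275F-002 Damage Controlmen.jpg",
    "File:Dining facility, no identifier.jpg",
]
shared = group_by_virin(titles)
print(shared)
```

Where the ID has been dropped from the title and description, as in the triple duplicate above, this text approach finds nothing, and image hashes are the fallback.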

What a great job on this, Fæ... Something like this crossed my mind a while ago, but it never even got to the vaporware stage: it seemed like just too much work. Fantastic job on actually doing it, and pulling it off very well. Cheers, Storkk (talk) 08:59, 13 June 2017 (UTC)
What Storkk wrote. This is amazing and should become a very useful tool. Hats off! De728631 (talk) 13:26, 13 June 2017 (UTC)
Thanks, much appreciated. :-) I'm fiddling around a little with the script today, and will be writing up the case studies and project at User:Fæ/Imagehash. I'm travelling a little later this week, but I have started a general routine which looks at hashing any search set of images. Once I remove the dead code from it and it seems stable, I'll publish the code. After a few more experiments, I'll probably start a phabricator request to see if it could be realistic to choose one simple open source image hash to make available for all hosted images on Commons. -- (talk) 13:35, 13 June 2017 (UTC)

I have released my current DoD image testing script on GitHub. I'll probably release the generic search version in a few days. -- (talk) 22:23, 20 June 2017 (UTC)

Thank you

Thank you again for your help. I'm actually a big fan, and have edited hundreds of your uploads before adding them to a Wikipedia article, see [1][2][3]. Cheers. Magnolia677 (talk) 21:34, 21 June 2017 (UTC)

Me, too. Wish there were a good automatic categorizer with errors under 50% but I find many illustrations for unillustrated articles. Jim.henderson (talk) 02:19, 25 June 2017 (UTC)
Thanks for the feedback! :-) -- (talk) 12:53, 25 June 2017 (UTC)

Goat figurine

It has the license information, source, and attribution directly on the page and was cropped from another image that also has all that information on its page, why did I just get templated? The link to the page is right there, it says plain as day "RELEASED UNDER CC Attribution/SHARE ALIKE", I only cropped it but I double-checked and the license information is correctly linked, the first uploader was very thorough, so I'm assuming this was just a pre-coffee mistake or something and removing the template. Seraphim System (talk) 13:23, 12 July 2017 (UTC)

I thought I must have skipped my coffee but it turns out my memory is correct and you are the original uploader. So can you explain what on earth possessed you to template for deletion the cropped version of an image when you, the license reviewer, uploaded the licensing information yourself? Seraphim System (talk) 13:47, 12 July 2017 (UTC)
Sorry, mobile phone accident. Tablets and phones are terrible for this. -- (talk) 14:21, 12 July 2017 (UTC)

File:Barcelona (4678045000).jpg

File:Barcelona (4678045000).jpg
Commons:Deletion requests/File:Barcelona (4678045000).jpg EugeneZelenko (talk) 14:41, 12 July 2017 (UTC)

File:Hazy landscape.jpg

File:Hazy landscape.jpg
Commons:Deletion requests/File:Hazy landscape.jpg Castillo blanco (talk) 08:59, 13 July 2017 (UTC)

Notification about possible deletion

Bundle DR:
Commons:Deletion requests/Files in Category:War art at the Imperial War Museum

Affected:

And also:

Yours sincerely, Labattblueboy (talk) 22:27, 13 July 2017 (UTC)

File:Heroes of the Xxth Century- Petain Art.IWMART165855.jpg

File:Heroes of the Xxth Century- Petain Art.IWMART165855.jpg
Commons:Deletion requests/File:Heroes of the Xxth Century- Petain Art.IWMART165855.jpg Labattblueboy (talk) 22:37, 13 July 2017 (UTC)

File:Heroes of the Xxth Century- Stalin Art.IWMART165852.jpg

File:Heroes of the Xxth Century- Stalin Art.IWMART165852.jpg
Commons:Deletion requests/File:Heroes of the Xxth Century- Stalin Art.IWMART165852.jpg Labattblueboy (talk) 22:37, 13 July 2017 (UTC)

File:Heroes of the Xxth Century- Lenin Art.IWMART165856.jpg

File:Heroes of the Xxth Century- Lenin Art.IWMART165856.jpg
Commons:Deletion requests/File:Heroes of the Xxth Century- Lenin Art.IWMART165856.jpg Labattblueboy (talk) 22:37, 13 July 2017 (UTC)

File:Heroes of the Xxth Century- Lenin Art.IWMART165853.jpg

File:Heroes of the Xxth Century- Lenin Art.IWMART165853.jpg
Commons:Deletion requests/File:Heroes of the Xxth Century- Lenin Art.IWMART165853.jpg Labattblueboy (talk) 22:38, 13 July 2017 (UTC)

Historiae Animalium (1551)

Hello Fæ, could you please check your automatic uploads in Category:Historiae Animalium (1551)? I think all the images are badly cropped, comparing them with their source at archive.org. The way they are cropped, they don't look useful for Commons; maybe they can be reuploaded. Thanks. --UAwiki (talk) 22:42, 13 July 2017 (UTC)

@UAwiki: ✓ Done

File:Jacek Namieśnik 2016.jpg

File:Jacek Namieśnik 2016.jpg
Commons:Deletion requests/File:Jacek Namieśnik 2016.jpg Yann (talk) 13:16, 14 July 2017 (UTC)

"^ldquo,"/"^rdquo,"/"^rsquo," filenames[edit]

I have discovered that there are several hundred files containing "^ldquo,"TEXT"^rdquo," or "^rsquo," strings, like:

File:US Navy 020514-N-2686W-001 The Navy^rsquo,s Moral Welfare and Recreation (MWR) dining facility ^ldquo,The Helmsman Club^rdquo,.jpg

All of these come from mass uploads of U.S. Navy images, and the strings are supposed to be quotation marks and apostrophes. Is it possible to automate the mass-renaming of these files to strip this unruly syntax? Cheers! BD2412 T 03:45, 18 July 2017 (UTC)

I have kicked off a script, it may take a while. It replaces these with the vanilla " and '. I'm also travelling from today, so may not be able to touch it again until next week. search1 search2. A good test file is File:US Navy 020612-N-0275F-002 Damage Controlmen assigned to the U.S. Navy^rsquo,s ^ldquo,Farrier Fire Fighting School^rdquo.jpg. -- (talk) 06:31, 18 July 2017 (UTC)
Excellent - thanks! BD2412 T 11:00, 18 July 2017 (UTC)
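The title clean-up discussed in this thread can be illustrated with a short sketch. It is not the script that was run: the mapping covers only the three strings named in the heading, and the trailing comma is treated as optional because the test file mentioned above appears without one after ^rdquo. It only computes the new title; the actual page moves on Commons are out of scope here.

```python
import re

# Map each stray entity name to the plain character it stood for.
REPLACEMENTS = {"ldquo": '"', "rdquo": '"', "rsquo": "'"}
# Assumption: the comma after the entity name may be missing.
PATTERN = re.compile(r"\^(ldquo|rdquo|rsquo),?")


def clean_title(title):
    """Replace ^ldquo, / ^rdquo, / ^rsquo, with plain quote characters."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], title)


before = ("File:US Navy 020612-N-0275F-002 Damage Controlmen assigned to "
          "the U.S. Navy^rsquo,s ^ldquo,Farrier Fire Fighting School^rdquo.jpg")
after = clean_title(before)
print(after)
```

Titles without any of these strings pass through unchanged, so the function is safe to apply to a whole search result.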

File:The Royal Russian family during the First World War.jpg

Hello. I recently purchased the above image in high resolution and was wondering if it's OK to upload it here on Wikimedia? The IWM non-commercial licence has me confused, even though there are tons of IWM images on Wikimedia. Wolcott (talk) 16:56, 18 July 2017 (UTC)

Yes, the photograph is public domain. The IWM are well known for their persistent copyfraud, calling everything in the archive commercial copyright and failing to make any corrections even after the facts have been presented in writing.
The photograph is especially interesting as it was taken days before the assassination; the IWM date is probably a few days late. As the date is given in (Russian) Julian form, and is very specific, it looks highly credible.
The photograph was most likely taken in Russia, but by a British officer. Beatty took HMS Lion on 23 June to Kronstadt, based on other records, and it would be pretty easy to track down the exact movements of the Russian Royal family to confirm they inspected Beatty's ship.
I see two arguments for public domain status. The first is where we presume a British officer was the photographer, in which case the status is expired Crown copyright under British law. The second would be a Russian official photographer, in which case the work is public domain as is any Russian photograph pre-1917.
If you have a high resolution version, there can be no copyright claim by anyone on it. Faithful reproductions are copyright free and nobody should have any concern about uploading it to Commons. Thanks -- (talk) 06:04, 19 July 2017 (UTC)
Many thanks for your response! The image came with a PDF of the terms & conditions. Normally I don't read such things, but since I bought an image from their collections, that made me hesitate about reusing it on social media as well as on Wikimedia. All this copyright mumbo jumbo from years of using Wikimedia drives me nuts. Thank you for clearing this one up! Wolcott (talk) 12:20, 19 July 2017 (UTC)
Forgot to add, for public domain the IWM sure charges quite a bit of money for high res images. I suppose it's fair considering they have to scan it and then upload it before sending it to the customer via FTP. Though the image I bought indicates under 'Properties' that it was scanned in March 2013. Wolcott (talk) 15:24, 19 July 2017 (UTC)
Yes, I agree with the right to make one-off administrative charges for reproduction of images from the archives. This may include the cost of digitizing originals at research quality. I would support a proposal from the IWM to ask the WMF for a grant to help with digitization for the public benefit. In reality these costs may be as much as 5 quid, anything more is probably profit. Retrospectively attempting to claim that public domain images are now copyright of the IWM, and the IWM should be paid copyright fees for every possible global use, is misleading and direct copyfraud. -- (talk) 10:25, 20 July 2017 (UTC)

Change the category of a list of images obtained by a search request

Hello,

I need to change the category of around 2000 images currently categorized in "ESA images (review needed)":

  • The 1107 images obtained by the search request "Mars Express" incategory:"ESA images (review needed)" must be moved to the category "Photos by Mars Express"
  • The 729 images obtained by the search request Envisat incategory:"ESA images (review needed)" must be moved to the category "Envisat"

Cat a lot does not seem to work in such a case. Is it possible to do it with another tool? Thank you! --Pline (talk) 19:14, 18 July 2017 (UTC)
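One possible approach, not taken from this thread, is to collect the titles returned by each search and rewrite the category link in each page's wikitext. The sketch below shows only the text rewrite; fetching and saving pages (for example with Pywikibot) is left out, and `swap_category` is a hypothetical helper that assumes the category is spelled exactly as given.

```python
import re


def swap_category(wikitext, old, new):
    """Replace [[Category:old]] with [[Category:new]], keeping any sort key."""
    pattern = re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(old) + r"\s*(\|[^\]]*)?\]\]"
    )
    return pattern.sub(
        lambda m: "[[Category:" + new + (m.group(1) or "") + "]]", wikitext
    )


page = ("== Summary ==\n"
        "[[Category:ESA images (review needed)]]\n"
        "[[Category:Mars]]")
moved = swap_category(page, "ESA images (review needed)", "Photos by Mars Express")
print(moved)
```

Other categories on the page are left untouched, and a page that does not contain the old category is returned unchanged.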

A barnstar for you!

The Tireless Contributor Barnstar
Thanks for cleaning up all those file names - great job! BD2412 T 00:25, 19 July 2017 (UTC)

File:Rolf Annerberg TFI konferens Kopenhamn 2010-09-15.jpg

File:Rolf Annerberg TFI konferens Kopenhamn 2010-09-15.jpg
Commons:Deletion requests/File:Rolf Annerberg TFI konferens Kopenhamn 2010-09-15.jpg Yger (talk) 12:49, 19 July 2017 (UTC)

File:CabinCookingforMay SJ (14167027521).jpg

File:CabinCookingforMay SJ (14167027521).jpg
Commons:Deletion requests/File:CabinCookingforMay SJ (14167027521).jpg E4024 (talk) 11:28, 20 July 2017 (UTC)

File:Cabin Cooking with Shannon Kiptopeke Virginia State Parks (13248945463).jpg

File:Cabin Cooking with Shannon Kiptopeke Virginia State Parks (13248945463).jpg
Commons:Deletion requests/File:Cabin Cooking with Shannon Kiptopeke Virginia State Parks (13248945463).jpg E4024 (talk) 12:19, 20 July 2017 (UTC)

Batch uploaded pictures out of the hidden Category: Photographs by Isles Yacht Club, partially out of scope

I stumbled across these pictures, batch uploaded from Flickr by your script, while browsing through Media needing categories as of 24 February 2017. Many are of "Kids camp days" and other internal yacht club events that do not seem to have educational value and so are out of scope, and some are of quite low resolution too (they are in the hidden category named in the title). I estimate the number of pictures not meeting these criteria to be in the mid three-digit range, and wanted to ask whether it is OK with you if I file a mass deletion request for them. --Steinfeld-feld (talk) 20:44, 20 July 2017 (UTC)

File:2.1.16 1 Whalley 47 (23509470374).jpg

File:2.1.16 1 Whalley 47 (23509470374).jpg
Commons:Deletion requests/File:2.1.16 1 Whalley 47 (23509470374).jpg Stopselliese (talk) 22:17, 20 July 2017 (UTC)