Bug with encoding filenames to UTF-8 in jD3.2.16 [Fixed in 3.2.18]

Started by Makulia, 14.11.2014 10:35:50

Arno

Hi,
I used a server from my hoster for my earlier tests, and on it I cannot change the PHP version. But I will contact my hoster.

Locally I only use XAMPP on Windows 7, and I am not sure that it is useful to test this on a Windows system.  ::)

Since you have all these problems, could it be an alternative to use your Russian transliteration file for your filenames?
Best Regards / Gruß
Arno
Please make a Donation for jDownloads and/or write a review on the Joomla! Extensions directory!

Makulia

Quote from: Arno on 27.11.2014 12:37:54
Hi,
I used a server from my hoster for my earlier tests, and on it I cannot change the PHP version. But I will contact my hoster.

Locally I only use XAMPP on Windows 7, and I am not sure that it is useful to test this on a Windows system.  ::)

Since you have all these problems, could it be an alternative to use your Russian transliteration file for your filenames?

Try setting up a virtual machine with Ubuntu Server or Desktop and test it there. It makes it really easy to switch quickly between different setups. I also recommend the phpbrew version manager https://github.com/phpbrew/phpbrew for testing different PHP builds.
I have thought about transliteration as a last option, but rejected it. We live in a multilingual environment and sticking to ugly transliteration is not a good way of treating users. Consider this use case:
1) Users store all their files under names in their native language, because not all users are comfortable with English (such a pity :( )
2) Users upload files with native names, then download them with transliterated names. Afterwards they need to rename each file back to its native name to keep their file structure consistent. Then they modify the files and upload them again.

A good example here is MediaWiki. This CMS has had no problem with Cyrillic file names, Cyrillic SEO links or any other language from the start. I believe this is an implementation problem, because the files themselves are uploaded to the server's disk correctly. The only problem is saving their names into the DB. I think we should use mbstring functions to deal with file names! BTW, have you tried it?

If I can help you in any way with investigating this issue, just let me know. If need be, we can discuss this problem on Skype.

Arno

Hi.
Quote
We live in a multilingual environment and sticking to ugly transliteration is not a good way of treating users.
Exactly. This is the reason why I implemented the option for UTF-8 filenames.

Quote
The only problem is saving their names into the DB. I think we should use mbstring functions to deal with file names! BTW, have you tried it?
I do not need to try this, as I have no problems storing the (Cyrillic) filenames in the DB.   :)
And I have seen that I do not use any additional mbstring options in the .htaccess file (unlike you). So on my server only the options I posted earlier are used.
I have an answer from my hoster: I can change my PHP version myself in the .htaccess file like this:
- AddHandler php56-cgi .php (when supported from server)
- AddHandler php55-cgi .php (when supported from server)
- AddHandler php54-cgi .php (when supported from server)
- AddHandler php53-cgi .php
- AddHandler php52-cgi .php
But what should I do here, since I do not have your problems?
Should the goal be to see whether I also get your problems when I switch to the PHP version you use?  ::)

Quote
If need be, we can discuss this problem on Skype.
Sorry, but Skype is not an option for me in this case. My written English is passable, but it is not good enough for a live talk.  :-\

Makulia

Quote from: Arno on 27.11.2014 16:10:49
Should the goal be to see whether I also get your problems when I switch to the PHP version you use?  ::)

Yes, please try with PHP 5.5 and PHP 5.6 and tell me whether you get any problems similar to mine.
Many setups have PHP 5.4+ because it is the default in Ubuntu 13.10 and higher. And it would be good to know that the problem is not in PHP 5.3 and its mbstring implementation.
I used the additional mbstring settings in .htaccess just for the test. Without them the problem stays the same!
I think the main goal here is to make jDownloads as compatible as possible with different kinds of production environments.

Makulia

I have done a titanic amount of work  ;) and finally the problem is located!!!

Problem description

The problem is 100% caused by the PHP basename() function! I am now sure that you cannot replicate this bug because on Windows it seems to work fine, but on some Linux PHP builds Unicode characters were inadvertently removed!

Quote
From the official PHP manual: http://php.net/manual/en/function.basename.php

On Windows, both slash (/) and backslash (\) are used as directory separator character. In other environments, it is the forward slash (/).
If the name component ends in suffix this will also be cut off.
basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function.

I have tried the official Ubuntu PPA as well as Ondřej Surý's PPA. Builds from both of them suffer from the basename bug. You can replicate this bug 100% if you install Ubuntu 14.04 or 14.10 on a virtual machine and install PHP inside it through apt-get. The way PHP is invoked (mod_fastcgi or mod_php) doesn't matter.
Then I installed several PHP 5.4 and 5.5 builds through PhpBrew. These builds do not suffer from this bug (because they were built from source code).
So all official Ubuntu PHP builds installed through apt-get suffer from this bug!

What doesn't matter:

1) The PHP version and php.ini settings have absolutely no positive effect.
2) The Ubuntu version also doesn't matter.
3) Even the DB encoding doesn't matter. I have tested with utf8_general_ci and latin1. With this bug fixed, names are stored correctly even with latin1_swedish_ci!

Please look at this topic through Google Translate.

The topic starter there has the same problem with uploading and getting the file path using basename!
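
A quick way to check whether a given PHP build is affected might look like the sketch below. The path is a made-up example and 'C.UTF-8' is assumed to be installed on the server. On an affected build the line for the C locale should report a mismatch; on other builds both lines may report OK.

```php
<?php
// Hypothetical test path; any name with non-ASCII characters will do.
$path = '/var/www/downloads/Отчёт за ноябрь.pdf';

// Locale-independent reference: everything after the last "/".
$expected = substr($path, strrpos($path, '/') + 1);

// Compare basename() under the plain "C" locale and under a UTF-8 locale.
foreach (array('C', 'C.UTF-8') as $locale) {
    $applied = setlocale(LC_ALL, $locale); // false if the locale is not installed
    $result  = basename($path);
    printf(
        "%-8s applied=%s  basename=%s  %s\n",
        $locale,
        $applied === false ? 'no' : $applied,
        $result,
        $result === $expected ? 'OK' : 'MISMATCH'
    );
}
```

If the second line shows applied=no, the UTF-8 locale is missing on that machine and the second result still reflects the C locale.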

Problem solutions

I have tried several solutions. Here are the results:
1) Do not use the default basename function and write our own function instead


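function basename2($path) {
    // Take everything after the last "/". This works on bytes, so UTF-8 names survive.
    $pos = strrpos($path, '/');
    // If there is no "/" at all, the whole string is already the file name.
    return $pos === false ? $path : substr($path, $pos + 1);
}
$this->url_download = basename2($target_path);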


2) Use the basename function, but set the locale before calling it


defined('_JEXEC') or die('Restricted access');
setlocale(LC_ALL, 'C.UTF-8', 'C');

I have tested both approaches and both of them worked!

IMPORTANT!
For the locale setting to work correctly, we must insert this code in both the frontend and the backend download handlers!

Final thoughts

Some simple googling and reading the user comments on http://php.net/basename about UTF-8 and basename leads to the understanding that the basename function is not UTF-8 safe.
So if we want to use basename we MUST set the locale first!
bugs.php.net/bug.php?id=60554c

Quote
basename() features this warning:

basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function.

As I understand, you're passing it utf-8 data while setting the locale to something else; this is not allowed.

Quote
There is a real problem when using this function on *nix servers, since it does not handle Windows paths (using the \ as a separator). Why would this be an issue on *nix servers? What if you need to handle file uploads from MS IE? In fact, the manual section "Handling file uploads" uses basename() in an example, but this will NOT extract the file name from a Windows path such as C:\My Documents\My Name\filename.ext. After much frustrated coding, here is how I handled it (might not be the best, but it works):

<?php
$filen = stripslashes($_FILES['userfile']['name']);
$newfile = basename($filen);
if (strpos($newfile,'\\') !== false) {
 $tmp = preg_split("[\\\]",$newfile);
 $newfile = $tmp[count($tmp) - 1];
}
?>

$newfile will now contain only the file name and extension, even if the POSTed file name included a full Windows path.

What do you think of it?

Some additional links for reference:
http://www.sitepoint.com/localizing-php-applications-2/
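
The Windows-path workaround quoted above and the custom basename idea could be combined into one byte-based helper, roughly as sketched here. The function name upload_basename is hypothetical, and the sketch assumes the names are UTF-8, where the bytes for "/" and "\" never occur inside a multibyte character. Note that it treats "\" as a separator even on Linux, where it is a legal character in file names.

```php
<?php
/**
 * Return the last path segment, treating both "/" and "\" as separators.
 * Works on raw bytes, so it does not depend on the current locale.
 * (Hypothetical helper, not part of jDownloads.)
 */
function upload_basename($name)
{
    // Normalise Windows separators and drop trailing slashes.
    $name = rtrim(str_replace('\\', '/', $name), '/');
    $pos  = strrpos($name, '/');
    return $pos === false ? $name : substr($name, $pos + 1);
}

echo upload_basename('C:\\My Documents\\My Name\\filename.ext'), "\n"; // filename.ext
echo upload_basename('/srv/files/Документ.txt'), "\n";                 // Документ.txt
```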

Arno

Hi Makulia,
congratulations on finding the cause.  8) ;) :) :D ;D
I know that you have spent very much time on finding it. Many thanks for your help.

I think I will add your first solution with our own 'basename' function.

But one last question: I use the 'basename' function in many places in the source code.
Did you replace ALL of these with your own basename2() function for your tests?

Makulia

Quote from: Arno on 01.12.2014 10:43:50
Did you replace ALL of these with your own basename2() function for your tests?

No, I have not, but it can easily be done with find and replace in an IDE or Notepad++. The occurrences of basename can be found using something like Total Commander.
But I strongly suggest you stick with the second solution. It is the right approach according to the PHP manual! And you don't need to worry about whether the basename2 function works correctly.
http://php.net/manual/en/function.basename.php
Quote
basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function.
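
The search for basename() calls could also be scripted instead of done by hand. The following is only a sketch: find_basename_calls is a hypothetical helper, and the directory to scan is whatever folder the component lives in.

```php
<?php
/**
 * List "file:line" for every line in .php files under $dir that calls basename().
 * (Hypothetical helper for locating call sites; it does not modify any file.)
 */
function find_basename_calls($dir)
{
    $hits  = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }
        foreach (file($file->getPathname()) as $i => $line) {
            // \b and the "(" keep names such as my_basename( or basename2( out.
            if (preg_match('/\bbasename\s*\(/', $line)) {
                $hits[] = $file->getPathname() . ':' . ($i + 1);
            }
        }
    }
    return $hits;
}

// Usage: print_r(find_basename_calls('/path/to/component'));
```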

Arno

Quote
No, I have not, but it can easily be done with find and replace in an IDE or Notepad++.
This is not a problem for me.
I only wanted to know what you did for your tests.  ;)

Quote
But I strongly suggest you stick with the second solution.
Quote
basename() is locale aware, so for it to see the correct basename with multibyte character paths, the matching locale must be set using the setlocale() function.
Hm... but what does 'the matching locale must be set using the setlocale() function' mean? How can I be sure that it is always correct?
I have no experience with the setlocale function.

You used this in your example:
setlocale(LC_ALL, 'C.UTF-8', 'C');
Is this setting correct for all users? Or only in your case?  ::)

Makulia

Quote from: Arno on 01.12.2014 11:07:56
Is this setting correct for all users? Or only in your case?  ::)

I have not tested it, but I am 99% sure it is, because this is the way the PHP developers intended this function to work!
You can easily test it. Download some files with names in other languages and try to upload them.
I will also check this out.
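
One way to be more certain is to check what setlocale() returns, since it returns false when none of the requested locales is installed on the server. The sketch below is an assumption about how such a guard could look: ensure_utf8_ctype is a hypothetical name, and the list of candidate locales is a guess that would need adjusting to the target servers. It sets only LC_CTYPE, the category that governs character handling, so number and date formatting stay untouched.

```php
<?php
/**
 * Try a list of UTF-8 locales and return the one that was applied,
 * or false if none of them is installed.
 * (Hypothetical helper; the candidate names are assumptions.)
 */
function ensure_utf8_ctype(array $candidates = array('C.UTF-8', 'en_US.UTF-8', 'en_US.utf8'))
{
    // setlocale() accepts an array and tries each name in turn.
    return setlocale(LC_CTYPE, $candidates);
}

$applied = ensure_utf8_ctype();
if ($applied === false) {
    // No UTF-8 locale on this server: better to use a byte-based
    // helper here instead of trusting basename().
    echo "No UTF-8 locale available\n";
} else {
    echo "Using locale: $applied\n";
}
```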


Arno

Best Regards / Gruß
Arno

Makulia

Thank you too! In my opinion, jDownloads is the best file storage solution for Joomla. BTW, what about the translation? I have sent you my offer by PM :)
P.S. When do you plan to release the fixed build of jDownloads?

Arno

Sorry, but I was very busy in the last few days with some bug fixes and the new jD content plugin.
I will publish all of this very soon (tomorrow?).

Quote
I can translate and send to you .ini files or work through Transifex.
Any help is very welcome here too. Please use the jD 3.x translations group on Transifex for it.
A Russian group already exists, so you do not have to start from zero:
https://www.transifex.com/projects/p/jdownloads-s3/language/ru_RU/

Send a request there and I will add you to the team.  8)
